Excavating the time frame of human evolution from its Y chromosome DNA

Next-generation sequencing is now providing a flood of Y chromosome sequencing data. Software has been developed to analyze these data and deduce the structure of the tree; new nodes keep being found, and each new sequencing result adds a detailed branch to the tree. The part of the tree shown below is from an analysis by Yfull (www.Yfull.com, with permission). It is shared by all men of the large R1b-M269 group, about half of the men in western European countries such as France or England.
My aim with this article is to show that these trees, with their large number of SNPs, can provide a frame to read the past, through the different stages from the near origin of Homo sapiens to recent times. By near origin I mean the practical root of these trees. As briefly presented further down, the very top of the tree has been subject to recent changes, but we don't need these details; the split from the African A group is close enough for the purpose of this article. As always on this blog I'll detail the reasoning that allows deductions on the history of the human groups behind the SNP data. Advanced readers will find some discussion of the possible impact of the Mount Toba eruption on human expansion. Surprisingly, no team has so far published its own interpretation. Mine might be contested, but it can also be a start toward a better understanding; this is my hope.

R1b-M269 branch of the Y tree


Each « code » such as P108 is the name of a SNP (see A1b for P108, at the top of this tree). A SNP is a mutation on the Y chromosome; let's recall that these mutations are all considered neutral, meaning that they are not associated with any phenotype, no visible change. They are merely DNA changes that can only be detected by sequencing, and they are used as markers. As several research teams each give their own names, a single mutation can get several names, and this tree accounts for this with the mark [ • ], which separates the different names of the same mutation. For example, L1063 • CTS8728 • PF6480 • S13 (at the bottom of this tree) shows 4 equivalent names for what is 1 SNP, and it is counted as 1 in the counts to come.
A sequencing run provides more SNPs than shown here, as the « recent » SNPs at the bottom of the tree, below M269, were removed. The reason is that the bottom of the tree is bushy and still difficult to read; this article focuses only on the history « available » between A1b / [BT] and M269. In France (and likewise in England) about half of the men share this same segment of the Y tree. From the top of the tree (as displayed here) to M269 the SNP count is roughly 1000. It's important to understand that the DNA sequencing producing the data (mostly the « Big Y » test from the FTDNA company) is far from finding all SNPs: the depth of the sequencing can be improved, which would allow more SNPs to be detected. In my opinion these data are usable because they reflect a percentage of the total, already in significant counts.
The early Homo sapiens fossils point to a time range between 160 000 and 180 000 years ago (still with a margin of error). I took 170 000 years ago as the starting point. The SNP count was estimated at around 1200 when considering the mean branch length until today (not limited to M269), which corresponds to an average time interval between 2 SNPs of about 140 years (170 000 / 1200). On this basis we can read the tree presented above.
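The counting rule just described (several names, one SNP) can be sketched in a few lines of Python. This is only an illustration: the function name `count_snps` and the input layout (one SNP per line, aliases separated by the • mark) are my assumptions, not a description of Yfull's actual data format.

```python
def count_snps(node_lines):
    """Count SNPs in a node, treating '•'-separated names as aliases of one SNP.

    Assumption (hypothetical input format): one SNP per line,
    with its alternative names separated by the '•' mark.
    """
    snps = []
    for line in node_lines:
        aliases = [name.strip() for name in line.split("•") if name.strip()]
        if aliases:  # skip empty lines
            snps.append(aliases)
    return len(snps), snps


# Illustrative node content: the first line is the example quoted in the text.
node = [
    "L1063 • CTS8728 • PF6480 • S13",  # four names, one mutation
    "M269",                            # listed here with a single name
]
total, snps = count_snps(node)
print(total)  # 2 SNPs, although 5 names are listed
```

Counting lines rather than names is what keeps the SNP totals used later from being inflated by naming duplicates.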
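The arithmetic behind the 140-year figure can be checked with a short sketch. The two inputs (170 000 years and about 1200 SNPs) are the values given in the text; the function name is mine, and the extra lines simply show how the interval moves if the fossil date is taken at either end of the 160 000 to 180 000 range.

```python
SNPS_ROOT_TO_PRESENT = 1200  # estimated mean SNP count from the root until today


def years_per_snp(root_age_years, snp_count=SNPS_ROOT_TO_PRESENT):
    """Average time between two successive SNPs on a branch."""
    return root_age_years / snp_count


interval = years_per_snp(170_000)
print(round(interval))  # 142, rounded to 140 in the article

# Sensitivity to the fossil dating (assumed bounds from the text).
for root_age in (160_000, 180_000):
    print(root_age, round(years_per_snp(root_age)))
```

The rounding from 142 to 140 is small compared with the uncertainty on the fossil date itself.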
The top of the Y tree was found to be far more complex with the discovery of rare variants ; below is the root of the Y tree as published in 2013. I won't use it but I wanted to account for it :

• A00 AF4, AF5, AF6/L1284, AF7, AF8, AF9, AF10, AF13, L1086, L1087, L1088, L1091, L1092, L1094, L1096, L1097, L1102, L1103, L1104, L1106, L1107, L1108, L1109, L1110, L1111, L1113, L1114, L1115, L1117, L1119, L1122, L1126, L1131, L1133, L1134, L1138, L1139, L1140, L1141, L1144, L1146, L1147, L1148, L1151, L1152, L1154, L1156, L1157, L1158, L1159, L1160, L1161, L1163, L1233, L1234, L1236

• A0-T AF3, L1085, L1089, L1090, L1093, L1095, L1098, L1099, L1101, L1105, L1116, L1118, L1120, L1121, L1123, L1124, L1125, L1127, L1128, L1129, L1130, L1132, L1135, L1136, L1137, L1142, L1143, L1145, L1150, L1155, L1235

• • A0 L529.2, L896, L982, L984, L990, CTS2809/L991, L993, L995, L997, L998, L999, L1000, L1001, L1006, L1008, L1010, L1012, L1016, L1018, L1055, V148, V149, V154, V164, V165, V166, V167, V172, V173, V176, V177, V190, V196, V223, V225, V229, V233, V239

• • A1 L985, L986, L989, L1002, L1003, L1004, L1005, L1009, L1013, L1053, L1084, L1112, L1153, P305, V161.2, V168, V171, V174, V238, V241, V250

• • • A1* -

• • • A1a M31, P82, V4, V14, V15, V25, V26, V28, V30, V40, V48, V57, V58, V63, V191, V201, V204, V215

• • • A1b P108, V221

• • • • A1b* -

• • • • A1b1 L419/PF712

I just wanted to present the deep rooting among the A haplogroups, all of them African, while I'll be using a simplified connection between A1b and [BT] (see below). So, just as the fossils show, there must have been an origin of Homo sapiens in Africa. Again, I am not showing it, but it would be wrong to think that people belonging to A0 groups, like some of the Bushmen, have a less complex history; the branch leading to these A0 men has just as many SNPs as shown here for the R1b branch.

Below is another way to picture the complexity of the top of the Y tree. It shows the BT branch as one outcome among many, all of them localized in Africa with the sole exception of the BT branch.



[BT] produced groups localized in Africa, as are all groups of the B branch (B groups are found among Pygmies, for example), and [BT] also produced all groups outside Africa (see below).

We now start the analysis of the SNPs. The first point is about the way SNPs come together in dense packs: a node. The [BT] node displayed here is 353 SNPs large (see the tree of SNPs above, in blue). Again, more SNPs will be found (more are already known), but because most results come from the same Big Y test (from the FTDNA company), each node displays roughly the same percentage of its SNPs, and this consistency makes comparison between nodes possible. Trees with nearly all SNPs will come in a few years from now.
Now, let's think about it: what can these 353 SNPs at this « node » mean? 353 SNPs with an average time interval of 140 years between 2 SNPs make a period about 50 000 years long. So the node runs from 170 000 years ago to 120 000 years ago, ending in the last interglacial (LIG), usually placed between -130 000 and -115 000. Here is a measurement of oxygen 18 in ice, which correlates with temperatures:
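As a quick check of this step, here is a minimal sketch that converts the [BT] SNP count into a duration and tests whether the end of the node falls inside the last interglacial. All numbers come from the text; the function and constant names are mine.

```python
YEARS_PER_SNP = 140               # average interval used in the article
LIG_WINDOW = (115_000, 130_000)   # last interglacial, in years ago


def node_span(start_years_ago, snp_count, interval=YEARS_PER_SNP):
    """Return (duration, end) of a node that starts at `start_years_ago`."""
    duration = snp_count * interval
    return duration, start_years_ago - duration


duration, end = node_span(170_000, 353)
print(duration)  # 49420, about 50 000 years
print(end)       # 120580, about 120 000 years ago

ends_in_lig = LIG_WINDOW[0] <= end <= LIG_WINDOW[1]
print(ends_in_lig)  # True
```

The unrounded end of the node (about 120 600 years ago) sits inside the interglacial window, which is the coincidence discussed next.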

It can't be by chance that the end of the node correlates with cooling temperatures. It's also easy to see that temperatures were cold until -130 000, and one may wonder why one and the same group spans both the cold and the warm period. It could be that a branch was lost (in fact many branches were lost; I mean a main branching like those produced by the last glaciation, R1a / R1b). It's impossible to tell where the [BT] people were localized, but [BT] splits into [CT] (discussed below) and B, a branch found among African groups like Pygmies. A localization in Africa is still likely, but B groups going back to Africa (because of the temperature change) is a possibility if [BT] was localized close enough. Old remains of modern humans were found in Palestine / Israel, a possible place for [BT] between -130 000 and -120 000 (of course, given the margin of error, my dating of -120 000 can be -115 000, the end of the warm period).
Every SNP can be seen as a putative branching where the ancestral branch (not carrying the mutation) died out. The loss of so many branches is explained by genetic drift, and this in turn is only possible in a group that is not too large. A bottleneck, a period of time with a drastic decrease in population size, is then likely at the bottom of the [BT] node, 120 000 years ago, when temperatures went down quickly. It's important to really grasp this point: among the few lucky survivors one man had the succession of SNPs known today, and a statistical effect made his descendants win over all others. Simulations show that this is what is expected when every man has on average only 2 surviving children, one boy and one girl: any fluctuation (2 girls) stops a Y lineage, and the result is only 1 lineage left (for the Y chromosome), with the specific set of SNPs now carried by all descendants.
The small size of a group separated from other African groups might have allowed specific traits to evolve in the emerging new groups. As mentioned above, the split 120 000 years ago separates the B group from a group now called [CT], which carries the well-known M168 marker, the first known of a set of now over 300 SNPs. The M168 marker is found among ALL ethnic groups outside Africa but not in specific African groups like the A0 and B groups already discussed. This finding was at the origin of the so-called « out of Africa » theory. As discussed above, it's likely that the BT group was already outside Africa, and the logic of the people proposing this theory wasn't strong enough. Also, the timing presented here is fairly different from the timing too often given with no data supporting it.
I find it logical that the group that populated the regions outside Africa, when all other groups from the top of the Y tree are African, might already have lived outside Africa. Dienekes Pontikos was one of the first to propose the Arabian Peninsula as a place for such a stage, and I find it a good idea since old human industries are well known there, but it is not the only possible place. Northern areas were still under Neandertal control and, as temperatures were going down, this limit might have left little room for modern humans. Indeed a near-coastal stay (for example at the mouth of the Tigris river) would be in accordance with the next split into a coastal route of expansion and a land one, as discussed below. The SNP count is 319, making it (still with t=140) a 45 000 years long period in round numbers. Hence, from 120 000 years ago to 75 000 years ago the [CT] group developed in a changing environment with changing temperatures.
Just as a bottleneck was implied for the end of the [BT] node, another bottleneck could explain the -75 000 figure. As already discussed on this blog, a giant eruption took place at Mount Toba, dated to -73 000 with a margin of error of 4000 years. Scientists are still discussing the exact impact on the whole planet, but the more it's studied the stronger the impact appears. It was sudden and temperatures dropped by 10 degrees, some say 15 degrees. Hence it can't be a coincidence that a major change is detected at that time. To better discuss the timing we need to look ahead at what happened next. There was a split into [CF] and [DE], and this seems to correlate with the known split of the mitochondrial tree into the N and M branches. The [DE] / M group has been associated with a coastal route, possibly on early rafts, a quick way to progress around India. The idea is that this expansion was under way at the time of the eruption. Some [DE] might have been already at the Ganges delta while some might have populated the southern border of the Arabian Peninsula and possibly the nearby Horn of Africa. Why so? We won't follow the [DE] branch any further, but the data say that [DE] was cut into 2 distant pools that never met again. One pool of [DE] evolved into D, found mainly in Japan today, though it was once present across all of south Asia and some D men are in Tibet (obviously a refuge); the other pool of [DE] evolved into E at the Horn of Africa. Very rare [DE] cases were found south of Tibet, and one should understand them as a third branch (maybe more), indicating that [DE] people who were not yet D or E once settled there. I insisted on the fate of [DE] because it is part of the evidence that the Toba eruption had a deep impact: [CT] ends with the onset of 2 pools of [DE] with no further connection between them. [CT] also evolved into [CF] (see the R1b tree in blue above), and this group might have been less impacted, being localized farther from the Toba volcano.
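The drift argument can be illustrated with a toy simulation. This is not the simulation referred to above, only a sketch under my own assumptions: a constant number of men per generation, each son drawing his father at random from the previous generation (a Wright-Fisher style resampling, so each man has one son on average). The function name and the population sizes are hypothetical.

```python
import random


def surviving_lineages(n_men, generations, seed=0):
    """Number of founder Y lineages still present after some generations.

    Assumption: constant population of `n_men` men, each son picking
    his father uniformly at random in the previous generation.
    """
    rng = random.Random(seed)
    # Each founder starts with his own lineage label.
    lineages = list(range(n_men))
    for _ in range(generations):
        # A father picked twice spreads his lineage; one never picked loses it.
        lineages = [rng.choice(lineages) for _ in range(n_men)]
    return len(set(lineages))


for generations in (0, 10, 100, 2000):
    print(generations, surviving_lineages(20, generations))
```

With a small group, the number of surviving lineages shrinks quickly and ends at a single one; with a large group the same process takes far longer, which is why a lone surviving lineage points to a small population.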
The next split, into C and F, occurs quickly afterwards, and one can see it as one group staying in place and one group (C) expanding into a novel environment. So, one word on the C branch, since the R1b branch takes us to the F side. There are data suggesting that the C haplogroup successfully went to all parts of the world as a first wave, later displaced by others. Aboriginal Australians have the C4 branch, probably reflecting the first men in Australia, while the Mongol Genghis Khan is thought to have belonged to the C5 branch. Some C people might have reached America, but only rare cases are found today. Old remains of the C6 branch were recently found in Europe. So, except for Africa, C men went everywhere, it seems.
The F group is at the root of most groups populating Europe and Asia today, as all Neolithic groups are from this branch. The SNP count is 163, corresponding to 23 000 years. Hence this large F node runs from -75 000 to -52 000. The following steps are several branchings, which means that rather than a bottleneck, the « end » of the F node is when groups from the F « nest » started going in all directions, including Europe. Yet these 163 SNPs mean that 163 times the ancestral branch was lost, implying a very small group at some times. The temperatures, as known from oxygen 18 concentrations in ice, show a warming around -60 000, so this expansion is late relative to the improved conditions, possibly meaning a first phase of local expansion followed by dispersal into G, H, [IJ], and then L, T and [MP]. At this time Neandertals start retreating and modern humans take their place. I don't know of any accepted explanation for this change. Genetics and the time intervals found here point to significant progress in the adaptation of modern humans to cold 60 000 years ago.
One word about the next step, as it may give some clue on the position of the [MP] part of the original F group. The M haplogroup is mainly restricted to Papuans in New Guinea, probably a refuge when Neolithic groups populated south Asia, but this can only be so if [MP] was much closer to Indonesia than to the Indus basin. If M went south, P probably went to a more northern place just east of present day China to account for the M cousins. The 47 000 year old Ust’-Ishim bones were tested as belonging to the parent group of P, K(xLT), and were found to have a large proportion of Neandertal DNA; the expansion to northern areas might have been restricted to those who had the Neandertal DNA.
The P node is 135 SNPs large, which fits a 19 000 years long period. Hence we are dealing with the time interval from -50 000 to -31 000, a relative warming before the last glaciation. As before, a bottleneck might explain the set of SNPs, and it was followed by a split into the R and Q groups. Q was clearly adapted to cold, as this group managed to cross the Bering Strait and reach America, where it is (by far) the main haplogroup among Amerindians. So P, the parent group, was a transition to a colder environment, possibly thanks to needles and clothing, with instances known from 28 000 years ago, and they might be slightly older.
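The chain of dates from [BT] through [CT] to F can be reproduced by subtracting each node's duration from the running date. The sketch below uses only the SNP counts and the 140-year interval given in the text; the names are mine, and the later nodes are left out because the text does not chain them in the same way.

```python
YEARS_PER_SNP = 140
ROOT_AGE = 170_000  # years ago, starting point chosen in the article
NODES = [("BT", 353), ("CT", 319), ("F", 163)]  # SNP counts from the text


def timeline(nodes, start=ROOT_AGE, interval=YEARS_PER_SNP):
    """Return (name, begin, end) in years ago for successive nodes."""
    rows = []
    age = start
    for name, snps in nodes:
        end = age - snps * interval
        rows.append((name, age, end))
        age = end  # the next node starts where this one ends
    return rows


for name, begin, end in timeline(NODES):
    print(f"[{name}] {begin} -> {end} years ago")
```

Without rounding at each step, the node ends come out at about 120 600, 75 900 and 53 100 years ago, close to the round figures of 120 000, 75 000 and 52 000 used in the article.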



The diagram above summarizes the fate of the P group, with a focus on the R1a and R1b haplogroups. SNP counts are shown in blue for each branch, and I'll be using these counts. The ice age west of China isn't very well known, but the R node (40 SNPs) would end some 26 000 years ago, shortly before the ice maximum, when temperatures dropped to their lowest level. Bones from the 24 000 year old Mal'ta mammoth hunter encampment were tested as R haplogroup. This dating falls within the margin of error and fits this broad timing. The R group in turn splits into a southern group, R2, and the R1 group before the ice maximum, and I would interpret this as a very difficult time for a group near extinction (so, again, a bottleneck). At the ice maximum the R1 group splits into 2 subgroups, seen as 2 geographical localizations that got separated and started evolving separately. These 2 localizations are not known, but it should be clear that Europe is unlikely, though there are indications that R1 was in Europe. The 107 SNPs down to R1b-M269 would place this important stage of the R1b branch only 6000 years ago, but the precision isn't sufficient; this is the limit of this approach, and it's important to understand it to avoid drawing foolish conclusions. Recent times were characterized by fast changes; in the case of R1b-M269, found here to be 6000 years old, a 3000 year shift, well within the margin of error, would bring it in line with more precise timings obtained with other methods. That's all.
In conclusion, the approach described here found a correlation between bottlenecks at key stages and major climate changes. The description of the fate of the [DE] branch attempted here provides a coherent explanation of the so-called « out of Africa » expansion, with a modified timing as explained. In turn, the common ancestry in Africa, while not denied, goes back much deeper than has too often been said. This approach will have to be refined but, as it is, it gives a time frame for early events.
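To give a feel for the limited precision on the recent segments, the sketch below computes the duration of the P node, the R node and the 107-SNP segment down to R1b-M269 when the root of the tree is dated at either end of the 160 000 to 180 000 year fossil range. It only propagates that one source of uncertainty (an assumption of mine; incomplete SNP detection and uneven mutation timing are not modelled), and the function name is hypothetical.

```python
TOTAL_SNPS = 1200  # root-to-present SNP count used in the article


def duration_range(snps, oldest=180_000, youngest=160_000, total=TOTAL_SNPS):
    """Return (shortest, longest) duration in years for a segment of `snps` SNPs."""
    shortest = snps * youngest / total
    longest = snps * oldest / total
    return round(shortest), round(longest)


# SNP counts quoted in the text for the later part of the branch.
segments = {"P node": 135, "R node": 40, "down to R1b-M269": 107}
for name, snps in segments.items():
    low, high = duration_range(snps)
    print(f"{name}: {snps} SNPs -> {low} to {high} years")
```

For the 107-SNP segment alone the spread is already close to 2000 years, the same order of magnitude as the age being estimated, which is why the recent dates should be read as broad indications only.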

[ Many thanks to the Yfull team for their SNP tree and permission to reproduce it. ]

