A building in a Ndebele village, South Africa. Ndebele speakers, who currently number around one million, came to South Africa with the Bantu expansion.
Humanity originated in Africa and stayed there for tens of thousands of years. To understand our shared genetic history, we inevitably have to look to Africa. Unlike elsewhere on the planet, however, African populations have been present throughout our history, so they have not been subject to the founder effects that resulted from population expansion into unoccupied territories. Instead, these populations were mixed up as groups migrated to new areas within the continent.
Sorting all of this out would be a challenge in any case, but it is made more difficult by the fact that most genomic data comes from people in the industrialized world, while the vast populations of Africa remain poorly sampled. That is starting to change, and a new paper reports on the efforts of a group that has just analyzed over 400 African genomes, many of them from populations that have never participated in genomic studies before.
New genetic variants are constantly emerging. As a result, the oldest populations, those in Africa, should have accumulated the most variants, including ones not seen anywhere else. However, identifying these variants is difficult when there are so many populations to sample. The study mentions that there are over 2,000 ethnolinguistic groups in sub-Saharan Africa, only a small number of which have been surveyed. The new study is a huge step forward, with over 400 complete genome sequences from geographically dispersed populations. But even so it is limited: it adds only 50 new ethnolinguistic groups, and two huge regions of the continent are each represented by people from a single country (Zambia for central Africa and Botswana for southern Africa).
Even so, the study still recorded more than 3.4 million genetic variants that had not previously been described. These are individual locations in the genome where the base (A, T, C or G) is one that had not been seen there in other populations.
To put that in perspective, most of us carry many genetic variants. In the typical individual in the new study, these newly identified variants account for only about 2 to 5 percent of the total variants in their genome; the rest had been seen previously. In addition, a large majority of them (88 percent) have only been seen in a single individual and so may simply be variants that arose through mutation within the last few generations. While there may be some new variants here that will help us untangle Africa's population history, most of what was found is what you would expect if you looked at random people elsewhere.
If we were getting close to capturing all of the genetic variation in Africa, we would expect the number of new variants to tail off as new genome sequences are added to the analysis, since each one would contribute fewer and fewer undiscovered variants. So the researchers added the genomes to the analysis one by one and found no evidence of this tailing off; we are still far from fully cataloging human diversity. However, they found that sampling beyond West African populations would yield the greatest increase in previously undescribed variants.
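The one-by-one check described above can be illustrated with a minimal sketch. This is not the researchers' actual pipeline; the function name `discovery_curve` and the toy variant labels are made up for illustration, and each genome is assumed to be reducible to a set of variant identifiers.

```python
def discovery_curve(genomes):
    """Count how many previously unseen variants each genome adds.

    `genomes` is a list of sets of variant identifiers (hypothetical labels),
    processed in the order given.
    """
    seen = set()
    new_counts = []
    for variants in genomes:
        novel = variants - seen        # variants not found in any earlier genome
        new_counts.append(len(novel))
        seen |= novel                  # add them to the running catalog
    return new_counts


# Toy data, invented for illustration only.
toy_genomes = [
    {"v1", "v2", "v3"},
    {"v2", "v3", "v4"},
    {"v1", "v4"},
    {"v5", "v6"},
]
curve = discovery_curve(toy_genomes)
print(curve)  # [3, 1, 0, 2]
```

If a catalog were nearing completion, the counts would shrink toward zero and stay there. In this toy example the last genome still adds two variants, which is the kind of pattern that signals undersampling.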
To find out what the genomes say about population history, the researchers turned to principal component analysis, which identifies the main sources of difference in a large data set. The biggest difference separated the speakers of the Niger-Congo languages from all others. The second largest difference reflected the geographic distance between the speakers of Niger-Congo in West Africa and those in Southern Africa. This is likely a product of the Bantu migration, which carried a mixture of technology, language and DNA from a single source in west-central Africa to the rest of the continent.
The researchers use this data to argue that the Bantu migration passed through Zambia on its way to southern and eastern Africa, but their data includes many people from Zambia, so it is not clear whether this might have skewed their results.
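As a rough illustration of how principal component analysis pulls the largest source of difference out of genotype data, here is a minimal sketch in plain Python. It is not the method or software used in the study: the data are simulated, the function names are hypothetical, and the first component is found with simple power iteration rather than a dedicated genetics tool.

```python
import random


def center_columns(matrix):
    """Subtract each column's mean so every site has mean zero."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def first_principal_component(matrix, iterations=100):
    """Return (loadings, scores) for the first principal component.

    Uses power iteration on X^T X, where X is the column-centered matrix.
    """
    x = center_columns(matrix)
    n_sites = len(x[0])
    loadings = [1.0] * n_sites  # arbitrary non-zero starting direction
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, loadings)) for row in x]
        updated = [sum(s * row[j] for s, row in zip(scores, x))
                   for j in range(n_sites)]
        norm = sum(c * c for c in updated) ** 0.5
        loadings = [c / norm for c in updated]
    scores = [sum(a * b for a, b in zip(row, loadings)) for row in x]
    return loadings, scores


def simulate_genotypes(rng, n_people, n_sites, allele_frequency):
    """Toy data: each genotype is a count (0, 1 or 2) of the variant allele."""
    return [
        [(rng.random() < allele_frequency) + (rng.random() < allele_frequency)
         for _ in range(n_sites)]
        for _ in range(n_people)
    ]


# Two simulated groups with very different allele frequencies (an assumption
# chosen to make the separation obvious, not a value from the study).
rng = random.Random(0)
group_a = simulate_genotypes(rng, n_people=10, n_sites=40, allele_frequency=0.1)
group_b = simulate_genotypes(rng, n_people=10, n_sites=40, allele_frequency=0.9)

loadings, scores = first_principal_component(group_a + group_b)
scores_a, scores_b = scores[:10], scores[10:]
print(sorted(round(s, 2) for s in scores_a))
print(sorted(round(s, 2) for s in scores_b))
```

In this toy data the two groups end up at opposite ends of the first component, which mirrors how the largest axis in a real analysis can separate one set of populations from all others. Real genomes differ far more subtly and across millions of sites.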
The work also identified a number of ethnolinguistic groups that may be worth a closer look. One looked genetically like East Africans but was located in West Africa. Two other populations were clearly associated with known language groups, but were not part of the tight genetic cluster into which most other speakers of that language fell.
Almost every population on Earth is a mix of many sources. Native Americans, for example, are largely a mix of East Asian and ancient Siberian populations. Africans are certainly no different, but the fact that they have stayed on the same continent for so long adds to the complexity of these interactions. The new data really drives this home when analyzed for the origins of different segments of DNA.
People from the far west of Africa got a large part of their DNA from a West African source. But as you move east toward Central Africa, there is progressively more of what you would call West-Central African DNA, which is then supplemented and later supplanted by Central African and later by South and East African sources. There is a sudden shift to a majority from East African sources as you leave Central Africa and move east, with an increasing contribution from southern Africa as you turn a little south.
While geography seems to determine most of the differences, there are contributions from distant areas of the continent across all population groups. The Bantu migration may have been the greatest event in recent African history, but it was built on top of a long history of population interactions.
Most variants in the human genome are completely silent, as they do not affect genes or other functions, and so drift randomly through populations. However, some offer an evolutionary advantage, and it may be possible to detect the signal of selection for or against certain variants.
In searching for these signals, the authors found exactly what you'd expect based on previous studies of human populations. The greatest pressure on human evolution is disease, and the genes under the most pressure are involved in immune functions. After disease comes diet, and here too the Africans are quite typical, with strong evidence of selection for a handful of genes involved in carbohydrate and lipid metabolism. There were some strange results, however, like selection for variants of genes involved in DNA repair, kidney disease, and uterine fibroids. Obviously, these need to be examined more closely before we can make any sense of them or find out whether they are simply spurious.
Immune function isn't the only way to deal with disease, as illustrated by the effects of the sickle cell trait on malaria. And since these are African populations, there is evidence in some of them of selection for it. Hemoglobin is not the only route to malaria resistance, however, and some populations show evidence of selection on another gene (G6PD). In some cases, populations with a high frequency of the sickle cell trait have ended up right next to others with strong G6PD selection, likely due to migration.
Apart from the cases in which clear selection signals are present, there are a number of cases in which genes have been inactivated by mutation, yet the inactive version is carried by several people in this data set. This has been seen a few times before and has caused a bit of confusion. In many cases, we have no idea what the gene does, so we cannot say whether or not we should be surprised by its loss. In other cases, based on studies of its loss in mice, the gene actually appears to be essential. In time, we will likely get closer to understanding what is going on, but each of these genes must be studied individually.
The beginning of a story
While this represents a great effort to understand the common genetic history of humankind, it is more of a prologue than a full story. We are closer to capturing the full diversity of the African population, but we are not done yet. And we have been able to gather more information about some of the migrations we know of in Africa, but we haven't gotten to the point where we can infer the migrations we don't know about.
This latter point is quite critical. At this stage, we can examine a piece of DNA and determine that it is likely from a West African population, for example. But we can't say much about how it ended up in West Africa in the first place. There is evidence that African populations picked up DNA from earlier branches of the human family tree, just as Eurasian populations picked up archaic DNA from Neanderthals. However, without fossil or DNA-based descriptions of these branches, they remain "ghost lineages" that are invisible to us. It is possible that a small percentage of the sequences we are currently assigning to an African region belong to one of these branches and we do not yet have the tools to identify them.
Nature, 2020. DOI: 10.1038/s41586-020-2859-7