We finally have more African genomes, revealing over 3 million new variations.
JOHN TIMMER - 10/28/2020
A building in a Ndebele village, South Africa. The Ndebele-language speakers, currently about a million strong, arrived in South Africa with the Bantu expansion.
Humanity originated in Africa, and it remained there for tens of thousands of years. To understand our shared genetic history, it's inevitable that we have to look to Africa. Unlike elsewhere on the planet, however, African populations were present throughout our history—they weren't subject to the same sorts of founder effects seen as populations expanded into unoccupied areas. Instead, those populations were scrambled as groups migrated to new areas within the continent.
Sorting out all of this would be a challenge, but it's one that has been made harder by the fact that most genome data comes from people in the industrialized world, leaving the vast populations of Africa poorly sampled. That's starting to change, and a new paper reports on the efforts of a group that has just analyzed over 400 African genomes, many coming from populations that have never participated in genome studies before.
New diversity
New genetic variants arise all the time. As a result, the oldest populations, those in Africa, should have the most novel variations. But identifying these populations can be hard when there are so many; the study mentions that there are over 2,000 ethnolinguistic groups in sub-Saharan Africa, and only a small number of those have been sampled. The new study is a huge step forward, with over 400 complete genome sequences from geographically dispersed populations. But even there, it's limited: it adds only 50 new ethnolinguistic groups, and two vast regions of the continent are each represented by people from a single country (Zambia for Central Africa and Botswana for Southern Africa).
That said, the study still picked up about 3.4 million genetic variants that hadn't been described previously. These are single sites in the genome with a base (A, T, C, or G) that hadn't been seen there in other populations.
To put that in perspective, most of us carry lots of genetic variations. In the typical individual in the new study, these newly identified variants only account for about 2-5 percent of the total variations in their genomes—all the rest had been seen previously. In addition, a large majority of them (88 percent) were only seen in a single individual and so may only represent a variation that had occurred through a mutation within the last few generations. So, while there might be some new variants here that will help us untangle Africa's population history, most of what we've found is the sort of thing you'd expect from looking at random humans elsewhere.
If we were getting close to having a good grip on the genetic variation present in Africa, then we'd expect the number of new variants to tail off as we add genome sequences to the analysis, since each additional genome would contribute fewer and fewer undiscovered variants. So, the researchers analyzed the genomes one at a time and found no evidence of this happening; we're still nowhere close to fully cataloging human diversity. They do find, however, that looking beyond West African populations would give us the biggest increase in previously undescribed variation.
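The saturation test described here can be illustrated with a minimal sketch. It is not the researchers' pipeline. The `(chromosome, position, base)` encoding, the `discovery_curve` function, and the toy data are all assumptions made up for illustration. The idea is to add genomes one at a time and count how many variants each one contributes that have not been seen before.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding of a variant: (chromosome, position, alternate base).
Variant = Tuple[str, int, str]


def discovery_curve(genomes: Iterable[Set[Variant]], known: Set[Variant]) -> List[int]:
    """Return, for each genome in order, how many of its variants are new.

    A variant counts as new if it is in neither the `known` catalog nor
    any genome processed earlier in the sequence.
    """
    seen = set(known)  # copy so the caller's catalog is left untouched
    new_counts = []
    for variants in genomes:
        novel = variants - seen
        new_counts.append(len(novel))
        seen |= novel
    return new_counts


# Toy data, invented purely for illustration.
known_catalog = {("chr1", 100, "A"), ("chr1", 250, "T")}
toy_genomes = [
    {("chr1", 100, "A"), ("chr1", 300, "G"), ("chr2", 50, "C")},
    {("chr1", 300, "G"), ("chr2", 75, "T")},
    {("chr1", 250, "T"), ("chr2", 75, "T")},
]

curve = discovery_curve(toy_genomes, known_catalog)
print(curve)  # [2, 1, 0]
```

In this toy example the counts fall to zero, which is what a nearly complete catalog would look like. The study reports the opposite for the African genomes: the number of new variants per added genome showed no sign of tailing off.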
Population churn
To try to identify what the genomes tell us about population histories, the researchers turned to principal component analysis, which identifies the major sources of difference in a large set of data. The largest difference separated speakers of Niger-Congo languages from all the rest. The second-largest difference mirrored the geographic distance between Niger-Congo speakers in West Africa and those in Southern Africa. This is likely a product of the Bantu migration, which spread a mix of technology, language, and DNA from a source in West-Central Africa, bringing them to the rest of the continent.
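For readers unfamiliar with principal component analysis, here is a minimal sketch of the idea on invented data. The genotype matrix, the helper names, and the use of power iteration are all assumptions for illustration; the article does not describe the software the researchers used. Each row is an individual, each column a variable site, and each entry the number of copies of the alternate base (0, 1, or 2).

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(rows: Matrix) -> Matrix:
    """Subtract each column's mean so every site has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - m for value, m in zip(row, means)] for row in rows]


def apply_covariance(centered: Matrix, vector: List[float]) -> List[float]:
    """Compute X^T (X v) without building the site-by-site matrix."""
    projections = [sum(x * v for x, v in zip(row, vector)) for row in centered]
    return [
        sum(row[j] * p for row, p in zip(centered, projections))
        for j in range(len(vector))
    ]


def first_component(centered: Matrix, iterations: int = 100) -> List[float]:
    """Approximate the direction of greatest variance by power iteration."""
    n_sites = len(centered[0])
    axis = [float(j + 1) for j in range(n_sites)]  # arbitrary non-uniform start
    for _ in range(iterations):
        axis = apply_covariance(centered, axis)
        norm = math.sqrt(sum(a * a for a in axis))
        axis = [a / norm for a in axis]
    return axis


def project(centered: Matrix, axis: List[float]) -> List[float]:
    """Score each individual along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]


# Toy genotypes: three individuals from a made-up group A, three from group B.
genotypes = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 0],  # group A
    [0, 0, 2, 2], [0, 0, 1, 2], [0, 0, 2, 1],  # group B
]

centered = center_columns(genotypes)
axis = first_component(centered)
scores = project(centered, axis)
for label, score in zip("AAABBB", scores):
    print(label, round(score, 2))
```

On this toy matrix, the first component places the two groups on opposite sides of zero, which is the sense in which the largest component in the study separated Niger-Congo speakers from everyone else. Real analyses involve millions of sites and typically rely on dedicated numerical libraries rather than a hand-written loop.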
The researchers use this data to argue that the Bantu migration passed through Zambia on its way to Southern and East Africa, but their data includes a lot of people from Zambia, so it's not clear whether that might have biased their results.
The work also identified a number of ethnolinguistic groups that might be worth looking at in more detail. One looked genetically like East Africans but was located in West Africa. Two other populations were clearly associated with known language groups but weren't part of the tight genetic cluster that most other speakers of that language fell within.
Nearly every population on Earth is a mixture of many sources—Native Americans are largely a mixture of East Asian and ancient Siberian populations, for example. Africans are certainly no different, but the fact that they've stayed on the same continent for so long increases the complexity of these interactions. The new data really drives that home when analyzed for the origins of different segments of DNA.
People from the far west of Africa have a large majority of their DNA from what you could call a West African source. But as you move east into Central Africa, there's an increasing amount of what you'd have to call West-Central African DNA, which is then joined and later displaced by Central African and then a smattering of Southern and East African sources. There's a sudden shift to a majority from East African sources as you exit Central Africa moving east, with an increasing contribution from Southern Africa if you turn south a bit.
While geography seems to drive the majority of the differences, in all populations there are contributions from distant areas of the continent. So, while the Bantu migration may have been the largest event in recent African history, it's layered on top of a long history of population interactions.
What’s changing
Most variations in the human genome are completely silent, as they don't affect genes or other functions and so float through populations at random. A few, however, provide evolutionary advantage, and it can be possible to detect the signal of the selection for or against specific variations.
Searching for these signals, the authors found exactly what you'd expect based on past studies of human populations. The strongest pressure on human evolution is disease, and the genes that are subject to the most pressure are involved in immune functions. After disease comes diet, and again, Africans are quite typical, with strong signs of selection on a handful of genes involved with carbohydrate and lipid metabolism. There were some oddball results, however, such as selection for variants of genes involved in DNA repair, kidney disease, and uterine fibroids. Obviously, those will have to be examined in more detail before we can make any sense of them or see if they're just spurious.
Immune function isn't the only way to handle diseases, as the sickle cell trait's effects on malaria make clear. And, these being African populations, there's evidence of selection for that in some of them. But hemoglobin isn't the only route to malaria resistance, and some populations show evidence of selection for a different gene (G6PD). In some cases, populations that have a high frequency of the sickle cell trait have ended up right next to others that have high levels of G6PD selection, likely as a result of migration.
Beyond the cases where there are clear signals of selection, there are a number of cases where genes have been disabled by mutation but are still present in multiple individuals in this data set. This has been seen a number of times before and has been met with a bit of confusion. In many cases, we have no idea what the gene does and so can't tell whether we should be surprised by its loss or not. In others, the gene actually appears to be essential based on studies of its loss in mice. Over time, we'll probably get closer to understanding what's going on, but each of these genes will have to be studied individually in order to do so.
The start of a story
While this represents a major effort toward understanding humanity's shared genetic history, it's more of a prologue than a complete story. We've gotten closer to capturing the full diversity of African populations but clearly aren't done yet. And we've been able to piece together more information on some of the migrations within Africa that we know about but aren't at the point where we can infer anything about the migrations we don't know about.
That latter point is rather critical. At this stage, we can examine a piece of DNA and determine that it probably originated in, say, a West African population. But we can't say much about how it ended up in West Africa in the first place. There's evidence that, much as Eurasian populations picked up archaic DNA from Neanderthals, African populations picked up DNA from earlier branches of the human family tree. But, without fossil or DNA-based descriptions of those branches, they remain "ghost lineages" that are invisible to us. It's possible that some small percentage of the sequences we currently assign to an African region belong to one of these branches, and we don't have the tools to identify them yet.
Nature, 2020. DOI: 10.1038/s41586-020-2859-7
JOHN TIMMER became Ars Technica's science editor in 2007 after spending 15 years doing biology research at places like Berkeley and Cornell.