London, February 25
Researchers have created the largest human family tree ever assembled, dating back 100,000 years, which can predict common ancestors, including approximately when and where they lived.
The study, published on Thursday in the journal Science, could have widespread applications in medical research, for instance identifying genetic predictors of disease risk.
Until now, the main challenges in developing such a family tree were working out a way to combine genome sequences from many different databases and developing algorithms to handle data of this size.
However, a new method developed by researchers at the University of Oxford in the UK can easily combine data from multiple sources and scale to accommodate millions of genome sequences.
The method predicts common ancestors, including approximately when and where they lived, they said.
“We have basically built a huge family tree, a genealogy for all of humanity that models as exactly as we can the history that generated all the genetic variation we find in humans today,” said Yan Wong, an evolutionary geneticist at Oxford’s Big Data Institute.
“This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome,” Wong, one of the principal authors of the study, explained.
Since each individual genomic region is inherited from only one parent, either the mother or the father, the ancestry of each point on the genome can be thought of as a tree, the researchers said.
The set of trees, known as a “tree sequence” or “ancestral recombination graph”, links genetic regions back through time to ancestors where the genetic variation first appeared, they said.
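The idea of one tree per stretch of genome can be illustrated with a small Python sketch. It is a toy illustration under stated assumptions, not the researchers' software: the `LocalTree` class, the helper functions and the sample data are all hypothetical. Each tree covers an interval of the genome, and the most recent common ancestor of two samples can differ from one interval to the next.

```python
from dataclasses import dataclass


@dataclass
class LocalTree:
    """One tree covering the genome interval [left, right). Hypothetical structure."""
    left: int      # first genome position covered (inclusive)
    right: int     # end of the covered interval (exclusive)
    parent: dict   # child node -> parent node; the root has no entry


def tree_at(trees, position):
    """Return the local tree whose interval contains the given genome position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"no tree covers position {position}")


def lineage(tree, node):
    """List the node followed by its ancestors, walking up to the root."""
    path = [node]
    while node in tree.parent:
        node = tree.parent[node]
        path.append(node)
    return path


def mrca(tree, a, b):
    """Most recent common ancestor of nodes a and b in one local tree."""
    ancestors_of_a = set(lineage(tree, a))
    for node in lineage(tree, b):
        if node in ancestors_of_a:
            return node
    return None


# Invented data: samples 0, 1, 2 and inferred ancestors 3, 4, 5.
# Node ages in generations (illustrative values only).
TIMES = {0: 0, 1: 0, 2: 0, 3: 10, 4: 25, 5: 40}

TREES = [
    # Positions 0-49: samples 0 and 1 are closest relatives (via ancestor 3).
    LocalTree(0, 50, {0: 3, 1: 3, 3: 5, 2: 5}),
    # Positions 50-99: samples 1 and 2 are closest relatives (via ancestor 4).
    LocalTree(50, 100, {1: 4, 2: 4, 4: 5, 0: 5}),
]

if __name__ == "__main__":
    for position in (10, 70):
        ancestor = mrca(tree_at(TREES, position), 0, 1)
        print(position, ancestor, TIMES[ancestor])
```

In this toy data, samples 0 and 1 share the recent ancestor 3 at position 10 but only the older ancestor 5 at position 70, which is the sense in which ancestry varies along the genome. Both trees reuse the same ancestor nodes, which hints at why storing them together can be compact.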
“Essentially, we are reconstructing the genomes of our ancestors and using them to form a vast network of relationships. We can then estimate when and where these ancestors lived,” said study lead author Anthony Wilder Wohns, who undertook the research as part of his PhD at the Big Data Institute.
“The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples,” said Wohns, who is now a post-doctoral researcher at the Broad Institute of MIT and Harvard, US.
The study integrated data on modern and ancient human genomes from eight different databases and included a total of 3,609 individual genome sequences from 215 populations.
The ancient genomes included samples found across the world, with ages ranging from a few thousand to more than 100,000 years.
The algorithms predicted where common ancestors must be present in the evolutionary trees to explain the patterns of genetic variation, the researchers said.
The resulting network contained almost 27 million ancestors, they said.
After adding location data on these sample genomes, the researchers used the network to estimate where the predicted common ancestors had lived.
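The report does not spell out how locations were estimated. As a rough sketch only, the Python below assumes a simple rule in which each ancestor is placed at the geographic midpoint of its immediate descendants, working upward from samples with known coordinates. The function names, node labels and coordinates are hypothetical and chosen for illustration.

```python
import math


def geographic_midpoint(locations):
    """Midpoint of (latitude, longitude) pairs in degrees.

    Points are averaged as 3D unit vectors so that the result stays
    sensible on a sphere, then converted back to degrees.
    """
    if not locations:
        raise ValueError("at least one location is required")
    x = y = z = 0.0
    for lat_deg, lon_deg in locations:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    norm = math.sqrt(x * x + y * y + z * z)
    if norm < 1e-12:
        raise ValueError("midpoint is undefined for these locations")
    # Clamp to guard against tiny floating-point overshoot before asin.
    sin_lat = max(-1.0, min(1.0, z / norm))
    return math.degrees(math.asin(sin_lat)), math.degrees(math.atan2(y, x))


def estimate_locations(children, sample_locations):
    """Assign every ancestor the midpoint of its children's locations.

    children: ancestor node -> list of child nodes (assumed rule, see lead-in).
    sample_locations: sample node -> (latitude, longitude) in degrees.
    """
    locations = dict(sample_locations)

    def locate(node):
        if node not in locations:
            child_points = [locate(child) for child in children[node]]
            locations[node] = geographic_midpoint(child_points)
        return locations[node]

    for node in children:
        locate(node)
    return locations


if __name__ == "__main__":
    # Invented example: three samples and two inferred ancestors.
    samples = {"s1": (0.0, 0.0), "s2": (0.0, 90.0), "s3": (0.0, 45.0)}
    tree = {"root": ["a", "s3"], "a": ["s1", "s2"]}
    for node, (lat, lon) in estimate_locations(tree, samples).items():
        print(node, round(lat, 2), round(lon, 2))
```

A rule this simple ignores how much time separates an ancestor from each descendant, so it should be read as a way to picture the step rather than as the study's procedure.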
The results successfully recovered key events in human evolutionary history, including the migration out of Africa, they said.
Although the genealogical map is already an extremely rich resource, the team plans to make it even more comprehensive by continuing to incorporate genetic data as it becomes available.
Because tree sequences store data in a highly efficient way, the dataset could easily accommodate millions of additional genomes.
“This study is laying the groundwork for the next generation of DNA sequencing,” Wong said.
“As the quality of genome sequences from modern and ancient DNA samples improves, we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today,” the scientist added.