A new scientific study under the auspices of MyHeritage.com entitled "Quantitative analysis of population-scale family trees with millions of relatives" promises to change many of these preconceived negative opinions about the Family Tree and other similar online cooperative family trees. I wrote an introductory post about this study entitled "Large Online Family Trees Validated by Scientifically Conducted MyHeritage Study." Here is the citation to the published article:
Kaplanis, Joanna, Assaf Gordon, Tal Shor, Omer Weissbrod, Dan Geiger, Mary Wahl, Michael Gershovits, et al. “Quantitative Analysis of Population-Scale Family Trees with Millions of Relatives.” Science, March 1, 2018, eaam9309. https://doi.org/10.1126/science.aam9309.
This study will have a tremendous influence on the way we view online family trees and on how they shape the way we think about family history in the future. The study clearly shows that many of the misconceptions and negative opinions about online family trees are misplaced and without any basis. The study also validates some of the more involved genealogical methodologies and opens up a new avenue for demonstrating how genealogical research should be conducted. Some of my own opinions are directly challenged by the study's findings.
Over the next few weeks, I will be analyzing the findings of the study and showing how I think they apply to what we are doing when we say we are doing our family history or genealogy. I believe that this study will begin the process of rewriting the book about genealogical research, and it is about time.
The article is based on a study done using the Geni.com family tree program containing over 86 million profiles (individuals in the family tree). Here is the description of the process from the MyHeritage blog.
The researchers downloaded 86 million public family tree profiles from MyHeritage’s daughter company, Geni.com, one of the world’s largest collaborative genealogy websites, out of which 43 million had detailed genealogical data, such as dates and places of birth and death. Dr. Erlich, MyHeritage’s Chief Science Officer, and his team anonymized, cleaned and validated the data; they reconciled conflicting information and fixed inaccuracies such as individuals connected to more than two biological parents, or individuals recorded as being both the parent and the child of the same individual. They validated their algorithms by comparing samples of the decisions the algorithms made against the decisions expert genealogists made in the same scenarios, and found that the outcomes matched in more than 90% of cases. Ultimately, they ended up with 5.3 million cleaned and validated independent family trees. The largest of these trees includes 13 million individuals, spanning an average of 11 generations.
After collecting, organizing, cleaning, and validating the data, the team leveraged the unique resource to investigate scientific research questions about the roles of genes in longevity and how families spread out geographically. They also created a tool that will facilitate future research by their team and others to leverage population-scale datasets of this type to answer a wide range of research questions in the future.
One of the most interesting findings in the published article about this validated information is summarized in this statement:
Taken together, these results demonstrate that millions of genealogists can collaborate in order to produce high quality population-scale family trees.
One of the most common criticisms of the FamilySearch.org Family Tree is that the information contains many inaccuracies and that it cannot be relied upon for good "genealogical" information. The study contradicts this assumption. Yes, there are errors, but overall, with the millions of source references being entered to the FamilySearch.org Family Tree, it is producing a high-quality population-scale family tree. There are some segments of the genealogical community that tend to look down on online family trees in general as beneath their level of genealogical expertise. This study opens the door to using the data for a variety of valid scientific purposes. Here is another conclusion from the study:
In this work, we leveraged genealogy-driven media to build a dataset of human pedigrees of massive scale that covers nearly every country in the Western world. Multiple validation procedures indicated that it is possible to obtain a dataset that has similar quality to traditionally collected studies, but at much greater scale and lower cost.
In other words, the information in the family trees is as accurate as other scientifically obtained data for use in research. This conclusion does not say anything about the accuracy of the individual entries, however. As my son, Jared Tanner, a professor of neuropsychology at the University of Florida, preliminarily observed about the study:
The researchers did a lot of good validation analyses. I'm not through the whole article yet but they were thorough. There are still problems with the underlying validity of specific people in the pedigree but it looks like their analyses demonstrate at least high level validity.
As I mentioned above, there are a number of other findings made by the study that should be discussed. Stay tuned.