Genealogy from the perspective of a member of The Church of Jesus Christ of Latter-day Saints (Mormon, LDS)

Wednesday, December 2, 2015

The FamilySearch Family Tree -- A Review and Retrospective -- Part One

For many years now, I have been intensely involved with FamilySearch, both as a company and through its online program, FamilySearch.org. As a consequence of this contact, I have written dozens of blog posts about the introduction and progress of the now-abandoned new.FamilySearch.org program and its successor, the Family Tree program. As we draw to the end of another year, I decided to review the current status of the Family Tree from the perspective of my long-term involvement. This post may become somewhat lengthy, but we shall see.

To introduce this subject, it is very important to distinguish between the content or data contained in the two programs and the programs themselves. The new.FamilySearch.org program (hereinafter NFS) was an innovative approach to handling a huge amount of complex data. Unfortunately, right from its introduction, it did not work. This was caused not only by design flaws but also by the designers underestimating the complexity and challenges of the data. Apparently no one had quantified the vast number of duplicate entries that would result from combining the historically accumulated and user-generated data. Initially, the NFS program was seeded with the following large databases:
  • The accumulated computer database extracted from user contributed family group records and called the Ancestral File or AF.
  • Another huge compiled computer database, called the International Genealogical Index or IGI, composed of user-submitted records that were used to perform Temple ordinances, combined with records extracted primarily from English parish registers that were also used to generate names for Temple ordinances.
  • A random accumulation of user submitted family trees called the Pedigree Resource File or PRF.
  • Membership records from The Church of Jesus Christ of Latter-day Saints.
  • Accumulated records from the Temples concerning ordinances performed. 
The total number of names in these files alone probably exceeded a billion. Initially the NFS program used a computer algorithm to "combine" suspected duplicate records. So, at the time the program began to be introduced in stages by Temple districts across the United States, beginning in Florida, the NFS program already contained a huge number of identified combined entries. For example, when I first examined the program (I began writing about FamilySearch in 2008), I found that I personally had 5 combined entries from my membership record, AF records, and PRF submissions. Some of my ancestors' records exceeded 800 combined entries. I remember hearing comments from FamilySearch representatives that they had no idea of the number of duplicates in the data at the time NFS was introduced. This data represented the combined efforts of over 100 years of accumulated family history research by tens of thousands of individuals.

NFS never progressed beyond the BETA test stage as a program; there was never a final release version. As more data was added to the program from its registered users, even more duplicates were discovered. Apparently, the program was not capable of handling the vast number of duplicates and an arbitrary limit was imposed on the allowed number of combined records for any one individual. This left a reservoir of duplicate, uncombined records "floating" around in the system.
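
To make the effect of such a limit concrete, here is a minimal sketch in Python. It is not the actual NFS algorithm, which I have never seen. The matching rule (normalized name plus birth year), the cap of 3, and the names `match_key` and `combine_with_cap` are all assumptions made up for illustration. The point is only to show how a cap on combined records leaves duplicates "floating."

```python
from collections import defaultdict

MAX_COMBINED = 3  # hypothetical cap; the real NFS limit is not given in this post


def match_key(record):
    """Crude duplicate key: normalized name plus birth year (an assumption)."""
    return (record["name"].strip().lower(), record["birth_year"])


def combine_with_cap(records, cap=MAX_COMBINED):
    """Group suspected duplicates, but stop adding to a group once it holds `cap` records.

    Returns (combined, floating): `combined` maps each key to the records
    folded into one entry; `floating` holds duplicates left uncombined.
    """
    combined = defaultdict(list)
    floating = []
    for record in records:
        key = match_key(record)
        if len(combined[key]) < cap:
            combined[key].append(record)
        else:
            floating.append(record)  # over the cap, so it stays a separate entry
    return dict(combined), floating


# Invented sample records; the "source" labels mirror the seed databases above.
records = [
    {"name": "John Smith", "birth_year": 1820, "source": "AF"},
    {"name": "john smith ", "birth_year": 1820, "source": "IGI"},
    {"name": "John Smith", "birth_year": 1820, "source": "PRF"},
    {"name": "John Smith", "birth_year": 1820, "source": "membership"},
    {"name": "Mary Jones", "birth_year": 1822, "source": "AF"},
]

combined, floating = combine_with_cap(records)
print(len(combined), "combined entries;", len(floating), "left floating")
```

With these invented records, the fourth John Smith record cannot be combined and remains in the system as a separate, duplicate entry.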

The issue of duplicate records was a serious one going back to the late 1800s, when the Church first began to address the problem. The issue was that the same individuals were being processed with duplicate Temple ordinances. This was the result of different people submitting the same individuals' names with no effective method for eliminating the duplicate entries. There is a long history of attempts to address this problem. The only real source for this history is a book published by Brigham Young University. Here is the reference:

Allen, James B, Jessie L Embry, and Kahlile B Mehr. Hearts Turned to the Fathers: A History of the Genealogical Society of Utah, 1894-1994. Provo, Utah: BYU Studies, Brigham Young University, 1995.

My initial reaction to the problem of duplicate entries in NFS was that it was a mistake to have seeded the program with all of that unreliable, user-generated data, especially since comparatively little of the data had adequate supporting sources. I have since come to the conclusion that addressing the issue of the mass of accumulated data was inevitable, and the only way this could have been accomplished was to metaphorically and actually dump it all out in a huge pile in order to begin the process of sorting it out, despite the pain and agony this might cause.

Very soon after the introduction of NFS, FamilySearch realized that the program would not work, and plans were made to move to another program referred to as "Family Tree." This new program was based on a widely used wiki model of handling data. As it turned out, this wiki-based approach was really the best ultimate solution for processing the vast amount of data seeded to the NFS program and the subsequent extensive additions by users.

At RootsTech 2012, FamilySearch introduced the Family Tree program. Since that time, the program has been vigorously updated and substantially changed. From the outset, my opinion was, and still is, that the Family Tree program is the solution to the data problems, not the problem itself. Unfortunately, the entire concept behind regularizing the data in a universal family tree program has been rather difficult for many users to accept.

It is important to note that from its introduction, the Family Tree program had the potential to resolve nearly all the accumulated issues, including the substantial number of duplicates. At this point it is also important to understand that when an individual entry in the NFS program was a duplicate, in many cases this was an indication that there was an entire pedigree of associated duplicates in the data. Most submissions to the Temple involved entire families, so if there was one duplicate there were likely several more on the same family group record submission. In addition, it was very obvious early on that there were a substantial number of duplicate records in the Ancestral File program. There were also duplicate submissions in the Pedigree Resource File. There were even duplicate membership and Temple records.

It is also very important to understand that the Family Tree program did not "begin over" with new data. The database in use by NFS was adopted into the Family Tree with all of its duplicates and defects. The adoption of the preexisting NFS database also imposed some arbitrary limits on the data in the Family Tree, but more about this later. 

Before the Family Tree program was introduced, the issue of the accuracy of the data in the accumulated database on NFS was not a primary concern since there was no easily accessible way of adding sources, especially sources that impacted the data already in the program. Further, the users could not edit entries in NFS and the only recourse was to add additional information. Thus if I found an alternative name, date or place, I could only add the information to NFS where it competed with the previous information. There was also no particular way to resolve differences between multiple entries. You could combine the entries that appeared to be duplicates, but if the person in the program had reached their combinatory limit, there was nothing that could be done.

Family Tree introduced a robust method of adding sources and implemented the ability to merge duplicates, not just combine them with the existing entries. Because of its open, universally accessible format as a wiki-based program, any registered user could add, correct, delete, and change any of the information in the Family Tree. Finally there was a way to move beyond the accumulated duplicates and errors of the past to a reliable reference database. Unfortunately, most users saw the Family Tree's main strength as a problem. Genealogy or family history has traditionally been a very private and closed pursuit characterized by the accumulation of massive Books of Remembrance. Any collaboration was done at "arm's length" through letters. This reality had engendered a culture of ownership. The family historian became a culturally defined role in the Church. The family historian was great for stories at family reunions but someone to be mostly ignored at other times. I have accounts of family historians who would not let their own children look at their accumulation of records and documents. I could go on about the difficulty of assuming the job of being family historian, but that will have to wait for another day's writing.

This cultural overlay of ownership of the family history created an instant conflict with the evolving Family Tree. In addition, the fact that the Family Tree was an open forum where anyone, regardless of their genealogical background, could alter the data made the program seem insubstantial. Because the program was constantly and rapidly evolving, casual or infrequent users of the program got the impression that they had to "relearn" the program every time they logged in. Despite the negative reaction of those who were family historians or even the positive reactions of many users, for the most part, the program was simply ignored.

When a user came to the program with adequate computer and network skills and no predetermined opinions about how to do "family history," the Family Tree was seen as the efficient and competent program it actually was. Changes to the program were viewed as upgrades and improvements and were accepted as a matter of course.

I guess I have a lot more to say on this subject and this is beginning to look more and more like a series. So I guess I will end Part One here. Stay tuned.

4 comments:

  1. Around 8 years ago I asked my dad to show me the genealogy he had done. He replied that it was all lost because his floppy disks were damaged or too old. I thought it was so sad that all that work had been lost. Now with Family Tree I've realized that everything he had done was probably replicated several times by our relatives. Personally I have really liked using the program and especially being able to collaborate with my relatives to back up everything in the tree with sources and life histories. I guess I'm new enough to family history that the culture of keeping it private was never something I encountered.

  2. For me, the big issue is still that there are no validation checks as new data is input. Users can completely ignore notes, sources, and discussions, and put in obviously incorrect data. The system will create a data problem flag after the obviously incorrect data has been input, but there is nothing to question the user as data is being put in. This imposes a huge cost on the rest of us who have to constantly monitor for changes and then spend hours putting things back the way they were. It seems like there has to be a better way.

    1. I heartily agree. There are already some programs, such as RootsMagic, that check to see if a county was in existence at the time a place is entered. There should be something that flags when a source already has the information and does not allow a change without a reference to a different source or at least a statement as to why the existing source is not acceptable.

  3. Thanks for the background of all of this. I'm looking forward to your follow-up posts. This is such important information to have available when confronted by the "family historians" of the old methods.
