The Genealogist’s Internet
While online genealogy is essentially about finding and making use of information, it is important to be aware of some general issues involved in using internet resources and in using the web as a publishing medium. Also important are the limitations in what is and is not likely to be on the internet. The aim of this chapter is to discuss some of these issues.
Needless to say, technophobes, Luddites and other folk of a backward-looking disposition are happy to accuse the internet of dumbing down the noble art of genealogy — anything so easy surely cannot be sound research.
Loath as I am to agree with technophobes, there is actually some truth in this. Though the medium itself can hardly be blamed for its misuse, the internet does give scope to a sort of ‘trainspotting’ attitude to genealogy, where it is just a matter of filling out your family tree with plausible and preferably interesting ancestors, with little regard for accuracy or traditional standards of proof. Because more can (apparently) be done without consulting original records, it becomes easy to overlook the fact that a family tree constructed solely from online sources, unchecked against any original records, is sure to contain many inaccuracies even if it is not entirely unsound. This is far from new, of course; today’s is hardly the first generation where some people want their family tree to be impressive rather than accurate. The internet just makes it easier both to construct and to disseminate pedigrees of doubtful accuracy.
But genealogy is a form of historical research, and you cannot really do it successfully without developing some understanding of the records from which a family history is constructed, and the principles for drawing reliable conclusions from them. Some of the tutorial materials mentioned in Chapter 2 address these issues — see ‘Research methods’ on p. 11 — but the most coherent set of principles and standards available online are those developed by the US National Genealogical Society, which can be found at <www.familysearch.org/en/wiki/Genealogical_Standards_and_Guidelines_-_International_Institute>:
The first of these is essential reading for anyone new to genealogy, while the third has sound advice for anyone using the internet to research their family tree.
The nature of the primary data online has an important implication for how you use information found on the internet: you need to be very cautious about inferences drawn from it. For a start, all transcriptions and indexes of any size contain errors — the only question is how many.
Where information comes from parish registers, for example, you need to be cautious about identifying an individual ancestor from a single record in an online database. The fact that you have found a baptism with the right name, at about the right date and in about the right place, does not mean you have found an ancestor. How do you know this child was not buried two weeks later, a fact recorded in a burial register which is not online? How do you know there is not a very similar baptism in a neighbouring parish whose records you didn’t look for? How do you know there is not an error in the index/transcription? As more records are put online with images accompanying transcriptions or indexes, the last question, to be sure, will become less important, but no future internet development will allow you to ignore the other questions.
Unfortunately, the very ease of the internet can sometimes make beginners think that constructing a pedigree is easier than it is. It is not enough to find a plausible-looking baptism online. You have to be able to demonstrate that this must be (not just could be) the same individual who marries twenty years later or who is the parent of a particular child. The internet does not do this for you. The only thing it can do is provide some of the material you need for that proof, and even then you will have to be more careful with online material than you would be with original records.
In particular, negative inferences (for example, that an ancestor wasn’t born later than such and such a date) can be very important in constructing a family tree, but the original material on the internet will rarely allow you to make such inferences. Not even where a particular set of records has been put online in its entirety could you start to be confident in drawing a negative inference. For example, there is no simple conclusion to be drawn if you fail to find an ancestor in a particular census. He or she may no longer have been alive, or may have been living abroad, or may be in the census but mistranscribed in the index, or may have been in the census until the relevant enumeration book went missing. Of course, such problems relate to all indexes, not just those online, but you can never be more confident about online records.
Also, you need to be very cautious about drawing conclusions based not on primary sources but on compiled pedigrees put online by other genealogists. Some of these represent careful genealogical work and come with detailed documentation of sources; others may just have a name and possible birth year, perhaps supplied from memory by an ageing relative: insufficient detail to be of great value, with no guarantee of accuracy, and impossible to verify. At best, you can regard such materials as helpful pointers to someone who might have useful information, or to sources you have not yet examined yourself. It would be very unwise to incorporate the information in your own pedigree simply because it appears to refer to an individual you have already identified as an ancestor.
While the increasing amount and range of genealogical material online, both free and commercial, can only be a good thing, it does not mean that these datasets are without their problems.
In particular, there is the question of the accuracy of the indexing. Of course, anyone who indexes the 30 million records or so in a census is not going to do so without a level of error, but the question is: what is an acceptable level of error? What can digitizers reasonably be expected to do, without incurring insupportable extra costs, to minimize the level of error?
With so many massive datasets, where it’s impossible to check every entry, one of the problems is that it’s extremely difficult to come to firm conclusions about which site has the best quality data and which has the most (and most serious) errors. Also, because of differences in the search facilities, it’s not always possible to make direct comparisons, and it is therefore not even a straightforward matter to develop diagnostic tests as a basis for some sort of independent benchmarking.
But with civil registration indexes and all the censuses available at more than one site, one hopes that competition, not to mention pride in their own products, will keep the data services striving for a good reputation. On the other hand, perhaps this is an optimistic view: as there are so many reasons why one might fail to locate an ancestor in a census, only some of which can be put down to errors in transcription or indexing, it may be they can afford to be cavalier about quality. However, competition has surely led to the improvement in the quality of census images on the commercial sites, where older images have gradually been replaced by higher resolution versions.
There is an argument that the number of competing commercial data services perhaps makes the fact of errors less important. Genealogists just have to accept that the alternative to having better quality data, which would come at a significantly higher price, is that occasionally you will need to use more than one site when looking for a particular record.
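The arithmetic behind this argument can be sketched in a few lines. The example below is only an illustration: the 5 per cent error rate is a made-up figure, not a measurement of any real data service, and the function name `miss_probability` is hypothetical. It also assumes that the errors in different indexes are independent of each other, which will not hold where the original record is itself illegible or missing.

```python
def miss_probability(error_rates):
    """Chance that a record is unfindable in every index consulted.

    Assumes (hypothetically) that each index fails independently of the others.
    """
    result = 1.0
    for rate in error_rates:
        result *= rate
    return result


# Invented figure for illustration: each index mis-indexes 5% of entries.
one_index = miss_probability([0.05])
two_indexes = miss_probability([0.05, 0.05])

print(f"Unfindable with one index:   {one_index:.2%}")
print(f"Unfindable with two indexes: {two_indexes:.2%}")
```

Under these assumptions, a second independently created index reduces the proportion of unfindable entries from one in twenty to roughly one in four hundred, which is why consulting more than one site can compensate for a moderate level of error on each.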
While there are many free resources on the web, the fact is that with a few exceptions (FamilySearch, FreeBMD, the Irish census site) the major sets of digitized records are available on commercial data services. But given that these are almost entirely public records, is this appropriate?
Prior to the release of the 1901 census in January 2002, there was considerable debate within the genealogical community about the appropriateness of government agencies, already funded by the taxpayer, seeking income by charging for online access to public records. There was a feeling that the limited offline availability of the 1901 census on microfiche, which cynics viewed simply as a move to safeguard online income, took insufficient account of the many people who had no internet access.
Ten years on, that particular argument has now lost any validity it might have had at the time. Anyone who has difficulty finding a place with internet access nowadays will surely find it even harder to get to The National Archives or a record office! For almost everyone, the costs of using a commercial data service are significantly less than the costs in time and travel of visiting a repository, not to mention the fact that the money goes to the data providers rather than to transport or oil companies. In fact, if you can get to The National Archives in Kew, you can indeed enjoy free access to much of the data for England and Wales.
But, in fact, now that internet access is so widely available whether in the home or from public libraries, providing a service primarily online is not the contentious issue it once was. Indeed, with the government promoting the use of the internet for the delivery of all sorts of services, it is now very hard for a public body to publish any records or data without being obliged to make them available on the web.
Also, it shouldn’t be forgotten that traditional modes of access to records are also heavily biased against quite large groups of people: anyone who is not mobile, lives far from repositories they need to consult, or has no free time during the working day has always found it hard to make progress with their family history. One of the reasons for the growth of genealogy in recent years is that the internet has made it realistic for these people to devote time to family history research.
In purely practical terms, the fact is that progress on digitizing the nation’s historical records would have been very much slower if it had had to be paid for from existing funds, to rely on Lottery funding, or to depend solely on volunteer indexers. Look at FreeBMD: even with 10,000 volunteers and highly professional infrastructure it has taken 13 years to transcribe 210 million very brief records, many of which are in printed form (see p. 70). Since the creation of large digital resources is immensely expensive at a time when public funding for repositories is decreasing, charging seems to be the only option unless we are prepared to wait quite a long time.
And one mustn’t overlook the argument of the non-genealogists: unless you’re prepared to start contributing to their football season tickets or yoga classes, why should they be subsidizing your hobby?
However, while it’s difficult to argue that we shouldn’t pay data services for using indexes that they have created, you might still ask why we can’t have access to the images of the records themselves (which remain the property of the public body which holds them) without signing up with a commercial service. Why shouldn’t the census and other record images be freely available in the way that Medway Council has made its parish registers available (see p. 103), or that FreeBMD has been allowed by the GRO to make the GRO index images available?
Even in less straitened times, though, this is not as straightforward as it may seem — whenever you download a free image, someone is paying to run and maintain the equipment on which it is held and for the bandwidth to transmit it. FreeBMD, for example, is free to the user not only because the transcription is being done by volunteers but also because the expenses of the project are being borne by others who are not demanding any money in return and the project is supported by sponsorship from RootsWeb. In any case, you would not argue that a record office should let you have free photocopies of the records in their keeping.
But this obvious truth does not tackle the broader issues of the ownership of public records, which the Open Genealogy Alliance (see p. 395) has recently raised.
I see no problem in public bodies licensing commercial developers to create digital images of public records, or granting a licence to subsequent companies to create new indexes from those digitized records. But there is a real problem where such licensing is exclusive. Regardless of any political preference for or against a free market in genealogical records, there is a fundamental practical objection to a public body granting a monopoly licence to an individual supplier, as has been the case with Scottish records for the last 14 years. For all the merits of the online Scottish records, it is inescapable that any index of a nation’s entire population will have a significant level of error. And a monopoly index means that the records of some individuals, in the absence of an alternative index with different errors, are essentially unfindable. As it happens, the monopoly on Scottish census indexes was broken when Ancestry UK went ahead and created their own Scottish census indexes without the agreement of GROS. But GROS, in spite of public statements to the contrary, seem to be unwilling to allow their own digital images to be licensed to others or to allow alternative digitizations, so one still needs to go to ScotlandsPeople to verify the information. Now that there are several well-established data services, each with a substantial customer base and the ability to undertake new large-scale digitizations, it would seem to be problematic, in spite of competitive tendering, for government agencies to have long-term exclusive deals with one company for sets of national records.
This is not to deny that close co-operation between record providers and digitizers is undoubtedly beneficial, since those providing the records will be aware of many of the problems in the original documents, and their involvement can help to ensure a better quality of indexing.
Secondly, there is the question of restrictions on the use of the material. The data services of course have a right to protect the indexes which are the product of their very considerable investment. But should images of public records be treated the same way? Whether or not modern images of out-of-copyright historical documents are protected by UK copyright law, which is unclear,[1] record holders and data services have in fact found a way to confer on them a much more stringent form of protection: getting you to agree to their terms and conditions, effectively granting a permanent pseudo-copyright.
With copyright, the protection expires, and there are exemptions for certain types of use — and all this, even if you don’t agree with it, has at least been subject to public scrutiny in Parliament. In the terms and conditions for accessing images of historical records online, the record holders and data services are effectively free to override this on the basis of a purely internal, administrative decision.
Now in purely pragmatic terms, this might be regarded as inevitable — expensive digitization projects will never be undertaken if the income can be easily undermined — and on the personal level it is rather unproblematic. Would most users of the British Newspaper Archive (p. 198) in practice be happy to give up rights which they may well not exploit, in exchange for not having to set aside a whole day to go to the British Newspaper Library in Colindale and spend hours hunting for an article? Of course they would. However, whether it is appropriate for a public body which is specifically funded to act as the guardian of such materials to declare unilaterally, in effect, a permanent copyright is another matter entirely.
Of course, it is not in the interest of family historians to do anything to discourage data services from creating new indexes, or to put at risk any of the already inadequate funds of the public bodies who preserve our documentary heritage, but the concerns of the Open Genealogy Alliance certainly merit wider discussion.
The internet makes it very easy to disseminate information, but just because you can disseminate material it does not mean that you should. Both websites and email messages are treated by the law as publications. If you circulate or republish material you did not create, you may be infringing someone’s copyright by doing so. Of course, genealogical facts themselves are not subject to copyright, but a modern transcription of an original record might be, and a compilation of facts in a database is also protected, though for a more limited duration.
This means you should not put on your own website, upload to a database or post to a mailing list:
There is an exemption of ‘fair dealing’ which allows some copying, but this is only for purposes of criticism or private study, not for republishing or passing on to others. Extracting a single record from a CD-ROM and emailing it to an individual is probably OK, but posting the same information to a mailing list, which means it will be permanently archived, is not. Note that some companies include licence conditions with CD-ROMs stating that you must not supply the information to third parties, though it is not clear that such a condition is legally enforceable — a similar ban on lookups in a reference book would seem to be ridiculous.
A number of people have been shocked to find their own genealogical databases submitted to an online pedigree database without their knowledge. Mark Howells covers this issue very thoroughly in ‘Share and Beware — Sharing Genealogy in the Information Age’ at <www.oz.net/~markhow/writing/share.htm>. Barbara A. Brown discusses the dissemination of ‘dishonest research’ in ‘Restoring Ethics to Genealogy’ at <www.iigs.org/newsletter/9904news/ethics.htm.en>. Steve’s Genealogy Blog has a posting about ‘Ethics in Publishing Family Histories’ at <stephendanko.com/blog/2007/07/31/ethics-in-publishing-family-histories/>.
The current Crown Copyright rules, however, mean that you can include extracts from unpublished copyright material held by The National Archives as long as the source is acknowledged. The National Archives’ ‘Copyright’ leaflet at <www.nationalarchives.gov.uk/legal/pdf/copyright_full.pdf> explains which of their holdings are and are not covered by Crown Copyright, and the Government’s ‘Crown Copyright in the Information Age’ <www.opsi.gov.uk/advice/crown-copyright/crown-copyright-in-the-information-age.pdf> gives general guidance about Crown Copyright. In general, you should have no qualms about the textual content of other historical material over 150 years old if you are transcribing it yourself. But a present-day transcription of a manuscript document is prima facie the original work of the transcriber, though it might be argued that transcribing a printed resource hardly involves the element of creativity or skill which justifies a copyright claim. The creators of recently made images of historical documents tend to claim that these images are copyright, even though, strictly, this is not addressed by UK copyright legislation.
David Hawgood’s ‘Copyright for Family Historians’ at <www.genuki.org.uk/org/Copyright.html> offers some informal guidance tailored for genealogists, while for more general and definitive information, there is the official website of the UK Intellectual Property Office at <www.ipo.gov.uk>.
Another important issue is privacy. Contrary to a widespread popular belief, the UK’s Data Protection legislation does not prohibit the publication of private information about an individual — if this were the case, then, rather obviously, certain newspapers would no longer be commercially viable. The Human Rights Act enshrines in law a right to private life, but it’s difficult to see how this could be used to censor information derived from official, publicly available sources. Of course, if your online family tree says your still-living Uncle Arthur is a drunkard and he disagrees, that’s another matter. The real problem with publishing information about living family members is that many people will regard it as discourteous at the very least. Your Uncle Arthur will probably not sue for libel, but he might stop talking to you or not leave you the family photographs.
In any case, even if it’s just a matter of births, marriages and deaths, it’s difficult to see any need to publish this information about the living in order to further genealogical research, which would be the only other justification. Conversely, though, in the absence of any legal protection, it’s not clear that you have any legal recourse if someone publishes information about your immediate family online, though if they have used a pedigree database such as those discussed in Chapter 14, you should be able to get the service to take action. Where there’s a need to use the web to share information within a family, there are many sites that will allow you to restrict who can see what information.
The objection to publishing information about living family members was always that they might take umbrage. But there is nowadays a more serious objection. Many commercial services use questions about someone’s past as a security check. If you can amass enough information about someone, you can impersonate them online. In principle, an online tree might put someone at risk of identity theft. But, in fact, you probably don’t list the names of your cousin’s first school, pet hamster, favourite book, etc., in an online tree. And any company still using the mother’s maiden name as a security check should be avoided as incompetent.
Indeed, it is the information that is already available, often made so by the individual concerned, that is the real threat. In these days of blogging, Facebook and Flickr, much about people’s lives is publicly available in a way which goes well beyond the ‘secrets’ revealed by a family tree.
On the other hand, it seems that a concern with privacy might be a threat to reasonable publication of genealogical data. In its original proposals for digitization of the civil registration service, the GRO argued that certain items of data should be withheld, including occupations, addresses, and causes of death. While no-one would argue that someone’s privacy should be threatened just because genealogists ‘need’ access to certain types of information, there has to be a very good case for suppressing information on public records that are the foundations of citizenship. Given that details of marriages, for example, are published in advance specifically to permit public scrutiny, why on earth would anyone consider that the details on the eventual certificate give rise to privacy concerns? Considering how often we see reports of credit card details accidentally exposed on websites, highly confidential personal information absent-mindedly left in taxis or sent, unencrypted and unsecured, by ordinary post, it seems absurd to be worrying about 20-year-old addresses and the privacy of the dead.
In the USA, there are currently moves to restrict access to the Social Security Death Index (SSDI) as a fraud prevention measure, and US genealogists understandably see this as a misplaced and unwelcome attempt to restrict access to public records.
There is a mailing list, LEGAL-ENGWLS, for the discussion of ‘legal aspects of genealogical research in England and Wales including copyright, database rights, data protection, and privacy’ — details at <lists.rootsweb.ancestry.com/index/intl/UK/LEGAL-ENGWLS.html>.
Information is not much use if you cannot find it, and search engines are able to capture only a fraction of the material on the web. Of course, it is impossible to foresee technological advances, but there is no sign at the moment that the coverage of search engines will improve significantly. Websites of individual genealogists, in particular, will probably become harder to find. In addition, the increasing amount of data held in online databases is not discoverable by search engines, and it becomes more important than ever that there should be gateways and directories (or even books!) to direct people to the sources of online data.
The quality of indexing provided by search engines is limited by the poor facilities currently available for marking up text in HTML with semantic information. Search engines cannot tell that Kent is a surname in ‘Clark Kent’ but a place-name in ‘Maidstone, Kent’. This is because web authors have no way of indicating this in HTML markup. As so many British surnames are the names of places or occupations, this is a significant problem for UK genealogists.
The situation could improve when a more sophisticated markup language, XML, starts to be used widely on the web — this allows information to be tagged descriptively, and will enable the development of a special markup language for genealogical information. Such a development (and its retrospective application to material already published on the web) is very slow in coming and will require considerable work, though the LDS Church has made a start by proposing an XML successor to GEDCOM (see the GEDCOM FAQ at <www.gedcom.org/faq.html>). But the benefits of such an approach are already apparent in a project like the Old Bailey Proceedings (see p. 124), which can distinguish between the names of the accused, the victim, and witnesses.
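The difference that descriptive tagging makes can be illustrated with a short sketch. The element names used here (`record`, `forename`, `surname`, `place`) are invented for the purpose of the example and do not belong to any published genealogical markup standard; the sample data is likewise made up. The code uses only the XML parser in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are invented for illustration
# and are not taken from GEDCOM or any other real standard.
SAMPLE = """
<records>
  <record><forename>Clark</forename> <surname>Kent</surname></record>
  <record><forename>Mary</forename> <surname>Baker</surname>, born in <place>Maidstone, Kent</place></record>
</records>
"""


def plain_text_hits(xml_text, word):
    """Count records containing the word anywhere, ignoring the tags,
    much as a search engine indexing plain text would."""
    root = ET.fromstring(xml_text)
    return sum(
        1 for rec in root.iter("record") if word in "".join(rec.itertext())
    )


def tagged_hits(xml_text, tag, word):
    """Count records where the word appears inside an element with the given tag."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for rec in root.iter("record")
        if any(word in (el.text or "") for el in rec.iter(tag))
    )


print(plain_text_hits(SAMPLE, "Kent"))         # matches surname and place alike
print(tagged_hits(SAMPLE, "surname", "Kent"))  # only the person named Kent
print(tagged_hits(SAMPLE, "place", "Kent"))    # only the record located in Kent
```

A search on the untagged text cannot separate the two records, whereas a tag-aware search can ask specifically for Kent as a surname or Kent as a place. The hard part in practice is not the searching but getting the tags into the data in the first place.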
Another problem is the increasing number of sites with surname resources, making it impossible to check everywhere for others who share your interests. Mercifully, the number of pedigree databases (see Chapter 14) remains manageable for the present, but the number of sites, particularly message boards, with surname-related material makes exhaustive searching impossible.
However, on a more positive note, it’s clear that, with so much work being done on making archival catalogues available, it will become easier than ever to track down original documents in record offices and other repositories, and genealogists in general will start to make much more use of records that in the past only the expert might have been able to take advantage of.
To anyone who has not grown up with the web, there is one deeply troubling aspect of internet resources: their tendency to disappear. We are used to the idea that once information is published in book form, it may become hard to find, but it doesn’t generally disappear, particularly if it is important or useful, in less than a century or two. But the fact is that important internet resources are constantly at risk.
Large digitized datasets are not really threatened, because they have a commercial value which protects them from oblivion, but there are two types of valuable resource which are particularly vulnerable: publicly funded and volunteer projects.
In the first of these cases, even if the initial funding does envisage some provision for long-term hosting and maintenance, it will not be open-ended. Also, there can be no guarantee that some new broom will not cancel funding already promised, deciding that the money can be better used for some new project. Unfortunately, there is often more kudos in getting a new project off the ground than in maintaining an old one, particularly if new management is keen to make its mark.
A salutary example is Familia, a very useful, indeed award-winning site which listed genealogical holdings in UK and Irish public libraries. Its initial funding was pulled in 2001. It limped on for another nine years hosted by organizations which showed little interest in maintaining the site (by updating dead links), let alone carrying on with its remit, since after all they had limited funds, which understandably were prioritized for their own projects. Finally it was abandoned. Mercifully much of the data has in fact been rescued (by Cornucopia, see p. 207) though it is far from easy to find, but the full site is preserved only on the Wayback Machine (<www.familia.org.uk>, 5 July 2009). I would be very surprised if any reader of this book didn’t think the site ought to have been kept going.
Familia has now been restored in its original location.
You can get a good idea of the problems faced by even the most successful projects from the following message posted on A Vision of Britain in December 2008 by the project’s director, Humphrey Southall:
A Vision of Britain through Time launched in October 2004, and for the first three years running costs were paid by the British Library. We managed to save up a little money in that period and we earned a bit more by licensing data, so we were able to keep it going for a fourth year, until September 2008.
The site is still running in December 2008 through a new grant from the Joint Information Systems Committee, the IT arm of the Higher Education Funding Councils. This grant is to build an extended version of the site to launch in the spring of 2009, but we are also using it to pay Edinburgh University, who host the site for us.
That will keep us going only until the end of March 2009, and from then on we have to pay our way. This means the site is going to look a little different, but it is still far from commercial: the only use to which money generated from the site will be put is keeping it going. The JISC grant will be funding a new web server for us, but we really need to start saving for the next new server which will be needed around 2012/13. However, the immediate problem is simply covering a five-figure annual hosting bill.
…
It is very frustrating that there seems to be no route at all by which a resource created by individual initiative can apply for public money to keep it running, no matter how uncommercial the original motive, how useful the content, how popular the end result.[2]
In 2012, it is good to see that A Vision of Britain’s funding is in fact secure for another couple of years, but the problem is not going to go away, and there are many other projects in the same situation.
Some of the most valuable genealogical resources on the web are the results of a single individual or a small group devoting massive amounts of time to them. Of course, these don’t have the large-scale funding issues of A Vision of Britain, but they too are at risk: inevitably, the individuals concerned will at some point be forced by circumstances to give up their efforts, even if it is only the ultimate circumstance of their death. Unless arrangements for succession have already been put in place, all the material and any domain name for the project will become the responsibility of the next of kin, who, apart from having more pressing concerns, may not know what to do or who to contact to secure its future, or have the technical skills to manage a transition.
It’s true that the Wayback Machine at <www.archive.org> can often provide a partial back-stop, but that is not a satisfactory basis for preserving valuable resources. The British Library has a digital archiving project (see p. xv), but at present that is solely for resources of their choosing, though, logically (if not practically), it is a small step for the BL’s remit for the preservation of the printed word to be extended to online publication. Of course, any archiving is better than none, but just copying files is quite inadequate for many modern sites, which use a variety of techniques to generate pages dynamically, rather than delivering static pages, and which may require the web server to be appropriately configured.
As far as I can see, this set of problems has received scant attention from the genealogical world, which is otherwise so concerned and so careful about the preservation of materials and information.
The changes in the practice of family history that have been brought about by the internet are extraordinary and on the whole very positive. However, it’s important to keep a sense of perspective, and to recognize that none of this has made any difference to the fundamentals of family history research: consulting records, drawing conclusions and sharing information. Nor is there any prospect of basing a family tree solely on digitized records — just consider how long it’s taking to get civil registration records online!
The internet has not ‘automated’ family history or modified its principles and methods. Nor does it need to — there is nothing wrong with the traditional methods of genealogy. The fact that many historical records are easier than ever to access doesn’t actually make them any easier to interpret. Indeed, it may make them harder to interpret, if a search delivers an individual piece of information shorn of its context.
What the internet has revolutionized is not the process of genealogy, but the ease with which some of the research can be carried out. The key aspects of this are:
Although microfilm and microfiche are not going to disappear in the immediate future, any more than books are, the internet is now the publishing medium of choice for all large genealogical data projects, whether official, commercial, or volunteer-run. Where public records or public funding are concerned, the web, because of its low cost and universal access, is now the default publishing medium as a matter of principle.
Both the number of internet users and the amount of data available have reached a critical mass, with the result that the genealogist without internet access is in a minority and at a significant disadvantage in terms of access to data and contact with other genealogists.
Of course you can still research your family tree without using the internet — just about — but why would you choose to?
1 Digital images of historical documents are regarded explicitly as copyright-free in some jurisdictions, including the US and Germany. For this reason many such images are available, quite legally, on sites like Wikipedia. [Update: The Intellectual Property Office has now clarified that this also applies in the UK.]
2 <www.visionofbritain.org.uk/footer/doc_text_for_title.jsp?topic=news&seq=12>