Genealogical Publishing
on the World-Wide Web (2)

by Peter Christian

This article first appeared in Computers in Genealogy, Vol 5, No. 10 (June 1996)
© 1996 Society of Genealogists & the Author.

In the first part of this article, I put the case for the World Wide Web as a medium particularly suited to genealogical publishing. In this second part, I want to look at some of the practical issues that arise once you’ve decided to publish on the Web: how do you create and publish your pages, how do you make sure people know about them, and what does it cost? I would also like to consider some of the problems and disadvantages that the Web has for genealogical publishing.

Creating Web Pages

There are three pieces of good news about creating Web pages: first, the basics of it are very straightforward (certainly in comparison with desktop publishing); second, you don’t need to spend money on software; third, you can do it on any computer.

The reason for this positive state of affairs is simple: a Web page is nothing more than a plain text file, which can be created in even the most primitive text editor. All the layout and design effects on Web pages are achieved by typing in special "tags", which tell a Web browser how to display a particular piece of text. The available tags and what they do are defined in a "markup language" called HTML ("HyperText Markup Language"), which all Web browsers can make sense of.

For example, Figure 1 shows how one of my pages about the Christians of Pevensey looks when displayed in a text editor:

    <HTML>
    <HEAD>
    <TITLE>The Christians of Pevensey, East Sussex</TITLE>
    </HEAD>
    <BODY>
    <H1>The Christians of Pevensey, East Sussex</H1>
    <HR>
    <H2>The Earliest Records</H2>
    The earliest record of anyone called Christian in Sussex is in
    the Subsidy Roll of 1296 which records that a Peter Christian
    of Ford in West Sussex paid 1s 8d.<P>
    <A HREF="subsidy.htm">References to Christian in the Subsidy Rolls</A>
    .... other stuff omitted ....
    </BODY>
    </HTML>
Figure 1

The items between angle brackets are the "tags". So anything between the <H1> and </H1> tags is displayed in the Main Heading style; <P> indicates a paragraph break; <HR> produces the horizontal rule under the heading; and <A HREF=...>...</A> creates the link to a page of extracts from the Subsidy Rolls (subsidy.htm is the file containing that page). Everything that is not a tag is text that will appear on the page, and Figure 2 shows what this looks like when viewed with a Web browser.

Figure 2
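To make the split between tags and text concrete, here is a small sketch in Python, chosen purely for illustration. It feeds an excerpt of Figure 1 to the `HTMLParser` class from Python's standard library and separates the tags, the link target and the text a browser would display. `TagSplitter` is a hypothetical name chosen for this sketch.

```python
from html.parser import HTMLParser

# An excerpt of Figure 1, without the "other stuff omitted" line.
FIGURE_1_EXCERPT = """<H2>The Earliest Records</H2>
The earliest record of anyone called Christian in Sussex is in
the Subsidy Roll of 1296 which records that a Peter Christian
of Ford in West Sussex paid 1s 8d.<P>
<A HREF="subsidy.htm">References to Christian in the Subsidy Rolls</A>"""


class TagSplitter(HTMLParser):
    """Separates the markup (tags) from the text a browser would display."""

    def __init__(self):
        super().__init__()
        self.tags = []   # tag names in document order; closing tags get a "/"
        self.links = []  # HREF targets of <A> tags
        self.text = []   # runs of displayable text, with whitespace tidied

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        self.tags.append(tag)
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(" ".join(data.split()))


splitter = TagSplitter()
splitter.feed(FIGURE_1_EXCERPT)
splitter.close()
print(splitter.tags)     # ['h2', '/h2', 'p', 'a', '/a']
print(splitter.links)    # ['subsidy.htm']
print(splitter.text[0])  # The Earliest Records
```

A browser does essentially the same separation before deciding how to lay out the text, although real browsers are far more forgiving of badly formed markup than this sketch needs to be.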

The tags <HTML>, <HEAD>, <TITLE> and <BODY> in Figure 1, together with their closing tags, are slightly different: they don’t have any obvious effect on the page, but are the essential structural tags which allow the Web browser to recognise a file as a Web page. The content of the page must be placed between <BODY> and </BODY>. In fact the simplest possible Web page requires just these 8 structural tags and some plain text in the BODY section (Figure 3).

    <HTML>
    <HEAD>
    <TITLE></TITLE>
    </HEAD>
    <BODY>
    This is the simplest possible Web page. Well, almost.
    </BODY>
    </HTML>
Figure 3

The number of tags in HTML is fairly small (most essential page design functions can be achieved with a dozen), and it is possible to create the tagged text for a page "by hand", either by using a straightforward text editor or by using a word-processor and saving the page as ASCII text.
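Because a Web page is only plain text, it can also be produced by any program that writes text files. As an illustration, here is a hypothetical Python function, `minimal_page`, that assembles the skeleton of Figure 3 around a title and a line of text; saved as plain ASCII text with an .htm extension, its output would be a working page.

```python
import html
import re


def minimal_page(title: str, body_text: str) -> str:
    """Build the simplest Web page: the 8 structural tags of Figure 3
    around a title and one piece of plain text."""
    # html.escape turns &, < and > into entity references, so that a stray
    # "<" in the text is not mistaken for the start of a tag.
    return (
        "<HTML>\n"
        "<HEAD>\n"
        f"<TITLE>{html.escape(title)}</TITLE>\n"
        "</HEAD>\n"
        "<BODY>\n"
        f"{html.escape(body_text)}\n"
        "</BODY>\n"
        "</HTML>\n"
    )


page = minimal_page("The Christians of Pevensey, East Sussex",
                    "Wills & parish register extracts")
print(page)
# Everything between angle brackets is a tag: there are exactly 8 of them.
print(len(re.findall(r"<[^>]+>", page)))  # 8
```

The escaping step matters for genealogical text, where an ampersand or a "<" in a transcript would otherwise be read as markup.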

However, there are also a number of shareware and freeware HTML editors which simplify the tagging process,[1] and my own pages are mostly created with an add-on for Microsoft Word for Windows, the Internet Assistant, which is distributed free by Microsoft.[2] This allows pages to be created in Word for Windows just like any other sort of word-processed document. You use normal word-processing features such as bold and headings, and the program converts these into the appropriate tags when you save the document. It will also save existing Word documents in HTML format, so material you have already typed up can be published on the Web in almost no time.[3]
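The principle behind such converters can be sketched in a few lines of Python. This is not a description of how Internet Assistant works internally, only a hypothetical illustration of the idea: each paragraph carries a style name, and on saving each style is swapped for a matching tag. The style names and the `STYLE_TAGS` mapping are assumptions.

```python
import html

# Hypothetical mapping from word-processor paragraph styles to HTML tags.
# The style names are assumptions for illustration only.
STYLE_TAGS = {"Heading 1": "H1", "Heading 2": "H2", "Normal": "P"}


def paragraphs_to_html(paragraphs):
    """Convert (style, text) pairs, in document order, into tagged HTML."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TAGS.get(style, "P")  # unknown styles become ordinary paragraphs
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


document = [
    ("Heading 1", "The Christians of Pevensey, East Sussex"),
    ("Heading 2", "The Earliest Records"),
    ("Normal", "The earliest record of anyone called Christian in Sussex "
               "is in the Subsidy Roll of 1296."),
]
print(paragraphs_to_html(document))
```

Figure 1 uses a lone <P> as a paragraph break; the closing </P> written here is optional in HTML, so both forms display the same way.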

There are some even more helpful tools starting to become available. Products such as Adobe’s PageMill (so far only available for the Mac) and CompuServe’s Home Page Wizard (for Windows)[4] make it easy to produce groups of linked pages without having to know anything about HTML.

There is a further advantage of preparing material in HTML: HTML is an international, non-proprietary standard,[5] and therefore HTML files are guaranteed to be exchangeable with other people (and other computers) in a way that is not true of files in proprietary word-processor formats. This means that HTML provides a good basis for long-term storage of information. In fact it is not even necessary to use the Internet for distributing HTML-tagged texts. Most Web browsers are equally capable of reading HTML-tagged texts on a local hard disk or floppy. So, even if you and those interested in your family history do not have Internet access, distributing your family history as a collection of HTML files on a floppy disk gives you all the advantages of universal exchangeability and of the hypertextual information structure. With the developments currently under way in HTML, it is likely that within a few years there will be a standard way of including tags to indicate, for example, the sources of any information, and HTML could even develop sufficiently to function as a standard format for the exchange of genealogical information.[6]
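One practical point if you distribute pages on a floppy: links between your own pages should be relative (like subsidy.htm in Figure 1), and every file they name must actually be on the disk. A hypothetical Python checker, `broken_local_links`, can list relative links that point at missing files. Apart from xtian.htm and subsidy.htm, which appear in the article, the demonstration file names are invented.

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the HREF targets of all <A> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def broken_local_links(folder):
    """Return (page, link) pairs where a relative link names a file not in folder."""
    broken = []
    for page in sorted(os.listdir(folder)):
        if not page.lower().endswith((".htm", ".html")):
            continue
        collector = LinkCollector()
        with open(os.path.join(folder, page), encoding="latin-1") as f:
            collector.feed(f.read())
        collector.close()
        for href in collector.hrefs:
            target = urlsplit(href)
            # Skip links to other sites and links to a spot on the same page.
            if target.scheme or target.netloc or not target.path:
                continue
            if not os.path.exists(os.path.join(folder, target.path)):
                broken.append((page, href))
    return broken


# Demonstration: xtian.htm links to subsidy.htm (present) and to a
# hypothetical wills.htm, which has been left off the "disk".
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "xtian.htm"), "w", encoding="latin-1") as f:
        f.write('<A HREF="subsidy.htm">Subsidy Rolls</A> <A HREF="wills.htm">Wills</A>')
    with open(os.path.join(folder, "subsidy.htm"), "w", encoding="latin-1") as f:
        f.write("<P>Extracts from the Subsidy Rolls")
    print(broken_local_links(folder))  # [('xtian.htm', 'wills.htm')]
```

The sketch does not decode escaped characters in links (such as %20) or handle links beginning with "/", which in any case would not work from a floppy.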

Publishing

Once you have created your Web pages on your own computer, you need to make them available on the World Wide Web so that other people can access them, and you also need to publicise them.

In order to publish on the World Wide Web you need to have Internet access via an Internet provider who will allow you to store pages on their server. Many Internet providers already do this, usually giving you a small amount of storage free and then charging for larger amounts. (I will return to issues of cost below.) Once your pages have been uploaded from your computer to your Internet provider’s Web server, they are available to anyone else with an Internet connection and a browser.

Assuming you have a main page (usually called a home page), from which all the other pages can be reached via hypertext links, the only thing potential readers need to know is the Uniform Resource Locator (URL) of that home page. The URL is a unique identifier for the location of the page on the Internet, which gives the name of the Internet host on which the page is stored and where it is stored on the host.[7] (Your Internet provider will be able to tell you the URL for your home page.)
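The parts of a URL can be picked apart with Python's standard `urllib.parse` module. In the sketch below, the URL is assembled from the components given in note 7. The last line shows how a browser resolves a relative link such as HREF="subsidy.htm" against the address of the page containing it.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# The URL of the page in Figures 1 and 2, assembled from the parts given in note 7.
PAGE_URL = "http://www.gold.ac.uk/~peter/genealogy/xtian.htm"

parts = urlsplit(PAGE_URL)
print(parts.scheme)                         # http  (a Web page)
print(parts.netloc)                         # www.gold.ac.uk  (the server)
print(posixpath.dirname(parts.path) + "/")  # /~peter/genealogy/  (location on the server)
print(posixpath.basename(parts.path))       # xtian.htm  (the file)

# A relative link is resolved against the URL of the page it appears on.
print(urljoin(PAGE_URL, "subsidy.htm"))     # http://www.gold.ac.uk/~peter/genealogy/subsidy.htm
```

Because links between your own pages are resolved this way, a whole group of pages can be moved to a new server without editing them, provided the links are relative.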

Publicity

Of course, you also need to tell people about your pages. There are three main approaches. First, you can send an e-mail message to an appropriate mailing list. For example, if your pages concern British genealogy you could send a message to the GENUKI-L mailing list, and everyone who subscribes to that list would learn about your pages. Second, you can post a message to an appropriate Usenet newsgroup such as soc.genealogy.surnames. Since both mailing lists and newsgroups tend to be archived, there will then be a permanent record of the existence of your pages.

The third possibility is to make use of some of the subject trees or search engines that exist on the Web. These are essentially special Web servers whose purpose is to collect information about what is on the Web and provide a searchable index. If you, as a potential reader, wanted to know about genealogical sites on the Web, you might use a well-known site such as Yahoo to search for "genealogy" or a combination of terms such as "Robinson genealogy Rutland". If you want your own pages to be found, you simply need to fill in the Web form on Yahoo’s pages to submit their details. Some search engines do not even need you to do that. Alta Vista, for example, continually browses the Web looking for new and updated pages, and indexes not just the title and headings but large amounts of the text. If you don’t want to submit the details of your pages to each search engine individually, you can use a site like Submit-it, which will pass your details to the search sites it covers (currently 16) when you fill in a single form (see Figure 4).
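Search engines differ in detail, but full-text indexing of this kind generally rests on an "inverted index": a table from each word to the pages that contain it. The Python below is a toy illustration of that general technique, not a description of how Yahoo or Alta Vista actually work; the example.org pages are invented.

```python
import re
from collections import defaultdict


def words(text):
    """Split text into lower-case words, so that searches ignore case."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages maps URL -> page text; returns word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in words(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return, sorted, the URLs that contain every word of the query."""
    results = None
    for word in words(query):
        found = index.get(word, set())  # .get does not add missing words
        results = set(found) if results is None else results & found
    return sorted(results or set())


# Invented example pages; example.org is a domain reserved for examples.
PAGES = {
    "http://example.org/robinson.htm": "Robinson genealogy: baptisms in Rutland parishes",
    "http://example.org/robinson-kent.htm": "Robinson genealogy in Kent",
    "http://example.org/rutland.htm": "Rutland local history",
}
index = build_index(PAGES)
print(search(index, "Robinson genealogy Rutland"))  # ['http://example.org/robinson.htm']
```

This is why a combination of terms is so much more selective than "genealogy" alone: each extra word can only narrow the set of matching pages.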

Figure 4. Submit-it’s form for submitting details of your pages to Web search engines.

Finally, many individuals with Web pages relating to genealogy are willing to add a link to other genealogy pages.[8]

It would be nice to think that, before long, family history societies will have Web pages providing links to the home pages of members. Although, at the moment, ensuring that those who will be interested in your family history find out about it is a rather hit-and-miss affair, this situation is likely to improve considerably as better search tools become available and as various groups with common interests become more organised on the Web.

Costs

The final issue is cost: if you are using a commercial Internet provider you may need to pay for having Web pages on their system. It is difficult to be sure how things will develop, but it seems likely that, as the cost of disk space continues to fall and the competition between service providers increases, providers will include an increasing amount of "free" storage for Web pages in their basic charge. At present, CIX provide 256k of free disk space for each user, Demon offer 0.5Mb and CompuServe 1Mb of free storage for Web pages. To put these allocations in perspective, my Web pages on the Pevensey Christians, covering the first two generations in detail, including transcripts of wills and other documents, and parish register extracts, currently take up about 400k, of which 300k is a 10-generation family tree of 180 individuals created from a GEDCOM file. One could certainly publish quite a reasonable summary of one’s family history within a megabyte, as long as there weren’t too many images. It does not look, therefore, as if the issue of cost will be a barrier in the long term.
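The comparison can be checked with a little arithmetic. The Python sketch below takes the 1996 figures quoted above and assumes 1Mb = 1024k. It asks which free allocations would hold the 400k of Pevensey pages, and works out the space per individual in the GEDCOM-derived tree. The helper name `providers_with_room` is hypothetical.

```python
# Free Web space quoted above (1996 figures), in kilobytes; 1Mb is taken as 1024k.
FREE_SPACE_K = {"CIX": 256, "Demon": 512, "CompuServe": 1024}

PAGES_K = 400                   # the Pevensey Christians pages in total
TREE_K, TREE_PEOPLE = 300, 180  # the 10-generation tree produced from a GEDCOM file


def providers_with_room(size_k, free_space=FREE_SPACE_K):
    """Return the providers whose free allocation is at least size_k."""
    return [name for name, space_k in free_space.items() if size_k <= space_k]


print(providers_with_room(PAGES_K))  # ['Demon', 'CompuServe']
per_person_k = TREE_K / TREE_PEOPLE
print(f"{per_person_k:.1f}k per individual")  # 1.7k per individual
# At that rate, a 1Mb allocation would hold a tree of roughly this many people
# (ignoring images and any other pages):
print(1024 * TREE_PEOPLE // TREE_K)  # 614
```

The extrapolation is only rough, since the space per person depends on how much detail each entry carries.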

It could even be that genealogical and family history societies will have not only their own Web pages, but their own disk space on a Web server providing free storage for Web versions of their members’ researches, though this may be a few years off.

Outstanding Problems

Of course, every publishing medium has its disadvantages, and it is important to recognise that, for all the advantages, there are certain long-term problems with publishing on the Web, particularly for genealogists.

When you publish your research on paper and it is deposited in a library, you put it in a public place and hand over responsibility for cataloguing it, making it accessible and preserving it to the archivists and librarians. Your Web pages, however, will be stored in your own private area on the disk of your Internet provider. If you change your provider, the location of your Web pages will change, and the old URL will become invalid. Those who have Internet access via the organisations or institutions where they work will have the same problem if they change jobs. This is not a problem encountered in paper publication. The only comfort one can draw is that this problem is not unique to genealogists, and it is an issue that is currently being tackled.[9]

Since it is important for the information we collect to be preserved beyond the lifetime of the individual genealogist, there is another problem: when you die, your book remains in the library, but your Internet account will no doubt be closed down, making your Web pages inaccessible. Of course, there are archives on the Internet, but these are generally of public data. If Web publishing is to be a long-term possibility for genealogists, there needs to be a way of "depositing" one’s genealogical Web pages to preserve them.[10]

The Future

Only two years ago the Web was still a novelty, but because of its ease of use, its ability to display text and images, and the ease with which Web pages can be created, it seems certain to be the main form of publishing on the Internet, at least for the foreseeable future. We can expect many of the minor irritations and limitations to be overcome — searching and indexing, for example, are bound to improve. Also, we can expect an increasing number of tools to be available to the average Internet user for preparing pages with increasingly sophisticated layout.

Along with these technical developments, we can also expect experiment and discussion amongst genealogists on the best ways of presenting the results of our researches. It seems likely, therefore, that the Internet will not only change our access to information but also change how we as individuals publish our family history.


Further Information

There is an enormous amount of information available on the Internet about HTML and creating Web pages, ranging from technical specifications to guides for absolute beginners. A good starting place is Yahoo’s page:
http://www.yahoo.com/computers_and_internet/internet/world_wide_web/
which has links to pages on "HTML", "HTML Editors" and "Page Design & Layout". The HTML page has, in turn, links to "Guides & Tutorials".

Of the large number of books on Web authoring, I would recommend Laura Lemay’s Teach Yourself Web Publishing with HTML in a Week, SAMS 1995. The book has its own Web page.

Many of the books on Web publishing come with a CD-ROM containing tools such as HTML editors.


Notes:
  1. For example, TUCOWS has around 40 Windows 3.1 HTML editors available for downloading.
  2. Available from http://www.microsoft.com/pages/deskapps/word/ia/index.html. [Update March 1998: this no longer seems to be available, but recent versions of the entire Microsoft Office suite have HTML export as a standard option in all applications.]
  3. The ability to save a word-processed document as an HTML file should be standard in all new word-processors before long. [Update March 1998: this is now the case.]
  4. CompuServe users can download this free of charge: GO HPWIZ.
  5. It is an IETF (Internet Engineering Task Force) standard rather than an ISO standard. However, HTML is based on SGML (Standard Generalised Markup Language), ISO 8879, which defines a standard for markup languages.
  6. At the moment, with a few exceptions, HTML tags only indicate structural features of documents, but there is some pressure towards developing a system which can allow for the markup of content, not least as a way of improving automatic indexing and searching.
  7. For example, in the URL of the page shown in Figures 1 and 2, www.gold.ac.uk is the name of the server, /~peter/genealogy/ is the location of the page on that server, and xtian.htm is the name of the file. The http indicates that it is a Web page.
  8. For example, Yvon Cyr's page.
  9. The current use of a Uniform Resource Locator is like giving the library shelf mark instead of the publication details of a book (URLs don't identify the author of a page, for example).
  10. One could of course simply copy them to disk and deposit the disk. At worst one could simply print out the HTML files and deposit the printout like any other paper archive.

© 1996 Society of Genealogists & the author. This page was last revised by Peter Christian 26th March 1998.