Preface
As you walk up Walton Street in Oxford the road bears slightly to the left and a large 19th-century building comes into view. It is not an Oxford college but the headquarters of the Oxford University Press. OUP is the largest university press in the world, and can date its origins back to around 1480. In 1983 I arrived at this building carrying a Texas Instruments Silent 700 terminal. This used thermal printer technology and had two rubber ears on the top into which a telephone handset could be inserted to link the terminal into the public telephone network. A decade earlier I had used the same technology to access the first computer-based search services, developed by the Lockheed Corporation and System Development Corporation.
I was heading up early attempts by Reed Publishing to develop electronically published products and services, notably airline flight timetables. Reed owned International Computaprint Corporation (ICC), based in Fort Washington, PA, which specialised in keyboarding and printing telephone directories. Reed had been working with IBM and the University of Waterloo, Canada on the New Oxford English Dictionary (NOED) project, which was to create a digital version of the Oxford English Dictionary. The proof of concept was to digitise one of the Supplements to the First Edition, starting at the letter S. The digitisation and indexing had now been completed and Hans Nickel, the founder and CEO of ICC, and I were to demonstrate what we had achieved to the NOED project team, led by Tim Benbow and Edmund Weiner. Many of the team of lexicographers were sceptical of the value of the project, and there was a mixture of expectation and disinterest around the table.
The OED seeks not only to provide a definitive definition of a word, but also to record when the word was first used, with examples of subsequent use which may have modified the definition. All these examples were contained on around four million slips of paper. With the terminal we set up a connection (at 300 baud) to the computer in Fort Washington. I can still remember the first question, which came from one of the more sceptical lexicographers, who wanted to know how many words in the OED originated in the Times newspaper. Because all the text had been marked up in Standard Generalized Markup Language (SGML, a forerunner of XML) we could identify the source, and not only provide a count but print out (albeit very slowly) all the examples. There was a short period of silence and then these distinguished scholars suddenly realised the potential of information retrieval. They also recognised that it was not going to put them out of a job but enable them to improve the value of the product. Many more queries were undertaken and the session only came to an end when we ran out of supplies of thermal paper.
The NOED project was an enormous success, not only for the OUP but also for Dr Gaston Gonnet and his team at the University of Waterloo. This team became the nucleus of Open Text Corporation. IBM used the knowledge gained from the project in the development of its search technology, as the OED files provided a rich source of syntax information to help with query development.
For me it was a day of discovery about the power of search to reveal new relationships between items of information. I learned three important lessons from this project. The first of these was the value of metadata structure in searching. Because of the way that the individual elements of the entries had been marked up in SGML it was easy to search for words that had first been used by Charles Dickens after his return from his first visit to the United States in 1842. The second lesson came from listening to the members of the project team from IBM and the University of Waterloo as they talked about the importance of computers being able to understand the structure of sentences, work that would lead to the development of semantic search technologies. The third lesson was in understanding the impact that search could have on organisational processes and outputs.
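The first lesson, the value of marked-up structure in searching, can be illustrated with a small sketch. It is not the NOED system: XML stands in for SGML, the element and attribute names (`entry`, `quote`, `headword`, `source`, `year`) are invented for the illustration, and the sample entries are placeholders rather than real dictionary data. It shows only why the two queries described above (everything quoted from the Times, and words first attested in Dickens after 1842) become simple once each quotation carries its source and date as separate fields.

```python
import xml.etree.ElementTree as ET

# Placeholder data with hypothetical markup; not taken from the OED.
SAMPLE = """
<dictionary>
  <entry headword="word-a">
    <quote source="The Times" year="1850">placeholder 1</quote>
    <quote source="Other" year="1870">placeholder 2</quote>
  </entry>
  <entry headword="word-b">
    <quote source="Dickens" year="1843">placeholder 3</quote>
    <quote source="The Times" year="1860">placeholder 4</quote>
  </entry>
  <entry headword="word-c">
    <quote source="Other" year="1800">placeholder 5</quote>
    <quote source="Dickens" year="1845">placeholder 6</quote>
  </entry>
</dictionary>
""".strip()

ROOT = ET.fromstring(SAMPLE)


def quotations_from(root, source):
    """Return (headword, year, text) for every quotation attributed to source."""
    hits = []
    for entry in root.iter("entry"):
        for quote in entry.iter("quote"):
            if quote.get("source") == source:
                hits.append((entry.get("headword"), int(quote.get("year")), quote.text))
    return hits


def first_used_by(root, source, after_year):
    """Return headwords whose earliest quotation is from source, dated after after_year."""
    words = []
    for entry in root.iter("entry"):
        quotes = list(entry.iter("quote"))
        if not quotes:
            continue
        # The earliest dated quotation stands in for "first recorded use".
        earliest = min(quotes, key=lambda q: int(q.get("year")))
        if earliest.get("source") == source and int(earliest.get("year")) > after_year:
            words.append(entry.get("headword"))
    return words


if __name__ == "__main__":
    times_quotes = quotations_from(ROOT, "The Times")
    print(len(times_quotes), "quotations from The Times")
    for headword, year, text in times_quotes:
        print(headword, year, text)
    print(first_used_by(ROOT, "Dickens", 1842))
```

Without the markup, both questions would require reading every slip; with it, each is a filter on a named field.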