In the 1970s products emerged that are clearly the antecedents of what we would now regard as enterprise search applications. From this point on this history gives significantly less attention to academic research, not because less research was being carried out but because it is well documented in a range of books. In particular, each chapter of Introduction to Information Retrieval by Manning, Raghavan and Schütze (Manning, Raghavan and Schütze 2008) has an annotated bibliography, and the book can be downloaded as a PDF. However, there are three academics who deserve mention. The first of these is Gerard Salton. He developed the SMART software application as a ‘test bed’ at Harvard University and took it with him to Cornell University, where he stayed for the rest of his career. Salton developed the cosine vector space model (VSM) to compare the relevance of a group of search results. The evolution of this model took place over a number of years, and David Dubin has tried to unravel the way in which it developed, providing a good bibliography.
The second is Karen Spärck Jones, who worked in a number of departments at Cambridge University from the time of her PhD in 1964. A profile of her work whilst at Cambridge links to papers describing her research, all of which has had a major impact on information retrieval. Her overview of information retrieval research (Spärck Jones 2006) is essential reading. The third is Stephen Robertson, a research colleague of Karen Spärck Jones, who went on to work at the Microsoft Research Laboratories in Cambridge. His work has extended from the mid-1970s until quite recently, and its scope is indicated by his list of research papers. Robertson is especially noted for his development of the BM25 ranking model, which built on the work of Karen Spärck Jones on the term frequency-inverse document frequency (tf-idf) model.
If you want to choose a date to mark the beginning of commercial enterprise search then 1970 is that date. It marked the launch by IBM of STAIRS (Storage and Information Retrieval System), an evolution of the AQUARIUS software that IBM developed to cope with the documentation for the defence of an anti-trust suit in the USA that started in 1969. STAIRS was specifically designed for multi-user time-sharing applications (the typical enterprise scenario) and remained on the IBM product list until the early 1990s. Jumping out of any sort of chronology, in 1985 STAIRS was subject to a very thorough evaluation which raised doubts about the effectiveness of full-text indexing. A review article by David Blair (Blair 1996) is a must-read for anyone with an interest in enterprise search and evaluation, as it looks back at the 1985 evaluation with the benefit of substantial hindsight. Although Blair was one of the authors of the original evaluation, the article comes across as an independent and unbiased assessment.
In the Conclusions section, Blair states:
“We have shown that the system did not work well in the environment in which it was tested and that there are theoretical reasons why full-text retrieval systems applied to large databases are unlikely to perform well in any retrieval environment.”
By the mid-1970s mini-computers were being adopted very widely, and many organisations and companies saw this as an opportunity to develop text/document retrieval software products for these mini-computers. These included BASIS (Battelle Institute) and INQUIRE (Infodata).
So far this history has been dominated by developments in the USA, but the mini-computer market also stimulated software development in the UK, including ASSASSIN (ICI), STATUS (Atomic Weapons Research Establishment), CAIRS (Leatherhead Food Research Association) and DECO (Unilever). (I had a role on the development team of DECO from 1979 to 1981, which gave me a very valuable insight into the programming of search applications.) These and other applications all emerged towards the end of the 1970s. An interesting comparative review of them by John Ashford (a highly respected consultant) was published in 1984 (Ashford 1984). These applications all evolved from specific organisational requirements and were then productised for wider use, demonstrating that you did not need to be a large academic institution or software company to develop retrieval software. These systems were accessed through networked terminals; the IBM PC was not launched until 1981. The scale of the development of these products can best be assessed from A Technical Index of Interactive Information Systems, published as Technical Note 819 from the National Bureau of Standards in 1974. This report provides brief details of almost 50 software products.
The first Association for Computing Machinery (ACM) conference on information retrieval took place in 1971. The 1st Annual International SIGIR Conference on Information Storage and Retrieval took place in 1978. In 1979 the Institute of Information Scientists organised a two-day conference held at the Royal Society, London, entitled Computer Packages for Information Storage and Retrieval. The event attracted over 200 delegates.
As a footnote to this section on the 1970s it is important to highlight that the first assessment of the potential role of artificial intelligence in information retrieval was published in 1976 (Smith 1976). Just over a decade later Verity, the prototype for all enterprise search applications, emerged from a company specialising in AI development.
Ashford, J. (1984). Information storage and retrieval systems on mainframes and minicomputers: a comparison of text retrieval packages available in the UK. Program: electronic library and information systems, 18(2).
Blair, D. (1996). STAIRS Redux: thoughts on the STAIRS evaluation, ten years after. Journal of the American Society for Information Science, 47(1), 4-22. https://yunus.hacettepe.edu.tr/~tonta/courses/spring2008/bby703/Blair.pdf
Manning, C.D., Raghavan, P. & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Smith, L. (1976). Artificial intelligence in information retrieval systems. Information Processing and Management, 12(3), 189-222.
Spärck Jones, K. (2006). Information retrieval and digital libraries: lessons of research. Proceedings of the International Workshop on Research Issues in Digital Libraries (IWRIDL 2006).