1 1938-1948 Punched cards as the genesis of enterprise searching
The choice of the year 1938 is somewhat arbitrary. From the mid-1930s onwards, particularly in the USA, punched cards were gradually being adopted as a means of sorting collections of information. Punched cards were initially developed by Herman Hollerith to help the US Census Bureau process the 1890 Census, taking as a model the Jacquard loom. This loom had been invented by Joseph Marie Jacquard in 1804 and used punched cards linked together to create complex patterns.
The use of punched cards to manage book and report catalogues became more widespread in the late 1930s, but still on a small scale. Moving into the 1940s, and unbeknown to the library community, punched cards were being used on an industrial scale by the code-breaking teams at Bletchley Park (UK) to manage the analysis of decoded messages in order to create operational intelligence about the movement of enemy military units and personnel. Towards the end of WW2 Bletchley Park was processing two million cards a week. The techniques used to manage these cards remained secret until the 1970s. However, the initial outcome was the availability of very robust card tabulators, which were on show at the 1948 Royal Society Conference without any indication of their origin.
During WW2, the rapid growth in research, particularly in the USA and especially in chemical synthesis, led to a very substantial growth in published research after the war had ended. Chemical Abstracts, the central abstracting publication for the field worldwide, published 33,672 abstracts in 1945; by 1950 the annual figure had reached 59,098, and by 1955, 86,322 (increases of roughly 76% and 46% respectively over the two five-year periods). Much of this growth was in organic chemistry, where the development of infra-red spectroscopy in particular led to important advances in determining the structure of organic compounds and then relating the activity of pharmaceutically active compounds to their chemical structure.
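The five-year growth figures can be checked directly from the abstract counts quoted above. The following is a minimal sketch in Python (the function name `percent_growth` is illustrative and not taken from any source). It separates the percentage increase from the simple ratio of the earlier count to the later one, which is about 0.57 and 0.68 for these two periods and is easily mistaken for a growth rate.

```python
# Annual Chemical Abstracts counts as quoted in the text.
abstracts = {1945: 33_672, 1950: 59_098, 1955: 86_322}


def percent_growth(earlier: int, later: int) -> float:
    """Percentage increase from the earlier count to the later count."""
    return (later - earlier) / earlier * 100


years = sorted(abstracts)

# Growth for each consecutive pair of years: (1945, 1950) and (1950, 1955).
growth = {
    (start, end): percent_growth(abstracts[start], abstracts[end])
    for start, end in zip(years, years[1:])
}

for (start, end), pct in growth.items():
    ratio = abstracts[start] / abstracts[end]
    print(f"{start}-{end}: growth {pct:.1f}%, earlier/later ratio {ratio:.2f}")
```

On these figures the number of abstracts grew by roughly three quarters between 1945 and 1950 and by a little under half between 1950 and 1955.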
The problem that chemists had faced for many years was that it was possible for a given chemical entity to have a number of text descriptions, leading to a significant amount of confusion.
For example, the chemical formula CSCl4 could be described as:
- Perchloromethyl mercaptan
- Thiocarbonyl tetrachloride
- Trichloromethyl sulphur chloride
- Tetrachloromethyl thiol
- Trichloromethyl sulfenyl chloride
To make matters worse there were British, French, German and American naming conventions.
A solution to this problem came from the British chemist George Malcolm Dyson (1902-1978), who developed a linear alphanumeric code that was unique to each structure.
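The value of a single code per structure can be sketched as a lookup that maps every synonym to one key. The Python below is an illustration only. The key is simply the molecular formula quoted above, used as a placeholder for a real notation code. It is not Dyson's notation, and a molecular formula, unlike a structural code, is not unique to one structure. The names `canonical_key` and `PLACEHOLDER_CODE` are hypothetical.

```python
from typing import Optional

# Synonyms for one compound, as listed in the text.
SYNONYMS = [
    "Perchloromethyl mercaptan",
    "Thiocarbonyl tetrachloride",
    "Trichloromethyl sulphur chloride",
    "Tetrachloromethyl thiol",
    "Trichloromethyl sulfenyl chloride",
]

# Hypothetical stand-in for a unique structural code (NOT Dyson notation).
PLACEHOLDER_CODE = "CSCl4"


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


# Every known synonym points at the same key.
INDEX = {normalise(name): PLACEHOLDER_CODE for name in SYNONYMS}


def canonical_key(name: str) -> Optional[str]:
    """Return the shared key for a known synonym, or None if unknown."""
    return INDEX.get(normalise(name))


for name in SYNONYMS:
    print(f"{name} -> {canonical_key(name)}")
```

Once every name resolves to the same key, a search on any one synonym retrieves the records filed under all the others, which is the practical benefit a unique notation was intended to deliver.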
The first announcement of what would become known as the Dyson Notation was a letter by Dyson dated 24 June 1944 and published in Nature on 22 July 1944. In the letter he mentions that he would be publishing a book on the systematic notation that he was developing. He stated the objective as establishing a database (though he did not use this term) of codes, each of which represented the structure of a unique chemical entity. The notation was based around determining and then supplementing the longest carbon chain.
The first public presentation by Dyson of his notation for organic compounds was at a meeting of the Royal Institute of Chemistry in 1946. The Institute was so impressed it circulated a copy of his lecture to its members. The first edition of his book A New Notation and Enumeration System for Organic Compounds was published by Longmans in 1947. Then on 3 February 1948 he gave a lecture to the British Society for International Bibliography that was reprinted in the inaugural issue of Aslib Proceedings (Dyson 1949) along with the discussion which followed his presentation. A second edition of his book was published in 1949. The major change between the editions is a final chapter on the potential of punched cards for managing chemical information.
In the development of his notation Dyson had built up a friendship with James Perry, a highly respected chemist working in the Library at MIT. Both could see the potential to manage chemical information using punched cards. This led to Dyson and Perry meeting with Thomas Watson, the President of IBM, though sources differ on whether this meeting took place in 1948 or 1949. Watson was impressed with their vision and arranged for H.P. (Pete) Luhn to work with them on developing punched card devices for information retrieval.
By now the benefits to major pharmaceutical companies in the USA and the UK of using punched cards to search through collections of reports were becoming very evident, and the processes they used could certainly be described as enterprise searching. It was the combination of these processes with the advent of computers, which could move the selection process from a mechanical tabulator to a digital machine, that formed the basis for the evolution of enterprise search as we see it today.
A full account of the adoption and development of punched card systems (often referred to at the time as ‘mechanical indexing’) and the transition to digital storage and search has been prepared by Robert Williams (Williams 2002), who was at the forefront of this work in the USA and writes from personal experience of the pioneers.
References
Dyson, G.M. (1949). International chemical abstracts and the new notation for organic chemistry. Aslib Proceedings, 1(1), 5-21.
Williams, R.V. (2002). The Use of Punched Cards in US Libraries and Documentation Centers, 1936-1965. IEEE Annals of the History of Computing, 24(2), 16-33. https://ieeexplore.ieee.org/document/1010067