A Note on Reaction Database Searching


The number of new organic reactions, and especially novel reaction applications, published each year is huge; in fact, far too large for the average chemist to manage by memory alone. Traditionally, this avalanche of information has been combated with printed collective series summarizing the primary literature (e.g., Theilheimer's Synthetic Methods of Organic Chemistry, ChemInform, Beilsteins Handbuch der organischen Chemie). While such works may contain extensive text-based indices, they offer limited or no facilities for structure searching.

Reaction database systems, or reaction retrieval systems as they are often called, enable the chemist to access the vast synthetic literature in a structure- and reaction-based manner. Reaction retrieval systems are in effect computerized libraries of reactions. They offer the chemist a selection of reactions actually performed in the laboratory and published in the primary or secondary literature. Works from the secondary literature are generally covered completely (a good example is Theilheimer), but the primary literature is abstracted selectively, mostly by expert synthetic chemists. The reaction databases do not attempt to cover all reactions ever published, since in that case almost any reaction query would return an unmanageably large number of answers; 20 to 40 answers per query is generally considered ideal. Instead, an attempt is made to select representative, as well as special, reactions and reagents. An additional benefit of this restricted coverage of the synthetic literature is that search times stay reasonable for interactive use (at most a couple of minutes).

Early attempts at computer-assisted reaction retrieval in the 1960s offered only text-oriented indexing. Thus, the query options in these systems were limited to author, keyword, compound name, and sometimes molecular formula. The power of contemporary systems stems from developments in the mid-1980s and rests on two techniques: substructure searching and atom-to-atom mapping.

Substructure searching.
A chemist can specify a crucial part of the target molecule (a substructure) to the system, which then retrieves all reactions that contain that same fragment in their products. This search facility offers a huge advantage over traditional literature search methods, because the query is expressed as a structure drawing, the chemist's natural language. Despite this advantage, substructure searching alone still severely limits the types of structural reaction queries that can be constructed.
Atom-to-atom mapping.
The use of atom-to-atom mapping considerably enhances reaction query construction. This technique enables the chemist to specify that a particular substructure in the product is derived from a corresponding substructure in the reactant(s).
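
The product substructure search described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular retrieval system works: the toy molecule representation (element symbols plus a bond table, hydrogens implicit), the function names, and the three-entry database are all assumptions made for the example, and the matcher is a naive backtracking search rather than an optimized algorithm.

```python
# Toy molecules: atoms are element symbols (hydrogens implicit); bonds are
# (i, j, order) triples. All names and data here are illustrative assumptions.

def make_mol(atoms, bonds):
    """Build an (atoms, bond-table) pair from a list of (i, j, order) bonds."""
    return list(atoms), {frozenset((i, j)): order for i, j, order in bonds}


def contains(mol, query):
    """True if `query` occurs as a substructure of `mol` (naive backtracking)."""
    m_atoms, m_bonds = mol
    q_atoms, q_bonds = query

    def fits(k, cand, placed):
        # Query atom k may sit on molecule atom `cand` if the elements agree
        # and every query bond from k to an already placed query atom is
        # present in the molecule with the same bond order.
        if cand in placed or m_atoms[cand] != q_atoms[k]:
            return False
        for j, target in enumerate(placed):
            order = q_bonds.get(frozenset((k, j)))
            if order is not None and m_bonds.get(frozenset((cand, target))) != order:
                return False
        return True

    def extend(placed):
        k = len(placed)  # index of the next query atom to place
        if k == len(q_atoms):
            return True
        return any(
            fits(k, cand, placed) and extend(placed + [cand])
            for cand in range(len(m_atoms))
        )

    return extend([])


def search_products(database, query):
    """Names of all reactions with at least one product containing `query`."""
    return [name for name, products in database
            if any(contains(p, query) for p in products)]


# Query: the ester fragment C(=O)-O-C.
ester = make_mol(["C", "O", "O", "C"], [(0, 1, 2), (0, 2, 1), (2, 3, 1)])

methyl_acetate = make_mol(["C", "C", "O", "O", "C"],
                          [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)])
acetic_acid = make_mol(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
ethanol = make_mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

# Each entry: (reaction name, list of product molecules).
database = [
    ("esterification", [methyl_acetate]),
    ("aldehyde oxidation", [acetic_acid]),
    ("ester reduction", [ethanol]),
]

hits = search_products(database, ester)
print(hits)  # ['esterification']
```

Only the reaction whose product carries the complete ester fragment is returned; acetic acid has the carbonyl and the single-bonded oxygen but no carbon on that oxygen, so it is rejected.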

Figure: Atom-to-atom mapping. Note that the two fragments do not necessarily belong to different molecules.

In this example, the reaction query is a selective ester reduction in the presence of an aldehyde. The use of atom-to-atom mapping in this query (depicted with atom labels) ensures that the primary alcohol carbon derives from the ester carbonyl carbon. Without the mapping, the system would retrieve many false hits, such as an ester hydrolysis in which the ester's R group is primary, so that a primary alcohol is released while the aldehyde does not react.
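
The effect of the mapping on this query can be illustrated with a small Python sketch. It rests on loud simplifications: each reaction is reduced to a table of atom-map numbers, and the role strings ("ester carbonyl C" and so on) are hand-assigned stand-ins for what a real system would find by substructure matching. The record layout and function names are hypothetical.

```python
# Each reaction lists its mapped atoms as {map_number: role}. The same map
# number on both sides denotes the same atom before and after the reaction.
# Role strings are hand-assigned stand-ins for real substructure matches.
reactions = [
    {
        "name": "selective ester reduction",
        "reactant": {1: "ester carbonyl C", 2: "ester alkoxy C", 3: "aldehyde C"},
        # The by-product alcohol (atom 2) is not recorded in this toy entry.
        "product": {1: "primary alcohol C", 3: "aldehyde C"},
    },
    {
        "name": "ester hydrolysis",
        "reactant": {1: "ester carbonyl C", 2: "ester alkoxy C", 3: "aldehyde C"},
        "product": {1: "carboxylic acid C", 2: "primary alcohol C", 3: "aldehyde C"},
    },
]

# Query as (role in reactant, role in product) pairs: the ester carbonyl
# becomes a primary alcohol, and the aldehyde survives unchanged.
query = [("ester carbonyl C", "primary alcohol C"), ("aldehyde C", "aldehyde C")]


def matches_unmapped(rxn, query):
    """Substructure search only: each role merely has to occur on its side."""
    reactant_roles = set(rxn["reactant"].values())
    product_roles = set(rxn["product"].values())
    return all(r in reactant_roles and p in product_roles for r, p in query)


def matches_mapped(rxn, query):
    """With atom-to-atom mapping: both roles must belong to the same atom."""
    return all(
        any(role == r and rxn["product"].get(n) == p
            for n, role in rxn["reactant"].items())
        for r, p in query
    )


unmapped_hits = [rxn["name"] for rxn in reactions if matches_unmapped(rxn, query)]
mapped_hits = [rxn["name"] for rxn in reactions if matches_mapped(rxn, query)]

print(unmapped_hits)  # ['selective ester reduction', 'ester hydrolysis']
print(mapped_hits)    # ['selective ester reduction']
```

In the hydrolysis entry the primary alcohol carbon carries map number 2 (the former alkoxy carbon), not number 1 (the ester carbonyl), so the mapped query rejects it while the unmapped query cannot tell the two reactions apart.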

A recent development is the availability of databases from different vendors. An additional trend is the production of databases devoted to specialized fields such as protective group chemistry, biomolecule-catalyzed reactions, and solid-phase reactions. Such focused coverage considerably enhances the information content of these databases.


Last updated on September 25, 1996.