Chemical patent information systems
- Author
- John M. Barnard and Geoffrey M. Downs
- Subjects
Information retrieval, Computer science, business.industry, Chemical nomenclature, Biochemistry, Data science, Computer Science Applications, Automated data, Computational Mathematics, Software, Fragment (logic), Cheminformatics, Similarity (psychology), Materials Chemistry, Information system, Physical and Theoretical Chemistry, business
- Abstract
The chemical structure information in patents remains difficult to access, partly because it is frequently expressed in the form of Markush structures, which can encompass enormous numbers of individual compounds. Early search systems were based on chemical ‘fragment codes’ that have still not been entirely superseded by the ‘topological’ systems developed during the 1980s. There are a number of databases of specific patented structures, which can be searched using standard substructure search software, and the more recently developed ones use automated data mining techniques to extract chemical nomenclature from patent text and translate it into searchable representations. Although some work has been done on automatic reconstruction of searchable Markush structures from patent text, this has proved to be considerably more refractory. A number of alternative approaches to chemical patent searching are being explored, some involving similarity and nearest-neighbor searching concepts, and some based on both existing curated databases and direct utilization of full-text patents. In-house systems, which facilitate integration with other cheminformatics systems, are also under development. These new systems may allow improvements in retrieval performance, especially with regard to search precision. © 2011 John Wiley & Sons, Ltd. WIREs Comput Mol Sci 2011, 1, 727-741. DOI: 10.1002/wcms.41
- Published
- 2011