For: Southan C, Varkonyi P, Boppana K, Jagarlapudi SA, Muresan S. Tracking 20 years of compound-to-target output from literature and patents. PLoS One 2013;8:e77142. [PMID: 24204758 DOI: 10.1371/journal.pone.0077142] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 07/05/2013] [Accepted: 08/28/2013] [Indexed: 12/19/2022]

Cited by Other Article(s)

Morin L, Weber V, Meijer GI, Yu F, Staar PWJ. PatCID: an open-access dataset of chemical structures in patent documents. Nat Commun 2024;15:6532. [PMID: 39095357 PMCID: PMC11297020 DOI: 10.1038/s41467-024-50779-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 07/19/2024] [Indexed: 08/04/2024]

Southan C. Opening up connectivity between documents, structures and bioactivity. Beilstein J Org Chem 2020;16:596-606. [PMID: 32280387 PMCID: PMC7136548 DOI: 10.3762/bjoc.16.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022]

Abstract

Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC50) is reported for chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been significantly accomplished for patents (e.g., by IBM, SureChEMBL and WIPO) for over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR), resource description framework (RDF) and WikiData. However, we will need to await the technical applicability in respect to DARCP capture to see if this opens up connectivity.
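
As a minimal sketch of the DARCP shorthand described in this abstract, one relationship could be modelled as a small record. The class name, field names and placeholder values below are illustrative assumptions and do not reflect the schema of any database mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DarcpRecord:
    """One document-activity-result-compound-protein link (hypothetical schema)."""
    document: str     # D: identifier of the paper or patent
    activity: str     # A: type of bioactivity measured, e.g. "IC50"
    result: float     # R: the quantitative value reported
    result_unit: str  # unit for R, kept explicit to avoid ambiguity
    compound: str     # C: identifier of the chemical structure
    protein: str      # P: identifier of the protein target

    def dc_only(self):
        """Reduce the record to document-to-compound (D-C-only) connectivity."""
        return (self.document, self.compound)


# Placeholder identifiers, invented purely for illustration.
record = DarcpRecord(
    document="DOC-0001",
    activity="IC50",
    result=12.0,
    result_unit="nM",
    compound="COMPOUND-A",
    protein="TARGET-X",
)
dc_pair = record.dc_only()
```

The `dc_only` helper shows what is lost in D-C-only extraction: the activity, result and target fields are simply dropped, which is why the abstract rates such links as less valuable.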

Yen YC, Kammeyer AM, Jensen KC, Tirlangi J, Ghosh AK, Mesecar AD. Development of an Efficient Enzyme Production and Structure-Based Discovery Platform for BACE1 Inhibitors. Biochemistry 2019;58:4424-4435. [PMID: 31549827 PMCID: PMC7284891 DOI: 10.1021/acs.biochem.9b00714] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/07/2023]

Southan C. Caveat Usor: Assessing Differences between Major Chemistry Databases. ChemMedChem 2018;13:470-481. [PMID: 29451740 PMCID: PMC5900829 DOI: 10.1002/cmdc.201700724] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/20/2017] [Revised: 02/07/2018] [Indexed: 12/24/2022]

Ashenden SK, Kogej T, Engkvist O, Bender A. Innovation in Small-Molecule-Druggable Chemical Space: Where are the Initial Modulators of New Targets Published? J Chem Inf Model 2017;57:2741-2753. [PMID: 29068231 DOI: 10.1021/acs.jcim.7b00295] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]

Prioritizing multiple therapeutic targets in parallel using automated DNA-encoded library screening. Nat Commun 2017;8:16081. [PMID: 28714473 PMCID: PMC5520047 DOI: 10.1038/ncomms16081] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 09/03/2016] [Accepted: 05/24/2017] [Indexed: 12/18/2022]

Schneider N, Lowe DM, Sayle RA, Tarselli MA, Landrum GA. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J Med Chem 2016;59:4385-402. [DOI: 10.1021/acs.jmedchem.6b00153] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Indexed: 12/18/2022]

Reichman M, Simpson PB. Open innovation in early drug discovery: roadmaps and roadblocks. Drug Discov Today 2015;21:779-88. [PMID: 26743597 DOI: 10.1016/j.drudis.2015.12.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/13/2015] [Revised: 11/26/2015] [Accepted: 12/21/2015] [Indexed: 01/16/2023]

Papadatos G, Davies M, Dedman N, Chambers J, Gaulton A, Siddle J, Koks R, Irvine SA, Pettersson J, Goncharoff N, Hersey A, Overington JP. SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res 2015;44:D1220-8. [PMID: 26582922 PMCID: PMC4702887 DOI: 10.1093/nar/gkv1253] [Citation(s) in RCA: 111] [Impact Index Per Article: 12.3] [Received: 09/16/2015] [Accepted: 11/01/2015] [Indexed: 11/13/2022]

Senger S, Bartek L, Papadatos G, Gaulton A. Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents. J Cheminform 2015;7:49. [PMID: 26457120 PMCID: PMC4594083 DOI: 10.1186/s13321-015-0097-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/28/2015] [Accepted: 09/29/2015] [Indexed: 11/28/2022]

Abstract

Background

First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever-increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated by using the latter approach are now available, but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases.

Results

When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51 % were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as starting point, i.e. establishing if for a list of compounds the databases provide the links between chemical structures and patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59 %, respectively, of the compound-patent pairs obtained from Reaxys.
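
The percentages above express how many reference compound-patent pairs an automated source also contains. A minimal sketch of that kind of calculation follows; the function name and the placeholder pairs are assumptions made for illustration and are not the study's data or workflow.

```python
def coverage_percent(reference_pairs, candidate_pairs):
    """Percentage of reference (compound, patent) pairs also present in the candidate source."""
    reference = set(reference_pairs)
    if not reference:
        raise ValueError("reference set is empty")
    # Only pairs present in both sources count as successfully recovered.
    found = reference & set(candidate_pairs)
    return 100.0 * len(found) / len(reference)


# Invented placeholder pairs: four curated links, two of which the automated source recovers.
curated = {
    ("CPD-1", "PAT-A"),
    ("CPD-2", "PAT-A"),
    ("CPD-3", "PAT-B"),
    ("CPD-4", "PAT-C"),
}
automated = {
    ("CPD-1", "PAT-A"),
    ("CPD-3", "PAT-B"),
    ("CPD-9", "PAT-Z"),  # extra pair absent from the reference; it does not affect coverage
}
pct = coverage_percent(curated, automated)
```

Note that this measures recall against the curated reference only; pairs that appear solely in the automated source are ignored, so the figure says nothing about false positives.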

Conclusions

In our comparison of automatically generated vs. manually curated patent chemistry databases, the former successfully provided approximately 60 % of links between chemical structure and patents. It needs to be stressed that only a very limited number of patents and compound-patent pairs were used for our comparison. Nevertheless, our results will hopefully help to manage expectations of users of patent chemistry databases of this type and provide a useful framework for more studies like ours as well as guide future developments of the workflows used for the automated extraction of chemical structures from patents. The challenges we have encountered whilst performing this study highlight that more needs to be done to make such assessments easier. Above all, more adequate, preferably open access to relevant ‘gold standards’ is required.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-015-0097-z) contains supplementary material, which is available to authorized users.

Activity, assay and target data curation and quality in the ChEMBL database. J Comput Aided Mol Des 2015. [PMID: 26201396 PMCID: PMC4607714 DOI: 10.1007/s10822-015-9860-5] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Indexed: 11/05/2022]

Bhardwaj G, Agrawal A, Tyagi R. Combination therapies or standalone interventions? Innovation options for pharmaceutical firms fighting cancer. International Journal of Innovation Management 2015. [DOI: 10.1142/s1363919615400034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]

Caldwell GW. In silico tools used for compound selection during target-based drug discovery and development. Expert Opin Drug Discov 2015;10:901-23. [DOI: 10.1517/17460441.2015.1043885] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]

Expanding opportunities for mining bioactive chemistry from patents. Drug Discovery Today: Technologies 2015;14:3-9. [PMID: 26194581 PMCID: PMC4548146 DOI: 10.1016/j.ddtec.2014.12.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/30/2014] [Revised: 11/12/2014] [Accepted: 12/05/2014] [Indexed: 11/24/2022]

Wild C, Cunningham KA, Zhou J. Allosteric Modulation of G Protein-Coupled Receptors: An Emerging Approach of Drug Discovery. Austin Journal of Pharmacology and Therapeutics 2014;2:1101. [PMID: 27148592 PMCID: PMC4852709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]