1
|
Lardos A, Aghaebrahimian A, Koroleva A, Sidorova J, Wolfram E, Anisimova M, Gil M. Computational Literature-based Discovery for Natural Products Research: Current State and Future Prospects. FRONTIERS IN BIOINFORMATICS 2022; 2:827207. [PMID: 36304281 PMCID: PMC9580913 DOI: 10.3389/fbinf.2022.827207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Accepted: 02/28/2022] [Indexed: 11/21/2022] Open
Abstract
Literature-based discovery (LBD) mines existing literature in order to generate new hypotheses by finding links between previously disconnected pieces of knowledge. Although automated LBD systems are becoming widespread and indispensable in a wide variety of knowledge domains, little has been done to introduce LBD to the field of natural products research. Despite growing knowledge in the natural product domain, most of the accumulated information is found in detached data pools. LBD can facilitate better contextualization and exploitation of this wealth of data, for example by formulating new hypotheses for natural product research, especially in the context of drug discovery and development. Moreover, automated LBD systems promise to accelerate the currently tedious and expensive process of lead identification, optimization, and development. Focusing on natural product research, we briefly reflect the development of automated LBD and summarize its methods and principal data sources. In a thorough review of published use cases of LBD in the biomedical domain, we highlight the immense potential of this data mining approach for natural product research, especially in context with drug discovery or repurposing, mode of action, as well as drug or substance interactions. Most of the 91 natural product-related discoveries in our sample of reported use cases of LBD were addressed at a computer science audience. Therefore, it is the wider goal of this review to introduce automated LBD to researchers who work with natural products and to facilitate the dialogue between this community and the developers of automated LBD systems.
Collapse
Affiliation(s)
- Andreas Lardos
- Natural Product Chemistry and Phytopharmacy Research Group, Institute of Chemistry and Biotechnology, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
- *Correspondence: Andreas Lardos,
| | - Ahmad Aghaebrahimian
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Anna Koroleva
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Julia Sidorova
- Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid, Madrid, Spain
| | - Evelyn Wolfram
- Natural Product Chemistry and Phytopharmacy Research Group, Institute of Chemistry and Biotechnology, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
| | - Maria Anisimova
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Manuel Gil
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Waedenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| |
Collapse
|
2
|
Henry S, McInnes BT. Indirect association and ranking hypotheses for literature based discovery. BMC Bioinformatics 2019; 20:425. [PMID: 31416434 PMCID: PMC6694578 DOI: 10.1186/s12859-019-2989-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2018] [Accepted: 07/09/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Literature Based Discovery (LBD) produces more potential hypotheses than can be manually reviewed, making automatically ranking these hypotheses critical. In this paper, we introduce the indirect association measures of Linking Term Association (LTA), Minimum Weight Association (MWA), and Shared B to C Set Association (SBC), and compare them to Linking Set Association (LSA), concept embeddings vector cosine, Linking Term Count (LTC), and direct co-occurrence vector cosine. Our proposed indirect association measures extend traditional association measures to quantify indirect rather than direct associations while preserving valuable statistical properties. RESULTS We perform a comparison between several different hypothesis ranking methods for LBD, and compare them against our proposed indirect association measures. We intrinsically evaluate each method's performance using its ability to estimate semantic relatedness on standard evaluation datasets. We extrinsically evaluate each method's ability to rank hypotheses in LBD using a time-slicing dataset based on co-occurrence information, and another time-slicing dataset based on SemRep extracted-relationships. Precision and recall curves are generated by ranking term pairs and applying a threshold at each rank. CONCLUSIONS Results differ depending on the evaluation methods and datasets, but it is unclear if this is a result of biases in the evaluation datasets or if one method is truly better than another. We conclude that LTC and SBC are the best suited methods for hypothesis ranking in LBD, but there is value in having a variety of methods to choose from.
Collapse
Affiliation(s)
- Sam Henry
- Department of Computer Science, Virginia Commonwealth University, 601 W. Main St. Rm 435, Richmond, 23284 USA
| | - Bridget T. McInnes
- Department of Computer Science, Virginia Commonwealth University, 601 W. Main St. Rm 435, Richmond, 23284 USA
| |
Collapse
|
3
|
Duran‐Frigola M, Fernández‐Torras A, Bertoni M, Aloy P. Formatting biological big data for modern machine learning in drug discovery. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2018. [DOI: 10.1002/wcms.1408] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Affiliation(s)
- Miquel Duran‐Frigola
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Adrià Fernández‐Torras
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Martino Bertoni
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
| | - Patrick Aloy
- Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona) Barcelona Institute of Science and Technology Barcelona Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) Barcelona Spain
| |
Collapse
|
4
|
Capuzzi SJ, Thornton TE, Liu K, Baker N, Lam WI, O’Banion CP, Muratov EN, Pozefsky D, Tropsha A. Chemotext: A Publicly Available Web Server for Mining Drug-Target-Disease Relationships in PubMed. J Chem Inf Model 2018; 58:212-218. [PMID: 29300482 PMCID: PMC6063520 DOI: 10.1021/acs.jcim.7b00589] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Elucidation of the mechanistic relationships between drugs, their targets, and diseases is at the core of modern drug discovery research. Thousands of studies relevant to the drug-target-disease (DTD) triangle have been published and annotated in the Medline/PubMed database. Mining this database affords rapid identification of all published studies that confirm connections between vertices of this triangle or enable new inferences of such connections. To this end, we describe the development of Chemotext, a publicly available Web server that mines the entire compendium of published literature in PubMed annotated by Medline Subject Heading (MeSH) terms. The goal of Chemotext is to identify all known DTD relationships and infer missing links between vertices of the DTD triangle. As a proof-of-concept, we show that Chemotext could be instrumental in generating new drug repurposing hypotheses or annotating clinical outcomes pathways for known drugs. The Chemotext Web server is freely available at http://chemotext.mml.unc.edu .
Collapse
Affiliation(s)
- Stephen J. Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Thomas E. Thornton
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kammy Liu
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Nancy Baker
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Wai In Lam
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Colin P. O’Banion
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Eugene N. Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Chemical Technology, Odessa National Polytechnic University, Odessa, 65000, Ukraine
| | - Diane Pozefsky
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| |
Collapse
|
5
|
Henry S, McInnes BT. Literature Based Discovery: Models, methods, and trends. J Biomed Inform 2017; 74:20-32. [PMID: 28838802 DOI: 10.1016/j.jbi.2017.08.011] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2017] [Revised: 07/21/2017] [Accepted: 08/20/2017] [Indexed: 01/25/2023]
Abstract
OBJECTIVES This paper provides an introduction and overview of literature based discovery (LBD) in the biomedical domain. It introduces the reader to modern and historical LBD models, key system components, evaluation methodologies, and current trends. After completion, the reader will be familiar with the challenges and methodologies of LBD. The reader will be capable of distinguishing between recent LBD systems and publications, and be capable of designing an LBD system for a specific application. TARGET AUDIENCE From biomedical researchers curious about LBD, to someone looking to design an LBD system, to an LBD expert trying to catch up on trends in the field. The reader need not be familiar with LBD, but knowledge of biomedical text processing tools is helpful. SCOPE This paper describes a unifying framework for LBD systems. Within this framework, different models and methods are presented to both distinguish and show overlap between systems. Topics include term and document representation, system components, and an overview of models including co-occurrence models, semantic models, and distributional models. Other topics include uninformative term filtering, term ranking, results display, system evaluation, an overview of the application areas of drug development, drug repurposing, and adverse drug event prediction, and challenges and future directions. A timeline showing contributions to LBD, and a table summarizing the works of several authors is provided. Topics are presented from a high level perspective. References are given if more detailed analysis is required.
Collapse
Affiliation(s)
- Sam Henry
- Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4222, Richmond, VA 23284, USA.
| | - Bridget T McInnes
- Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4222, Richmond, VA 23284, USA
| |
Collapse
|