1
Alharbey R, Kim JI, Daud A, Song M, Alshdadi AA, Hayat MK. Indexing important drugs from medical literature. Scientometrics 2022. [DOI: 10.1007/s11192-022-04340-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
2
Sun G, Ahn-Horst TA, Covert MW. The E. coli Whole-Cell Modeling Project. EcoSal Plus 2021; 9:eESP00012020. [PMID: 34242084 PMCID: PMC11163835 DOI: 10.1128/ecosalplus.esp-0001-2020] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/16/2020] [Accepted: 05/26/2021] [Indexed: 12/22/2022]
Abstract
The Escherichia coli whole-cell modeling project seeks to create the most detailed computational model of an E. coli cell in order to better understand and predict the behavior of this model organism. Details about the approach, framework, and current version of the model are discussed. Currently, the model includes the functions of 43% of characterized genes, with ongoing efforts to include additional data and mechanisms. As additional information is incorporated in the model, its utility and predictive power will continue to increase, which means that discovery efforts can be accelerated by community involvement in the generation and inclusion of data. This project will be an invaluable resource to the E. coli community that could be used to verify expected physiological behavior, to predict new outcomes and testable hypotheses for more efficient experimental design iterations, and to evaluate heterogeneous data sets in the context of each other through deep curation.
Affiliation(s)
- Gwanggyu Sun
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Travis A. Ahn-Horst
  - Department of Bioengineering, Stanford University, Stanford, California, USA
- Markus W. Covert
  - Department of Bioengineering, Stanford University, Stanford, California, USA
3
Yehya A, Altaany Z. A Decade of Pharmacogenetic Studies in Jordan: A Systemic Review. THE PHARMACOGENOMICS JOURNAL 2021; 21:543-550. [PMID: 33850297 DOI: 10.1038/s41397-021-00236-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2020] [Revised: 02/25/2021] [Accepted: 03/23/2021] [Indexed: 02/02/2023]
Abstract
The aim of this study was to perform a systematic overview of the pharmacogenetic studies conducted in Jordan. A structured search of Medline was conducted for articles over the last decade (January 2010-July 2020). Studies were classified by design, sample size, drug-gene combination, and the significance of the results. Thirty-two studies met the criteria for review. Most pharmacogenomic studies had a case-only design (n = 23). Only five studies included >500 participants. The total number of genetic variants in all studies was one hundred fifteen, which were found in forty genes, including dynamic (n = 27), and kinetic (n = 9) genes. The most commonly studied drugs were within the hematology and cardiology therapeutic areas and included statins, warfarin, aspirin, and clopidogrel. Most studies (n = 18) reported results with mixed p values [<0.05 and >0.05]. Pharmacogenomic research in Jordan is still in its infancy and is limited mainly to replication attempts. The need for standardization is imperative, especially in developing countries with scarce funding resources.
Affiliation(s)
- Alaa Yehya
  - PhD. Pharmacology - Department of Clinical Pharmacy and Pharmacy Practice, Faculty of Pharmacy, Yarmouk University, Irbid, Jordan.
- Zaid Altaany
  - PhD. Biotechnology - Department of Basic Medical Sciences, Faculty of Medicine, Yarmouk University, Irbid, Jordan
4
Gong L, Whirl-Carrillo M, Klein TE. PharmGKB, an Integrated Resource of Pharmacogenomic Knowledge. Curr Protoc 2021; 1:e226. [PMID: 34387941 DOI: 10.1002/cpz1.226] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 12/31/2022]
Abstract
The Pharmacogenomics Knowledgebase (PharmGKB) is an integrated online knowledge resource for the understanding of how genetic variation contributes to variation in drug response. Our focus includes not only pharmacogenomic information useful for clinical implementation (e.g., drug dosing guidelines and annotated drug labels), but also information to catalyze scientific research and drug discovery (e.g., variant-drug annotations and drug-centered pathways). As of April 2021, the annotated content of PharmGKB spans 715 drugs, 1761 genes, 227 diseases, 165 clinical guidelines, and 784 drug labels. We have manually curated data from more than 9000 published papers to generate the content of PharmGKB. Recently, we have also implemented an automated natural language processing (NLP) tool to broaden our coverage of the pharmacogenomic literature. This article contains a basic protocol describing how to navigate the PharmGKB website to retrieve information on how genes and genetic variations affect drug efficacy and toxicity. It also includes a protocol on how to use PharmGKB to facilitate interpretation of findings for a pharmacogenomic variant genotype or metabolizer phenotype. PharmGKB is freely available at http://www.pharmgkb.org. © 2021 Wiley Periodicals LLC. Basic Protocol 1: Navigating the homepage of PharmGKB and searching by drug Basic Protocol 2: Using PharmGKB to facilitate interpretation of pharmacogenomic variant genotypes or metabolizer phenotypes.
Affiliation(s)
- Li Gong
  - Departments of Biomedical Data Science and Medicine (BMIR), Stanford University, Stanford, California
- Michelle Whirl-Carrillo
  - Departments of Biomedical Data Science and Medicine (BMIR), Stanford University, Stanford, California
- Teri E Klein
  - Departments of Biomedical Data Science and Medicine (BMIR), Stanford University, Stanford, California
5
Guin D, Rani J, Singh P, Grover S, Bora S, Talwar P, Karthikeyan M, Satyamoorthy K, Adithan C, Ramachandran S, Saso L, Hasija Y, Kukreti R. Global Text Mining and Development of Pharmacogenomic Knowledge Resource for Precision Medicine. Front Pharmacol 2019; 10:839. [PMID: 31447668 PMCID: PMC6692532 DOI: 10.3389/fphar.2019.00839] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/07/2019] [Accepted: 07/01/2019] [Indexed: 11/20/2022]
Abstract
Understanding patients' genomic variations and their effect in protecting or predisposing them to drug response phenotypes is important for providing personalized healthcare. Several studies have manually curated such genotype-phenotype relationships into organized databases from clinical trial data or published literature. However, there are no text mining tools available to extract high-accuracy information from such existing knowledge. In this work, we used a semiautomated text mining approach to retrieve a complete pharmacogenomic (PGx) resource integrating disease-drug-gene-polymorphism relationships to derive a global perspective for ease in therapeutic approaches. We used an R package, pubmed.mineR, to automatically retrieve PGx-related literature. We identified 1,753 disease types, and 666 drugs, associated with 4,132 genes and 33,942 polymorphisms collated from 180,088 publications. With further manual curation, we obtained a total of 2,304 PGx relationships. We evaluated our approach by performance (precision = 0.806) with benchmark datasets like Pharmacogenomic Knowledgebase (PharmGKB) (0.904), Online Mendelian Inheritance in Man (OMIM) (0.600), and The Comparative Toxicogenomics Database (CTD) (0.729). We validated our study by comparing our results with 362 commercially used US Food and Drug Administration (FDA)-approved drug labeling biomarkers. Of the 2,304 PGx relationships identified, 127 belonged to the FDA list of 362 approved pharmacogenomic markers, indicating that our semiautomated text mining approach may reveal significant PGx information with markers for drug response prediction. In addition, it is a scalable and state-of-the-art approach in curation for PGx clinical utility.
Affiliation(s)
- Debleena Guin
  - Genomics and Molecular Medicine Unit, Council of Scientific and Industrial Research (CSIR)-Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
  - Department of Biotechnology, Delhi Technological University, Delhi, India
- Jyoti Rani
  - Department of Biomedical Sciences, Acharya Narayan Dev College, University of Delhi, New Delhi, India
  - G N Ramachandran Knowledge Centre, Council of Scientific and Industrial Research (CSIR)-Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
- Priyanka Singh
  - Genomics and Molecular Medicine Unit, Council of Scientific and Industrial Research (CSIR)-Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, India
- Sandeep Grover
  - Institute of Medical Biometry and Statistics, University of Lübeck University Medical Center Schleswig-Holstein - Campus Lübeck, Lübeck, Germany
- Shivangi Bora
  - Genomics and Molecular Medicine Unit, Council of Scientific and Industrial Research (CSIR)-Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
  - Department of Biotechnology, Delhi Technological University, Delhi, India
- Puneet Talwar
  - Institute of Human Behaviour and Allied Sciences, Delhi, India
- K Satyamoorthy
  - School of Life Sciences, Manipal University, Manipal, India
- C Adithan
  - Central Inter-Disciplinary Research Facility (CIDRF), Pondicherry, India
- S Ramachandran
  - G N Ramachandran Knowledge Centre, Council of Scientific and Industrial Research (CSIR)-Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, India
- Luciano Saso
  - Department of Physiology and Pharmacology “Vittorio Erspamer,” Sapienza University of Rome, Rome, Italy
- Yasha Hasija
  - Department of Biotechnology, Delhi Technological University, Delhi, India
- Ritushree Kukreti
  - Genomics and Molecular Medicine Unit, Council of Scientific and Industrial Research (CSIR)-Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
  - Academy of Scientific & Innovative Research (AcSIR), New Delhi, India
6
Monnin P, Legrand J, Husson G, Ringot P, Tchechmedjiev A, Jonquet C, Napoli A, Coulet A. PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics 2019; 20:139. [PMID: 30999867 PMCID: PMC6471679 DOI: 10.1186/s12859-019-2693-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/01/2022]
Abstract
Background Pharmacogenomics (PGx) studies how genomic variations impact variations in drug response phenotypes. Knowledge in pharmacogenomics is typically composed of units that have the form of ternary relationships gene variant – drug – adverse event. Such a relationship states that an adverse event may occur for patients having the specified gene variant and being exposed to the specified drug. State-of-the-art knowledge in PGx is mainly available in reference databases such as PharmGKB and reported in scientific biomedical literature. But, PGx knowledge can also be discovered from clinical data, such as Electronic Health Records (EHRs), and in this case, may either correspond to new knowledge or confirm state-of-the-art knowledge that lacks “clinical counterpart” or validation. For this reason, there is a need for automatic comparison of knowledge units from distinct sources. Results In this article, we propose an approach, based on Semantic Web technologies, to represent and compare PGx knowledge units. To this end, we developed PGxO, a simple ontology that represents PGx knowledge units and their components. Combined with PROV-O, an ontology developed by the W3C to represent provenance information, PGxO enables encoding and associating provenance information to PGx relationships. Additionally, we introduce a set of rules to reconcile PGx knowledge, i.e. to identify when two relationships, potentially expressed using different vocabularies and levels of granularity, refer to the same, or to different knowledge units. We evaluated our ontology and rules by populating PGxO with knowledge units extracted from PharmGKB (2701), the literature (65,720) and from discoveries reported in EHR analysis studies (only 10, manually extracted); and by testing their similarity. We called PGxLOD (PGx Linked Open Data) the resulting knowledge base that represents and reconciles knowledge units of those various origins. 
Conclusions The proposed ontology and reconciliation rules constitute a first step toward a more complete framework for knowledge comparison in PGx. In this direction, the experimental instantiation of PGxO, named PGxLOD, illustrates the ability and difficulties of reconciling various existing knowledge sources. Electronic supplementary material The online version of this article (10.1186/s12859-019-2693-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pierre Monnin
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
- Joël Legrand
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
- Graziella Husson
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
- Patrice Ringot
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
- Clément Jonquet
  - LIRMM, Université de Montpellier, CNRS, Montpellier, 34095, France
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, 94305, California, USA
- Amedeo Napoli
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
- Adrien Coulet
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, 94305, California, USA
7
Fujiwara T, Yamamoto Y, Kim JD, Buske O, Takagi T. PubCaseFinder: A Case-Report-Based, Phenotype-Driven Differential-Diagnosis System for Rare Diseases. Am J Hum Genet 2018; 103:389-399. [PMID: 30173820 DOI: 10.1016/j.ajhg.2018.08.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/12/2018] [Accepted: 08/01/2018] [Indexed: 01/29/2023]
Abstract
Recently, to speed up the differential-diagnosis process based on symptoms and signs observed from an affected individual in the diagnosis of rare diseases, researchers have developed and implemented phenotype-driven differential-diagnosis systems. The performance of those systems relies on the quantity and quality of underlying databases of disease-phenotype associations (DPAs). Although such databases are often developed by manual curation, they inherently suffer from limited coverage. To address this problem, we propose a text-mining approach to increase the coverage of DPA databases and consequently improve the performance of differential-diagnosis systems. Our analysis showed that a text-mining approach using one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system in a freely available web application. By utilizing automatically extracted DPAs from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports by using phenotype-based comparisons and confirm the results with detailed contextual information.
8
Vasilevich A, de Boer J. Robot-scientists will lead tomorrow's biomaterials discovery. CURRENT OPINION IN BIOMEDICAL ENGINEERING 2018. [DOI: 10.1016/j.cobme.2018.03.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/31/2023]
9
Chen L, Friedman C, Finkelstein J. Automated Metabolic Phenotyping of Cytochrome Polymorphisms Using PubMed Abstract Mining. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2017:535-544. [PMID: 29854118 PMCID: PMC5977704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Pharmacogenetics-related publications, which are increasing rapidly, provide important new pharmacogenetics knowledge. Automated approaches to extract information of new alleles and to identify their impact on metabolic phenotypes from publications are urgently needed to facilitate personalized medicine and improve clinical outcomes. Cytochrome polymorphisms, responsible for a wide variation of drug pharmacodynamics, individual efficacy and adverse effects, have significant potential for optimizing drug therapy. A few studies have addressed specialized efforts to automatically extract cytochrome polymorphisms and their characterizations regarding metabolic phenotypes from the literature. In this paper, we present a novel rule-based text-mining system to extract metabolic phenotypes of polymorphisms from PubMed abstracts with a focus on cytochrome P450. This system is promising as it achieved a precision of 85.71% in a preliminary proof-of-concept evaluation and is expected to automatically provide up-to-date metabolic information for cytochrome polymorphisms, which is critical to advance personalized medicine and improve clinical care.
Affiliation(s)
- Luoxin Chen
  - Department of Biomedical Informatics, Columbia University, New York, NY, US
- Carol Friedman
  - Department of Biomedical Informatics, Columbia University, New York, NY, US
- Joseph Finkelstein
  - Department of Biomedical Informatics, Columbia University, New York, NY, US
10
Sharma V, Law W, Balick MJ, Sarkar IN. Harnessing Biomedical Natural Language Processing Tools to Identify Medicinal Plant Knowledge from Historical Texts. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2017:1537-1546. [PMID: 29854223 PMCID: PMC5977595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches for extracting relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that there is potential utility of informatics methods to identify medicinal plant knowledge from digitized resources as well as highlight opportunities for improvement.
Affiliation(s)
- Wayne Law
  - New York Botanical Garden, Bronx, NY
  - College of Arts and Sciences, Lynn University, Boca Raton, FL
11
Mahmood ASMA, Rao S, McGarvey P, Wu C, Madhavan S, Vijay-Shanker K. eGARD: Extracting associations between genomic anomalies and drug responses from text. PLoS One 2017; 12:e0189663. [PMID: 29261751 PMCID: PMC5738129 DOI: 10.1371/journal.pone.0189663] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/26/2017] [Accepted: 11/29/2017] [Indexed: 12/25/2022]
Abstract
Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially those that describe therapeutic implications of biomarkers and therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for 'best-fit' therapies and readily generate hypotheses for new clinical trials.
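The three figures reported above are consistent with the usual definition of the F-measure as the harmonic mean of precision and recall. The following is a minimal Python sketch of that arithmetic, not the eGARD evaluation code. It assumes the balanced (beta = 1) variant and assumes the three "up to" values come from the same evaluation run, which the abstract does not state. The function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean of P and R."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


# Assumption: 0.95 and 0.86 are taken from the abstract as a matched pair.
score = f_measure(0.95, 0.86)
print(f"{score:.2f}")  # prints 0.90
```

Under these assumptions the computed value rounds to the 0.90 quoted in the abstract.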
Affiliation(s)
- A. S. M. Ashique Mahmood
  - Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America
- Shruti Rao
  - Innovation Center For Biomedical Informatics, Georgetown University, Washington D.C, United States of America
- Peter McGarvey
  - Innovation Center For Biomedical Informatics, Georgetown University, Washington D.C, United States of America
  - Protein Information Resource, Georgetown University Medical Center, Washington D.C, United States of America
- Cathy Wu
  - Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America
  - Protein Information Resource, Georgetown University Medical Center, Washington D.C, United States of America
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Subha Madhavan
  - Innovation Center For Biomedical Informatics, Georgetown University, Washington D.C, United States of America
  - Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington D.C, United States of America
- K. Vijay-Shanker
  - Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America
12
Zwierzyna M, Overington JP. Classification and analysis of a large collection of in vivo bioassay descriptions. PLoS Comput Biol 2017; 13:e1005641. [PMID: 28678787 PMCID: PMC5517062 DOI: 10.1371/journal.pcbi.1005641] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/27/2017] [Revised: 07/19/2017] [Accepted: 06/21/2017] [Indexed: 12/17/2022]
Abstract
Testing potential drug treatments in animal disease models is a decisive step of all preclinical drug discovery programs. Yet, despite the importance of such experiments for translational medicine, there have been relatively few efforts to comprehensively and consistently analyze the data produced by in vivo bioassays. This is partly due to their complexity and lack of accepted reporting standards: publicly available animal screening data are only accessible in unstructured free-text format, which hinders computational analysis. In this study, we use text mining to extract information from the descriptions of over 100,000 drug screening-related assays in rats and mice. We retrieve our dataset from ChEMBL, an open-source literature-based database focused on preclinical drug discovery. We show that in vivo assay descriptions can be effectively mined for relevant information, including experimental factors that might influence the outcome and reproducibility of animal research: genetic strains, experimental treatments, and phenotypic readouts used in the experiments. We further systematize extracted information using an unsupervised language model (Word2Vec), which learns semantic similarities between terms and phrases, allowing identification of related animal models and classification of entire assay descriptions. In addition, we show that random forest models trained on features generated by Word2Vec can predict the class of drugs tested in different in vivo assays with high accuracy. Finally, we combine information mined from text with curated annotations stored in ChEMBL to investigate the patterns of usage of different animal models across a range of experiments, drug classes, and disease areas.
Affiliation(s)
- Magdalena Zwierzyna
  - BenevolentAI, London, United Kingdom
  - Institute of Cardiovascular Science, University College London, London, United Kingdom
- John P. Overington
  - BenevolentAI, London, United Kingdom
  - Institute of Cardiovascular Science, University College London, London, United Kingdom
13
Dalleau K, Marzougui Y, Da Silva S, Ringot P, Ndiaye NC, Coulet A. Learning from biomedical linked data to suggest valid pharmacogenes. J Biomed Semantics 2017; 8:16. [PMID: 28427468 PMCID: PMC5399403 DOI: 10.1186/s13326-017-0125-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/09/2016] [Accepted: 03/29/2017] [Indexed: 12/15/2022]
Abstract
Background A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tend to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, only molecular networks or the biomedical literature were used, whereas many other resources are available. Method We propose here to consume a diverse and larger set of resources using linked data related either to genes, drugs or diseases. One of the advantages of linked data is that they are built on a standard framework that facilitates the joint use of various sources, and thus facilitates considering features of various origins. We propose a selection and linkage of data sources relevant to pharmacogenomics, including for example DisGeNET and Clinvar. We use machine learning to identify and prioritize the pharmacogenes that are most probably valid, considering the selected linked data. This identification relies on the classification of gene–drug pairs as either pharmacogenomically associated or not and was tested with two machine learning methods (random forest and graph kernel), whose results are compared in this article. Results We assembled a set of linked data relative to pharmacogenomics, of 2,610,793 triples, coming from six distinct resources. Learning from these data, random forest enables identifying valid pharmacogenes with an F-measure of 0.73 on 10-fold cross-validation, whereas graph kernel achieves an F-measure of 0.81. A list of top candidates proposed by both approaches is provided, and how they were obtained is discussed. Electronic supplementary material The online version of this article (doi:10.1186/s13326-017-0125-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kevin Dalleau
  - LORIA (CNRS, Inria Nancy-Grand Est, University of Lorraine), Campus Scientifique, Nancy, France
- Yassine Marzougui
  - LORIA (CNRS, Inria Nancy-Grand Est, University of Lorraine), Campus Scientifique, Nancy, France
  - Ecole nationale supérieure des mines de Nancy, Campus Artem, Nancy, France
- Sébastien Da Silva
  - LORIA (CNRS, Inria Nancy-Grand Est, University of Lorraine), Campus Scientifique, Nancy, France
- Patrice Ringot
  - LORIA (CNRS, Inria Nancy-Grand Est, University of Lorraine), Campus Scientifique, Nancy, France
- Ndeye Coumba Ndiaye
  - UMR U1122 IGE-PCV (INSERM, University of Lorraine), 30 Rue Lionnois, Nancy, France
- Adrien Coulet
  - LORIA (CNRS, Inria Nancy-Grand Est, University of Lorraine), Campus Scientifique, Nancy, France
14
Huang Y, Wang L, Zan ALS. ARN: analysis and prediction by adipogenic professional database. BMC SYSTEMS BIOLOGY 2016; 10:57. [PMID: 27503118 PMCID: PMC4977645 DOI: 10.1186/s12918-016-0321-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/09/2016] [Accepted: 07/14/2016] [Indexed: 12/23/2022]
Abstract
Adipogenesis is the process of cell differentiation by which mesenchymal stem cells become adipocytes. Extensive research is ongoing to identify genes, their protein products, and microRNAs that correlate with fat cell development. The existing databases have focused on certain types of regulatory factors and interactions. However, there is no relationship between the results of the experimental studies on adipogenesis and these databases because of the lack of an information center. This information fragmentation hampers the identification of key regulatory genes and pathways. Thus, it is necessary to provide an information center that is quickly and easily accessible to researchers in this field. We selected and integrated data from eight external databases based on the results of text-mining, and constructed a publicly available database and web interface (URL: http://210.27.80.93/arn/ ), which contained 30873 records related to adipogenic differentiation. Then, we designed an online analysis tool to analyze the experimental data or form a scientific hypothesis about adipogenesis through Swanson's literature-based discovery process. Furthermore, we calculated the "Impact Factor" ("IF") value that reflects the importance of each node by counting the numbers of relation records, expression records, and prediction records for each node. This platform can support ongoing adipogenesis research and contribute to the discovery of key regulatory genes and pathways.
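The node-importance value described above is obtained by counting three kinds of records per node. The following is a minimal Python sketch of that counting step, not the ARN implementation. It assumes an unweighted sum of the three counts (the abstract does not say how they are combined) and a hypothetical record layout of `(record_type, node)` pairs. The function name `node_scores` and the toy records are made up for illustration.

```python
from collections import Counter

# The three record kinds named in the abstract.
RECORD_TYPES = ("relation", "expression", "prediction")


def node_scores(records):
    """Count records of each kind per node and sum the counts (assumed unweighted)."""
    per_type = {record_type: Counter() for record_type in RECORD_TYPES}
    for record_type, node in records:
        if record_type not in per_type:
            raise ValueError(f"unknown record type: {record_type!r}")
        per_type[record_type][node] += 1
    # Collect every node that appears in at least one counter.
    nodes = set().union(*per_type.values())
    return {node: sum(per_type[t][node] for t in RECORD_TYPES) for node in nodes}


# Toy, made-up records; real ARN records are not reproduced here.
toy_records = [
    ("relation", "PPARG"),
    ("relation", "CEBPA"),
    ("expression", "PPARG"),
    ("prediction", "PPARG"),
    ("expression", "CEBPA"),
]
scores = node_scores(toy_records)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Keeping one counter per record kind makes it easy to swap the plain sum for a weighted combination if the actual scoring differs from this assumption.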
Affiliation(s)
- Yan Huang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Li Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Lin-Sen Zan
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China
15
|
Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery. Brief Bioinform 2015; 17:33-42. [PMID: 26420781 PMCID: PMC4719073 DOI: 10.1093/bib/bbv087] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Received: 02/17/2015] [Indexed: 02/06/2023] Open
Abstract
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine.
|
16
|
A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set. Computational and Mathematical Methods in Medicine 2015; 2015:910423. [PMID: 26347797 PMCID: PMC4546954 DOI: 10.1155/2015/910423] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/05/2015] [Revised: 06/17/2015] [Accepted: 06/29/2015] [Indexed: 11/27/2022]
Abstract
Information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, biomedical data extraction is harder to manage manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area of biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has shifted to hybrid approaches, which show better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research lies in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. In conclusion, our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.
|
17
|
Ravikumar KE, Wagholikar KB, Li D, Kocher JP, Liu H. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature. BMC Bioinformatics 2015; 16:185. [PMID: 26047637 PMCID: PMC4457984 DOI: 10.1186/s12859-015-0609-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 07/08/2014] [Accepted: 04/30/2015] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction, with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. RESULTS We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation-disease associations in curated database records. The discourse-level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. CONCLUSIONS Our quantitative analysis reveals that MutD can effectively extract protein mutation-disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating discourse-level analysis significantly improved the performance of extracting the protein-mutation-disease association. Future work includes the extension of MutD to full-text articles.
Affiliation(s)
- Komandur Elayavilli Ravikumar
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First St SW, Harvick 3rd, Rochester, MN, 55905, USA.
- Kavishwar B Wagholikar
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First St SW, Harvick 3rd, Rochester, MN, 55905, USA.
- Dingcheng Li
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First St SW, Harvick 3rd, Rochester, MN, 55905, USA.
- Jean-Pierre Kocher
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First St SW, Harvick 3rd, Rochester, MN, 55905, USA.
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First St SW, Harvick 3rd, Rochester, MN, 55905, USA.
|
18
|
Segura-Bedmar I, Martínez P, Herrero-Zazo M. Lessons learnt from the DDIExtraction-2013 Shared Task. J Biomed Inform 2014; 51:152-64. [PMID: 24858490 DOI: 10.1016/j.jbi.2014.05.007] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 02/17/2014] [Revised: 05/05/2014] [Accepted: 05/08/2014] [Indexed: 10/25/2022]
Abstract
The DDIExtraction Shared Task 2013 is the second edition of the DDIExtraction Shared Task series, a community-wide effort to promote the implementation and comparative assessment of natural language processing (NLP) techniques in the pharmacovigilance domain, in particular, to address the extraction of drug-drug interactions (DDI) from biomedical texts. This edition has been the first attempt to compare the performance of Information Extraction (IE) techniques specific to each of the basic steps of the DDI extraction pipeline. To attain this aim, two main tasks were proposed: the recognition and classification of pharmacological substances and the detection and classification of drug-drug interactions. DDIExtraction 2013 was held from January to June 2013 and attracted wide attention with a total of 14 teams (6 of the teams participated in the drug name recognition task, while 8 participated in the DDI extraction task) from 7 different countries. For the task of the recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while, for the detection and classification of DDIs, the best result was an F1 of 65.1%. The results show advances in the state of the art and demonstrate that significant challenges remain to be resolved. This paper focuses on the second task (extraction of DDIs) and examines its main challenges, which have yet to be resolved.
Affiliation(s)
- Isabel Segura-Bedmar
- Dpto. de Informática, Universidad Carlos III de Madrid, Leganés 28911, Madrid, Spain.
- Paloma Martínez
- Dpto. de Informática, Universidad Carlos III de Madrid, Leganés 28911, Madrid, Spain.
- María Herrero-Zazo
- Dpto. de Informática, Universidad Carlos III de Madrid, Leganés 28911, Madrid, Spain.
|
19
|
Jones DE, Igo S, Hurdle J, Facelli JC. Automatic extraction of nanoparticle properties using natural language processing: NanoSifter an application to acquire PAMAM dendrimer properties. PLoS One 2014; 9:e83932. [PMID: 24392101 PMCID: PMC3879259 DOI: 10.1371/journal.pone.0083932] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/18/2013] [Accepted: 11/11/2013] [Indexed: 11/19/2022] Open
Abstract
In this study, we demonstrate the use of natural language processing methods to extract, from nanomedicine literature, numeric values of biomedical property terms of poly(amidoamine) dendrimers. We have developed a method for extracting these values for properties taken from the NanoParticle Ontology, using the General Architecture for Text Engineering and a Nearly-New Information Extraction System. We also created a method for associating the identified numeric values with their corresponding dendrimer properties, called NanoSifter. We demonstrate that our system can correctly extract numeric values of dendrimer properties reported in the cancer treatment literature with high recall, precision, and f-measure. The micro-averaged recall was 0.99, precision was 0.84, and f-measure was 0.91. Similarly, the macro-averaged recall was 0.99, precision was 0.87, and f-measure was 0.92. To our knowledge, these results are the first application of text mining to extract and associate dendrimer property terms and their corresponding numeric values.
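The abstract reports both micro-averaged and macro-averaged scores. As a hedged illustration of the difference, the sketch below computes both from per-class counts of true positives, false positives, and false negatives. The property names and counts are invented toy data, not NanoSifter results, and the helper names (`prf`, `micro_average`, `macro_average`) are hypothetical. Macro-averaged F is computed here as the mean of per-class F values, which is one common definition; the abstract does not say which definition the authors used.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def micro_average(counts):
    """Pool the counts over all classes, then score once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)

def macro_average(counts):
    """Score every class separately, then take the unweighted mean."""
    per_class = [prf(*c) for c in counts.values()]
    return tuple(sum(values) / len(per_class) for values in zip(*per_class))

# Toy (tp, fp, fn) counts per property; invented for illustration.
counts = {"property_1": (90, 10, 0), "property_2": (8, 0, 2)}

micro = micro_average(counts)
macro = macro_average(counts)
print("micro P/R/F:", [round(x, 3) for x in micro])
print("macro P/R/F:", [round(x, 3) for x in macro])

# Harmonic mean of the micro-averaged precision and recall quoted in the abstract.
reported_micro_f = 2 * 0.84 * 0.99 / (0.84 + 0.99)
print("F from reported micro P and R:", round(reported_micro_f, 2))
```

Micro-averaging lets frequent properties dominate the score, while macro-averaging gives every property equal weight, which is why the two sets of figures in the abstract differ slightly.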
Affiliation(s)
- David E. Jones
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Sean Igo
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Center for High Performance Computing, University of Utah, Salt Lake City, Utah, United States of America
- John Hurdle
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Julio C. Facelli
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America
- Center for High Performance Computing, University of Utah, Salt Lake City, Utah, United States of America
|
20
|
Luo G. Open issues in intelligent personal health record--an updated status report for 2012. J Med Syst 2013; 37:9943. [PMID: 23584758 DOI: 10.1007/s10916-013-9943-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/21/2012] [Accepted: 03/20/2013] [Indexed: 12/16/2022]
Abstract
To improve the capability and usability of the personal health record (PHR) as a tool to empower consumers in the management of their own health, we have proposed the concept of an intelligent PHR (iPHR) and built a prototype iPHR system with four functions. These four functions use various health knowledge and computer science techniques to automatically provide users with personalized healthcare information to facilitate their well-being. This paper discusses several open issues in iPHR, including two enhancements to an existing function and two potential new functions. The two enhancements are for automatically compiling relevant self-care activities for each health issue and automatically identifying contraindicated self-care activities, respectively. One potential new function is personalized search for individual healthcare providers. Another potential new function is personalized local search for health-related services to help maintain patients in their homes. We include some preliminary thoughts on how to address these open issues in the hope of stimulating future research on iPHR.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics, University of Utah, HSEB Room 5725B, 26 South 2000 East, Salt Lake City, UT, 84112, USA
|
21
|
Xu R, Wang Q. A semi-supervised approach to extract pharmacogenomics-specific drug-gene pairs from biomedical literature for personalized medicine. J Biomed Inform 2013; 46:585-93. [PMID: 23570835 DOI: 10.1016/j.jbi.2013.04.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 08/16/2012] [Revised: 03/15/2013] [Accepted: 04/01/2013] [Indexed: 12/14/2022]
Abstract
Personalized medicine aims to deliver the right drug to the right patient in the right dose. Pharmacogenomics (PGx) aims to identify genetic variants that may affect drug efficacy and toxicity. The availability of a comprehensive and accurate PGx-specific drug-gene relationship knowledge base is important for personalized medicine. However, building a large-scale PGx-specific drug-gene knowledge base is a difficult task. In this study, we developed a bootstrapping, semi-supervised learning approach to iteratively extract and rank drug-gene pairs according to their relevance to drug pharmacogenomics. Starting with a single PGx-specific seed pair and 20 million MEDLINE abstracts, the extraction algorithm achieved a precision of 0.219, recall of 0.368 and F1 of 0.274 after two iterations, a significant improvement over the results of using non-PGx-specific seeds (precision: 0.011, recall: 0.018, and F1: 0.014) or co-occurrence (precision: 0.015, recall: 1.000, and F1: 0.030). After the extraction step, the ranking algorithm further improved the precision from 0.219 to 0.561 for top-ranked pairs. By comparing to a dictionary-based approach with a PGx-specific gene lexicon as input, we showed that the bootstrapping approach has better performance in terms of both precision and F1 (precision: 0.251 vs. 0.152, recall: 0.396 vs. 0.856 and F1: 0.292 vs. 0.254). By integrative analysis using a large drug adverse event database, we have shown that the extracted drug-gene pairs strongly correlate with drug adverse events. In conclusion, we developed a novel semi-supervised bootstrapping approach for effective PGx-specific drug-gene pair extraction from a large number of MEDLINE articles with minimal human input.
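The general idea of seed-based bootstrapping can be sketched as follows. This is not the authors' algorithm: the abstract does not describe its pattern representation or its ranking step, so the sketch assumes a simple scheme in which the literal text between a known drug and gene becomes a pattern, and that pattern is then used to find new pairs. The function name `bootstrap_pairs`, the entity names, and the sentences are all hypothetical toy data.

```python
import re

def bootstrap_pairs(sentences, seed_pairs, drugs, genes, iterations=2):
    """Alternate between learning patterns from known pairs and
    applying those patterns to discover new (drug, gene) pairs."""
    pairs = set(seed_pairs)
    patterns = set()
    for _ in range(iterations):
        # Step 1: learn patterns, i.e. the text linking a known drug to a known gene.
        for sentence in sentences:
            for drug, gene in pairs:
                match = re.search(re.escape(drug) + r"(.+?)" + re.escape(gene), sentence)
                if match:
                    patterns.add(match.group(1))
        # Step 2: apply every learned pattern to every candidate drug and gene.
        for sentence in sentences:
            for drug in drugs:
                for gene in genes:
                    if any(drug + pattern + gene in sentence for pattern in patterns):
                        pairs.add((drug, gene))
    return pairs, patterns

# Toy corpus and lexicons, invented for illustration.
sentences = [
    "drugA response is affected by variants in geneA.",
    "drugB response is affected by variants in geneB.",
    "drugC was stored next to samples of geneC.",
]
drugs = ["drugA", "drugB", "drugC"]
genes = ["geneA", "geneB", "geneC"]

pairs, patterns = bootstrap_pairs(sentences, [("drugA", "geneA")], drugs, genes)
print(sorted(pairs))
print(sorted(patterns))
```

Starting from a single seed pair, the sketch picks up the second pair because it shares the learned context, while the third sentence contributes nothing. A real system would also need to score patterns and pairs to limit semantic drift, which is the role the abstract assigns to the ranking step.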
Affiliation(s)
- Rong Xu
- Medical Informatics Division, Case Western Reserve University, OH, USA.
|
22
|
Jiang G, Wang C, Zhu Q, Chute CG. A Framework of Knowledge Integration and Discovery for Supporting Pharmacogenomics Target Predication of Adverse Drug Events: A Case Study of Drug-Induced Long QT Syndrome. AMIA Joint Summits on Translational Science Proceedings 2013; 2013:88-92. [PMID: 24303306 PMCID: PMC3814489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that aims to support pharmacogenomics target predication of ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledgebase known as ADEpedia, using a semantic web based framework. We developed a knowledge discovery approach combining a network analysis of a protein-protein interaction (PPI) network and a gene functional classification approach. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
|
23
|
Xu R, Wang Q. An iterative searching and ranking algorithm for prioritising pharmacogenomics genes. International Journal of Computational Biology and Drug Design 2013; 6:18-31. [PMID: 23428471 PMCID: PMC6100784 DOI: 10.1504/ijcbdd.2013.052199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/11/2023]
Abstract
Pharmacogenomics (PGx) studies aim to identify genetic variants that may affect drug efficacy and toxicity. Machine-understandable drug-gene relationship knowledge is important for many computational PGx studies and for personalised medicine. A comprehensive and accurate PGx-specific gene lexicon is important for automatic drug-gene relationship extraction from the scientific literature, a rich knowledge source for PGx studies. In this study, we present a bootstrapping learning technique to rank 33,310 human genes with respect to their relevance to drug response. The algorithm uses only one seed PGx gene to iteratively extract and rank co-occurring genes using 20 million MEDLINE abstracts. Our ranking method is able to accurately rank PGx-specific genes highly among all human genes. Compared to randomly ranked genes (precision: 0.032, recall: 0.013, F1: 0.018), the algorithm has achieved significantly better performance (precision: 0.861, recall: 0.548, F1: 0.662) in ranking the top 2.5% of genes.
Affiliation(s)
- Rong Xu
- Medical Informatics Division, Case Western Reserve University, Cleveland, OH 44106, USA
|
24
|
Trugenberger CA, Wälti C, Peregrim D, Sharp ME, Bureeva S. Discovery of novel biomarkers and phenotypes by semantic technologies. BMC Bioinformatics 2013; 14:51. [PMID: 23402646 PMCID: PMC3605201 DOI: 10.1186/1471-2105-14-51] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/22/2012] [Accepted: 02/01/2013] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, thus constituting an important aspect of modern pharmaceutical research and development. More and more, the discovery of relevant biomarkers is aided by in silico techniques based on applying data mining and computational chemistry on large molecular databases. However, there is an even larger source of valuable information available that can potentially be tapped for such discoveries: repositories constituted by research documents. RESULTS This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. These documents were directly analyzed by the InfoCodex semantic engine, without prior human manipulations such as parsing. Recall and precision against established, but different benchmarks lie in ranges up to 30% and 50% respectively. Retrieval of known entities missed by other traditional approaches could be demonstrated. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. Amongst these were many interesting candidates with a high potential, although noticeable noise (uninteresting or obvious terms) was generated. CONCLUSIONS The reported approach of employing autonomous self-organising semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise and, conservatively, has the potential to offer a faster alternative to vocabulary processes that depend on humans having to read and analyze all the texts. More optimistically, it could impact pharmaceutical research, for example to shorten time-to-market of novel drugs, or speed up early recognition of dead ends and adverse reactions.
Affiliation(s)
- Carlo A Trugenberger
- InfoCodex AG, Semantic Technologies, Bahnhofstrasse 50, Buchs (SG), CH-9470, Switzerland
- Christoph Wälti
- InfoCodex AG, Semantic Technologies, Bahnhofstrasse 50, Buchs (SG), CH-9470, Switzerland
- David Peregrim
- Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, NJ 07065, USA
- Mark E Sharp
- Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, NJ 07065, USA
- Svetlana Bureeva
- Thomson Reuters, 5901 Priestly Drive, STE 200, Carlsbad, CA, 92008, USA
|
25
|
Nawaz R, Thompson P, Ananiadou S. Negated bio-events: analysis and identification. BMC Bioinformatics 2013; 14:14. [PMID: 23323936 PMCID: PMC3561152 DOI: 10.1186/1471-2105-14-14] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 06/28/2012] [Accepted: 01/10/2013] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. However, recently, there has been a growing interest in negative results, which has resulted in negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations. RESULTS We have conducted a detailed analysis of three open access bio-event corpora containing negation information (i.e., GENIA Event, BioInfer and BioNLP'09 ST), and have identified the main types of negated bio-events. We have analysed the key aspects of a machine learning solution to the problem of detecting negated events, including selection of negation cues, feature engineering and the choice of learning algorithm. Combining the best solutions for each aspect of the problem, we propose a novel framework for the identification of negated bio-events. We have evaluated our system on each of the three open access corpora mentioned above. The performance of the system significantly surpasses the best results previously reported on the BioNLP'09 ST corpus, and achieves even better results on the GENIA Event and BioInfer corpora, both of which contain more varied and complex events. CONCLUSIONS Recently, in the field of biomedical text mining, the development and enhancement of event-based systems has received significant interest. The ability to identify negated events is a key performance element for these systems. We have conducted the first detailed study on the analysis and identification of negated bio-events. Our proposed framework can be integrated with state-of-the-art event extraction systems. The resulting systems will be able to extract bio-events with attached polarities from textual documents, which can serve as the foundation for more elaborate systems that are able to detect mutually contradicting bio-events.
Affiliation(s)
- Raheel Nawaz
- National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
- Paul Thompson
- National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
- Sophia Ananiadou
- National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
|
26
|
Abstract
There is great variation in drug-response phenotypes, and a "one size fits all" paradigm for drug delivery is flawed. Pharmacogenomics is the study of how human genetic information impacts drug response, and it aims to improve efficacy and reduce side effects. In this article, we provide an overview of pharmacogenetics, including pharmacokinetics (PK), pharmacodynamics (PD), gene and pathway interactions, and off-target effects. We describe methods for discovering genetic factors in drug response, including genome-wide association studies (GWAS), expression analysis, and other methods such as chemoinformatics and natural language processing (NLP). We cover the practical applications of pharmacogenomics both in the pharmaceutical industry and in a clinical setting. In drug discovery, pharmacogenomics can be used to aid lead identification, anticipate adverse events, and assist in drug repurposing efforts. Moreover, pharmacogenomic discoveries show promise as important elements of physician decision support. Finally, we consider the ethical, regulatory, and reimbursement challenges that remain for the clinical implementation of pharmacogenomics.
|
27
|
Li J, Lu Z. Systematic identification of pharmacogenomics information from clinical trials. J Biomed Inform 2012; 45:870-8. [PMID: 22546622 PMCID: PMC3760158 DOI: 10.1016/j.jbi.2012.04.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 04/29/2011] [Revised: 03/13/2012] [Accepted: 04/11/2012] [Indexed: 11/23/2022]
Abstract
Recent progress in high-throughput genomic technologies has shifted pharmacogenomic research from candidate gene pharmacogenetics to clinical pharmacogenomics (PGx). Many clinical related questions may be asked such as 'what drug should be prescribed for a patient with mutant alleles?' Typically, answers to such questions can be found in publications mentioning the relationships of the gene-drug-disease of interest. In this work, we hypothesize that ClinicalTrials.gov is a comparable source rich in PGx related information. In this regard, we developed a systematic approach to automatically identify PGx relationships between genes, drugs and diseases from trial records in ClinicalTrials.gov. In our evaluation, we found that our extracted relationships overlap significantly with the curated factual knowledge through the literature in a PGx database and that most relationships appear on average 5 years earlier in clinical trials than in their corresponding publications, suggesting that clinical trials may be valuable for both validating known and capturing new PGx related information in a more timely manner. Furthermore, two human reviewers judged a portion of computer-generated relationships and found an overall accuracy of 74% for our text-mining approach. This work has practical implications in enriching our existing knowledge on PGx gene-drug-disease relationships as well as suggesting crosslinks between ClinicalTrials.gov and other PGx knowledge bases.
Affiliation(s)
- Jiao Li
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Zhiyong Lu
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
|
28
|
The state of the art in text mining and natural language processing for pharmacogenomics. J Biomed Inform 2012. [DOI: 10.1016/j.jbi.2012.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
|
29
|
Gyimesi G, Borsodi D, Sarankó H, Tordai H, Sarkadi B, Hegedűs T. ABCMdb: a database for the comparative analysis of protein mutations in ABC transporters, and a potential framework for a general application. Hum Mutat 2012; 33:1547-56. [PMID: 22693078 DOI: 10.1002/humu.22138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 03/12/2012] [Accepted: 05/29/2012] [Indexed: 11/08/2022]
Abstract
To overcome the pathological phenomena caused by altered function of ABC (ATP Binding Cassette) proteins, their mechanisms of action are extensively investigated, often involving the design of mutant constructs for experiments. Designing mutagenetic constructs, interpreting the result of mutagenetic experiments, and finding individual genetic variants require an extensive knowledge of previously published mutations. To aid the recapitulation of mutations described in the literature, we set up a database of ABC protein mutations (ABCMdb) extracted from full-text papers using an automatic mining approach. We have also developed a Web application interface to compare mutations in different ABC proteins using sequence alignments and to interactively map the mutations to 3D structural models. Currently our database contains protein mutations published for ABCB1, ABCB11, ABCC1, ABCC6, ABCC7, and the proteins of the ABCG subfamily. The database will be extended to include other members and subfamilies, and to provide information on whether or not a mutation is disease causing, represents a high-incidence polymorphism, or was generated only in vitro. The ABCMdb database should already help to compare the effects of mutations at homologous positions in different ABC proteins, and its interactive tools aim to advance the design of experiments for a wider range of proteins.
Affiliation(s)
- Gergely Gyimesi
- Membrane Research Group, Hungarian Academy of Sciences, Budapest, Hungary
|
30
|
Samwald M, Coulet A, Huerga I, Powers RL, Luciano JS, Freimuth RR, Whipple F, Pichler E, Prud'hommeaux E, Dumontier M, Marshall MS. Semantically enabling pharmacogenomic data for the realization of personalized medicine. Pharmacogenomics 2012; 13:201-12. [PMID: 22256869 DOI: 10.2217/pgs.11.179] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023] Open
Abstract
Understanding how each individual's genetics and physiology influence pharmaceutical response is crucial to the realization of personalized medicine, and the discovery and validation of pharmacogenomic biomarkers is key to its success. However, integration of genotype and phenotype knowledge in medical information systems remains a critical challenge. The inability to easily and accurately integrate the results of biomolecular studies with patients' medical records and clinical reports prevents us from realizing the full potential of pharmacogenomic knowledge for both drug development and clinical practice. Herein, we describe approaches using Semantic Web technologies, in which pharmacogenomic knowledge relevant to drug development and medical decision support is represented in such a way that it can be efficiently accessed both by software and human experts. We suggest that this approach increases the utility of data, and that such computational technologies will become an essential part of personalized medicine, alongside diagnostics and pharmaceutical products.
Collapse
Affiliation(s)
- Matthias Samwald
- Department of Medical Statistics & Bioinformatics, Leiden University Medical Center/Informatics Institute, University of Amsterdam, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
|
31
|
Hahn U, Cohen KB, Garten Y, Shah NH. Mining the pharmacogenomics literature--a survey of the state of the art. Brief Bioinform 2012; 13:460-94. [PMID: 22833496 PMCID: PMC3404399 DOI: 10.1093/bib/bbs018] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/18/2011] [Accepted: 03/23/2012] [Indexed: 01/05/2023]
Abstract
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Affiliation(s)
- Udo Hahn
- Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
|
32
|
A mutation-centric approach to identifying pharmacogenomic relations in text. J Biomed Inform 2012; 45:835-41. [PMID: 22683993 DOI: 10.1016/j.jbi.2012.05.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/06/2011] [Revised: 05/19/2012] [Accepted: 05/21/2012] [Indexed: 11/21/2022]
Abstract
OBJECTIVES To explore the notion of mutation-centric pharmacogenomic relation extraction and to evaluate our approach against reference pharmacogenomic relations. METHODS From a corpus of MEDLINE abstracts relevant to genetic variation, we identify co-occurrences between drug mentions extracted using MetaMap and RxNorm, and genetic variants extracted by EMU. The recall of our approach is evaluated against reference relations curated manually in PharmGKB. We also reviewed a random sample of 180 relations in order to evaluate its precision. RESULTS One crucial aspect of our strategy is the use of biological knowledge for identifying specific genetic variants in text, not simply gene mentions. On the 104 reference abstracts from PharmGKB, the recall of our mutation-centric approach is 33-46%. Applied to 282,000 abstracts from MEDLINE, our approach identifies pharmacogenomic relations in 4534 abstracts, with a precision of 65%. CONCLUSIONS Compared to a relation-centric approach, our mutation-centric approach shows similar recall, but slightly lower precision. We show that both approaches have limited overlap in their results, but are complementary and can be used in combination. Rather than a solution for the automatic curation of pharmacogenomic knowledge, we see these high-throughput approaches as tools to assist biocurators in the identification of pharmacogenomic relations of interest from the published literature. This investigation also identified three challenging aspects of the extraction of pharmacogenomic relations, namely processing full-text articles, sequence validation of DNA variants and resolution of genetic variants to reference databases, such as dbSNP.
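The co-occurrence step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the drug lexicon and the variant regular expression below are hypothetical stand-ins for MetaMap/RxNorm and EMU, and the recall helper only mirrors the idea of comparing extracted pairs against curated reference relations.

```python
import re

# Hypothetical drug lexicon; the study itself used MetaMap and RxNorm.
DRUGS = {"warfarin", "clopidogrel", "tamoxifen"}

# Simplified variant patterns (an assumption, far cruder than EMU):
# dbSNP rs identifiers and one-letter protein substitutions such as V433M.
VARIANT_RE = re.compile(
    r"\b(rs\d+|[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY])\b"
)


def cooccurrences(abstract):
    """Return the (drug, variant) pairs mentioned in the same abstract."""
    words = set(re.findall(r"[a-z]+", abstract.lower()))
    drugs = DRUGS & words
    variants = set(VARIANT_RE.findall(abstract))
    return {(drug, variant) for drug in drugs for variant in variants}


def recall(predicted, reference):
    """Fraction of reference pairs that were recovered."""
    if not reference:
        return 0.0
    return len(predicted & reference) / len(reference)
```

Abstract-level co-occurrence of this kind favours recall over precision, which is consistent with the abstract's framing of such output as an aid to biocurators instead of a finished curation.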
|
33
|
Pakhomov S, McInnes BT, Lamba J, Liu Y, Melton GB, Ghodke Y, Bhise N, Lamba V, Birnbaum AK. Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies. J Biomed Inform 2012; 45:862-9. [PMID: 22564551 DOI: 10.1016/j.jbi.2012.04.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/02/2011] [Revised: 04/04/2012] [Accepted: 04/11/2012] [Indexed: 11/19/2022]
Abstract
The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with the text of MEDLINE abstracts for a text mining approach to the identification of potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieve an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, thus making it a valuable tool for pathway-driven pharmacogenomics research.
Affiliation(s)
- S Pakhomov
- College of Pharmacy, University of Minnesota, Minneapolis, MN 55455, USA.
|
34
|
van Mulligen EM, Fourrier-Reglat A, Gurwitz D, Molokhia M, Nieto A, Trifiro G, Kors JA, Furlong LI. The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. J Biomed Inform 2012; 45:879-84. [PMID: 22554700 DOI: 10.1016/j.jbi.2012.04.004] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 05/02/2011] [Revised: 02/02/2012] [Accepted: 04/11/2012] [Indexed: 11/25/2022]
Abstract
Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface. The agreement figures achieved show that the inter-annotator agreement is much better than the agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts have annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software to capture these relationships in texts.
Affiliation(s)
- Erik M van Mulligen
- Dept. of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
|
35
|
Rinaldi F, Clematide S, Garten Y, Whirl-Carrillo M, Gong L, Hebert JM, Sangkuhl K, Thorn CF, Klein TE, Altman RB. Using ODIN for a PharmGKB revalidation experiment. Database: The Journal of Biological Databases and Curation 2012; 2012:bas021. [PMID: 22529178 PMCID: PMC3332569 DOI: 10.1093/database/bas021] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. We evaluate in particular (i) the usability of the system's interface, as perceived by users, and (ii) the correlation of the ranking of interactions, as provided by the text-mining system, with the choices of the curators.
Affiliation(s)
- Fabio Rinaldi
- Institute of Computational Linguistics, Binzmuhlestrasse 171, 8050 Zurich, Switzerland.
|
36
|
McDonagh EM, Whirl-Carrillo M, Garten Y, Altman RB, Klein TE. From pharmacogenomic knowledge acquisition to clinical applications: the PharmGKB as a clinical pharmacogenomic biomarker resource. Biomark Med 2012; 5:795-806. [PMID: 22103613 DOI: 10.2217/bmm.11.94] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.0] [Indexed: 01/03/2023]
Abstract
The mission of the Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org ) is to collect, encode and disseminate knowledge about the impact of human genetic variations on drug responses. It is an important worldwide resource of clinical pharmacogenomic biomarkers available to all. The PharmGKB website has evolved to highlight our knowledge curation and aggregation over our previous emphasis on collecting primary data. This review summarizes the methods we use to drive this expanded scope of 'Knowledge Acquisition to Clinical Applications', the new features available on our website and our future goals.
Affiliation(s)
- Ellen M McDonagh
- Department of Genetics, Stanford University, 1501 S California Avenue, Palo Alto, CA 94304, USA
|
37
|
Wu Y, Liu M, Zheng WJ, Zhao Z, Xu H. Ranking gene-drug relationships in biomedical literature using Latent Dirichlet Allocation. Pac Symp Biocomput 2012:422-433. [PMID: 22174297 PMCID: PMC4095990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Drug responses vary greatly among individuals due to human genetic variations; the study of this relationship is known as pharmacogenomics (PGx). Much of the PGx knowledge has been embedded in biomedical literature, and there is a growing interest in developing text mining approaches to extract such knowledge. In this paper, we present a study to rank candidate gene-drug relations using a Latent Dirichlet Allocation (LDA) model. Our approach consists of three steps: 1) recognize gene and drug entities in MEDLINE abstracts; 2) extract candidate gene-drug pairs based on different levels of co-occurrence, including abstract level, sentence level, and phrase level; and 3) rank candidate gene-drug pairs using multiple different methods including term frequency, Chi-square test, Mutual Information (MI), a reported Kullback-Leibler (KL) distance based on topics derived from LDA (LDA-KL), and a newly defined probabilistic KL distance based on LDA (LDA-PKL). We systematically evaluated these methods by using a gold standard data set of gene-drug relations derived from PharmGKB. Our results showed that the proposed LDA-PKL method achieved better Mean Average Precision (MAP) than any other method, suggesting its promising uses for ranking and detecting PGx relations.
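Two of the baseline ranking scores named in this abstract, and the evaluation metric, can be sketched from plain co-occurrence counts. The Python below is an illustrative sketch only: it uses pointwise mutual information as an assumed reading of "Mutual Information", all function names are hypothetical, and it does not cover the LDA-based distances (LDA-KL, LDA-PKL).

```python
import math


def contingency(n_pair, n_gene, n_drug, n_total):
    """Build the 2x2 table of abstract counts for one gene-drug pair."""
    a = n_pair                               # gene and drug together
    b = n_gene - n_pair                      # gene without the drug
    c = n_drug - n_pair                      # drug without the gene
    d = n_total - n_gene - n_drug + n_pair   # neither
    return a, b, c, d


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def pmi(a, b, c, d):
    """Pointwise mutual information (base 2) of gene and drug co-occurrence."""
    if a == 0:
        return float("-inf")
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


def average_precision(ranked, relevant):
    """Average precision of a ranked list against a gold-standard set."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Mean Average Precision is then the mean of `average_precision` over all queries (for example, one ranked gene list per drug).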
Affiliation(s)
- Mei Liu
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
- W. Jim Zheng
- Department of Biochemistry, Medical University of South Carolina, Charleston, SC 29425, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
- Hua Xu
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
|
38
|
Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via text mining. Pac Symp Biocomput 2012:410-421. [PMID: 22174296 PMCID: PMC3345566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature, but is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we have shown that gene-drug interactions can be extracted from Medline abstracts with high fidelity - we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g. metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we have trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly-proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.
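The candidate-generation idea in this abstract (two drugs related to the same gene product form a potential DDI) can be sketched in a few lines of Python. The assertions and relation labels below are hypothetical placeholders, and the sketch stops before the scoring step: the authors train a random forest on features of such assertions, which is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical normalized assertions of the form (drug, relation, gene).
ASSERTIONS = [
    ("drugA", "inhibits", "CYP3A4"),
    ("drugB", "metabolized_by", "CYP3A4"),
    ("drugC", "metabolized_by", "CYP2D6"),
]


def candidate_ddis(assertions):
    """Pair up drugs that are related to the same gene product."""
    by_gene = defaultdict(list)
    for drug, relation, gene in assertions:
        by_gene[gene].append((drug, relation))

    candidates = []
    for gene, items in by_gene.items():
        for (d1, r1), (d2, r2) in combinations(sorted(items), 2):
            if d1 != d2:  # skip two assertions about the same drug
                candidates.append(
                    {"drugs": (d1, d2), "gene": gene, "relations": (r1, r2)}
                )
    return candidates
```

Keeping the relation types alongside each candidate is what allows a mechanistic explanation of the kind the abstract mentions, for example "drugA inhibits the enzyme that metabolizes drugB".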
Affiliation(s)
- Bethany Percha
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
|
39
|
Stringent response of Escherichia coli: revisiting the bibliome using literature mining. Microbial Informatics and Experimentation 2011; 1:14. [PMID: 22587779 PMCID: PMC3372295 DOI: 10.1186/2042-5783-1-14] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/14/2011] [Accepted: 12/30/2011] [Indexed: 12/11/2022]
Abstract
Background Understanding the mechanisms responsible for cellular responses depends on the systematic collection and analysis of information on the main biological concepts involved. Indeed, the identification of biologically relevant concepts in free text, namely genes, tRNAs, mRNAs, gene products and small molecules, is crucial to capture the structure and functioning of different responses. Results In this work, we review literature reports on the study of the stringent response in Escherichia coli. Rather than undertaking the development of a highly specialised literature mining approach, we investigate the suitability of concept recognition and statistical analysis of concept occurrence as means to highlight the concepts that are most likely to be biologically engaged during this response. The co-occurrence analysis of core concepts in this stringent response, i.e. the (p)ppGpp nucleotides with gene products, was also inspected and suggests that besides the enzymes RelA and SpoT that control the basal levels of (p)ppGpp nucleotides, many other proteins have a key role in this response. Functional enrichment analysis revealed that basic cellular processes such as metabolism, transcriptional and translational regulation are central, but other stress-associated responses might be elicited during the stringent response. In addition, the identification of less annotated concepts revealed that some (p)ppGpp-induced functional activities are still overlooked in most reviews. Conclusions In this paper we applied a literature mining approach that offers a more comprehensive analysis of the stringent response in E. coli. The compilation of biological entities relevant to this stress response and the assessment of their functional roles provided a more systematic understanding of this cellular response. Overlooked regulatory entities, such as transcriptional regulators, were found to play a role in this stress response. Moreover, the involvement of other stress-associated concepts demonstrates the complexity of this cellular response.
|
40
|
Tsuruoka Y, Miwa M, Hamamoto K, Tsujii J, Ananiadou S. Discovering and visualizing indirect associations between biomedical concepts. Bioinformatics 2011; 27:i111-9. [PMID: 21685059 PMCID: PMC3117364 DOI: 10.1093/bioinformatics/btr214] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 11/22/2022]
Abstract
Motivation: Discovering useful associations between biomedical concepts has been one of the main goals in biomedical text-mining, and understanding their biomedical contexts is crucial in the discovery process. Hence, we need a text-mining system that helps users explore various types of (possibly hidden) associations in an easy and comprehensible manner. Results: This article describes FACTA+, a real-time text-mining system for finding and visualizing indirect associations between biomedical concepts from MEDLINE abstracts. The system can be used as a text search engine like PubMed with additional features to help users discover and visualize indirect associations between important biomedical concepts such as genes, diseases and chemical compounds. FACTA+ inherits all functionality from its predecessor, FACTA, and extends it by incorporating three new features: (i) detecting biomolecular events in text using a machine learning model, (ii) discovering hidden associations using co-occurrence statistics between concepts, and (iii) visualizing associations to improve the interpretability of the output. To the best of our knowledge, FACTA+ is the first real-time web application that offers the functionality of finding concepts involving biomolecular events and visualizing indirect associations of concepts with both their categories and importance. Availability: FACTA+ is available as a web application at http://refine1-nactem.mc.man.ac.uk/facta/, and its visualizer is available at http://refine1-nactem.mc.man.ac.uk/facta-visualizer/. Contact:tsuruoka@jaist.ac.jp
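The notion of an indirect (hidden) association can be illustrated with a small Python sketch: a target concept never co-occurs with the query, but shares at least one pivot concept with it. This is an assumption-laden toy, not FACTA+'s method; the function names are hypothetical and the sketch lists pivots without computing the co-occurrence statistics the system uses for ranking.

```python
from collections import defaultdict


def build_index(docs):
    """Map each concept to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, concepts in docs.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return dict(index)


def indirect_associations(index, query):
    """Return {target: [pivots]} for targets linked to query only via pivots."""
    query_docs = index.get(query, set())
    # Concepts that share at least one document with the query.
    direct = {c for c, d in index.items() if c != query and d & query_docs}

    found = {}
    for target, target_docs in index.items():
        if target == query or target in direct:
            continue
        pivots = sorted(p for p in direct if index[p] & target_docs)
        if pivots:
            found[target] = pivots
    return found
```

For example, if a gene and a disease appear together in one abstract and the disease and a compound appear together in another, the compound is reported as indirectly associated with the gene through the disease.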
Affiliation(s)
- Yoshimasa Tsuruoka
- School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), Nomi, Japan.
|
41
|
Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics 2011; 27:1741-8. [PMID: 21596790 PMCID: PMC3117361 DOI: 10.1093/bioinformatics/btr295] [Citation(s) in RCA: 177] [Impact Index Per Article: 13.6] [Indexed: 02/02/2023]
Abstract
MOTIVATION Widespread availability of low-cost, full genome sequencing will introduce new challenges for bioinformatics. RESULTS This review outlines recent developments in sequencing technologies and genome analysis methods for application in personalized medicine. New methods are needed in four areas to realize the potential of personalized medicine: (i) processing large-scale robust genomic data; (ii) interpreting the functional effect and the impact of genomic variation; (iii) integrating systems data to relate complex genetic interactions with phenotypes; and (iv) translating these discoveries into medical practice. CONTACT russ.altman@stanford.edu
Affiliation(s)
- Guy Haskin Fernald
- Biomedical Informatics Training Program, Stanford University School of Medicine, Department of Bioengineering, Stanford University, Stanford, CA, USA
|
42
|
Coulet A, Garten Y, Dumontier M, Altman RB, Musen MA, Shah NH. Integration and publication of heterogeneous text-mined relationships on the Semantic Web. J Biomed Semantics 2011; 2 Suppl 2:S10. [PMID: 21624156 PMCID: PMC3102890 DOI: 10.1186/2041-1480-2-s2-s10] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 02/07/2023]
Abstract
Background Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships cause the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. Results We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. Conclusions The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.
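The synonym-based part of the mapping described in this abstract can be sketched as a lookup from surface relation phrases to canonical relation labels. The Python below is a simplified illustration: the synonym table and labels are hypothetical and are not taken from the PHARE ontology, and the sketch ignores entity descriptions and role hierarchies.

```python
# Hypothetical synonym table mapping surface phrases to canonical relations.
CANONICAL = {
    "inhibits": "inhibits",
    "blocks": "inhibits",
    "suppresses": "inhibits",
    "metabolizes": "metabolizes",
    "breaks down": "metabolizes",
}


def normalize(triples):
    """Map (subject, relation, object) triples onto canonical relations.

    Returns the set of normalized triples and the list of triples whose
    relation phrase has no entry in the synonym table.
    """
    normalized, unmapped = set(), []
    for subject, relation, obj in triples:
        label = CANONICAL.get(relation.strip().lower())
        if label is None:
            unmapped.append((subject, relation, obj))
        else:
            normalized.add((subject, label, obj))
    return normalized, unmapped
```

After normalization, triples that were phrased differently but mean the same thing collapse into one entry, which is the property that makes the resulting repository comparable and queryable; the unmapped list corresponds to relations that would need further curation.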
Affiliation(s)
- Adrien Coulet
- LORIA - INRIA Nancy - Grand-Est, Campus Scientifique - BP 239 - 54506 Vandoeuvre-lès-Nancy Cedex, France.
|