1
Theodosiou T, Vrettos K, Baltsavia I, Baltoumas F, Papanikolaou N, Antonakis AN, Mossialos D, Ouzounis CA, Promponas VJ, Karaglani M, Chatzaki E, Brandau S, Pavlopoulos GA, Andreakos E, Iliopoulos I. BioTextQuest v2.0: An evolved tool for biomedical literature mining and concept discovery. Comput Struct Biotechnol J 2024; 23:3247-3253. [PMID: 39279874 PMCID: PMC11399685 DOI: 10.1016/j.csbj.2024.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 08/05/2024] [Accepted: 08/15/2024] [Indexed: 09/18/2024] Open
Abstract
The process of navigating through the landscape of biomedical literature and performing searches or combining them with bioinformatics analyses can be daunting, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related repositories. Herein, we present BioTextQuest v2.0, a tool for biomedical literature mining. BioTextQuest v2.0 is an open-source online web portal for document clustering based on sets of selected biomedical terms, offering efficient management of information derived from PubMed abstracts. Employing established machine learning algorithms, the tool facilitates document clustering while allowing users to customize the analysis by selecting terms of interest. BioTextQuest v2.0 streamlines the process of uncovering valuable insights from biomedical research articles, serving as an agent that connects the identification of key terms like genes/proteins, diseases, chemicals, Gene Ontology (GO) terms, functions, and others through named entity recognition, and their application in biological research. Instead of manually sifting through articles, researchers can enter their PubMed-like query and receive extracted information in two user-friendly formats, tables and word clouds, simplifying the comprehension of key findings. The latest update of BioTextQuest leverages the EXTRACT named entity recognition tagger, enhancing its ability to pinpoint various biological entities within text. BioTextQuest v2.0 acts as a research assistant, significantly reducing the time and effort required for researchers to identify and present relevant information from the biomedical literature.
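As a rough illustration of what clustering documents by a set of selected terms can look like, the sketch below counts user-chosen terms in each abstract and groups abstracts whose term vectors are similar. It is not BioTextQuest's implementation: the term list, the example texts, the similarity threshold and the greedy single-pass grouping are all assumptions made for illustration, standing in for the established machine learning algorithms the abstract refers to.

```python
from collections import Counter
from math import sqrt

def term_vector(text, terms):
    """Count occurrences of each selected term in a lower-cased text."""
    words = Counter(text.lower().split())
    return [words[t] for t in terms]

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(docs, terms, threshold=0.5):
    """Greedy single-pass grouping: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = term_vector(doc, terms)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

# Hypothetical terms of interest and toy "abstracts".
terms = ["brca1", "tumor", "insulin", "glucose"]
docs = [
    "brca1 mutations raise tumor risk",
    "insulin lowers blood glucose",
    "tumor suppression by brca1",
    "glucose uptake depends on insulin",
]
groups = cluster(docs, terms)
print(groups)
```

Restricting the vectors to selected terms is what lets a user steer the grouping; a production system would more likely use entity-tagged terms and a standard clustering algorithm rather than this greedy pass.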
Affiliation(s)
- Theodosios Theodosiou
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Konstantinos Vrettos
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Ismini Baltsavia
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Fotis Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Athens 16672, Greece
- Nikolas Papanikolaou
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Andreas N Antonakis
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
- Dimitrios Mossialos
  - Department of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
- Christos A Ouzounis
  - Biological Computation & Computational Biology Group, AIIA Lab, School of Informatics, Aristotle University of Thessalonica, 57001 Thessalonica, Greece
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia 1678, Cyprus
- Makrina Karaglani
  - Medical School, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Ekaterini Chatzaki
  - Medical School, Democritus University of Thrace, 68100 Alexandroupolis, Greece
- Sven Brandau
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
- Georgios A Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Athens 16672, Greece
- Evangelos Andreakos
  - Center for Immunology and Transplantation, Biomedical Research Foundation Academy of Athens, Athens, Greece
- Ioannis Iliopoulos
  - Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece
2
Lee H, Jeon J, Jung D, Won JI, Kim K, Kim YJ, Yoon J. RelCurator: a text mining-based curation system for extracting gene-phenotype relationships specific to neurodegenerative disorders. Genes Genomics 2023; 45:1025-1036. [PMID: 37300788 DOI: 10.1007/s13258-023-01405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 05/18/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND The identification of gene-phenotype relationships is important in medical genetics as it serves as a basis for precision medicine. However, most of the gene-phenotype relationship data are buried in the biomedical literature in textual form. OBJECTIVE We propose RelCurator, a curation system that extracts sentences including both gene and phenotype entities related to specific disease categories from PubMed articles, provides rich additional information such as entity taggings, and predictions of gene-phenotype relationships. METHODS We targeted neurodegenerative disorders and developed a deep learning model using Bidirectional Gated Recurrent Unit (BiGRU) networks and BioWordVec word embeddings for predicting gene-phenotype relationships from biomedical texts. The prediction model is trained with more than 130,000 labeled PubMed sentences including gene and phenotype entities, which are related to or unrelated to neurodegenerative disorders. RESULTS We compared the performance of our deep learning model with those of Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machine (SVM), and simple Recurrent Neural Network (simple RNN) models. Our model performed better with an F1-score of 0.96. Furthermore, the evaluation done using a few curation cases in the real scenario showed the effectiveness of our work. Therefore, we conclude that RelCurator can identify not only new causative genes, but also new genes associated with neurodegenerative disorders' phenotype. CONCLUSION RelCurator is a user-friendly method for accessing deep learning-based supporting information and a concise web interface to assist curators while browsing the PubMed articles. Our curation process represents an important and broadly applicable improvement to the state of the art for the curation of gene-phenotype relationships.
Affiliation(s)
- Heonwoo Lee
  - Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
- Junbeom Jeon
  - Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
- Dawoon Jung
  - Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
- Jung-Im Won
  - Center for Innovation in Engineering Education, Hanyang University, Seoul, Republic of Korea
- Kiyong Kim
  - Department of Electronic Engineering, Kyonggi University, Suwon, Republic of Korea
- Yun Joong Kim
  - Department of Neurology, Yonsei University College of Medicine, Seoul, Republic of Korea
  - Department of Neurology, Yongin Severance Hospital, Yonsei University College of Medicine, Yonsei University Health System, Yongin, Gyeonggi-do, 16995, Republic of Korea
- Jeehee Yoon
  - Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
3
Prediction and Ranking of Biomarkers Using multiple UniReD. Int J Mol Sci 2022; 23:ijms231911112. [PMID: 36232413 PMCID: PMC9569535 DOI: 10.3390/ijms231911112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 09/06/2022] [Accepted: 09/17/2022] [Indexed: 11/23/2022] Open
Abstract
Protein–protein interactions (PPIs) are of key importance for understanding how cells and organisms function. Thus, in recent decades, many approaches have been developed for the identification and discovery of such interactions. These approaches have addressed the problem of PPI identification from either an experimental or a computational point of view. Here, we present an updated version of UniReD, a computational prediction tool that uses the biomedical literature to extract documented, already published protein associations and to predict undocumented ones. The usefulness of this computational tool has been previously evaluated by experimentally validating predicted interactions and by benchmarking it against public databases of experimentally validated PPIs. In its updated form, UniReD allows the user to provide a list of proteins of known implication in, e.g., a particular disease, as well as another list of proteins that are potentially associated with the proteins of the first list. UniReD then automatically analyzes both lists and ranks the proteins of the second list by their association with the proteins of the first list, thus serving as a potential biomarker discovery/validation tool.
4
Research on Literature Clustering Algorithm for Massive Scientific and Technical Literature Query Service. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3392489. [PMID: 36045966 PMCID: PMC9420566 DOI: 10.1155/2022/3392489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 07/25/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022]
Abstract
Traditional science and technology literature search mainly provides users with reliable and detailed information materials and services through technical means, data resources, and service strategies. With the development of network technology, computer technology, and information technology, digital information resources are increasing day by day, which continuously challenges the traditional knowledge service model. Some traditional technical methods and service means can no longer meet the information needs of users working with large data sets. This paper proposes a model of large-scale literature search service in the context of big data by studying the technical means and service modes used for scientific and technical literature search in universities in the era of big data. Specifically, this paper proposes a method for fast literature retrieval that uses R-tree indexing to handle the diverse data types and large data volume of science and technology literature. The method uses an improved k-means clustering algorithm to construct an R-tree clustering model and improves the retrieval efficiency of the system by retrieving scientific and technical literature data through R-tree indexing. Experiments on university science and technology literature datasets show that the method in this paper improves both efficiency and precision when searching literature.
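For readers unfamiliar with the clustering step, the following is a minimal sketch of plain k-means (Lloyd's algorithm) on 2-D points. It is the textbook algorithm, not the paper's improved variant, and it does not build an R-tree; the points and the initial centroids are invented for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means on 2-D points with caller-supplied initial centroids."""
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups

# Hypothetical 2-D feature vectors for six documents.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, groups = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

In a retrieval setting, the resulting cluster boundaries could then be indexed so that a query only inspects nearby clusters; how the paper couples this with R-tree construction is not described in the abstract.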
5
Fortunato Costa K, Almeida Araújo F, Morais J, Lisboa Frances CR, Ramos RTJ. Text mining for identification of biological entities related to antibiotic resistant organisms. PeerJ 2022; 10:e13351. [PMID: 35539017 PMCID: PMC9080439 DOI: 10.7717/peerj.13351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Accepted: 04/07/2022] [Indexed: 01/13/2023] Open
Abstract
Antimicrobial resistance is a significant public health problem worldwide. In recent years, the scientific community has been intensifying efforts to combat this problem; many experiments have been conducted, and many articles have been published in this area. However, the growing volume of biological literature increases the difficulty of the biocuration process due to the cost and time required. Modern text mining tools that adopt artificial intelligence technology can help research progress. In this article, we propose a text mining model capable of identifying and prioritizing scientific articles in the context of antimicrobial resistance. We retrieved scientific articles from the PubMed database, adopted machine learning techniques to generate the vector representation of the retrieved scientific articles, and identified their similarity with the context. As a result of this process, we obtained a dataset labeled "Relevant" and "Irrelevant" and used this dataset to train one supervised learning algorithm to classify new records. The model's overall performance reached 90% accuracy, and the F-measure (the harmonic mean of precision and recall) reached 82% for the positive class and 93% for the negative class, showing quality in the identification of scientific articles relevant to the context. The dataset, scripts and models are available at https://github.com/engbiopct/TextMiningAMR.
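To make the reported metrics concrete, the sketch below computes accuracy and the per-class F-measure from a binary confusion matrix. The counts are invented so that the arithmetic is easy to follow; they are not the paper's data and do not reproduce its 82%/93% figures.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def class_metrics(tp, fp, fn):
    """Precision, recall and F-measure for one class treated as 'positive'."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)

# Hypothetical confusion counts with "Relevant" as the positive class.
tp, fp, fn, tn = 40, 10, 10, 140

accuracy = (tp + tn) / (tp + fp + fn + tn)
relevant = class_metrics(tp, fp, fn)
# For the "Irrelevant" class the roles swap: its true positives are tn,
# its false positives are fn, and its false negatives are fp.
irrelevant = class_metrics(tn, fn, fp)
print(accuracy, relevant[2], irrelevant[2])
```

With an imbalanced dataset like this one, the majority class typically scores a higher F-measure than the minority class even when overall accuracy looks strong, which is why per-class figures are reported alongside accuracy.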
Affiliation(s)
- Kelle Fortunato Costa
  - Programa de pós-graduação em Engenharia Elétrica, Universidade Federal do Pará, Belém, Pará, Brazil
- Fabrício Almeida Araújo
  - Biological Science Institute, Universidade Federal do Pará, Belém, Pará, Brazil
  - Universidade Federal Rural da Amazônia, Belém, Pará, Brazil
- Rommel T. J. Ramos
  - Biological Science Institute, Universidade Federal do Pará, Belém, Pará, Brazil
6
Darling: A Web Application for Detecting Disease-Related Biomedical Entity Associations with Literature Mining. Biomolecules 2022; 12:biom12040520. [PMID: 35454109 PMCID: PMC9028073 DOI: 10.3390/biom12040520] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/01/2022] [Revised: 03/24/2022] [Accepted: 03/28/2022] [Indexed: 12/15/2022] Open
Abstract
Finding, exploring and filtering frequent sentence-based associations between a disease and a biomedical entity, co-mentioned in disease-related PubMed literature, is a challenge as the volume of publications increases. Darling is a web application that uses Named Entity Recognition to identify human-related biomedical terms in PubMed articles mentioned in OMIM, DisGeNET and Human Phenotype Ontology (HPO) disease records, and generates an interactive biomedical entity association network. Nodes in this network represent genes, proteins, chemicals, functions, tissues, diseases, environments and phenotypes. Users can search by identifiers, terms/entities or free text and explore the relevant abstracts in an annotated format.
7
Bhasuran B. Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries. Methods Mol Biol 2022; 2496:123-140. [PMID: 35713862 DOI: 10.1007/978-1-0716-2305-3_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
The major outcomes and insights of scientific research and clinical studies end up as publications or clinical records in unstructured text format. Owing to advances in biomedical research, the volume of published literature has grown tremendously in recent years. Scientists and clinical researchers face a big challenge in staying current with this knowledge and in extracting hidden information from millions of published biomedical articles. A potential one-stop automated solution to this problem is biomedical literature mining. One of the long-standing goals in biology is to discover disease-causing genes and their specific roles in personalized precision medicine and drug repurposing. However, empirical approaches and clinical affirmation are expensive and time-consuming. An in silico approach that uses text mining to identify disease-causing genes can contribute to biomarker discovery. This chapter presents a protocol for combining literature mining and machine learning to predict biomedical discoveries, with special emphasis on gene-disease relation-based discovery. The protocol is presented as a literature-based discovery (LBD) pipeline for gene-disease-based discovery. The protocol includes our web-based tools: (1) DNER (Disease Named Entity Recognizer) for disease entity recognition, (2) BCCNER (Bidirectional, Contextual clues Named Entity Tagger) for gene/protein entity recognition, (3) DisGeReExT (Disease-Gene Relation Extractor) for statistically validated results and visualization, and (4) a newly introduced deep learning-based method for association discovery. Our proposed deep learning-based method can be generalized and applied to other important biomedical discoveries focusing on entities such as drugs/chemicals or miRNAs.
Affiliation(s)
- Balu Bhasuran
  - DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore, Tamilnadu, India
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
8
Soto AJ, Zerva C, Batista-Navarro R, Ananiadou S. LitPathExplorer: a confidence-based visual text analytics tool for exploring literature-enriched pathway models. Bioinformatics 2018; 34:1389-1397. [PMID: 29228271 DOI: 10.1093/bioinformatics/btx774] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/21/2017] [Accepted: 12/07/2017] [Indexed: 01/25/2023] Open
Abstract
Motivation Pathway models are valuable resources that help us understand the various mechanisms underpinning complex biological processes. Their curation is typically carried out through manual inspection of published scientific literature to find information relevant to a model, which is a laborious and knowledge-intensive task. Furthermore, models curated manually cannot be easily updated and maintained with new evidence extracted from the literature without automated support. Results We have developed LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization, to facilitate the exploration and analysis of pathway models using statements (i.e. events) extracted automatically from the literature and organized according to levels of confidence. LitPathExplorer supports pathway modellers and curators alike by: (i) extracting events from the literature that corroborate existing models with evidence; (ii) discovering new events which can update models; and (iii) providing a confidence value for each event that is automatically computed based on linguistic features and article metadata. Our evaluation of event extraction showed a precision of 89% and a recall of 71%. Evaluation of our confidence measure, when used for ranking sampled events, showed an average precision ranging between 61 and 73%, which can be improved to 95% when the user is involved in the semi-supervised learning process. Qualitative evaluation using pair analytics based on the feedback of three domain experts confirmed the utility of our tool within the context of pathway model exploration. Availability and implementation LitPathExplorer is available at http://nactem.ac.uk/LitPathExplorer_BI/. Contact sophia.ananiadou@manchester.ac.uk. Supplementary information Supplementary data are available at Bioinformatics online.
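The "average precision" figure quoted for ranked events can be illustrated with the standard definition: the mean of precision@k over every rank k that holds a correct item. The sketch below assumes that definition and uses an invented ranking; the paper may compute or normalize the measure differently.

```python
def average_precision(ranked_labels):
    """Mean of precision@k taken at each rank k that holds a correct item."""
    hits = 0
    total = 0.0
    for k, correct in enumerate(ranked_labels, start=1):
        if correct:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

# Hypothetical ranking of five extracted events by confidence,
# where True marks an event judged correct by a curator.
ranked = [True, False, True, True, False]
ap = average_precision(ranked)
print(round(ap, 4))
```

A confidence measure that pushes correct events toward the top of the list raises this score, which is why it suits an evaluation of confidence-based ranking.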
Affiliation(s)
- Axel J Soto
  - National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester M1 7DN, UK
- Chrysoula Zerva
  - National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester M1 7DN, UK
- Riza Batista-Navarro
  - National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester M1 7DN, UK
- Sophia Ananiadou
  - National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester M1 7DN, UK
9
ElShal S, Tranchevent LC, Sifrim A, Ardeshirdavani A, Davis J, Moreau Y. Beegle: from literature mining to disease-gene discovery. Nucleic Acids Res 2016; 44:e18. [PMID: 26384564 PMCID: PMC4737179 DOI: 10.1093/nar/gkv905] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/03/2015] [Revised: 08/25/2015] [Accepted: 08/29/2015] [Indexed: 01/06/2023] Open
Abstract
Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/.
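The "recall in the top 100 returned genes" measure can be sketched as recall@k: the fraction of genes already known to be linked to a query that appear among the first k results. The gene identifiers and the cut-offs below are placeholders, and this is only an illustration of the metric, not Beegle's evaluation code.

```python
def recall_at_k(ranked_genes, known_genes, k):
    """Fraction of known genes that appear among the first k ranked genes."""
    known = set(known_genes)
    if not known:
        return 0.0
    top = set(ranked_genes[:k])
    return len(top & known) / len(known)

# Hypothetical search output and a hypothetical set of known disease genes.
ranked = ["G1", "G7", "G3", "G9", "G2", "G5"]
known = {"G3", "G2", "G8"}

r3 = recall_at_k(ranked, known, 3)
r5 = recall_at_k(ranked, known, 5)
print(r3, r5)
```

Averaging this value over many queries gives a figure comparable in kind to the average recall reported in the abstract.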
Affiliation(s)
- Sarah ElShal
  - Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium
  - iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium
- Léon-Charles Tranchevent
  - Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium
  - iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium
  - Inserm UMR-S1052, CNRS UMR5286, Cancer Research Centre of Lyon, Lyon, France
  - Université de Lyon 1, Villeurbanne, France
  - Centre Léon Bérard, Lyon, France
- Alejandro Sifrim
  - Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium
  - iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium
  - Wellcome Trust Genome Campus, Hinxton, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
- Amin Ardeshirdavani
  - Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium
  - iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium
- Jesse Davis
  - Department of Computer Science (DTAI), KU Leuven, Leuven 3001, Belgium
- Yves Moreau
  - Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics Department, KU Leuven, Leuven 3001, Belgium
  - iMinds Future Health Department, KU Leuven, Leuven 3001, Belgium
10
Zhai X, Li Z, Gao K, Huang Y, Lin L, Wang L. Research status and trend analysis of global biomedical text mining studies in recent 10 years. Scientometrics 2015. [DOI: 10.1007/s11192-015-1700-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/02/2023]
11
Himmelstein DS, Baranzini SE. Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes. PLoS Comput Biol 2015; 11:e1004259. [PMID: 26158728 PMCID: PMC4497619 DOI: 10.1371/journal.pcbi.1004259] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Received: 12/10/2014] [Accepted: 03/26/2015] [Indexed: 12/13/2022] Open
Abstract
The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important derivations will be the translation of this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks—graphs with multiple node and edge types—for accomplishing both tasks. First we constructed a network with 18 node types—genes, diseases, tissues, pathophysiologies, and 14 MSigDB (molecular signatures database) collections—and 19 edge types from high-throughput publicly-available resources. From this network composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model from GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (with AUROC = 0.79) from a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io). 
Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains. For complex human diseases, identifying the genes harboring susceptibility variants has taken on medical importance. Disease-associated genes provide clues for elucidating disease etiology, predicting disease risk, and highlighting therapeutic targets. Here, we develop a method to predict whether a given gene and disease are associated. To capture the multitude of biological entities underlying pathogenesis, we constructed a heterogeneous network, containing multiple node and edge types. We built on a technique developed for social network analysis, which embraces disparate sources of data to make predictions from heterogeneous networks. Using the compendium of associations from genome-wide studies, we learned the influential mechanisms underlying pathogenesis. Our findings provide a novel perspective on the existence of pervasive pleiotropy across complex diseases. Furthermore, we suggest transcriptional signatures of perturbations are an underutilized resource amongst prioritization approaches. For multiple sclerosis, we demonstrated our ability to prioritize future studies and discover novel susceptibility genes. Researchers can use these predictions to increase the statistical power of their studies, to suggest the causal genes from a set of candidates, or to generate evidence-based experimental hypotheses.
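One simple way to picture "features that describe the topology between specific genes and diseases" is to count the paths that connect a gene to a disease along a fixed sequence of edge types. The sketch below does exactly that on a toy typed graph. The node names, edge types and the raw path count are assumptions chosen for illustration; the abstract does not specify the feature definition the authors used, and their features may weight or normalize paths differently.

```python
from collections import defaultdict

# Hypothetical typed, directed edges: (source, edge_type, target).
edges = [
    ("GENE_A", "expressed_in", "TISSUE_1"),
    ("GENE_A", "expressed_in", "TISSUE_2"),
    ("GENE_B", "expressed_in", "TISSUE_2"),
    ("TISSUE_1", "affected_in", "DISEASE_X"),
    ("TISSUE_2", "affected_in", "DISEASE_X"),
]

def build_index(edges):
    """Map (source, edge_type) to the set of targets reachable by that edge type."""
    index = defaultdict(set)
    for source, edge_type, target in edges:
        index[(source, edge_type)].add(target)
    return index

def path_count(index, start, edge_types, end):
    """Count paths from start to end that follow edge_types in order."""
    frontier = {start: 1}  # node -> number of paths reaching it so far
    for edge_type in edge_types:
        reached = defaultdict(int)
        for node, count in frontier.items():
            for target in index.get((node, edge_type), ()):
                reached[target] += count
        frontier = reached
    return frontier.get(end, 0)

index = build_index(edges)
metapath = ["expressed_in", "affected_in"]
count_a = path_count(index, "GENE_A", metapath, "DISEASE_X")
count_b = path_count(index, "GENE_B", metapath, "DISEASE_X")
print(count_a, count_b)
```

One such count per gene-disease pair and per edge-type sequence yields a feature vector that a supervised model could be trained on, which is the general shape of the approach the abstract outlines.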
Affiliation(s)
- Daniel S. Himmelstein
  - Biological & Medical Informatics, University of California, San Francisco, San Francisco, California, United States of America
- Sergio E. Baranzini
  - Biological & Medical Informatics, University of California, San Francisco, San Francisco, California, United States of America
  - Department of Neurology, University of California, San Francisco, San Francisco, California, United States of America
  - Institute for Human Genetics, University of California, San Francisco, San Francisco, California, United States of America
12
Pennings JLA, Jennen DGJ, Nygaard UC, Namork E, Haug LS, van Loveren H, Granum B. Cord blood gene expression supports that prenatal exposure to perfluoroalkyl substances causes depressed immune functionality in early childhood. J Immunotoxicol 2015; 13:173-80. [PMID: 25812627 DOI: 10.3109/1547691x.2015.1029147] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 01/07/2023] Open
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFAS) are a class of synthetic compounds that have widespread use in consumer and industrial applications. PFAS are considered environmental pollutants that have various toxic properties, including effects on the immune system. Recent human studies indicate that prenatal exposure to PFAS leads to suppressed immune responses in early childhood. In this study, data from the Norwegian BraMat cohort was used to investigate transcriptomics profiles in neonatal cord blood and their association with maternal PFAS exposure, anti-rubella antibody levels at 3 years of age and the number of common cold episodes until 3 years. Genes associated with PFAS exposure showed enrichment for immunological and developmental functions. The analyses identified a toxicogenomics profile of 52 PFAS exposure-associated genes that were in common with genes associated with rubella titers and/or common cold episodes. This gene set contains several immunomodulatory genes (CYTL1, IL27) as well as other immune-associated genes (e.g. EMR4P, SHC4, ADORA2A). In addition, this study identified PPARD as a PFAS toxicogenomics marker. These markers can serve as the basis for further mechanistic or epidemiological studies. This study provides a transcriptomics connection between prenatal PFAS exposure and impaired immune function in early childhood and supports current views on PPAR- and NF-κB-mediated modes of action. The findings add to the available evidence that PFAS exposure is immunotoxic in humans and support regulatory policies to phase out these substances.
Affiliation(s)
- Jeroen L A Pennings
  - Centre for Health Protection, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
- Danyel G J Jennen
  - Department of Toxicogenomics, Maastricht University, Maastricht, the Netherlands
- Unni C Nygaard
  - Division of Environmental Medicine, Norwegian Institute of Public Health (NIPH), Oslo, Norway
- Ellen Namork
  - Division of Environmental Medicine, Norwegian Institute of Public Health (NIPH), Oslo, Norway
- Line S Haug
  - Division of Environmental Medicine, Norwegian Institute of Public Health (NIPH), Oslo, Norway
- Henk van Loveren
  - Centre for Health Protection, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
  - Department of Toxicogenomics, Maastricht University, Maastricht, the Netherlands
- Berit Granum
  - Division of Environmental Medicine, Norwegian Institute of Public Health (NIPH), Oslo, Norway
13
Application of text mining in the biomedical domain. Methods 2015; 74:97-106. [PMID: 25641519 DOI: 10.1016/j.ymeth.2015.01.015] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 05/08/2014] [Revised: 01/21/2015] [Accepted: 01/23/2015] [Indexed: 12/12/2022] Open
Abstract
In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for text mining and give an overview of the text mining tools that are currently being used and the types of problems to which they are typically applied.
|
14
|
Ellero-Simatos S, Fleuren WWM, Bauerschmidt S, Dokter WHA, Toonen EJM. Identification of gene signatures for prednisolone-induced metabolic dysfunction in collagen-induced arthritic mice. Pharmacogenomics 2014; 15:629-41. [PMID: 24798720 DOI: 10.2217/pgs.14.3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
BACKGROUND Prednisolone is a potent anti-inflammatory glucocorticoid (GC) but chronic use is hampered by metabolic side effects. Little is known about the long-term effects of GCs on gene-expression in vivo during inflammation. AIM Identify gene signatures underlying prednisolone-induced metabolic side effects in a complex in vivo inflammatory setting after long-term treatment. MATERIALS & METHODS We performed whole-genome expression profiling in liver and muscle from arthritic and nonarthritic mice treated with several doses of prednisolone for 3 weeks and used text-mining to link gene signatures to metabolic pathways. RESULTS Prednisolone-induced gene signatures were highly tissue specific. We identified a short-list of genes significantly affected by both prednisolone and inflammation in liver and involved in glucose and fatty acid metabolism. For several of these genes the association with GCs is novel. CONCLUSION The identified gene signatures may provide useful starting points for the development of GCs with a better safety profile.
Affiliation(s)
- Sandrine Ellero-Simatos: Division Analytical Biosciences, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
|
15
|
Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ. DISEASES: text mining and data integration of disease-gene associations. Methods 2014; 74:83-9. [PMID: 25484339 DOI: 10.1016/j.ymeth.2014.11.020] [Citation(s) in RCA: 351] [Impact Index Per Article: 31.9] [Received: 01/21/2014] [Revised: 11/15/2014] [Accepted: 11/25/2014] [Indexed: 12/18/2022]
Abstract
Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.
Affiliation(s)
- Sune Pletscher-Frankild: Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Albert Pallejà: Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kalliopi Tsafou: Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Janos X Binder: Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany; Bioinformatics Core Facility, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
- Lars Juhl Jensen: Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
16
|
Jennen D, Polman J, Bessem M, Coonen M, van Delft J, Kleinjans J. Drug-induced liver injury classification model based on in vitro human transcriptomics and in vivo rat clinical chemistry data. Systems Biomedicine 2014. [DOI: 10.4161/sysb.29400] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
|
17
|
Jung JY, DeLuca TF, Nelson TH, Wall DP. A literature search tool for intelligent extraction of disease-associated genes. J Am Med Inform Assoc 2014; 21:399-405. [PMID: 23999671 PMCID: PMC3994846 DOI: 10.1136/amiajnl-2012-001563] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/11/2012] [Revised: 07/15/2013] [Accepted: 08/08/2013] [Indexed: 12/27/2022]
Abstract
OBJECTIVE To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. METHODS We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. RESULTS We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. CONCLUSIONS We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
Affiliation(s)
- Jae-Yoon Jung: Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Todd F DeLuca: Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Tristan H Nelson: Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- Dennis P Wall: Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA; Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
|
18
|
Wang JH, Zhao LF, Lin P, Su XR, Chen SJ, Huang LQ, Wang HF, Zhang H, Hu ZF, Yao KT, Huang ZX. GenCLiP 2.0: a web server for functional clustering of genes and construction of molecular networks based on free terms. Bioinformatics 2014; 30:2534-6. [PMID: 24764463 DOI: 10.1093/bioinformatics/btu241] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 01/17/2023]
Abstract
Identifying biological functions and molecular networks in a gene list and how the genes may relate to various topics is of considerable value to biomedical researchers. Here, we present a web-based text-mining server, GenCLiP 2.0, which can analyze human genes with enriched keywords and molecular interactions. Compared with other similar tools, GenCLiP 2.0 offers two unique features: (i) analysis of gene functions with free terms (i.e. any terms in the literature) generated by literature mining or provided by the user and (ii) accurate identification and integration of comprehensive molecular interactions from Medline abstracts, to construct molecular networks and subnetworks related to the free terms. AVAILABILITY AND IMPLEMENTATION: http://ci.smu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jia-Hong Wang, Ling-Feng Zhao, Pei Lin, Xiao-Rong Su, Shi-Jun Chen, Li-Qiang Huang, Hua-Feng Wang, Hai Zhang, Zhen-Fu Hu, Kai-Tai Yao, Zhong-Xi Huang: Cancer Institute, Key Laboratory of Zebrafish Modeling and Drug Screening for Human Diseases of Guangdong Higher Education Institutes, Department of Cell Biology, Southern Medical University, Guangzhou 510515, Guangzhou Biotechnology Center, Guangzhou, 510630, School of Basic Medical Sciences, Network Center and Department of Plastic Surgery, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
|
19
|
Wu C, Gudivada RC, Aronow BJ, Jegga AG. Computational drug repositioning through heterogeneous network clustering. BMC Syst Biol 2013; 7 Suppl 5:S6. [PMID: 24564976 PMCID: PMC4029299 DOI: 10.1186/1752-0509-7-s5-s6] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Indexed: 01/09/2023]
Abstract
BACKGROUND Given the costly and time-consuming process and high attrition rates in drug discovery and development, drug repositioning or drug repurposing is considered a viable strategy both to replenish the drying-out drug pipelines and to surmount the innovation gap. Although there is a growing recognition that mechanistic relationships from molecular to systems level should be integrated into drug discovery paradigms, relatively few studies have integrated information about heterogeneous networks into computational drug-repositioning candidate discovery platforms. RESULTS Using known disease-gene and drug-target relationships from the KEGG database, we built a weighted disease and drug heterogeneous network. The nodes represent drugs or diseases while the edges represent shared gene, biological process, pathway, phenotype or a combination of these features. We clustered this weighted network to identify modules and then assembled all possible drug-disease pairs (putative drug repositioning candidates) from these modules. We validated our predictions by testing their robustness and evaluated them by their overlap with drug indications that were either reported in published literature or investigated in clinical trials. CONCLUSIONS Previous computational approaches for drug repositioning focused on either drug-drug or disease-disease similarity, whereas we have taken a more holistic approach by also considering drug-disease relationships. Further, we considered not only genes but also other features to build the disease-drug networks. Despite the relative simplicity of our approach, based on the robustness analyses and the overlap of some of our predictions with drug indications that are under investigation, we believe our approach could complement the current computational approaches for drug repositioning candidate discovery.
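The workflow outlined in this abstract (build a weighted drug-disease network from shared features, cluster it into modules, then enumerate drug-disease pairs inside each module) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy nodes and features, the Jaccard edge weight, the 0.3 threshold, and the use of connected components as a stand-in for the clustering step are all hypothetical, since the abstract does not name the weighting scheme or the clustering algorithm.

```python
from itertools import combinations

# Hypothetical toy data: node -> (kind, associated features such as genes or pathways).
NODES = {
    "drugA": ("drug", {"G1", "G2", "P1"}),
    "drugB": ("drug", {"G7", "G8"}),
    "diseaseX": ("disease", {"G1", "G2", "P2"}),
    "diseaseY": ("disease", {"G7", "G8", "G9"}),
}


def jaccard(a, b):
    """Share of features two nodes have in common (assumed edge weight)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(nodes, threshold=0.3):
    """Keep an edge between two nodes when their feature overlap is high enough."""
    edges = {}
    for u, v in combinations(sorted(nodes), 2):
        weight = jaccard(nodes[u][1], nodes[v][1])
        if weight >= threshold:
            edges[(u, v)] = weight
    return edges


def modules(nodes, edges):
    """Group nodes into connected components (a simple stand-in for clustering)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def candidate_pairs(nodes, mods):
    """List every (drug, disease) pair that falls inside the same module."""
    pairs = []
    for module in mods:
        drugs = sorted(n for n in module if nodes[n][0] == "drug")
        diseases = sorted(n for n in module if nodes[n][0] == "disease")
        pairs.extend((d, s) for d in drugs for s in diseases)
    return sorted(pairs)


if __name__ == "__main__":
    print(candidate_pairs(NODES, modules(NODES, build_edges(NODES))))
```

In a real setting the pairs already known as indications would be filtered out, leaving the remainder as putative repositioning candidates to be checked against literature or clinical trial records.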
|
20
|
Wu C, Schwartz JM, Nenadic G. PathNER: a tool for systematic identification of biological pathway mentions in the literature. BMC Syst Biol 2013; 7 Suppl 3:S2. [PMID: 24555844 PMCID: PMC3852116 DOI: 10.1186/1752-0509-7-s3-s2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
Background Biological pathways are central to many biomedical studies and are frequently discussed in the literature. Several curated databases have been established to collate the knowledge of molecular processes constituting pathways. Yet, there has been little focus on enabling systematic detection of pathway mentions in the literature. Results We developed a tool, named PathNER (Pathway Named Entity Recognition), for the systematic identification of pathway mentions in the literature. PathNER is based on soft dictionary matching and rules, with the dictionary generated from public pathway databases. The rules utilise general pathway-specific keywords, syntactic information and gene/protein mentions. Detection results from both components are merged. On a gold-standard corpus, PathNER achieved an F1-score of 84%. To illustrate its potential, we applied PathNER on a collection of articles related to Alzheimer's disease to identify associated pathways, highlighting cases that can complement an existing manually curated knowledgebase. Conclusions In contrast to existing text-mining efforts that target the automatic reconstruction of pathway details from molecular interactions mentioned in the literature, PathNER focuses on identifying specific named pathway mentions. These mentions can be used to support large-scale curation and pathway-related systems biology applications, as demonstrated in the example of Alzheimer's disease. PathNER is implemented in Java and made freely available online at http://sourceforge.net/projects/pathner/.
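To make the idea of soft dictionary matching combined with a keyword rule more concrete, here is a minimal Python sketch. It is not PathNER itself (which the abstract says is implemented in Java): the three-entry dictionary, the 0.9 similarity threshold, the "pathway" trigger word, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-dictionary; a real one would be generated from pathway databases.
PATHWAY_DICT = [
    "Wnt signaling pathway",
    "MAPK signaling pathway",
    "insulin signaling pathway",
]


def normalize(text):
    """Lowercase, unify British/American spelling, drop punctuation, tokenize."""
    text = text.lower().replace("signalling", "signaling")
    return re.sub(r"[^a-z0-9 ]", " ", text).split()


def find_pathway_mentions(text, dictionary=PATHWAY_DICT, threshold=0.9):
    """Return dictionary entries that softly match spans ending in 'pathway(s)'."""
    tokens = normalize(text)
    entries = [(entry, " ".join(normalize(entry))) for entry in dictionary]
    mentions = []
    for i, token in enumerate(tokens):
        # Keyword rule (assumed): only spans ending in a pathway keyword are candidates.
        if token not in ("pathway", "pathways"):
            continue
        best = None
        for n in range(1, 4):  # try 1 to 3 preceding tokens
            if i - n < 0:
                break
            candidate = " ".join(tokens[i - n:i + 1])
            for entry, normalized_entry in entries:
                score = SequenceMatcher(None, candidate, normalized_entry).ratio()
                if score >= threshold and (best is None or score > best[1]):
                    best = (entry, score)
        if best is not None:
            mentions.append(best[0])
    return mentions


if __name__ == "__main__":
    sentence = ("Prednisolone altered the Wnt signalling pathway and "
                "MAPK signaling pathways in adipocytes.")
    print(find_pathway_mentions(sentence))
```

The approximate match lets spelling variants ("signalling") and plurals ("pathways") resolve to one dictionary entry, while the threshold keeps a generic phrase such as "signaling pathway" from being assigned to a specific pathway.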
|
21
|
CoCiter: an efficient tool to infer gene function by assessing the significance of literature co-citation. PLoS One 2013; 8:e74074. [PMID: 24086311 PMCID: PMC3781068 DOI: 10.1371/journal.pone.0074074] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 12/18/2012] [Accepted: 07/30/2013] [Indexed: 01/17/2023]
Abstract
A routine approach to inferring functions for a gene set is by using function enrichment analysis based on GO, KEGG or other curated terms and pathways. However, such analysis requires the existence of overlapping genes between the query gene set and those annotated by GO/KEGG. Furthermore, GO/KEGG databases only maintain a very restricted vocabulary. Here, we have developed a tool called "CoCiter" based on literature co-citations to address the limitations in conventional function enrichment analysis. Co-citation analysis is widely used in ranking articles and predicting protein-protein interactions (PPIs). Our algorithm can further assess the co-citation significance of a gene set with any other user-defined gene sets, or with free terms. We show that compared with the traditional approaches, CoCiter is a more accurate and flexible function enrichment analysis method. CoCiter is freely available at www.picb.ac.cn/hanlab/cociter/.
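As a rough illustration of what assessing co-citation significance can look like, the sketch below counts the papers that mention both a gene set and a free term, and asks how surprising that overlap is under a hypergeometric null. This is an assumption made for illustration, not the statistic used by CoCiter, which the abstract does not specify; the function names and the toy paper identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) when drawing `draws` papers from `total`, `successes` of which match."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(total - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, draws)


def cocitation_pvalue(gene_papers, term_papers, total_papers):
    """Return (overlap size, p-value) for papers citing both the gene set and the term."""
    overlap = len(gene_papers & term_papers)
    p_value = hypergeom_sf(overlap, total_papers, len(term_papers), len(gene_papers))
    return overlap, p_value


if __name__ == "__main__":
    # Hypothetical identifiers: papers mentioning any gene in the set, and papers mentioning the term.
    gene_papers = {1, 2, 3, 4}
    term_papers = {3, 4, 5}
    print(cocitation_pvalue(gene_papers, term_papers, total_papers=10))
```

Because the test works on paper sets rather than curated annotations, it can be applied to any free term that occurs in the literature, which is the flexibility the abstract contrasts with GO/KEGG enrichment.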
|
22
|
Fleuren WWM, Linssen MML, Toonen EJM, van der Zon GCM, Guigas B, de Vlieg J, Dokter WHA, Ouwens DM, Alkema W. Prednisolone induces the Wnt signalling pathway in 3T3-L1 adipocytes. Arch Physiol Biochem 2013; 119:52-64. [PMID: 23506355 PMCID: PMC3665230 DOI: 10.3109/13813455.2013.774022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/28/2022]
Abstract
Synthetic glucocorticoids are potent anti-inflammatory drugs but show dose-dependent metabolic side effects such as the development of insulin resistance and obesity. The precise mechanisms involved in these glucocorticoid-induced side effects, and especially the participation of adipose tissue in them, are not completely understood. We used a combination of transcriptomics, antibody arrays and bioinformatics approaches to characterize prednisolone-induced alterations in gene expression and adipokine secretion, which could underlie metabolic dysfunction in 3T3-L1 adipocytes. Several pathways, including cytokine signalling, Akt signalling, and Wnt signalling were found to be regulated at multiple levels, showing that these processes are targeted by prednisolone. These results suggest that mechanisms by which prednisolone induces insulin resistance include dysregulation of Wnt signalling and immune response processes. These pathways may provide interesting targets for the development of improved glucocorticoids.
Affiliation(s)
- Wilco W. M. Fleuren: CDD, CMBI, NCMLS, Radboud University Medical Centre, Nijmegen, The Netherlands; Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Margot M. L. Linssen: Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, The Netherlands
- Erik J. M. Toonen: Department of Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands
- Bruno Guigas: Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, The Netherlands; Department of Parasitology, Leiden University Medical Center, Leiden, The Netherlands
- Jacob de Vlieg: CDD, CMBI, NCMLS, Radboud University Medical Centre, Nijmegen, The Netherlands; Netherlands eScience Center, Amsterdam, The Netherlands
- D. Margriet Ouwens: Institute of Clinical Biochemistry and Pathobiochemistry, German Diabetes Center, Düsseldorf, Germany; Department of Endocrinology, Ghent University Hospital, Ghent, Belgium
- Wynand Alkema: CDD, CMBI, NCMLS, Radboud University Medical Centre, Nijmegen, The Netherlands
|
23
|
Li C, Liakata M, Rebholz-Schuhmann D. Biological network extraction from scientific literature: state of the art and challenges. Brief Bioinform 2013; 15:856-77. [PMID: 23434632 DOI: 10.1093/bib/bbt006] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but especially the full integration of relevant information from the scientific literature is still an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature by making use of rich linguistic features such as full text parsing, to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not fully solved and are required to further improve the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss challenges involved and identify directions for future research.
|
24
|
Fleuren WWM, Toonen EJM, Verhoeven S, Frijters R, Hulsen T, Rullmann T, van Schaik R, de Vlieg J, Alkema W. Identification of new biomarker candidates for glucocorticoid induced insulin resistance using literature mining. BioData Min 2013; 6:2. [PMID: 23379763 PMCID: PMC3577498 DOI: 10.1186/1756-0381-6-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/08/2012] [Accepted: 01/02/2013] [Indexed: 02/08/2023]
Abstract
BACKGROUND Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, usage is limited because of metabolic side-effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid induced insulin resistance, it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and thus are a good resource to study these effects. RESULTS We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that already are considered markers of GC induced IR, such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC induced IR. CONCLUSIONS With this approach we are able to construct a robust informative literature network of insulin resistance related genes that gave new insights to better understand the mechanisms behind GC induced IR. The method has been set up in a generic way so it can be applied to a wide variety of disease networks.
Affiliation(s)
- Wilco WM Fleuren: Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands; Netherlands Bioinformatics Centre (NBIC), P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Erik JM Toonen: Department of Medicine, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Raoul Frijters: Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands; Present address: Rijk Zwaan Nederland BV, Fijnaart, The Netherlands
- Tim Hulsen: Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands; Present address: Philips Research Europe, Eindhoven, The Netherlands
- Jacob de Vlieg: Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands; Netherlands eScience Center, Amsterdam, The Netherlands
- Wynand Alkema: Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands; Present address: NIZO Food Research BV, Ede, The Netherlands
|
25
|
Molecular targets for 17α-ethynyl-5-androstene-3β,7β,17β-triol, an anti-inflammatory agent derived from the human metabolome. PLoS One 2012; 7:e32147. [PMID: 22384159 PMCID: PMC3286445 DOI: 10.1371/journal.pone.0032147] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/21/2011] [Accepted: 01/24/2012] [Indexed: 01/28/2023]
Abstract
HE3286, 17α-ethynyl-5-androstene-3β,7β,17β-triol, is a novel synthetic compound related to the endogenous sterol 5-androstene-3β,7β,17β-triol (β-AET), a metabolite of the abundant adrenal steroid dehydroepiandrosterone (DHEA). HE3286 has shown efficacy in clinical studies in impaired glucose tolerance and type 2 diabetes, and in in vivo models of types 1 and 2 diabetes, autoimmunity, and inflammation. Proteomic analysis of solid-phase HE3286-bound bead affinity experiments, using extracts from RAW 264.7 mouse macrophage cells, identified 26 binding partners. Network analysis revealed associations of these HE3286 target proteins with nodes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for type 2 diabetes, insulin, adipokine, and adipocyte signaling. Binding partners included low density lipoprotein receptor-related protein (Lrp1), an endocytic receptor; mitogen activated protein kinases 1 and 3 (Mapk1, Mapk3), protein kinases involved in inflammation signaling pathways; ribosomal protein S6 kinase alpha-3 (Rps6ka3), an intracellular regulatory protein; sirtuin-2 (Sirt2); and 17β-hydroxysteroid dehydrogenase 1 (Hsd17β4), a sterol metabolizing enzyme.
|
26
|
Senger C, Grüning BA, Erxleben A, Döring K, Patel H, Flemming S, Merfort I, Günther S. Mining and evaluation of molecular relationships in literature. Bioinformatics 2012; 28:709-14. [DOI: 10.1093/bioinformatics/bts026] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
|