101
Abstract
The current Ebola virus epidemic may provide some suggestions of how we can better prepare for the next pathogen outbreak. We propose several cost-effective steps that could be taken that would impact the discovery and use of small molecule therapeutics, including: (1) text mining the literature, (2) having patent assignees and/or inventors openly declare their relevant filings, (3) commoditizing reagents and assays, (4) using manual curation to enhance database links, (5) engaging database and curation teams, (6) considering open science approaches, (7) adapting the "box" model for shareable reference compounds, and (8) involving the physician's perspective.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, USA
- Christopher Southan
- IUPHAR/BPS Guide to PHARMACOLOGY, Centre for Integrative Physiology, University of Edinburgh, Hugh Robson Building, Edinburgh, EH8 9XD, UK
- Megan Coffee
- Center for Infectious Diseases and Emergency Readiness, University of California at Berkeley, 1918 University Ave, Berkeley, CA, 94704, USA
102
Bravo À, Piñero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 2015; 16:55. [PMID: 25886734 PMCID: PMC4466840 DOI: 10.1186/s12859-015-0472-9] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Received: 07/24/2014] [Accepted: 01/19/2015] [Indexed: 11/23/2022] Open
Abstract
Background Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. Results By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. Conclusions BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text-mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.
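The comparison between text-mined and expert-curated associations that this abstract describes can be illustrated with a small set computation. The sketch below is only an illustration, not BeFree's implementation: the function name `curated_fraction`, the tuple representation, and the example gene-disease pairs are all hypothetical.

```python
from typing import Set, Tuple

Association = Tuple[str, str]  # (gene, disease); assumed representation


def curated_fraction(text_mined: Set[Association], curated: Set[Association]) -> float:
    """Return the share of text-mined associations that also appear in a curated set."""
    if not text_mined:
        return 0.0
    return len(text_mined & curated) / len(text_mined)


# Made-up example data, for illustration only.
mined = {("GENE_A", "depression"), ("GENE_B", "depression"),
         ("GENE_C", "asthma"), ("GENE_D", "asthma")}
expert = {("GENE_A", "depression"), ("GENE_E", "asthma")}

print(curated_fraction(mined, expert))  # 0.25
novel = mined - expert  # associations absent from the curated set, i.e. candidates for review
```

Plain sets keep the overlap and the "novel" remainder explicit; a real analysis would also need identifier normalization before comparing pairs, which is left out here.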
Affiliation(s)
- Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Janet Piñero
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Núria Queralt-Rosinach
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Michael Rautschka
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
103
Pérez-Pérez M, Glez-Peña D, Fdez-Riverola F, Lourenço A. Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects. Computer Methods and Programs in Biomedicine 2015; 118:242-251. [PMID: 25480679 DOI: 10.1016/j.cmpb.2014.11.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/24/2014] [Revised: 10/24/2014] [Accepted: 11/18/2014] [Indexed: 06/04/2023]
Abstract
BACKGROUND AND OBJECTIVES Document annotation is a key task in the development of Text Mining methods and applications. High quality annotated corpora are invaluable, but their preparation requires a considerable amount of resources and time. Although the existing annotation tools offer good user interaction interfaces to domain experts, project management and quality control abilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects, and to evaluate annotation quality throughout the project life cycle. METHODS At the core, Marky is a Web application based on the open source CakePHP framework. User interface relies on HTML5 and CSS3 technologies. Rangy library assists in browser-independent implementation of common DOM range and selection tasks, and Ajax and JQuery technologies are used to enhance user-system interaction. RESULTS Marky grants solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work over documents as usual, but all the annotations made are saved by the tracking system and may be further compared. So, the project administrator is able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. CONCLUSIONS Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similar visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky.
Affiliation(s)
- Martín Pérez-Pérez
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain
- Daniel Glez-Peña
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain
- Florentino Fdez-Riverola
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain
- Anália Lourenço
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal.
104
Xie B, Ding Q, Wu D. Text Mining on Big and Complex Biomedical Literature. Big Data Analytics in Bioinformatics and Healthcare 2015. [DOI: 10.4018/978-1-4666-6611-5.ch002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
Driven by the rapidly advancing techniques and increasing interests in biology and medicine, about 2,000 to 4,000 references are added daily to MEDLINE, the US national biomedical bibliographic database. Even for a specific research topic, extracting useful and comprehensive information out of the huge literature data pool is challenging. Text mining techniques become extremely useful when dealing with the abundant biomedical information and they have been applied to various areas in the realm of biomedical research. Instead of providing a brief overview of all text mining techniques and every major biomedical text mining application, this chapter explores in-depth the microRNA profiling area and related text mining tools. As an illustrative example, one rule-based text mining system developed by the authors is discussed in detail. This chapter also includes the discussion of the challenges and potential research areas in biomedical text mining.
105
Machado CM, Rebholz-Schuhmann D, Freitas AT, Couto FM. The semantic web in translational medicine: current applications and future directions. Brief Bioinform 2015; 16:89-103. [PMID: 24197933 PMCID: PMC4293377 DOI: 10.1093/bib/bbt079] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/22/2013] [Accepted: 10/08/2013] [Indexed: 11/14/2022] Open
Abstract
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice.
Affiliation(s)
- Catia M. Machado
- Corresponding author. Catia M. Machado, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal and Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Universidade de Lisboa, Portugal.
106
Nim HT, Boyd SE, Rosenthal NA. Systems approaches in integrative cardiac biology: illustrations from cardiac heterocellular signalling studies. Progress in Biophysics and Molecular Biology 2014; 117:69-77. [PMID: 25499442 DOI: 10.1016/j.pbiomolbio.2014.11.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/20/2014] [Revised: 11/26/2014] [Accepted: 11/28/2014] [Indexed: 12/27/2022]
Abstract
Understanding the complexity of cardiac physiology requires system-level studies of multiple cardiac cell types. Frequently, however, the end result of published research lacks the detail of the collaborative and integrative experimental design process, and the underlying conceptual framework. We review the recent progress in systems modelling and omics analysis of the heterocellular heart environment through complementary forward and inverse approaches, illustrating these conceptual and experimental frameworks with case studies from our own research program. The forward approach begins by collecting curated information from the niche cardiac biology literature, and connecting the dots to form mechanistic network models that generate testable system-level predictions. The inverse approach starts from the vast pool of public omics data in recent cardiac biological research, and applies bioinformatics analysis to produce novel candidates for further investigation. We also discuss the possibility of combining these two approaches into a hybrid framework, together with the benefits and challenges. These interdisciplinary research frameworks illustrate the interplay between computational models, omics analysis, and wet lab experiments, which holds the key to making real progress in improving human cardiac wellbeing.
Affiliation(s)
- Hieu T Nim
- Systems Biology Institute (SBI) Australia, Level 1, Building 75, Monash University, VIC 3800, Australia; Australian Regenerative Medicine Institute, Level 1, Building 75, Monash University, VIC 3800, Australia.
- Sarah E Boyd
- Systems Biology Institute (SBI) Australia, Level 1, Building 75, Monash University, VIC 3800, Australia; Australian Regenerative Medicine Institute, Level 1, Building 75, Monash University, VIC 3800, Australia
- Nadia A Rosenthal
- Australian Regenerative Medicine Institute, Level 1, Building 75, Monash University, VIC 3800, Australia
107
Abstract
The first ncRNA found was an alanine tRNA in baker's yeast, and the first detected microRNAs (miRNAs) promoted ncRNA research to a whole new level. Research on ncRNAs in animals has focused on the medical field, while plant scientists are more concerned with improving agronomic traits. In 2010, we constructed a plant miRNA database named PMRD to meet the demand for miRNA research in plants. To provide a way to do fundamental research on plant ncRNAs and take full advantage of tremendous public resources, we designed an updated platform called plant ncRNA database (PNRD) based on its predecessor PMRD, which is accessible at http://structuralbiology.cau.edu.cn/PNRD. We collected a total of 25739 entries of 11 different types of ncRNAs from 150 plant species. Targets of miRNAs were extended to 178138 pairs in 46 species, while the number of miRNA expression profiles reached 35. Improvements in PNRD include not only larger amounts of data but also better service, such as a more user-friendly interface, more multifunctional and browsing options and more background data for users to download. We also integrated currently prevalent technologies and toolkits to strengthen the capability of the database and provide a one-stop service for scientific users.
Affiliation(s)
- Xin Yi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhenhai Zhang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yi Ling
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
108
Jamieson DG, Moss A, Kennedy M, Jones S, Nenadic G, Robertson DL, Sidders B. The pain interactome: connecting pain-specific protein interactions. Pain 2014; 155:2243-52. [PMID: 24978826 PMCID: PMC4247380 DOI: 10.1016/j.pain.2014.06.020] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 04/18/2014] [Revised: 06/13/2014] [Accepted: 06/23/2014] [Indexed: 11/29/2022]
Abstract
Understanding the molecular mechanisms associated with disease is a central goal of modern medical research. As such, many thousands of experiments have been published that detail individual molecular events that contribute to a disease. Here we use a semi-automated text mining approach to accurately and exhaustively curate the primary literature for chronic pain states. In so doing, we create a comprehensive network of 1,002 contextualized protein-protein interactions (PPIs) specifically associated with pain. The PPIs form a highly interconnected and coherent structure, and the resulting network provides an alternative to those derived from connecting genes associated with pain using interactions that have not been shown to occur in a painful state. We exploit the contextual data associated with our interactions to analyse subnetworks specific to inflammatory and neuropathic pain, and to various anatomical regions. Here, we identify potential targets for further study and several drug-repurposing opportunities. Finally, the network provides a framework for the interpretation of new data within the field of pain.
Affiliation(s)
- Daniel G Jamieson
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK; Computer Science, Faculty of Engineering and Physical Sciences, University of Manchester, Manchester, UK
- Andrew Moss
- Neusentis, Pfizer, Worldwide Research & Development, Cambridge, UK
- Michael Kennedy
- Neusentis, Pfizer, Worldwide Research & Development, Cambridge, UK
- Sherrie Jones
- Cancer Research UK Manchester Institute, University of Manchester, Manchester, UK
- Goran Nenadic
- Computer Science, Faculty of Engineering and Physical Sciences, University of Manchester, Manchester, UK; Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- David L Robertson
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK
- Ben Sidders
- Neusentis, Pfizer, Worldwide Research & Development, Cambridge, UK.
109
Grady CR, Knepper MA, Burg MB, Ferraris JD. Database of osmoregulated proteins in mammalian cells. Physiol Rep 2014; 2:e12180. [PMID: 25355853 PMCID: PMC4254105 DOI: 10.14814/phy2.12180] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 08/08/2014] [Revised: 09/15/2014] [Accepted: 09/29/2014] [Indexed: 11/24/2022] Open
Abstract
Biological information, even in highly specialized fields, is increasing at a volume that no single investigator can assimilate. The existence of this vast knowledge base creates the need for specialized computer databases to store and selectively sort the information. We have developed a manually curated database of the effects of hypertonicity on target proteins. Effects include changes in mRNA abundance and protein abundance, activity, phosphorylation state, binding, and cellular compartment. The biological information used in this database was derived from three research approaches: transcriptomic, proteomic, and reductionist (hypothesis-driven). The data are presented in the form of grammatical triplets consisting of subject, verb phrase, and object. The purpose of this format is to allow the data to be read from left to right as an English sentence. It is readable either by humans or by computers using natural language processing algorithms. An example of a data entry reads "Hypertonicity increases activity of ABL1 in HEK293." This database was created to provide access to a wealth of information on the effects of hypertonicity in a format that can be selectively sorted.
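The subject / verb phrase / object format lends itself to simple machine reading. The following is a minimal sketch, assuming entries follow the shape of the quoted example ("<subject> <verb> <measure> of <object>"). The abstract does not state how the database itself splits a sentence, so the `Triplet` type, the regular expression, and the decision to keep the cell line inside the object are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    subject: str
    verb_phrase: str
    obj: str


# Assumed sentence shape: "<subject> <verb> <measure> of <object ...>."
_PATTERN = re.compile(
    r"^(?P<subject>\S+)\s+(?P<verb_phrase>\S+\s+.+?\s+of)\s+(?P<obj>.+?)\.?$"
)


def parse_triplet(sentence: str) -> Optional[Triplet]:
    """Split a database-style sentence into (subject, verb phrase, object), or return None."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    return Triplet(match["subject"], match["verb_phrase"], match["obj"])


entry = parse_triplet("Hypertonicity increases activity of ABL1 in HEK293.")
print(entry)
```

A pattern this rigid only handles single-word subjects and verb phrases ending in "of"; it is meant to show why a fixed grammatical layout is easy to sort and filter, not to cover every entry type the abstract lists.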
Affiliation(s)
- Cameron R. Grady
- Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Mark A. Knepper
- Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Maurice B. Burg
- Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Joan D. Ferraris
- Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
110
Bui QC, Sloot PMA, van Mulligen EM, Kors JA. A novel feature-based approach to extract drug-drug interactions from biomedical text. Bioinformatics 2014; 30:3365-71. [PMID: 25143286 DOI: 10.1093/bioinformatics/btu557] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Knowledge of drug-drug interactions (DDIs) is crucial for health-care professionals to avoid adverse effects when co-administering drugs to patients. As most newly discovered DDIs are made available through scientific publications, automatic DDI extraction is highly relevant. RESULTS We propose a novel feature-based approach to extract DDIs from text. Our approach consists of three steps. First, we apply text preprocessing to convert input sentences from a given dataset into structured representations. Second, we map each candidate DDI pair from that dataset into a suitable syntactic structure. Based on that, a novel set of features is used to generate feature vectors for these candidate DDI pairs. Third, the obtained feature vectors are used to train a support vector machine (SVM) classifier. When evaluated on two DDI extraction challenge test datasets from 2011 and 2013, our system achieves F-scores of 71.1% and 83.5%, respectively, outperforming any state-of-the-art DDI extraction system. AVAILABILITY AND IMPLEMENTATION The source code is available for academic use at http://www.biosemantics.org/uploads/DDI.zip.
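For readers unfamiliar with the metric, the F-scores quoted in this abstract combine precision and recall into one number. The sketch below assumes the balanced F-score (F1), which is the usual choice for extraction challenges; the abstract does not spell out the variant, and the counts used are hypothetical rather than taken from the paper's evaluation.

```python
def f_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts of extracted DDI pairs, for illustration only.
print(round(f_score(true_pos=80, false_pos=20, false_neg=20), 3))  # 0.8
```

Because the harmonic mean is dominated by the smaller of the two values, a system cannot reach a high F-score by maximizing precision or recall alone.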
Affiliation(s)
- Quoc-Chinh Bui
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Informatics Institute, University of Amsterdam, The Netherlands, Complexity Institute, Nanyang Technological University, Singapore and ITMO University, St. Petersburg, Russian Federation
- Peter M A Sloot
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Informatics Institute, University of Amsterdam, The Netherlands, Complexity Institute, Nanyang Technological University, Singapore and ITMO University, St. Petersburg, Russian Federation
- Erik M van Mulligen
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Informatics Institute, University of Amsterdam, The Netherlands, Complexity Institute, Nanyang Technological University, Singapore and ITMO University, St. Petersburg, Russian Federation
- Jan A Kors
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Informatics Institute, University of Amsterdam, The Netherlands, Complexity Institute, Nanyang Technological University, Singapore and ITMO University, St. Petersburg, Russian Federation
111
Abstract
OBJECTIVES To summarise current research that takes advantage of "Big Data" in health and biomedical informatics applications. METHODS Survey of trends in this work, and exploration of literature describing how large-scale structured and unstructured data sources are being used to support applications from clinical decision making and health policy, to drug design and pharmacovigilance, and further to systems biology and genetics. RESULTS The survey highlights ongoing development of powerful new methods for turning that large-scale, and often complex, data into information that provides new insights into human health, in a range of different areas. Consideration of this body of work identifies several important paradigm shifts that are facilitated by Big Data resources and methods: in clinical and translational research, from hypothesis-driven research to data-driven research, and in medicine, from evidence-based practice to practice-based evidence. CONCLUSIONS The increasing scale and availability of large quantities of health data require strategies for data management, data linkage, and data integration beyond the limits of many existing information systems, and substantial effort is underway to meet those needs. As our ability to make sense of that data improves, the value of the data will continue to increase. Health systems, genetics and genomics, population and public health; all areas of biomedicine stand to benefit from Big Data and the associated technologies.
Affiliation(s)
- F Martin-Sanchez
- Fernando Martin-Sanchez, Health and Biomedical Informatics Centre, The University of Melbourne, Parkville VIC 3010, Australia
112
Titz B, Elamin A, Martin F, Schneider T, Dijon S, Ivanov NV, Hoeng J, Peitsch MC. Proteomics for systems toxicology. Comput Struct Biotechnol J 2014; 11:73-90. [PMID: 25379146 PMCID: PMC4212285 DOI: 10.1016/j.csbj.2014.08.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/26/2022] Open
Abstract
Current toxicology studies frequently lack measurements at molecular resolution to enable a more mechanism-based and predictive toxicological assessment. Recently, a systems toxicology assessment framework has been proposed, which combines conventional toxicological assessment strategies with system-wide measurement methods and computational analysis approaches from the field of systems biology. Proteomic measurements are an integral component of this integrative strategy because protein alterations closely mirror biological effects, such as biological stress responses or global tissue alterations. Here, we provide an overview of the technical foundations and highlight select applications of proteomics for systems toxicology studies. With a focus on mass spectrometry-based proteomics, we summarize the experimental methods for quantitative proteomics and describe the computational approaches used to derive biological/mechanistic insights from these datasets. To illustrate how proteomics has been successfully employed to address mechanistic questions in toxicology, we summarized several case studies. Overall, we provide the technical and conceptual foundation for the integration of proteomic measurements in a more comprehensive systems toxicology assessment framework. We conclude that, owing to the critical importance of protein-level measurements and recent technological advances, proteomics will be an integral part of integrative systems toxicology approaches in the future.
113
Sanghi A, Zaringhalam M, Corcoran CC, Saeed F, Hoffert JD, Sandoval P, Pisitkun T, Knepper MA. A knowledge base of vasopressin actions in the kidney. Am J Physiol Renal Physiol 2014; 307:F747-55. [PMID: 25056354 DOI: 10.1152/ajprenal.00012.2014] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022] Open
Abstract
Biological information is growing at a rapid pace, making it difficult for individual investigators to be familiar with all information that is relevant to their own research. Computers are beginning to be used to extract and curate biological information; however, the complexity of human language used in research papers continues to be a critical barrier to full automation of knowledge extraction. Here, we report a manually curated knowledge base of vasopressin actions in renal epithelial cells that is designed to be readable either by humans or by computer programs using natural language processing algorithms. The knowledge base consists of three related databases accessible at https://helixweb.nih.gov/ESBL/TinyUrls/Vaso_portal.html. One of the component databases reports vasopressin actions on individual proteins expressed in renal epithelia, including effects on phosphorylation, protein abundances, protein translocation from one subcellular compartment to another, protein-protein binding interactions, etc. The second database reports vasopressin actions on physiological measures in renal epithelia, and the third reports specific mRNA species whose abundances change in response to vasopressin. We illustrate the application of the knowledge base by using it to generate a protein kinase network that connects vasopressin binding in collecting duct cells to physiological effects to regulate the water channel protein aquaporin-2.
Affiliation(s)
- Akshay Sanghi
- Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
- Matthew Zaringhalam
- Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
- Callan C Corcoran
- Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
- Fahad Saeed
- Departments of Electrical and Computer Engineering and Computer Science, Western Michigan University, Kalamazoo, Michigan
- Jason D Hoffert
- Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
- Pablo Sandoval
- Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
- Trairak Pisitkun
- Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Mark A Knepper
- Epithelial Systems Biology Laboratory, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland
| |
|
114
|
Kwon D, Kim S, Shin SY, Chatr-aryamontri A, Wilbur WJ. Assisting manual literature curation for protein-protein interactions using BioQRator. Database: The Journal of Biological Databases and Curation 2014; 2014:bau067. [PMID: 25052701 PMCID: PMC4105708 DOI: 10.1093/database/bau067] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Abstract
The time-consuming nature of manual curation and the rapid growth of biomedical literature severely limit the number of articles that database curators can scrutinize and annotate. Hence, semi-automatic tools can be a valid support to increase annotation throughput. Although a handful of curation assistant tools are already available, to date, little has been done to formally evaluate their benefit to biocuration. Moreover, most curation tools are designed for specific problems. Thus, it is not easy to apply an annotation tool for multiple tasks. BioQRator is a publicly available web-based tool for annotating biomedical literature. It was designed to support general tasks, i.e. any task annotating entities and relationships. In the BioCreative IV edition, BioQRator was tailored for protein–protein interaction (PPI) annotation by migrating information from PIE the search. The results obtained from six curators showed that the precision on the top 10 documents doubled with PIE the search compared with PubMed search results. It was also observed that the annotation time for a full PPI annotation task decreased for a beginner-intermediate level annotator. This finding is encouraging because text-mining techniques were not directly involved in the full annotation task and BioQRator can be easily integrated with any text-mining resources. Database URL: http://www.bioqrator.org/
Affiliation(s)
- Dongseop Kwon
- Department of Computer Engineering, Myongji University, Yongin 449-728, South Korea, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA, Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, South Korea and Institute for Research in Immunology and Cancer, Université de Montréal, Montréal QC H3C 3J7, Canada
| | - Sun Kim
- Department of Computer Engineering, Myongji University, Yongin 449-728, South Korea, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA, Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, South Korea and Institute for Research in Immunology and Cancer, Université de Montréal, Montréal QC H3C 3J7, Canada
| | - Soo-Yong Shin
- Department of Computer Engineering, Myongji University, Yongin 449-728, South Korea, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA, Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, South Korea and Institute for Research in Immunology and Cancer, Université de Montréal, Montréal QC H3C 3J7, Canada
| | - Andrew Chatr-aryamontri
- Department of Computer Engineering, Myongji University, Yongin 449-728, South Korea, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA, Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, South Korea and Institute for Research in Immunology and Cancer, Université de Montréal, Montréal QC H3C 3J7, Canada
| | - W John Wilbur
- Department of Computer Engineering, Myongji University, Yongin 449-728, South Korea, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA, Department of Biomedical Informatics, Asan Medical Center, Seoul 138-736, South Korea and Institute for Research in Immunology and Cancer, Université de Montréal, Montréal QC H3C 3J7, Canada
| |
|
115
|
Jung JY, DeLuca TF, Nelson TH, Wall DP. A literature search tool for intelligent extraction of disease-associated genes. J Am Med Inform Assoc 2014; 21:399-405. [PMID: 23999671 PMCID: PMC3994846 DOI: 10.1136/amiajnl-2012-001563] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 12/11/2012] [Revised: 07/15/2013] [Accepted: 08/08/2013] [Indexed: 12/27/2022] Open
Abstract
OBJECTIVE To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. METHODS We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. RESULTS We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. CONCLUSIONS We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
Affiliation(s)
- Jae-Yoon Jung
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Todd F DeLuca
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Tristan H Nelson
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
| | - Dennis P Wall
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
| |
|
116
|
Hoehndorf R, Haendel M, Stevens R, Rebholz-Schuhmann D. Thematic series on biomedical ontologies in JBMS: challenges and new directions. J Biomed Semantics 2014; 5:15. [PMID: 24602198 PMCID: PMC4006457 DOI: 10.1186/2041-1480-5-15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/08/2014] [Accepted: 02/09/2014] [Indexed: 01/08/2023] Open
Abstract
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, Aberystwyth University, Llandinam Building, SY23 3DB Aberystwyth, UK
| | - Melissa Haendel
- OHSU Library and Department of Medical Informatics, Portland, Oregon, USA
- Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Robert Stevens
- School of Computer Science, The University of Manchester, Oxford Road, M13 9PL Manchester, UK
| | - Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
117
|
Staunton L, Clancy T, Tonry C, Hernández B, Ademowo S, Dharsee M, Evans K, Parnell AC, Watson RW, Tasken KA, Pennington SR. Protein Quantification by MRM for Biomarker Validation. Quantitative Proteomics 2014. [DOI: 10.1039/9781782626985-00277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/30/2023]
Abstract
In this chapter we describe how mass spectrometry-based quantitative protein measurements by multiple reaction monitoring (MRM) have opened up the opportunity for the assembly of large panels of candidate protein biomarkers that can be simultaneously validated in large clinical cohorts to identify diagnostic protein biomarker signatures. We outline a workflow in which candidate protein biomarker panels are initially assembled from multiple diverse sources of discovery data, including proteomics and transcriptomics experiments, as well as from candidates found in the literature. Subsequently, the individual candidates in these large panels may be prioritised by application of a range of bioinformatics tools to generate a refined panel for which MRM assays may be developed. We describe a process for MRM assay design and implementation, and illustrate how the data generated from these multiplexed MRM measurements of prioritised candidates may be subjected to a range of statistical tools to create robust biomarker signatures for further clinical validation in large patient sample cohorts. Through this overall approach MRM has the potential to not only support individual biomarker validation but also facilitate the development of clinically useful protein biomarker signatures.
Affiliation(s)
- L. Staunton
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin Dublin 4 Ireland
| | - T. Clancy
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital Norway
| | - C. Tonry
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin Dublin 4 Ireland
| | - B. Hernández
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin Dublin 4 Ireland
| | - S. Ademowo
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin Dublin 4 Ireland
| | - M. Dharsee
- Ontario Cancer Biomarker Network Toronto Ontario M5A 2K3 Canada
| | - K. Evans
- Ontario Cancer Biomarker Network Toronto Ontario M5A 2K3 Canada
| | - A. C. Parnell
- School of Mathematical Sciences, University College Dublin Dublin 4 Ireland
| | - R. W. Watson
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin Dublin 4 Ireland
| | - K. A. Tasken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital Norway
| | - S. R. Pennington
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin Dublin 4 Ireland
| |
|
118
|
Stubben CJ, Challacombe JF. Mining locus tags in PubMed Central to improve microbial gene annotation. BMC Bioinformatics 2014; 15:43. [PMID: 24499370 PMCID: PMC3937057 DOI: 10.1186/1471-2105-15-43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2013] [Accepted: 01/18/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed an R package called pmcXML to automatically mine and extract locus tags from full text, tables and supplements. RESULTS We mined locus tags from 1835 OA publications in ten microbial genomes and extracted tags mentioned in 30,891 sentences in main text and 20,489 rows in tables. We identified locus tag pairs marking the start and end of a region such as an operon or genomic island and expanded these ranges to add another 13,043 tags. We also searched for locus tags in supplementary tables and publications outside the OA subset in Burkholderia pseudomallei K96243 for comparison. There were 168 publications containing 48,470 locus tags and 83% of mentions were from supplementary materials and 9% from publications outside the OA subset. CONCLUSIONS B. pseudomallei locus tags within the full text and tables of OA publications represent only a small fraction of the total mentions in the literature. For microbial genomes with very few functionally characterized proteins, the locus tags mentioned in supplementary tables and within ranges like genomic islands contain the majority of locus tags. Significantly, the functions in the R package provide access to additional resources in the OA subset that are not currently indexed or returned by searching PMC.
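The range-expansion step mentioned in the abstract above (a pair of locus tags marking the start and end of a region is expanded to the tags in between) can be illustrated with a small Python sketch. This is not the pmcXML implementation, which is an R package; the function name `expand_locus_range`, the example tags, and the assumption that tags share a prefix and carry consecutive zero-padded integer suffixes are all hypothetical simplifications.

```python
import re

# Assumed tag shape: a letter/underscore prefix followed by digits, e.g. "BPSL0010".
TAG = re.compile(r"^([A-Za-z_]+)(\d+)$")


def expand_locus_range(start, end):
    """Return every tag from start to end inclusive.

    Assumes both tags share a prefix and that suffixes are consecutive
    integers; real genomes may skip numbers or use letter suffixes.
    """
    m_start, m_end = TAG.match(start), TAG.match(end)
    if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
        raise ValueError("tags must share a prefix and end in digits")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # preserve zero padding
    lo, hi = sorted((int(m_start.group(2)), int(m_end.group(2))))
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]


expanded = expand_locus_range("BPSL0010", "BPSL0013")
```

A production version would more likely look the endpoints up in the genome annotation and take the tags between them in genomic order, since numbering is not guaranteed to be contiguous.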
Affiliation(s)
- Chris J Stubben
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Jean F Challacombe
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| |
|
119
|
Pesquita C, Ferreira JD, Couto FM, Silva MJ. The epidemiology ontology: an ontology for the semantic annotation of epidemiological resources. J Biomed Semantics 2014; 5:4. [PMID: 24438387 PMCID: PMC3926306 DOI: 10.1186/2041-1480-5-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/21/2013] [Accepted: 12/24/2013] [Indexed: 11/29/2022] Open
Abstract
Background Epidemiology is a data-intensive and multi-disciplinary subject, where data integration, curation and sharing are becoming increasingly relevant, given its global context and time constraints. The semantic annotation of epidemiology resources is a cornerstone to effectively support such activities. Although several ontologies cover some of the subdomains of epidemiology, we identified a lack of semantic resources for epidemiology-specific terms. This paper addresses this need by proposing the Epidemiology Ontology (EPO) and by describing its integration with other related ontologies into a semantic enabled platform for sharing epidemiology resources. Results The EPO follows the OBO Foundry guidelines and uses the Basic Formal Ontology (BFO) as an upper ontology. The first version of EPO models several epidemiology and demography parameters as well as transmission of infection processes, participants and related procedures. It currently has nearly 200 classes and is designed to support the semantic annotation of epidemiology resources and data integration, as well as information retrieval and knowledge discovery activities. Conclusions EPO is under active development and is freely available at https://code.google.com/p/epidemiology-ontology/. We believe that the annotation of epidemiology resources with EPO will help researchers to gain a better understanding of global epidemiological events by enhancing data integration and sharing.
|
120
|
Graph theory enables drug repurposing--how a mathematical model can drive the discovery of hidden mechanisms of action. PLoS One 2014; 9:e84912. [PMID: 24416311 PMCID: PMC3886994 DOI: 10.1371/journal.pone.0084912] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/20/2013] [Accepted: 11/28/2013] [Indexed: 12/21/2022] Open
Abstract
We introduce a methodology to efficiently exploit natural-language expressed biomedical knowledge for repurposing existing drugs towards diseases for which they were not initially intended. Leveraging on developments in Computational Linguistics and Graph Theory, a methodology is defined to build a graph representation of knowledge, which is automatically analysed to discover hidden relations between any drug and any disease: these relations are specific paths among the biomedical entities of the graph, representing possible Modes of Action for any given pharmacological compound. We propose a measure for the likeliness of these paths based on a stochastic process on the graph. This measure depends on the abundance of indirect paths between a peptide and a disease, rather than solely on the strength of the shortest path connecting them. We provide real-world examples, showing how the method successfully retrieves known pathophysiological Mode of Action and finds new ones by meaningfully selecting and aggregating contributions from known bio-molecular interactions. Applications of this methodology are presented, and prove the efficacy of the method for selecting drugs as treatment options for rare diseases.
|
121
|
Heinzel A, Mühlberger I, Fechete R, Mayer B, Perco P. Functional molecular units for guiding biomarker panel design. Methods Mol Biol 2014; 1159:109-133. [PMID: 24788264 DOI: 10.1007/978-1-4939-0709-0_7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 06/03/2023]
Abstract
The field of biomarker research has experienced a major boost in recent years, and the number of publications on biomarker studies evaluating given, but also proposing novel biomarker candidates is increasing rapidly for numerous clinically relevant disease areas. However, individual markers often lack sensitivity and specificity in the clinical context, resting essentially on the intra-individual phenotype variability hampering sensitivity, or on assessing more general processes downstream of the causative molecular events characterizing a disease term, in consequence impairing disease specificity. The trend to circumvent these shortcomings goes towards utilizing multimarker panels, thus combining the strength of individual markers to further enhance performance regarding both sensitivity and specificity. A way of identifying the optimal composition of individual markers in a panel approach is to pick each marker as representative for a specific pathophysiological (mechanistic) process relevant for the disease under investigation, hence resulting in a multimarker panel for covering the set of pathophysiological processes underlying the frequently multifactorial composition of a clinical phenotype. Here we outline a procedure of identifying such sets of disease-specific pathophysiological processes (units) delineated on the basis of disease-associated molecular feature lists derived from literature mining as well as aggregated, publicly available Omics profiling experiments. With such molecular units in hand, providing an improved reflection of a specific clinical phenotype, biomarker candidates can then be assigned to or novel candidates are to be selected from these units, subsequently resulting in a multimarker panel promising improved accuracy in disease diagnosis as well as prognosis.
Affiliation(s)
- Andreas Heinzel
- emergentec biodevelopment GmbH, Gersthofer Strasse 29-31, 1180, Vienna, Austria
| |
|
123
|
Pavlopoulos GA, Promponas VJ, Ouzounis CA, Iliopoulos I. Biological information extraction and co-occurrence analysis. Methods Mol Biol 2014; 1159:77-92. [PMID: 24788262 DOI: 10.1007/978-1-4939-0709-0_5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]
Abstract
Nowadays, it is possible to identify terms corresponding to biological entities within passages in biomedical text corpora: critically, their potential relationships then need to be detected. These relationships are typically detected by co-occurrence analysis, revealing associations between bioentities through their coexistence in single sentences and/or entire abstracts. These associations implicitly define networks, whose nodes represent terms/bioentities/concepts being connected by relationship edges; edge weights might represent confidence for these semantic connections. This chapter provides a review of current methods for co-occurrence analysis, focusing on data storage, analysis, and representation. We highlight scenarios of these approaches implemented by useful tools for information extraction and knowledge inference in the field of systems biology. We illustrate the practical utility of two online resources providing services of this type, namely STRING and BioTextQuest, concluding with a discussion of current challenges and future perspectives in the field.
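As a rough illustration of the sentence-level co-occurrence analysis summarized in the abstract above, the following Python sketch counts how often pairs of known terms appear in the same sentence and treats the counts as edge weights. It is a minimal sketch under stated assumptions, not the method of STRING or BioTextQuest: the function name `cooccurrence_edges`, the naive sentence splitter, the case-insensitive substring matching, and the example text are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, terms):
    """Count, for each pair of terms, the sentences in which both occur."""
    edges = Counter()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Case-insensitive substring match; real systems use entity recognition.
        present = sorted(t for t in terms if t.lower() in lowered)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # edge weight = number of shared sentences
    return edges


text = (
    "AQP2 is regulated by vasopressin. "
    "Vasopressin binds AVPR2. "
    "AQP2 trafficking depends on vasopressin and AVPR2."
)
edges = cooccurrence_edges(text, {"AQP2", "vasopressin", "AVPR2"})
```

Here the keys of `edges` are the network's edges and the counts serve as a crude confidence weight; a fuller treatment would normalize the counts against each term's overall frequency.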
Affiliation(s)
- Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete Medical School, Heraklion, 71110, Greece
| |
|
124
|
Abstract
While the genomics-derived discoveries promise benefits to basic research and health care, the speed and affordability of sequencing following recent technological advances has further aggravated the data deluge. Seamless integration of the ever-increasing clinical, genomic, and experimental data and efficient mining for knowledge extraction, delivering actionable insight and generating testable hypotheses are therefore critical for the needs of biomedical research. For instance, high-throughput techniques are frequently applied to detect disease candidate genes. Experimental validation of these candidates however is both time-consuming and expensive. Hence, several computational approaches based on literature and data mining have been developed to identify the most promising candidates for follow-up studies. Based on "guilt by association" principle, most of these methods use prior knowledge about a disease of interest to discover and rank novel candidate genes. In this chapter, we provide a brief overview of recent advances made in literature- and data-mining-based approaches for candidate gene prioritization. As a case study, we focus on a Web-based computational approach that uses integrated heterogeneous data sources including gene-literature associations for ranking disease candidate genes and explain how to run typical queries using this system.
|
125
|
Imboden M, Probst-Hensch NM. Biobanking across the phenome - at the center of chronic disease research. BMC Public Health 2013; 13:1094. [PMID: 24274136 PMCID: PMC4222669 DOI: 10.1186/1471-2458-13-1094] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/27/2012] [Accepted: 09/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background Recognized public health relevant risk factors such as obesity, physical inactivity, smoking or air pollution are common to many non-communicable diseases (NCDs). NCDs cluster and co-morbidities increase in parallel to age. Pleiotropic genes and genetic variants have been identified by genome-wide association studies (GWAS) linking NCD entities hitherto thought to be distant in etiology. These different lines of evidence suggest that NCD disease mechanisms are in part shared. Discussion Identification of common exogenous and endogenous risk patterns may promote efficient prevention, an urgent need in the light of the global NCD epidemic. The prerequisite to investigate causal risk patterns including biologic, genetic and environmental factors across different NCDs are well characterized cohorts with associated biobanks. Prospectively collected data and biospecimen from subjects of various age, sociodemographic, and cultural groups, both healthy and affected by one or more NCD, are essential for exploring biologic mechanisms and susceptibilities interlinking different environmental and lifestyle exposures, co-morbidities, as well as cellular senescence and aging. A paradigm shift in the research activities can currently be observed, moving from focused investigations on the effect of a single risk factor on an isolated health outcome to a more comprehensive assessment of risk patterns and a broader phenome approach. Though important methodological and analytical challenges need to be resolved, the ongoing international efforts to establish large-scale population-based biobank cohorts are a critical basis for moving NCD disease etiology forward. Summary Future epidemiologic and public health research should aim at sustaining a comprehensive systems view on health and disease. 
The political and public discussions about the utilitarian aspect of investing in and contributing to cohort and biobank research are essential and are indirectly linked to the achievement of public health programs effectively addressing the global NCD epidemic.
Affiliation(s)
- Medea Imboden
- Swiss Tropical and Public Health Institute, Basel, Switzerland.
| |
|
126
|
van Haagen HHHBM, 't Hoen PAC, Mons B, Schultes EA. Generic information can retrieve known biological associations: implications for biomedical knowledge discovery. PLoS One 2013; 8:e78665. [PMID: 24260124 PMCID: PMC3834066 DOI: 10.1371/journal.pone.0078665] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/25/2013] [Accepted: 09/13/2013] [Indexed: 02/01/2023] Open
Abstract
Motivation Weighted semantic networks built from text-mined literature can be used to retrieve known protein-protein or gene-disease associations, and have been shown to anticipate associations years before they are explicitly stated in the literature. Our text-mining system recognizes over 640,000 biomedical concepts: some are specific (i.e., names of genes or proteins) others generic (e.g., ‘Homo sapiens’). Generic concepts may play important roles in automated information retrieval, extraction, and inference but may also result in concept overload and confound retrieval and reasoning with low-relevance or even spurious links. Here, we attempted to optimize the retrieval performance for protein-protein interactions (PPI) by filtering generic concepts (node filtering) or links to generic concepts (edge filtering) from a weighted semantic network. First, we defined metrics based on network properties that quantify the specificity of concepts. Then using these metrics, we systematically filtered generic information from the network while monitoring retrieval performance of known protein-protein interactions. We also systematically filtered specific information from the network (inverse filtering), and assessed the retrieval performance of networks composed of generic information alone. Results Filtering generic or specific information induced a two-phase response in retrieval performance: initially the effects of filtering were minimal but beyond a critical threshold network performance suddenly drops. Contrary to expectations, networks composed exclusively of generic information demonstrated retrieval performance comparable to unfiltered networks that also contain specific concepts. Furthermore, an analysis using individual generic concepts demonstrated that they can effectively support the retrieval of known protein-protein interactions. 
For instance, the concept “binding” is indicative for PPI retrieval and the concept “mutation abnormality” is indicative for gene-disease associations. Conclusion Generic concepts are important for information retrieval and cannot be removed from semantic networks without negative impact on retrieval performance.
Affiliation(s)
| | - Peter A. C. 't Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Barend Mons
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Erik A. Schultes
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| |
|
127
|
Lee HJ, Shim SH, Song MR, Lee H, Park JC. CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations. BMC Bioinformatics 2013; 14:323. [PMID: 24225062 PMCID: PMC3833657 DOI: 10.1186/1471-2105-14-323] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 05/23/2013] [Accepted: 11/05/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In order to access the large amount of information in biomedical literature about genes implicated in various cancers both efficiently and accurately, the aid of text mining (TM) systems is invaluable. Current TM systems do target either gene-cancer relations or biological processes involving genes and cancers, but the former type produces information not comprehensive enough to explain how a gene affects a cancer, and the latter does not provide a concise summary of gene-cancer relations. RESULTS In this paper, we present a corpus for the development of TM systems that are specifically targeting gene-cancer relations but are still able to capture complex information in biomedical sentences. We describe CoMAGC, a corpus with multi-faceted annotations of gene-cancer relations. In CoMAGC, a piece of annotation is composed of four semantically orthogonal concepts that together express 1) how a gene changes, 2) how a cancer changes and 3) the causality between the gene and the cancer. The multi-faceted annotations are shown to have high inter-annotator agreement. In addition, we show that the annotations in CoMAGC allow us to infer the prospective roles of genes in cancers and to classify the genes into three classes according to the inferred roles. We encode the mapping between multi-faceted annotations and gene classes into 10 inference rules. The inference rules produce results with high accuracy as measured against human annotations. CoMAGC consists of 821 sentences on prostate, breast and ovarian cancers. Currently, we deal with changes in gene expression levels among other types of gene changes. The corpus is available at http://biopathway.org/CoMAGC under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0). CONCLUSIONS The corpus will be an important resource for the development of advanced TM systems on gene-cancer relations.
Affiliation(s)
| | - Jong C Park
- Department of Computer Science, KAIST, 291 Daehak-ro, Daejeon, Republic of Korea.
| |
|
128
|
Rebholz-Schuhmann D, Grabmüller C, Kavaliauskas S, Croset S, Woollard P, Backofen R, Filsell W, Clark D. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources. Drug Discov Today 2013; 19:882-9. [PMID: 24201223 DOI: 10.1016/j.drudis.2013.10.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/17/2012] [Revised: 09/24/2013] [Accepted: 10/28/2013] [Indexed: 10/26/2022]
Abstract
In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.
Affiliation(s)
- Dietrich Rebholz-Schuhmann
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Computerlinguistik, Universität Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland.
- Christoph Grabmüller
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvestras Kavaliauskas
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Samuel Croset
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Woollard
- GlaxoSmithKline, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Rolf Backofen
- Albert-Ludwigs-University Freiburg, Fahnenbergplatz, D-79085 Freiburg, Germany
- Wendy Filsell
- Unilever R&D, Colworth Science Park, Sharnbrook MK44 1LQ, UK
- Dominic Clark
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
129
|
Rebholz-Schuhmann D, Kafkas S, Kim JH, Li C, Jimeno Yepes A, Hoehndorf R, Backofen R, Lewin I. Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources. J Biomed Semantics 2013; 4:28. [PMID: 24112383 PMCID: PMC4021975 DOI: 10.1186/2041-1480-4-28] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/21/2012] [Accepted: 09/11/2013] [Indexed: 11/10/2022] Open
Abstract
Motivation The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: Terminological and lexical resources deliver the term candidates into PGN tagging solutions and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard and for this reason it is worth exploring, how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource in their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs. Results In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions certainly perform best, if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora and – on the other hand – the profiles of the false positive mistakes characterize the tagging solutions. 
LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results and, in contrast to ML-Tag solutions, also provide concept identifiers from a knowledge source. Conclusion The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard.
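As an illustration of the kind of evaluation this abstract describes, the following minimal Python sketch computes exact-match precision, recall and F1-measure for a tagger output against a gold standard, before and after a false positive filter. The span representation, the toy data and the `flagged` set are hypothetical and are not taken from the paper; the paper's own filtering method is not reproduced here.

```python
from typing import Set, Tuple

# A mention is identified by (document id, start offset, end offset).
# Exact matching means a prediction only counts if all three agree.
Span = Tuple[str, int, int]


def exact_match_scores(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy gold standard and lexicon-based tagger output (hypothetical data).
gold = {("doc1", 0, 4), ("doc1", 10, 15), ("doc2", 3, 8)}
lexicon_hits = {("doc1", 0, 4), ("doc1", 10, 15), ("doc1", 20, 23), ("doc2", 30, 33)}

# Hypothetical false positive filter: spans flagged as ambiguous terms are dropped.
flagged = {("doc1", 20, 23)}
filtered_hits = lexicon_hits - flagged

unfiltered = exact_match_scores(gold, lexicon_hits)
filtered = exact_match_scores(gold, filtered_hits)
print("unfiltered P/R/F1:", unfiltered)
print("filtered   P/R/F1:", filtered)
```

In this toy case the filter removes one spurious span, so precision rises while recall stays the same, which mirrors the behaviour the abstract reports for LexTag solutions with false positive filtering.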
|
130
|
Rebholz-Schuhmann D, Kim JH, Yan Y, Dixit A, Friteyre C, Hoehndorf R, Backofen R, Lewin I. Evaluation and cross-comparison of lexical entities of biological interest (LexEBI). PLoS One 2013; 8:e75185. [PMID: 24124474 PMCID: PMC3790750 DOI: 10.1371/journal.pone.0075185] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/20/2012] [Accepted: 08/14/2013] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical "term space" (the "Lexeome"), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). RESULT This study compiles a resource for lexical terms of biomedical interest in a standard format (called "LexEBI"), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. CONCLUSION LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). 
The resource provides the disease terms as open source content, and fully interlinks terms across resources.
Affiliation(s)
- Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Zürich, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Jee-Hyub Kim
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Ying Yan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Abhishek Dixit
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Caroline Friteyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, United Kingdom
- Rolf Backofen
- Albert-Ludwigs-University Freiburg, Fahnenbergplatz, Freiburg, Germany
- Ian Lewin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
|
131
|
Fluck J, Hofmann-Apitius M. Text mining for systems biology. Drug Discov Today 2013; 19:140-4. [PMID: 24070668 DOI: 10.1016/j.drudis.2013.09.012] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 04/12/2013] [Revised: 09/05/2013] [Accepted: 09/12/2013] [Indexed: 01/08/2023]
Abstract
Scientific communication in biomedicine is, by and large, still text based. Text mining technologies for the automated extraction of useful biomedical information from unstructured text that can be directly used for systems biology modelling have been substantially improved over the past few years. In this review, we underline the importance of named entity recognition and relationship extraction as fundamental approaches that are relevant to systems biology. Furthermore, we emphasize the role of publicly organized scientific benchmarking challenges that reflect the current status of text-mining technology and are important in moving the entire field forward. Given further interdisciplinary development of systems biology-orientated ontologies and training corpora, we expect a steadily increasing impact of text-mining technology on systems biology in the future.
Affiliation(s)
- Juliane Fluck
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53754 Sankt Augustin, Germany; Bonn-Aachen International Center for Information Technology (B-IT), Dahlmannstraße 2, 53113 Bonn, Germany.
|
132
|
Rebholz-Schuhmann D, Kafkas S, Kim JH, Jimeno Yepes A, Lewin I. Monitoring named entity recognition: the League Table. J Biomed Semantics 2013; 4:19. [PMID: 24034148 PMCID: PMC4015903 DOI: 10.1186/2041-1480-4-19] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/15/2012] [Accepted: 07/25/2013] [Indexed: 01/13/2023] Open
Abstract
Background Named entity recognition (NER) is an essential step in automatic text processing pipelines. A number of solutions have been presented and evaluated against gold standard corpora (GSC). The benchmarking against GSCs is crucial, but left to the individual researcher. Herewith we present a League Table web site, which benchmarks NER solutions against selected public GSCs, maintains a ranked list and archives the annotated corpus for future comparisons. Results The web site enables access to the different GSCs in a standardized format (IeXML). Upon submission, the user has to describe the specification of the solution used and then uploads the annotated corpus for evaluation. The performance of the system is measured against one or more GSCs and the results are then added to the web site (“League Table”). It currently displays the results from publicly available NER solutions from the Whatizit infrastructure for future comparisons. Conclusion The League Table enables the evaluation of NER solutions in a standardized infrastructure and monitors the results long-term. For access please go to http://wwwdev.ebi.ac.uk/Rebholz-srv/calbc/assessmentGSC/. Contact: rebholz@ifi.uzh.ch.
|
133
|
Vos R, Aarts S, van Mulligen E, Metsemakers J, van Boxtel MP, Verhey F, van den Akker M. Finding potentially new multimorbidity patterns of psychiatric and somatic diseases: exploring the use of literature-based discovery in primary care research. J Am Med Inform Assoc 2013; 21:139-45. [PMID: 23775174 DOI: 10.1136/amiajnl-2012-001448] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Multimorbidity, the co-occurrence of two or more chronic medical conditions within a single individual, is increasingly becoming part of daily care of general medical practice. Literature-based discovery may help to investigate the patterns of multimorbidity and to integrate medical knowledge for improving healthcare delivery for individuals with co-occurring chronic conditions. OBJECTIVE To explore the usefulness of literature-based discovery in primary care research through the key-case of finding associations between psychiatric and somatic diseases relevant to general practice in a large biomedical literature database (Medline). METHODS By using literature based discovery for matching disease profiles as vectors in a high-dimensional associative concept space, co-occurrences of a broad spectrum of chronic medical conditions were matched for their potential in biomedicine. An experimental setting was chosen in parallel with expert evaluations and expert meetings to assess performance and to generate targets for integrating literature-based discovery in multidisciplinary medical research of psychiatric and somatic disease associations. RESULTS Through stepwise reductions a reference set of 21,945 disease combinations was generated, from which a set of 166 combinations between psychiatric and somatic diseases was selected and assessed by text mining and expert evaluation. CONCLUSIONS Literature-based discovery tools generate specific patterns of associations between psychiatric and somatic diseases: one subset was appraised as promising for further research; the other subset surprised the experts, leading to intricate discussions and further eliciting of frameworks of biomedical knowledge. These frameworks enable us to specify targets for further developing and integrating literature-based discovery in multidisciplinary research of general practice, psychology and psychiatry, and epidemiology.
Affiliation(s)
- Rein Vos
- School for Public Health and Primary Care: CAPHRI, Maastricht University, Maastricht, The Netherlands
|
134
|
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet 2013; 4:92. [PMID: 23750167 PMCID: PMC3667386 DOI: 10.3389/fgene.2013.00092] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 02/10/2013] [Accepted: 05/04/2013] [Indexed: 02/03/2023] Open
Abstract
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Affiliation(s)
- Armand Valsesia
- Genetics Core, Nestlé Institute of Health Sciences Lausanne, Switzerland
|
135
|
Li C, Jimeno-Yepes A, Arregui M, Kirsch H, Rebholz-Schuhmann D. PCorral--interactive mining of protein interactions from MEDLINE. Database (Oxford) 2013; 2013:bat030. [PMID: 23640984 PMCID: PMC3641755 DOI: 10.1093/database/bat030] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/30/2012] [Revised: 03/15/2013] [Accepted: 03/27/2013] [Indexed: 11/13/2022]
Abstract
The extraction of information from the scientific literature is a complex task, both for researchers doing manual curation and for automatic text processing solutions. The identification of protein-protein interactions (PPIs) requires the extraction of protein named entities and their relations. Semi-automatic interactive support is one approach to combine both solutions for efficient working processes to generate reliable database content. In principle, the extraction of PPIs can be achieved with different methods that can be combined to deliver high precision and/or high recall results in different combinations at the same time. Interactive use can be achieved if the analytical methods are fast enough to process the retrieved documents. PCorral provides interactive mining of PPIs from the scientific literature, allowing curators to skim MEDLINE for PPIs at low overhead. The keyword query to PCorral steers the selection of documents, and the subsequent text analysis generates high recall and high precision results for the curator. The underlying components of PCorral process the documents on-the-fly and are also available as a web service from the Whatizit infrastructure. The human interface summarizes the identified PPI results, and the involved entities are linked to relevant resources and databases. Altogether, PCorral serves curators at both the beginning and the end of the curation workflow for information retrieval and information extraction. Database URL: http://www.ebi.ac.uk/Rebholz-srv/pcorral.
Affiliation(s)
- Chen Li
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
136
|
Sookoian S, Gianotti TF, Burgueño AL, Pirola CJ. Fetal metabolic programming and epigenetic modifications: a systems biology approach. Pediatr Res 2013; 73:531-42. [PMID: 23314294 DOI: 10.1038/pr.2013.2] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 01/22/2023]
Abstract
A growing body of evidence supports the notion that epigenetic changes such as DNA methylation and histone modifications, both involving chromatin remodeling, contribute to fetal metabolic programming. We use a combination of gene-protein enrichment analysis resources along with functional annotations and protein interaction networks for an integrative approach to understanding the mechanisms underlying fetal metabolic programming. Systems biology approaches suggested that fetal adaptation to an impaired nutritional environment presumes profound changes in gene expression that involve regulation of tissue-specific patterns of methylated cytosine residues, modulation of the histone acetylation-deacetylation switch, cell differentiation, and stem cell pluripotency. The hypothalamus and the liver seem to be differently involved. In addition, new putative explanations have emerged about the question of whether in utero overnutrition modulates fetal metabolic programming in the same fashion as that of a maternal environment of undernutrition, suggesting that the mechanisms behind these two fetal nutritional imbalances are different. In conclusion, intrauterine growth restriction is most likely to be associated with the induction of persistent changes in tissue structure and functionality. Conversely, a maternal obesogenic environment is most probably associated with metabolic reprogramming of glucose and lipid metabolism, as well as future risk of metabolic syndrome (MS), fatty liver, and insulin (INS) resistance.
Affiliation(s)
- Silvia Sookoian
- Department of Clinical and Molecular Hepatology, Institute of Medical Research A Lanari-IDIM, University of Buenos Aires, National Council of Scientific and Technological Research CONICET, Ciudad Autónoma de Buenos Aires, Argentina
|