1. Le Cunff Y, Chesneau L, Pastezeur S, Pinson X, Soler N, Fairbrass D, Mercat B, Rodriguez-Garcia R, Alayan Z, Abdouni A, de Neidhardt G, Costes V, Anjubault M, Bouvrais H, Héligon C, Pécréaux J. Unveiling inter-embryo variability in spindle length over time: Towards quantitative phenotype analysis. PLoS Comput Biol 2024; 20:e1012330. [PMID: 39236069 PMCID: PMC11376571 DOI: 10.1371/journal.pcbi.1012330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Accepted: 07/15/2024] [Indexed: 09/07/2024]
Abstract
How can inter-individual variability be quantified? Measuring many features per experiment raises the question of how to choose them to recapitulate high-dimensional data. Tackling this challenge on spindle elongation phenotypes, we showed that only three typical elongation patterns describe spindle elongation in the C. elegans one-cell embryo. These archetypes, automatically extracted from the experimental data using principal component analysis (PCA), accounted for more than 95% of inter-individual variability of more than 1600 experiments across more than 100 different conditions. The first two archetypes were related to spindle average length and anaphasic elongation rate. The third archetype, accounting for 6% of the variability, was novel and corresponded to a transient spindle shortening in late metaphase, reminiscent of kinetochore function-defect phenotypes. Importantly, these three archetypes were robust to the choice of the dataset and were found even considering only non-treated conditions. Thus, the inter-individual differences between genetically perturbed embryos have the same underlying nature as natural inter-individual differences between wild-type embryos, independently of the temperatures. We thus propose that beyond the apparent complexity of the spindle, only three independent mechanisms account for spindle elongation, weighted differently in the various conditions. Interestingly, the spindle-length archetypes covered both metaphase and anaphase, suggesting that spindle elongation in late metaphase is sufficient to predict the late anaphase length. We validated this idea using a machine-learning approach. Finally, given amounts of these three archetypes could represent a quantitative phenotype. To take advantage of this, we set out to predict interacting genes from a seed based on the PCA coefficients. We exemplified this firstly on the role of tpxl-1, whose homolog tpx2 is involved in spindle microtubule branching, secondly on the mechanism regulating metaphase length, and thirdly on the central spindle players that set the length at anaphase. We found novel interactors not in public databases but supported by recent experimental publications.
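To make the idea of PCA-derived archetypes concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the spindle-length values are invented, and the helper names (`center`, `covariance`, `leading_eigenpair`, `pca`) are assumptions made for illustration. Each row is one embryo's length curve; the principal components play the role of elongation archetypes, and each embryo's projections onto them are the coefficients that could serve as a quantitative phenotype.

```python
import math

def center(rows):
    """Subtract the per-time-point mean so PCA describes variability only."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix between time points."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v

def pca(rows, n_components):
    """Return (components, explained-variance ratios, centered rows)."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        components.append(v)
        ratios.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before extracting the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components, ratios, centered

# Hypothetical spindle lengths at four time points for five embryos (made-up data).
curves = [
    [10.0, 10.5, 14.0, 18.0],
    [11.0, 11.4, 15.1, 19.2],
    [9.0, 9.6, 12.8, 16.9],
    [10.2, 10.1, 14.5, 19.0],
    [10.8, 11.0, 14.2, 17.5],
]
components, ratios, centered = pca(curves, n_components=2)

# Coefficients of each embryo on the archetypes: its candidate quantitative phenotype.
coefficients = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                for row in centered]
```

In practice a numerical library would replace the hand-written power iteration; the stdlib version is used here only to keep the sketch self-contained.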
Affiliation(s)
- Yann Le Cunff
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Laurent Chesneau
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Sylvain Pastezeur
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Xavier Pinson
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Nina Soler
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Danielle Fairbrass
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Benjamin Mercat
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Ruddi Rodriguez-Garcia
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Zahraa Alayan
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Ahmed Abdouni
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Gary de Neidhardt
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Valentin Costes
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Mélodie Anjubault
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Hélène Bouvrais
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Christophe Héligon
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
- Jacques Pécréaux
  - CNRS, Univ Rennes, IGDR (Institut Genetics and Development of Rennes) - UMR 6290, Rennes, France
2. Demko V, Belova T, Messerer M, Hvidsten TR, Perroud PF, Ako AE, Johansen W, Mayer KFX, Olsen OA, Lang D. Regulation of developmental gatekeeping and cell fate transition by the calpain protease DEK1 in Physcomitrium patens. Commun Biol 2024; 7:261. [PMID: 38438476 PMCID: PMC10912778 DOI: 10.1038/s42003-024-05933-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 02/19/2024] [Indexed: 03/06/2024]
Abstract
Calpains are cysteine proteases that control cell fate transitions whose loss of function causes severe, pleiotropic phenotypes in eukaryotes. Although mainly considered as modulatory proteases, human calpain targets are directed to the N-end rule degradation pathway. Several such targets are transcription factors, hinting at a gene-regulatory role. Here, we analyze the gene-regulatory networks of the moss Physcomitrium patens and characterize the regulons that are misregulated in mutants of the calpain DEFECTIVE KERNEL1 (DEK1). Predicted cleavage patterns of the regulatory hierarchies in five DEK1-controlled subnetworks are consistent with a pleiotropic and regulatory role during cell fate transitions targeting multiple functions. Network structure suggests DEK1-gated sequential transitions between cell fates in 2D-to-3D development. Our method combines comprehensive phenotyping, transcriptomics and data science to dissect phenotypic traits, and our model explains the protease function as a switch gatekeeping cell fate transitions potentially also beyond plant development.
Affiliation(s)
- Viktor Demko
  - Department of Plant Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432, Ås, Norway
  - Department of Plant Physiology, Faculty of Natural Sciences, Comenius University in Bratislava, Ilkovicova 6, 84104, Bratislava, Slovakia
  - Plant Science and Biodiversity Center, Slovak Academy of Sciences, Dubravska cesta 9, 84104, Bratislava, Slovakia
- Tatiana Belova
  - Department of Plant Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432, Ås, Norway
  - Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Maxim Messerer
  - Plant Genome and Systems Biology, Helmholtz Center Munich-Research Center for Environmental Health, 85764, Neuherberg, Germany
- Torgeir R Hvidsten
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Pierre-François Perroud
  - Institut Jean-Pierre Bourgin, INRAE, AgroParisTech, Université Paris-Saclay, 78000, Versailles, France
- Ako Eugene Ako
  - Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 31, 2318, Hamar, Norway
  - School of Animal, Rural and Environmental Sciences, Nottingham Trent University, Brackenhurst Campus, Southwell, Nottinghamshire, NG25 0QF, UK
- Wenche Johansen
  - Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 31, 2318, Hamar, Norway
- Klaus F X Mayer
  - Plant Genome and Systems Biology, Helmholtz Center Munich-Research Center for Environmental Health, 85764, Neuherberg, Germany
  - School of Life Sciences, Technical University Munich, 85354, Freising, Germany
- Odd-Arne Olsen
  - Department of Plant Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432, Ås, Norway
- Daniel Lang
  - Plant Genome and Systems Biology, Helmholtz Center Munich-Research Center for Environmental Health, 85764, Neuherberg, Germany
  - Bundeswehr Institute of Microbiology, Microbial Genomics and Bioforensics, 80937, Munich, Germany
3. Andrés-Hernández L, Blumberg K, Walls RL, Dooley D, Mauleon R, Lange M, Weber M, Chan L, Malik A, Møller A, Ireland J, Segovia L, Zhang X, Burton-Freeman B, Magelli P, Schriever A, Forester SM, Liu L, King GJ. Establishing a Common Nutritional Vocabulary - From Food Production to Diet. Front Nutr 2022; 9:928837. [PMID: 35811979 PMCID: PMC9265659 DOI: 10.3389/fnut.2022.928837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022]
Abstract
Informed policy and decision-making for food systems, nutritional security, and global health would benefit from standardization and comparison of food composition data, spanning production to consumption. To address this challenge, we present a formal controlled vocabulary of terms, definitions, and relationships within the Compositional Dietary Nutrition Ontology (CDNO, www.cdno.info) that enables description of nutritional attributes for material entities contributing to the human diet. We demonstrate how ongoing community development of CDNO classes can harmonize trans-disciplinary approaches for describing nutritional components from food production to diet.
Affiliation(s)
- Kai Blumberg
  - Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
- Ramona L. Walls
  - Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
  - Data Collaboration Center at the Critical Path Institute, Tucson, AZ, United States
- Damion Dooley
  - Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Ramil Mauleon
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
- Matthew Lange
  - International Center for Food Ontology Operability Data & Semantics (IC-FOODS), Davis, CA, United States
- Lauren Chan
  - Nutrition Department, College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, United States
- Adnan Malik
  - European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Hinxton, United Kingdom
- Lucia Segovia
  - London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom
- Xuhuiqun Zhang
  - Illinois Institute of Technology, Chicago, IL, United States
- Lei Liu
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
- Graham J. King
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
  - School of Biosciences, University of Nottingham, Sutton Bonington, United Kingdom
4. Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022; 22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022]
Abstract
Background: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. Methods: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III). Results: 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations, when measured by area under the receiver operating characteristic curve and Top Ten Accuracy, used term-specificity and annotation-frequency measures. Conclusion: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
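As an illustration of what one "semantic similarity configuration" can look like, the sketch below combines an annotation-frequency information content measure, Resnik pairwise similarity, and best-match-average aggregation over two phenotype profiles. This is one common combination, chosen here as an assumption; it is not claimed to be any configuration evaluated in the paper. The tiny ontology, the term names, and the patient profiles are all hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "cardiac": ["root"],
    "respiratory": ["root"],
    "arrhythmia": ["cardiac"],
    "tachycardia": ["arrhythmia"],
    "dyspnea": ["respiratory"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(profiles):
    """Annotation-frequency IC: -log of the fraction of profiles covering a term."""
    counts = {term: 0 for term in PARENTS}
    for profile in profiles:
        covered = set().union(*(ancestors(t) for t in profile))
        for term in covered:
            counts[term] += 1
    n = len(profiles)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}

def resnik(a, b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(a) & ancestors(b))

def best_match_average(p, q, ic):
    """Average, in both directions, of each term's best match in the other profile."""
    def one_way(x, y):
        return sum(max(resnik(a, b, ic) for b in y) for a in x) / len(x)
    return (one_way(p, q) + one_way(q, p)) / 2

# Hypothetical text-derived phenotype profiles.
patients = {
    "p1": {"tachycardia"},
    "p2": {"arrhythmia"},
    "p3": {"dyspnea"},
    "p4": {"tachycardia", "dyspnea"},
}
ic = information_content(patients.values())
sim_12 = best_match_average(patients["p1"], patients["p2"], ic)
sim_13 = best_match_average(patients["p1"], patients["p3"], ic)
sim_14 = best_match_average(patients["p1"], patients["p4"], ic)
```

Here `p1` scores higher against `p2` (both cardiac) than against `p3` (unrelated branch), which is the behaviour a ranking of shared diagnoses would rely on.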
5. Slater LT, Karwath A, Hoehndorf R, Gkoutos GV. Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity. Front Digit Health 2021; 3:781227. [PMID: 34939069 PMCID: PMC8685209 DOI: 10.3389/fdgth.2021.781227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 11/12/2021] [Indexed: 11/13/2022]
Abstract
Semantic similarity is a useful approach for comparing patient phenotypes, and has potential as an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and find a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.
Affiliation(s)
- Luke T Slater
  - Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
  - University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom
  - MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
- Andreas Karwath
  - Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
  - University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom
  - MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Georgios V Gkoutos
  - Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
  - University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom
  - MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
  - National Institute for Health Research Experimental Cancer Medicine Centre, Birmingham, United Kingdom
  - National Institute for Health Research Surgical Reconstruction and Microbiology Research Centre, Birmingham, United Kingdom
  - National Institute for Health Research Biomedical Research Centre, Birmingham, United Kingdom
6. Ratnaike TE, Greene D, Wei W, Sanchis-Juan A, Schon KR, van den Ameele J, Raymond L, Horvath R, Turro E, Chinnery PF. MitoPhen database: a human phenotype ontology-based approach to identify mitochondrial DNA diseases. Nucleic Acids Res 2021; 49:9686-9695. [PMID: 34428295 PMCID: PMC8464050 DOI: 10.1093/nar/gkab726] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/29/2021] [Revised: 08/03/2021] [Accepted: 08/18/2021] [Indexed: 12/23/2022]
Abstract
Diagnosing mitochondrial disorders remains challenging. This is partly because the clinical phenotypes of patients overlap with those of other sporadic and inherited disorders. Although the widespread availability of genetic testing has increased the rate of diagnosis, the combination of phenotypic and genetic heterogeneity still makes it difficult to reach a timely molecular diagnosis with confidence. An objective, systematic method for describing the phenotypic spectra for each variant provides a potential solution to this problem. We curated the clinical phenotypes of 6688 published individuals with 89 pathogenic mitochondrial DNA (mtDNA) mutations, collating 26 348 human phenotype ontology (HPO) terms to establish the MitoPhen database. This enabled a hypothesis-free definition of mtDNA clinical syndromes, an overview of heteroplasmy-phenotype relationships, the identification of under-recognized phenotypes, and provides a publicly available reference dataset for objective clinical comparison with new patients using the HPO. Studying 77 patients with independently confirmed positive mtDNA diagnoses and 1083 confirmed rare disease cases with a non-mitochondrial nuclear genetic diagnosis, we show that HPO-based phenotype similarity scores can distinguish these two classes of rare disease patients with a false discovery rate <10% at a sensitivity of 80%. Enriching the MitoPhen database with more patients will improve predictions for increasingly rare variants.
Affiliation(s)
- Thiloka E Ratnaike
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Medical Research Council Mitochondrial Biology Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Department of Paediatrics, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Daniel Greene
  - Department of Haematology, University of Cambridge, NHS Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
  - Medical Research Council Biostatistics Unit, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
- Wei Wei
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Medical Research Council Mitochondrial Biology Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Alba Sanchis-Juan
  - Department of Haematology, University of Cambridge, NHS Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
- Katherine R Schon
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Medical Research Council Mitochondrial Biology Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Jelle van den Ameele
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Medical Research Council Mitochondrial Biology Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Lucy Raymond
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Rita Horvath
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Medical Research Council Mitochondrial Biology Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Ernest Turro
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Patrick F Chinnery
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Medical Research Council Mitochondrial Biology Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
7. Slater K, Karwath A, Williams JA, Russell S, Makepeace S, Carberry A, Hoehndorf R, Gkoutos GV. Towards similarity-based differential diagnostics for common diseases. Comput Biol Med 2021; 133:104360. [PMID: 33836447 PMCID: PMC8204262 DOI: 10.1016/j.compbiomed.2021.104360] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/26/2021] [Revised: 03/22/2021] [Accepted: 03/24/2021] [Indexed: 11/30/2022]
Abstract
Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
Affiliation(s)
- Karin Slater
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK
  - University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Andreas Karwath
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK
  - University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- John A Williams
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK
  - University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
- Sophie Russell
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Silver Makepeace
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Alexander Carberry
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK
  - NIHR Experimental Cancer Medicine Centre, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, UK
  - NIHR Biomedical Research Centre, UK
  - MRC Health Data Research UK (HDR UK) Midlands, UK
  - University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
8. Irshad O, Ghani Khan MU. Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200127122818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aim:
To help researchers and practitioners uncover the functional aspects of the human cellular system through exploratory searches over semantically integrated, heterogeneous and geographically dispersed omics annotations.
Background:
Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the unknown aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated and unified form. With the advent of high-throughput and sensor technologies in particular, biological data are growing at an astronomical rate while becoming more heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones, but they still lack aspect-based data integration, persistence and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations they naturally possess in the human cell.
Objective:
To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and supports exploratory querying of the integrated data.
Method:
We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with
Result:
To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered, multi-modular software architecture compliant with the model. Results show that the model supports gathering, associating, integrating, persisting and querying each entity with respect to all of its possible aspects, within or across the associated omics layers.
Conclusion:
Formal specifications help address data integration issues by providing formal means of understanding omics data based on meaning instead of syntax.
Affiliation(s)
- Omer Irshad
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
9. Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. Appl Intell 2020. [DOI: 10.1007/s10489-020-01705-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
10. Manjang K, Tripathi S, Yli-Harja O, Dehmer M, Emmert-Streib F. Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance. Sci Rep 2020; 10:16672. [PMID: 33028846 PMCID: PMC7542435 DOI: 10.1038/s41598-020-73326-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/20/2020] [Accepted: 08/17/2020] [Indexed: 12/12/2022]
Abstract
Gene ontology (GO) is an eminent knowledge base frequently used for providing biological interpretations for the analysis of genes or gene sets from biological, medical and clinical problems. Unfortunately, the interpretation of such results is challenging due to the large number of GO terms, their hierarchical and connected organization as directed acyclic graphs (DAGs) and the lack of tools that exploit this structural information explicitly. For this reason, we developed the R package GOxploreR. The main features of GOxploreR are (I) easy and direct access to structural features of GO, (II) structure-based ranking of GO-terms, (III) mapping to reduced GO-DAGs including visualization capabilities and (IV) prioritization of GO-terms. The underlying idea of GOxploreR is to exploit a graph-theoretical perspective of GO as manifested by its DAG-structure and the containing hierarchy levels for cumulating semantic information. That means all these features enhance the utilization of structural information of GO and complement existing analysis tools. Overall, GOxploreR provides exploratory as well as confirmatory tools for complementing any kind of analysis resulting in a list of GO-terms, e.g., from differentially expressed genes or gene sets, GWAS or biomarkers. Our R package GOxploreR is freely available from CRAN.
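The notion of ranking terms by their position in a DAG can be sketched in a few lines of Python. This is not GOxploreR's API (the package is written in R, and none of its functions are reproduced here); the toy graph and the names `PARENTS`, `level`, and `rank_by_level` are hypothetical. The sketch assumes that a term's hierarchy level is the length of the longest path from the root, which is one plausible definition and may differ from the one the package uses.

```python
from functools import lru_cache

# Hypothetical toy DAG: each term maps to its direct parents.
PARENTS = {
    "root": (),
    "A": ("root",),
    "B": ("root",),
    "C": ("A",),
    "D": ("C", "B"),  # a term with two parents, as DAGs allow
}

@lru_cache(maxsize=None)
def level(term):
    """Hierarchy level, taken here as the longest path from the root (assumption)."""
    parents = PARENTS[term]
    if not parents:
        return 0
    return 1 + max(level(p) for p in parents)

def rank_by_level(terms):
    """Order terms from deepest (most specific) to shallowest, ties alphabetical."""
    return sorted(terms, key=lambda t: (-level(t), t))

ranking = rank_by_level(["B", "D", "A", "C"])
```

With this definition, `D` sits at level 3 because its longest route to the root goes through `C` and `A`, even though a shorter route through `B` exists.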
Affiliation(s)
- Kalifa Manjang
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Shailesh Tripathi
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Olli Yli-Harja
  - Computational Systems Biology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute for Systems Biology, Seattle, WA, USA
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Matthias Dehmer
  - Department of Biomedical Computer Science and Mechatronics, UMIT-The Health and Life Science University, 6060, Hall in Tyrol, Austria
  - College of Artificial Intelligence, Nankai University, Tianjin, 300350, China
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
Collapse
|
11
|
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023] Open
Abstract
Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. 
Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.
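Two of the four similarity measures named above, Resnik and Lin, have compact standard definitions based on the information content (IC) of the most informative common ancestor. The following is a minimal Python sketch of those textbook definitions only; it is not the authors' pipeline, and the toy ontology, term names, and annotation counts are hypothetical values chosen for illustration rather than Uberon data.

```python
import math

# Toy ontology: child -> set of parents (hypothetical terms, not Uberon).
PARENTS = {
    "root": set(),
    "limb": {"root"},
    "fin": {"limb"},
    "forelimb": {"limb"},
    "eye": {"root"},
}
# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "limb": 2, "fin": 3, "forelimb": 3, "eye": 2}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = -log p(term); p counts annotations to the term or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(cumulative / total)


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t) for t in common)


def lin(a, b):
    """Lin similarity: Resnik score normalised by the ICs of both terms."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0
```

In this toy setting, two terms that share only the root have a Resnik score of zero, while a term compared with itself has a Lin score of one. Gene-level similarity additionally requires combining term-pair scores, which this sketch does not attempt.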
Affiliation(s)
- Pasan C Fernando
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
  - National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
  - Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
|
12
|
Oyelade ON, Ezugwu AE. A case-based reasoning framework for early detection and diagnosis of novel coronavirus. INFORMATICS IN MEDICINE UNLOCKED 2020; 20:100395. [PMID: 32835080 PMCID: PMC7377815 DOI: 10.1016/j.imu.2020.100395] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/14/2020] [Revised: 07/06/2020] [Accepted: 07/09/2020] [Indexed: 12/29/2022] Open
Abstract
Coronavirus, also known as COVID-19, has been declared a pandemic by the World Health Organization (WHO). At the time of conducting this study, it had recorded over 11,301,850 confirmed cases while more than 531,806 have died due to it, with these figures rising daily across the globe. The burden of this highly contagious respiratory disease is that it presents itself in both symptomatic and asymptomatic patterns in those already infected, thereby leading to an exponential rise in the number of contractions of the disease and fatalities. It is, therefore, crucial to expedite the process of early detection and diagnosis of the disease across the world. The case-based reasoning (CBR) model is a compelling paradigm that allows for the utilization of case-specific knowledge previously experienced, concrete problem situations or specific patient cases for solving new cases. This study, therefore, aims to leverage the very rich database of cases of COVID-19 to solve new cases. The approach adopted in this study employs an improved CBR model for a state-of-the-art reasoning task in the classification of suspected cases of COVID-19. The CBR model leverages a novel feature selection and the semantic-based mathematical model proposed in this study for case similarity computation. An initial population of the archive was achieved from 71 (67 adults and 4 pediatrics) cases obtained from the Italian Society of Medical and Interventional Radiology (SIRM) repository. Results obtained revealed that the proposed approach in this study successfully classified suspected cases into their categories with an accuracy of 94.54%. The study found that the proposed model can help physicians easily diagnose suspected cases of COVID-19 based on their medical records without subjecting the specimen to laboratory tests.
As a result, there will be a global minimization of contagion rate occasioned by slow testing and in addition, reduced false-positive rates of diagnosed cases as observed in some parts of the globe.
Affiliation(s)
- Olaide N Oyelade
  - Department of Computer Science, Ahmadu Bello University Zaria, Nigeria
  - School of Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Absalom E Ezugwu
  - School of Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
|
13
|
Jiang X, Wang S, Wang J, Lyu S, Skitmore M. A Decision Method for Construction Safety Risk Management Based on Ontology and Improved CBR: Example of a Subway Project. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17113928. [PMID: 32492976 PMCID: PMC7312838 DOI: 10.3390/ijerph17113928] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/10/2020] [Revised: 05/14/2020] [Accepted: 05/22/2020] [Indexed: 11/30/2022]
Abstract
Early decision-making and the prevention of construction safety risks are very important for the safety, quality, and cost of construction projects. In the field of construction safety risk management, how to design an efficient decision support method in the face of loose, chaotic, and huge information environments has long been the focus of academic research. An effective approach to safety management is to structure safety risk knowledge, then identify and reuse it, and establish a scientific and systematic construction safety risk management decision system. Based on ontology and improved case-based reasoning (CBR) methods, this paper proposes a decision-making approach for construction safety risk management in which the reasoning process is improved by integrating a similarity algorithm and a correlation algorithm. Compared to the traditional CBR approach, in which only the similarity of information is considered, this method can avoid missing important correlated information by making inferences from multiple sources of information. Finally, the method is applied to the safety risks of subway construction for verification to show that the method is effective and easy to implement.
Affiliation(s)
- Xiaoyan Jiang
  - School of Civil Engineering, Hefei University of Technology, Hefei 230009, China
- Sai Wang
  - School of Civil Engineering, Hefei University of Technology, Hefei 230009, China
- Jie Wang
  - School of Civil Engineering, Hefei University of Technology, Hefei 230009, China
- Sainan Lyu
  - School of Property, Construction and Project Management, RMIT University, Melbourne City Campus, Melbourne, VIC 3000, Australia
- Martin Skitmore
  - School of Civil Engineering and Built Environment, Queensland University of Technology, Brisbane, QLD 4001, Australia
|
14
|
Galeota E, Kishore K, Pelizzola M. Ontology-driven integrative analysis of omics data through Onassis. Sci Rep 2020; 10:703. [PMID: 31959844 PMCID: PMC6971239 DOI: 10.1038/s41598-020-57716-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/15/2019] [Accepted: 01/06/2020] [Indexed: 12/11/2022] Open
Abstract
Public repositories of large-scale omics datasets represent a valuable resource for researchers. In fact, data re-analysis can either answer novel questions or provide critical data able to complement in-house experiments. However, despite the development of standards for the compilation of metadata, the identification and organization of samples still constitutes a major bottleneck hampering data reuse. We introduce Onassis, an R package within the Bioconductor environment providing key functionalities of Natural Language Processing (NLP) tools. Leveraging biomedical ontologies, Onassis greatly simplifies the association of samples from large-scale repositories to their representation in terms of ontology-based annotations. Moreover, through the use of semantic similarity measures, Onassis hierarchically organizes the datasets of interest, thus supporting the semantically aware analysis of the corresponding omics data. In conclusion, Onassis leverages NLP techniques, biomedical ontologies, and the R statistical framework, to identify, relate, and analyze datasets from public repositories. The tool was tested on various large-scale datasets, including compendia of gene expression, histone marks, and DNA methylation, illustrating how it can facilitate the integrative analysis of various omics data.
Affiliation(s)
- Eugenia Galeota
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milano, Italy
- Kamal Kishore
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milano, Italy
- Mattia Pelizzola
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milano, Italy
|
15
|
Cianciarullo AM, Bonini-Domingos CR, Vizotto LD, Kobashi LS, Beçak ML, Beçak W. Whole-genome duplication and hemoglobin differentiation traits between allopatric populations of Brazilian Odontophrynus americanus species complex (Amphibia, Anura). Genet Mol Biol 2019; 42:436-444. [PMID: 31259358 PMCID: PMC6726162 DOI: 10.1590/1678-4685-gmb-2017-0260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/19/2017] [Accepted: 07/25/2018] [Indexed: 11/21/2022] Open
Abstract
Two allopatric populations of Brazilian diploid and tetraploid Odontophrynus americanus species complex, both from São Paulo state, had their blood hemoglobin biochemically analyzed. In addition, these specimens were cytogenetically characterized. Biochemical characterization of hemoglobin expression showed a distinct banding pattern between the allopatric specimens. Besides this, two distinct phenotypes, not linked to ploidy, sex, or age, were observed in adult animals of both populations. Phenotype A exhibits dark-colored body with small papillae, ogival-shaped jaw with reduced interpupillary distance and shorter hind limbs. Phenotype B shows yellowish-colored body with larger papillae, arch-shaped jaw with broader interpupillary distance and longer hind limbs. Intermediate phenotypes were also found. Considering the geographical isolation of both populations, differences in chromosomal secondary constrictions and distinct hemoglobins banding patterns, these data indicate that 2n and 4n populations represent cryptic species in the O. americanus species complex. The observed phenotypic diversity can be interpreted as population genetic variability. Eventually future data may indicate a probable beginning of speciation in these Brazilian frogs. Such inter- and intrapopulational differentiation/speciation process indicates that O. americanus species complex taxonomy deserves further evaluation by genomics and metabarcoding communities, also considering the pattern of hemoglobin expression, in South American frogs.
Affiliation(s)
- Claudia R Bonini-Domingos
  - Department of Biology, Laboratory of Hemoglobins and Genetics of the Hematological Diseases, Universidade Estadual Paulista "Julio de Mesquita Filho" (UNESP), São José do Rio Preto, SP, Brazil
- Luiz D Vizotto
  - Department of Zoology, Universidade Estadual Paulista "Julio de Mesquita Filho" (UNESP), São José do Rio Preto, SP, Brazil
- Leonardo S Kobashi
  - Laboratory of Ecology and Evolution, Instituto Butantan, São Paulo, SP, Brazil
  - Universidade Paulista (UNIP) São Paulo, SP, Brazil
- Willy Beçak
  - Laboratory of Genetics, Instituto Butantan, São Paulo, SP, Brazil
|
16
|
Lithgow-Serrano O, Collado-Vides J. In the pursuit of semantic similarity for literature on microbial transcriptional regulation. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-179026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Affiliation(s)
- Oscar Lithgow-Serrano
  - Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM), Morelos, México
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
- Julio Collado-Vides
  - Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM), Morelos, México
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
|
17
|
Leveraging Machine Learning to Extend Ontology-Driven Geographic Object-Based Image Analysis (O-GEOBIA): A Case Study in Forest-Type Mapping. REMOTE SENSING 2019. [DOI: 10.3390/rs11050503] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/17/2022]
Abstract
Ontology-driven Geographic Object-Based Image Analysis (O-GEOBIA) contributes to the identification of meaningful objects. In fusing data from multiple sensors, the number of feature variables is increased and object identification becomes a challenging task. We propose a methodological contribution that extends feature variable characterisation. This method is illustrated with a case study in forest-type mapping in Tasmania, Australia. Satellite images, airborne LiDAR (Light Detection and Ranging) and expert photo-interpretation data are fused for feature extraction and classification. Two machine learning algorithms, Random Forest and Boruta, are used to identify important and relevant feature variables. A variogram is used to describe textural and spatial features. Different variogram features are used as input for rule-based classifications. The rule-based classifications employ (i) spectral features, (ii) vegetation indices, (iii) LiDAR, and (iv) variogram features, and resulted in overall classification accuracies of 77.06%, 78.90%, 73.39% and 77.06% respectively. Following data fusion, the use of combined feature variables resulted in a higher classification accuracy (81.65%). Using relevant features extracted from the Boruta algorithm, the classification accuracy is further improved (82.57%). The results demonstrate that the use of relevant variogram features together with spectral and LiDAR features resulted in improved classification accuracy.
|
18
|
Alonso-López D, Campos-Laborie FJ, Gutiérrez MA, Lambourne L, Calderwood MA, Vidal M, De Las Rivas J. APID database: redefining protein-protein interaction experimental evidences and binary interactomes. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019; 2019:5304002. [PMID: 30715274 PMCID: PMC6354026 DOI: 10.1093/database/baz005] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 10/31/2018] [Accepted: 01/07/2019] [Indexed: 12/20/2022]
Abstract
The collection and integration of all the known protein–protein physical interactions within a proteome framework are critical to allow proper exploration of the protein interaction networks that drive biological processes in cells at molecular level. APID Interactomes is a public resource of biological data (http://apid.dep.usal.es) that provides a comprehensive and curated collection of ‘protein interactomes’ for more than 1100 organisms, including 30 species with more than 500 interactions, derived from the integration of experimentally detected protein-to-protein physical interactions (PPIs). We have performed an update of APID database including a redefinition of several key properties of the PPIs to provide a more precise data integration and to avoid false duplicated records. This includes the unification of all the PPIs from five primary databases of molecular interactions (BioGRID, DIP, HPRD, IntAct and MINT), plus the information from two original systematic sources of human data and from experimentally resolved 3D structures (i.e. PDBs, Protein Data Bank files, where more than two distinct proteins have been identified). Thus, APID provides PPIs reported in published research articles (with traceable PMIDs) and detected by valid experimental interaction methods that give evidences about such protein interactions (following the ‘ontology and controlled vocabulary’: www.ebi.ac.uk/ols/ontologies/mi; developed by ‘HUPO PSI-MI’). Within this data mining framework, all interaction detection methods have been grouped into two main types: (i) ‘binary’ physical direct detection methods and (ii) ‘indirect’ methods. As a result of these redefinitions, APID provides unified protein interactomes including the specific ‘experimental evidences’ that support each PPI, indicating whether the interactions can be considered ‘binary’ (i.e. supported by at least one binary detection method) or not.
Affiliation(s)
- Diego Alonso-López
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas and University of Salamanca, Salamanca, Spain
- Francisco J Campos-Laborie
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas and University of Salamanca, Salamanca, Spain
- Miguel A Gutiérrez
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas and University of Salamanca, Salamanca, Spain
- Luke Lambourne
  - Center for Cancer Systems Biology, Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, Boston, MA, USA
- Michael A Calderwood
  - Center for Cancer Systems Biology, Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, Boston, MA, USA
- Marc Vidal
  - Center for Cancer Systems Biology, Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, Boston, MA, USA
- Javier De Las Rivas
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas and University of Salamanca, Salamanca, Spain
|
19
|
Merlo G, Chiazzese G, Taibi D, Chifari A. Development and Validation of a Functional Behavioural Assessment Ontology to Support Behavioural Health Interventions. JMIR Med Inform 2018; 6:e37. [PMID: 29853438 PMCID: PMC6002668 DOI: 10.2196/medinform.7799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/15/2017] [Revised: 09/05/2017] [Accepted: 03/14/2018] [Indexed: 11/22/2022] Open
Abstract
Background In the cognitive-behavioral approach, Functional Behavioural Assessment is one of the most effective methods to identify the variables that determine a problem behavior. In this context, the use of modern technologies can encourage the collection and sharing of behavioral patterns, effective intervention strategies, and statistical evidence about antecedents and consequences of clusters of problem behaviors, encouraging the design of function-based interventions. Objective The paper describes the development and validation process used to design a specific Functional Behavioural Assessment Ontology (FBA-Ontology). The FBA-Ontology is a semantic representation of the variables that intervene in a behavioral observation process, facilitating the systematic collection of behavioral data, the consequential planning of treatment strategies and, indirectly, the scientific advancement in this field of study. Methods The ontology has been developed by deducing concepts and relationships of the ontology from a gold standard and then performing a machine-based validation and a human-based assessment to validate the Functional Behavioural Assessment Ontology. These validation and verification processes aimed to verify the extent to which the ontology is conceptually well founded and semantically and syntactically correct. Results The Pellet reasoner checked the logical consistency and the integrity of classes and properties defined in the ontology, not detecting any violation of constraints in the ontology definition. To assess whether the ontology definition is coherent with the knowledge domain, human evaluation of the ontology was performed by asking 84 people to fill in a questionnaire composed of 13 questions assessing concepts, relations between concepts, and concepts’ attributes. The response rate for the survey was 29/84 (34.52%).
The domain experts confirmed that the concepts, the attributes, and the relationships between concepts defined in the FBA-Ontology are valid and well represent the Functional Behavioural Assessment process. Conclusions The new ontology developed could be a useful tool to design new evidence-based systems in the Behavioral Interventions practices, encouraging the link with other Linked Open Data datasets and repositories to provide users with new models of eHealth focused on the management of problem behaviors. Therefore, new research is needed to develop and implement innovative strategies to improve the poor reproducibility and translatability of basic research findings in the field of behavioral assessment.
Affiliation(s)
- Gianluca Merlo
  - Istituto per le Tecnologie Didattiche, Consiglio Nazionale delle Ricerche, Palermo, Italy
- Giuseppe Chiazzese
  - Istituto per le Tecnologie Didattiche, Consiglio Nazionale delle Ricerche, Palermo, Italy
- Davide Taibi
  - Istituto per le Tecnologie Didattiche, Consiglio Nazionale delle Ricerche, Palermo, Italy
- Antonella Chifari
  - Istituto per le Tecnologie Didattiche, Consiglio Nazionale delle Ricerche, Palermo, Italy
|
20
|
Bagherifard K, Rahmani M, Nilashi M, Rafe V. Performance improvement for recommender systems using ontology. TELEMATICS AND INFORMATICS 2017. [DOI: 10.1016/j.tele.2017.08.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/27/2022]
|
21
|
Safaeipour H, Zarandi MHF, Bastani S. Using Fuzzy Ontology to Improve Similarity Assessment: Method and Evaluation. INT J INTELL SYST 2017. [DOI: 10.1002/int.21895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Affiliation(s)
- Hoda Safaeipour
  - Department of Industrial Engineering, Amirkabir University of Technology, Tehran 15875-4413, Iran
- M. H. Fazel Zarandi
  - Department of Industrial Engineering, Amirkabir University of Technology, Tehran 15875-4413, Iran
- Susan Bastani
  - Department of Social Science, Alzahra University, Tehran, Iran
|
22
|
Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework. Sci Rep 2017; 7:381. [PMID: 28336965 PMCID: PMC5428484 DOI: 10.1038/s41598-017-00465-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/08/2016] [Accepted: 02/28/2017] [Indexed: 11/21/2022] Open
Abstract
Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms involves calculation of GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as “mixing strategy”, which involves combining the SS values to yield the final FS value. Due to the variability of protein annotation caused e.g. by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies we demonstrate moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs and discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
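The z-score idea described above can be illustrated with a short Python sketch that standardises one functional similarity (FS) value against a background distribution of FS values for a single protein. This is an assumption-laden illustration, not the Frela implementation: the function name, the use of the population standard deviation, and the example numbers are all hypothetical, and the abstract does not say how the backgrounds of the two compared proteins are combined.

```python
import statistics


def similarity_z_score(fs_value, background):
    """Standardise an FS value against one protein's background FS values.

    `background` is assumed to hold the FS values between that protein and
    a reference set of other proteins (a hypothetical setup).
    """
    mean = statistics.fmean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("background distribution has zero variance")
    return (fs_value - mean) / sd


# Hypothetical example: the same raw FS value of 0.6 is unremarkable for a
# richly annotated protein but stands out for a sparsely annotated one.
z_rich = similarity_z_score(0.6, [0.4, 0.5, 0.6, 0.7, 0.8])
z_sparse = similarity_z_score(0.6, [0.0, 0.1, 0.1, 0.2, 0.1])
```

The point of the sketch is only that identical raw scores can map to very different standardised scores once each protein's typical similarity level is taken into account.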
|
23
|
GFD-Net: A novel semantic similarity methodology for the analysis of gene networks. J Biomed Inform 2017; 68:71-82. [PMID: 28274758 DOI: 10.1016/j.jbi.2017.02.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/08/2016] [Revised: 02/08/2017] [Accepted: 02/22/2017] [Indexed: 02/06/2023]
Abstract
Since the popularization of biological network inference methods, it has become crucial to create methods to validate the resulting models. Here we present GFD-Net, the first methodology that applies the concept of semantic similarity to gene network analysis. GFD-Net combines the concept of semantic similarity with the use of gene network topology to analyze the functional dissimilarity of gene networks based on Gene Ontology (GO). The main innovation of GFD-Net lies in the way that semantic similarity is used to analyze gene networks taking into account the network topology. GFD-Net selects a functionality for each gene (specified by a GO term), weights each edge according to the dissimilarity between the nodes at its ends and calculates a quantitative measure of the network functional dissimilarity, i.e. a quantitative value of the degree of dissimilarity between the connected genes. The robustness of GFD-Net as a gene network validation tool was demonstrated by performing a ROC analysis on several network repositories. Furthermore, a well-known network was analyzed showing that GFD-Net can also be used to infer knowledge. The relevance of GFD-Net becomes more evident in Section "GFD-Net applied to the study of human diseases" where an example of how GFD-Net can be applied to the study of human diseases is presented. GFD-Net is available as an open-source Cytoscape app which offers a user-friendly interface to configure and execute the algorithm as well as the ability to visualize and interact with the results(http://apps.cytoscape.org/apps/gfdnet).
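The edge-weighting step described above can be sketched in Python as follows. This is a simplified illustration and not the GFD-Net algorithm itself: the abstract does not specify how a GO term is selected for each gene or how edge weights are aggregated, so the fixed gene-to-term mapping, the use of a plain mean, and the toy similarity table are all assumptions, and every name in the block is hypothetical.

```python
def network_dissimilarity(edges, term_of, similarity):
    """Weight each edge by 1 - similarity of its end nodes, then average.

    Averaging the edge weights is an assumption made for illustration only.
    Returns the network-level score and the per-edge weights.
    """
    weights = {
        (u, v): 1.0 - similarity(term_of[u], term_of[v]) for u, v in edges
    }
    if not weights:
        return 0.0, weights
    return sum(weights.values()) / len(weights), weights


# Hypothetical genes, term labels, and similarity values in [0, 1].
TERM_OF = {"geneA": "T1", "geneB": "T1", "geneC": "T2"}
TOY_SIM = {frozenset(["T1"]): 1.0, frozenset(["T1", "T2"]): 0.25}


def toy_similarity(a, b):
    """Look up a symmetric similarity value for a pair of term labels."""
    return TOY_SIM[frozenset([a, b])]


EDGES = [("geneA", "geneB"), ("geneB", "geneC")]
score, edge_weights = network_dissimilarity(EDGES, TERM_OF, toy_similarity)
```

Under these assumptions, a network whose connected genes share similar terms gets a score near zero, and a network linking functionally unrelated genes gets a score near one.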
|
24
|
Bastos HP, Sousa L, Clarke LA, Couto FM. Functional coherence metrics in protein families. J Biomed Semantics 2016; 7:41. [PMID: 27338101 PMCID: PMC4917928 DOI: 10.1186/s13326-016-0076-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2015] [Accepted: 05/17/2016] [Indexed: 12/03/2022] Open
Abstract
Background: Biological sequences, such as proteins, have been provided with annotations that assign functional information. These functional annotations are associations of proteins (or other biological sequences) with descriptors characterizing their biological roles. However, not all proteins are fully (or even at all) annotated. This annotation incompleteness limits our ability to make sound assertions about the functional coherence within sets of proteins. It is also problematic when measuring the semantic functional similarity of biological sequences, since the available annotations can only capture a limited part of the semantic aspects the sequences may encompass.
Methods: Instead of relying solely on single (reductive) metrics, this work proposes a comprehensive approach for assessing functional coherence within protein sets. The approach entails using visualization and term enrichment techniques anchored in specific domain knowledge, such as a protein family. For that purpose we evaluate two novel functional coherence metrics, mUI and mGIC, that combine aspects of semantic similarity measures and term enrichment.
Results: These metrics were used to effectively capture and measure the local similarity cores within protein sets. Hence, these metrics coupled with visualization tools allow an improved grasp of three important functional annotation aspects: completeness, agreement and coherence.
Conclusions: Measuring the functional similarity between proteins based on their annotations is a non-trivial task. Several metrics exist, but owing both to characteristics intrinsic to the nature of graphs and to extrinsic factors related to the annotation process, each measure can only capture certain functional annotation aspects of proteins. Hence, when trying to measure the functional coherence of a set of proteins, a single metric is too reductive. Therefore, it is valuable to be aware of how each employed similarity metric works and what similarity aspects it can best capture. Here we test the behaviour and resilience of some similarity metrics.
Electronic supplementary material: The online version of this article (doi:10.1186/s13326-016-0076-y) contains supplementary material, which is available to authorized users.
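The abstract above does not define mUI and mGIC. As background only, the following minimal sketch shows the two established set-based measures that the names suggest, simUI and simGIC, on a toy ontology. It is not the authors' metrics: the term names, the `parents` map, and the information content (IC) values are all hypothetical, and the link between mUI/mGIC and these measures is an assumption.

```python
def with_ancestors(terms, parents):
    """Extend a set of annotation terms with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def sim_ui(a, b):
    """simUI: shared terms divided by all terms (both sets ancestor-extended)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sim_gic(a, b, ic):
    """simGIC: like simUI, but each term is weighted by its information content."""
    union_ic = sum(ic[t] for t in a | b)
    return sum(ic[t] for t in a & b) / union_ic if union_ic else 0.0


# Hypothetical toy ontology: child term -> list of parent terms.
parents = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"]}
# Hypothetical IC values (more specific terms carry more information).
ic = {"root": 0.0, "A": 1.0, "B": 1.0, "A1": 2.0}

protein_p = with_ancestors({"A1"}, parents)      # {"A1", "A", "root"}
protein_q = with_ancestors({"A", "B"}, parents)  # {"A", "B", "root"}

print(sim_ui(protein_p, protein_q))       # 2 shared terms out of 4
print(sim_gic(protein_p, protein_q, ic))  # IC 1.0 shared out of 4.0
```

The two scores differ because simGIC discounts the uninformative root term, which illustrates the abstract's point that each metric captures a different aspect of annotation similarity.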
Affiliation(s)
- Hugo P Bastos
  - LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Lisete Sousa
  - CEAUL, Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Luka A Clarke
  - BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Francisco M Couto
  - LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
|
25
|
Zhang SB, Lai JH. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures. Gene 2016; 586:148-57. [PMID: 27080954 DOI: 10.1016/j.gene.2016.04.024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/24/2015] [Revised: 03/28/2016] [Accepted: 04/08/2016] [Indexed: 11/19/2022]
Abstract
Measuring the similarity between pairs of biological entities is important in molecular biology. The introduction of Gene Ontology (GO) provides us with a promising approach to quantifying the semantic similarity between two genes or gene products. This kind of similarity measure is closely associated with the GO terms annotated to the biological entities under consideration and with the structure of the GO graph. However, previous works in this field mainly focused on the upper part of the graph and seldom considered the lower part. In this study, we aim to explore information from the lower part of the GO graph for better semantic similarity. We propose a framework to quantify the similarity measure beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measurements on the public platform CESSM and on protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it quantifies the probability that two terms co-occur with their common descendant in an efficient way; and (3) our framework can effectively capture the similarity measure beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from http://ejl.org.cn/bio/GOBeneath/.
Affiliation(s)
- Shu-Bo Zhang
  - Department of Computer Science, Guangzhou Maritime Institute, Room 803 Building 88, Dashabei Road, Huangpu District, Guangzhou 510275, PR China
- Jian-Huang Lai
  - School of Information Science and Technology, Sun Yat-sen University, Room 105 Building 110 East District, 135 Xingangxi Road, Guangzhou 510275, PR China
|
26
|
A fuzzy-ontology-oriented case-based reasoning framework for semantic diabetes diagnosis. Artif Intell Med 2015; 65:179-208. [PMID: 26303105 DOI: 10.1016/j.artmed.2015.08.003] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 10/30/2014] [Revised: 06/02/2015] [Accepted: 08/05/2015] [Indexed: 11/22/2022]
Abstract
OBJECTIVE: Case-based reasoning (CBR) is a problem-solving paradigm that uses past knowledge to interpret or solve new problems. It is suitable for experience-based and theory-less problems. Building a semantically intelligent CBR system that mimics expert thinking can solve many problems, especially medical ones.
METHODS: Knowledge-intensive CBR using formal ontologies is an evolution of this paradigm. Ontologies can be used for case representation and storage, and as background knowledge. Using standard medical ontologies, such as SNOMED CT, enhances interoperability and integration with health care systems. Moreover, utilizing vague or imprecise knowledge further improves the semantic effectiveness of CBR. This paper proposes a fuzzy ontology-based CBR framework. It proposes a fuzzy case-base OWL2 ontology and a fuzzy semantic retrieval algorithm that handles many feature types.
MATERIAL: This framework is implemented and tested on the diabetes diagnosis problem. The fuzzy ontology is populated with 60 real diabetic cases. The effectiveness of the proposed approach is illustrated with a set of experiments and case studies.
RESULTS: The resulting system can answer complex medical queries related to semantic understanding of medical concepts and handling of vague terms. The resulting fuzzy case-base ontology has 63 concepts, 54 (fuzzy) object properties, 138 (fuzzy) datatype properties, 105 fuzzy datatypes, and 2640 instances. The system achieves an accuracy of 97.67%. We compare our framework with existing CBR systems and a set of five machine-learning classifiers; our system outperforms all of these systems.
CONCLUSION: Building an integrated CBR system can improve its performance. Representing CBR knowledge using the fuzzy ontology and building a case retrieval algorithm that treats different features differently improves the accuracy of the resulting systems.
|
27
|
Bettembourg C, Diot C, Dameron O. Optimal Threshold Determination for Interpreting Semantic Similarity and Particularity: Application to the Comparison of Gene Sets and Metabolic Pathways Using GO and ChEBI. PLoS One 2015; 10:e0133579. [PMID: 26230274 PMCID: PMC4521860 DOI: 10.1371/journal.pone.0133579] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/21/2014] [Accepted: 06/30/2015] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: The analysis of gene annotations referencing back to Gene Ontology plays an important role in the interpretation of high-throughput experiment results. This analysis typically involves semantic similarity and particularity measures that quantify the importance of the Gene Ontology annotations. However, there is currently no sound method supporting the interpretation of the similarity and particularity values in order to determine whether two genes are similar or whether one gene has some significant particular function. Interpretation is frequently based either on an implicit threshold or on an arbitrary one (typically 0.5). Here we investigate a method for determining thresholds supporting the interpretation of the results of a semantic comparison.
RESULTS: We propose a method for determining the optimal similarity threshold by minimizing the proportions of false-positive and false-negative similarity matches. We compared the distributions of the similarity values of pairs of similar genes and pairs of non-similar genes. These comparisons were performed separately for all three branches of the Gene Ontology. In all situations, we found overlap between the similar and the non-similar distributions, indicating that some similar genes had a similarity value lower than the similarity value of some non-similar genes. We then extended this method to the semantic particularity measure and to a similarity measure applied to the ChEBI ontology. Thresholds were evaluated over the whole HomoloGene database. For each group of homologous genes, we computed all the similarity and particularity values between pairs of genes. Finally, we focused on the PPAR multigene family to show that the similarity and particularity patterns obtained with our thresholds were better at discriminating orthologs and paralogs than those obtained using default thresholds.
CONCLUSION: We developed a method for determining optimal semantic similarity and particularity thresholds. We applied this method to the GO and ChEBI ontologies. Qualitative analysis using the thresholds on the PPAR multigene family yielded biologically relevant patterns.
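The threshold idea in this abstract can be illustrated with a small sketch. It assumes, as a simplification not confirmed by the abstract, that the cost to minimize is the plain sum of the false-negative proportion (similar pairs scored below the threshold) and the false-positive proportion (non-similar pairs scored at or above it). The function name `optimal_threshold` and the score lists are hypothetical.

```python
def optimal_threshold(similar_scores, non_similar_scores):
    """Return (threshold, cost) minimizing FN proportion + FP proportion.

    A pair is called "similar" when its score is >= threshold.
    Candidate thresholds are the observed scores themselves.
    """
    candidates = sorted(set(similar_scores) | set(non_similar_scores))
    best_threshold, best_cost = None, float("inf")
    for t in candidates:
        # Similar pairs that would be missed at this threshold.
        fn = sum(s < t for s in similar_scores) / len(similar_scores)
        # Non-similar pairs that would be wrongly accepted.
        fp = sum(s >= t for s in non_similar_scores) / len(non_similar_scores)
        if fn + fp < best_cost:
            best_threshold, best_cost = t, fn + fp
    return best_threshold, best_cost


# Hypothetical similarity values; the two distributions overlap,
# as the abstract reports for real gene pairs.
similar = [0.9, 0.8, 0.7, 0.4]
non_similar = [0.1, 0.2, 0.3, 0.5]

threshold, cost = optimal_threshold(similar, non_similar)
print(threshold, cost)
```

Because the distributions overlap, no threshold reaches zero cost. Ties are resolved in favour of the lowest candidate, which is a design choice of this sketch rather than something stated in the source.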
Affiliation(s)
- Charles Bettembourg
  - Université de Rennes 1, Rennes, France
  - INRA, UMR1348 PEGASE, Saint-Gilles, France
  - Agrocampus OUEST, UMR1348 PEGASE, Rennes, France
  - IRISA, Campus de Beaulieu, Rennes, France
  - INRIA, Rennes, France
- Christian Diot
  - INRA, UMR1348 PEGASE, Saint-Gilles, France
  - Agrocampus OUEST, UMR1348 PEGASE, Rennes, France
- Olivier Dameron
  - Université de Rennes 1, Rennes, France
  - IRISA, Campus de Beaulieu, Rennes, France
  - INRIA, Rennes, France
|
28
|
Zanzoni A, Chapple CE, Brun C. Relationships between predicted moonlighting proteins, human diseases, and comorbidities from a network perspective. Front Physiol 2015; 6:171. [PMID: 26157390 PMCID: PMC4477069 DOI: 10.3389/fphys.2015.00171] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/12/2014] [Accepted: 05/20/2015] [Indexed: 12/26/2022] Open
Abstract
Moonlighting proteins are a subset of multifunctional proteins characterized by their multiple, independent, and unrelated biological functions. We recently set up a large-scale identification of moonlighting proteins using a protein-protein interaction (PPI) network approach. We established that 3% of the current human interactome is composed of predicted moonlighting proteins. We found that disease-related genes are over-represented among those candidates. Here, by comparing moonlighting candidates to non-candidates as groups, we further show that (i) they are significantly involved in more than one disease, (ii) they contribute to complex rather than monogenic diseases, (iii) the diseases in which they are involved are phenotypically different according to their annotations, and (iv) they are enriched for disease pairs showing statistically significant comorbidity patterns based on Medicare records. Altogether, our results suggest that some observed comorbidities between phenotypically different diseases could be due to a shared protein involved in unrelated biological processes.
Affiliation(s)
- Andreas Zanzoni
  - INSERM, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090, TAGC Marseille, France
- Charles E Chapple
  - INSERM, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090, TAGC Marseille, France
- Christine Brun
  - INSERM, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090, TAGC Marseille, France
  - Centre National de la Recherche Scientifique Marseille, France
|
29
|
Chapple CE, Herrmann C, Brun C. PrOnto database: GO term functional dissimilarity inferred from biological data. Front Genet 2015; 6:200. [PMID: 26089836 PMCID: PMC4452890 DOI: 10.3389/fgene.2015.00200] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/20/2015] [Accepted: 05/21/2015] [Indexed: 12/22/2022] Open
Abstract
Moonlighting proteins are defined by their involvement in multiple, unrelated functions. The computational prediction of such proteins requires a formal method of assessing the similarity of cellular processes, for example, by identifying dissimilar Gene Ontology terms. While many measures of Gene Ontology term similarity exist, most depend on abstract mathematical analyses of the structure of the GO tree and do not necessarily represent the underlying biology. Here, we propose two metrics of GO term functional dissimilarity derived from biological information, one based on the protein annotations and the other on the interactions between proteins. They have been collected in the PrOnto database, a novel tool which can be of particular use for the identification of moonlighting proteins. The database can be queried via a web-based interface which is freely available at http://tagc.univ-mrs.fr/pronto.
Affiliation(s)
- Charles E Chapple
  - Inserm, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090 TAGC Marseille, France
- Carl Herrmann
  - Inserm, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090 TAGC Marseille, France
- Christine Brun
  - Inserm, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090 TAGC Marseille, France
  - Centre National de la Recherche Scientifique Marseille, France
|
30
|
Oellrich A, Walls RL, Cannon EKS, Cannon SB, Cooper L, Gardiner J, Gkoutos GV, Harper L, He M, Hoehndorf R, Jaiswal P, Kalberer SR, Lloyd JP, Meinke D, Menda N, Moore L, Nelson RT, Pujar A, Lawrence CJ, Huala E. An ontology approach to comparative phenomics in plants. Plant Methods 2015; 11:10. [PMID: 25774204 PMCID: PMC4359497 DOI: 10.1186/s13007-015-0053-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 12/08/2014] [Accepted: 02/05/2015] [Indexed: 05/29/2023]
Abstract
BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.
RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
Affiliation(s)
- Anika Oellrich
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA UK
- Ramona L Walls
  - iPlant Collaborative, University of Arizona, 1657 E. Helen St., Tucson, Arizona 85721 USA
- Ethalinda KS Cannon
  - Department of Electrical and Computer Engineering Iowa State University, 1018 Crop Informatics Lab, Ames, Iowa 50011 USA
- Steven B Cannon
  - USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
  - Department of Agronomy, Agronomy Hall, Iowa State University, Ames, IA 50010 USA
- Laurel Cooper
  - Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA
- Jack Gardiner
  - Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA
- Georgios V Gkoutos
  - Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB UK
- Lisa Harper
  - USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- Mingze He
  - Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, P.O. Box 2882, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA
- Scott R Kalberer
  - USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- John P Lloyd
  - Department of Plant Biology, Michigan State University, 220 Trowbridge Rd, East Lansing, MI 48824 USA
- David Meinke
  - Department of Botany, Oklahoma State University, 301 Physical Sciences, Stillwater, OK 74078 USA
- Naama Menda
  - Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853 USA
- Laura Moore
  - Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA
- Rex T Nelson
  - USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- Anuradha Pujar
  - Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853 USA
- Carolyn J Lawrence
  - Department of Agronomy, Agronomy Hall, Iowa State University, Ames, IA 50010 USA
  - Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA
- Eva Huala
  - Phoenix Bioinformatics, 643 Bair Island Rd Suite 403, Redwood City, CA 94063 USA
|
31
|
Carnielli CM, Winck FV, Paes Leme AF. Functional annotation and biological interpretation of proteomics data. Biochimica et Biophysica Acta - Proteins and Proteomics 2015; 1854:46-54. [DOI: 10.1016/j.bbapap.2014.10.019] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 09/11/2014] [Revised: 10/07/2014] [Accepted: 10/21/2014] [Indexed: 12/22/2022]
|
32
|
Iourov IY, Vorsanova SG, Yurov YB. In silico molecular cytogenetics: a bioinformatic approach to prioritization of candidate genes and copy number variations for basic and clinical genome research. Mol Cytogenet 2014; 7:98. [PMID: 25525469 PMCID: PMC4269961 DOI: 10.1186/s13039-014-0098-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 11/27/2014] [Accepted: 12/02/2014] [Indexed: 01/08/2023] Open
Abstract
Background: The availability of multiple in silico tools for prioritizing genetic variants widens the possibilities for converting genomic data into biological knowledge. However, in molecular cytogenetics, bioinformatic analyses are generally limited to result visualization or database mining for finding similar cytogenetic data. Obviously, the potential of bioinformatics might go beyond these applications. On the other hand, the requirements for performing successful in silico analyses (i.e. deep knowledge of computer science, statistics etc.) can hinder the implementation of bioinformatics in clinical and basic molecular cytogenetic research. Here, we propose a bioinformatic approach to prioritization of genomic variations that is able to solve these problems.
Results: Selecting gene expression as an initial criterion, we have proposed a bioinformatic approach combining filtering and ranking prioritization strategies, which includes analyzing metabolome and interactome data on proteins encoded by candidate genes. To finalize the prioritization of genetic variants, genomic, epigenomic, interactomic and metabolomic data fusion has been performed. Structural abnormalities and aneuploidy revealed by array CGH and FISH have been evaluated to test the approach through determining genotype-phenotype correlations, which have been found to be similar to those of previous studies. Additionally, we have been able to prioritize copy number variations (CNV) (i.e. differentiate between benign CNV and CNV with phenotypic outcome). Finally, the approach has been applied to prioritize genetic variants in cases of somatic mosaicism (including tissue-specific mosaicism).
Conclusions: In order to provide for an in silico evaluation of molecular cytogenetic data, we have proposed a bioinformatic approach to prioritization of candidate genes and CNV. While having the disadvantage of possible unavailability of gene expression data or lack of expression variability between genes of interest, the approach provides several advantages. These are (i) versatility due to independence from specific databases, tools or software, (ii) relative algorithmic simplicity (possibility to avoid sophisticated computational/statistical methodology) and (iii) applicability to molecular cytogenetic data because of its chromosome-centric nature. In conclusion, the approach can be useful for increasing the yield of molecular cytogenetic techniques.
Affiliation(s)
- Ivan Y Iourov
  - Mental Health Research Center, Russian Academy of Medical Sciences, 117152 Moscow, Russia
  - Russian National Research Medical University named after N.I. Pirogov, Separated Structural Unit "Clinical Research Institute of Pediatrics", Ministry of Health of Russian Federation, 125412 Moscow, Russia
  - Department of Medical Genetics, Russian Medical Academy of Postgraduate Education, Moscow, 123995 Russia
- Svetlana G Vorsanova
  - Mental Health Research Center, Russian Academy of Medical Sciences, 117152 Moscow, Russia
  - Russian National Research Medical University named after N.I. Pirogov, Separated Structural Unit "Clinical Research Institute of Pediatrics", Ministry of Health of Russian Federation, 125412 Moscow, Russia
- Yuri B Yurov
  - Mental Health Research Center, Russian Academy of Medical Sciences, 117152 Moscow, Russia
  - Russian National Research Medical University named after N.I. Pirogov, Separated Structural Unit "Clinical Research Institute of Pediatrics", Ministry of Health of Russian Federation, 125412 Moscow, Russia
|
33
|
Correlating information contents of gene ontology terms to infer semantic similarity of gene products. Computational and Mathematical Methods in Medicine 2014; 2014:891842. [PMID: 24963342 PMCID: PMC4054916 DOI: 10.1155/2014/891842] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 01/29/2014] [Accepted: 04/29/2014] [Indexed: 11/26/2022]
Abstract
Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson's correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
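The vector representation described in this abstract can be sketched in a few lines. The example below is illustrative only: the term names and information content (IC) values are hypothetical, and because the abstract does not say how the Jaccard index is applied to real-valued vectors, the weighted (min/max) generalization used here is an assumption.

```python
import math


def ic_vector(annotated_terms, ic, term_order):
    """Represent a gene product as a vector: IC of each annotated term, else 0."""
    return [ic[t] if t in annotated_terms else 0.0 for t in term_order]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


def weighted_jaccard(u, v):
    """Assumed Jaccard variant for non-negative vectors: sum(min) / sum(max)."""
    denom = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / denom if denom else 0.0


# Hypothetical IC values for three ontology terms.
ic = {"T1": 1.0, "T2": 2.0, "T3": 3.0}
terms = sorted(ic)

gene_a = ic_vector({"T1", "T2"}, ic, terms)  # [1.0, 2.0, 0.0]
gene_b = ic_vector({"T2", "T3"}, ic, terms)  # [0.0, 2.0, 3.0]

print(pearson(gene_a, gene_b))
print(cosine(gene_a, gene_b))
print(weighted_jaccard(gene_a, gene_b))
```

Two gene products that share only one mid-specificity term score modestly on all three measures, and the Pearson value can even be negative, which is one way such vector measures avoid overestimating similarity.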
|
34
|
Bastos HP, Clarke LA, Couto FM. Annotation extension through protein family annotation coherence metrics. Front Genet 2013; 4:201. [PMID: 24130572 PMCID: PMC3795322 DOI: 10.3389/fgene.2013.00201] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2013] [Accepted: 09/22/2013] [Indexed: 11/23/2022] Open
Abstract
Protein functional annotation consists in associating proteins with textual descriptors elucidating their biological roles. The bulk of annotation is done via automated procedures that ultimately rely on annotation transfer. Despite a large number of existing protein annotation procedures, the ever-growing protein space is never completely annotated. One of the facets of annotation incompleteness derives from annotation uncertainty. Often, when protein function cannot be predicted with enough specificity, it is instead conservatively annotated with more generic terms. In a scenario of protein families or functionally related (or even dissimilar) sets, this makes it more difficult to use annotations to compare the extent of functional relatedness among all family or set members. However, we postulate that identifying subsets of functionally coherent proteins annotated at a very specific level can help the annotation extension of other incompletely annotated proteins within the same family or functionally related set. As an example, we analyse the status of annotation of a set of CAZy families belonging to the Polysaccharide Lyase class. We show that through the use of visualization methods and semantic similarity based metrics it is possible to identify families, and the respective annotation terms within them, that are suitable for possible annotation extension. Based on our analysis, we then propose a semi-automatic methodology leading to the extension of single annotation terms within these partially annotated protein sets or families.
Affiliation(s)
- Hugo P Bastos
  - LaSIGE, Department of Informatics, Faculdade de Ciências, Universidade de Lisboa Lisboa, Portugal
|