1
Montanaro G, Balhoff JP, Girón JC, Söderholm M, Tarasov S. Computable species descriptions and nanopublications: applying ontology-based technologies to dung beetles (Coleoptera, Scarabaeinae). Biodivers Data J 2024; 12:e121562. [PMID: 38912113 PMCID: PMC11190572 DOI: 10.3897/bdj.12.e121562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 05/22/2024] [Indexed: 06/25/2024] Open
Abstract
Background: Taxonomy has long struggled with analysing vast amounts of phenotypic data due to computational and accessibility challenges. Ontology-based technologies provide a framework for modelling semantic phenotypes that are understandable by computers and compliant with FAIR principles. In this paper, we explore the use of Phenoscript, an emerging language designed for creating semantic phenotypes, to produce computable species descriptions. Our case study centers on the application of this approach to dung beetles (Coleoptera, Scarabaeinae).
New information: We illustrate the effectiveness of Phenoscript for creating semantic phenotypes. We also demonstrate the ability of the Phenospy Python package to automatically translate Phenoscript descriptions into natural language (NL), which eliminates the need for writing traditional NL descriptions. We introduce a computational pipeline that streamlines the generation of semantic descriptions and their conversion to NL. To demonstrate the power of the semantic approach, we apply simple semantic queries to the generated phenotypic descriptions. This paper addresses the current challenges in crafting semantic species descriptions and outlines the path towards future improvements. Furthermore, we discuss the promising integration of semantic phenotypes and nanopublications, as emerging methods for sharing scientific information. Overall, our study highlights the pivotal role of ontology-based technologies in modernising taxonomy and aligning it with the evolving landscape of big data analysis and FAIR principles.
Affiliation(s)
- Giulio Montanaro
  - Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
- James P. Balhoff
  - RENCI, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Jennifer C. Girón
  - Museum of Texas Tech University, Texas, United States of America
- Max Söderholm
  - Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
- Sergei Tarasov
  - Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
2
Hung SS, Tsai PS, Po CW, Hou PS. Pax6 isoforms shape eye development: Insights from developmental stages and organoid models. Differentiation 2024; 137:100781. [PMID: 38631141 DOI: 10.1016/j.diff.2024.100781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 04/05/2024] [Accepted: 04/08/2024] [Indexed: 04/19/2024]
Abstract
Pax6 is a critical transcription factor involved in the development of the central nervous system. However, in humans, mutations in Pax6 predominantly result in iris deficiency rather than neurological phenotypes. This may be attributed to the distinct functions of Pax6 isoforms, Pax6a and Pax6b. In this study, we investigated the spatial and temporal expression patterns of Pax6 isoforms during different stages of mouse eye development. We observed a strong correlation between Pax6a expression and the neuroretina gene Sox2, while Pax6b showed a high correlation with iris-component genes, including the mesenchymal gene Foxc1. During early patterning from E10.5, Pax6b was expressed in the hinge of the optic cup and neighboring mesenchymal cells, whereas Pax6a was absent in these regions. At E14.5, both Pax6a and Pax6b were expressed in the future iris and ciliary body, coinciding with the integration of mesenchymal cells and Mitf-positive cells in the outer region. From E18.5, Pax6 isoforms exhibited distinct expression patterns as lineage genes became more restricted. To further validate these findings, we utilized ESC-derived eye organoids, which recapitulated the temporal and spatial expression patterns of lineage genes and Pax6 isoforms. Additionally, we found that the spatial expression patterns of Foxc1 and Mitf were impaired in Pax6b-mutant ESC-derived eye organoids. This in vitro eye organoid model suggested the involvement of Pax6b-positive local mesodermal cells in iris development. These results provide valuable insights into the regulatory roles of Pax6 isoforms during iris and neuroretina development and highlight the potential of ESC-derived eye organoids as a tool for studying normal and pathological eye development.
Affiliation(s)
- Shih-Shun Hung
  - Institute of Anatomy and Cell Biology, School of Medicine, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong St., Beitou Dist, Taipei, 11221, Taiwan.
- Po-Sung Tsai
  - Institute of Anatomy and Cell Biology, School of Medicine, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong St., Beitou Dist, Taipei, 11221, Taiwan.
- Ching-Wen Po
  - Institute of Anatomy and Cell Biology, School of Medicine, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong St., Beitou Dist, Taipei, 11221, Taiwan; Institute of Brain Science, School of Medicine, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan.
- Pei-Shan Hou
  - Institute of Anatomy and Cell Biology, School of Medicine, National Yang Ming Chiao Tung University, No. 155, Sec. 2, Linong St., Beitou Dist, Taipei, 11221, Taiwan; Institute of Brain Science, School of Medicine, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan; Brain Research Center, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan.
3
Gargano MA, Matentzoglu N, Coleman B, Addo-Lartey EB, Anagnostopoulos A, Anderton J, Avillach P, Bagley AM, Bakštein E, Balhoff JP, Baynam G, Bello SM, Berk M, Bertram H, Bishop S, Blau H, Bodenstein DF, Botas P, Boztug K, Čady J, Callahan TJ, Cameron R, Carbon S, Castellanos F, Caufield JH, Chan LE, Chute C, Cruz-Rojo J, Dahan-Oliel N, Davids JR, de Dieuleveult M, de Souza V, de Vries BBA, de Vries E, DePaulo JR, Derfalvi B, Dhombres F, Diaz-Byrd C, Dingemans AJM, Donadille B, Duyzend M, Elfeky R, Essaid S, Fabrizzi C, Fico G, Firth HV, Freudenberg-Hua Y, Fullerton JM, Gabriel DL, Gilmour K, Giordano J, Goes FS, Moses RG, Green I, Griese M, Groza T, Gu W, Guthrie J, Gyori B, Hamosh A, Hanauer M, Hanušová K, He Y(O), Hegde H, Helbig I, Holasová K, Hoyt CT, Huang S, Hurwitz E, Jacobsen JOB, Jiang X, Joseph L, Keramatian K, King B, Knoflach K, Koolen DA, Kraus M, Kroll C, Kusters M, Ladewig MS, Lagorce D, Lai MC, Lapunzina P, Laraway B, Lewis-Smith D, Li X, Lucano C, Majd M, Marazita ML, Martinez-Glez V, McHenry TH, McInnis MG, McMurry JA, Mihulová M, Millett CE, Mitchell PB, Moslerová V, Narutomi K, Nematollahi S, Nevado J, Nierenberg AA, Čajbiková NN, Nurnberger JI, Ogishima S, Olson D, Ortiz A, Pachajoa H, Perez de Nanclares G, Peters A, Putman T, Rapp CK, Rath A, Reese J, Rekerle L, Roberts A, Roy S, Sanders SJ, Schuetz C, Schulte EC, Schulze TG, Schwarz M, Scott K, Seelow D, Seitz B, Shen Y, Similuk MN, Simon ES, Singh B, Smedley D, Smith CL, Smolinsky JT, Sperry S, Stafford E, Stefancsik R, Steinhaus R, Strawbridge R, Sundaramurthi JC, Talapova P, Tenorio Castano JA, Tesner P, Thomas RH, Thurm A, Turnovec M, van Gijn ME, Vasilevsky NA, Vlčková M, Walden A, Wang K, Wapner R, Ware JS, Wiafe AA, Wiafe SA, Wiggins LD, Williams AE, Wu C, Wyrwoll MJ, Xiong H, Yalin N, Yamamoto Y, Yatham LN, Yocum AK, Young AH, Yüksel Z, Zandi PP, Zankl A, Zarante I, Zvolský M, Toro S, Carmody LC, Harris NL, Munoz-Torres MC, Danis D, Mungall CJ, Köhler S, Haendel MA, Robinson PN. The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Res 2024; 52:D1333-D1346. [PMID: 37953324 PMCID: PMC10767975 DOI: 10.1093/nar/gkad1005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/12/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023] Open
Abstract
The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.
Affiliation(s)
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joel Anderton
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Anita M Bagley
  - Shriners Children's Northern California, Sacramento, CA, USA
- Eduard Bakštein
  - National Institute of Mental Health, Klecany, Czech Republic
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Gareth Baynam
  - Rare Care Centre, Perth Children's Hospital, Perth, Australia
- Michael Berk
  - Deakin University, IMPACT - the Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Barwon Health, Geelong, Australia
- Holli Bertram
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Somer Bishop
  - Department of Psychiatry and Behavioral Sciences, UCSF Weil Institute for Neuroscience, San Francisco, CA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David F Bodenstein
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada
- Kaan Boztug
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Jolana Čady
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, NY, NY, USA
- Seth J Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- J Harry Caufield
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD 21287, USA
- Jaime Cruz-Rojo
  - UDISGEN (Dysmorphology and Genetics Unit), 12 de Octubre Hospital, Madrid, Spain
- Noémi Dahan-Oliel
  - Department of Clinical Research, Shriners Hospitals for Children, Montreal, Quebec, Canada
- Jon R Davids
  - Shriners Children's Northern California, Sacramento, CA, USA
- Maud de Dieuleveult
  - Département I&D, AP-HP, Banque Nationale de Données Maladies Rares, Paris, France
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Bert B A de Vries
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- J Raymond DePaulo
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Beata Derfalvi
  - Department of Pediatrics, Dalhousie University, Halifax, NS, Canada
- Ferdinand Dhombres
  - Fetal Medicine Department, Armand Trousseau Hospital, Sorbonne University, GRC26, INSERM, Limics, Paris, France
- Claudia Diaz-Byrd
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Alexander J M Dingemans
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Bruno Donadille
  - St Antoine Hospital, Reference Center for Rare Growth Endocrine Disorders, Sorbonne University, AP-HP, INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Reem Elfeky
  - Department of Immunology, GOS Hospital for Children NHS Foundation Trust, University College London, London, UK
- Shahim Essaid
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Giovanna Fico
  - Bipolar and Depressive Disorders Unit, Institute of Neuroscience, Hospital Clinic, University of Barcelona, IDIBAPS, CIBERSAM, Barcelona, Catalonia, Spain
- Helen V Firth
  - Addenbrooke's Hospital, Cambridge University Hospitals, Cambridge, UK
- Yun Freudenberg-Hua
  - Department of Psychiatry, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Davera L Gabriel
  - School of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
- Jessica Giordano
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- Fernando S Goes
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rachel Gore Moses
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Ian Green
  - SNOMED International, London W2 6BD, UK
- Matthias Griese
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German Center for Lung Research (DZL), Munich, Germany
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Perth, Australia
- Julia Guthrie
  - Department of Structural and Computational Biology, University of Vienna; Max Perutz Labs, Vienna, Austria
- Benjamin Gyori
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Ada Hamosh
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Marc Hanauer
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Kateřina Hanušová
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Harshad Hegde
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ingo Helbig
  - Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kateřina Holasová
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Charles Tapley Hoyt
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Eric Hurwitz
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Lisa Joseph
  - Neurodevelopmental and Behavioral Phenotyping Service, National Institute of Mental Health, Bethesda, MD, USA
- Kamyar Keramatian
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Bryan King
  - Department of Psychiatry and Behavioral Sciences, UCSF Weil Institute for Neuroscience, San Francisco, CA, USA
- Katrin Knoflach
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German Center for Lung Research (DZL), Munich, Germany
- David A Koolen
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Megan L Kraus
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Carlo Kroll
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Maaike Kusters
  - Immunology, NIHR Great Ormond Street Hospital BRC, London, UK
- Markus S Ladewig
  - Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- David Lagorce
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Meng-Chuan Lai
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Pablo Lapunzina
  - Institute of Medical and Molecular Genetics, Hospital Univ. La Paz, Madrid, Spain
- Bryan Laraway
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- David Lewis-Smith
  - Translational and Clinical Research Institute, Henry Wellcome Building, Framlington Place, Newcastle University, Newcastle-Upon-Tyne NE14LP, UK
- Caterina Lucano
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Marzieh Majd
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Mary L Marazita
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Victor Martinez-Glez
  - Center for Genomic Medicine, Parc Taulí Hospital Universitari, Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
- Toby H McHenry
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Melvin G McInnis
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Michaela Mihulová
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Caitlin E Millett
  - Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Philip B Mitchell
  - Discipline of Psychiatry & Mental Health, School of Clinical Medicine, Faculty of Medicine & Health, University of New South Wales, Sydney, NSW, Australia
- Veronika Moslerová
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Kenji Narutomi
  - Okinawa Prefectural Nanbu Medical Center & Children's Medical Center
- Shahrzad Nematollahi
  - School of Physical and Occupational Therapy, McGill University, Montreal, Quebec, Canada
- Julian Nevado
  - Institute of Medical and Molecular Genetics, Hospital Univ. La Paz, Madrid, Spain
- Andrew A Nierenberg
  - Dauten Family Center for Bipolar Treatment Innovation, Massachusetts General Hospital, Boston, MA, USA
- Nikola Novák Čajbiková
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- John I Nurnberger
  - Stark Neurosciences Research Institute, Departments of Psychiatry and Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Daniel Olson
  - Data Collaboration Center, Data Science, Critical Path Institute, Tucson, AZ, USA
- Abigail Ortiz
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Harry Pachajoa
  - Centro de Investigaciones en Anomalías Congénitas y Enfermedades Raras (CIACER), Universidad Icesi, Cali, Colombia
- Guiomar Perez de Nanclares
  - Molecular (epi) genetics lab, Bioaraba Health Research Institute, Araba University Hospital, Vitoria-Gasteiz, Spain
- Amy Peters
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Tim Putman
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christina K Rapp
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German Center for Lung Research (DZL), Munich, Germany
- Ana Rath
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lauren Rekerle
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Angharad M Roberts
  - National Heart & Lung Institute & MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- Suzy Roy
  - SNOMED International, London W2 6BD, UK
- Stephan J Sanders
  - Department of Paediatrics, Institute of Developmental and Regenerative Medicine, University of Oxford, Oxford, UK
- Catharina Schuetz
  - Universitätsklinikum Carl Gustav Carus, Medizinische Fakultät, TU, Dresden, Germany
- Eva C Schulte
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, LMU Munich, Munich, Germany
- Thomas G Schulze
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Martin Schwarz
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Katie Scott
  - Department of Psychiatry, Dalhousie University, Halifax, NS, Canada
- Dominik Seelow
  - Exploratory Diagnostic Sciences, Berliner Institut für Gesundheitsforschung - Charité, Berlin, Germany
- Berthold Seitz
  - Department of Ophthalmology, Saarland University Medical Center UKS, Homburg/Saar, Germany
- Morgan N Similuk
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Eric S Simon
  - Eisenberg Family Depression Center, University of Michigan, Ann Arbor, MI, USA
- Balwinder Singh
  - Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Jake T Smolinsky
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Sarah Sperry
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Robin Steinhaus
  - Exploratory Diagnostic Sciences, Berliner Institut für Gesundheitsforschung - Charité, Berlin, Germany
- Rebecca Strawbridge
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Polina Talapova
  - Institute for Research and Health Policy Studies, Tufts Medicine, Boston, MA 2111, USA
- Pavel Tesner
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Rhys H Thomas
  - Translational and Clinical Research Institute, Henry Wellcome Building, Framlington Place, Newcastle University, Newcastle-Upon-Tyne NE14LP, UK
- Audrey Thurm
  - Neurodevelopmental and Behavioral Phenotyping Service, National Institute of Mental Health, Bethesda, MD, USA
- Marek Turnovec
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Marielle E van Gijn
  - Department of Genetics, University Medical Center Groningen, Groningen, Netherlands
- Markéta Vlčková
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Anita Walden
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kai Wang
  - Chinese HPO Consortium, Beijing, China
- Ron Wapner
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- James S Ware
  - National Heart & Lung Institute & MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- Lisa D Wiggins
  - National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Andrew E Williams
  - Institute for Research and Health Policy Studies, Tufts Medicine, Boston, MA 2111, USA
- Chen Wu
  - Chinese HPO Consortium, Beijing, China
- Margot J Wyrwoll
  - Centre for Regenerative Medicine, Institute for Regeneration and Repair, Institute for Stem Cell Research, University of Edinburgh, Edinburgh, UK
- Hui Xiong
  - Chinese HPO Consortium, Beijing, China
- Nefize Yalin
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Yasunori Yamamoto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Japan
- Lakshmi N Yatham
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Anastasia K Yocum
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Allan H Young
  - Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London & South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Monks Orchard Road, Beckenham, Kent, London SE5 8AF, UK
- Zafer Yüksel
  - Department of Human Genetics, Bioscientia Healthcare GmbH, Ingelheim, Germany
- Peter P Zandi
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Andreas Zankl
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia
- Ignacio Zarante
  - Institute of Human Genetics, Pontificia Universidad Javeriana, Bogotá, Colombia
- Miroslav Zvolský
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Sabrina Toro
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Leigh C Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Daniel Danis
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Melissa A Haendel
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
4
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION: Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories.
RESULTS: To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification.
AVAILABILITY AND IMPLEMENTATION: https://github.com/bixuehua/SSLpheno.
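The SSLpheno abstract describes smoothing node features with a Laplacian-based filter and then using cosine similarity to pick positive and negative node pairs as reconstruction labels. A minimal pure-Python sketch of that idea on a toy graph is given here. It is not the authors' implementation (their repository is the reference for that): the filter form `(I - k*L)`, the added self-loops, the function names `laplacian_smooth` and `select_pairs`, and the top/bottom ranking rule are all assumptions made for illustration.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense symmetric 0/1 adjacency matrix.

    Adding self-loops is an assumption of this sketch; it keeps every degree >= 1.
    """
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def laplacian_smooth(adj, feats, steps=2, k=0.5):
    """Apply the low-pass filter (I - k*L)^steps to node features, with L = I - A_norm."""
    a_norm = normalized_adjacency(adj)
    n, d = len(feats), len(feats[0])
    x = [row[:] for row in feats]
    for _ in range(steps):
        ax = [[sum(a_norm[i][m] * x[m][c] for m in range(n)) for c in range(d)]
              for i in range(n)]
        # (I - k*L) x  ==  (1 - k) * x + k * A_norm x
        x = [[(1 - k) * x[i][c] + k * ax[i][c] for c in range(d)] for i in range(n)]
    return x


def cosine(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def select_pairs(feats, n_pos, n_neg):
    """Rank all node pairs by cosine similarity (hypothetical selection rule).

    The n_pos most similar pairs become positive samples and the n_neg least
    similar pairs become negative samples.
    """
    n = len(feats)
    ranked = sorted(
        ((cosine(feats[i], feats[j]), i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    pos = [(i, j) for _, i, j in ranked[:n_pos]]
    neg = [(i, j) for _, i, j in ranked[len(ranked) - n_neg:]]
    return pos, neg


if __name__ == "__main__":
    # Toy graph: two connected pairs of nodes, (0, 1) and (2, 3).
    adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    pos, neg = select_pairs(laplacian_smooth(adj, feats), n_pos=2, n_neg=1)
    print("positive pairs:", pos)
    print("negative pairs:", neg)
```

In a full pipeline the selected pairs would serve as pseudo-labels for a reconstruction loss before the downstream multi-label classifier is trained; that training step is outside the scope of this sketch.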
Affiliation(s)
- Xuehua Bi
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Weiyang Liang
  - College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Qichang Zhao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
5
Girón JC, Tarasov S, González Montaña LA, Matentzoglu N, Smith AD, Koch M, Boudinot BE, Bouchard P, Burks R, Vogt L, Yoder M, Osumi-Sutherland D, Friedrich F, Beutel RG, Mikó I. Formalizing Invertebrate Morphological Data: A Descriptive Model for Cuticle-Based Skeleto-Muscular Systems, an Ontology for Insect Anatomy, and their Potential Applications in Biodiversity Research and Informatics. Syst Biol 2023; 72:1084-1100. [PMID: 37094905 DOI: 10.1093/sysbio/syad025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 04/17/2023] [Accepted: 04/21/2023] [Indexed: 04/26/2023] Open
Abstract
The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits computer-assisted comparison using semantic web technologies. Here we propose a Model for Describing Cuticular Anatomical Structures (MoDCAS) which incorporates structural properties and positional relationships for standardized, consistent, and reproducible descriptions of arthropod phenotypes. We applied the MoDCAS framework in creating the ontology for the Anatomy of the Insect Skeleto-Muscular system (AISM). The AISM is the first general insect ontology that aims to cover all taxa by providing generalized, fully logical, and queryable, definitions for each term. It was built using the Ontology Development Kit (ODK), which maximizes interoperability with Uberon (Uberon multispecies anatomy ontology) and other basic ontologies, enhancing the integration of insect anatomy into the broader biological sciences. A template system for adding new terms, extending, and linking the AISM to additional anatomical, phenotypic, genetic, and chemical ontologies is also introduced. 
The AISM is proposed as the backbone for taxon-specific insect ontologies and has potential applications spanning systematic biology and biodiversity informatics, allowing users to: 1) use controlled vocabularies and create semiautomated computer-parsable insect morphological descriptions; 2) integrate insect morphology into broader fields of research, including ontology-informed phylogenetic methods, logical homology hypothesis testing, evo-devo studies, and genotype to phenotype mapping; and 3) automate the extraction of morphological data from the literature, enabling the generation of large-scale phenomic data, by facilitating the production and testing of informatic tools able to extract, link, annotate, and process morphological data. This descriptive model and its ontological applications will allow for clear and semantically interoperable integration of arthropod phenotypes in biodiversity studies.
Affiliation(s)
- Jennifer C Girón
  - Department of Entomology, Purdue University, West Lafayette, IN, USA
  - Natural Science Research Laboratory, Museum of Texas Tech University, Lubbock, TX, USA
- Sergei Tarasov
  - Finnish Museum of Natural History, University of Helsinki, Pohjoinen Rautatiekatu 13, FI-00014 Helsinki, Finland
- Aaron D Smith
  - Department of Entomology, Purdue University, West Lafayette, IN, USA
- Markus Koch
  - Institute of Evolutionary Biology and Ecology, University of Bonn, An der Immenburg 1, 53121 Bonn, Germany
- Brendon E Boudinot
  - Department of Entomology & Nematology, University of California, Davis, One Shields Ave, CA, USA
  - Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington DC, USA
- Patrice Bouchard
  - Biodiversity and Bioresources, Canadian National Collection of Insects, Arachnids and Nematodes, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario, K1A 0C6, Canada
- Roger Burks
  - Entomology Department, University of California, Riverside, 900 University Ave. Riverside, CA, USA
- Lars Vogt
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167 Hannover, Germany
- Matthew Yoder
  - Illinois Natural History Survey, University of Illinois, Champaign, IL, USA
- Frank Friedrich
  - Institut für Zell- und Systembiologie der Tiere, Universität Hamburg, Martin-Luther-King-Platz 3, 20146, Hamburg, Germany
- Rolf G Beutel
  - Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
- István Mikó
  - Department of Biological Sciences, University of New Hampshire, Durham, NH, USA
|
6
|
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
  - Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
  - The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
|
7
|
Wright SN, Leger BS, Rosenthal SB, Liu SN, Jia T, Chitre AS, Polesskaya O, Holl K, Gao J, Cheng R, Garcia Martinez A, George A, Gileta AF, Han W, Netzley AH, King CP, Lamparelli A, Martin C, St Pierre CL, Wang T, Bimschleger H, Richards J, Ishiwari K, Chen H, Flagel SB, Meyer P, Robinson TE, Solberg Woods LC, Kreisberg JF, Ideker T, Palmer AA. Genome-wide association studies of human and rat BMI converge on synapse, epigenome, and hormone signaling networks. Cell Rep 2023; 42:112873. [PMID: 37527041 PMCID: PMC10546330 DOI: 10.1016/j.celrep.2023.112873] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/08/2022] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/03/2023] Open
Abstract
A vexing observation in genome-wide association studies (GWASs) is that parallel analyses in different species may not identify orthologous genes. Here, we demonstrate that cross-species translation of GWASs can be greatly improved by an analysis of co-localization within molecular networks. Using body mass index (BMI) as an example, we show that the genes associated with BMI in humans lack significant agreement with those identified in rats. However, the networks interconnecting these genes show substantial overlap, highlighting common mechanisms including synaptic signaling, epigenetic modification, and hormonal regulation. Genetic perturbations within these networks cause abnormal BMI phenotypes in mice, too, supporting their broad conservation across mammals. Other mechanisms appear species specific, including carbohydrate biosynthesis (humans) and glycerolipid metabolism (rodents). Finally, network co-localization also identifies cross-species convergence for height/body length. This study advances a general paradigm for determining whether and how phenotypes measured in model species recapitulate human biology.
Affiliation(s)
- Sarah N Wright
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA 92093, USA
- Brittany S Leger
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA; Program in Biomedical Sciences, University of California San Diego, La Jolla, CA 93093, USA
- Sara Brin Rosenthal
  - Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Sophie N Liu
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Tongqiu Jia
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
- Oksana Polesskaya
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
- Katie Holl
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jianjun Gao
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
- Riyan Cheng
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
- Angel Garcia Martinez
  - Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Anthony George
  - Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA
- Alexander F Gileta
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Wenyan Han
  - Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Alesa H Netzley
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
- Christopher P King
  - Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA; Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
- Connor Martin
  - Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA; Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
- Tengfei Wang
  - Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Hannah Bimschleger
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
- Jerry Richards
  - Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA
- Keita Ishiwari
  - Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA; Department of Pharmacology and Toxicology, University at Buffalo, Buffalo, NY 14203, USA
- Hao Chen
  - Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Shelly B Flagel
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA; Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI 48109, USA
- Paul Meyer
  - Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
- Terry E Robinson
  - Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Leah C Solberg Woods
  - Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Jason F Kreisberg
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
|
8
|
Liu X, Gao L, Peng Y, Fang Z, Wang J. PheSom: a term frequency-based method for measuring human phenotype similarity on the basis of MeSH vocabulary. Front Genet 2023; 14:1185790. [PMID: 37496714 PMCID: PMC10366691 DOI: 10.3389/fgene.2023.1185790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open
Abstract
Background: Phenotype similarity calculation should be used to help improve drug repurposing. In this study, based on the MeSH terms describing the phenotypes deposited in OMIM, we proposed a method, namely, PheSom (Phenotype Similarity On MeSH), to measure the similarity between phenotypes. PheSom counted the number of overlapping MeSH terms between two phenotypes and then took the weight of every MeSH term within each phenotype into account according to the term frequency-inverse document frequency (TF-IDF). Phenotype-related genes were used for the evaluation of our method. Results: A 7,739 × 7,739 similarity score matrix was finally obtained and the number of phenotype pairs was dramatically decreased with the increase of similarity score. Besides, the overlapping rates of phenotype-related genes were remarkably increased with the increase of similarity score between phenotypes, which supports the reliability of our method. Conclusion: We anticipate our method can be applied to identifying novel therapeutic methods for complex diseases.
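The weighting idea in this abstract (shared MeSH terms, weighted by term frequency and inverse document frequency) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact scoring formula, so the cosine combination below is an assumption, and the phenotype IDs and MeSH terms are made-up placeholders.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return {phenotype_id: {term: tf-idf weight}} for a dict of term lists."""
    n_docs = len(docs)
    # Document frequency: in how many phenotypes each term occurs.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    weights = {}
    for pid, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        weights[pid] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def similarity(w_a, w_b):
    """Cosine similarity over the terms shared by two weighted phenotypes.

    Assumption: cosine is used here for illustration only.
    """
    shared = set(w_a) & set(w_b)
    dot = sum(w_a[t] * w_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in w_a.values()))
    norm_b = math.sqrt(sum(v * v for v in w_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical phenotypes annotated with placeholder MeSH terms.
example = {
    "P1": ["Seizures", "Ataxia", "Hearing Loss"],
    "P2": ["Seizures", "Ataxia", "Obesity"],
    "P3": ["Obesity", "Hypertension"],
}
weights = tfidf_weights(example)
score_12 = similarity(weights["P1"], weights["P2"])
score_13 = similarity(weights["P1"], weights["P3"])
```

In this toy data P1 and P2 share two terms while P1 and P3 share none, so `score_12` is positive and `score_13` is zero. Applying the same function to every pair of phenotypes would give a square similarity matrix of the kind the abstract reports.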
Affiliation(s)
- Xinhua Liu
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ling Gao
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Yonglin Peng
  - Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Zhonghai Fang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ju Wang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
|
9
|
Pei XM, Yeung MHY, Wong ANN, Tsang HF, Yu ACS, Yim AKY, Wong SCC. Targeted Sequencing Approach and Its Clinical Applications for the Molecular Diagnosis of Human Diseases. Cells 2023; 12:493. [PMID: 36766834 PMCID: PMC9913990 DOI: 10.3390/cells12030493] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/18/2022] [Revised: 01/19/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023] Open
Abstract
The outbreak of COVID-19 has positively impacted the NGS market recently. Targeted sequencing (TS) has become an important routine technique in both clinical and research settings, with advantages including high confidence and accuracy, a reasonable turnaround time, relatively low cost, and fewer data burdens with the level of bioinformatics or computational demand. Since there are no clear consensus guidelines on the wide range of next-generation sequencing (NGS) platforms and techniques, there is a vital need for researchers and clinicians to develop efficient approaches, especially for the molecular diagnosis of diseases in the emergency of the disease and the global pandemic outbreak of COVID-19. In this review, we aim to summarize different methods of TS, demonstrate parameters for TS assay designs, illustrate different TS panels, discuss their limitations, and present the challenges of TS concerning their clinical application for the molecular diagnosis of human diseases.
Affiliation(s)
- Xiao Meng Pei
  - Department of Applied Biology & Chemical Technology, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Martin Ho Yin Yeung
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Alex Ngai Nick Wong
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Hin Fung Tsang
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China
  - Department of Clinical Laboratory and Pathology, Hong Kong Adventist Hospital, Hong Kong, China
- Allen Chi Shing Yu
  - Codex Genetics Limited, Unit 212, 2/F., Building 16W, No. 16 Science Park West Avenue, The Hong Kong Science Park, Hong Kong 852, China
- Aldrin Kay Yuen Yim
  - Codex Genetics Limited, Unit 212, 2/F., Building 16W, No. 16 Science Park West Avenue, The Hong Kong Science Park, Hong Kong 852, China
- Sze Chuen Cesar Wong
  - Department of Applied Biology & Chemical Technology, The Hong Kong Polytechnic University, Hong Kong 999077, China
|
10
|
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. bioRxiv 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P. Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Meghan A. Balk
  - National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
- Robyn Ball
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Anita R. Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Laura W. Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L. Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A. McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Christopher J. Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Nicole Vasilevsky
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
|
11
|
Huang YS, Hsu C, Chune YC, Liao IC, Wang H, Lin YL, Hwu WL, Lee NC, Lai F. Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization. JMIR Bioinformatics and Biotechnology 2022; 3:e37701. [PMID: 38935959 PMCID: PMC11168239 DOI: 10.2196/37701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 07/29/2022] [Accepted: 08/22/2022] [Indexed: 06/29/2024]
Abstract
BACKGROUND In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS technology is now being widely introduced into clinical diagnosis practice, especially for diagnosis of hereditary disorders. Although the exome data of single-nucleotide variant (SNV) can be generated using these approaches, processing the DNA sequence data of a patient requires multiple tools and complex bioinformatics pipelines. OBJECTIVE This study aims to assist physicians to automatically interpret the genetic variation information generated by NGS in a short period. To determine the true causal variants of a patient with genetic disease, currently, physicians often need to view numerous features on every variant manually and search for literature in different databases to understand the effect of genetic variation. METHODS We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panel as training set, and then integrated variant annotations from multiple genetic databases for model training. The model built ranked SNVs and output the most possible disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders in National Taiwan University Hospital. We applied sequencing data and phenotypic information automatically extracted by a keyword extraction tool from patient's electronic medical records into our machine learning model. RESULTS We succeeded in locating 92.5% (124/134) of the causative variant in the top 10 ranking list among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer was able to assign the target gene to the top rank for around 61.1% (66/108) of the patients, followed by Variant Prioritizer, which assigned it for 44.4% (48/108) of the patients. 
The cumulative rank result revealed that our AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20. It also shows that AI Variant Prioritizer presents better performance than other tools. After adopting the Human Phenotype Ontology (HPO) terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). CONCLUSIONS We successfully applied sequencing data from WES and free-text phenotypic information of patient's disease automatically extracted by the keyword extraction tool for model training and testing. By interpreting our model, we identified which features of variants are important. Besides, we achieved a satisfactory result on finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with a genetic diagnosis.
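The top-k figures quoted in this abstract are hit rates: the share of cases whose causative variant (or target gene) appears at rank k or better. A minimal Python sketch of that metric follows. The function name and the example ranks are hypothetical and are not taken from the study; only the reported fractions are from the abstract.

```python
def top_k_hit_rate(ranks, k):
    """Fraction of cases whose true variant is ranked within the top k.

    `ranks` holds the 1-based rank of the true variant per case,
    or None when the variant was not ranked at all.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for five patients (illustrative only).
example_ranks = [1, 3, 12, None, 7]
rate_at_1 = top_k_hit_rate(example_ranks, 1)
rate_at_10 = top_k_hit_rate(example_ranks, 10)
rate_at_20 = top_k_hit_rate(example_ranks, 20)

# Percentages recomputed from the fractions reported in the abstract.
reported = {
    "top-10 variants": round(100 * 124 / 134, 1),
    "top-1 gene, AI Variant Prioritizer": round(100 * 66 / 108, 1),
    "top-10 with HPO terms": round(100 * 101 / 108, 1),
}
```

The `reported` dictionary only redoes the arithmetic behind the percentages in the abstract; it says nothing about how the study ranked variants.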
Affiliation(s)
- Yu-Shan Huang
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Ching Hsu
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Yu-Chang Chune
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- I-Cheng Liao
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
- Hsin Wang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Yi-Lin Lin
  - Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
- Wuh-Liang Hwu
  - Department of Pediatrics, National Taiwan University Hospital, Taipei City, Taiwan
- Ni-Chung Lee
  - Department of Medical Genetics, National Taiwan University Hospital, Taipei City, Taiwan
- Feipei Lai
  - Department of Computer Science and Information Engineering, National Taiwan University, Taipei City, Taiwan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
|
12
|
Cheng KC, Burdine RD, Dickinson ME, Ekker SC, Lin AY, Lloyd KCK, Lutz CM, MacRae CA, Morrison JH, O'Connor DH, Postlethwait JH, Rogers CD, Sanchez S, Simpson JH, Talbot WS, Wallace DC, Weimer JM, Bellen HJ. Promoting validation and cross-phylogenetic integration in model organism research. Dis Model Mech 2022; 15:276675. [PMID: 36125045 PMCID: PMC9531892 DOI: 10.1242/dmm.049600] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/13/2022] Open
Abstract
Model organism (MO) research provides a basic understanding of biology and disease due to the evolutionary conservation of the molecular and cellular language of life. MOs have been used to identify and understand the function of orthologous genes, proteins, cells and tissues involved in biological processes, to develop and evaluate techniques and methods, and to perform whole-organism-based chemical screens to test drug efficacy and toxicity. However, a growing richness of datasets and the rising power of computation raise an important question: How do we maximize the value of MOs? In-depth discussions in over 50 virtual presentations organized by the National Institutes of Health across more than 10 weeks yielded important suggestions for improving the rigor, validation, reproducibility and translatability of MO research. The effort clarified challenges and opportunities for developing and integrating tools and resources. Maintenance of critical existing infrastructure and the implementation of suggested improvements will play important roles in maintaining productivity and facilitating the validation of animal models of human biology and disease.
Affiliation(s)
- Keith C Cheng
  - Department of Pathology, Penn State College of Medicine, Hershey, PA 17033, USA
  - Institute for Computational and Data Sciences, Pennsylvania State University, Park, PA 16802, USA
- Rebecca D Burdine
  - Department of Molecular Biology, Princeton University, Princeton, NJ 08540, USA
- Mary E Dickinson
  - Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX 77007, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77007, USA
- Stephen C Ekker
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55906, USA
- Alex Y Lin
  - Department of Pathology, Penn State College of Medicine, Hershey, PA 17033, USA
- K C Kent Lloyd
  - Mouse Biology Program, School of Medicine, University of California Davis, Davis, CA 95618, USA
  - Department of Surgery, School of Medicine, University of California Davis, Davis, CA 95618, USA
- Cathleen M Lutz
  - The Jackson Laboratory, Genetic Resource Science, Bar Harbor, ME 04609, USA
- Calum A MacRae
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 360 Longwood Avenue, Boston, MA 02215, USA
- John H Morrison
  - California National Primate Research Center, University of California Davis, Davis, CA 95616, USA
  - Department of Neurology, University of California Davis, Davis, CA 95616, USA
- David H O'Connor
  - Department of Pathology and Laboratory Medicine, University of Wisconsin-Madison, Madison, WI 53711, USA
- Crystal D Rogers
  - School of Veterinary Medicine, University of California Davis, Davis, CA 95616, USA
- Susan Sanchez
  - Department of Infectious Diseases, College of Veterinary Medicine, The University of Georgia, Athens, GA 30602, USA
- Julie H Simpson
  - Department of Molecular, Cell and Developmental Biology, University of California, Santa Barbara, CA 93117, USA
- William S Talbot
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Douglas C Wallace
  - Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jill M Weimer
  - Pediatrics and Rare Diseases Group, Sanford Research, Sioux Falls, SD 57104, USA
- Hugo J Bellen
  - Department of Molecular and Human Genetics, Neurological Research Institute (TCH), Baylor College of Medicine, Houston, TX 77007, USA
|
13
|
Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022; 15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022] Open
Abstract
Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene–disease associations. We found that mouse genotype–phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper. Editor's choice: We investigated the use of model organism phenotypes in the computational identification of disease genes, identifying several data biases and concluding that mouse model phenotypes contribute most to computational disease gene identification.
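The idea of relating model organism phenotypes to human disease phenotypes by semantic similarity can be illustrated with a minimal sketch. This is not the method used in the paper above: it assumes a toy ontology with made-up term names and uses a plain Jaccard index over ancestor closures, which is only one of many possible semantic similarity measures.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: child term -> set of parent terms (is_a edges).
PARENTS: Dict[str, Set[str]] = {
    "abnormal_heart": {"abnormal_organ"},
    "enlarged_heart": {"abnormal_heart"},
    "abnormal_liver": {"abnormal_organ"},
    "abnormal_organ": {"phenotype"},
    "phenotype": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def closure(terms: Iterable[str]) -> Set[str]:
    """Union of the ancestor sets of all terms in a phenotype profile."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term)
    return result


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index of the two ancestor closures (0.0 if both are empty)."""
    closure_a, closure_b = closure(a), closure(b)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


# Hypothetical phenotype profiles for a mouse model and two human diseases.
mouse_model = ["enlarged_heart"]
cardiac_disease = ["abnormal_heart"]
liver_disease = ["abnormal_liver"]

score_related = jaccard_similarity(mouse_model, cardiac_disease)
score_unrelated = jaccard_similarity(mouse_model, liver_disease)
print(score_related, score_unrelated)
```

Because the comparison runs over shared ancestors rather than exact term matches, the mouse profile scores higher against the cardiac profile than against the liver profile even though no term is identical.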
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
| |
|
14
|
Dhombres F, Morgan P, Chaudhari BP, Filges I, Sparks TN, Lapunzina P, Roscioli T, Agarwal U, Aggarwal S, Beneteau C, Cacheiro P, Carmody LC, Collardeau‐Frachon S, Dempsey EA, Dufke A, Duyzend MH, el Ghosh M, Giordano JL, Glad R, Grinfelde I, Iliescu DG, Ladewig MS, Munoz‐Torres MC, Pollazzon M, Radio FC, Rodo C, Silva RG, Smedley D, Sundaramurthi JC, Toro S, Valenzuela I, Vasilevsky NA, Wapner RJ, Zemet R, Haendel MA, Robinson PN. Prenatal phenotyping: A community effort to enhance the Human Phenotype Ontology. American Journal of Medical Genetics Part C: Seminars in Medical Genetics 2022; 190:231-242. [PMID: 35872606 PMCID: PMC9588534 DOI: 10.1002/ajmg.c.31989] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/14/2022] [Accepted: 07/01/2022] [Indexed: 01/07/2023]
Abstract
Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.
Affiliation(s)
- Ferdinand Dhombres
- Sorbonne University, GRC26, INSERM, Limics, Armand Trousseau Hospital, Fetal Medicine Department, APHP, Paris, France
| | - Patricia Morgan
- American College of Medical Genetics and Genomics, Newborn Screening Translational Research Network, Bethesda, Maryland, USA
| | - Bimal P. Chaudhari
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
| | - Isabel Filges
- University Hospital Basel and University of Basel, Medical Genetics, Basel, Switzerland
| | - Teresa N. Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Pablo Lapunzina
- CIBERER and Hospital Universitario La Paz, INGEMM‐Institute of Medical and Molecular Genetics, Madrid, Spain
| | - Tony Roscioli
- Neuroscience Research Australia (NeuRA), University of New South Wales, Sydney, New South Wales, Australia
| | - Umber Agarwal
- Department of Maternal and Fetal Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
| | - Shagun Aggarwal
- Department of Medical Genetics, Nizam's Institute of Medical Sciences, Hyderabad, Telangana, India
| | - Claire Beneteau
- Service de Génétique Médicale, UF 9321 de Fœtopathologie et Génétique, CHU de Nantes, Nantes, France
| | - Pilar Cacheiro
- William Harvey Research Institute, Queen Mary University of London, London, UK
| | - Leigh C. Carmody
- Department of Genomic Medicine, The Jackson Laboratory, Farmington, Connecticut, USA
| | | | - Esther A. Dempsey
- St George's University of London, Molecular and Clinical Sciences Research Institute, London, UK
| | - Andreas Dufke
- University of Tübingen, Institute of Medical Genetics and Applied Genomics, Tübingen, Germany
| | | | | | - Jessica L. Giordano
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, USA
| | - Ragnhild Glad
- Department of Obstetrics and Gynecology, University Hospital of North Norway, Tromsø, Norway
| | - Ieva Grinfelde
- Department of Medical Genetics and Prenatal diagnosis, Children's University Hospital, Riga, Latvia
| | - Dominic G. Iliescu
- Department of Obstetrics and Gynecology, University of Medicine and Pharmacy Craiova, Craiova, Dolj, Romania
| | - Markus S. Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Saarland, Germany
| | - Monica C. Munoz‐Torres
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Marzia Pollazzon
- Azienda USL‐IRCCS di Reggio Emilia, Medical Genetics Unit, Reggio Emilia, Italy
| | | | - Carlota Rodo
- Vall d'Hebron Hospital Campus, Maternal & Fetal Medicine, Barcelona, Spain
| | - Raquel Gouveia Silva
- Hospital Santa Maria, Serviço de Genética, Departamento de Pediatria, Hospital de Santa Maria, Centro Hospitalar Universitário Lisboa Norte, Centro Académico de Medicina de Lisboa, Lisboa, Portugal
| | - Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, UK
| | | | - Sabrina Toro
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Irene Valenzuela
- Hospital Vall d'Hebron, Clinical and Molecular Genetics Area, Barcelona, Spain
| | - Nicole A. Vasilevsky
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Ronald J. Wapner
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, USA
| | - Roni Zemet
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Melissa A Haendel
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Peter N. Robinson
- Department of Genomic Medicine, The Jackson Laboratory, Farmington, Connecticut, USA
| |
|
15
|
Fisher ME, Segerdell E, Matentzoglu N, Nenni MJ, Fortriede JD, Chu S, Pells TJ, Osumi-Sutherland D, Chaturvedi P, James-Zorn C, Sundararaj N, Lotay VS, Ponferrada V, Wang DZ, Kim E, Agalakov S, Arshinoff BI, Karimi K, Vize PD, Zorn AM. The Xenopus phenotype ontology: bridging model organism phenotype data to human health and development. BMC Bioinformatics 2022; 23:99. [PMID: 35317743 PMCID: PMC8939077 DOI: 10.1186/s12859-022-04636-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/12/2021] [Accepted: 03/08/2022] [Indexed: 11/10/2022] Open
Abstract
Background Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease. Results Here we present the Xenopus phenotype ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno), and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development as the generation of new terms and classes of terms can be substantially automated. Conclusions The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species. The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype–phenotype data that can be directly related to other uPheno compliant resources. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04636-8.
Affiliation(s)
- Malcolm E Fisher
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Erik Segerdell
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Nicolas Matentzoglu
- Monarch Initiative, London, UK; Semanticly Ltd, London, UK; European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Mardi J Nenni
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Joshua D Fortriede
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Stanley Chu
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Troy J Pells
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | | | - Praneet Chaturvedi
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Christina James-Zorn
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Nivitha Sundararaj
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Vaneet S Lotay
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Virgilio Ponferrada
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Dong Zhuo Wang
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Eugene Kim
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Sergei Agalakov
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Bradley I Arshinoff
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Kamran Karimi
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Peter D Vize
- Department of Biological Science, University of Calgary, Calgary, AB, Canada
| | - Aaron M Zorn
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| |
|
16
|
Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med 2022; 14:23. [PMID: 35220969 PMCID: PMC8883622 DOI: 10.1186/s13073-022-01026-w] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/09/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023] Open
Abstract
Rare diseases affect 30 million people in the USA and more than 300–400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25–35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.
|
17
|
Bradford YM, Van Slyke CE, Ruzicka L, Singer A, Eagle A, Fashena D, Howe DG, Frazer K, Martin R, Paddock H, Pich C, Ramachandran S, Westerfield M. Zebrafish Information Network, the knowledgebase for Danio rerio research. Genetics 2022; 220:6528852. [PMID: 35166825 PMCID: PMC8982015 DOI: 10.1093/genetics/iyac016] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 12/10/2021] [Accepted: 01/18/2022] [Indexed: 11/24/2022] Open
Abstract
The Zebrafish Information Network (zfin.org) is the central repository for Danio rerio genetic and genomic data. The Zebrafish Information Network has served the zebrafish research community since 1994, expertly curating, integrating, and displaying zebrafish data. Key data types available at the Zebrafish Information Network include, but are not limited to, genes, alleles, human disease models, gene expression, phenotype, and gene function. The Zebrafish Information Network makes zebrafish research data Findable, Accessible, Interoperable, and Reusable through nomenclature, curatorial and annotation activities, web interfaces, and data downloads. Recently, the Zebrafish Information Network and 6 other model organism knowledgebases have collaborated to form the Alliance of Genome Resources, aiming to develop sustainable genome information resources that enable the use of model organisms to understand the genetic and genomic basis of human biology and disease. Here, we provide an overview of the data available at the Zebrafish Information Network including recent updates to the gene page to provide access to single-cell RNA sequencing data, links to Alliance web pages, ribbon diagrams to summarize the biological systems and Gene Ontology terms that have annotations, and data integration with the Alliance of Genome Resources.
Affiliation(s)
- Yvonne M Bradford
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Ceri E Van Slyke
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Leyla Ruzicka
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Amy Singer
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Anne Eagle
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - David Fashena
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Douglas G Howe
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Ken Frazer
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Ryan Martin
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Holly Paddock
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Christian Pich
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Sridhar Ramachandran
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| | - Monte Westerfield
- The Institute of Neuroscience, University of Oregon, Eugene, Oregon 97403-1254, USA
| |
|
18
|
Phenotyping in the era of genomics: MaTrics—a digital character matrix to document mammalian phenotypic traits. Mamm Biol 2021. [DOI: 10.1007/s42991-021-00192-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
A new and uniquely structured matrix of mammalian phenotypes, MaTrics (Mammalian Traits for Comparative Genomics) in a digital form is presented. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics. MaTrics was developed within a project aimed to find genetic causes of phenotypic traits of mammals using Forward Genomics. This approach requires genomes and comprehensive and recorded information on homologous phenotypes that are coded as discrete categories in a matrix. MaTrics is an evolving online resource providing information on phenotypic traits in numeric code; traits are coded either as absent/present or with several states as multistate. The state record for each species is linked to at least one reference (e.g., literature, photographs, histological sections, CT scans, or museum specimens) and so MaTrics contributes to digitalization of museum collections. Currently, MaTrics covers 147 mammalian species and includes 231 characters related to structure, morphology, physiology, ecology, and ethology and available in a machine actionable NEXUS-format*. Filling MaTrics revealed substantial knowledge gaps, highlighting the need for phenotyping efforts. Studies based on selected data from MaTrics and using Forward Genomics identified associations between genes and certain phenotypes ranging from lifestyles (e.g., aquatic) to dietary specializations (e.g., herbivory, carnivory). These findings motivate the expansion of phenotyping in MaTrics by filling research gaps and by adding taxa and traits. Only databases like MaTrics will provide machine actionable information on phenotypic traits, an important limitation to genomics. MaTrics is available within the data repository Morph·D·Base (www.morphdbase.de).
|
19
|
Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021; 12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND The size, velocity, and heterogeneity of Big Data outclass conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing number of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. RESULTS Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data. The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. CONCLUSIONS We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
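The contrast between the class-based and the instance-based representation can be sketched with plain Python tuples. This is an illustrative simplification, not the paper's data model: the identifiers (`ex:horn_1` and so on), the relation names, and the example phenotype are all hypothetical, and no reasoner or RDF library is involved.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Class-based (TBox-style) sketch: the phenotype is one nested, anonymous
# class expression. The horn and its curvature have no identifiers of their own.
tbox_phenotype = ("beetle", "has_part", ("horn", "has_quality", "curved"))

# Instance-based (ABox-style) sketch: every part and quality mentioned in the
# description is a named individual (hypothetical identifiers) that can be
# addressed directly.
abox_triples: List[Triple] = [
    ("ex:specimen_1", "rdf:type", "beetle"),
    ("ex:horn_1", "rdf:type", "horn"),
    ("ex:curved_1", "rdf:type", "curved"),
    ("ex:specimen_1", "has_part", "ex:horn_1"),
    ("ex:horn_1", "has_quality", "ex:curved_1"),
]


def individuals_of(triples: List[Triple], class_name: str) -> List[str]:
    """List the individuals asserted to be instances of the given class."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == class_name)


def describe(triples: List[Triple], individual: str) -> List[Tuple[str, str]]:
    """Return the non-type statements made about one individual."""
    return [(p, o) for s, p, o in triples if s == individual and p != "rdf:type"]


horns = individuals_of(abox_triples, "horn")
horn_facts = describe(abox_triples, horns[0])
print(horns, horn_facts)
```

In the instance-based list a simple lookup finds the horn and its properties, whereas the nested class expression would have to be unpacked (or reasoned over) to reach the same information.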
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany.
| |
|
20
|
Tarasov S, Mikó I, Yoder MJ. ontoFAST: An R package for interactive and semi‐automatic annotation of characters with biological ontologies. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Sergei Tarasov
- Finnish Museum of Natural History, Helsinki, Finland
- National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN, USA
| | | | | |
|
21
|
Konopka T, Vestito L, Smedley D. Dimensional reduction of phenotypes from 53 000 mouse models reveals a diverse landscape of gene function. Bioinformatics Advances 2021; 1:vbab026. [PMID: 34870209 PMCID: PMC8633315 DOI: 10.1093/bioadv/vbab026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 09/09/2021] [Accepted: 10/07/2021] [Indexed: 01/27/2023]
Abstract
Animal models have long been used to study gene function and the impact of genetic mutations on phenotype. Through the research efforts of thousands of research groups, systematic curation of published literature and high-throughput phenotyping screens, the collective body of knowledge for the mouse now covers the majority of protein-coding genes. We here collected data for over 53 000 mouse models with mutations in over 15 000 genomic markers and characterized by more than 254 000 annotations using more than 9000 distinct ontology terms. We investigated dimensional reduction and embedding techniques as means to facilitate access to this diverse and high-dimensional information. Our analyses provide the first visual maps of the landscape of mouse phenotypic diversity. We also summarize some of the difficulties in producing and interpreting embeddings of sparse phenotypic data. In particular, we show that data preprocessing, filtering and encoding have as much impact on the final embeddings as the process of dimensional reduction. Nonetheless, techniques developed in the context of dimensional reduction create opportunities for explorative analysis of this large pool of public data, including for searching for mouse models suited to study human diseases. AVAILABILITY AND IMPLEMENTATION Source code for analysis scripts is available on GitHub at https://github.com/tkonopka/mouse-embeddings. The data underlying this article are available in Zenodo at https://doi.org/10.5281/zenodo.4916171. CONTACT t.konopka@qmul.ac.uk. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Tomasz Konopka
- William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK. To whom correspondence should be addressed.
| | - Letizia Vestito
- William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK; Ear Institute, University College London, WC1X 8EE London, UK; Great Ormond Street Institute of Child Health, University College London, WC1N 1EH London, UK
| | - Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ London, UK
| |
|
22
|
Konopka T, Ng S, Smedley D. Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base. PLoS Comput Biol 2021; 17:e1009283. [PMID: 34379637 PMCID: PMC8382188 DOI: 10.1371/journal.pcbi.1009283] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2020] [Revised: 08/23/2021] [Accepted: 07/16/2021] [Indexed: 11/20/2022] Open
Abstract
Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.
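The general idea of starting from a sparse query, imputing related features, and then finding matches that would otherwise stay hidden can be sketched as follows. This is a toy illustration under stated assumptions, not the transformation implemented in the paper: the feature names, the `RELATED` weights, and the single-step spreading rule are all hypothetical.

```python
from typing import Dict

# Hypothetical feature-to-feature association weights, e.g. derived from
# co-occurrence of tokens in the knowledge-base.
RELATED: Dict[str, Dict[str, float]] = {
    "BRCA1": {"breast_cancer": 1.0, "dna_repair": 0.5},
    "seizure": {"epilepsy": 1.0},
}


def diffuse(query: Dict[str, float], weight: float = 0.5) -> Dict[str, float]:
    """Spread a fraction of each query feature's value to its related features."""
    result = dict(query)
    for feature, value in query.items():
        for neighbor, strength in RELATED.get(feature, {}).items():
            result[neighbor] = result.get(neighbor, 0.0) + weight * value * strength
    return result


def overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse feature vectors."""
    return sum(value * b.get(feature, 0.0) for feature, value in a.items())


# A sparse query (one gene symbol) and a stored item that never mentions it.
sparse_query = {"BRCA1": 1.0}
stored_item = {"breast_cancer": 1.0, "dna_repair": 1.0}

raw_score = overlap(sparse_query, stored_item)
diffused_score = overlap(diffuse(sparse_query), stored_item)
print(raw_score, diffused_score)
```

Without spreading, the query and the stored item share no features and score zero; after one spreading step the imputed features produce a non-zero match.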
Affiliation(s)
- Tomasz Konopka
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
| | - Sandra Ng
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
| | - Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
| |
|
23
|
Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021; 37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation Over the past years, many computational methods have been developed to incorporate information about phenotypes for the disease–gene prioritization task. These methods generally compute the similarity between a patient’s phenotypes and a database of gene–phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is not complete in humans as well as in many model organisms, such as the mouse and fish. Information about functions of gene products and anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models. Results We developed a novel graph-based machine-learning method for biomedical ontologies, which is able to exploit axioms in ontologies and other graph-structured data. Using our machine-learning method, we embed genes based on their associated phenotypes, functions of the gene products and anatomical location of gene expression. We then develop a machine-learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods. Furthermore, we extend phenotype-based gene prioritization methods significantly to all genes that are associated with phenotypes, functions or site of expression. Availability and implementation Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Chen
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Azza Althagafi
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.,Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| |
|
24
|
Nguyen QH, Le DH. Similarity Calculation, Enrichment Analysis, and Ontology Visualization of Biomedical Ontologies using UFO. Curr Protoc 2021; 1:e115. [PMID: 33900688 DOI: 10.1002/cpz1.115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
Biomedical ontologies have grown rapidly in recent years and have been reported to be useful in various applications. In this article, we propose two main-function protocols, term-related and entity-related, covering the three most common ontology analyses: similarity calculation, enrichment analysis, and ontology visualization, each of which can be done by separate methods. Many previously developed tools implementing those methods run on different platforms and implement only a limited number of similarity calculation and enrichment analysis methods for a specific type of biomedical ontology, although any type could be accepted. Moreover, methods have distinct advantages depending on the application; thus, the more methods a tool offers, the better the decisions users can make. The protocol here implements all the analyses above using a popular tool called UFO. UFO is a Cytoscape app that unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for biomedical ontologies in OBO format. It can calculate the similarity between two sets of entities, weight imported entity networks, and generate functional similarity networks. The complete protocol can be performed in 30 min and is designed for use by biologists with no prior bioinformatics training. © 2021 Wiley Periodicals LLC. Basic Protocol: Running UFO using a list of input Gene Ontology, Disease Ontology, or Human Phenotype Ontology data.
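To make the distinction between between-term and between-entity similarity concrete, here is a minimal sketch. It does not use UFO or Cytoscape and is not one of the protocol's steps: the toy ontology, the term names, and the choice of measures (Jaccard overlap of ancestor sets for terms, best-match average for entities) are assumptions picked because they are simple to state.

```python
# Hypothetical toy ontology: each term maps to its direct parents (is_a links).
PARENTS = {
    "phenotypic abnormality": [],
    "abnormal limb": ["phenotypic abnormality"],
    "abnormal heart": ["phenotypic abnormality"],
    "short finger": ["abnormal limb"],
    "long finger": ["abnormal limb"],
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a, b, parents):
    """Between-term similarity: Jaccard overlap of the two ancestor sets."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def entity_similarity(terms_a, terms_b, parents):
    """Between-entity similarity: best-match average over two annotation sets."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(term_similarity(a, b, parents) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(a, b, parents) for a in terms_a) for b in terms_b]
    return (sum(best_a) / len(best_a) + sum(best_b) / len(best_b)) / 2


if __name__ == "__main__":
    gene_x = ["short finger"]                    # annotations of a hypothetical entity
    gene_y = ["long finger", "abnormal heart"]   # annotations of another
    print(term_similarity("short finger", "long finger", PARENTS))
    print(entity_similarity(gene_x, gene_y, PARENTS))
```

An entity here is just a list of ontology terms, so the same two functions apply whether the entities are genes, diseases, or patients. Information-content measures such as Resnik's would need annotation frequencies in addition to the hierarchy.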
Affiliation(s)
- Quang-Huy Nguyen
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
| | - Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam.,School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
| |
|
25
|
Masuya H, Usuda D, Nakata H, Yuhara N, Kurihara K, Namiki Y, Iwase S, Takada T, Tanaka N, Suzuki K, Yamagata Y, Kobayashi N, Yoshiki A, Kushida T. Establishment and application of information resource of mutant mice in RIKEN BioResource Research Center. Lab Anim Res 2021; 37:6. [PMID: 33455583 PMCID: PMC7811887 DOI: 10.1186/s42826-020-00068-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/10/2020] [Accepted: 09/21/2020] [Indexed: 12/12/2022] Open
Abstract
Online databases are crucial infrastructure for the wide, effective, and efficient use of mouse mutant resources in the life sciences. The number and types of mouse resources have been growing rapidly due to the development of genetic modification technology, together with associated information on genomic sequences and phenotypes. Therefore, data integration technologies that improve the findability, accessibility, interoperability, and reusability of mouse strain data have become essential for mouse strain repositories. In 2020, the RIKEN BioResource Research Center released an integrated database of bioresources, including experimental mouse strains, Arabidopsis thaliana as a laboratory plant, cell lines, microorganisms, and genetic materials, using Resource Description Framework-related technologies. The integrated database offers multiple advanced features for the dissemination of bioresource information. The current version of our online catalog of mouse strains, which functions as part of the integrated database of bioresources, is available from search bars on the websites of the Center (https://brc.riken.jp) and the Experimental Animal Division (https://mus.brc.riken.jp/). The BioResource Research Center also released a genomic variation database of mouse strains established in Japan and Western Europe, MoG+ (https://molossinus.brc.riken.jp/mogplus/), and a database for phenotype-phenotype associations across the mouse phenome using data from the International Mouse Phenotyping Platform. In this review, we describe features of the current versions of the databases related to mouse strain resources in the RIKEN BioResource Research Center and discuss future views.
Affiliation(s)
- Hiroshi Masuya
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan.
| | - Daiki Usuda
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Hatsumi Nakata
- Experimental Animal Division, BioResource Research Center, RIKEN, Tsukuba, Japan
| | - Naomi Yuhara
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Keiko Kurihara
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Yuri Namiki
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Shigeru Iwase
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Toyoyuki Takada
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Nobuhiko Tanaka
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Kenta Suzuki
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Yuki Yamagata
- Laboratory for Developmental Dynamics, Center for Biosystems Dynamics Research, RIKEN, Kobe, Japan
| | - Norio Kobayashi
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan.,Data Knowledge Organization Unit, Head Office for Information Systems and Cybersecurity, RIKEN, Wako, Japan
| | - Atsushi Yoshiki
- Experimental Animal Division, BioResource Research Center, RIKEN, Tsukuba, Japan
| | - Tatsuya Kushida
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| |
|
26
|
Reynolds T, Johnson EC, Huggett SB, Bubier JA, Palmer RHC, Agrawal A, Baker EJ, Chesler EJ. Interpretation of psychiatric genome-wide association studies with multispecies heterogeneous functional genomic data integration. Neuropsychopharmacology 2021; 46:86-97. [PMID: 32791514 PMCID: PMC7688940 DOI: 10.1038/s41386-020-00795-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/01/2020] [Revised: 07/27/2020] [Accepted: 07/29/2020] [Indexed: 02/08/2023]
Abstract
Genome-wide association studies and other discovery genetics methods provide a means to identify previously unknown biological mechanisms underlying behavioral disorders that may point to new therapeutic avenues, augment diagnostic tools, and yield a deeper understanding of the biology of psychiatric conditions. Recent advances in psychiatric genetics have been made possible through large-scale collaborative efforts. These studies have begun to unearth many novel genetic variants associated with psychiatric disorders and behavioral traits in human populations. Significant challenges remain in characterizing the resulting disease-associated genetic variants and prioritizing functional follow-up to make them useful for mechanistic understanding and development of therapeutics. Model organism research has generated extensive genomic data that can provide insight into the neurobiological mechanisms of variant action, but a cohesive effort must be made to establish which aspects of the biological modulation of behavioral traits are evolutionarily conserved across species. Scalable computing, new data integration strategies, and advanced analysis methods outlined in this review provide a framework to efficiently harness model organism data in support of clinically relevant psychiatric phenotypes.
Affiliation(s)
- Timothy Reynolds
- The Jackson Laboratory, Bar Harbor, ME, USA
- Computer Science Department, Baylor University, Waco, TX, USA
| | - Emma C Johnson
- Department of Psychiatry, Washington University in St Louis, St Louis, MO, USA
| | | | | | | | - Arpana Agrawal
- Department of Psychiatry, Washington University in St Louis, St Louis, MO, USA
| | - Erich J Baker
- Computer Science Department, Baylor University, Waco, TX, USA
| | | |
|
27
|
Ontological representation, classification and data-driven computing of phenotypes. J Biomed Semantics 2020; 11:15. [PMID: 33349245 PMCID: PMC7751121 DOI: 10.1186/s13326-020-00230-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/05/2020] [Accepted: 11/03/2020] [Indexed: 11/21/2022] Open
Abstract
Background The successful determination and analysis of phenotypes plays a key role in the diagnostic process, the evaluation of risk factors and the recruitment of participants for clinical and epidemiological studies. Developing computable phenotype algorithms to solve these tasks is challenging for several reasons. Firstly, the term ‘phenotype’ has no generally agreed definition and its meaning depends on context. Secondly, phenotypes are most commonly specified as non-computable descriptive documents. Recent attempts have shown that ontologies are a suitable way to handle phenotypes and that they can support clinical research and decision making. The SMITH Consortium is dedicated to rapidly establishing an integrative medical informatics framework to provide physicians with the best available data and knowledge and enable innovative use of healthcare data for research and treatment optimisation. In the context of a methodological use case ‘phenotype pipeline’ (PheP), a technology to automatically generate phenotype classifications and annotations based on electronic health records (EHR) is being developed. A large series of phenotype algorithms will be implemented. This implies that for each algorithm a classification scheme and its input variables have to be defined. Furthermore, a phenotype engine is required to evaluate and execute the developed algorithms.
Results In this article, we present a Core Ontology of Phenotypes (COP) and the software Phenotype Manager (PhenoMan), which implements a novel ontology-based method to model, classify and compute phenotypes from already available data. Our solution includes an enhanced iterative reasoning process combining classification tasks with mathematical calculations at runtime. The ontology as well as the reasoning method were successfully evaluated with selected phenotypes including SOFA score, socio-economic status, body surface area and WHO BMI classification based on available medical data.
Conclusions We developed a novel ontology-based method to model phenotypes of living beings with the aim of automated phenotype reasoning based on available data. This new approach can be used in clinical context, e.g., for supporting the diagnostic process, evaluating risk factors, and recruiting appropriate participants for clinical and epidemiological studies.
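As a rough illustration of what a computable phenotype involves, the sketch below combines a calculation (BMI from weight and height) with a classification step (WHO adult BMI categories), one of the evaluation examples named in the abstract. It is not PhenoMan's API and does not use the COP ontology: the function names and the dictionary-based patient record are hypothetical, and the cut-offs are the commonly cited WHO adult thresholds without the finer obesity sub-classes.

```python
from typing import Optional

# WHO adult BMI categories as (exclusive upper bound in kg/m^2, label).
# Assumption: adult cut-offs only; obesity sub-classes are not modelled.
WHO_BMI_CLASSES = [
    (18.5, "underweight"),
    (25.0, "normal weight"),
    (30.0, "overweight"),
    (float("inf"), "obesity"),
]


def bmi(weight_kg: float, height_m: float) -> float:
    """Calculation step: body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def classify_bmi(value: float) -> str:
    """Classification step: map a BMI value to its WHO category."""
    for upper, label in WHO_BMI_CLASSES:
        if value < upper:
            return label
    raise ValueError(f"cannot classify BMI value {value!r}")


def bmi_phenotype(record: dict) -> Optional[str]:
    """Derive the phenotype from a record, or None when an input is missing."""
    weight = record.get("weight_kg")
    height = record.get("height_m")
    if weight is None or height is None:
        return None
    return classify_bmi(bmi(weight, height))


if __name__ == "__main__":
    print(bmi_phenotype({"weight_kg": 70, "height_m": 1.75}))
    print(bmi_phenotype({"weight_kg": 70}))
```

Returning `None` for incomplete records keeps "not computable from the available data" distinct from any actual category, which matters when the result feeds cohort selection.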
|
28
|
DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier. PLoS Comput Biol 2020; 16:e1008453. [PMID: 33206638 PMCID: PMC7710064 DOI: 10.1371/journal.pcbi.1008453] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/16/2020] [Revised: 12/02/2020] [Accepted: 10/20/2020] [Indexed: 12/21/2022] Open
Abstract
Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top performing CAFA2 methods as well as several state of the art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases. Gene–phenotype associations can help to understand the underlying mechanisms of many genetic diseases. However, experimental identification, often involving animal models, is time consuming and expensive. Computational methods that predict gene–phenotype associations can be used instead.
We developed DeepPheno, a novel approach for predicting the phenotypes resulting from a loss of function of a single gene. We use gene functions and gene expression as information to predict phenotypes. Our method uses a neural network classifier that is able to account for hierarchical dependencies between phenotypes. We extensively evaluate our method and compare it with related approaches, and we show that DeepPheno results in better performance in several evaluations. Furthermore, we found that many of the new predictions made by our method have been added to phenotype association databases released one year later. Overall, DeepPheno simulates some aspects of human physiology and how molecular and physiological alterations lead to abnormal phenotypes.
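One way to picture "accounting for hierarchical dependencies between phenotypes" is the true-path rule: a gene annotated with a phenotype term is implicitly annotated with every ancestor of that term, so a parent's score should never be lower than any of its children's. The sketch below enforces this as a post-processing step by taking the maximum over each term and its descendants. It is an illustration only, not DeepPheno's network architecture; the toy ontology, the term names, and the scores are invented.

```python
# Hypothetical toy phenotype ontology: each term maps to its direct parents.
PARENTS = {
    "abnormal cardiovascular system": [],
    "abnormal heart morphology": ["abnormal cardiovascular system"],
    "abnormal heart valve": ["abnormal heart morphology"],
}


def children_map(parents):
    """Invert the child -> parents mapping into parent -> children."""
    children = {term: [] for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(child)
    return children


def propagate(scores, parents):
    """Make scores hierarchy-consistent: each term gets the max over itself and its descendants."""
    children = children_map(parents)
    memo = {}

    def consistent(term):
        if term not in memo:
            own = scores.get(term, 0.0)  # terms without a raw score default to 0
            memo[term] = max([own] + [consistent(c) for c in children.get(term, [])])
        return memo[term]

    return {term: consistent(term) for term in children}


if __name__ == "__main__":
    raw = {
        "abnormal heart valve": 0.9,          # confident specific prediction
        "abnormal heart morphology": 0.2,     # inconsistent: lower than its child
        "abnormal cardiovascular system": 0.1,
    }
    print(propagate(raw, PARENTS))
```

The recursion assumes the ontology is a directed acyclic graph, which holds for is_a hierarchies such as the Human Phenotype Ontology.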
|
29
|
Thessen AE, Walls RL, Vogt L, Singer J, Warren R, Buttigieg PL, Balhoff JP, Mungall CJ, McGuinness DL, Stucky BJ, Yoder MJ, Haendel MA. Transforming the study of organisms: Phenomic data models and knowledge bases. PLoS Comput Biol 2020; 16:e1008376. [PMID: 33232313 PMCID: PMC7685442 DOI: 10.1371/journal.pcbi.1008376] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023] Open
Abstract
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.
Affiliation(s)
- Anne E. Thessen
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Ronin Institute for Independent Scholarship, Montclair, New Jersey, United States of America
| | - Ramona L. Walls
- Bio5 Institute, University of Arizona, Tucson, Arizona, United States of America
| | - Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
| | | | | | - Pier Luigi Buttigieg
- Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | | | - Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| | - Matthew J. Yoder
- Illinois Natural History Survey, Champaign, Illinois, United States of America
| | - Melissa A. Haendel
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
| |
|
30
|
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2020; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Huan Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Zhi Xie
- State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| |
|
31
|
Shefchek KA, Harris NL, Gargano M, Matentzoglu N, Unni D, Brush M, Keith D, Conlin T, Vasilevsky N, Zhang XA, Balhoff JP, Babb L, Bello SM, Blau H, Bradford Y, Carbon S, Carmody L, Chan LE, Cipriani V, Cuzick A, Della Rocca M, Dunn N, Essaid S, Fey P, Grove C, Gourdine JP, Hamosh A, Harris M, Helbig I, Hoatlin M, Joachimiak M, Jupp S, Lett KB, Lewis SE, McNamara C, Pendlington ZM, Pilgrim C, Putman T, Ravanmehr V, Reese J, Riggs E, Robb S, Roncaglia P, Seager J, Segerdell E, Similuk M, Storm AL, Thaxon C, Thessen A, Jacobsen JOB, McMurry JA, Groza T, Köhler S, Smedley D, Robinson PN, Mungall CJ, Haendel MA, Munoz-Torres MC, Osumi-Sutherland D. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 2020; 48:D704-D715. [PMID: 31701156 PMCID: PMC7056945 DOI: 10.1093/nar/gkz997] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 09/20/2019] [Revised: 10/09/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022] Open
Abstract
In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven’t been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regards to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.
Affiliation(s)
- Kent A Shefchek
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Michael Gargano
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Nicolas Matentzoglu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Matthew Brush
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Daniel Keith
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Tom Conlin
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | | | - James P Balhoff
- Renaissance Computing Institute at UNC, Chapel Hill, NC 27517, USA
| | - Larry Babb
- Broad Institute, Cambridge, MA 02142, USA
| | | | - Hannah Blau
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Yvonne Bradford
- Institute of Neuroscience, University of Oregon, Eugene, OR 97401, USA
| | - Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Leigh Carmody
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
| | - Valentina Cipriani
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | | | - Maria Della Rocca
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Shahim Essaid
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Petra Fey
- dictyBase, Center for Genetic Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Chris Grove
- California Institute of Technology, Pasadena, CA 91125, USA
| | - Jean-Phillipe Gourdine
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Ada Hamosh
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
| | | | - Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.,Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.,Department of Neuropediatrics, Christian-Albrechts-University of Kiel, 24105 Kiel, Germany.,Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Maureen Hoatlin
- Department of Biochemistry and Molecular Biology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Marcin Joachimiak
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kenneth B Lett
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Suzanna E Lewis
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | | | - Zoë M Pendlington
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Tim Putman
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Vida Ravanmehr
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Justin Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Erin Riggs
- Autism & Developmental Medicine Institute, Geisinger, Danville, PA 17837, USA
| | - Sofia Robb
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Paola Roncaglia
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Erik Segerdell
- Xenbase, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
| | - Morgan Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Andrea L Storm
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Courtney Thaxon
- University of North Carolina Medical School, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
| | - Anne Thessen
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Julie A McMurry
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
| | | | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Peter N Robinson
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Melissa A Haendel
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA.,Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Monica C Munoz-Torres
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - David Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
32
|
Wang RL. Semantic characterization of adverse outcome pathways. Aquatic Toxicology (Amsterdam, Netherlands) 2020; 222:105478. [PMID: 32278258 PMCID: PMC7393770 DOI: 10.1016/j.aquatox.2020.105478] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/27/2019] [Revised: 03/17/2020] [Accepted: 03/23/2020] [Indexed: 05/09/2023]
Abstract
This study was undertaken to systematically assess the utilities and performance of ontology-based semantic analysis in adverse outcome pathway (AOP) research. With an increasing number of AOPs developed by scientific domain experts to organize toxicity information and facilitate chemical risk assessment, there is a pressing need for objective approaches to evaluate the biological coherence and quality of these AOPs. Powered by ontologies covering a wide range of biological domains, abundant phenotypic data annotated ontologically, and some sophisticated knowledge computing tools, semantic analysis has great potential in this area of application. With the events in the AOP-Wiki first annotated into logical definitions and then grouped into phenotypic profiles by individual AOPs, the coherence and quality of AOPs were assessed at several levels: paired key event relationships (KER), all possible event pair combinations within AOPs, and the phenotypic profiles of AOPs, genes, biological pathways, human diseases, and selected chemicals. The semantic similarities were assessed at all these levels based on a unified cross-species vertebrate phenotype ontology encompassing the logical definitions of AOP events as well as many other domain ontologies. A substantial number of KERs and AOPs in the AOP-Wiki were found to be semantically coherent. These same coherent AOPs also mapped to many more genes, pathways, and diseases biologically aligned with the intended chain of events therein leading to their respective adverse outcomes. Significantly, these findings imply that semantic analysis should also have utilities in developing future AOPs by selecting candidate events from either the existing AOP-Wiki events or a broader collection of ontology terms semantically similar to the molecular initiating events or adverse outcomes of interest. 
In addition, semantic analysis enabled AOP networks to be constructed at the level of phenotypic profiles based on similarities, complementing those based on event sharing by bringing genes, pathways, diseases, and chemicals into the networks too, thus greatly expanding the biological scope and our understanding of AOPs.
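The abstract above compares phenotypic profiles by semantic similarity over an ontology but does not state which measure was used. As a minimal sketch of the general idea only, the following computes a Jaccard similarity between two profiles after expanding each term to its ancestors. The toy ontology, the term names, and the choice of Jaccard are all assumptions made for illustration, not the study's method.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: each term maps to its direct parents.
# These labels are illustrative and are not real ontology identifiers.
PARENTS: Dict[str, Set[str]] = {
    "phenotype": set(),
    "liver phenotype": {"phenotype"},
    "enlarged liver": {"liver phenotype"},
    "liver necrosis": {"liver phenotype"},
    "reduced fecundity": {"phenotype"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def profile_closure(profile: Iterable[str]) -> Set[str]:
    """Expand a phenotypic profile to every term it implies."""
    closure: Set[str] = set()
    for term in profile:
        closure |= ancestors(term)
    return closure


def jaccard(profile_a: Iterable[str], profile_b: Iterable[str]) -> float:
    """Jaccard similarity of two ancestor-closed profiles."""
    a, b = profile_closure(profile_a), profile_closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Two liver phenotypes share more ancestors than a liver phenotype
    # and a reproductive phenotype, so the first score is higher.
    print(jaccard({"enlarged liver"}, {"liver necrosis"}))
    print(jaccard({"enlarged liver"}, {"reduced fecundity"}))
```

Expanding to ancestors first is what lets two profiles with no identical terms still score above zero, which is the property that makes ontology-based comparison useful across species or annotation styles.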
Affiliation(s)
- Rong-Lin Wang
- Great Lakes Toxicology & Ecology Division, Center for Computational Toxicology & Exposure, U.S. Environmental Protection Agency, Cincinnati, OH, 45268, USA.
| |
|
33
|
Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Res 2020; 9:136. [PMID: 32308977 PMCID: PMC7141167 DOI: 10.12688/f1000research.18236.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/05/2020] [Indexed: 01/08/2023] Open
Abstract
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
Affiliation(s)
- Rutger A. Vos
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | | | - Hiroyuki Mishima
- Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
| | - Shin Kawano
- Database Center for Life Science, Tokyo, Japan
| | | | | | - Yuki Moriya
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Hongyan Wu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | | | - Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
| | - Nobuyuki P. Aoki
- Faculty of Science and Engineering, SOKA University, Tokyo, Japan
| | - Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Jerven T. Bolleman
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Lausanne, Switzerland
| | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Raoul J. P. Bonnal
- Istituto Nazionale Genetica Molecolare, Romeo ed Enrica Invernizzi, Milan, Italy
| | | | - Kees Burger
- Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
| | - Hirokazu Chiba
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Kevin B. Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Denver, USA
- Université Paris-Saclay, LIMSI, CNRS, Paris, France
| | | | | | - Gang Fu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | | | | | | | - Naohisa Goto
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Tudor Groza
- St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
| | - Colin Hercus
- Novocraft Technologies Sdn. Bhd., Selangor, Malaysia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Kotone Itaya
- Institute for Advanced Biosciences, Keio University, Tokyo, Japan
| | - Nick Juty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | - Jee-Hyub Kim
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Akira R. Kinjo
- Institute for Protein Research, Osaka University, Osaka, Japan
| | - Masaaki Kotera
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Kouji Kozaki
- The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
| | | | - Tatsuya Kushida
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
| | - Thomas Lütteke
- Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
- Gesellschaft für innovative Personalwirtschaftssysteme mbH (GIP GmbH), Offenbach, Germany
| | | | | | - Attayeb Mohsen
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Hiroshi Mori
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
| | - Yuki Naito
- Database Center for Life Science, Tokyo, Japan
| | | | | | | | - Naoki Nishida
- Department of Systems Science, Osaka University, Osaka, Japan
| | - Hiroyo Nishide
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Soichi Ogishima
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Tazro Ohta
- Database Center for Life Science, Tokyo, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, USA
| | | | - Philip Prathipati
- National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
| | - Pjotr Prins
- University Medical Center Utrecht, Utrecht, The Netherlands
- University of Tennessee Health Science Center, Memphis, USA
| | - Núria Queralt-Rosinach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Shinya Suzuki
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Tsuyosi Tabata
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | | | - Kieron Taylor
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mark Thompson
- Leiden University Medical Center, Leiden, The Netherlands
| | - Ikuo Uchiyama
- National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
| | - Bruno Vieira
- WurmLab, School of Biological & Chemical Sciences, Queen Mary University of London, London, UK
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
| | - Mark Wilkinson
- Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
| | | | | | - Kazutoshi Yoshitake
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | | | - Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, The Netherlands
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan
| | - Toshihisa Takagi
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| |
|
34
|
Konopka T, Smedley D. Incremental data integration for tracking genotype-disease associations. PLoS Comput Biol 2020; 16:e1007586. [PMID: 31986132 PMCID: PMC7004389 DOI: 10.1371/journal.pcbi.1007586] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/05/2019] [Revised: 02/06/2020] [Accepted: 12/03/2019] [Indexed: 12/30/2022] Open
Abstract
Functional annotation of genes remains a challenge in fundamental biology and is a limiting factor for translational medicine. Computational approaches have been developed to process heterogeneous data into meaningful metrics, but often do not address how findings might be updated when new evidence comes to light. To address this challenge, we describe requirements for a framework for incremental data integration and propose an implementation based on phenotype ontologies and Bayesian probability updates. We apply the framework to quantify similarities between gene annotations and disease profiles. Within this scope, we categorize human diseases according to how well they can be recapitulated by animal models and quantify similarities between human diseases and mouse models produced by the International Mouse Phenotyping Consortium. The flexibility of the approach allows us to incorporate negative phenotypic data to better prioritize candidate genes, and to stratify disease mapping using sex-dependent phenotypes. All our association scores can be updated and we exploit this feature to showcase integration with curated annotations from high-precision assays. Incremental integration is thus a suitable framework for tracking functional annotations and linking to complex human pathology. Human diseases are often caused or influenced by genetic factors. The link between a particular gene and a specific disease is well-established in some cases. However, the roles of many genes are still unclear and many diseases do not have an understood genetic mechanism. Dissecting such interactions requires using a range of experimental approaches and assessing the results in a holistic manner. Computational methods already exist for comparing phenotypes observed in models and patients, and they work well when the phenotypes are detailed. 
In this work we argue that algorithms should also be able to report meaningful assessments based on preliminary data, and to update reports in a coherent manner when new information comes to light. These requirements lead to specific mathematical properties, which define incremental integration. We implement these requirements in a computational framework. We study the extent to which individual rare human diseases might be recapitulated by animal models. We compute gene-disease associations using data from public resources, including previously unused negative data. Altogether, these examples illustrate that the framework can use observations in model systems to track gene-disease associations in the human context.
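The abstract above describes association scores that are updated incrementally as evidence arrives, including negative evidence. Its actual scoring model is not given here, so the following is only a generic sketch of a Bayesian update in odds form. The function names and the example likelihood ratios are hypothetical.

```python
import math
from typing import Iterable


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Return the posterior after applying independent pieces of evidence.

    Each likelihood ratio is P(evidence | associated) / P(evidence | not
    associated). Ratios above 1 support the association; ratios below 1
    (for example an expected phenotype that was tested and not observed)
    count against it.
    """
    log_odds = logit(prior) + sum(math.log(lr) for lr in likelihood_ratios)
    return sigmoid(log_odds)


if __name__ == "__main__":
    score = 0.1                      # illustrative prior for a gene-disease pair
    score = update(score, [3.0])     # a matching phenotype is reported
    score = update(score, [0.5])     # a later negative observation
    print(round(score, 4))
```

Because the update is a sum in log-odds space, applying evidence one item at a time gives the same result as applying it all at once and in any order, which is the kind of coherence an incremental scheme needs.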
Affiliation(s)
- Tomasz Konopka
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
| | - Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
| |
|
35
|
Finding relationships among biological entities. Logic and Critical Thinking in the Biomedical Sciences 2020. [PMCID: PMC7499094 DOI: 10.1016/b978-0-12-821364-3.00005-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Confusion over the concepts of “relationships” and “similarities” lies at the heart of many battles over the direction and intent of research projects. Here is a short story that demonstrates the difference between the two concepts: You look up at the clouds, and you begin to see the shape of a lion. The cloud has a tail, like a lion’s tail, and a fluffy head, like a lion’s mane. With a little imagination the mouth of the lion seems to roar down from the sky. You have succeeded in finding similarities between the cloud and a lion. If you look at a cloud and you imagine a tea kettle producing a head of steam and you recognize that the physical forces that create a cloud and the physical forces that produce steam from a heated kettle are the same, then you have found a relationship. Most popular classification algorithms operate by grouping together data objects that have similar properties or values. In so doing, they may miss finding the true relationships among objects. Traditionally, relationships among data objects are discovered by an intellectual process. In this chapter, we will discuss the scientific gains that come when we classify biological entities by relationships, not by their similarities.
|
36
|
|
37
|
Shen F, Peng S, Fan Y, Wen A, Liu S, Wang Y, Wang L, Liu H. HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology. J Biomed Inform 2019; 96:103246. [PMID: 31255713 DOI: 10.1016/j.jbi.2019.103246] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/01/2019] [Revised: 06/25/2019] [Accepted: 06/26/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND In precision medicine, deep phenotyping is defined as the precise and comprehensive analysis of phenotypic abnormalities, aiming to acquire a better understanding of the natural history of a disease and its genotype-phenotype associations. Detecting phenotypic relevance is an important task when translating precision medicine into clinical practice, especially for patient stratification tasks based on deep phenotyping. In our previous work, we developed node embeddings for the Human Phenotype Ontology (HPO) to assist in phenotypic relevance measurement incorporating distributed semantic representations. However, the derived HPO embeddings hold only distributed representations for IS-A relationships among nodes, hampering the ability to fully explore the graph. METHODS In this study, we developed a framework, HPO2Vec+, to enrich the produced HPO embeddings with heterogeneous knowledge resources (i.e., DECIPHER, OMIM, and Orphanet) for detecting phenotypic relevance. Specifically, we parsed disease-phenotype associations contained in these three resources to enrich non-inheritance relationships among phenotypic nodes in the HPO. To generate node embeddings for the HPO, node2vec was applied to perform node sampling on the enriched HPO graphs based on random walk, followed by feature learning over the sampled nodes to generate enriched node embeddings. Four HPO embeddings were generated based on different graph structures, which we hereafter label as HPOEmb-Original, HPOEmb-DECIPHER, HPOEmb-OMIM, and HPOEmb-Orphanet. We evaluated the derived embeddings quantitatively through an HPO link prediction task with four edge embedding operations and six machine learning algorithms. The resulting best embeddings were then evaluated for patient stratification of 10 rare diseases using electronic health records (EHR) collected at Mayo Clinic. 
We assessed our framework qualitatively by visualizing phenotypic clusters and conducting a use case study on primary hyperoxaluria (PH), a rare disease, on the task of inferring relevant phenotypes given 22 annotated PH related phenotypes. RESULTS The quantitative link prediction task shows that HPOEmb-Orphanet achieved an optimal AUROC of 0.92 and an average precision of 0.94. In addition, HPOEmb-Orphanet achieved an optimal F1 score of 0.86. The quantitative patient similarity measurement task indicates that HPOEmb-Orphanet achieved the highest average detection rate for similar patients over 10 rare diseases and performed better than other similarity measures implemented by an existing tool, HPOSim, especially for pairwise patients with fewer shared common phenotypes. The qualitative evaluation shows that the enriched HPO embeddings are generally able to detect relationships among nodes with fine granularity and HPOEmb-Orphanet is particularly good at associating phenotypes across different disease systems. For the use case of detecting relevant phenotypic characterizations for given PH related phenotypes, HPOEmb-Orphanet outperformed the other three HPO embeddings by achieving the highest average P@5 of 0.81 and the highest P@10 of 0.79. Compared to seven conventional similarity measurements provided by HPOSim, HPOEmb-Orphanet is able to detect more relevant phenotypic pairs, especially for pairs not in inheritance relationships. CONCLUSION We drew the following conclusions based on the evaluation results. First, with additional non-inheritance edges, enriched HPO embeddings can detect more associations between fine granularity phenotypic nodes regardless of their topological structures in the HPO graph. 
Second, HPOEmb-Orphanet not only achieves the optimal performance in link prediction and patient stratification based on phenotypic similarity, but is also able to detect relevant phenotypes closer to domain experts' judgments than other embeddings and conventional similarity measurements. Third, incorporating heterogeneous knowledge resources does not necessarily result in better performance for detecting relevant phenotypes. From a clinical perspective, in our use case study, clinical-oriented knowledge resources (e.g., Orphanet) can achieve better performance in detecting relevant phenotypic characterizations compared to biomedical-oriented knowledge resources (e.g., DECIPHER and OMIM).
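The results above are reported partly as P@5 and P@10, that is, precision among the top-ranked candidate phenotypes. As a minimal sketch of how such a figure is typically computed, the function below uses the standard definition. The function name, the placeholder identifiers, and the decision to divide by k even when fewer than k items are ranked are assumptions, not details taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are judged relevant.

    Divides by k (not by the number of items actually returned), so a
    ranking shorter than k is penalised for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical ranking of candidate phenotype terms for one query term,
    # and a hypothetical set of terms an expert marked as relevant.
    ranked_terms = ["term_a", "term_b", "term_c", "term_d", "term_e"]
    expert_relevant = {"term_a", "term_c", "term_e"}
    print(precision_at_k(ranked_terms, expert_relevant, 2))
    print(precision_at_k(ranked_terms, expert_relevant, 5))
```

An average P@k, as quoted in the abstract, would be the mean of this value over all query phenotypes.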
Affiliation(s)
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
| | - Suyuan Peng
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; The Second Clinical College Guangzhou University of Chinese Medicine, China
| | - Yadan Fan
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
| | - Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Sijia Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Liwei Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
| |
|
38
|
Pan Q, Wei J, Guo F, Huang S, Gong Y, Liu H, Liu J, Li L. Trait ontology analysis based on association mapping studies bridges the gap between crop genomics and Phenomics. BMC Genomics 2019; 20:443. [PMID: 31159731 PMCID: PMC6547493 DOI: 10.1186/s12864-019-5812-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/04/2018] [Accepted: 05/20/2019] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Trait ontology (TO) analysis is a powerful system for functional annotation and enrichment analysis of genes. However, given the complexity of the molecular mechanisms underlying phenomes, only a few hundred gene-to-TO relationships in plants have been elucidated to date, limiting the pace of research in this "big data" era. RESULTS Here, we curated all the available trait associated sites (TAS) information from 79 association mapping studies of maize (Zea mays L.) and rice (Oryza sativa L.) lines with diverse genetic backgrounds and built a large-scale TAS-derived TO system for functional annotation of genes in various crops. Our TO system contains information for up to 18,042 genes (6345 in maize at the 25 k level and 11,697 in rice at the 50 k level), including gene-to-TO relationships, which covers over one fifth of the annotated gene sets for maize and rice. A comparison of Gene Ontology (GO) vs. TO analysis demonstrated that the TAS-derived TO system is an efficient alternative tool for gene functional annotation and enrichment analysis. We therefore combined information from the TO, GO, metabolic pathway, and co-expression network databases and constructed the TAS system, which is publicly available at http://tas.hzau.edu.cn. TAS provides a user-friendly interface for functional annotation of genes, enrichment analysis, genome-wide extraction of trait-associated genes, and crosschecking of different functional annotation databases. CONCLUSIONS TAS bridges the gap between genomic and phenomic information in crops. This easy-to-use tool will be useful for geneticists, biologists, and breeders in the agricultural community, as it facilitates the dissection of molecular mechanisms conferring agronomic traits in an easy, genome-wide manner.
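The abstract above mentions enrichment analysis of genes against trait ontology terms without naming the statistical test used. A common choice for term enrichment is a one-sided hypergeometric test, sketched below under that assumption. The function name and the example counts are hypothetical and are not taken from the TAS system.

```python
from math import comb


def hypergeom_enrichment_p(total: int, annotated: int, drawn: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    total     -- number of genes in the background set
    annotated -- background genes annotated with the term of interest
    drawn     -- size of the query gene list
    hits      -- query genes annotated with the term
    """
    upper = min(annotated, drawn)
    denominator = comb(total, drawn)
    numerator = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative counts only: 1000 background genes, 40 annotated with a
    # trait term, and a 50-gene query list containing 8 annotated genes.
    print(hypergeom_enrichment_p(1000, 40, 50, 8))
```

In practice one such p-value is computed per ontology term and the set is then corrected for multiple testing.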
Affiliation(s)
- Qingchun Pan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Junfeng Wei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Feng Guo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Suiyong Huang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yong Gong
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Hao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
| | - Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
| |
|
39
|
Prieto-González D, Castilla-Rodríguez I, González E, Couce ML. Towards the automated economic assessment of newborn screening for rare diseases. J Biomed Inform 2019; 95:103216. [PMID: 31128259 DOI: 10.1016/j.jbi.2019.103216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/11/2019] [Revised: 05/17/2019] [Accepted: 05/18/2019] [Indexed: 12/14/2022]
Abstract
OBJECTIVE Economic assessments of newborn screening programs for rare diseases involve the use of models and require huge efforts to synthesize information from different sources. Sharing and automatically or semi-automatically reusing this information for new assessments would be desirable, but it is not currently possible due to the lack of suitable tools. MATERIAL AND METHODS We designed and implemented the Rare Diseases Ontology for Simulation (RaDiOS) after performing two reviews and critically appraising the existing data repositories on rare diseases. The first review covered previously published economic assessments and served to identify the main parameters required to model newborn screening. The second review aimed at locating existing data repositories potentially available to inform these parameters. RESULTS We found key model parameters on epidemiology, screening methods, diagnostic methods, pathogenesis, treatment and follow-up tests. We also identified seven data repositories directly related to rare diseases. None of these repositories was well-suited for the automated generation of simulation models. We incorporated the identified parameters as structured classes and properties of the new ontology (RaDiOS). We carefully set the relationships among the parameters so as to allow automated inference from the ontology. CONCLUSIONS RaDiOS is an ontology that serves as a data repository to automatically build simulation models for the economic assessment of newborn screening for rare diseases.
Affiliation(s)
- David Prieto-González
- Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Avda. Astrofísico Fco. Sánchez s/n, 38200, AP 456., La Laguna, Canary Islands, Spain
| | - Iván Castilla-Rodríguez
- Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Avda. Astrofísico Fco. Sánchez s/n, 38200, AP 456., La Laguna, Canary Islands, Spain; Spanish Network of Health Services Research for Chronic Diseases (REDISSEC), Tenerife, Spain.
| | - Evelio González
- Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Avda. Astrofísico Fco. Sánchez s/n, 38200, AP 456., La Laguna, Canary Islands, Spain
| | - María L Couce
- Unidad de Diagnóstico y Tratamiento de Enfermedades Metabólicas Congénitas, Servicio de Neonatología, Hospital Clínico Universitario de Santiago, Departamento de Pediatría, IDIS, CIBERER, Santiago de Compostela, La Coruña, Spain
| |
|
40
|
Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC Systems Biology 2019; 13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
Background Improving the efficiency of disease diagnosis based on phenotype ontology is a critical yet challenging research area. Recently, Human Phenotype Ontology (HPO)-based semantic similarity has been effectively and widely used to identify causative genes and diseases. However, current phenotype similarity measurements consider only the annotations and hierarchical structure of HPO, neglecting the textual definitions of phenotype terms. Results In this paper, we propose a novel phenotype similarity measurement, termed DisPheno, which incorporates the definitions of phenotype terms in addition to HPO structure and annotations to measure the similarity between phenotype terms. DisPheno also integrates phenotype term associations into phenotype-set similarity measurement using gene and disease annotations of phenotype terms. Conclusions Compared with five existing state-of-the-art methods, DisPheno shows great performance in HPO-based phenotype semantic similarity measurement and improves the efficiency of disease identification, especially on noisy patient datasets.
Affiliation(s)
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| |
|
41
|
Wang RL, Edwards S, Ives C. Ontology-based semantic mapping of chemical toxicities. Toxicology 2018; 412:89-100. [PMID: 30468866 DOI: 10.1016/j.tox.2018.11.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/24/2018] [Revised: 11/11/2018] [Accepted: 11/19/2018] [Indexed: 12/15/2022]
Abstract
This study was undertaken to evaluate the use of ontology-based semantic mapping (OS-Mapping) in chemical toxicity assessment. Nineteen chemical-species phenotypic profiles (CSPPs) were constructed by ontologically annotating the toxicity responses reported in more than seven hundred published studies of ten chemicals on six vertebrate species. The CSPPs were semantically compared to more than 29,000 publicly available phenotypic profiles of genes, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, and diseases based on a cross-species phenotype ontology. OS-Mapping was shown to differentiate chemical toxicities among themselves as well as within and across species. It also revealed cases of chemical by species interactions. In addition to confirming similar MOAs (mechanisms of action) for a few chemicals, OS-Mapping also generated novel insights into the MOAs underlying some seemingly different, yet phenotypically similar, classes of chemicals. The nature of a unified cross-species phenotype ontology and its representation of diverse knowledge domains allowed the construction of a complete phenotypic continuum for the 17α-ethynylestradiol_fathead minnow across the biological levels of organization, which complemented a similar one derived from the Comparative Toxicogenomics Database but based primarily on 17α-ethynylestradiol-induced molecular phenotypes. Overall, OS-Mapping has been demonstrated to offer a powerful approach to help bridge the gap between the molecular and non-molecular phenotypes of chemicals characterized by using high throughput or traditional omics methods and their apical endpoints of greater regulatory relevance, which are typically phenotypes found at the higher levels of biological organization. OS-Mapping also enables comparative toxicity assessment among chemicals, both within and across species. 
Furthermore, the semantic analysis of phenotypes can reveal additional novel MOAs for some well-known chemicals and discover candidate MOAs for chemicals that are less molecularly characterized. A full phenotypic continuum based on OS-Mapping will also be conducive to the future development of adverse outcome pathways. As phenomics continues to advance and the ontological annotation of literature becomes more automated, the power of OS-Mapping will be further enhanced.
Affiliation(s)
- Rong-Lin Wang
- Exposure Methods and Measurements Division, National Exposure Research Laboratory, US EPA, Cincinnati, OH 45268, USA.
| | - Stephen Edwards
- Research Computing Division, RTI International, Research Triangle Park, NC 27709, USA
| | - Cataia Ives
- Research Computing Division, RTI International, Research Triangle Park, NC 27709, USA
| |
|
42
|
Hunter FMI, Atkinson FL, Bento AP, Bosc N, Gaulton A, Hersey A, Leach AR. A large-scale dataset of in vivo pharmacology assay results. Sci Data 2018; 5:180230. [PMID: 30351302 PMCID: PMC6206617 DOI: 10.1038/sdata.2018.230] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/28/2018] [Accepted: 09/03/2018] [Indexed: 12/17/2022] Open
Abstract
ChEMBL is a large-scale, open-access drug discovery resource containing bioactivity
information primarily extracted from scientific literature. A substantial dataset of more
than 135,000 in vivo assays has been collated as a key resource of animal models for
translational medicine within drug discovery. To improve the utility of the in vivo
data, an extensive data curation task has been undertaken that allows the assays to be
grouped by animal disease model or phenotypic endpoint. The dataset contains previously
unavailable information about compounds or drugs tested in animal models and, in conjunction
with assay data on protein targets or cell- or tissue- based systems, allows the
investigation of the effects of compounds at differing levels of biological complexity.
Equally, it enables researchers to identify compounds that have been investigated for a
group of disease-, pharmacology- or toxicity-relevant assays.
Affiliation(s)
- Fiona M I Hunter
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Francis L Atkinson
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - A Patrícia Bento
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nicolas Bosc
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Anna Gaulton
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Anne Hersey
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Andrew R Leach
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
|
43
|
Affiliation(s)
- Melissa A Haendel
- From the Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, and the Linus Pauling Institute and the Center for Genome Research and Biocomputing, Oregon State University, Corvallis (M.A.H.); Johns Hopkins University Schools of Medicine, Public Health, and Nursing, Baltimore (C.G.C.); and the Jackson Laboratory for Genomic Medicine and the Institute for Systems Genomics, University of Connecticut - both in Farmington (P.N.R.)
| | - Christopher G Chute
- From the Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, and the Linus Pauling Institute and the Center for Genome Research and Biocomputing, Oregon State University, Corvallis (M.A.H.); Johns Hopkins University Schools of Medicine, Public Health, and Nursing, Baltimore (C.G.C.); and the Jackson Laboratory for Genomic Medicine and the Institute for Systems Genomics, University of Connecticut - both in Farmington (P.N.R.)
| | - Peter N Robinson
- From the Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, and the Linus Pauling Institute and the Center for Genome Research and Biocomputing, Oregon State University, Corvallis (M.A.H.); Johns Hopkins University Schools of Medicine, Public Health, and Nursing, Baltimore (C.G.C.); and the Jackson Laboratory for Genomic Medicine and the Institute for Systems Genomics, University of Connecticut - both in Farmington (P.N.R.)
| |
|
44
|
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022] Open
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
| | | | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
| |
|
45
|
Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018; 19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022] Open
Abstract
Background The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, the existing tools are adapted from Gene Ontology (GO)-based term similarity, and the design of their models is not optimized for the unique features of HPO. In addition, existing tools only allow HPO terms as input and only provide pure text-based outputs. Results We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. We also provide an approach that considers the unique properties of HPO. PhenoSimWeb additionally accepts text that describes phenotypes as input, since clinical phenotype data is always in text. PhenoSimWeb also provides a graphic visualization interface to visualize the resulting phenotype network. Conclusions PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Hansheng Xue
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China.
| |
|
46
|
Baker LR, Weasner BM, Nagel A, Neuman SD, Bashirullah A, Kumar JP. Eyeless/Pax6 initiates eye formation non-autonomously from the peripodial epithelium. Development 2018; 145:dev.163329. [PMID: 29980566 DOI: 10.1242/dev.163329] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/15/2018] [Accepted: 06/27/2018] [Indexed: 01/08/2023]
Abstract
The transcription factor Pax6 is considered the master control gene for eye formation because (1) it is present within the genomes and retina/lens of all animals with a visual system; (2) severe retinal defects accompany its loss; (3) Pax6 genes have the ability to substitute for one another across the animal kingdom; and (4) Pax6 genes are capable of inducing ectopic eye/lens in flies and mammals. Many roles of Pax6 were first elucidated in Drosophila through studies of the gene eyeless (ey), which controls both growth of the entire eye-antennal imaginal disc and fate specification of the eye. We show that Ey also plays a surprising role within cells of the peripodial epithelium to control pattern formation. It regulates the expression of decapentaplegic (dpp), which is required for initiation of the morphogenetic furrow in the eye itself. Loss of Ey within the peripodial epithelium leads to the loss of dpp expression within the eye, failure of the furrow to initiate, and abrogation of retinal development. These findings reveal an unexpected mechanism for how Pax6 controls eye development in Drosophila.
Affiliation(s)
- Luke R Baker
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Bonnie M Weasner
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Athena Nagel
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Sarah D Neuman
- Department of Pharmaceutical Sciences, University of Wisconsin, Madison, WI 53705, USA
| | - Arash Bashirullah
- Department of Pharmaceutical Sciences, University of Wisconsin, Madison, WI 53705, USA
| | - Justin P Kumar
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| |
|
47
|
Doğan T. HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ 2018; 6:e5298. [PMID: 30083448 PMCID: PMC6076985 DOI: 10.7717/peerj.5298] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/04/2018] [Accepted: 07/03/2018] [Indexed: 01/24/2023] Open
Abstract
Analysing the relationships between biomolecules and genetic diseases is a highly active area of research, where the aim is to identify the genes and their products that cause a particular disease due to functional changes originating from mutations. Biological ontologies are frequently employed in these studies, which provides researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for the identification of relationships between biomedical entities by automatically mapping phenotypic abnormality defining HPO terms with biomolecular function defining GO terms, where each association indicates the occurrence of the abnormality due to the loss of the biomolecular function expressed by the corresponding GO term. The proposed HPO2GO mappings were extracted by calculating the frequency of the co-annotations of the terms on the same genes/proteins, using already existing curated HPO and GO annotation sets. This was followed by the filtering of the unreliable mappings that could be observed due to chance, by statistical resampling of the co-occurrence similarity distributions. Furthermore, the biological relevance of the finalized mappings was discussed over selected cases, using the literature. The resulting HPO2GO mappings can be employed in different settings to predict and to analyse novel gene/protein–ontology term–disease relations. As an application of the proposed approach, HPO term–protein associations (i.e., HPO2protein) were predicted. In order to test the predictive performance of the method on a quantitative basis, and to compare it with the state-of-the-art, the CAFA2 challenge HPO prediction target protein set was employed. The results of the benchmark indicated the potential of the proposed approach, as HPO2GO performance was among the best (Fmax = 0.35). 
The automated cross ontology mapping approach developed in this work may be extended to other ontologies as well, to identify unexplored relation patterns at the systemic level. The datasets, results and the source code of HPO2GO are available for download at: https://github.com/cansyl/HPO2GO.
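The co-annotation counting step described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the HPO2GO implementation: the function name `cooccurrence_mappings`, the Dice-style score, the input shape (gene identifier mapped to a set of term identifiers), and the placeholder term IDs are all hypothetical. The statistical resampling filter mentioned in the abstract is replaced here by two plain thresholds.

```python
from collections import Counter
from itertools import product


def cooccurrence_mappings(hpo_annotations, go_annotations,
                          min_count=2, min_score=0.5):
    """Score HPO-GO term pairs by how often they annotate the same gene.

    Both inputs map a gene identifier to a set of term identifiers.
    The score is a Dice-style ratio (an assumption, not necessarily the
    measure used by HPO2GO): 2 * co-annotations / (HPO count + GO count).
    """
    hpo_freq = Counter()   # number of genes annotated with each HPO term
    go_freq = Counter()    # number of genes annotated with each GO term
    pair_freq = Counter()  # number of genes annotated with both terms

    for terms in hpo_annotations.values():
        hpo_freq.update(terms)
    for terms in go_annotations.values():
        go_freq.update(terms)

    # Only genes present in both annotation sets can contribute pairs.
    for gene in hpo_annotations.keys() & go_annotations.keys():
        pair_freq.update(product(hpo_annotations[gene], go_annotations[gene]))

    mappings = {}
    for (hpo, go), n in pair_freq.items():
        score = 2 * n / (hpo_freq[hpo] + go_freq[go])
        # Simple thresholds stand in for the resampling-based filter.
        if n >= min_count and score >= min_score:
            mappings[(hpo, go)] = score
    return mappings


# Toy data with placeholder identifiers (not real HPO or GO terms).
hpo = {"G1": {"HP:A"}, "G2": {"HP:A"}, "G3": {"HP:B"}}
go = {"G1": {"GO:X"}, "G2": {"GO:X"}, "G3": {"GO:X", "GO:Y"}}
result = cooccurrence_mappings(hpo, go)
```

With the toy data, only the pair seen on two genes survives the count threshold; pairs observed on a single gene are dropped even when their score is high, which is the kind of chance association a resampling filter would also be expected to remove.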
Affiliation(s)
- Tunca Doğan
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey; Cancer Systems Biology Laboratory (KanSiL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| |
|
48
|
Vogt L. Towards a semantic approach to numerical tree inference in phylogenetics. Cladistics 2018; 34:200-224. [PMID: 34645075 DOI: 10.1111/cla.12195] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Accepted: 02/03/2017] [Indexed: 12/24/2022] Open
Abstract
Conventional approaches to phylogeny reconstruction require a character analysis step prior to and methodologically separated from a numerical tree inference step. The former results in a character matrix that contains the empirical data analysed in the latter. This separation of steps involves various methodological and conceptual problems (e.g. homology assessment independent of tree inference and character optimization, character dependencies, discounting of alternative homology hypotheses). In morphology, the character analysis step covers the stages of morphological comparative studies, homology assessment and the identification and coding of morphological characters. Unfortunately, only the last stage requires some formalism, whereas the preceding stages are commonly regarded to be pre-rational and intuitive, which is why their reproducibility and analytical accessibility are limited. Here, I introduce a rationale for a semantic approach to numerical tree inference that uses sets of semantic instance anatomies as data source instead of character matrices, thereby avoiding the above-mentioned problems. A semantic instance anatomy is an ontology-based description of the anatomical organization of a specimen in the form of a semantic graph. The semantic approach to numerical tree inference combines and integrates the steps of character analysis and numerical tree inference and makes both analytically accessible and communicable. Before outlining first steps for a research programme dedicated to the semantic approach to numerical tree inference, I discuss in detail the methodological, conceptual, and computational challenges and requirements that first have to be dealt with before adequate algorithms can be developed.
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, An der Immenburg 1, Bonn, D-53121, Germany
| |
|
49
|
Dahdul W, Manda P, Cui H, Balhoff JP, Dececchi TA, Ibrahim N, Lapp H, Vision T, Mabee PM. Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems. Database (Oxford) 2018; 2018:5255130. [PMID: 30576485 PMCID: PMC6301375 DOI: 10.1093/database/bay110] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/14/2018] [Revised: 08/22/2018] [Accepted: 09/24/2018] [Indexed: 11/12/2022]
Abstract
Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis of phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information did not significantly increase the similarity of their annotations to the gold standard or have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or machine annotations. 
These findings point toward ways to better design software to augment human curators and the use of the gold standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
Affiliation(s)
| | - Prashanti Manda
- University of North Carolina at Greensboro, Greensboro, NC, USA
| | - Hong Cui
- University of Arizona, Tucson, AZ, USA
| | - James P Balhoff
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - T Alexander Dececchi
- University of South Dakota, Vermillion, SD, USA
- Current affiliation: University of Pittsburgh at Johnstown, Johnstown, PA, USA
| | - Nizar Ibrahim
- University of Chicago, Chicago, IL, USA
- Current affiliation: University of Detroit Mercy, Detroit, MI, USA & University of Portsmouth, Portsmouth, UK
| | | | - Todd Vision
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | |
|
50
|
Sreeja A, Vinayan KP. Multidimensional knowledge-based framework is an essential step in the categorization of gene sets in complex disorders. J Bioinform Comput Biol 2017; 15:1750022. [DOI: 10.1142/s0219720017500226] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
In complex disorders, the collaborative role of several genes accounts for the multitude of symptoms, and the discovery of molecular mechanisms requires a proper understanding of the pertinent genes. The majority of recent techniques either utilize a single information source or consolidate the independent outlook from multiple knowledge sources to assist the discovery of candidate genes. In any case, given that various sorts of heterogeneous sources are possibly significant for quality gene prioritization, with every source bearing data not conveyed by another, we assert that a perfect strategy ought to give approaches to observe among them in a genuinely integrative style that catches the degree of each, instead of utilizing a straightforward mix of sources. We propose a flexible approach that empowers multi-source information reconciliation for quality gene prioritization and that augments the complementary nature of various learning sources so as to utilize the maximum information of aggregated data. To illustrate the proposed approach, we took Autism Spectrum Disorder (ASD) as a case study and validated the framework on benchmark studies. We observed that the combined ranking based on integrated knowledge reduces the false positive observations and boosts the performance when compared with individual rankings. The clinical phenotype validation for ASD shows that there is a significant linkage between top positioned genes and endophenotypes of ASD. Categorization of genes based on endophenotype associations by this method will be useful for further hypothesis generation leading to clinical and translational analysis. This approach may also be useful in other complex neurological and psychiatric disorders with a strong genetic component.
Affiliation(s)
- A. Sreeja
- Department of Computer Science & IT, School of Arts and Sciences, Amrita University, Kochi, Kerala, India
| | - K. P. Vinayan
- Division of Paediatric Neurology, Department of Neurology, Amrita Institute of Medical Sciences, Amrita University, Kochi, Kerala, India
| |
|