101
Zhang W, Zeng B, Yang M, Yang H, Wang J, Deng Y, Zhang H, Yao G, Wu S, Li W. ncRNAVar: A Manually Curated Database for Identification of Noncoding RNA Variants Associated with Human Diseases. J Mol Biol 2020; 433:166727. [PMID: 33275967 DOI: 10.1016/j.jmb.2020.166727] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/03/2020] [Revised: 11/22/2020] [Accepted: 11/25/2020] [Indexed: 12/20/2022]
Abstract
While variants of noncoding RNAs (ncRNAs) have been experimentally validated as a new class of biomarkers and drug targets, the discovery and interpretation of relationships between ncRNA variants and human diseases have become important and challenging. Here we present ncRNAVar (http://www.liwzlab.cn/ncrnavar/), the first database that provides association data between validated ncRNA variants and human diseases through manual curation of 2650 publications and computational annotation. ncRNAVar contains 4565 associations between 711 human disease phenotypes and 3112 variants from 2597 ncRNAs. Each association was reviewed by professional curators, enriched with annotation and cross references, and assigned an association score by our refined score model. ncRNAVar offers web applications including association prioritization, network visualization, and relationship mapping. By presenting a landscape of ncRNA variants in human diseases and a useful resource for subsequent software development, ncRNAVar will improve our insight into relationships between ncRNA variants and human health.
Affiliation(s)
- Wenliang Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Binghui Zeng
  - Guanghua School of Stomatology, Hospital of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou 510055, China
- Minglei Yang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Huan Yang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Jianbo Wang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Yongjie Deng
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Haiyue Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Guocai Yao
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Song Wu
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Weizhong Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Center for Precision Medicine, Sun Yat-sen University, Guangzhou 510080, China; Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou 510080, China
102
Zhu Q, Nguyen DT, Grishagin I, Southall N, Sid E, Pariser A. An integrative knowledge graph for rare diseases, derived from the Genetic and Rare Diseases Information Center (GARD). J Biomed Semantics 2020; 11:13. [PMID: 33183351 PMCID: PMC7663894 DOI: 10.1186/s13326-020-00232-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/09/2019] [Accepted: 11/05/2020] [Indexed: 01/16/2023] [Open Access]
Abstract
Background: The Genetic and Rare Diseases (GARD) Information Center was established by the National Institutes of Health (NIH) to provide freely accessible consumer health information on over 6500 genetic and rare diseases. As the cumulative scientific understanding and underlying evidence for these diseases have expanded over time, existing practices to generate knowledge from these publications and resources have not been able to keep pace. By determining the applicability of computational approaches to enhance or replace manual curation tasks, we aim both to improve the sustainability and relevance of consumer health information and to develop a foundational database from which translational science researchers may start to unravel disease characteristics that are vital to the research process. Results: We developed a meta-ontology based integrative knowledge graph for rare diseases in Neo4j. This integrative knowledge graph includes a total of 3,819,623 nodes and 84,223,681 relations from 34 different biomedical data resources, including curated drug and rare disease associations. Semi-automatic mappings were generated for 2154 unique FDA orphan designations to 776 unique GARD diseases, and 3322 unique FDA designated drugs to UNII, as well as 180,363 associations between drug and indication from Inxight Drugs, which were integrated into the knowledge graph. We conducted four case studies to demonstrate the capabilities of this integrative knowledge graph in accelerating the curation of scientific understanding on rare diseases through the generation of disease mappings/profiles and pathogenesis associations. Conclusions: By integrating well-established database resources, we developed an integrative knowledge graph containing a large volume of biomedical and research data. Demonstration of several immediate use cases and limitations of this process reveals both the potential feasibility and barriers of utilizing graph-based resources and approaches to support their use by providers of consumer health information, such as GARD, that may struggle with the needs of maintaining knowledge reliant on an evolving and growing evidence base. Finally, the successful integration of these datasets into a freely accessible knowledge graph highlights an opportunity to take a translational science view on the field of rare diseases by enabling researchers to identify disease characteristics, which may play a role in the translation of discoveries across different research domains.
Affiliation(s)
- Qian Zhu
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, 20850, USA
- Dac-Trung Nguyen
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, 20850, USA
- Ivan Grishagin
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, 20850, USA
- Noel Southall
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, 20850, USA
- Eric Sid
  - Office of Rare Disease Research, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, 20892, USA
- Anne Pariser
  - Office of Rare Disease Research, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, 20892, USA
103
Reese JT, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Shefchek KA, Good BM, Balhoff JP, Fontana T, Blau H, Matentzoglu N, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ. KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response. Patterns (New York, N.Y.) 2020; 2:100155. [PMID: 33196056 PMCID: PMC7649624 DOI: 10.1016/j.patter.2020.100155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
Affiliation(s)
- Justin T. Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA (corresponding author)
- Deepak Unni
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Tiffany J. Callahan
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, USA
- Luca Cappelletti
  - Department of Computer Science, University of Milano, 20122 Milan, Italy
- Vida Ravanmehr
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Seth Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kent A. Shefchek
  - Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Benjamin M. Good
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- James P. Balhoff
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
- Tommaso Fontana
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Nomi L. Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Monica C. Munoz-Torres
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Melissa A. Haendel
  - Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Peter N. Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Marcin P. Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Christopher J. Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
104
Zhu Q, Nguyen DT, Sid E, Pariser A. Leveraging the UMLS As a Data Standard for Rare Disease Data Normalization and Harmonization. Methods Inf Med 2020; 59:131-139. [PMID: 33147635 DOI: 10.1055/s-0040-1718940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
OBJECTIVE: In this study, we aimed to evaluate the capability of the Unified Medical Language System (UMLS) as one data standard to support data normalization and harmonization of datasets that have been developed for rare diseases. Through analysis of data mappings between multiple rare disease resources and the UMLS, we propose extensions of the UMLS that will enable its adoption as a global standard in rare disease. METHODS: We analyzed data mappings between the UMLS and existing datasets on over 7,000 rare diseases that were retrieved from four publicly accessible resources: Genetic and Rare Diseases Information Center (GARD), Orphanet, Online Mendelian Inheritance in Man (OMIM), and the Monarch Disease Ontology (MONDO). Two types of disease mappings were assessed: (1) curated mappings extracted from those four resources; and (2) established mappings generated by querying the rare disease-based integrative knowledge graph developed in the previous study. RESULTS: We found that 100% of OMIM concepts, and over 50% of concepts from GARD, MONDO, and Orphanet were normalized by the UMLS and accurately categorized into the appropriate UMLS semantic groups. We analyzed 58,636 UMLS mappings, which resulted in 3,876 UMLS concepts across these resources. Manual evaluation of a random set of 500 UMLS mappings demonstrated a high level of accuracy (99%) in developing those mappings, which consisted of 414 synonyms (82.8%), 76 subtypes (15.2%), and five siblings (1%). CONCLUSION: The mapping results in this study illustrated that the UMLS was able to accurately represent rare disease concepts and their associated information, such as genes and phenotypes, and can effectively be used to support data harmonization across existing resources that collect rare disease data. We recommend the adoption of the UMLS as a data standard for rare disease to enable the existing rare disease datasets to support future applications in clinical and community settings.
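The evaluation figures reported above can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers quoted in the abstract; it assumes that the three listed categories (synonyms, subtypes, siblings) together make up the mappings counted as accurate, which is one reading of the abstract rather than a statement of the authors' exact procedure. The variable and function names are illustrative.

```python
# Counts quoted in the abstract for the 500 manually evaluated UMLS mappings.
SAMPLE_SIZE = 500
category_counts = {"synonym": 414, "subtype": 76, "sibling": 5}


def percentages(counts, total):
    """Return each category's share of the sample, in percent, to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentages(category_counts, SAMPLE_SIZE)

# Assumption: the three categories together are the mappings judged accurate.
accurate = sum(category_counts.values())
accuracy = round(100 * accurate / SAMPLE_SIZE, 1)

print(shares)
print(accurate, accuracy)
```

Under that assumption, the three categories account for 495 of the 500 sampled mappings, which matches the reported 99% accuracy.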
Affiliation(s)
- Qian Zhu
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States
- Dac-Trung Nguyen
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States
- Eric Sid
  - Office of Rare Diseases Research, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States
- Anne Pariser
  - Office of Rare Diseases Research, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States
105
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023] [Open Access]
Abstract
Background: Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are resource- and time-intensive. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can limit their use in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. Results: According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, performed better than the PPI networks for both zebrafish and mouse, with higher area under the curve values for receiver operating characteristic and precision-recall curves. Conclusion: Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized the networks for predicting candidate genes for anatomical entities.
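Two of the semantic similarity measures named above, Resnik and Lin, can be illustrated on a toy ontology. The following is a minimal sketch, not the authors' implementation: the term hierarchy and the annotation probabilities are hypothetical values invented for illustration (they are not taken from Uberon), and the sketch compares ontology terms only, whereas the study derives gene-to-gene similarities from sets of annotated terms. The Schlicker and Wang measures are not shown.

```python
import math

# Hypothetical toy anatomy hierarchy (term -> parent terms); not from Uberon.
PARENTS = {
    "anatomical entity": [],
    "fin": ["anatomical entity"],
    "pectoral fin": ["fin"],
    "pelvic fin": ["fin"],
    "heart": ["anatomical entity"],
}

# Hypothetical probability that a gene is annotated to a term or its descendants.
PROB = {
    "anatomical entity": 1.0,
    "fin": 0.4,
    "pectoral fin": 0.2,
    "pelvic fin": 0.1,
    "heart": 0.3,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = log(1 / p(t)); rarer terms carry more information."""
    return math.log(1.0 / PROB[term])


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))


def lin(a, b):
    """Lin similarity: Resnik score normalized by the two terms' own IC."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom else 0.0


print(resnik("pectoral fin", "pelvic fin"), lin("pectoral fin", "pelvic fin"))
print(resnik("pectoral fin", "heart"), lin("pectoral fin", "heart"))
```

In this toy setup the two fin terms share the informative ancestor "fin" and so score above zero, while a fin term and "heart" share only the root, whose information content is zero.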
Affiliation(s)
- Pasan C Fernando
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA; National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA; Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA; Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA; Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
106
Rubinstein YR, Robinson PN, Gahl WA, Avillach P, Baynam G, Cederroth H, Goodwin RM, Groft SC, Hansson MG, Harris NL, Huser V, Mascalzoni D, McMurry JA, Might M, Nellaker C, Mons B, Paltoo DN, Pevsner J, Posada M, Rockett-Frase AP, Roos M, Rubinstein TB, Taruscio D, van Enckevort E, Haendel MA. The case for open science: rare diseases. JAMIA Open 2020; 3:472-486. [PMID: 33426479 PMCID: PMC7660964 DOI: 10.1093/jamiaopen/ooaa030] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/21/2020] [Revised: 05/30/2020] [Accepted: 06/23/2020] [Indexed: 01/04/2023] [Open Access]
Abstract
The premise of Open Science is that research and medical management will progress faster if data and knowledge are openly shared. The value of Open Science is nowhere more important and appreciated than in the rare disease (RD) community. Research into RDs has been limited by insufficient patient data and resources, a paucity of trained disease experts, and lack of therapeutics, leading to long delays in diagnosis and treatment. These issues can be ameliorated by following the principles and practices of sharing that are intrinsic to Open Science. Here, we describe how the RD community has adopted the core pillars of Open Science, adding new initiatives to promote care and research for RD patients and, ultimately, for all of medicine. We also present recommendations that can advance Open Science more globally.
Affiliation(s)
- Yaffa R Rubinstein
  - Special Volunteer in the Office of Strategic Initiatives, National Library of Medicine, Bethesda, Maryland, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- William A Gahl
  - Undiagnosed Diseases Program and Office of the Clinical Director, National Human Genome Research Institute (NHGRI), National Institutes of Health, Bethesda, Maryland, USA
- Paul Avillach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Gareth Baynam
  - Western Australian Register of Developmental Anomalies and Telethon Kids Institute, Perth, Australia
- Rebecca M Goodwin
  - Department of Health and Human Services, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Stephen C Groft
  - NCATS, National Institutes of Health, Bethesda, Maryland, USA
- Mats G Hansson
  - Center for Research Ethics and Bioethics, Uppsala Universitet, Uppsala, Sweden
- Nomi L Harris
  - Department of Environmental Genomics & System Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Vojtech Huser
  - Department of Health and Human Services, NCBI, National Institutes of Health, Bethesda, Maryland, USA
- Deborah Mascalzoni
  - Center for Research Ethics and Bioethics, Uppsala University, Sweden and EURAC Research, Bolzano, Italy
- Julie A McMurry
  - Linus Pauling Institute, Oregon State University, Corvallis, Oregon, USA
- Matthew Might
  - Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Christoffer Nellaker
  - Nuffield Department of Women's and Reproductive Health, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Barend Mons
  - Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Dina N Paltoo
  - Department of Health and Human Services, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Jonathan Pevsner
  - Department of Neurology, Kennedy Krieger Institute and Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Manuel Posada
  - Rare Diseases Research Institute & CIBERER, Instituto de Salud Carlos III, Madrid, Spain
- Marco Roos
  - Human Genetics, Leiden University Medical Center, Leiden, Netherlands
- Tamar B Rubinstein
  - Children's Hospital at Montefiore/Albert Einstein College of Medicine, Pediatrics, Bronx, New York, USA
- Domenica Taruscio
  - National Centre for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
- Esther van Enckevort
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Leiden, Netherlands
- Melissa A Haendel
  - Linus Pauling Institute, Oregon State University, Corvallis, Oregon, USA
107
Mostafa T, Abdel-Hamid IA, Taymour M, Ali OI. Gene Variants in Premature Ejaculation: Systematic Review and Future Directions. Sex Med Rev 2020; 8:586-602. [PMID: 32800770 DOI: 10.1016/j.sxmr.2020.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/24/2020] [Revised: 07/06/2020] [Accepted: 07/11/2020] [Indexed: 02/05/2023]
Abstract
INTRODUCTION: A growing number of genetic association studies have been performed to investigate the association between genetic susceptibility alleles and the risk of premature ejaculation (PE); however, the results remain inconclusive. OBJECTIVES: This systematic review aimed (i) to determine whether an association exists between gene(s) or allelic variant(s) and PE; (ii) to assess whether the associations are consistent across studies in magnitude and direction; and (iii) to identify any limitation, gap, or shortcoming in the included studies. METHODS: The literature search was conducted in PubMed, MEDLINE, Scopus, Cochrane Library, EMBASE, Academic Search Complete, Google Scholar, and CINAHL databases. RESULTS: Different gene variants associated with PE were assessed. Twenty-five genetic association studies met the inclusion criteria; they investigated 11 genes in 2,624 men with PE compared with 9,346 men as controls, twins, and siblings. Nineteen studies demonstrated a significant association with PE, whereas 4 studies found no such relationship. SLC6A4 gene polymorphism was investigated in 11 studies (7 studies demonstrated a significant relationship with PE, and 4 studies found no such relationship). Dopamine transporter gene (DAT1) polymorphism was investigated in 4 studies, all exhibiting a significant relationship. Androgen receptor gene polymorphisms were investigated in 2 studies, 1 with a significant relationship and the other with a non-significant relationship. Oxytocin gene polymorphisms and tryptophan hydroxylase 2 gene polymorphisms were investigated in 2 studies with a significant relationship. CONCLUSION: While this review has highlighted several genes that may be associated with PE, such as SLC6A4, limitations such as variance in study methods, lack of robust findings, small sample sizes, lack of reproducibility, quality of reporting, and quality of assessment remain a major concern. Further efforts such as standardizing reporting, exploring complementary designs, and the use of genome-wide association study technology are warranted to test the reproducibility of these early findings.
Affiliation(s)
- Taymour Mostafa
  - Andrology, Sexology & STIs Department, Faculty of Medicine, Cairo University, Cairo, Egypt
- Mai Taymour
  - Dermatology & Andrology, Private Sector, Cairo, Egypt
- Omar I Ali
  - Faculty of Medicine and Surgery, 6th October University, Giza, Egypt
108
Narang A, Uppilli B, Vivekanand A, Naushin S, Yadav A, Singhal K, Shamim U, Sharma P, Zahra S, Mathur A, Seth M, Parveen S, Vats A, Hillman S, Dolma P, Varma B, Jain V, Prasher B, Sengupta S, Mukerji M, Faruq M. Frequency spectrum of rare and clinically relevant markers in multiethnic Indian populations (ClinIndb): A resource for genomic medicine in India. Hum Mutat 2020; 41:1833-1847. [PMID: 32906206 DOI: 10.1002/humu.24102] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/03/2020] [Revised: 08/17/2020] [Accepted: 08/28/2020] [Indexed: 12/18/2022]
Abstract
There have been concerted efforts toward cataloging rare and deleterious variants in different world populations using high-throughput genotyping and sequencing-based methods. The Indian population is underrepresented in public data sets, or the available information on its clinically relevant variants is sparse. The aim of this study was to estimate the burden of monogenic disease-causing variants in Indian populations. Toward this, we have assessed the frequency profile of monogenic phenotype-associated ClinVar variants. The study utilized a genotype data set (Global Screening Array, Illumina) from 2795 individuals (multiple in-house genomics cohorts) representing diverse ethnic and geographically distinct Indian populations. Of the analyzed variants from the Global Screening Array, ~9% were found to be informative and were either not known earlier or underrepresented in public databases in terms of their frequencies. These variants were linked to disorders, namely inborn errors of metabolism, monogenic diabetes, hereditary cancers, and various other hereditary conditions. We have also shown that our study cohort is genetically a better representative of the Indian population than its representation in the 1000 Genomes Project (South Asians). We have created a database, ClinIndb, linked to the Leiden Open Variation Database, to help clinicians and researchers in diagnosis, counseling, and development of appropriate genetic screening tools relevant to the Indian populations and Indians living abroad.
Affiliation(s)
- Ankita Narang
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Bharathram Uppilli
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Asokachandran Vivekanand
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Salwa Naushin
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Arti Yadav
  - CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Khushboo Singhal
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Uzma Shamim
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Pooja Sharma
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Sana Zahra
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Aradhana Mathur
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Malika Seth
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Shaista Parveen
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Archana Vats
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Sara Hillman
  - NIHR UCL Clinical Lecturer and Subspecialty Trainee Maternal and Fetal Medicine, UCL Institute for Women's Health, London, UK
- Padma Dolma
  - Department of Obstetrics and Gynaecology, Sonam Norboo Memorial Hospital, Leh, Ladakh, India
- Binuja Varma
  - CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Vandana Jain
  - Department of Pediatrics, All India Institute of Medical Sciences, New Delhi, India
- Bhavana Prasher
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India; CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Shantanu Sengupta
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Mitali Mukerji
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India; CSIR Ayurgenomics Unit-TRISUTRA, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Mohammed Faruq
  - Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
109
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2020; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] [Open Access]
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Huan Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Zhi Xie
- State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
- Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
110
Harich B, van der Voet M, Klein M, Čížek P, Fenckova M, Schenck A, Franke B. From Rare Copy Number Variants to Biological Processes in ADHD. Am J Psychiatry 2020; 177:855-866. [PMID: 32600152 DOI: 10.1176/appi.ajp.2020.19090923] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/30/2022]
Abstract
OBJECTIVE: Attention deficit hyperactivity disorder (ADHD) is a highly heritable psychiatric disorder. The objective of this study was to define ADHD-associated candidate genes and their associated molecular modules and biological themes, based on the analysis of rare genetic variants.
METHODS: The authors combined data from 11 published copy number variation studies in 6,176 individuals with ADHD and 25,026 control subjects and prioritized genes by applying an integrative strategy based on criteria including recurrence in individuals with ADHD, absence in control subjects, complete coverage in copy number gains, and presence in the minimal region common to overlapping copy number variants (CNVs), as well as on protein-protein interactions and information from cross-species genotype-phenotype annotation.
RESULTS: The authors localized 2,241 eligible genes in the 1,532 reported CNVs, of which they classified 432 as high-priority ADHD candidate genes. The high-priority ADHD candidate genes were significantly coexpressed in the brain. A network of 66 genes was supported by ADHD-relevant phenotypes in the cross-species database. Four significantly interconnected protein modules were found among the high-priority ADHD genes. A total of 26 genes were observed across all applied bioinformatic methods. Lookup in the latest genome-wide association study for ADHD showed that among those 26 genes, POLR3C and RBFOX1 were also supported by common genetic variants.
CONCLUSIONS: Integration of a stringent filtering procedure in CNV studies with suitable bioinformatics approaches can identify ADHD candidate genes at increased levels of credibility. The authors' analytic pipeline provides additional insight into the molecular mechanisms underlying ADHD and allows prioritization of genes for functional validation in validated model organisms.
Affiliation(s)
- Benjamin Harich
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
- Monique van der Voet
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
- Marieke Klein
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
- Pavel Čížek
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
- Michaela Fenckova
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
- Annette Schenck
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
- Barbara Franke
- Department of Human Genetics (Harich, van der Voet, Klein, Fenckova, Schenck, Franke) and Department of Psychiatry (Franke), Donders Institute for Brain, Cognition, and Behavior, Radboud University Medical Center, Nijmegen, the Netherlands; and Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, the Netherlands (Čížek)
111
Reese J, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Fontana T, Blau H, Matentzoglu N, Harris NL, Munoz-Torres MC, Robinson PN, Joachimiak MP, Mungall CJ. KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response. bioRxiv 2020:2020.08.17.254839. [PMID: 32839776 PMCID: PMC7444288 DOI: 10.1101/2020.08.17.254839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community varies drastically for different tasks: the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates biomedical data to produce knowledge graphs (KGs) for COVID-19 response. This KG framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
BIGGER PICTURE: An effective response to the COVID-19 pandemic relies on integration of many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications including machine learning tasks, hypothesis-based querying, and browsable user interface to enable researchers to explore COVID-19 data and discover relationships.
112
Brunak S, Bjerre Collin C, Eva Ó Cathaoir K, Golebiewski M, Kirschner M, Kockum I, Moser H, Waltemath D. Towards standardization guidelines for in silico approaches in personalized medicine. J Integr Bioinform 2020; 17:jib-2020-0006. [PMID: 32827396 PMCID: PMC7756614 DOI: 10.1515/jib-2020-0006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/18/2020] [Accepted: 04/26/2020] [Indexed: 01/11/2023]
Abstract
Despite the ever-progressing technological advances in producing data in health and clinical research, the generation of new knowledge for medical benefits through advanced analytics still lags behind its full potential. Reasons for this obstacle are the inherent heterogeneity of data sources and the lack of broadly accepted standards. Further hurdles are associated with legal and ethical issues surrounding the use of personal/patient data across disciplines and borders. Consequently, there is a need for broadly applicable standards compliant with legal and ethical regulations that allow interpretation of heterogeneous health data through in silico methodologies to advance personalized medicine. To tackle these standardization challenges, the Horizon2020 Coordinating and Support Action EU-STANDS4PM initiated an EU-wide mapping process to evaluate strategies for data integration and data-driven in silico modelling approaches to develop standards, recommendations and guidelines for personalized medicine. A first step towards this goal is a broad stakeholder consultation process initiated by an EU-STANDS4PM workshop at the annual COMBINE meeting (COMBINE 2019 workshop report in same issue). This forum analysed the status quo of data and model standards and reflected on possibilities as well as challenges for cross-domain data integration to facilitate in silico modelling approaches for personalized medicine.
Affiliation(s)
- Marc Kirschner
- University of Copenhagen, Copenhagen, Denmark; Forschungszentrum Jülich GmbH, Project Management Jülich, Jülich, Germany
- Heike Moser
- German Institute for Standardization, Berlin, Germany
- Dagmar Waltemath
- Medical Informatics Laboratory, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
113
Zhang W, Yao G, Wang J, Yang M, Wang J, Zhang H, Li W. ncRPheno: a comprehensive database platform for identification and validation of disease related noncoding RNAs. RNA Biol 2020; 17:943-955. [PMID: 32122231 PMCID: PMC7549653 DOI: 10.1080/15476286.2020.1737441] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/18/2019] [Revised: 02/24/2020] [Accepted: 02/25/2020] [Indexed: 12/31/2022]
Abstract
Noncoding RNAs (ncRNAs) play critical roles in many biological processes and have become a novel class of potential targets and bio-markers for disease diagnosis, therapy, and prognosis. Annotating and analysing ncRNA-disease association data are essential but challenging. Current computational resources lack comprehensive database platforms to consistently interpret and prioritize ncRNA-disease association data for biomedical investigation and application. Here, we present the ncRPheno database platform (http://lilab2.sysu.edu.cn/ncrpheno), which comprehensively integrates and annotates ncRNA-disease association data and provides novel searches, visualizations, and utilities for association identification and validation. ncRPheno contains 482,751 non-redundant associations between 14,494 ncRNAs and 3,210 disease phenotypes across 11 species with supporting evidence in the literature. A scoring model was refined to prioritize the associations based on evidential metrics. Moreover, ncRPheno provides user-friendly web interfaces, novel visualizations, and programmatic access to enable easy exploration, analysis, and utilization of the association data. A case study through ncRPheno demonstrated a comprehensive landscape of ncRNA dysregulation associated with 22 cancers and uncovered 821 cancer-associated common ncRNAs. As a unique database platform, ncRPheno outperforms the existing similar databases in terms of data coverage and utilities, and it will assist studies in encoding ncRNAs associated with phenotypes ranging from genetic disorders to complex diseases.
ABBREVIATIONS: APIs: application programming interfaces; circRNA: circular RNA; ECO: Evidence & Conclusion Ontology; EFO: Experimental Factor Ontology; FDR: false discovery rate; GO: Gene Ontology; GWAS: genome wide association studies; HPO: Human Phenotype Ontology; ICGC: International Cancer Genome Consortium; lncRNA: long noncoding RNA; miRNA: micro RNA; ncRNA: noncoding RNA; NGS: next generation sequencing; OMIM: Online Mendelian Inheritance in Man; piRNA: piwi-interacting RNA; snoRNA: small nucleolar RNA; TCGA: The Cancer Genome Atlas.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Guocai Yao
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Jianbo Wang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Minglei Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Jing Wang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
- Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control, Sun Yat-Sen University, Ministry of Education, China
114
Zhang J, Yao Y, He H, Shen J. Clinical Interpretation of Sequence Variants. Curr Protoc Hum Genet 2020; 106:e98. [PMID: 32176464 PMCID: PMC7431429 DOI: 10.1002/cphg.98] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/15/2022]
Abstract
Clinical interpretation of DNA sequence variants is a critical step in reporting clinical genetic testing results. Application of next-generation sequencing technology in molecular genetic testing has facilitated diagnoses of genetic disorders in clinical practice. However, the large number of DNA sequence variants detected in clinical specimens, many of which have never been seen before, makes clinical interpretation challenging. Recommendations by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) have been widely adopted by clinical laboratories around the world to guide clinical interpretation of sequence variants. The ClinGen Sequence Variant Interpretation Working Group and various disease-specific variant curation expert panels have also developed specifications for the ACMG/AMP recommendations. Despite these efforts to standardize variant interpretation in clinical practice, different laboratories may subjectively use professional judgment to determine which criteria are applicable when classifying a variant. In addition, clinicians and researchers who are not familiar with the variant interpretation process may have difficulty understanding clinical genetic reports and communicating the clinical significance of genetic testing results. Here we provide a step-by-step protocol for clinical interpretation of sequence variants, including practical examples. By following this protocol, clinical laboratory geneticists can interpret the clinical significance of sequence variants according to the ACMG/AMP recommendations and ClinGen framework. Furthermore, this article will help clinicians and researchers to understand variant classification in clinical genetic testing reports and evaluate the quality of the reports. © 2020 by John Wiley & Sons, Inc.
Basic Protocol: Interpreting the clinical significance of sequence variants
Support Protocol: Reevaluating the clinical significance of sequence variants
Affiliation(s)
- Junyu Zhang
- Department of Reproductive Genetics, International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- Yanyi Yao
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- Medical Genetics Center, Maternal and Child Health Hospital of Hubei Province, Wuhan, China
- Haixian He
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
- Department of Otorhinolaryngology, Qilu Hospital of Shandong University, Jinan, China
- NHC Key Laboratory of Otorhinolaryngology, Shandong University, Jinan, China
- Jun Shen
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
115
Alliance of Genome Resources Portal: unified model organism research platform. Nucleic Acids Res 2020; 48:D650-D658. [PMID: 31552413 PMCID: PMC6943066 DOI: 10.1093/nar/gkz813] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 07/31/2019] [Revised: 09/03/2019] [Accepted: 09/19/2019] [Indexed: 01/13/2023]
Abstract
The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.
116
Courtier-Orgogozo V, Arnoult L, Prigent SR, Wiltgen S, Martin A. Gephebase, a database of genotype-phenotype relationships for natural and domesticated variation in Eukaryotes. Nucleic Acids Res 2020; 48:D696-D703. [PMID: 31544935 PMCID: PMC6943045 DOI: 10.1093/nar/gkz796] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/19/2019] [Revised: 08/21/2019] [Accepted: 09/06/2019] [Indexed: 12/30/2022]
Abstract
Gephebase is a manually curated database compiling our accumulated knowledge of the genes and mutations that underlie natural, domesticated and experimental phenotypic variation in all Eukaryotes, mostly animals, plants and yeasts. Gephebase aims to compile studies where the genotype–phenotype association (based on linkage mapping, association mapping or a candidate gene approach) is relatively well supported. Human clinical traits and aberrant mutant phenotypes in laboratory organisms are not included and can be found in other databases (e.g. OMIM, OMIA, Monarch Initiative). Gephebase contains more than 1700 entries. Each entry corresponds to an allelic difference at a given gene and its associated phenotypic change(s) between two species or two individuals of the same species, and is enriched with molecular details, taxonomic information, and bibliographic information. Users can easily browse entries and perform searches at various levels using boolean operators (e.g. transposable elements, snakes, carotenoid content, Doebley). Data is exportable in spreadsheet format. This database allows users to perform meta-analyses to extract global trends about the living world and the research fields. Gephebase should also help breeders, conservationists and others to identify promising target genes for crop improvement, parasite/pest control, bioconservation and genetic diagnostics. It is freely available at www.gephebase.org.
Affiliation(s)
- Laurent Arnoult
- Institut Jacques Monod, CNRS, UMR 7592, Université de Paris, Paris, France
- Stéphane R Prigent
- Institut Jacques Monod, CNRS, UMR 7592, Université de Paris, Paris, France
- Arnaud Martin
- Department of Biological Sciences, The George Washington University, Washington, DC, USA
117
Bogue MA, Philip VM, Walton DO, Grubb SC, Dunn MH, Kolishovski G, Emerson J, Mukherjee G, Stearns T, He H, Sinha V, Kadakkuzha B, Kunde-Ramamoorthy G, Chesler EJ. Mouse Phenome Database: a data repository and analysis suite for curated primary mouse phenotype data. Nucleic Acids Res 2020; 48:D716-D723. [PMID: 31696236 PMCID: PMC7145612 DOI: 10.1093/nar/gkz1032] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/20/2019] [Revised: 10/18/2019] [Accepted: 10/21/2019] [Indexed: 01/27/2023]
Abstract
The Mouse Phenome Database (MPD; https://phenome.jax.org) is a widely accessed and highly functional data repository housing primary phenotype data for the laboratory mouse accessible via APIs and providing tools to analyze and visualize those data. Data come from investigators around the world and represent a broad scope of phenotyping endpoints and disease-related traits in naïve mice and those exposed to drugs, environmental agents or other treatments. MPD houses rigorously curated per-animal data with detailed protocols. Public ontologies and controlled vocabularies are used for annotation. In addition to phenotype tools, genetic analysis tools enable users to integrate and interpret genome–phenome relations across the database. Strain types and populations include inbred, recombinant inbred, F1 hybrid, transgenic, targeted mutants, chromosome substitution, Collaborative Cross, Diversity Outbred and other mapping populations. Our new analysis tools allow users to apply selected data in an integrated fashion to address problems in trait associations, reproducibility, polygenic syndrome model selection and multi-trait modeling. As we refine these tools and approaches, we will continue to provide users a means to identify consistent, quality studies that have high translational relevance.
Affiliation(s)
- Molly A Bogue
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
- Vivek M Philip
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
- David O Walton
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
- Matthew H Dunn
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
- Jake Emerson
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
- Hao He
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
- Vinita Sinha
- The Jackson Laboratory, Bar Harbor, Maine, ME 04609, USA
118
Shefchek KA, Harris NL, Gargano M, Matentzoglu N, Unni D, Brush M, Keith D, Conlin T, Vasilevsky N, Zhang XA, Balhoff JP, Babb L, Bello SM, Blau H, Bradford Y, Carbon S, Carmody L, Chan LE, Cipriani V, Cuzick A, Della Rocca M, Dunn N, Essaid S, Fey P, Grove C, Gourdine JP, Hamosh A, Harris M, Helbig I, Hoatlin M, Joachimiak M, Jupp S, Lett KB, Lewis SE, McNamara C, Pendlington ZM, Pilgrim C, Putman T, Ravanmehr V, Reese J, Riggs E, Robb S, Roncaglia P, Seager J, Segerdell E, Similuk M, Storm AL, Thaxon C, Thessen A, Jacobsen JOB, McMurry JA, Groza T, Köhler S, Smedley D, Robinson PN, Mungall CJ, Haendel MA, Munoz-Torres MC, Osumi-Sutherland D. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 2020; 48:D704-D715. [PMID: 31701156 PMCID: PMC7056945 DOI: 10.1093/nar/gkz997] [Citation(s) in RCA: 133] [Impact Index Per Article: 33.3] [Received: 09/20/2019] [Revised: 10/09/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022]
Abstract
In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven’t been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regards to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.
Affiliation(s)
- Kent A Shefchek
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Michael Gargano
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Nicolas Matentzoglu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Matthew Brush
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Daniel Keith
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Tom Conlin
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- James P Balhoff
- Renaissance Computing Institute at UNC, Chapel Hill, NC 27517, USA
- Larry Babb
- Broad Institute, Cambridge, MA 02142, USA
- Hannah Blau
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Yvonne Bradford
- Institute of Neuroscience, University of Oregon, Eugene, OR 97401, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Leigh Carmody
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Valentina Cipriani
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Maria Della Rocca
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
- Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Shahim Essaid
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Petra Fey
- dictyBase, Center for Genetic Medicine, Northwestern University, Chicago, IL 60611, USA
- Chris Grove
- California Institute of Technology, Pasadena, CA 91125, USA
- Jean-Phillipe Gourdine
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
- Ada Hamosh
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
- Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Neuropediatrics, Christian-Albrechts-University of Kiel, 24105 Kiel, Germany; Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Maureen Hoatlin
- Department of Biochemistry and Molecular Biology, Oregon Health & Science University, Portland, OR 97239, USA
- Marcin Joachimiak
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kenneth B Lett
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Suzanna E Lewis
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Zoë M Pendlington
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tim Putman
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Vida Ravanmehr
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
- Justin Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
- Erin Riggs
- Autism & Developmental Medicine Institute, Geisinger, Danville, PA 17837, USA
- Sofia Robb
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Paola Roncaglia
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Erik Segerdell
- Xenbase, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
- Morgan Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Andrea L Storm
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
- Courtney Thaxon
- University of North Carolina Medical School, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Anne Thessen
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
- Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Julie A McMurry
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
| | | | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Peter N Robinson
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Melissa A Haendel
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA.,Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Monica C Munoz-Torres
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - David Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
119
Canakoglu A, Bernasconi A, Colombo A, Masseroli M, Ceri S. GenoSurf: metadata driven semantic search system for integrated genomic datasets. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5670757. [PMID: 31820804 PMCID: PMC6902006 DOI: 10.1093/database/baz132] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/28/2019] [Revised: 10/04/2019] [Accepted: 10/21/2019] [Indexed: 01/18/2023]
Abstract
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies. The user of GenoSurf provides as input the search terms, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing resulting files are updated in real time while the search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata of several major valuable data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified with their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
Affiliation(s)
- Arif Canakoglu
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
| | - Anna Bernasconi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
| | - Andrea Colombo
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
| | - Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
| | - Stefano Ceri
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
120
Liu C, Peres Kury FS, Li Z, Ta C, Wang K, Weng C. Doc2Hpo: a web application for efficient and accurate HPO concept curation. Nucleic Acids Res 2020; 47:W566-W570. [PMID: 31106327 PMCID: PMC6602487 DOI: 10.1093/nar/gkz386] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 02/28/2019] [Revised: 04/26/2019] [Accepted: 04/30/2019] [Indexed: 01/18/2023]
Abstract
We present Doc2Hpo, an interactive web application that enables interactive and efficient phenotype concept curation from clinical text with automated concept normalization using the Human Phenotype Ontology (HPO). Users can edit the HPO concepts automatically extracted by Doc2Hpo in real time, and export the extracted HPO concepts into gene prioritization tools. Our evaluation showed that Doc2Hpo significantly reduced manual effort while achieving high accuracy in HPO concept curation. Doc2Hpo is freely available at https://impact2.dbmi.columbia.edu/doc2hpo/. The source code is available at https://github.com/stormliucong/doc2hpo for local installation for protected health data.
Affiliation(s)
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | | | - Ziran Li
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Casey Ta
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
| | - Kai Wang
- Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.,Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
121
Köhler S, Carmody L, Vasilevsky N, Jacobsen JOB, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D, Cipriani V, Balhoff JP, Conlin T, Blau H, Baynam G, Palmer R, Gratian D, Dawkins H, Segal M, Jansen AC, Muaz A, Chang WH, Bergerson J, Laulederkind SJF, Yüksel Z, Beltran S, Freeman AF, Sergouniotis PI, Durkin D, Storm AL, Hanauer M, Brudno M, Bello SM, Sincan M, Rageth K, Wheeler MT, Oegema R, Lourghi H, Della Rocca MG, Thompson R, Castellanos F, Priest J, Cunningham-Rundles C, Hegde A, Lovering RC, Hajek C, Olry A, Notarangelo L, Similuk M, Zhang XA, Gómez-Andrés D, Lochmüller H, Dollfus H, Rosenzweig S, Marwaha S, Rath A, Sullivan K, Smith C, Milner JD, Leroux D, Boerkoel CF, Klion A, Carter MC, Groza T, Smedley D, Haendel MA, Mungall C, Robinson PN. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res 2020; 47:D1018-D1027. [PMID: 30476213 PMCID: PMC6324074 DOI: 10.1093/nar/gky1105] [Citation(s) in RCA: 406] [Impact Index Per Article: 101.5] [Received: 09/17/2018] [Accepted: 10/24/2018] [Indexed: 12/12/2022]
Abstract
The Human Phenotype Ontology (HPO)—a standardized vocabulary of phenotypic abnormalities associated with 7000+ diseases—is used by thousands of researchers, clinicians, informaticians and electronic health record systems around the world. Its detailed descriptions of clinical abnormalities and computable disease definitions have made HPO the de facto standard for deep phenotyping in the field of rare disease. The HPO’s interoperability with other ontologies has enabled it to be used to improve diagnostic accuracy by incorporating model organism data. It also plays a key role in the popular Exomiser tool, which identifies potential disease-causing variants from whole-exome or whole-genome sequencing data. Since the HPO was first introduced in 2008, its users have become both more numerous and more diverse. To meet these emerging needs, the project has added new content, language translations, mappings and computational tooling, as well as integrations with external community data. The HPO continues to collaborate with clinical adopters to improve specific areas of the ontology and extend standardized disease descriptions. The newly redesigned HPO website (www.human-phenotype-ontology.org) simplifies browsing terms and exploring clinical features, diseases, and human genes.
Affiliation(s)
- Sebastian Köhler
- Charité Centrum für Therapieforschung, Charité-Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin 10117, Germany.,Einstein Center Digital Future, Berlin 10117, Germany.,Monarch Initiative, monarchinitiative.org
| | - Leigh Carmody
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Nicole Vasilevsky
- Monarch Initiative, monarchinitiative.org.,Oregon Health & Science University, Portland, OR 97217, USA
| | - Julius O B Jacobsen
- Monarch Initiative, monarchinitiative.org.,Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
| | - Daniel Danis
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Jean-Philippe Gourdine
- Monarch Initiative, monarchinitiative.org.,Oregon Health & Science University, Portland, OR 97217, USA
| | - Michael Gargano
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Nomi L Harris
- Monarch Initiative, monarchinitiative.org.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Nicolas Matentzoglu
- Monarch Initiative, monarchinitiative.org.,European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | - Julie A McMurry
- Monarch Initiative, monarchinitiative.org.,Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
| | - David Osumi-Sutherland
- Monarch Initiative, monarchinitiative.org.,European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | - Valentina Cipriani
- Monarch Initiative, monarchinitiative.org.,William Harvey Research Institute, Queen Mary University College of London.,UCL Genetics Institute, University College of London.,UCL Institute of Ophthalmology, University College of London
| | - James P Balhoff
- Monarch Initiative, monarchinitiative.org.,Renaissance Computing Institute, University of North Carolina at Chapel Hill
| | - Tom Conlin
- Monarch Initiative, monarchinitiative.org.,Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
| | - Hannah Blau
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Gareth Baynam
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia.,School of Paediatrics and Telethon Kids Institute, University of Western Australia, Perth, WA, Australia.,Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA, Australia.,Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia.,The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
| | - Richard Palmer
- Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia
| | - Dylan Gratian
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia
| | - Hugh Dawkins
- The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
| | | | - Anna C Jansen
- Neurogenetics Research Group, Vrije Universiteit Brussel, Brussels, Belgium.,Pediatric Neurology Unit, Department of Pediatrics, UZ Brussel, Brussels, Belgium
| | - Ahmed Muaz
- Monarch Initiative, monarchinitiative.org.,Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
| | - Willie H Chang
- Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
| | - Jenna Bergerson
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Stanley J F Laulederkind
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin & Marquette University, 8701 Watertown Plank Road Milwaukee, WI 53226, USA
| | | | - Sergi Beltran
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Alexandra F Freeman
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | | | - Daniel Durkin
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Andrea L Storm
- ICF, Rockville, MD, USA.,National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
| | - Marc Hanauer
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Michael Brudno
- Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
| | | | - Murat Sincan
- Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
| | - Kayli Rageth
- Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
| | - Matthew T Wheeler
- Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
| | - Renske Oegema
- Department of Genetics, University Medical Center Utrecht, the Netherlands
| | - Halima Lourghi
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Maria G Della Rocca
- ICF, Rockville, MD, USA.,National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
| | - Rachel Thompson
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
| | | | - James Priest
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
| | | | - Ayushi Hegde
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Ruth C Lovering
- Institute of Cardiovascular Science, University College London, UK
| | | | - Annie Olry
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Luigi Notarangelo
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Morgan Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Xingmin A Zhang
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - David Gómez-Andrés
- Child Neurology Unit. Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
| | - Hanns Lochmüller
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain.,Department of Neuropediatrics and Muscle Disorders, Medical Center-University of Freiburg, Faculty of Medicine, Freiburg, Germany.,Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada.,Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
| | - Hélène Dollfus
- Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
| | - Sergio Rosenzweig
- Immunology Service, Department of Laboratory Medicine, NIH Clinical Center, Bethesda, MD, USA
| | - Shruti Marwaha
- Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
| | - Ana Rath
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Kathleen Sullivan
- Department of Pediatrics, Division of Allergy Immunology, The Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, 3615 Civic Center Boulevard, Philadelphia, PA 19104, USA
| | | | - Joshua D Milner
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Dorothée Leroux
- Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
| | | | - Amy Klion
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Melody C Carter
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Tudor Groza
- Monarch Initiative, monarchinitiative.org.,Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
| | - Damian Smedley
- Monarch Initiative, monarchinitiative.org.,Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
| | - Melissa A Haendel
- Monarch Initiative, monarchinitiative.org.,Oregon Health & Science University, Portland, OR 97217, USA.,Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
| | - Chris Mungall
- Monarch Initiative, monarchinitiative.org.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Peter N Robinson
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.,Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
122
Köhler S, Øien NC, Buske OJ, Groza T, Jacobsen JOB, McNamara C, Vasilevsky N, Carmody LC, Gourdine JP, Gargano M, McMurry JA, Danis D, Mungall CJ, Smedley D, Haendel M, Robinson PN. Encoding Clinical Data with the Human Phenotype Ontology for Computational Differential Diagnostics. Curr Protoc Hum Genet 2020; 103:e92. [PMID: 31479590 DOI: 10.1002/cphg.92] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/25/2022]
Abstract
The Human Phenotype Ontology (HPO) is a standardized set of phenotypic terms that are organized in a hierarchical fashion. It is a widely used resource for capturing human disease phenotypes for computational analysis to support differential diagnostics. The HPO is frequently used to create a set of terms that accurately describe the observed clinical abnormalities of an individual being evaluated for suspected rare genetic disease. This profile is compared with computational disease profiles in the HPO database with the aim of identifying genetic diseases with comparable phenotypic profiles. The computational analysis can be coupled with the analysis of whole-exome or whole-genome sequencing data through applications such as Exomiser. This article explains how to choose an optimal set of HPO terms for these cases and enter them with software, such as PhenoTips and PatientArchive, and demonstrates how to use Phenomizer and Exomiser to generate a computational differential diagnosis. © 2019 by John Wiley & Sons, Inc.
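The profile comparison described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the disease labels and their term sets are toy data, the function names are hypothetical, and plain Jaccard overlap is used as a simple stand-in, not the scoring that Phenomizer or Exomiser apply.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Toy disease profiles (hypothetical labels and term sets, not real annotations).
DISEASE_PROFILES: Dict[str, FrozenSet[str]] = {
    "disease_A": frozenset({"HP:0001250", "HP:0001263", "HP:0000252"}),
    "disease_B": frozenset({"HP:0001166", "HP:0001083", "HP:0002650"}),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(
    patient_terms: Iterable[str],
    profiles: Dict[str, FrozenSet[str]] = DISEASE_PROFILES,
) -> List[Tuple[str, float]]:
    """Score every disease profile against the patient's terms, best first."""
    query = frozenset(patient_terms)
    scored = [(name, jaccard(query, terms)) for name, terms in profiles.items()]
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranking = rank_diseases(["HP:0001250", "HP:0001263"])
```

A set-overlap score ignores the ontology hierarchy, so a closely related but non-identical term counts as a miss; hierarchy-aware similarity measures address that limitation.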
Affiliation(s)
- Sebastian Köhler
- Charité Centrum für Therapieforschung, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany.,Einstein Center Digital Future, Berlin, Germany.,Monarch Initiative (monarchinitiative.org)
| | | | | | | | - Julius O B Jacobsen
- Monarch Initiative (monarchinitiative.org).,Queen Mary University of London, Charterhouse Square, London, United Kingdom
| | | | - Nicole Vasilevsky
- Monarch Initiative (monarchinitiative.org).,Oregon Health & Science University, Portland, Oregon
| | - Leigh C Carmody
- Monarch Initiative (monarchinitiative.org).,The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut
| | - J P Gourdine
- Monarch Initiative (monarchinitiative.org).,Oregon Health & Science University, Portland, Oregon
| | - Michael Gargano
- Monarch Initiative (monarchinitiative.org).,The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut
| | - Julie A McMurry
- Monarch Initiative (monarchinitiative.org).,Oregon State University, Corvallis, Oregon
| | - Daniel Danis
- Monarch Initiative (monarchinitiative.org).,The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut
| | - Christopher J Mungall
- Monarch Initiative (monarchinitiative.org).,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California
| | - Damian Smedley
- Monarch Initiative (monarchinitiative.org).,Queen Mary University of London, Charterhouse Square, London, United Kingdom
| | - Melissa Haendel
- Monarch Initiative (monarchinitiative.org).,Oregon Health & Science University, Portland, Oregon.,Oregon State University, Corvallis, Oregon
| | - Peter N Robinson
- Monarch Initiative (monarchinitiative.org).,The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut.,Institute for Systems Genomics, University of Connecticut, Farmington, Connecticut
123
Abstract
The term phenotype is so commonly used that we often assume that we each mean the same thing. The general definition, the set of observable characteristics of an individual resulting from the interaction of their genotype with the environment, is often left to the eye of the beholder. Whether applied to the multiple levels of biological phenomena or the intact human being, our ability to characterize, classify, and analyze phenotype has been limited by measurement deficits, computing limitations, and a culture that avoids the generalizable. With the advent of modern technology, there is the potential for a revolution in phenotyping, which incorporates old and new in structured ways to dramatically advance basic understanding of biology and behavior and to lead to major improvements in clinical care and public health. This revolution in how we think about phenotypes will require a radical change in the scale at which biomedicine operates with significant changes in the unit of action, which will have far-reaching implications for how care, translation, and discovery are implemented.
Affiliation(s)
- Calum A MacRae
- From the One Brave Idea (C.A.M., R.M.C.).,Cardiovascular Medicine Division and Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA (C.A.M.)
| | - Robert M Califf
- From the One Brave Idea (C.A.M., R.M.C.).,Verily Life Sciences (R.M.C.).,Google Health, South San Francisco and Mountain View, CA (R.M.C.)
124
Wagner AH, Walsh B, Mayfield G, Tamborero D, Sonkin D, Krysiak K, Deu-Pons J, Duren RP, Gao J, McMurry J, Patterson S, Del Vecchio Fitz C, Pitel BA, Sezerman OU, Ellrott K, Warner JL, Rieke DT, Aittokallio T, Cerami E, Ritter DI, Schriml LM, Freimuth RR, Haendel M, Raca G, Madhavan S, Baudis M, Beckmann JS, Dienstmann R, Chakravarty D, Li XS, Mockus S, Elemento O, Schultz N, Lopez-Bigas N, Lawler M, Goecks J, Griffith M, Griffith OL, Margolin AA. A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer. Nat Genet 2020; 52:448-457. [PMID: 32246132 PMCID: PMC7127986 DOI: 10.1038/s41588-020-0603-8] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 04/25/2019] [Accepted: 02/26/2020] [Indexed: 12/19/2022]
Abstract
Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. We demonstrated large gains in overlap between resources across variants, diseases and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 57% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide a freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.
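The gain from aggregation reported in this abstract can be made concrete with a small, simplified sketch. The knowledgebase names, variant identifiers and the `match_rate` helper are all hypothetical, and the metric (fraction of a patient's variants found in a resource) is a simplification of the cohort matching the authors performed.

```python
from typing import Dict, Set

# Hypothetical knowledgebases, each listing the variants it interprets.
KNOWLEDGEBASES: Dict[str, Set[str]] = {
    "kb1": {"v1", "v2"},
    "kb2": {"v2", "v3"},
    "kb3": {"v4"},
}


def match_rate(patient_variants: Set[str], known: Set[str]) -> float:
    """Fraction of the patient's variants that appear in `known`."""
    if not patient_variants:
        return 0.0
    return len(patient_variants & known) / len(patient_variants)


patient = {"v1", "v2", "v3", "v5"}

# Match rate against each resource taken on its own.
per_kb = {name: match_rate(patient, known) for name, known in KNOWLEDGEBASES.items()}
average_single = sum(per_kb.values()) / len(per_kb)

# Match rate against the union of all resources (the aggregate view).
aggregate = match_rate(patient, set().union(*KNOWLEDGEBASES.values()))
```

In this toy data the aggregate rate exceeds the per-resource average because the resources cover partly different variants, which is the effect harmonization is meant to expose.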
Affiliation(s)
- Alex H Wagner
- Washington University School of Medicine, St. Louis, MO, USA
| | - Brian Walsh
- Oregon Health and Science University, Portland, OR, USA
| | | | - David Tamborero
- Pompeu Fabra University, Barcelona, Spain
- Karolinska Institute, Solna, Sweden
| | | | | | - Jordi Deu-Pons
- Institute for Research in Biomedicine, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
| | | | - Jianjiong Gao
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Julie McMurry
- Oregon Health and Science University, Portland, OR, USA
| | - Sara Patterson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | | | - Kyle Ellrott
- Oregon Health and Science University, Portland, OR, USA
| | | | | | - Tero Aittokallio
- Institute for Molecular Medicine Finland, Helsinki, Finland
- University of Turku, Turku, Finland
| | | | - Deborah I Ritter
- Baylor College of Medicine, Houston, TX, USA
- Texas Children's Hospital, Houston, TX, USA
| | - Lynn M Schriml
- University of Maryland School of Medicine, Baltimore, MD, USA
| | | | - Melissa Haendel
- Oregon Health and Science University, Portland, OR, USA
- Linus Pauling Institute at Oregon State University, Corvallis, OR, USA
| | - Gordana Raca
- Children's Hospital Los Angeles, Los Angeles, CA, USA
- Keck School of Medicine of USC, Los Angeles, CA, USA
| | - Subha Madhavan
- Georgetown University Medical Center, Washington, DC, USA
| | | | | | | | | | | | - Susan Mockus
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Nuria Lopez-Bigas
- Pompeu Fabra University, Barcelona, Spain
- Institute for Research in Biomedicine, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
| | | | - Jeremy Goecks
- Oregon Health and Science University, Portland, OR, USA
| | | | - Obi L Griffith
- Washington University School of Medicine, St. Louis, MO, USA
125
Abstract
PURPOSE OF REVIEW: Although primarily designed for medical documentation and billing purposes, the electronic health record (EHR) has significant potential for translational research. In this article, we provide an overview of the use of the EHR for genomics research with a focus on heritable lipid disorders. RECENT FINDINGS: Linking the EHR to genomic data enables repurposing of vast phenotype data for genomic discovery. EHR data can be used to study the genetic basis of common and rare disorders, identify subphenotypes of diseases, assess pathogenicity of novel genomic variants, investigate pleiotropy, and rapidly assemble cohorts for genomic medicine clinical trials. EHR-based discovery can inform clinical practice; examples include use of polygenic risk scores for assessing disease risk and use of phenotype data to interpret rare variants. Despite limitations such as missing data, variable use of standards and poor interoperability between disparate systems, the EHR is a powerful resource for genomic research. SUMMARY: When linked to genomic data, the EHR can be leveraged for genomic discovery, which in turn can inform clinical care, exemplifying the virtuous cycle of a learning healthcare system.
Affiliation(s)
- Maya S Safarova
- Atherosclerosis and Lipid Genomics Laboratory and Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
126
Sima AC, Mendes de Farias T, Zbinden E, Anisimova M, Gil M, Stockinger H, Stockinger K, Robinson-Rechavi M, Dessimoz C. Enabling semantic queries across federated bioinformatics databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5614223. [PMID: 31697362 PMCID: PMC6836710 DOI: 10.1093/database/baz106] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/28/2019] [Revised: 08/01/2019] [Accepted: 08/02/2019] [Indexed: 11/23/2022]
Abstract
Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology DS; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
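The idea of joining sources through shared identifiers (the "virtual links" mentioned above) can be sketched with in-memory tables. The records, field names and identifiers below are invented for illustration, and the join is plain Python, not the SPARQL federation over Bgee, OMA and UniProtKB that the abstract describes.

```python
from typing import Dict, List

# Hypothetical stand-ins for three independent sources.
EXPRESSION: List[Dict[str, str]] = [          # expression-style records
    {"gene": "geneA", "tissue": "brain"},
    {"gene": "geneB", "tissue": "liver"},
]
ORTHOLOGS: Dict[str, str] = {                  # orthology-style mapping
    "geneA": "geneA_mouse",
    "geneB": "geneB_mouse",
}
PROTEINS: Dict[str, str] = {                   # protein-annotation-style mapping
    "geneA_mouse": "protein kinase",
    "geneB_mouse": "transporter",
}


def federated_query(tissue: str) -> List[Dict[str, str]]:
    """Follow the shared identifiers from expression to orthology to function."""
    results: List[Dict[str, str]] = []
    for row in EXPRESSION:
        if row["tissue"] != tissue:
            continue
        ortholog = ORTHOLOGS.get(row["gene"])       # link 1: gene identifier
        if ortholog is None:
            continue
        function = PROTEINS.get(ortholog)           # link 2: ortholog identifier
        if function is None:
            continue
        results.append({"gene": row["gene"], "ortholog": ortholog, "function": function})
    return results


brain_hits = federated_query("brain")
```

Each lookup only works because two sources agree on an identifier; finding and formally describing those agreement points is the step the abstract calls identifying intersection points.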
Affiliation(s)
- Ana Claudia Sima
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tarcisio Mendes de Farias
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.,Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
| | - Erich Zbinden
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Anisimova
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Manuel Gil
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Heinz Stockinger
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Kurt Stockinger
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland
| | - Marc Robinson-Rechavi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.,Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.,Department of Genetics, Evolution, and Environment, University College London, Gower St, London WC1E 6BT, UK.,Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
|
127
|
Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith OL, Hanspers K, Hermjakob H, Hudson TS, Hybiske K, Keating SM, Manske M, Mayers M, Mietchen D, Mitraka E, Pico AR, Putman T, Riutta A, Queralt-Rosinach N, Schriml LM, Shafee T, Slenter D, Stephan R, Thornton K, Tsueng G, Tu R, Ul-Hasan S, Willighagen E, Wu C, Su AI. Wikidata as a knowledge graph for the life sciences. eLife 2020; 9:e52614. [PMID: 32180547 PMCID: PMC7077981 DOI: 10.7554/elife.52614] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 10/09/2019] [Accepted: 02/28/2020] [Indexed: 12/22/2022] Open
Abstract
Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.
Affiliation(s)
| | - Gregory Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Sebastian Burgstaller-Muehlbacher
- Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Benjamin M Good
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Malachi Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
| | - Obi L Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
| | - Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
| | | | - Toby S Hudson
- School of Chemistry, The University of Sydney, Sydney, Australia
| | - Kevin Hybiske
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, United States
| | - Sarah M Keating
- European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
| | - Magnus Manske
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Michael Mayers
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Daniel Mietchen
- School of Data Science, University of Virginia, Charlottesville, United States
| | - Elvira Mitraka
- University of Maryland School of Medicine, Baltimore, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
| | - Timothy Putman
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Anders Riutta
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
| | - Nuria Queralt-Rosinach
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Lynn M Schriml
- University of Maryland School of Medicine, Baltimore, United States
| | - Thomas Shafee
- Department of Animal Plant and Soil Sciences, La Trobe University, Melbourne, Australia
| | - Denise Slenter
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
| | | | | | - Ginger Tsueng
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Roger Tu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Sabah Ul-Hasan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
| | - Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
| | - Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
|
128
|
Gallagher RV, Falster DS, Maitner BS, Salguero-Gómez R, Vandvik V, Pearse WD, Schneider FD, Kattge J, Poelen JH, Madin JS, Ankenbrand MJ, Penone C, Feng X, Adams VM, Alroy J, Andrew SC, Balk MA, Bland LM, Boyle BL, Bravo-Avila CH, Brennan I, Carthey AJR, Catullo R, Cavazos BR, Conde DA, Chown SL, Fadrique B, Gibb H, Halbritter AH, Hammock J, Hogan JA, Holewa H, Hope M, Iversen CM, Jochum M, Kearney M, Keller A, Mabee P, Manning P, McCormack L, Michaletz ST, Park DS, Perez TM, Pineda-Munoz S, Ray CA, Rossetto M, Sauquet H, Sparrow B, Spasojevic MJ, Telford RJ, Tobias JA, Violle C, Walls R, Weiss KCB, Westoby M, Wright IJ, Enquist BJ. Open Science principles for accelerating trait-based science across the Tree of Life. Nat Ecol Evol 2020; 4:294-303. [PMID: 32066887 DOI: 10.1038/s41559-020-1109-6] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 03/31/2019] [Accepted: 01/10/2020] [Indexed: 01/22/2023]
Abstract
Synthesizing trait observations and knowledge across the Tree of Life remains a grand challenge for biodiversity science. Species traits are widely used in ecological and evolutionary science, and new data and methods have proliferated rapidly. Yet accessing and integrating disparate data sources remains a considerable challenge, slowing progress toward a global synthesis to integrate trait data across organisms. Trait science needs a vision for achieving global integration across all organisms. Here, we outline how the adoption of key Open Science principles (open data, open source and open methods) is transforming trait science, increasing transparency, democratizing access and accelerating global synthesis. To enhance widespread adoption of these principles, we introduce the Open Traits Network (OTN), a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across organisms. We demonstrate how adherence to Open Science principles is key to the OTN community and outline five activities that can accelerate the synthesis of trait data across the Tree of Life, thereby facilitating rapid advances to address scientific inquiries and environmental issues. Lessons learned along the path to a global synthesis of trait data will provide a framework for addressing similarly complex data science and informatics challenges.
Affiliation(s)
- Rachael V Gallagher
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia.
| | - Daniel S Falster
- Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
| | - Brian S Maitner
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Roberto Salguero-Gómez
- Department of Zoology, Oxford University, Oxford, UK.,Centre for Biodiversity and Conservation Science, University of Queensland, Brisbane, Queensland, Australia.,Evolutionary Demography Laboratory, Max Planck Institute for Demographic Research, Rostock, Germany
| | - Vigdis Vandvik
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - William D Pearse
- Ecology Center and Department of Biology, Utah State University, Logan, UT, USA
| | | | - Jens Kattge
- Max Planck Institute for Biogeochemistry, Jena, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| | | | - Joshua S Madin
- Hawai'i Institute of Marine Biology, University of Hawai'i at Manoa, Manoa, HI, USA
| | - Markus J Ankenbrand
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Comprehensive Heart Failure Center, University Hospital Wuerzburg, Wuerzburg, Germany
| | - Caterina Penone
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
| | - Xiao Feng
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Vanessa M Adams
- Discipline of Geography and Spatial Sciences, University of Tasmania, Hobart, Tasmania, Australia
| | - John Alroy
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Samuel C Andrew
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Meghan A Balk
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
| | - Lucie M Bland
- School of Life and Environmental Sciences, Centre for Integrative Ecology, Deakin University, Geelong, Victoria, Australia
| | - Brad L Boyle
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Catherine H Bravo-Avila
- Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
| | - Ian Brennan
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Alexandra J R Carthey
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Renee Catullo
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Brittany R Cavazos
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Dalia A Conde
- Species360 Conservation Science Alliance, Bloomington, MN, USA.,Interdisciplinary Center on Population Dynamics, University of Southern Denmark, Odense, Denmark.,Department of Biology, University of Southern Denmark, Odense, Denmark
| | - Steven L Chown
- School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
| | - Belen Fadrique
- Department of Biology, University of Miami, Miami, FL, USA
| | - Heloise Gibb
- Department of Ecology, Environment and Evolution and Centre for Future Landscapes, La Trobe University, Melbourne, Victoria, Australia
| | - Aud H Halbritter
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - Jennifer Hammock
- National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - J Aaron Hogan
- International Center for Tropical Botany, Department of Biological Sciences, Florida International University, Miami, FL, USA
| | - Hamish Holewa
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Michael Hope
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Colleen M Iversen
- Climate Change Science Institute and Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Malte Jochum
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.,Institute of Plant Sciences, University of Bern, Bern, Switzerland.,Institute of Biology, Leipzig University, Leipzig, Germany
| | - Michael Kearney
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Alexander Keller
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
| | - Peter Manning
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
| | - Luke McCormack
- Center for Tree Science, The Morton Arboretum, Lisle, IL, USA
| | - Sean T Michaletz
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
| | - Daniel S Park
- Department of Organismic and Evolutionary Biology and Harvard University Herbaria, Harvard University, Cambridge, MA, USA
| | - Timothy M Perez
- Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
| | - Silvia Pineda-Munoz
- School of Biological Sciences and School of Earth & Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Courtenay A Ray
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Maurizio Rossetto
- National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Queensland Alliance of Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
| | - Hervé Sauquet
- Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia.,National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Ecologie Systématique Evolution, Univ. Paris-Sud, CNRS, AgroParisTech, Universite Paris-Saclay, Orsay, France
| | - Benjamin Sparrow
- TERN / School of Biological Sciences, Faculty of Science, The University of Adelaide, Adelaide, South Australia, Australia
| | - Marko J Spasojevic
- Department of Evolution, Ecology, and Organismal Biology, University of California Riverside, Riverside, CA, USA
| | - Richard J Telford
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - Joseph A Tobias
- Department of Life Sciences, Imperial College London, London, UK
| | - Cyrille Violle
- CEFE, CNRS, Univ Montpellier, Université Paul Valéry Montpellier, Montpellier, France
| | | | | | - Mark Westoby
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Ian J Wright
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Brian J Enquist
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA.,Santa Fe Institute, Santa Fe, NM, USA
|
129
|
Martin A, Wolcott NS, O'Connell LA. Bringing immersive science to undergraduate laboratory courses using CRISPR gene knockouts in frogs and butterflies. J Exp Biol 2020; 223(Suppl_1):jeb208793. [PMID: 32034043 DOI: 10.1242/jeb.208793] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022]
Abstract
The use of CRISPR/Cas9 for gene editing offers new opportunities for biology students to perform genuine research exploring the gene-to-phenotype relationship. It is important to introduce the next generation of scientists, health practitioners and other members of society to the technical and ethical aspects of gene editing. Here, we share our experience leading hands-on undergraduate laboratory classes, where students formulate hypotheses regarding the roles of candidate genes involved in development, perform loss-of-function experiments using programmable nucleases and analyze the phenotypic effects of mosaic mutant animals. This is enabled by the use of the amphibian Xenopus laevis and the butterfly Vanessa cardui, two organisms that reliably yield hundreds of large and freshly fertilized eggs in a scalable manner. Frogs and butterflies also present opportunities to teach key biological concepts about gene regulation and development. To complement these practical aspects, we describe learning activities aimed at equipping students with a broad understanding of genome editing techniques, their application in fundamental and translational research, and the bioethical challenges they raise. Overall, our work supports the introduction of CRISPR technology into undergraduate classrooms and, when coupled with classroom undergraduate research experiences, enables hypothesis-driven research by undergraduates.
Affiliation(s)
- Arnaud Martin
- Department of Biological Sciences, The George Washington University, Washington, DC 20052, USA
| | - Nora S Wolcott
- Department of Biological Sciences, The George Washington University, Washington, DC 20052, USA
|
130
|
Lloyd KCK, Adams DJ, Baynam G, Beaudet AL, Bosch F, Boycott KM, Braun RE, Caulfield M, Cohn R, Dickinson ME, Dobbie MS, Flenniken AM, Flicek P, Galande S, Gao X, Grobler A, Heaney JD, Herault Y, de Angelis MH, Lupski JR, Lyonnet S, Mallon AM, Mammano F, MacRae CA, McInnes R, McKerlie C, Meehan TF, Murray SA, Nutter LMJ, Obata Y, Parkinson H, Pepper MS, Sedlacek R, Seong JK, Shiroishi T, Smedley D, Tocchini-Valentini G, Valle D, Wang CKL, Wells S, White J, Wurst W, Xu Y, Brown SDM. The Deep Genome Project. Genome Biol 2020; 21:18. [PMID: 32008577 PMCID: PMC6996159 DOI: 10.1186/s13059-020-1931-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/08/2020] [Accepted: 01/08/2020] [Indexed: 12/12/2022] Open
Affiliation(s)
- K. C. Kent Lloyd
- Department of Surgery, School of Medicine, and Mouse Biology Program, University of California, Davis, CA 95618 USA
| | - David J. Adams
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA UK
| | - Gareth Baynam
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, Perth, Australia
- Division of Paediatrics and Telethon Kids Institute, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Australia
- Faculty of Science and Engineering, School of Spatial Sciences, Curtin University, Perth, Australia
| | - Arthur L. Beaudet
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Fatima Bosch
- Center of Animal Biotechnology and Gene Therapy, Universitat Autònoma Barcelona, Barcelona, Spain
| | - Kym M. Boycott
- Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON K1H 8L1 Canada
| | | | - Mark Caulfield
- Genomics England, William Harvey Research Institute, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
| | - Ronald Cohn
- The Hospital for Sick Children, Toronto, ON M5G 1X8 Canada
| | - Mary E. Dickinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Departments of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Michael S. Dobbie
- Phenomics Australia, The Australian National University, 131 Garran Road, Acton, ACT 2601 Australia
| | - Ann M. Flenniken
- The Centre for Phenogenomics, Lunenfeld-Tanenbaum Research Institute, Toronto, ON M5T 3H7 Canada
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Sanjeev Galande
- National Facility for Gene Function in Health and Disease, Department of Biology, Indian Institute of Science, Education and Research (IISER) Pune, Pune, Maharashtra 411008 India
| | - Xiang Gao
- SKL of Pharmaceutical Biotechnology and Model Animal Research Center, Collaborative Innovation Center for Genetics and Development, Nanjing Biomedical Research Institute, Nanjing University, Nanjing, 210061 China
| | - Anne Grobler
- DST/NWU Preclinical Drug Development Platform, North-West University, Potchefstroom, 2520 South Africa
| | - Jason D. Heaney
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Yann Herault
- Université de Strasbourg, CNRS, INSERM, Institut de Génétique, Biologie Moléculaire et Cellulaire, Institut Clinique de la Souris, IGBMC, PHENOMIN-ICS, 67404 Illkirch, France
| | - Martin Hrabě de Angelis
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Chair of Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, 85354 Freising-Weihenstephan, Germany
- German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany
| | - James R. Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Stanislas Lyonnet
- Institut Imagine, UMR-1163 INSERM et Université de Paris, Hôpital Universitaire Necker-Enfants Malades, 24, Boulevard du Montparnasse, 75015 Paris, France
| | - Ann-Marie Mallon
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
| | - Fabio Mammano
- Monterotondo Mouse Clinic, Italian National Research Council (CNR), Institute of Biochemistry and Cell Biology (IBBC), Monterotondo Scalo, I-00015 Rome, Italy
| | - Calum A. MacRae
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
| | - Roderick McInnes
- Lady Davis Research Institute, Jewish General Hospital, McGill University, 3999 Côte Ste-Catherine Road, Montreal, Quebec H3T 1E2 Canada
| | - Colin McKerlie
- The Hospital for Sick Children, Toronto, ON M5G 1X8 Canada
- The Centre for Phenogenomics, The Hospital for Sick Children, Toronto, ON M5T 3H7 Canada
| | - Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | | | - Lauryl M. J. Nutter
- The Centre for Phenogenomics, The Hospital for Sick Children, Toronto, ON M5T 3H7 Canada
| | - Yuichi Obata
- RIKEN BioResource Research Center, Tsukuba, Ibaraki, 305-0074 Japan
| | - Helen Parkinson
- National Facility for Gene Function in Health and Disease, Department of Biology, Indian Institute of Science, Education and Research (IISER) Pune, Pune, Maharashtra 411008 India
| | - Michael S. Pepper
- Institute for Cellular and Molecular Medicine, Department Immunology, and SAMRC Extramural Unit for Stem Cell Research and Therapy, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa
| | - Radislav Sedlacek
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, 252 50 Vestec, Czech Republic
| | - Je Kyung Seong
- Korea Mouse Phenotyping Consortium (KMPC) and BK21 Program for Veterinary Science, Research Institute for Veterinary Science, College of Veterinary Medicine, Seoul National University, 599 Gwanangno, Gwanak-gu, Seoul, 08826 South Korea
| | | | - Damian Smedley
- Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ UK
| | - Glauco Tocchini-Valentini
- Monterotondo Mouse Clinic, Italian National Research Council (CNR), Institute of Biochemistry and Cell Biology (IBBC), Monterotondo Scalo, I-00015 Rome, Italy
| | - David Valle
- McKusick-Nathans Department of Genetic Medicine, The Johns Hopkins University School of Medicine, 519 BRB, 733 N Broadway, Baltimore, MD 21205 USA
| | - Chi-Kuang Leo Wang
- National Laboratory Animal Center, National Applied Research Laboratories, Taipei, Taiwan
| | - Sara Wells
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
| | | | - Wolfgang Wurst
- Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, 85764 Neuherberg, Germany
- Chair of Developmental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, 85354 Freising-Weihenstephan, Germany
- Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Munich Cluster for Systems Neurology (SyNergy), Adolf-Butenandt-Institut, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
| | - Ying Xu
- Cambridge-Suda Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical College of Soochow University, Suzhou, 215123 Jiangsu China
| | - Steve D. M. Brown
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
|
131
|
Struck A, Walsh B, Buchanan A, Lee JA, Spangler R, Stuart JM, Ellrott K. Exploring Integrative Analysis Using the BioMedical Evidence Graph. JCO Clin Cancer Inform 2020; 4:147-159. [PMID: 32097025 PMCID: PMC7049249 DOI: 10.1200/cci.19.00110] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 01/16/2020] [Indexed: 12/22/2022] Open
Abstract
PURPOSE The analysis of cancer biology data involves extremely heterogeneous data sets, including information from RNA sequencing, genome-wide copy number, DNA methylation data reporting on epigenetic regulation, somatic mutations from whole-exome or whole-genome analyses, pathology estimates from imaging sections or subtyping, drug response or other treatment outcomes, and various other clinical and phenotypic measurements. Bringing these different resources into a common framework, with a data model that allows for complex relationships as well as dense vectors of features, will unlock integrated data set analysis. METHODS We introduce the BioMedical Evidence Graph (BMEG), a graph database and query engine for discovery and analysis of cancer biology. The BMEG is unique from other biologic data graphs in that sample-level molecular and clinical information is connected to reference knowledge bases. It combines gene expression and mutation data with drug-response experiments, pathway information databases, and literature-derived associations. RESULTS The construction of the BMEG has resulted in a graph containing > 41 million vertices and 57 million edges. The BMEG system provides a graph query-based application programming interface to enable analysis, with client code available for Python, Javascript, and R, and a server online at bmeg.io. Using this system, we have demonstrated several forms of cross-data set analysis to show the utility of the system. CONCLUSION The BMEG is an evolving resource dedicated to enabling integrative analysis. We have demonstrated queries on the system that illustrate mutation significance analysis, drug-response machine learning, patient-level knowledge-base queries, and pathway level analysis. We have compared the resulting graph to other available integrated graph systems and demonstrated the former is unique in the scale of the graph and the type of data it makes available.
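The idea of connecting sample-level data to reference knowledge through a graph can be sketched with a tiny in-memory property graph in Python. This is only an assumed toy model: the class `EvidenceGraph`, the vertex identifiers and the edge labels are invented for illustration and do not reflect the BMEG schema or its client API.

```python
from collections import defaultdict


class EvidenceGraph:
    """Toy property graph: labelled vertices plus labelled, directed edges."""

    def __init__(self):
        self.vertices = {}              # vertex id -> (label, properties)
        self.edges = defaultdict(list)  # source id -> [(edge label, target id)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def out(self, vids, label):
        """Follow outgoing edges with the given label from every vertex in vids."""
        return [dst for v in vids for (lbl, dst) in self.edges[v] if lbl == label]


# Hypothetical data: one sample, one gene, one compound.
g = EvidenceGraph()
g.add_vertex("sample:S1", "Sample", project="demo")
g.add_vertex("gene:G1", "Gene", symbol="GENE1")
g.add_vertex("drug:D1", "Compound", name="compound-1")
g.add_edge("sample:S1", "has_mutation_in", "gene:G1")
g.add_edge("gene:G1", "targeted_by", "drug:D1")

# Two-hop traversal: sample -> mutated genes -> compounds targeting those genes.
genes = g.out(["sample:S1"], "has_mutation_in")
drugs = g.out(genes, "targeted_by")
print(genes, drugs)
```

Chaining `out` calls mirrors, in miniature, how a graph query walks from patient-level observations to knowledge-base entries in a single expression.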
Affiliation(s)
- Adam Struck
- Biomedical Engineering, Oregon Health and Science University, Portland OR
| | - Brian Walsh
- Biomedical Engineering, Oregon Health and Science University, Portland OR
| | - Alexander Buchanan
- Biomedical Engineering, Oregon Health and Science University, Portland OR
| | - Jordan A. Lee
- Biomedical Engineering, Oregon Health and Science University, Portland OR
| | - Ryan Spangler
- Biomedical Engineering, Oregon Health and Science University, Portland OR
| | - Joshua M. Stuart
- Biomolecular Engineering Department, University of California, Santa Cruz, Santa Cruz, CA
- University of California Santa Cruz Genomics Institute, University of California, Santa Cruz Santa Cruz, CA
| | - Kyle Ellrott
- Biomedical Engineering, Oregon Health and Science University, Portland OR
|
132
|
Beck T, Shorter T, Brookes AJ. GWAS Central: a comprehensive resource for the discovery and comparison of genotype and phenotype data from genome-wide association studies. Nucleic Acids Res 2020; 48:D933-D940. [PMID: 31612961 PMCID: PMC7145571 DOI: 10.1093/nar/gkz895] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/22/2019] [Revised: 09/30/2019] [Accepted: 10/02/2019] [Indexed: 12/31/2022] Open
Abstract
The GWAS Central resource provides a toolkit for integrative access and visualization of a uniquely extensive collection of genome-wide association study data, while ensuring safe open access to prevent research participant identification. GWAS Central is the world's most comprehensive openly accessible repository of summary-level GWAS association information, providing over 70 million P-values for over 3800 studies investigating over 1400 unique phenotypes. The database content comprises direct submissions received from GWAS authors and consortia, in addition to actively gathered data sets from various public sources. GWAS data are discoverable from the perspective of genetic markers, genes, genome regions or phenotypes, via graphical visualizations and detailed downloadable data reports. Tested genetic markers and relevant genomic features can be visually interrogated across up to sixteen association data sets in a single view using the integrated genome browser. The semantic standardization of phenotype descriptions with Medical Subject Headings and the Human Phenotype Ontology allows the precise identification of genetic variants associated with diseases, phenotypes and traits of interest. Harmonization of the phenotype descriptions used across several GWAS-related resources has extended the phenotype search capabilities to enable cross-database study discovery using a range of ontologies. GWAS Central is updated regularly and available at https://www.gwascentral.org.
Affiliation(s)
- Tim Beck
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Health Data Research UK, University of Leicester, Leicester LE1 7RH, UK
| | - Tom Shorter
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Health Data Research UK, University of Leicester, Leicester LE1 7RH, UK
| | - Anthony J Brookes
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Health Data Research UK, University of Leicester, Leicester LE1 7RH, UK
| |
|
133
|
Laskowski RA, Stephenson JD, Sillitoe I, Orengo CA, Thornton JM. VarSite: Disease variants and protein structure. Protein Sci 2020; 29:111-119. [PMID: 31606900 PMCID: PMC6933866 DOI: 10.1002/pro.3746] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 07/01/2019] [Revised: 10/04/2019] [Accepted: 10/07/2019] [Indexed: 12/20/2022]
Abstract
VarSite is a web server mapping known disease-associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image-based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease-associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite.
Affiliation(s)
- Roman A. Laskowski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - James D. Stephenson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Wellcome Trust Sanger Institute, Cambridge, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Janet M. Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| |
|
134
|
The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases. Genetics 2019; 213:1189-1196. [PMID: 31796553 PMCID: PMC6893393 DOI: 10.1534/genetics.119.302523] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 07/15/2019] [Accepted: 10/11/2019] [Indexed: 12/17/2022] Open
Abstract
Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure.
|
135
|
Kanavy DM, McNulty SM, Jairath MK, Brnich SE, Bizon C, Powell BC, Berg JS. Comparative analysis of functional assay evidence use by ClinGen Variant Curation Expert Panels. Genome Med 2019; 11:77. [PMID: 31783775 PMCID: PMC6884856 DOI: 10.1186/s13073-019-0683-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/14/2019] [Accepted: 11/05/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND The 2015 American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) guidelines for clinical sequence variant interpretation state that "well-established" functional studies can be used as evidence in variant classification. These guidelines articulated key attributes of functional data, including that assays should reflect the biological environment and be analytically sound; however, details of how to evaluate these attributes were left to expert judgment. The Clinical Genome Resource (ClinGen) designates Variant Curation Expert Panels (VCEPs) in specific disease areas to make gene-centric specifications to the ACMG/AMP guidelines, including more specific definitions of appropriate functional assays. We set out to evaluate the existing VCEP guidelines for functional assays. METHODS We evaluated the functional criteria (PS3/BS3) of six VCEPs (CDH1, Hearing Loss, Inherited Cardiomyopathy-MYH7, PAH, PTEN, RASopathy). We then established criteria for evaluating functional studies based on disease mechanism, general class of assay, and the characteristics of specific assay instances described in the primary literature. Using these criteria, we extensively curated assay instances cited by each VCEP in their pilot variant classification to analyze VCEP recommendations and their use in the interpretation of functional studies. RESULTS Unsurprisingly, our analysis highlighted the breadth of VCEP-approved assays, reflecting the diversity of disease mechanisms among VCEPs. We also noted substantial variability between VCEPs in the method used to select these assays and in the approach used to specify strength modifications, as well as differences in suggested validation parameters. Importantly, we observed discrepancies between the parameters VCEPs specified as required for approved assay instances and the fulfillment of these requirements in the individual assays cited in pilot variant interpretation. 
CONCLUSIONS Interpretation of the intricacies of functional assays often requires expert-level knowledge of the gene and disease, and current VCEP recommendations for functional assay evidence are a useful tool to improve the accessibility of functional data by providing a starting point for curators to identify approved functional assays and key metrics. However, our analysis suggests that further guidance is needed to standardize this process and ensure consistency in the application of functional evidence.
Affiliation(s)
- Dona M Kanavy
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Shannon M McNulty
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Meera K Jairath
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Sarah E Brnich
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Chris Bizon
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Bradford C Powell
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jonathan S Berg
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| |
|
136
|
Moridi M, Ghadirinia M, Sharifi-Zarchi A, Zare-Mirakabad F. The assessment of efficient representation of drug features using deep learning for drug repositioning. BMC Bioinformatics 2019; 20:577. [PMID: 31726977 PMCID: PMC6854697 DOI: 10.1186/s12859-019-3165-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/15/2019] [Accepted: 10/21/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND De novo drug discovery is a time-consuming and expensive process. Drug repositioning is now a common strategy for discovering new indications for existing drugs. This strategy is mostly used in cases with a limited number of candidate drug-disease pairs; in other words, existing approaches do not scale to a large number of drugs and diseases. Most in-silico methods focus on linear approaches, while non-linear models for predicting new indications are still scarce. Therefore, applying non-linear computational approaches can offer an opportunity to predict possible drug repositioning candidates. RESULTS In this study, we present a non-linear method for drug repositioning. We extract four drug features and two disease features to find the semantic relations between drugs and diseases. We utilize deep learning to extract an efficient representation for each feature. These representations reduce the dimension and heterogeneity of biological data. Then, we assess the performance of different combinations of drug features to introduce a pipeline for drug repositioning. In the available database, there are different numbers of known drug-disease associations corresponding to each combination of drug features. Our assessment shows that as the number of drug features increases, the number of available drugs decreases. Nevertheless, the proposed method is as accurate with a large number of drug features as with a small number. CONCLUSION Our pipeline predicts new indications for existing drugs systematically, in a more cost-effective way and on a shorter timeline. We assess the pipeline's ability to discover potential drug-disease associations using cross-validation experiments and some clinical trial studies.
Affiliation(s)
- Mahroo Moridi
- Department of Mathematics and Computer Science, Amirkabir University of Technology, (Tehran Polytechnic), Tehran, Iran
| | - Marzieh Ghadirinia
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Ali Sharifi-Zarchi
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Fatemeh Zare-Mirakabad
- Department of Mathematics and Computer Science, Amirkabir University of Technology, (Tehran Polytechnic), Tehran, Iran
| |
|
137
|
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritization of genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. From about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva
- Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
| | - Maren Kleine
- Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
| |
|
138
|
Wang J, Mao D, Fazal F, Kim SY, Yamamoto S, Bellen H, Liu Z. Using MARRVEL v1.2 for Bioinformatics Analysis of Human Genes and Variant Pathogenicity. CURRENT PROTOCOLS IN BIOINFORMATICS 2019; 67:e85. [PMID: 31524990 PMCID: PMC6750039 DOI: 10.1002/cpbi.85] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]
Abstract
One of the greatest challenges in the bioinformatic analysis of human sequencing data is identifying which variants are pathogenic. Numerous databases and tools have been generated to address this difficulty. However, these many useful data and tools are broadly dispersed, requiring users to search for their variants of interest through human genetic databases, variant function prediction tools, and model organism databases. To solve this problem, we collected data and observed workflows of human geneticists, clinicians, and model organism researchers to carefully select and display valuable information that facilitates the evaluation of whether a variant is likely to be pathogenic. This program, Model organism Aggregated Resources for Rare Variant ExpLoration (MARRVEL) v1.2, allows users to collect relevant data from 27 public sources for further efficient bioinformatic analysis of the pathogenicity of human variants. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Julia Wang
- Program in Developmental Biology, Medical Scientist Training Program, Baylor College of Medicine, Houston, Texas
| | - Dongxue Mao
- Department of Pediatrics-Neurology, Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
| | - Fatima Fazal
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
| | - Seon-Young Kim
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Howard Hughes Medical Institute, Houston, Texas
| | - Shinya Yamamoto
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, Texas
- Department of Neuroscience, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, Texas
- Program in Developmental Biology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, Texas
| | - Hugo Bellen
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Baylor College of Medicine, Houston, Texas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Howard Hughes Medical Institute, Houston, Texas
| | - Zhandong Liu
- Department of Pediatrics, Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
- Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, Texas
| |
|
139
|
Harnish JM, Deal SL, Chao HT, Wangler MF, Yamamoto S. In Vivo Functional Study of Disease-associated Rare Human Variants Using Drosophila. J Vis Exp 2019:10.3791/59658. [PMID: 31498321 PMCID: PMC7418855 DOI: 10.3791/59658] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/13/2022] Open
Abstract
Advances in sequencing technology have made whole-genome and whole-exome datasets more accessible for both clinical diagnosis and cutting-edge human genetics research. Although a number of in silico algorithms have been developed to predict the pathogenicity of variants identified in these datasets, functional studies are critical to determining how specific genomic variants affect protein function, especially for missense variants. In the Undiagnosed Diseases Network (UDN) and other rare disease research consortia, model organisms (MO) including Drosophila, C. elegans, zebrafish, and mice are actively used to assess the function of putative human disease-causing variants. This protocol describes a method for the functional assessment of rare human variants used in the Model Organisms Screening Center Drosophila Core of the UDN. The workflow begins with gathering human and MO information from multiple public databases, using the MARRVEL web resource to assess whether the variant is likely to contribute to a patient's condition as well as design effective experiments based on available knowledge and resources. Next, genetic tools (e.g., T2A-GAL4 and UAS-human cDNA lines) are generated to assess the functions of variants of interest in Drosophila. Upon development of these reagents, two-pronged functional assays based on rescue and overexpression experiments can be performed to assess variant function. In the rescue branch, the endogenous fly genes are "humanized" by replacing the orthologous Drosophila gene with reference or variant human transgenes. In the overexpression branch, the reference and variant human proteins are exogenously driven in a variety of tissues. In both cases, any scorable phenotype (e.g., lethality, eye morphology, electrophysiology) can be used as a read-out, irrespective of the disease of interest. Differences observed between reference and variant alleles suggest a variant-specific effect, and thus likely pathogenicity. 
This protocol allows rapid, in vivo assessments of putative human disease-causing variants of genes with known and unknown functions.
Affiliation(s)
- J Michael Harnish
- Department of Molecular and Human Genetics, Baylor College of Medicine
| | - Samantha L Deal
- Program in Developmental Biology, Baylor College of Medicine
| | - Hsiao-Tuan Chao
- Department of Molecular and Human Genetics, Baylor College of Medicine; Department of Pediatrics, Section of Neurology and Developmental Neuroscience, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital; Department of Neuroscience, Baylor College of Medicine
| | - Michael F Wangler
- Department of Molecular and Human Genetics, Baylor College of Medicine; Program in Developmental Biology, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital
| | - Shinya Yamamoto
- Department of Molecular and Human Genetics, Baylor College of Medicine; Program in Developmental Biology, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital; Department of Neuroscience, Baylor College of Medicine
| |
|
140
|
MacRae CA. Closing the 'phenotype gap' in precision medicine: improving what we measure to understand complex disease mechanisms. Mamm Genome 2019; 30:201-211. [PMID: 31428846 DOI: 10.1007/s00335-019-09810-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/03/2019] [Accepted: 06/30/2019] [Indexed: 10/26/2022]
Abstract
The central concept underlying precision medicine is a mechanistic understanding of each disease and its response to therapy sufficient to direct a specific intervention. To execute on this vision requires parsing incompletely defined disease syndromes into discrete mechanistic subsets and developing interventions to precisely address each of these etiologically distinct entities. This will require substantial adjustment of traditional paradigms which have tended to aggregate high-level phenotypes with very different etiologies. In the current environment, where diagnoses are not mechanistic, drug development has become so expensive that it is now impractical to imagine the cost-effective creation of new interventions for many prevalent chronic conditions. The vision of precision medicine also argues for a much more seamless integration of research and development with clinical care, where shared taxonomies will enable every clinical interaction to inform our collective understanding of disease mechanisms and drug responses. Ideally, this would be executed in ways that drive real-time and real-world discovery, innovation, translation, and implementation. Only in oncology, where at least some of the biology is accessible through surgical excision of the diseased tissue or liquid biopsy, has "co-clinical" modeling proven feasible. In most common germline disorders, while genetics often reveal the causal mutations, there still remain substantial barriers to efficient disease modeling. Aggregation of similar disorders under single diagnostic labels has directly contributed to the paucity of etiologic and mechanistic understanding by directly reducing the resolution of any subsequent studies. Existing clinical phenotypes are typically anatomic, physiologic, or histologic, and result in a substantial mismatch in information content between the phenomes in humans or in animal 'models' and the variation in the genome. 
This lack of one-to-one mapping of discrete mechanisms between disease and animal models causes a failure of translation and is one form of 'phenotype gap.' In this review, we will focus on the origins of the phenotyping deficit and approaches that may be considered to bridge the gap, creating shared taxonomies between human diseases and relevant models, using cardiovascular examples.
Affiliation(s)
- Calum A MacRae
- Cardiovascular Medicine, Genetics and Network Medicine Divisions, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Hale 7016, 75 Francis Street, Boston, MA, 02115, USA.
| |
|
141
|
Wang J, Liu Z, Bellen HJ, Yamamoto S. Navigating MARRVEL, a Web-Based Tool that Integrates Human Genomics and Model Organism Genetics Information. J Vis Exp 2019:10.3791/59542. [PMID: 31475990 PMCID: PMC7401700 DOI: 10.3791/59542] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/04/2023] Open
Abstract
Through whole-exome/genome sequencing, human geneticists identify rare variants that segregate with disease phenotypes. To assess if a specific variant is pathogenic, one must query many databases to determine whether the gene of interest is linked to a genetic disease, whether the specific variant has been reported before, and what functional data are available in model organism databases that may provide clues about the gene's function in humans. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) is a one-stop data collection tool for human genes and variants and their orthologous genes in seven model organisms: mouse, rat, zebrafish, fruit fly, nematode worm, fission yeast, and budding yeast. In this Protocol, we provide an overview of what MARRVEL can be used for and discuss how different datasets can be used to assess whether a variant of unknown significance (VUS) in a known disease-causing gene or a variant in a gene of uncertain significance (GUS) may be pathogenic. This protocol will guide a user through searching multiple human databases simultaneously, starting with a human gene with or without a variant of interest. We also discuss how to utilize data from OMIM, ExAC/gnomAD, ClinVar, Geno2MP, DGV and DECIPHER. Moreover, we illustrate how to interpret a list of ortholog candidate genes, expression patterns, and GO terms in model organisms associated with each human gene. Furthermore, we discuss the value of the protein structural domain annotations provided and explain how to use the multiple species protein alignment feature to assess whether a variant of interest affects an evolutionarily conserved domain or amino acid. Finally, we discuss three different use-cases of this website. MARRVEL is an easily accessible open access website designed for both clinical and basic researchers and serves as a starting point to design experiments for functional studies.
Affiliation(s)
- Julia Wang
- Program in Developmental Biology, Baylor College of Medicine; Medical Scientist Training Program, Baylor College of Medicine
| | - Zhandong Liu
- Department of Pediatrics, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital
| | - Hugo J Bellen
- Program in Developmental Biology, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital; Department of Molecular and Human Genetics, Baylor College of Medicine; Department of Neuroscience, Baylor College of Medicine; Howard Hughes Medical Institute, Baylor College of Medicine
| | - Shinya Yamamoto
- Program in Developmental Biology, Baylor College of Medicine; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital; Department of Molecular and Human Genetics, Baylor College of Medicine; Department of Neuroscience, Baylor College of Medicine
| |
|
142
|
Bogue MA, Grubb SC, Walton DO, Philip VM, Kolishovski G, Stearns T, Dunn MH, Skelly DA, Kadakkuzha B, TeHennepe G, Kunde-Ramamoorthy G, Chesler EJ. Mouse Phenome Database: an integrative database and analysis suite for curated empirical phenotype data from laboratory mice. Nucleic Acids Res 2019; 46:D843-D850. [PMID: 29136208 PMCID: PMC5753241 DOI: 10.1093/nar/gkx1082] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 09/29/2017] [Accepted: 10/19/2017] [Indexed: 12/25/2022] Open
Abstract
The Mouse Phenome Database (MPD; https://phenome.jax.org) is a widely used resource that provides access to primary experimental trait data, genotypic variation, protocols and analysis tools for mouse genetic studies. Data are contributed by investigators worldwide and represent a broad scope of phenotyping endpoints and disease-related traits in naïve mice and those exposed to drugs, environmental agents or other treatments. MPD houses individual animal data with detailed, searchable protocols, and makes these data available to other resources via API. MPD provides rigorous curation of experimental data and supporting documentation using relevant ontologies and controlled vocabularies. Most data in MPD are from inbreds and other reproducible strains such that the data are cumulative over time and across laboratories. The resource has been expanded to include the QTL Archive and other primary phenotype data from mapping crosses as well as advanced high-diversity mouse populations including the Collaborative Cross and Diversity Outbred mice. Furthermore, MPD provides a means of assessing replicability and reproducibility across experimental conditions and protocols, benchmarking assays in users’ own laboratories, identifying sensitized backgrounds for making new mouse models with genome editing technologies, analyzing trait co-inheritance, finding the common genetic basis for multiple traits and assessing sex differences and sex-by-genotype interactions.
Affiliation(s)
- Molly A Bogue
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
| | - Tim Stearns
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
| |
|
143
|
Davis AP, Wiegers J, Wiegers TC, Mattingly CJ. Public data sources to support systems toxicology applications. CURRENT OPINION IN TOXICOLOGY 2019; 16:17-24. [PMID: 33604492 PMCID: PMC7889036 DOI: 10.1016/j.cotox.2019.03.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/03/2023]
Abstract
Public databases provide a wealth of freely available information about chemicals, genes, proteins, biological networks, phenotypes, diseases, and exposure science that can be integrated to construct pathways for systems toxicology applications. Relating this disparate information from public repositories, however, can be challenging since databases use a variety of ways to represent, describe, and make available their content. The use of standard vocabularies to annotate key data concepts, however, allows the information to be more easily exchanged and combined for discovery of new findings. We explore some of the many public data sources currently available to support systems toxicology, and demonstrate the value of standardizing data to help construct chemical-induced outcome pathways.
Affiliation(s)
- Allan Peter Davis
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Jolene Wiegers
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Thomas C Wiegers
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Carolyn J Mattingly
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
- Center for Human Health and the Environment, North Carolina State University, Raleigh, North Carolina 27695, United States
| |
|
144
|
Boyles R, Thessen A, Waldrop A, Haendel M. Ontology-based data integration for advancing toxicological knowledge. CURRENT OPINION IN TOXICOLOGY 2019. [DOI: 10.1016/j.cotox.2019.05.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
|
145
|
Fang H, De Wolf H, Knezevic B, Burnham KL, Osgood J, Sanniti A, Lledó Lara A, Kasela S, De Cesco S, Wegner JK, Handunnetthi L, McCann FE, Chen L, Sekine T, Brennan PE, Marsden BD, Damerell D, O'Callaghan CA, Bountra C, Bowness P, Sundström Y, Milani L, Berg L, Göhlmann HW, Peeters PJ, Fairfax BP, Sundström M, Knight JC. A genetics-led approach defines the drug target landscape of 30 immune-related traits. Nat Genet 2019; 51:1082-1091. [PMID: 31253980 PMCID: PMC7124888 DOI: 10.1038/s41588-019-0456-1] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 11/22/2018] [Accepted: 05/24/2019] [Indexed: 12/22/2022]
Abstract
Most candidate drugs currently fail later-stage clinical trials, largely due to poor prediction of efficacy on early target selection [1]. Drug targets with genetic support are more likely to be therapeutically valid [2,3], but the translational use of genome-scale data such as from genome-wide association studies for drug target discovery in complex diseases remains challenging [4-6]. Here, we show that integration of functional genomic and immune-related annotations, together with knowledge of network connectivity, maximizes the informativeness of genetics for target validation, defining the target prioritization landscape for 30 immune traits at the gene and pathway level. We demonstrate how our genetics-led drug target prioritization approach (the priority index) successfully identifies current therapeutics, predicts activity in high-throughput cellular screens (including L1000, CRISPR, mutagenesis and patient-derived cell assays), enables prioritization of under-explored targets and allows for determination of target-level trait relationships. The priority index is an open-access, scalable system accelerating early-stage drug target selection for immune-mediated disease.
Affiliation(s)
- Hai Fang
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | | | - Bogdan Knezevic
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Katie L Burnham
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Julie Osgood
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Anna Sanniti
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Alicia Lledó Lara
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Silva Kasela
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Stephane De Cesco
- Alzheimer's Research UK Oxford Drug Discovery Institute, Target Discovery Institute, University of Oxford, Oxford, UK
| | | | | | - Fiona E McCann
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
| | - Liye Chen
- Botnar Research Centre, University of Oxford, Oxford, UK
| | - Takuya Sekine
- Botnar Research Centre, University of Oxford, Oxford, UK
| | - Paul E Brennan
- Alzheimer's Research UK Oxford Drug Discovery Institute, Target Discovery Institute, University of Oxford, Oxford, UK
- Structural Genomics Consortium, University of Oxford, Oxford, UK
| | - Brian D Marsden
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
- Structural Genomics Consortium, University of Oxford, Oxford, UK
| | - David Damerell
- Structural Genomics Consortium, University of Oxford, Oxford, UK
| | - Chris A O'Callaghan
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
| | - Chas Bountra
- Structural Genomics Consortium, University of Oxford, Oxford, UK
| | - Paul Bowness
- Botnar Research Centre, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
| | - Yvonne Sundström
- Structural Genomics Consortium, Department of Medicine, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden
| | - Lili Milani
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Louise Berg
- Structural Genomics Consortium, Department of Medicine, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden
| | | | | | - Benjamin P Fairfax
- Department of Oncology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
| | - Michael Sundström
- Structural Genomics Consortium, Department of Medicine, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden
| | - Julian C Knight
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
|
146
|
Wang LL, Thomas Hayman G, Smith JR, Tutaj M, Shimoyama ME, Gennari JH. Predicting instances of pathway ontology classes for pathway integration. J Biomed Semantics 2019; 10:11. [PMID: 31196182 PMCID: PMC6567466 DOI: 10.1186/s13326-019-0202-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/02/2018] [Accepted: 05/22/2019] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: To improve the outcomes of biological pathway analysis, a better way of integrating pathway data is needed. Ontologies can be used to organize data from disparate sources, and we leverage the Pathway Ontology as a unifying ontology for organizing pathway data. We aim to associate pathway instances from different databases to the appropriate class in the Pathway Ontology. RESULTS: Using a supervised machine learning approach, we trained neural networks to predict mappings between Reactome pathways and Pathway Ontology (PW) classes. For 2222 Reactome classes, the neural network (NN) model generated 10,952 class recommendations. We compared against a baseline bag-of-words (BOW) model for predicting correct PW classes. A 5% subset of Reactome pathways (111 pathways) was randomly selected, and the corresponding class recommendations from both models were evaluated by two curators. The precision of the BOW model was higher (0.49 for BOW and 0.39 for NN), but the recall was lower (0.42 for BOW and 0.78 for NN). Around 78% of Reactome pathways received pertinent recommendations from the NN model. CONCLUSIONS: The neural predictive model produced meaningful class recommendations that assisted PW curators in selecting appropriate class mappings for Reactome pathways. Our methods can be used to reduce the manual effort associated with ontology curation, and more broadly, for augmenting the curators' ability to organize and integrate data from pathway databases using the Pathway Ontology.
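The precision/recall trade-off reported in this abstract can be condensed into a single number with the F1 score (the harmonic mean of precision and recall). The sketch below is illustrative only: the abstract does not report F1, and the function name `f1_score` and the dictionary layout are assumptions introduced here; only the four input values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as stated in the abstract above.
# F1 itself is NOT reported there; it is derived here for illustration.
reported = {
    "BOW": {"precision": 0.49, "recall": 0.42},
    "NN": {"precision": 0.39, "recall": 0.78},
}

f1_by_model = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.2f}")
```

Under this derived measure the NN model comes out ahead (about 0.52 versus 0.45), because its large recall advantage outweighs its lower precision.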
Affiliation(s)
- Lucy Lu Wang
- Department of Biomedical Informatics and Medical Education, University of Washington, 850 Republican St, Seattle, 98109, WA, USA
| | - G Thomas Hayman
- Department of Biomedical Engineering, Medical College of Wisconsin, 8701 W Watertown Plank Rd, Milwaukee, 53226, WI, USA
| | - Jennifer R Smith
- Department of Biomedical Engineering, Medical College of Wisconsin, 8701 W Watertown Plank Rd, Milwaukee, 53226, WI, USA
| | - Monika Tutaj
- Department of Biomedical Engineering, Medical College of Wisconsin, 8701 W Watertown Plank Rd, Milwaukee, 53226, WI, USA
| | - Mary E Shimoyama
- Department of Biomedical Engineering, Medical College of Wisconsin, 8701 W Watertown Plank Rd, Milwaukee, 53226, WI, USA
| | - John H Gennari
- Department of Biomedical Informatics and Medical Education, University of Washington, 850 Republican St, Seattle, 98109, WA, USA
|
147
|
Ontology mapping for semantically enabled applications. Drug Discov Today 2019; 24:2068-2075. [PMID: 31158512 DOI: 10.1016/j.drudis.2019.05.020] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/22/2018] [Revised: 04/12/2019] [Accepted: 05/28/2019] [Indexed: 12/14/2022]
Abstract
In this review, we provide a summary of recent progress in ontology mapping (OM) at a crucial time when biomedical research is under a deluge of an increasing amount and variety of data. This is particularly important for realising the full potential of semantically enabled or enriched applications and for meaningful insights, such as drug discovery, using machine-learning technologies. We discuss challenges and solutions for better ontology mappings, as well as how to select ontologies before their application. In addition, we describe tools and algorithms for ontology mapping, including evaluation of tool capability and quality of mappings. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementation of such sustainable services.
|
148
|
New models for human disease from the International Mouse Phenotyping Consortium. Mamm Genome 2019; 30:143-150. [PMID: 31127358 PMCID: PMC6606664 DOI: 10.1007/s00335-019-09804-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 02/11/2019] [Accepted: 05/15/2019] [Indexed: 12/21/2022]
Abstract
The International Mouse Phenotyping Consortium (IMPC) continues to expand the catalogue of mammalian gene function by conducting genome and phenome-wide phenotyping on knockout mouse lines. The extensive and standardized phenotype screens allow the identification of new potential models for human disease through cross-species comparison by computing the similarity between the phenotypes observed in the mutant mice and the human phenotypes associated with their orthologous loci in Mendelian disease. Here, we present an update on the novel disease models available from the most recent data release (DR10.0), with 5861 mouse genes fully or partially phenotyped and a total number of 69,982 phenotype calls reported. With approximately one-third of human Mendelian genes with orthologous null mouse phenotypes described, the range of available models relevant for human diseases keeps increasing. Among the breadth of new data, we identify previously uncharacterized disease genes in the mouse and additional phenotypes for genes with existing mutant lines mimicking the associated disorder. The automated and unbiased discovery of relevant models for all types of rare diseases implemented by the IMPC constitutes a powerful tool for human genetics and precision medicine.
|
149
|
Saul MC, Philip VM, Reinholdt LG, Chesler EJ. High-Diversity Mouse Populations for Complex Traits. Trends Genet 2019; 35:501-514. [PMID: 31133439 DOI: 10.1016/j.tig.2019.04.003] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 02/19/2019] [Revised: 04/19/2019] [Accepted: 04/22/2019] [Indexed: 12/21/2022]
Abstract
Contemporary mouse genetic reference populations are a powerful platform to discover complex disease mechanisms. Advanced high-diversity mouse populations include the Collaborative Cross (CC) strains, Diversity Outbred (DO) stock, and their isogenic founder strains. When used in systems genetics and integrative genomics analyses, these populations efficiently harness known genetic variation for precise and contextualized identification of complex disease mechanisms. Extensive genetic, genomic, and phenotypic data are already available for these high-diversity mouse populations and a growing suite of data analysis tools has been developed to support research on diverse mice. This integrated resource can be used to discover and evaluate disease mechanisms relevant across species.
Affiliation(s)
- Michael C Saul
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
| | - Vivek M Philip
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
| | | | -
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA; UNC Chapel Hill, Chapel Hill, NC, USA; SUNY Binghamton, Binghamton, NY, USA; Pittsburgh University, Pittsburgh, PA, USA
| | - Elissa J Chesler
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
|
150
|
Coll-Tané M, Krebbers A, Castells-Nobau A, Zweier C, Schenck A. Intellectual disability and autism spectrum disorders 'on the fly': insights from Drosophila. Dis Model Mech 2019; 12:dmm039180. [PMID: 31088981 PMCID: PMC6550041 DOI: 10.1242/dmm.039180] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 12/11/2022] Open
Abstract
Intellectual disability (ID) and autism spectrum disorders (ASD) are frequently co-occurring neurodevelopmental disorders and affect 2-3% of the population. Rapid advances in exome and genome sequencing have increased the number of known implicated genes by threefold, to more than a thousand. The main challenges in the field are now to understand the various pathomechanisms associated with this bewildering number of genetic disorders, to identify new genes and to establish causality of variants in still-undiagnosed cases, and to work towards causal treatment options that so far are available only for a few metabolic conditions. To meet these challenges, the research community needs highly efficient model systems. With an increasing number of relevant assays and rapidly developing novel methodologies, the fruit fly Drosophila melanogaster is ideally positioned to change gear in ID and ASD research. The aim of this Review is to summarize some of the exciting work that already has drawn attention to Drosophila as a model for these disorders. We highlight well-established ID- and ASD-relevant fly phenotypes at the (sub)cellular, brain and behavioral levels, and discuss strategies of how this extraordinarily efficient and versatile model can contribute to 'next generation' medical genomics and to a better understanding of these disorders.
Affiliation(s)
- Mireia Coll-Tané
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Alina Krebbers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Anna Castells-Nobau
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Christiane Zweier
- Institute of Human Genetics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Annette Schenck
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
|