1
Palacz W, Lichołai S, Musiał J, Wawrzycka-Adamczyk K, Ślusarczyk G, Strug B, Yaman B, Tesi M, Gisslander K, O'Sullivan D, Vaglio A, Emmi G, Little MA, Wójcik K. Ontology-based integration and querying of heterogeneous rare disease data sources - POLVAS perspective. Comput Biol Med 2025; 185:109452. [PMID: 39626458 DOI: 10.1016/j.compbiomed.2024.109452] [Received: 06/18/2024] [Revised: 10/14/2024] [Accepted: 11/15/2024] [Indexed: 01/26/2025]
Abstract
The integration of rare disease medical databases belonging to different countries is an important problem, as a large number of observations is required for reliable statistical inference from patient data in order to facilitate clinical research. Such integration of national registry data, which requires harmonization of the heterogeneous data sets into a unified view, is facilitated in the European FAIRVASC project by developing a domain-specific ontology. The FAIRVASC project is dedicated to the rare disease anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV). This paper focuses on the practical issues and challenges encountered while integrating the Polish national database POLVAS into the federated database within the FAIRVASC project. It discusses the use of ontology-based methods for data integration and the importance of ensuring patient privacy and data protection. It addresses the problem of information missing from POLVAS that can be derived by aggregating other data available within the database, the incompatibility of data types and formats, and the mapping of Polish data names onto the common vocabulary. Modifications of the mappings used to 'uplift' national data into the Resource Description Framework (RDF) triplestore are also proposed. The described methods allow the Polish national database to be integrated into the European network over which federated queries are performed.
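The 'uplift' step described in the abstract can be pictured with a minimal sketch. It assumes a registry row is a dictionary and that uplift means renaming local column names to shared vocabulary terms, harmonizing coded values, and emitting subject-predicate-object triples. Every column name, prefix, and value below is hypothetical; none is taken from POLVAS or the FAIRVASC ontology, and the project's actual mapping tooling is not shown in the abstract.

```python
# Hypothetical mapping from Polish registry column names to shared vocabulary terms.
# These names are invented for illustration only.
COLUMN_MAP = {
    "data_urodzenia": "ex:dateOfBirth",
    "plec": "ex:gender",
    "rozpoznanie": "ex:diagnosis",
}

# Hypothetical harmonization of coded values for one field.
VALUE_MAP = {"plec": {"K": "female", "M": "male"}}


def uplift(record_id, record):
    """Turn one registry row (a dict) into (subject, predicate, object) triples."""
    subject = f"ex:patient/{record_id}"
    triples = []
    for column, value in record.items():
        predicate = COLUMN_MAP.get(column)
        if predicate is None or value in (None, ""):
            continue  # unmapped column or missing value: emit nothing
        # Replace a local code with the shared term when a mapping exists.
        value = VALUE_MAP.get(column, {}).get(value, value)
        triples.append((subject, predicate, value))
    return triples


# Invented example row; "uwagi" has no mapping and is skipped.
row = {"data_urodzenia": "1960-05-17", "plec": "K", "rozpoznanie": "GPA", "uwagi": ""}
triples = uplift("42", row)
for triple in triples:
    print(triple)
```

Keeping the column and value mappings as data rather than code is one way to let each national registry supply its own table while sharing the same uplift routine.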
Affiliation(s)
- Wojciech Palacz
  - Institute of Applied Computer Science, Jagiellonian University, ul. Łojasiewicza 11, 30-048 Kraków, Poland.
- Sabina Lichołai
  - Division of Molecular Biology and Clinical Genetics, Jagiellonian University Medical College, ul. Skawińska 8, 31-066 Kraków, Poland.
- Jacek Musiał
  - 2nd Department of Internal Medicine, Jagiellonian University Medical College, ul. Jakubowskiego 2, 30-688 Kraków, Poland.
- Katarzyna Wawrzycka-Adamczyk
  - 2nd Department of Internal Medicine, Jagiellonian University Medical College, ul. Jakubowskiego 2, 30-688 Kraków, Poland
- Grażyna Ślusarczyk
  - Institute of Applied Computer Science, Jagiellonian University, ul. Łojasiewicza 11, 30-048 Kraków, Poland.
- Barbara Strug
  - Institute of Applied Computer Science, Jagiellonian University, ul. Łojasiewicza 11, 30-048 Kraków, Poland.
- Beyza Yaman
  - ADAPT Centre for Digital Content, O'Reilly Institute, Trinity College Dublin, Dublin 2, Ireland.
- Michelangelo Tesi
  - Nephrology and Dialysis Unit, Azienda Ospedaliera Universitaria Meyer IRCCS, Firenze, Italy.
- Karl Gisslander
  - Department of Clinical Sciences - Rheumatology, Lund University, Lund, SE-221 85, Sweden.
- Declan O'Sullivan
  - ADAPT Centre for Digital Content, O'Reilly Institute, Trinity College Dublin, Dublin 2, Ireland.
- Augusto Vaglio
  - Nephrology and Dialysis Unit, Azienda Ospedaliera Universitaria Meyer IRCCS, Firenze, Italy; Department of Biomedical, Experimental and Clinical Sciences, University of Florence, Firenze, Italy.
- Giacomo Emmi
  - Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy; Clinical Medicine and Rheumatology Unit, Cattinara University Hospital, Trieste, Italy; Centre for Inflammatory Diseases, Monash University Department of Medicine, Monash Medical Centre, Melbourne, Australia.
- Mark A Little
  - ADAPT Centre for Digital Content, O'Reilly Institute, Trinity College Dublin, Dublin 2, Ireland.
- Krzysztof Wójcik
  - 2nd Department of Internal Medicine, Jagiellonian University Medical College, ul. Jakubowskiego 2, 30-688 Kraków, Poland.
2
Gahlot U, Sharma YK, Patel J, Ragumani S. Google trend analysis of the Indian population reveals a panel of seasonally sensitive comorbid symptoms with implications for monitoring the seasonally sensitive human population. Popul Health Metr 2024; 22:40. [PMID: 39736745 DOI: 10.1186/s12963-024-00349-7] [Received: 02/13/2024] [Accepted: 10/13/2024] [Indexed: 01/01/2025] [Open access]
Abstract
Seasonal variations in the environment induce observable changes in the human physiological system and manifest as various clinical symptoms in a specific human population. Our earlier studies predicted four global severe seasonal sensitive comorbid lifestyle diseases (SCLDs), namely, asthma, obesity, hypertension, and fibrosis. Our studies further indicated that the SCLD category of the human population may be maladapted or unacclimatized to seasonal changes. The current study aimed to explore the major seasonal symptoms associated with SCLD and evaluate their seasonal linkages via Google Trends (GT). We used the Human Symptoms-Disease Network (HSDN) to dissect common symptoms of SCLD. We then exploited medical databases and medical literature resources in consultation with medical practitioners to narrow down the clinical symptoms associated with four SCLDs, namely, pulmonary hypertension, pulmonary fibrosis, asthma, and obesity. Our study revealed a strong association of 12 clinical symptoms with SCLD. Each clinical symptom was further subjected to GT analysis to address its seasonal linkage. The GT search was carried out in the Indian population for the period from January 2015 to December 2019. In the GT analysis, 11 clinical symptoms were strongly associated with Indian seasonal changes; the exception was hypergammaglobulinemia, for which GT data were lacking in the Indian population. These 11 symptoms also presented sudden increases or decreases in search volume during the two major Indian seasonal transition months, namely, March and November. Moreover, in addition to SCLD, several seasonally associated clinical disorders share most of these 12 symptoms. In this regard, we named these 12 symptoms the "seasonal sensitive comorbid symptoms (SSC)" of the human population. Further clinical studies are needed to verify the utility of these symptoms in screening seasonally maladapted human populations. We also caution that clinicians and researchers should be well aware of the limitations and pitfalls of GT before correlating the clinical outcome of SSC symptoms with GT.
Affiliation(s)
- Urmila Gahlot
  - Bioinformatics Group, Defense Institute of Physiology and Allied Sciences, Defense Research and Development Organization, Lucknow Road, Timarpur, Delhi, India
- Yogendra Kumar Sharma
  - Bioinformatics Group, Defense Institute of Physiology and Allied Sciences, Defense Research and Development Organization, Lucknow Road, Timarpur, Delhi, India
- Jaichand Patel
  - Bioinformatics Group, Defense Institute of Physiology and Allied Sciences, Defense Research and Development Organization, Lucknow Road, Timarpur, Delhi, India
- Sugadev Ragumani
  - Bioinformatics Group, Defense Institute of Physiology and Allied Sciences, Defense Research and Development Organization, Lucknow Road, Timarpur, Delhi, India.
3
Leist IC, Rivas-Torrubia M, Alarcón-Riquelme ME, Barturen G, Consortium PC, Gut IG, Rueda M. Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond. BMC Bioinformatics 2024; 25:373. [PMID: 39633268 PMCID: PMC11616229 DOI: 10.1186/s12859-024-05993-2] [Received: 03/28/2024] [Accepted: 11/19/2024] [Indexed: 12/07/2024] [Open access]
Abstract
BACKGROUND: Phenotypic data comparison is essential for disease association studies, patient stratification, and genotype-phenotype correlation analysis. To support these efforts, the Global Alliance for Genomics and Health (GA4GH) established Phenopackets v2 and Beacon v2 standards for storing, sharing, and discovering genomic and phenotypic data. These standards provide a consistent framework for organizing biological data, simplifying their transformation into computer-friendly formats. However, matching participants using GA4GH-based formats remains challenging, as current methods are not fully compatible, limiting their effectiveness.
RESULTS: Here, we introduce Pheno-Ranker, an open-source software toolkit for individual-level comparison of phenotypic data. As input, it accepts JSON/YAML data exchange formats from Beacon v2 and Phenopackets v2 data models, as well as any data structure encoded in JSON, YAML, or CSV formats. Internally, the hierarchical data structure is flattened to one dimension and then transformed through one-hot encoding. This allows for efficient pairwise (all-to-all) comparisons within cohorts or for matching of a patient's profile in cohorts. Users have the flexibility to refine their comparisons by including or excluding terms, applying weights to variables, and obtaining statistical significance through Z-scores and p-values. The output consists of text files, which can be further analyzed using unsupervised learning techniques, such as clustering or multidimensional scaling (MDS), and with graph analytics. Pheno-Ranker's performance has been validated with simulated and synthetic data, showing its accuracy, robustness, and efficiency across various health data scenarios. A real data use case from the PRECISESADS study highlights its practical utility in clinical research.
CONCLUSIONS: Pheno-Ranker is a user-friendly, lightweight software for semantic similarity analysis of phenotypic data in Beacon v2 and Phenopackets v2 formats, extendable to other data types. It enables the comparison of a wide range of variables beyond HPO or OMIM terms while preserving full context. The software is designed as a command-line tool with additional utilities for CSV import, data simulation, summary statistics plotting, and QR code generation. For interactive analysis, it also includes a web-based user interface built with R Shiny. Links to the online documentation, including a Google Colab tutorial, and the tool's source code are available on the project home page: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker.
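The flatten-then-one-hot idea in the abstract can be illustrated with a small sketch. This is not Pheno-Ranker's code: the path format, the choice of Hamming distance as the pairwise metric, and the three toy records are assumptions made for illustration, and the tool's weighting and significance features are omitted.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into a set of 'path:value' strings."""
    items = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            items |= flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for value in obj:
            items |= flatten(value, prefix)  # list position is ignored
    else:
        items.add(f"{prefix.rstrip('.')}:{obj}")
    return items


def one_hot(records):
    """Encode each record as a 0/1 vector over the sorted union of flattened terms."""
    flat = [flatten(record) for record in records]
    vocabulary = sorted(set().union(*flat))
    vectors = [[1 if term in terms else 0 for term in vocabulary] for terms in flat]
    return vocabulary, vectors


def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy records with illustrative ontology identifiers.
records = [
    {"sex": "female", "phenotypicFeatures": [{"id": "HP:0001250"}, {"id": "HP:0004322"}]},
    {"sex": "female", "phenotypicFeatures": [{"id": "HP:0001250"}]},
    {"sex": "male", "phenotypicFeatures": [{"id": "HP:0000118"}]},
]
vocabulary, vectors = one_hot(records)

# All-to-all comparison within the toy cohort.
matrix = [[hamming(a, b) for b in vectors] for a in vectors]
for row in matrix:
    print(row)
```

Because every record becomes a fixed-length binary vector, the all-to-all comparison reduces to cheap element-wise operations, which is plausibly what makes the cohort-wide mode efficient.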
Affiliation(s)
- Ivo C Leist
  - Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain
  - Universitat de Barcelona (UB), Barcelona, Spain
- María Rivas-Torrubia
  - Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain
- Marta E Alarcón-Riquelme
  - Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain
  - Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Guillermo Barturen
  - Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain
  - Department of Genetics, Faculty of Science, University of Granada, 18071, Granada, Spain
  - Bioinformatics Laboratory, Centro de Investigación Biomédica, Biotechnology Institute, PTS, Avda del Conocimiento S/N, 18100, Granada, Spain
- Ivo G Gut
  - Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain
  - Universitat de Barcelona (UB), Barcelona, Spain
- Manuel Rueda
  - Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.
  - Universitat de Barcelona (UB), Barcelona, Spain.
4
Slaba K, Pokorna P, Jugas R, Palova H, Prochazkova D, Aulicka S, Spanelova K, Danhofer P, Horak O, Tuckova J, Kleiblova P, Gaillyova R, Hrunka M, Jouza M, Pinkova B, Papez J, Konecna P, Zidkova J, Stourac P, Sterba J, Demlova R, Demlova E, Jabandziev P, Slaby O. Diagnostic efficacy and clinical utility of whole-exome sequencing in Czech pediatric patients with rare and undiagnosed diseases. Sci Rep 2024; 14:28780. [PMID: 39567597 PMCID: PMC11579298 DOI: 10.1038/s41598-024-79872-4] [Received: 06/27/2024] [Accepted: 11/13/2024] [Indexed: 11/22/2024] [Open access]
Abstract
In the last decade, undiagnosed disease programs have emerged to address the significant number of individuals with suspected but undiagnosed rare genetic diseases. In our single-center study, we launched a pilot program for pediatric patients with undiagnosed diseases in the second-largest university hospital in the Czech Republic. This study was prospectively conducted at the Department of Pediatrics at University Hospital Brno between 2020 and 2023. A total of 58 Czech patients with undiagnosed diseases were enrolled in the study. All children underwent singleton whole-exome sequencing (WES) with targeted phenotype-driven analysis. We identified 28 variants, including 11 pathogenic, 13 likely pathogenic, and 4 variants of uncertain significance (VUS) according to American College of Medical Genetics and Genomics (ACMG) guidelines, as diagnostic of genetic diseases in 25 patients, resulting in an overall diagnostic yield of 43%. Eleven variants were novel and had not been previously reported in any public database. The overall clinical utility (actionability) enabling at least one type of change in the medical care of the patient was 76%, whereas the average number of clinical implications to individual patient care was two. Singleton WES facilitated the diagnostic process in the Czech undiagnosed pediatric population. We believe it is an effective approach to enable appropriate counseling, surveillance, and personalized clinical management.
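The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above and assumes that diagnostic yield means diagnosed patients divided by enrolled patients; the variable names are illustrative.

```python
# Figures taken from the abstract.
enrolled = 58
diagnosed = 25
variants = {"pathogenic": 11, "likely pathogenic": 13, "VUS": 4}

# The three variant classes should add up to the reported total of 28.
total_variants = sum(variants.values())

# Assumed definition: diagnosed patients divided by enrolled patients.
diagnostic_yield = diagnosed / enrolled

print(total_variants, f"{diagnostic_yield:.1%}")
```

The variant classes sum to 28, and 25 of 58 patients gives 43.1%, which matches the reported yield of 43% after rounding.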
Affiliation(s)
- Katerina Slaba
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic.
- Petra Pokorna
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Biology, Faculty of Medicine, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
- Robin Jugas
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Biology, Faculty of Medicine, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
- Hana Palova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Dagmar Prochazkova
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Stefania Aulicka
  - Department of Pediatric Neurology, University Hospital Brno, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Klara Spanelova
  - Department of Pediatric Neurology, University Hospital Brno, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Pavlina Danhofer
  - Department of Pediatric Neurology, University Hospital Brno, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Ondrej Horak
  - Department of Pediatric Neurology, University Hospital Brno, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Jana Tuckova
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Petra Kleiblova
  - Institute of Medical Biochemistry and Laboratory Diagnostics, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
  - Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Renata Gaillyova
  - Department of Medical Genetics and Genomics, University Hospital Brno, Faculty of Medicine, Masaryk University Brno, Brno, Czech Republic
- Matej Hrunka
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Martin Jouza
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Blanka Pinkova
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Jan Papez
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Petra Konecna
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
- Jana Zidkova
  - Centre of Molecular Biology and Genetics, Department of Hematology, Oncology and Internal Medicine, Masaryk University and University Hospital Brno, Brno, Czech Republic
- Petr Stourac
  - Department of Pediatric Anesthesiology and Intensive Care Medicine, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
  - Department of Simulation Medicine, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Jaroslav Sterba
  - Department of Pediatric Oncology, Faculty of Medicine, University Hospital Brno, Masaryk University, Brno, Czech Republic
- Regina Demlova
  - Department of Pharmacology/CZECRIN, Masaryk University Faculty of Medicine, Brno, Czech Republic
- Eva Demlova
  - Department of Pharmacology/CZECRIN, Masaryk University Faculty of Medicine, Brno, Czech Republic
- Petr Jabandziev
  - Department of Pediatrics, University Hospital Brno, Faculty of Medicine, Masaryk University, Cernopolni 9, 613 00, Brno, Czech Republic
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Ondrej Slaby
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic.
  - Department of Biology, Faculty of Medicine, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic.
  - Center of Precision Medicine, Department of Pathology, University Hospital Brno, Faculty of Medicine, Masaryk University, Brno, Czech Republic.
5
Isakov O, Marek-Yagel D, Greenberg R, Naftali M, Ben-Shachar S. PANGEN: an online platform for the comparison and creation of diagnostic gene panels. Database (Oxford) 2024; 2024:baae065. [PMID: 39043627 PMCID: PMC11265858 DOI: 10.1093/database/baae065] [Received: 04/16/2024] [Revised: 06/23/2024] [Accepted: 07/19/2024] [Indexed: 07/25/2024]
Abstract
Targeted gene panel sequencing is used to limit the search for causative genetic variants solely to genes with an established association with the phenotype. The design of gene panels is challenging due to the lack of consensus regarding phenotypic associations for some genes, which results in high variation in gene composition for the same panel offered by different laboratories. We developed PANGEN, a platform that provides a centralized resource for gene panel information, with the ability to compare and generate new intelligent diagnostic panels. Gene-phenotype associations were collected from 12 public and commercial sources (Blueprint, Cegat, Centogene, ClinGen, Fulgent, GeneDx, Health in Code, Human Phenotype Ontology, Invitae, PanelApp, Prevention genetics, and Pronto diagnostics). Gene-phenotype associations are categorized into tiers according to categories derived from the original source panel. Pairwise panel similarity was calculated by dividing the number of common genes by the total number of genes in both panels. Regions with extreme guanine-cytosine (GC) content were collected from the Genome in a Bottle stratifications dataset, and putative genomic duplications were retrieved from the University of California, Santa Cruz (UCSC) database. Overall, 1533 panels, 9759 phenotypes, and 6979 genes were collected. The platform provides an interface to (i) explore and compare collected panels, (ii) find similar panels, (iii) identify genes with high GC content or duplication levels, (iv) generate gene panels by combining panels from various sources, and (v) stratify a generated panel into genes with a strong phenotype association ('core') and those with a weaker association ('extended'). The presented platform represents a unique resource for gene panel exploration and comparison that facilitates the generation of tailored diagnostic panels through a public online web server. Database URL: https://c-gc.shinyapps.io/PANGEN/.
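The pairwise similarity described in the abstract can be sketched as follows. The wording "total number of genes in both panels" is read here as the number of distinct genes across the two panels, which makes the measure a Jaccard index; that reading is an assumption, and the function name and the two toy panels are illustrative rather than taken from PANGEN.

```python
def panel_similarity(panel_a, panel_b):
    """Shared genes divided by distinct genes across both panels (Jaccard index)."""
    a, b = set(panel_a), set(panel_b)
    union = a | b
    if not union:
        return 0.0  # two empty panels: similarity defined as 0 here
    return len(a & b) / len(union)


# Toy panels, not taken from any of the listed sources.
lab_one = ["BRCA1", "BRCA2", "TP53", "PALB2"]
lab_two = ["BRCA1", "BRCA2", "ATM"]

similarity = panel_similarity(lab_one, lab_two)
print(similarity)
```

With two shared genes out of five distinct genes, the toy panels score 0.4. If the denominator were instead the sum of the two panel sizes, the same pair would score 2/7, so the interpretation matters when comparing values.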
Affiliation(s)
- Ofer Isakov
  - Raphael Recanati Genetic Institute, Rabin Medical Center-Beilinson Hospital, Zeev Jabotinsky 39, Petach Tikva 4941492, Israel
  - Clalit Research Institute, Clalit Health Services, Tuval 40, Ramat Gan 5252247, Israel
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration, Harvard Medical School and Clalit Research Institute, 10 Shattuck Street, Suite 514, Boston, MA 02115, USA
  - Faculty of Medicine, Tel Aviv University, Klachkin 35, Tel Aviv 6997801, Israel
- Dina Marek-Yagel
  - Clalit Research Institute, Clalit Health Services, Tuval 40, Ramat Gan 5252247, Israel
- Rotem Greenberg
  - Clalit Research Institute, Clalit Health Services, Tuval 40, Ramat Gan 5252247, Israel
- Michal Naftali
  - Clalit Research Institute, Clalit Health Services, Tuval 40, Ramat Gan 5252247, Israel
- Shay Ben-Shachar
  - Clalit Research Institute, Clalit Health Services, Tuval 40, Ramat Gan 5252247, Israel
  - The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration, Harvard Medical School and Clalit Research Institute, 10 Shattuck Street, Suite 514, Boston, MA 02115, USA
  - Faculty of Medicine, Tel Aviv University, Klachkin 35, Tel Aviv 6997801, Israel
6
Palmer EE, Cederroth H, Cederroth M, Delgado-Vega AM, Roberts N, Taylan F, Nordgren A, Botto LD. Equity in action: The Diagnostic Working Group of The Undiagnosed Diseases Network International. NPJ Genom Med 2024; 9:37. [PMID: 38965249 PMCID: PMC11224220 DOI: 10.1038/s41525-024-00422-y] [Received: 11/30/2023] [Accepted: 05/29/2024] [Indexed: 07/06/2024] [Open access]
Abstract
Rare diseases are recognized as a global public health priority. A timely and accurate diagnosis is a critical enabler for precise and personalized health care. However, barriers to rare disease diagnoses are especially steep for those from historically underserved communities, including low- and middle-income countries. The Undiagnosed Diseases Network International (UDNI) was launched in 2015 to help fill the knowledge gaps that impede diagnosis for rare diseases, and to foster the translation of research into medical practice, aided by active patient involvement. To better pursue these goals, in 2021 the UDNI established the Diagnostic Working Group of the UDNI (UDNI DWG) as a community of practice that would (a) accelerate diagnoses for more families; (b) support and share knowledge and skills by developing Undiagnosed Diseases Programs, particularly those in lower resource areas; and (c) promote discovery and expand global medical knowledge. This Perspectives article documents the initial establishment and iterative co-design of the UDNI DWG.
Affiliation(s)
- Elizabeth Emma Palmer
  - Discipline of Paediatrics and Child Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia.
  - Centre for Clinical Genetics, Sydney Children's Hospitals Network, Sydney, NSW, Australia.
- Angelica Maria Delgado-Vega
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
- Natalie Roberts
  - Discipline of Paediatrics and Child Health, School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- Fulya Taylan
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
- Ann Nordgren
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Genetics and Genomics, Karolinska University Hospital, Stockholm, Sweden
  - Institute of Biomedicine, Department of Laboratory Medicine, University of Gothenburg, Gothenburg, Sweden
  - Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Gothenburg, Sweden
- Lorenzo D Botto
  - Division of Medical Genetics, Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA
7
Mora S, Gazzarata R, Blobel B, Murgia Y, Giacomini M. Transforming Ontology Web Language Elements into Common Terminology Service 2 Terminology Resources. J Pers Med 2024; 14:676. [PMID: 39063930 PMCID: PMC11277904 DOI: 10.3390/jpm14070676] [Received: 03/26/2024] [Revised: 06/17/2024] [Accepted: 06/21/2024] [Indexed: 07/28/2024] [Open access]
Abstract
Communication and cooperation are fundamental for the correct deployment of P5 medicine, and they can be achieved only through a correct comprehension of semantics that enables medical knowledge sharing. The operations needed to achieve this goal form a hierarchy: it starts with a complete understanding of the real-world business system by domain experts using domain ontologies, and only as a last step addresses the specific transformation at the pure information and communication technology level. A specific feature that should be maintained during such transformations is versioning, which records how meanings evolve over time and manages their history. The main tool used to represent ontologies in computing environments is the Web Ontology Language (OWL), but it was not created for managing the evolution of meanings over time. Therefore, in this paper we sought a way to use the specific features of Common Terminology Service-Release 2 (CTS2) to perform consistent and validated transformations of ontologies written in OWL. The specific use case managed in the paper is the Alzheimer's Disease Ontology (ADO). We were able to consider all of the elements of ADO and map them to CTS2 terminological resources, except for a subset of elements such as equivalent classes derived from restrictions on other classes.
Affiliation(s)
- Sara Mora
  - UO Information and Communication Technologies, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, 16132 Genoa, Italy
- Roberta Gazzarata
  - Healthropy Società a Responsabilità Limitata (S.R.L.), 17100 Savona, Italy
- Bernd Blobel
  - Medical Faculty, University of Regensburg, 93053 Regensburg, Germany
- Ylenia Murgia
  - Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145 Genova, Italy
- Mauro Giacomini
  - Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145 Genova, Italy
8
Santos R, Moreno-Torres V, Pintos I, Corral O, de Mendoza C, Soriano V, Corpas M. Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients. GIGABYTE 2024; 2024:gigabyte127. [PMID: 38948510 PMCID: PMC11211761 DOI: 10.46471/gigabyte.127] [Received: 09/25/2023] [Accepted: 06/04/2024] [Indexed: 07/02/2024] [Open access]
Abstract
Despite the advances in genetic marker identification associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showcasing GLIMPSE1's ability to confidently impute variants with minor allele frequencies as low as 2% in individuals with Spanish ancestry. We carried out a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here can be leveraged for future genomic projects to gain vital insights into health challenges like COVID-19.
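The accuracy measure quoted in the abstract, a squared Pearson correlation, can be sketched in plain Python. The abstract does not say how the comparison was set up, so the pairing of reference genotypes (0, 1, or 2 alternate alleles) with imputed dosages is an assumption, and the six values below are invented purely to exercise the function.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy ** 2 / (sxx * syy)


# Invented example: reference genotypes versus imputed dosages at six sites.
reference = [0, 1, 2, 1, 0, 2]
imputed = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]

r2 = pearson_r2(reference, imputed)
print(round(r2, 3))
```

A value near 1 means the imputed dosages track the reference genotypes closely. The guard against constant sequences matters in practice, because a site where every sample has the same genotype has no variance to correlate.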
Affiliation(s)
- Renato Santos
  - National Heart & Lung Institute, Imperial College London, London, UK
- Víctor Moreno-Torres
  - Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Ilduara Pintos
  - Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Octavio Corral
  - Health Sciences School & Medical Centre, Universidad Internacional La Rioja (UNIR), Madrid, Spain
- Carmen de Mendoza
  - Puerta de Hierro University Hospital & Research Institute, Majadahonda, Madrid, Spain
- Vicente Soriano
  - Health Sciences School & Medical Centre, Universidad Internacional La Rioja (UNIR), Madrid, Spain
- Manuel Corpas
  - School of Life Sciences, University of Westminster, London, UK
9
Gallego D, Serrano M, Cordoba-Caballero J, Gámez A, Seoane P, Perkins JR, Ranea JAG, Pérez B. Transcriptomic analysis identifies dysregulated pathways and therapeutic targets in PMM2-CDG. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167163. [PMID: 38599261 DOI: 10.1016/j.bbadis.2024.167163] [Received: 11/22/2023] [Revised: 03/15/2024] [Accepted: 04/04/2024] [Indexed: 04/12/2024]
Abstract
PMM2-CDG (MIM # 212065), the most common congenital disorder of glycosylation, is caused by the deficiency of phosphomannomutase 2 (PMM2). It is a multisystemic disease of variable severity that particularly affects the nervous system; however, its molecular pathophysiology remains poorly understood. Currently, there is no effective treatment. We performed an RNA-seq based transcriptomic study using patient-derived fibroblasts to gain insight into the mechanisms underlying the clinical symptomatology and to identify druggable targets. Systems biology methods were used to identify cellular pathways potentially affected by PMM2 deficiency, including senescence, bone regulation, cell adhesion and extracellular matrix (ECM), and response to cytokines. Functional validation assays using patients' fibroblasts revealed defects related to cell proliferation, cell cycle, the composition of the ECM and cell migration, and showed a potential role of the inflammatory response in the pathophysiology of the disease. Furthermore, treatment with a previously described pharmacological chaperone reverted the differential expression of some of the dysregulated genes. The results presented from transcriptomic data might serve as a platform for identifying therapeutic targets for PMM2-CDG, as well as for monitoring the effectiveness of therapeutic strategies, including pharmacological candidates, mannose-1-P, and drug repurposing.
Affiliation(s)
- Diana Gallego
- Centro de Diagnóstico de Enfermedades Moleculares, Centro de Biología Molecular-SO UAM-CSIC, Universidad Autónoma de Madrid, Campus de Cantoblanco, U746- CIBER de Enfermedades Raras (CIBERER), Instituto de Investigación Sanitaria IdiPAZ, 28049 Madrid, Spain
- Mercedes Serrano
- Pediatric Neurology Department, Hospital Sant Joan de Déu, Institut de Recerca Sant Joan de Déu, 08950 Barcelona, Spain; U-703 Centre for Biomedical Research on Rare Diseases (CIBER-ER), Instituto de Salud Carlos III, Spain
- Jose Cordoba-Caballero
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain; U-741, CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Alejandra Gámez
- Centro de Diagnóstico de Enfermedades Moleculares, Centro de Biología Molecular-SO UAM-CSIC, Universidad Autónoma de Madrid, Campus de Cantoblanco, U746- CIBER de Enfermedades Raras (CIBERER), Instituto de Investigación Sanitaria IdiPAZ, 28049 Madrid, Spain
- Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain; U-741, CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- James R Perkins
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain; U-741, CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain; The Biomedical Research Institute of Málaga (IBIMA), Málaga, Spain; Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Madrid, Spain
- Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain; U-741, CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain; The Biomedical Research Institute of Málaga (IBIMA), Málaga, Spain; Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Madrid, Spain.
- Belén Pérez
- Centro de Diagnóstico de Enfermedades Moleculares, Centro de Biología Molecular-SO UAM-CSIC, Universidad Autónoma de Madrid, Campus de Cantoblanco, U746- CIBER de Enfermedades Raras (CIBERER), Instituto de Investigación Sanitaria IdiPAZ, 28049 Madrid, Spain.
|
10
|
Aughey G, Cali E, Maroofian R, Zaki MS, Pagnamenta AT, Rahman F, Menzies L, Shafique A, Suri M, Roze E, Aguennouz M, Ghizlane Z, Saadi SM, Ali Z, Abdulllah U, Cheema HA, Anjum MN, Morel G, McFarland R, Altunoglu U, Kraus V, Shoukier M, Murphy D, Flemming K, Yttervik H, Rhouda H, Lesca G, Murtaza BN, Rehman MU, Consortium GE, Seo GH, Beetz C, Kayserili H, Krioulie Y, Chung WK, Naz S, Maqbool S, Gleeson J, Baig SM, Efthymiou S, Taylor JC, Severino M, Jepson JE, Houlden H. Clinical and neurogenetic characterisation of autosomal recessive RBL2-associated progressive neurodevelopmental disorder. medRxiv 2024:2024.05.03.24306631. [PMID: 38746364 PMCID: PMC11092723 DOI: 10.1101/2024.05.03.24306631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Retinoblastoma (RB) proteins are highly conserved transcriptional regulators that play important roles during development by regulating cell-cycle gene expression. RBL2 dysfunction has been linked to a severe neurodevelopmental disorder. However, to date, clinical features have only been described in six individuals carrying five biallelic predicted loss of function (pLOF) variants. To define the phenotypic effects of RBL2 mutations in detail, we identified and clinically characterized a cohort of 28 patients from 18 families carrying LOF variants in RBL2, including fourteen new variants that substantially broaden the molecular spectrum. The clinical presentation of affected individuals is characterized by a range of neurological and developmental abnormalities. Global developmental delay and intellectual disability were uniformly observed, ranging from moderate to profound and involving lack of acquisition of key motor and speech milestones in most patients. Frequent features included postnatal microcephaly, infantile hypotonia, aggressive behaviour, stereotypic movements and non-specific dysmorphic features. Common neuroimaging features were cerebral atrophy, white matter volume loss, corpus callosum hypoplasia and cerebellar atrophy. In parallel, we used the fruit fly, Drosophila melanogaster, to investigate how disruption of the conserved RBL2 orthologue Rbf impacts nervous system function and development. We found that Drosophila Rbf LOF mutants recapitulate several features of patients harboring RBL2 variants, including alterations in the head and brain morphology reminiscent of microcephaly, and perturbed locomotor behaviour.
Surprisingly, in addition to its known role in controlling tissue growth during development, we find that continued Rbf expression is also required in fully differentiated post-mitotic neurons for normal locomotion in Drosophila, and that adult-stage neuronal re-expression of Rbf is sufficient to rescue Rbf mutant locomotor defects. Taken together, this study provides a clinical and experimental basis to understand genotype-phenotype correlations in an RBL2-linked neurodevelopmental disorder and suggests that restoring RBL2 expression through gene therapy approaches may ameliorate aspects of RBL2 LOF patient symptoms.
|
11
|
Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024; 16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretations. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings represent moderate positive rates, high recall rates, and interpretation in gene-disease association studies.
Affiliation(s)
- Xinzhi Yao
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Sizhuo Ouyang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Yulong Lian
- College of Science, Huazhong Agricultural University, Wuhan, China
- Qianqian Peng
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Xionghui Zhou
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feier Huang
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feng Shi
- College of Science, Huazhong Agricultural University, Wuhan, China
- Jingbo Xia
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.
|
12
|
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
|
13
|
Xiang G, Guo Y, Bumcrot D, Sigova A. JMnorm: a novel joint multi-feature normalization method for integrative and comparative epigenomics. Nucleic Acids Res 2024; 52:e11. [PMID: 38055833 PMCID: PMC10810286 DOI: 10.1093/nar/gkad1146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 10/25/2023] [Accepted: 11/14/2023] [Indexed: 12/08/2023] Open
Abstract
Combinatorial patterns of epigenetic features reflect transcriptional states and functions of genomic regions. While many epigenetic features have correlated relationships, most existing data normalization approaches analyze each feature independently. Such strategies may distort relationships between functionally correlated epigenetic features and hinder biological interpretation. We present a novel approach named JMnorm that simultaneously normalizes multiple epigenetic features across cell types, species, and experimental conditions by leveraging information from partially correlated epigenetic features. We demonstrate that JMnorm-normalized data can better preserve cross-epigenetic-feature correlations across different cell types and enhance consistency between biological replicates than data normalized by other methods. Additionally, we show that JMnorm-normalized data can consistently improve the performance of various downstream analyses, which include candidate cis-regulatory element clustering, cross-cell-type gene expression prediction, detection of transcription factor binding and changes upon perturbations. These findings suggest that JMnorm effectively minimizes technical noise while preserving true biologically significant relationships between epigenetic datasets. We anticipate that JMnorm will enhance integrative and comparative epigenomics.
Affiliation(s)
- Guanjue Xiang
- CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
- Yuchun Guo
- CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
- David Bumcrot
- CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
- Alla Sigova
- CAMP4 Therapeutics Corp., One Kendall Square, Building 1400 West, Cambridge, MA 02139, USA
|
14
|
Liu Y, Sang G, Liu Z, Pan Y, Cheng J, Zhang Y. MPTN: A message-passing transformer network for drug repurposing from knowledge graph. Comput Biol Med 2024; 168:107800. [PMID: 38043469 DOI: 10.1016/j.compbiomed.2023.107800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/09/2023] [Accepted: 11/29/2023] [Indexed: 12/05/2023]
Abstract
Drug repurposing (DR) based on knowledge graphs (KGs), which uses knowledge graph reasoning models to predict new therapeutic pathways for existing drugs, is challenging. With the rapid development of computing technology and the growing availability of validated biomedical data, various knowledge graph-based methods have been widely used to analyze and process complex and novel data to discover new indications for given drugs. However, existing methods need to be improved in extracting semantic information from contextual triples of biomedical entities. In this study, we propose a message-passing transformer network named MPTN, based on knowledge graphs, for drug repurposing. First, CompGCN is used as the precoder to jointly aggregate entity and relation embeddings. Then, to fully capture the semantic information of entity context triples, a message-propagating transformer module is designed. The module integrates the transformer into the message passing mechanism and incorporates the attention weights computed over entity context triples into the entity embedding to update it. Next, a residual connection is introduced to retain as much information as possible and improve prediction accuracy. Finally, MPTN uses the InteractE module as the decoder to obtain heterogeneous feature interactions in entity and relation representations and predict new pathways for drug treatment. Experiments on two datasets show that the model outperforms existing knowledge graph embedding (KGE) learning methods.
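As a rough illustration of the update step this abstract describes (attention weights over an entity's context triples, then a residual connection), the following minimal Python sketch may help. It is not the authors' implementation: the function names, the element-wise product used to compose a neighbour with its relation, and the scaled dot-product scoring are all assumptions chosen for brevity, and the learned projections of a real transformer layer are omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def update_entity(entity, context_triples):
    """Hypothetical attention-weighted message passing with a residual connection.

    entity: list of floats, the embedding of the entity being updated.
    context_triples: list of (neighbour_embedding, relation_embedding) pairs.
    Returns (updated_embedding, attention_weights).
    """
    # Assumption: one message per context triple, composed element-wise.
    messages = [[n * r for n, r in zip(neigh, rel)] for neigh, rel in context_triples]
    # Scaled dot-product attention of the entity over its messages.
    scale = math.sqrt(len(entity))
    weights = softmax([dot(entity, m) / scale for m in messages])
    # Attention-weighted sum of the messages.
    aggregated = [sum(w * m[i] for w, m in zip(weights, messages))
                  for i in range(len(entity))]
    # Residual connection: the original embedding is added back.
    updated = [e + a for e, a in zip(entity, aggregated)]
    return updated, weights

if __name__ == "__main__":
    entity = [1.0, 0.0]
    triples = [([1.0, 0.0], [1.0, 1.0]), ([0.0, 1.0], [1.0, 1.0])]
    updated, weights = update_entity(entity, triples)
    print(updated, weights)
```

The residual term keeps the original embedding recoverable even when the attention weights are nearly uniform, which is the motivation the abstract gives for including it.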
Affiliation(s)
- Yuanxin Liu
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
- Guoming Sang
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
- Zhi Liu
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
- Yilin Pan
- School of Artificial Intelligence, Dalian Maritime University, Dalian, 116026, Liaoning, China
- Junkai Cheng
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
- Yijia Zhang
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China.
|
15
|
Blobel B, Ruotsalainen P, Oemig F, Giacomini M, Sottile PA, Endsleff F. Principles and Standards for Designing and Managing Integrable and Interoperable Transformed Health Ecosystems. J Pers Med 2023; 13:1579. [PMID: 38003894 PMCID: PMC10672117 DOI: 10.3390/jpm13111579] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2023] [Revised: 10/25/2023] [Accepted: 10/31/2023] [Indexed: 11/26/2023] Open
Abstract
The advancement of sciences and technologies, economic challenges, increasing expectations, and consumerism result in a radical transformation of health and social care around the globe, characterized by foundational organizational, methodological, and technological paradigm changes. The transformation of the health and social care ecosystems aims at ubiquitously providing personalized, preventive, predictive, participative precision (5P) medicine, considering and understanding the individual's health status in a comprehensive context from the elementary particle up to society. For designing and implementing such advanced ecosystems, an understanding and correct representation of the structure, function, and relations of their components is inevitable, thereby including the perspectives, principles, and methodologies of all included disciplines. To guarantee consistent and conformant processes and outcomes, the specifications and principles must be based on international standards. A core standard for representing transformed health ecosystems and managing the integration and interoperability of systems, components, specifications, and artifacts is ISO 23903:2021, therefore playing a central role in this publication. Consequently, ISO/TC 215 and CEN/TC 251, both representing the international standardization on health informatics, declared the deployment of ISO 23903:2021 mandatory for all their projects and standards addressing more than one domain. The paper summarizes and concludes the first author's leading engagement in the evolution of pHealth in Europe and beyond over the last 15 years, discussing the concepts, principles, and standards for designing, implementing, and managing 5P medicine ecosystems. It not only introduces the theoretical foundations of the approach but also exemplifies its deployment in practical projects and solutions regarding interoperability and integration in multi-domain ecosystems. 
The presented approach enables comprehensive and consistent integration of and interoperability between domains, systems, related actors, specifications, standards, and solutions. That way, it should help overcome the problems and limitations of data-centric approaches, which still dominate projects and products nowadays, and replace them with knowledge-centric, comprehensive, and consistent ones.
Affiliation(s)
- Bernd Blobel
- Medical Faculty, University of Regensburg, 93053 Regensburg, Germany
- Faculty European Campus Rottal-Inn, Deggendorf Institute of Technology, 94469 Deggendorf, Germany
- First Medical Faculty, Charles University Prague, 11000 Staré Mĕsto, Czech Republic
- Pekka Ruotsalainen
- Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland
- Frank Oemig
- IT-Consulting in Healthcare, 45472 Mülheim, Germany
- Mauro Giacomini
- Department of Informatics, Bioengineering, Robotics and System Engineering, University of Genoa, 16145 Genoa, Italy
- Frederik Endsleff
- IT Architecture, Centre for IT and Medical Technology (CIMT), The Capital Region of Denmark, 2100 Copenhagen, Denmark
|
16
|
Corpas M, de Mendoza C, Moreno-Torres V, Pintos I, Seoane P, Perkins JR, Ranea JA, Fatumo S, Korcsmaros T, Martín-Villa JM, Barreiro P, Corral O, Soriano V. Genetic signature detected in T cell receptors from patients with severe COVID-19. iScience 2023; 26:107735. [PMID: 37720084 PMCID: PMC10504482 DOI: 10.1016/j.isci.2023.107735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 05/21/2023] [Accepted: 08/23/2023] [Indexed: 09/19/2023] Open
Abstract
Characterization of host genetic factors contributing to COVID-19 severity promises advances on drug discovery to fight the disease. Most genetic analyses to date have identified genome-wide significant associations involving loss-of-function variants for immune response pathways. Despite accumulating evidence supporting a role for T cells in COVID-19 severity, no definitive genetic markers have been found to support an involvement of T cell responses. We analyzed 205 whole exomes from both a well-characterized cohort of hospitalized severe COVID-19 patients and controls. Significantly enriched high impact alleles were found for 25 variants within the T cell receptor beta (TRB) locus on chromosome 7. Although most of these alleles were found in heterozygosis, at least three or more in TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, and TRBV10-1 suggested a possible TRB loss of function via compound heterozygosis. This loss-of-function in TRB genes supports suboptimal or dysfunctional T cell responses as a major contributor to severe COVID-19 pathogenesis.
Affiliation(s)
- Manuel Corpas
- School of Life Sciences, University of Westminster, London, UK
- Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, UK
- UNIR Health Sciences School & Medical Center, Madrid, Spain
- Institute of Continuing Education, University of Cambridge, Cambridge, UK
- Carmen de Mendoza
- Puerta de Hierro University Hospital & Research Institute, Majadahonda, Spain
- Víctor Moreno-Torres
- UNIR Health Sciences School & Medical Center, Madrid, Spain
- Puerta de Hierro University Hospital & Research Institute, Majadahonda, Spain
- Ilduara Pintos
- Puerta de Hierro University Hospital & Research Institute, Majadahonda, Spain
- Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain
- CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- James R. Perkins
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain
- CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- The Biomedical Research Institute of Málaga (IBIMA), Málaga, Spain
- Juan A.G. Ranea
- Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain
- CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- The Biomedical Research Institute of Málaga (IBIMA), Málaga, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Madrid, Spain
- Segun Fatumo
- The African Computational Genomics (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda
- London School of Hygiene and Tropical Medicine, London, UK
- H3Africa Bioinformatics Network (H3ABioNet) Node, Centre for Genomics Research and Innovation, NABDA/FMST, Abuja, Nigeria
- Tamas Korcsmaros
- Faculty of Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK
- Pablo Barreiro
- UNIR Health Sciences School & Medical Center, Madrid, Spain
- Emergency Hospital Isabel Zendal, Madrid, Spain
- Octavio Corral
- UNIR Health Sciences School & Medical Center, Madrid, Spain
|
17
|
Gao Z, Winhusen TJ, Gorenflo M, Ghitza UE, Davis PB, Kaelber DC, Xu R. Repurposing ketamine to treat cocaine use disorder: integration of artificial intelligence-based prediction, expert evaluation, clinical corroboration and mechanism of action analyses. Addiction 2023; 118:1307-1319. [PMID: 36792381 PMCID: PMC10631254 DOI: 10.1111/add.16168] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/26/2022] [Accepted: 01/25/2023] [Indexed: 02/17/2023]
Abstract
BACKGROUND AND AIMS Cocaine use disorder (CUD) is a significant public health issue for which there is no Food and Drug Administration (FDA) approved medication. Drug repurposing looks for new cost-effective uses of approved drugs. This study presents an integrated strategy to identify repurposed FDA-approved drugs for CUD treatment. DESIGN Our drug repurposing strategy combines artificial intelligence (AI)-based drug prediction, expert panel review, clinical corroboration and mechanisms of action analysis being implemented in the National Drug Abuse Treatment Clinical Trials Network (CTN). Based on AI-based prediction and expert knowledge, ketamine was ranked as the top candidate for clinical corroboration via electronic health record (EHR) evaluation of CUD patient cohorts prescribed ketamine for anesthesia or depression compared with matched controls who received non-ketamine anesthesia or antidepressants/midazolam. Genetic and pathway enrichment analyses were performed to understand ketamine's potential mechanisms of action in the context of CUD. SETTING The study utilized TriNetX to access EHRs from more than 90 million patients world-wide. Genetic- and functional-level analyses used DisGeNet, Search Tool for Interactions of Chemicals and Kyoto Encyclopedia of Genes and Genomes databases. PARTICIPANTS A total of 7742 CUD patients who received anesthesia (3871 ketamine-exposed and 3871 anesthetic-controlled) and 7910 CUD patients with depression (3955 ketamine-exposed and 3955 antidepressant-controlled) were identified after propensity score-matching. MEASUREMENTS EHR analysis outcome was a CUD remission diagnosis within 1 year of drug prescription. FINDINGS Patients with CUD prescribed ketamine for anesthesia displayed a significantly higher rate of CUD remission compared with matched individuals prescribed other anesthetics [hazard ratio (HR) = 1.98, 95% confidence interval (CI) = 1.42-2.78]. 
Similarly, CUD patients prescribed ketamine for depression evidenced a significantly higher CUD remission ratio compared with matched patients prescribed antidepressants or midazolam (HR = 4.39, 95% CI = 2.89-6.68). The mechanism of action analysis revealed that ketamine directly targets multiple CUD-associated genes (BDNF, CNR1, DRD2, GABRA2, GABRB3, GAD1, OPRK1, OPRM1, SLC6A3, SLC6A4) and pathways implicated in neuroactive ligand-receptor interaction, cAMP signaling and cocaine abuse/dependence. CONCLUSIONS Ketamine appears to be a potential repurposed drug for treatment of cocaine use disorder.
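The reported hazard ratios can be sanity-checked against their confidence intervals. The sketch below assumes the intervals are Wald-type, that is, symmetric around log(HR) with a 1.96 multiplier; the abstract does not state how they were computed, so this is an assumption, and `log_scale_summary` is a hypothetical helper name. Under that assumption the geometric midpoint of each interval should land close to the reported HR, and the interval width gives an approximate standard error for log(HR).

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Return (geometric midpoint of the CI, approximate SE of log HR).

    Assumes a Wald-type interval: log(HR) +/- z * SE.
    """
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return midpoint, se

# Values as reported in the abstract: (cohort, HR, CI lower, CI upper).
reported = [
    ("anesthesia cohort", 1.98, 1.42, 2.78),
    ("depression cohort", 4.39, 2.89, 6.68),
]

for label, hr, lower, upper in reported:
    midpoint, se = log_scale_summary(lower, upper)
    print(f"{label}: reported HR={hr}, log-scale CI midpoint={midpoint:.2f}, "
          f"approx. SE(log HR)={se:.3f}")
```

Both midpoints fall within rounding distance of the reported hazard ratios, which is what one would expect if the intervals were built on the log scale.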
Affiliation(s)
- Zhenxiang Gao
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- T. John Winhusen
- Center for Addiction Research, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Maria Gorenflo
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Udi E. Ghitza
- Center for the Clinical Trials Network (CCTN), National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), Bethesda, MD, USA
- Pamela B. Davis
- Center for Community Health Integration, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- David C. Kaelber
- Center for Clinical Informatics Research and Education, The Metro Health System, Cleveland, OH, USA
- Rong Xu
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
|
18
|
Tun X, Wang EJ, Gao Z, Lundberg K, Xu R, Hu D. Integrin β3-Mediated Cell Senescence Associates with Gut Inflammation and Intestinal Degeneration in Models of Alzheimer's Disease. Int J Mol Sci 2023; 24:5697. [PMID: 36982771 PMCID: PMC10052535 DOI: 10.3390/ijms24065697] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/09/2023] [Revised: 03/02/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disorder characterized by memory loss and personality changes that ultimately lead to dementia. Currently, 50 million people worldwide suffer from dementia related to AD, and the pathogenesis underlying AD pathology and cognitive decline is unknown. While AD is primarily a neurological disease of the brain, individuals with AD often experience intestinal disorders, and gut abnormalities have been implicated as a major risk factor in the development of AD and relevant dementia. However, the mechanisms that mediate gut injury and contribute to the vicious cycle between gut abnormalities and brain injury in AD remain unknown. In the present study, a bioinformatics analysis was performed on the proteomics data of variously aged AD mouse colon tissues. We found that levels of integrin β3 and β-galactosidase (β-gal), two markers of cellular senescence, increased with age in the colonic tissue of mice with AD. The advanced artificial intelligence (AI)-based prediction of AD risk also demonstrated the association between integrin β3 and β-gal and AD phenotypes. Moreover, we showed that elevated integrin β3 levels were accompanied by senescence phenotypes and immune cell accumulation in AD mouse colonic tissue. Further, integrin β3 genetic downregulation abolished upregulated senescence markers and inflammatory responses in colonic epithelial cells in conditions associated with AD. We provide a new understanding of the molecular actions underpinning inflammatory responses during AD and suggest that integrin β3 may function as a novel target mediating gut abnormalities in this disease.
Affiliation(s)
- Xin Tun
  - Department of Physiology and Biophysics, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Evan J. Wang
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
  - Beachwood High School, Beachwood, OH 44122, USA
- Zhenxiang Gao
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Kathleen Lundberg
  - Proteomics Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Rong Xu
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Di Hu
  - Department of Physiology and Biophysics, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
|
19
|
Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Caufield JH, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. EBioMedicine 2023; 87:104413. [PMID: 36563487 PMCID: PMC9769411 DOI: 10.1016/j.ebiom.2022.104413] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Received: 07/30/2022] [Revised: 11/23/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. METHODS We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. FINDINGS We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. INTERPRETATION Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. FUNDING NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
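The cluster-assignment step described in this abstract (placing a new patient in the cluster of the most similar original patient) can be illustrated with a minimal Python sketch. This is not the authors' implementation: Jaccard overlap of phenotype term sets is used here only as a simple stand-in for the ontology-based semantic similarity the abstract refers to, and the function names, phenotype labels, and cluster labels are hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set-overlap similarity in [0, 1]; a stand-in for semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cluster(
    new_terms: FrozenSet[str],
    originals: List[Tuple[FrozenSet[str], str]],
) -> Tuple[Optional[str], float]:
    """Return the cluster label of the most similar original patient.

    `originals` pairs each original patient's phenotype terms with the
    cluster label that patient received during the initial clustering.
    """
    best_label, best_sim = None, -1.0
    for terms, label in originals:
        sim = jaccard(new_terms, terms)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Hypothetical toy data: two original patients in two clusters.
originals = [
    (frozenset({"dyspnea", "cough"}), "pulmonary"),
    (frozenset({"anxiety", "insomnia"}), "neuropsychiatric"),
]
label, similarity = assign_cluster(
    frozenset({"cough", "dyspnea", "fatigue"}), originals
)
```

A real pipeline would replace `jaccard` with a similarity measure that uses the ontology hierarchy, so that related but non-identical terms still contribute to the score.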
Affiliation(s)
- Justin T Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Elena Casiraghi
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Johanna J Loomba
  - The Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Bryan Laraway
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Michael Gargano
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Kenneth J Wilkins
  - Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Nariman Ammar
  - Health Science Center, University of Tennessee, Memphis, TN, USA
- Blessy Antony
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- J Harry Caufield
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Guy Karlebach
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Julie A McMurry
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Andrew Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA; Tufts University School of Medicine, Institute for Clinical Research and Health Policy Studies, Boston, MA, USA; Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Richard Moffitt
  - Department of Biomedical Informatics and Stony Brook Cancer Center, Stony Brook University, Stony Brook, NY, USA
- Kristin Kostka
  - Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Christopher G Chute
  - Schools of Medicine, Public Health and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA.
|
20
|
Martin T, Rommel K, Thomas C, Eymann J, Kretschmer T, Berner R, Lee-Kirsch MA, Hebestreit H. [Uncovering rare diseases in medical data-coding]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2022; 65:1133-1142. [PMID: 36239768 PMCID: PMC9636302 DOI: 10.1007/s00103-022-03598-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 09/19/2022] [Indexed: 12/03/2022]
Abstract
The ICD-10-GM coding system used in the German healthcare system only captures a minority of rare disease diagnoses. Therefore, information on the incidence and prevalence of rare diseases as well as necessary (financial) resources for the expert care required for evidence-based decisions by health insurers, care providers, and politicians are lacking. Furthermore, the missing information complicates and sometimes even precludes the generation of scientific knowledge on rare diseases. Therefore, starting in 2023, all in-patient cases in Germany with a rare disease diagnosis must be coded by an ORPHAcode using the Alpha-ID-SE file. The file Alpha-ID-SE links the ICD-10-GM codes to the internationally established ORPHAcodes for rare diseases. Commercially available software tools progressively support the coding of rare diseases. In several centers for rare diseases linked to university hospitals, IT tools and procedures were established to realize a complete coding of rare diseases. These include financial incentives for the institutions providing rare disease codes, systematic queries asking for rare disease codes during the coding process, and a semi-automated coding process for all patients with a rare disease previously seen at the institution. A combination of the different approaches probably results in the most complete coding. To get the complete picture of rare disease epidemiology and care requirements, a specific and unique coding of out-patient cases is also desirable. Furthermore, a structured reporting of phenotype is required, especially for complex rare diseases and for yet undiagnosed cases.
Affiliation(s)
- Tamara Martin
  - Zentrum für Seltene Erkrankungen und Institut für Medizinische Genetik und Angewandte Genomik, Universität und Universitätsklinikum Tübingen, Tübingen, Deutschland
- Kathrin Rommel
  - Abteilung K - Kodiersysteme, Fachgebiet - K4 Orphanet Deutschland, Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM), Bonn, Deutschland
- Carina Thomas
  - Abteilung K - Kodiersysteme, Fachgebiet - K4 Orphanet Deutschland, Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM), Bonn, Deutschland
- Jutta Eymann
  - Zentrum für Seltene Erkrankungen, Universität und Universitätsklinikum Tübingen, Tübingen, Deutschland
- Tanita Kretschmer
  - Klinik und Poliklinik für Kinder- und Jugendmedizin und UniversitätsCentrum für Seltene Erkrankungen (USE), Universitätsklinikum Carl Gustav Carus der TU Dresden, Dresden, Deutschland
- Reinhard Berner
  - Klinik und Poliklinik für Kinder- und Jugendmedizin und UniversitätsCentrum für Seltene Erkrankungen (USE), Universitätsklinikum Carl Gustav Carus der TU Dresden, Dresden, Deutschland
- Min Ae Lee-Kirsch
  - Klinik und Poliklinik für Kinder- und Jugendmedizin und UniversitätsCentrum für Seltene Erkrankungen (USE), Universitätsklinikum Carl Gustav Carus der TU Dresden, Dresden, Deutschland
- Helge Hebestreit
  - Zentrum für Seltene Erkrankungen - Referenzzentrum Nordbayern, Universitätsklinikum Würzburg, Josef-Schneider-Str. 2, 97080, Würzburg, Deutschland.
|
21
|
Zhu Q, Qu C, Liu R, Vatas G, Clough A, Nguyễn ÐT, Sid E, Mathé E, Xu Y. Rare disease-based scientific annotation knowledge graph. Front Artif Intell 2022; 5:932665. [PMID: 36034595 PMCID: PMC9403737 DOI: 10.3389/frai.2022.932665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 07/15/2022] [Indexed: 11/13/2022]
Abstract
Rare diseases (RDs) are by definition associated with a low prevalence rate, which poses a major challenge because less data are available to support preclinical and clinical studies. There has been a vast improvement in our understanding of RD, largely owing to advanced big data analytic approaches in genetics/genomics. Consequently, a large volume of RD-related publications has accumulated in recent years, which offers opportunities to use these publications to access the full spectrum of scientific research and to support further investigation in RD. In this study, we systematically analyzed, semantically annotated, and scientifically categorized RD-related PubMed articles, and integrated those semantic annotations in a knowledge graph (KG), which is hosted in Neo4j based on a predefined data model. Having demonstrated the scientific contribution in RD through case studies that explore this KG, we propose, as a next step, to extend the current effort by incorporating more RD-related publications and other types of resources.
Affiliation(s)
- Qian Zhu
  - Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences, Rockville, MD, United States
  - *Correspondence: Qian Zhu
- Chunxu Qu
  - Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences, Bethesda, MD, United States
- Ruizheng Liu
  - Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences, Bethesda, MD, United States
- Eric Sid
  - Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences, Bethesda, MD, United States
- Ewy Mathé
  - Division of Pre-clinical Innovation, National Center for Advancing Translational Sciences, Rockville, MD, United States
- Yanji Xu
  - Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences, Bethesda, MD, United States
|
22
|
Gao Z, Ding P, Xu R. KG-Predict: A knowledge graph computational framework for drug repurposing. J Biomed Inform 2022; 132:104133. [PMID: 35840060 PMCID: PMC9595135 DOI: 10.1016/j.jbi.2022.104133] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/22/2021] [Revised: 06/18/2022] [Accepted: 07/03/2022] [Indexed: 11/26/2022]
Abstract
The emergence of large-scale phenotypic, genetic, and other multi-model biochemical data has offered unprecedented opportunities for drug discovery including drug repurposing. Various knowledge graph-based methods have been developed to integrate and analyze complex and heterogeneous data sources to find new therapeutic applications for existing drugs. However, existing methods have limitations in modeling and capturing context-sensitive inter-relationships among tens of thousands of biomedical entities. In this paper, we developed KG-Predict: a knowledge graph computational framework for drug repurposing. We first integrated multiple types of entities and relations from various genotypic and phenotypic databases to construct a knowledge graph termed GP-KG. GP-KG was composed of 1,246,726 associations between 61,146 entities. KG-Predict then aggregated the heterogeneous topological and semantic information from GP-KG to learn low-dimensional representations of entities and relations, and further utilized these representations to infer new drug-disease interactions. In cross-validation experiments, KG-Predict achieved high performance [AUROC (the area under receiver operating characteristic) = 0.981, AUPR (the area under precision-recall) = 0.409 and MRR (the mean reciprocal rank) = 0.261], outperforming other state-of-the-art graph embedding methods. We applied KG-Predict in identifying novel repositioned candidate drugs for Alzheimer's disease (AD) and showed that KG-Predict prioritized both FDA-approved and active clinical trial anti-AD drugs among the top (AUROC = 0.868 and AUPR = 0.364).
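For readers unfamiliar with the mean reciprocal rank (MRR) reported in this abstract, the following Python sketch shows how the metric is conventionally computed. It is a generic illustration, not KG-Predict code; the helper names and the toy scores are hypothetical.

```python
from typing import Dict, List


def rank_of(true_item: str, scores: Dict[str, float]) -> int:
    """1-based rank of `true_item` when candidates are sorted by descending score.

    Ties are resolved optimistically: only strictly higher scores push the
    true item down the ranking.
    """
    target = scores[true_item]
    return 1 + sum(1 for s in scores.values() if s > target)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank over all test queries."""
    if not ranks:
        raise ValueError("at least one rank is required")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical example: three test queries whose true answers were
# ranked 1st, 2nd and 4th among the candidates.
example_mrr = mean_reciprocal_rank([1, 2, 4])

# Hypothetical candidate scores for a single query.
example_rank = rank_of("drug_b", {"drug_a": 0.9, "drug_b": 0.7, "drug_c": 0.1})
```

Because each query contributes 1/rank, an MRR of 0.261 is consistent with true answers typically appearing a few positions below the top of the ranked list, although the exact distribution of ranks cannot be recovered from the average alone.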
Affiliation(s)
- Zhenxiang Gao
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA.
- Pingjian Ding
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA.
- Rong Xu
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA.
|
23
|
Reese JT, Blau H, Bergquist T, Loomba JJ, Callahan T, Laraway B, Antonescu C, Casiraghi E, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN. Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2022:2022.05.24.22275398. [PMID: 35665012 PMCID: PMC9164456 DOI: 10.1101/2022.05.24.22275398] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 04/11/2023]
Abstract
Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
|
24
|
Chanda AK, Bai T, Yang Z, Vucetic S. Improving medical term embeddings using UMLS Metathesaurus. BMC Med Inform Decis Mak 2022; 22:114. [PMID: 35488252 PMCID: PMC9052653 DOI: 10.1186/s12911-022-01850-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2020] [Accepted: 03/29/2022] [Indexed: 11/25/2022]
Abstract
BACKGROUND Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine learning tools on medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes, is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small. METHODS In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus. RESULTS To evaluate the proposed approach, we used a publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps the semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that learned vector embeddings are helpful in downstream medical informatics applications. CONCLUSION This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.
Affiliation(s)
- Ashis Kumar Chanda
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Tian Bai
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Ziyu Yang
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Slobodan Vucetic
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.
|
25
|
Wonkam A, Adadey SM, Schrauwen I, Aboagye ET, Wonkam-Tingang E, Esoh K, Popel K, Manyisa N, Jonas M, deKock C, Nembaware V, Cornejo Sanchez DM, Bharadwaj T, Nasir A, Everard JL, Kadlubowska MK, Nouel-Saied LM, Acharya A, Quaye O, Amedofu GK, Awandare GA, Leal SM. Exome sequencing of families from Ghana reveals known and candidate hearing impairment genes. Commun Biol 2022; 5:369. [PMID: 35440622 PMCID: PMC9019055 DOI: 10.1038/s42003-022-03326-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2022] [Accepted: 03/25/2022] [Indexed: 12/15/2022]
Abstract
We investigated hearing impairment (HI) in 51 families from Ghana with at least two affected members that were negative for GJB2 pathogenic variants. DNA samples from 184 family members underwent whole-exome sequencing (WES). Variants were found in 14 known non-syndromic HI (NSHI) genes [26/51 (51.0%) families], five genes that can underlie either syndromic HI or NSHI [13/51 (25.5%)], and one syndromic HI gene [1/51 (2.0%)]. Variants in CDH23 and MYO15A contributed the most to HI [31.4% (16/51 families)]. For DSPP, an autosomal recessive mode of inheritance was detected. Post-lingual expression was observed for a family segregating a MARVELD2 variant. To our knowledge, seven novel candidate HI genes were identified (13.7%), with six associated with NSHI (INPP4B, CCDC141, MYO19, DNAH11, POTEI, and SOX9) and one (PAX8) with Waardenburg syndrome. MYO19 and DNAH11 were replicated in unrelated Ghanaian probands. Six of the novel genes were expressed in mouse inner ear. It is known that Pax8-/- mice do not respond to sound, and depletion of Sox9 resulted in defective vestibular structures and abnormal utricle development. Most variants (48/60; 80.0%) have not previously been associated with HI. Identifying seven candidate genes in this study emphasizes the potential for novel HI gene discovery in Africa.
Affiliation(s)
- Ambroise Wonkam
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa.
  - McKusick-Nathans Institute and Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA.
- Samuel Mawuli Adadey
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
  - West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, LG 54, Ghana
- Isabelle Schrauwen
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Elvis Twumasi Aboagye
  - West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, LG 54, Ghana
- Edmond Wonkam-Tingang
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Kevin Esoh
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Kalinka Popel
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Noluthando Manyisa
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Mario Jonas
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Carmen deKock
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Victoria Nembaware
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Diana M Cornejo Sanchez
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Thashi Bharadwaj
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Abdul Nasir
  - Department of Molecular Science and Technology, Ajou University, Suwon-si, Republic of Korea
- Jenna L Everard
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Magda K Kadlubowska
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Liz M Nouel-Saied
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Anushree Acharya
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA
- Osbourne Quaye
  - West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, LG 54, Ghana
- Geoffrey K Amedofu
  - Department of Eye, Ear, Nose, and Throat, School of Medical Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Gordon A Awandare
  - West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, LG 54, Ghana
- Suzanne M Leal
  - Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Centre, New York, NY, 10032, USA.
  - Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Centre, New York, NY, 10032, USA.
|
26
|
Slavotinek A, Prasad H, Yip T, Rego S, Hoban H, Kvale M. Predicting genes from phenotypes using human phenotype ontology (HPO) terms. Hum Genet 2022; 141:1749-1760. [PMID: 35357580 DOI: 10.1007/s00439-022-02449-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 03/16/2022] [Indexed: 11/28/2022]
Abstract
The interpretation of genomic variants following whole exome sequencing (WES) can be aided using human phenotype ontology (HPO) terms to standardize clinical features and predict causative genes. We performed WES on 453 patients diagnosed prior to 18 years of age and identified 114 pathogenic (P) or likely pathogenic (LP) variants in 112 patients. We utilized PhenoDB to extract HPO terms from provider notes and then used Phen2Gene to generate a gene score and gene ranking from each list of HPO terms. We assigned Phen2Gene gene rankings to 6 rank classes, with class 1 covering raw gene rankings of 1 to 10 and class 2 covering rankings from 11 to 50 out of a total of 17,126 possible gene rankings. Phen2Gene ranked causative genes into rank class 1 or 2 in 27.7% of cases and the genes in rank class 1 were all associated with well-characterized phenotypes. We found significant associations between the gene score and the number of years since the gene was first published, the number of HPO terms with a hierarchical depth greater than or equal to 11, and the number of Online Mendelian Inheritance in Man terms associated with the phenotype and gene. We conclude that genes associated with recognizable phenotypes and terms deep in the HPO hierarchy have the best chance of producing a high gene score and ranking in class 1 to 2 using Phen2Gene software with HPO terms. Clinicians and laboratory staff should consider these results when HPO terms are employed to prioritize candidate genes.
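The rank-class binning described in this abstract can be sketched in Python as follows. Only the boundaries the abstract states are encoded (class 1 for raw ranks 1 to 10, class 2 for ranks 11 to 50, out of 17,126 possible ranks); the boundaries of classes 3 to 6 are not given there, so the sketch returns `None` for them rather than guessing. The function names and the example ranks are hypothetical and this is not Phen2Gene code.

```python
from typing import List, Optional

TOTAL_RANKS = 17126  # total possible gene rankings, as stated in the abstract


def rank_class(raw_rank: int) -> Optional[int]:
    """Map a raw gene rank to rank class 1 or 2; None if outside both."""
    if not 1 <= raw_rank <= TOTAL_RANKS:
        raise ValueError(f"rank must be in 1..{TOTAL_RANKS}, got {raw_rank}")
    if raw_rank <= 10:
        return 1
    if raw_rank <= 50:
        return 2
    # Classes 3-6 exist in the study, but their cut-offs are not stated
    # in the abstract, so they are deliberately left unassigned here.
    return None


def share_in_top_classes(raw_ranks: List[int]) -> float:
    """Fraction of cases whose causative gene falls in rank class 1 or 2."""
    if not raw_ranks:
        raise ValueError("at least one rank is required")
    hits = sum(1 for r in raw_ranks if rank_class(r) in (1, 2))
    return hits / len(raw_ranks)


# Hypothetical ranks for four cases: two land in class 1 or 2.
example_share = share_in_top_classes([3, 40, 900, 12000])
```

The 27.7% figure reported in the abstract corresponds to this kind of share, computed over the study's diagnosed cases.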
Affiliation(s)
- Anne Slavotinek
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
- Hannah Prasad
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Tiffany Yip
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Shannon Rego
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Hannah Hoban
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Mark Kvale
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
|
27
|
Fisher ME, Segerdell E, Matentzoglu N, Nenni MJ, Fortriede JD, Chu S, Pells TJ, Osumi-Sutherland D, Chaturvedi P, James-Zorn C, Sundararaj N, Lotay VS, Ponferrada V, Wang DZ, Kim E, Agalakov S, Arshinoff BI, Karimi K, Vize PD, Zorn AM. The Xenopus phenotype ontology: bridging model organism phenotype data to human health and development. BMC Bioinformatics 2022; 23:99. [PMID: 35317743 PMCID: PMC8939077 DOI: 10.1186/s12859-022-04636-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/12/2021] [Accepted: 03/08/2022] [Indexed: 11/10/2022]
Abstract
Background: Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease. Results: Here we present the Xenopus phenotype ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno) and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development, as the generation of new terms and classes of terms can be substantially automated. Conclusions: The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species. The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype–phenotype data that can be directly related to other uPheno compliant resources. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04636-8.
Affiliation(s)
- Malcolm E Fisher
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Erik Segerdell
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Nicolas Matentzoglu
  - Monarch Initiative, London, UK
  - Semanticly Ltd, London, UK
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Mardi J Nenni
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Joshua D Fortriede
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Stanley Chu
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Troy J Pells
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Praneet Chaturvedi
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Christina James-Zorn
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Nivitha Sundararaj
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Vaneet S Lotay
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Virgilio Ponferrada
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Dong Zhuo Wang
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Eugene Kim
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Sergei Agalakov
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Bradley I Arshinoff
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Kamran Karimi
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Peter D Vize
  - Department of Biological Science, University of Calgary, Calgary, AB, Canada
- Aaron M Zorn
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
|
28
|
Zhang N, Zang T. A multi-network integration approach for measuring disease similarity based on ncRNA regulation and heterogeneous information. BMC Bioinformatics 2022; 23:89. [PMID: 35255810 PMCID: PMC8902705 DOI: 10.1186/s12859-022-04613-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 02/14/2022] [Indexed: 11/28/2022] Open
Abstract
Background: Measuring similarity between complex diseases has significant implications for revealing the pathogenesis of diseases and for development in the domain of biomedicine. It is widely agreed that functional associations between disease-related genes and semantic associations can be applied to calculate disease similarity. A growing number of studies have demonstrated the profound involvement of non-coding RNA (ncRNA) in the regulation of genome organization and gene expression. Thus, taking ncRNA into account can be useful in measuring disease similarities. However, existing methods ignore the regulatory functions of ncRNA in biological processes. In this study, we propose a novel deep-learning method to deduce disease similarity. Results: We propose ImpAESim, a framework that integrates multiple network embeddings to learn compact feature representations and calculate disease similarity. We first use three different disease-related information networks to build a heterogeneous network. After a network diffusion process, random walk with restart (RWR), a compact feature-learning model composed of a classic autoencoder (AE) and an improved AE model is used to extract constraints and low-dimensional feature representations. We finally obtain an accurate and low-dimensional feature representation of diseases and employ the cosine distance as the measurement of disease similarity. Conclusion: ImpAESim focuses on extracting a low-dimensional vector representation of features based on ncRNA regulation and the gene–gene interaction network. Our method can significantly reduce the calculation bias resulting from the sparse disease associations that are derived from semantic associations.
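Two of the steps named in this abstract, the RWR network diffusion and the cosine-based comparison, can be illustrated with a short sketch. It is an assumption-laden toy, not the ImpAESim implementation: the autoencoder stage is omitted entirely, the restart probability of 0.5 is an arbitrary choice, and the function names are hypothetical.

```python
import numpy as np


def random_walk_with_restart(adj, restart=0.5, tol=1e-8, max_iter=1000):
    """Return one diffusion profile per node (row i = walk restarted at node i)."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = adj / col_sums            # column-normalized transition matrix
    seeds = np.eye(adj.shape[0])           # each node is used once as the restart seed
    probs = seeds.copy()
    for _ in range(max_iter):
        updated = (1 - restart) * transition @ probs + restart * seeds
        converged = np.abs(updated - probs).max() < tol
        probs = updated
        if converged:
            break
    return probs.T


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return 1.0
    return 1.0 - float(u @ v / denom)
```

In the pipeline the abstract describes, the diffusion profiles would be compressed by the autoencoder models before distances are computed; here they could be compared directly, for example with `cosine_distance(profiles[0], profiles[1])`.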
Affiliation(s)
- Ningyi Zhang
  - Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
  - Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
29
|
Abstract
Internet-connected devices, including personal computers, smartphones, smartwatches, and voice assistants, have evolved into powerful multisensor technologies that billions of people interact with daily to connect with friends and colleagues, access and share information, purchase goods, play games, and navigate their environment. Digital phenotyping taps into the data streams captured by these devices to characterize and understand health and disease. The purpose of this article is to summarize opportunities for digital phenotyping in neurology, review studies using everyday technologies to obtain motor and cognitive information, and provide a perspective on how neurologists can embrace and accelerate progress in this emerging field.
Affiliation(s)
- Anoopum S. Gupta
  - Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
|
30
|
Reyes-Peña C, Tovar M, Bravo M, Motz R. An ontology network for Diabetes Mellitus in Mexico. J Biomed Semantics 2021; 12:19. [PMID: 34625104 PMCID: PMC8500829 DOI: 10.1186/s13326-021-00252-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/09/2021] [Accepted: 09/14/2021] [Indexed: 12/04/2022] Open
Abstract
Background: Medical experts in the domain of Diabetes Mellitus (DM) acquire specific knowledge from diabetic patients through monitoring and interaction. This allows them to understand the disease and to learn about other conditions or comorbidities, treatments, and typical consequences in the Mexican population. This indicates that an expert in a domain knows technical information about the domain and contextual factors that interact with it in the real world, contributing to new knowledge generation. To capture and manage information about DM, it is necessary to design and implement techniques and methods that allow: determining the most relevant conceptual dimensions and their correct organization, the integration of existing medical and clinical information from different resources, and the generation of structures that represent the deduction process of the doctor. An Ontology Network is a collection of ontologies of diverse knowledge domains which can be interconnected by meta-relations. This article describes an Ontology Network for representing DM in Mexico, designed using a proposed methodology. Building the Ontology Network involved reusing ontological resources, transforming non-ontological resources for ontology design, and extending ontologies with natural language processing techniques. The resources comprise medical information extracted from vocabularies, taxonomies, medical dictionaries, and ontologies, among others. Additionally, a set of semantic rules has been defined within the Ontology Network to derive new knowledge. Results: An Ontology Network for DM in Mexico has been built from six well-defined domains, resulting in new classes, using ontological and non-ontological resources to offer a semantic structure for assisting in the medical diagnosis process. The network comprises 1367 classes, 20 object properties, 63 data properties, and 4268 individuals from seven different ontologies. Ontology Network evaluation was carried out by verifying the purpose for its design and some quality criteria. Conclusions: The composition of the Ontology Network offers a set of well-defined ontological modules, facilitating the reuse of one or more of them. The inclusion of international vocabularies such as SNOMED CT or ICD-10 reinforces the representation by international standards. It increases the semantic interoperability of the network, providing the opportunity to integrate other ontologies with the same vocabularies. The ontology network design methodology offers a guide for ontology developers on how to use ontological and non-ontological resources in order to extract the maximum information and knowledge from a set of domains that may or may not share information.
Affiliation(s)
- Cecilia Reyes-Peña
  - Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, Av. San Claudio, Puebla, Mexico
- Mireya Tovar
  - Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, Av. San Claudio, Puebla, Mexico
- Maricela Bravo
  - Universidad Autonoma Metropolitana, Av. San Pablo No. 180, Mexico City, Mexico
- Regina Motz
  - Universidad de la Republica, Julio Herrera y Reissig 565, Montevideo, Uruguay
|
31
|
Noh KW, Buettner R, Klein S. Shifting Gears in Precision Oncology-Challenges and Opportunities of Integrative Data Analysis. Biomolecules 2021; 11:biom11091310. [PMID: 34572523 PMCID: PMC8465238 DOI: 10.3390/biom11091310] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/19/2021] [Revised: 08/26/2021] [Accepted: 09/01/2021] [Indexed: 02/07/2023] Open
Abstract
For decades, research relating to modification of host immunity towards antitumor response activation has been ongoing, with the breakthrough discovery of immune-checkpoint blockers. Several biomarkers with potential predictive value have been reported in recent studies for these novel therapies. However, with the plethora of therapeutic options existing for a given cancer entity, modern oncology is now being confronted with multifactorial interpretation to devise “the best therapy” for the individual patient. Added to this are the numerous guidelines for established and emerging diagnostic biomarkers, as well as the complex interplay between cancer cells and the tumor microenvironment, which pose immense challenges in the therapy decision-making process. In this review, we present various molecular diagnostic modalities and techniques, such as genomics, immunohistochemistry and quantitative image analysis, which have the potential to become powerful tools in the development of an optimal treatment regime when combined with patient characteristics. We summarize the underlying complexities of these methods and shed light upon the necessary considerations and requirements for data integration. It is our hope to provide compelling evidence to emphasize the need for inclusion of integrative data analysis in modern cancer therapy, thereby paving a path towards precision medicine and better patient outcomes.
Affiliation(s)
- Ka-Won Noh
  - Institute for Pathology, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Reinhard Buettner
  - Institute for Pathology, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50937 Cologne, Germany
- Sebastian Klein
  - Gerhard-Domagk-Institute of Pathology, University Hospital Münster, 48149 Münster, Germany
  - Correspondence: Tel.: +49-251-83-57670
|
32
|
Abstract
Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.
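The basic idea in this abstract, grouping ICD codes into phecodes and deriving case/control status from them, can be sketched as below. Everything concrete in the sketch is an assumption: the mapping uses placeholder strings rather than real ICD codes or phecodes, the function names are hypothetical, and any further rules a real phecode pipeline applies are not described in the abstract and are not modelled.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy mapping; the keys and values are placeholders,
# not real ICD codes or phecodes.
ICD_TO_PHECODE: Dict[str, str] = {
    "ICD_A1": "PHE_1",
    "ICD_A2": "PHE_1",
    "ICD_B1": "PHE_2",
}


def patient_phecodes(icd_codes: Iterable[str]) -> Set[str]:
    """Collapse a patient's ICD codes into the set of phecodes they map to."""
    return {ICD_TO_PHECODE[code] for code in icd_codes if code in ICD_TO_PHECODE}


def case_control_status(patients: Dict[str, Iterable[str]], phecode: str) -> Dict[str, bool]:
    """Label each patient True (case) or False (control) for one phecode."""
    return {pid: phecode in patient_phecodes(codes) for pid, codes in patients.items()}
```

Because the lookup is a plain dictionary, the same function can be run once per phecode to label a whole cohort, which is the high-throughput property the abstract emphasizes.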
Affiliation(s)
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
|
33
|
Yang J, Dong C, Duan H, Shu Q, Li H. RDmap: a map for exploring rare diseases. Orphanet J Rare Dis 2021; 16:101. [PMID: 33632281 PMCID: PMC7905868 DOI: 10.1186/s13023-021-01741-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/24/2020] [Accepted: 02/11/2021] [Indexed: 02/01/2023] Open
Abstract
Background: The complexity of the phenotypic characteristics and molecular bases of many rare human genetic diseases makes the diagnosis of such diseases a challenge for clinicians. A map for visualizing, locating and navigating rare diseases based on similarity will help clinicians and researchers understand and easily explore these diseases. Methods: A distance matrix of rare diseases included in Orphanet was computed by calculating the quantitative distance among phenotypes and pathogenic genes based on the Human Phenotype Ontology (HPO) and Gene Ontology (GO), and each disease was mapped into Euclidean space. A rare disease map, enhanced by clustering classes and disease information, was developed based on ECharts. Results: A rare disease map called RDmap was published at http://rdmap.nbscn.org. A total of 3287 rare diseases are included in the phenotype-based map, and 3789 rare genetic diseases are included in the gene-based map; 1718 overlapping diseases are connected between the two maps. RDmap works similarly to the widely used Google Maps service and supports zooming and panning. The phenotype-similarity-based disease location function performed better than traditional keyword searches in an in silico evaluation, and 20 published cases of rare diseases also demonstrated that RDmap can assist clinicians in seeking a rare disease diagnosis. Conclusion: RDmap is the first user-interactive map-style rare disease knowledgebase. It will help clinicians and researchers explore the increasingly complicated realm of rare genetic diseases.
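The Methods step of mapping diseases from a distance matrix into Euclidean space can be illustrated with classical multidimensional scaling (MDS). The abstract does not say which embedding method RDmap uses, so classical MDS is swapped in here purely as one common technique for this task; the function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed items in Euclidean space from a symmetric pairwise distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components] # keep the largest components
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale                 # one coordinate row per item
```

The resulting 2D coordinates are the kind of input a charting library could render as a pannable, zoomable map; for distances that are not exactly Euclidean, the embedding is only an approximation.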
Affiliation(s)
- Jian Yang
  - The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Binsheng Road 3333#, Hangzhou, Zhejiang, 310052, China
  - The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Cong Dong
  - The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Binsheng Road 3333#, Hangzhou, Zhejiang, 310052, China
  - The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Huilong Duan
  - The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Qiang Shu
  - The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Binsheng Road 3333#, Hangzhou, Zhejiang, 310052, China
- Haomin Li
  - The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Binsheng Road 3333#, Hangzhou, Zhejiang, 310052, China
|
34
|
Masuya H, Usuda D, Nakata H, Yuhara N, Kurihara K, Namiki Y, Iwase S, Takada T, Tanaka N, Suzuki K, Yamagata Y, Kobayashi N, Yoshiki A, Kushida T. Establishment and application of information resource of mutant mice in RIKEN BioResource Research Center. Lab Anim Res 2021; 37:6. [PMID: 33455583 PMCID: PMC7811887 DOI: 10.1186/s42826-020-00068-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/10/2020] [Accepted: 09/21/2020] [Indexed: 12/12/2022] Open
Abstract
Online databases are crucial infrastructure for facilitating the wide, effective, and efficient use of mouse mutant resources in life sciences. The number and types of mouse resources have been rapidly growing due to the development of genetic modification technology, along with associated genomic sequence and phenotype information. Therefore, data integration technologies to improve the findability, accessibility, interoperability, and reusability of mouse strain data have become essential for mouse strain repositories. In 2020, the RIKEN BioResource Research Center released an integrated database of bioresources, including experimental mouse strains, Arabidopsis thaliana as a laboratory plant, cell lines, microorganisms, and genetic materials, using Resource Description Framework-related technologies. The integrated database offers multiple advanced features for the dissemination of bioresource information. The current version of our online catalog of mouse strains, which functions as a part of the integrated database of bioresources, is available from search bars on the websites of the Center (https://brc.riken.jp) and the Experimental Animal Division (https://mus.brc.riken.jp/). The BioResource Research Center also released a genomic variation database of mouse strains established in Japan and Western Europe, MoG+ (https://molossinus.brc.riken.jp/mogplus/), and a database for phenotype-phenotype associations across the mouse phenome using data from the International Mouse Phenotyping Platform. In this review, we describe features of the current versions of the databases related to mouse strain resources in the RIKEN BioResource Research Center and discuss future directions.
Affiliation(s)
- Hiroshi Masuya
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Daiki Usuda
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Hatsumi Nakata
  - Experimental Animal Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Naomi Yuhara
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Keiko Kurihara
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Yuri Namiki
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Shigeru Iwase
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Toyoyuki Takada
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Nobuhiko Tanaka
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Kenta Suzuki
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
- Yuki Yamagata
  - Laboratory for Developmental Dynamics, Center for Biosystems Dynamics Research, RIKEN, Kobe, Japan
- Norio Kobayashi
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
  - Data Knowledge Organization Unit, Head Office for Information Systems and Cybersecurity, RIKEN, Wako, Japan
- Atsushi Yoshiki
  - Experimental Animal Division, BioResource Research Center, RIKEN, Tsukuba, Japan
- Tatsuya Kushida
  - Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
|
35
|
Sousa D, Lamurias A, Couto FM. Using Neural Networks for Relation Extraction from Biomedical Literature. Methods Mol Biol 2021; 2190:289-305. [PMID: 32804372 DOI: 10.1007/978-1-0716-0826-5_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Using different sources of information to support the automated extraction of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, notably those using neural network algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been shown to improve on previous state-of-the-art results.
Affiliation(s)
- Diana Sousa
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Andre Lamurias
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- Francisco M Couto
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
|
36
|
van der Velde KJ, van den Hoek S, van Dijk F, Hendriksen D, van Diemen CC, Johansson LF, Abbott KM, Deelen P, Sikkema‐Raddatz B, Swertz MA. A pipeline-friendly software tool for genome diagnostics to prioritize genes by matching patient symptoms to literature. Advanced Genetics (Hoboken, N.J.) 2020; 1:e10023. [PMID: 36619248 PMCID: PMC9744518 DOI: 10.1002/ggn2.10023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/18/2019] [Revised: 02/12/2020] [Accepted: 03/20/2020] [Indexed: 04/11/2023]
Abstract
Despite an explosive growth of next-generation sequencing data, genome diagnostics only provides a molecular diagnosis to a minority of patients. Software tools that prioritize genes based on patient symptoms using known gene-disease associations may complement variant filtering and interpretation to increase chances of success. However, many of these tools cannot be used in practice because they are embedded within variant prioritization algorithms, or exist as remote services that cannot be relied upon or are unacceptable because of legal/ethical barriers. In addition, many tools are not designed for command-line usage, or are closed-source, abandoned, or unavailable. We present Variant Interpretation using Biomedical literature Evidence (VIBE), a tool to prioritize disease genes based on Human Phenotype Ontology codes. VIBE is a locally installed executable that ensures operational availability and is built upon DisGeNET-RDF, a comprehensive knowledge platform containing gene-disease associations mostly from literature and variant-disease associations mostly from curated source databases. VIBE's command-line interface and output are designed for easy incorporation into bioinformatic pipelines that annotate and prioritize variants for further clinical interpretation. We evaluate VIBE in a benchmark based on 305 patient cases alongside seven other tools. Our results demonstrate that VIBE offers consistent performance with few cases missed, but we also find high complementarity among all tested tools. VIBE is a powerful, free, open source and locally installable solution for prioritizing genes based on patient symptoms. Project source code, documentation, benchmark and executables are available at https://github.com/molgenis/vibe.
Affiliation(s)
- K. Joeri van der Velde
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Sander van den Hoek
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Freerk van Dijk
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Prinses Maxima Center for Child Oncology, Utrecht, The Netherlands
- Dennis Hendriksen
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Cleo C. van Diemen
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Lennart F. Johansson
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Kristin M. Abbott
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Patrick Deelen
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Birgit Sikkema‐Raddatz
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Morris A. Swertz
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
|
37
|
Ong E, Wang LL, Schaub J, O'Toole JF, Steck B, Rosenberg AZ, Dowd F, Hansen J, Barisoni L, Jain S, de Boer IH, Valerius MT, Waikar SS, Park C, Crawford DC, Alexandrov T, Anderton CR, Stoeckert C, Weng C, Diehl AD, Mungall CJ, Haendel M, Robinson PN, Himmelfarb J, Iyengar R, Kretzler M, Mooney S, He Y. Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project. Nat Rev Nephrol 2020; 16:686-696. [PMID: 32939051 PMCID: PMC8012202 DOI: 10.1038/s41581-020-00335-w] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Accepted: 07/24/2020] [Indexed: 12/29/2022]
Abstract
An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National collaborative efforts such as the Kidney Precision Medicine Project are working towards this goal through the collection and integration of large, disparate clinical, biological and imaging data from patients with kidney disease. Ontologies are powerful tools that facilitate these efforts by enabling researchers to organize and make sense of different data elements and the relationships between them. Ontologies are critical to support the types of big data analysis necessary for kidney precision medicine, where heterogeneous clinical, imaging and biopsy data from diverse sources must be combined to define a patient's phenotype. The development of two new ontologies - the Kidney Tissue Atlas Ontology and the Ontology of Precision Medicine and Investigation - will support the creation of the Kidney Tissue Atlas, which aims to provide a comprehensive molecular, cellular and anatomical map of the kidney. These ontologies will improve the annotation of kidney-relevant data, and eventually lead to new definitions of kidney disease in support of precision medicine.
Collapse
Affiliation(s)
- Edison Ong
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Lucy L Wang
  - Allen Institute for Artificial Intelligence, Seattle, WA, USA
- Jennifer Schaub
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- John F O'Toole
  - Department of Nephrology and Hypertension, Glickman Urological and Kidney Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Becky Steck
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Avi Z Rosenberg
  - Department of Pathology, Johns Hopkins University, Baltimore, MD, USA
- Frederick Dowd
  - UW Medicine Research IT, University of Washington, Seattle, WA, USA
- Jens Hansen
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Laura Barisoni
  - Division of AI/Computational Pathology, Department of Pathology, and Division of Nephrology, Department of Medicine, Duke University, Durham, NC, USA
- Sanjay Jain
  - Division of Nephrology, School of Medicine, Washington University in St. Louis, St Louis, MO, USA
- Ian H de Boer
  - Division of Nephrology, Department of Medicine, University of Washington, Seattle, WA, USA
- M Todd Valerius
  - Division of Renal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Sushrut S Waikar
  - Section of Nephrology, Boston University Medical Center, Boston, MA, USA
- Christopher Park
  - Kidney Research Institute, University of Washington, Seattle, WA, USA
- Dana C Crawford
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Cleveland Institute for Computational Biology, Cleveland, OH, USA
- Theodore Alexandrov
  - Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Christian Stoeckert
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania Philadelphia, Philadelphia, PA, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Alexander D Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Christopher J Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Melissa Haendel
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jonathan Himmelfarb
  - Division of Nephrology, Department of Medicine, University of Washington, Seattle, WA, USA
  - Kidney Research Institute, University of Washington, Seattle, WA, USA
- Ravi Iyengar
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthias Kretzler
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Sean Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA.
- Yongqun He
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
  - Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA.
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA.
|
38
|
Zhu Q, Nguyen DT, Alyea G, Hanson K, Sid E, Pariser A. Phenotypically Similar Rare Disease Identification from an Integrative Knowledge Graph for Data Harmonization: Preliminary Study. JMIR Med Inform 2020; 8:e18395. [PMID: 33006565 PMCID: PMC7568218 DOI: 10.2196/18395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2020] [Revised: 08/02/2020] [Accepted: 08/19/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Although many efforts have been made to develop comprehensive disease resources that capture rare disease information for the purpose of clinical decision making and education, there is no standardized protocol for defining and harmonizing rare diseases across multiple resources. This introduces data redundancy and inconsistency that may ultimately increase confusion and difficulty for the wide use of these resources. To overcome such encumbrances, we report our preliminary study to identify phenotypical similarity among genetic and rare diseases (GARD) that present similar clinical manifestations, and to support further data harmonization. OBJECTIVE To support rare disease data harmonization, we aim to systematically identify phenotypically similar GARD diseases from a disease-oriented integrative knowledge graph and determine their similarity types. METHODS We identified phenotypically similar GARD diseases programmatically with 2 methods: (1) we measured disease similarity by comparing disease mappings between GARD and other rare disease resources, incorporating manual assessment; (2) we derived clinical manifestations presenting among sibling diseases from disease classifications and prioritized the identified similar diseases based on their phenotypes and genotypes. RESULTS For disease similarity comparison, approximately 87% (341/392) identified, phenotypically similar disease pairs were validated; 80% (271/392) of these disease pairs were accurately identified as phenotypically similar based on similarity score. The evaluation result shows a high precision (94%) and a satisfactory quality (86% F measure). By deriving phenotypical similarity from Monarch Disease Ontology (MONDO) and Orphanet disease classification trees, we identified a total of 360 disease pairs with at least 1 shared clinical phenotype and gene, which were applied for prioritizing clinical relevance.
A total of 662 phenotypically similar disease pairs were identified and will be applied for GARD data harmonization. CONCLUSIONS We successfully identified phenotypically similar rare diseases among the GARD diseases via 2 approaches, disease mapping comparison and phenotypical similarity derivation from disease classification systems. The results will not only direct GARD data harmonization in expanding translational science research but will also accelerate data transparency and consistency across different disease resources and terminologies, helping to build a robust and up-to-date knowledge resource on rare diseases.
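The abstract above reports precision (94%) and an F measure (86%) but no recall. As a rough worked example, the implied recall can be back-calculated. This is a minimal sketch that assumes the F measure is the balanced F1 score, which the abstract does not specify; the function name `recall_from_precision_f1` is hypothetical.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for the recall R.

    Assumes a balanced F1 score (an assumption, not stated in the abstract).
    """
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("F1 cannot be this large for the given precision")
    return f1 * precision / denom


# Values quoted in the abstract: precision 94%, F measure 86%.
implied_recall = recall_from_precision_f1(0.94, 0.86)
print(f"{implied_recall:.3f}")  # prints 0.793
```

Under that assumption the implied recall is about 79%.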
Affiliation(s)
- Qian Zhu
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
- Dac-Trung Nguyen
  - Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
- Karen Hanson
  - ICF International Inc, Rockville, MD, United States
- Eric Sid
  - Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, United States
- Anne Pariser
  - Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD, United States
|
39
|
Systematic identification of genetic systems associated with phenotypes in patients with rare genomic copy number variations. Hum Genet 2020; 140:457-475. [PMID: 32778951 DOI: 10.1007/s00439-020-02214-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Accepted: 07/30/2020] [Indexed: 01/02/2023]
Abstract
Copy number variation (CNV) related disorders tend to show complex phenotypic profiles that do not match known diseases. This makes it difficult to ascertain their underlying molecular basis. A potential solution is to compare the affected genomic regions for multiple patients that share a pathological phenotype, looking for commonalities. Here, we present a novel approach to associate phenotypes with functional systems, in terms of GO categories and KEGG and Reactome pathways, based on patient data. The approach uses genomic and phenomic data from the same patients, finding shared genomic regions between patients with similar phenotypes. These regions are mapped to genes to find associated functional systems. We applied the approach to analyse patients in the DECIPHER database with de novo CNVs, finding functional systems associated with most phenotypes, often due to mutations affecting related genes in the same genomic region. Manual inspection of the ten top-scoring phenotypes found multiple FunSys connections supported by the previous studies for seven of them. The workflow also produces reports focussed on the genes and FunSys connected to the different phenotypes, alongside patient-specific reports, which give details of the associated genes and FunSys for each individual in the cohort. These can be run in "confidential" mode, preserving patient confidentiality. The workflow presented here can be used to associate phenotypes with functional systems using data at the level of a whole cohort of patients, identifying important connections that could not be found when considering them individually. The full workflow is available for download, enabling it to be run on any patient cohort for which phenotypic and CNV data are available.
|
40
|
Li X, Lin X, Ren H, Guo J. Ontological Organization and Bioinformatic Analysis of Adverse Drug Reactions From Package Inserts: Development and Usability Study. J Med Internet Res 2020; 22:e20443. [PMID: 32706718 PMCID: PMC7400033 DOI: 10.2196/20443] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/19/2020] [Revised: 06/11/2020] [Accepted: 06/14/2020] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Licensed drugs may cause unexpected adverse reactions in patients, resulting in morbidity, risk of mortality, therapy disruptions, and prolonged hospital stays. Officially approved drug package inserts list the adverse reactions identified from randomized controlled clinical trials with high evidence levels and worldwide postmarketing surveillance. Formal representation of the adverse drug reaction (ADR) enclosed in semistructured package inserts will enable deep recognition of side effects and rational drug use, substantially reduce morbidity, and decrease societal costs. OBJECTIVE This paper aims to present an ontological organization of traceable ADR information extracted from licensed package inserts. In addition, it will provide machine-understandable knowledge for bioinformatics analysis, semantic retrieval, and intelligent clinical applications. METHODS Based on the essential content of package inserts, a generic ADR ontology model is proposed from two dimensions (and nine subdimensions), covering the ADR information and medication instructions. This is followed by a customized natural language processing method programmed with Python to retrieve the relevant information enclosed in package inserts. After the biocuration and identification of retrieved data from the package insert, an ADR ontology is automatically built for further bioinformatic analysis. RESULTS We collected 165 package inserts of quinolone drugs from the National Medical Products Administration and other drug databases in China, and built a specialized ADR ontology containing 2879 classes and 15,711 semantic relations. For each quinolone drug, the reported ADR information and medication instructions have been logically represented and formally organized in an ADR ontology. To demonstrate its usage, the source data were further bioinformatically analyzed. For example, the number of drug-ADR triples and major ADRs associated with each active ingredient were recorded. 
The 10 ADRs most frequently observed among quinolones were identified and categorized based on the 18 categories defined in the proposal. The occurrence frequency, severity, and ADR mitigation method explicitly stated in package inserts were also analyzed, as well as the top 5 specific populations with contraindications for quinolone drugs. CONCLUSIONS Ontological representation and organization using officially approved information from drug package inserts enables the identification and bioinformatic analysis of adverse reactions caused by a specific drug with regard to predefined ADR ontology classes and semantic relations. The resulting ontology-based ADR knowledge source classifies drug-specific adverse reactions, and supports a better understanding of ADRs and safer prescription of medications.
Affiliation(s)
- Xiaoying Li
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Xin Lin
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Huiling Ren
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
- Jinjing Guo
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
|
41
|
Gómez-López G, Dopazo J, Cigudosa JC, Valencia A, Al-Shahrour F. Precision medicine needs pioneering clinical bioinformaticians. Brief Bioinform 2020; 20:752-766. [PMID: 29077790 DOI: 10.1093/bib/bbx144] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 07/17/2017] [Revised: 09/14/2017] [Indexed: 01/18/2023] Open
Abstract
Success in precision medicine depends on accessing high-quality genetic and molecular data from large, well-annotated patient cohorts that couple biological samples to comprehensive clinical data, which in conjunction can lead to effective therapies. From such a scenario emerges the need for a new professional profile, an expert bioinformatician with training in clinical areas who can make sense of multi-omics data to improve therapeutic interventions in patients, and the design of optimized basket trials. In this review, we first describe the main policies and international initiatives that focus on precision medicine. Secondly, we review the currently ongoing clinical trials in precision medicine, introducing the concept of 'precision bioinformatics', and we describe current pioneering bioinformatics efforts aimed at implementing tools and computational infrastructures for precision medicine in health institutions around the world. Thirdly, we discuss the challenges related to the clinical training of bioinformaticians, and the urgent need for computational specialists capable of assimilating medical terminologies and protocols to address real clinical questions. We also propose some skills required to carry out common tasks in clinical bioinformatics and some tips for emergent groups. Finally, we explore the future perspectives and the challenges faced by precision medicine bioinformatics.
Affiliation(s)
- Joaquín Dopazo
  - Clinical Bioinformatics Area of the Fundación Progreso y Salud (Seville)
|
42
|
Study of the Gastrointestinal Heat Retention Syndrome in Children: From Diagnostic Model to Biological Basis. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2020; 2019:5303869. [PMID: 31929814 PMCID: PMC6942808 DOI: 10.1155/2019/5303869] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/03/2019] [Revised: 10/10/2019] [Accepted: 11/15/2019] [Indexed: 12/23/2022]
Abstract
Gastrointestinal heat retention syndrome (GHRS) refers to a condition that is associated with increased gastrointestinal heat caused by a metabolic block in energy. It is common in children and is closely related to the occurrence and development of recurrent respiratory tract infection, pneumonia, recurrent functional abdominal pain, etc. However, there are no standardized diagnostic criteria for GHRS. Therefore, this study aimed to establish a diagnostic model for GHRS in children and to explore its possible biological basis using systems biology. The Delphi method and Lasso analysis of clinical data were used to screen out the core symptoms. Nineteen core symptoms of GHRS in children were identified, including digestive symptoms such as dry stool, poor appetite, and vomiting, and some nervous system symptoms such as night restlessness and irritability. Based on the core symptoms, a GHRS diagnosis model was established using the eXtreme Gradient Boosting (XGBoost) method, and the accuracy of internal verification reached 93.03%. Relevant targets of the core symptoms in the Human Phenotype Ontology (HPO) were retrieved, target interactions were linked through the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, and core targets were selected after topological analysis using Cytoscape. Relevant biological processes and pathways were analyzed by applying the DAVID and KEGG databases. The enriched biological processes focused on cell proliferation, differentiation, apoptosis, and mitochondrial metabolism, which were mainly associated with the PI3K-AKT and MAPK network pathways and the Wnt signaling pathway. In conclusion, we established a diagnosis model of GHRS in children based on the core symptoms and provided an objective standard for its clinical diagnosis. The Wnt signaling pathway and the estrogen receptor-activated PI3K-AKT and MAPK network pathways may play important roles in the GHRS process.
|
43
|
|
44
|
Data-driven method to enhance craniofacial and oral phenotype vocabularies. J Am Dent Assoc 2019; 150:933-939.e2. [PMID: 31668172 DOI: 10.1016/j.adaj.2019.05.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/24/2019] [Revised: 05/29/2019] [Accepted: 05/31/2019] [Indexed: 01/29/2023]
Abstract
BACKGROUND A significant amount of clinical information captured as free-text narratives could be better used for several applications, such as clinical decision support, ontology development, evidence-based practice, and research. The Human Phenotype Ontology (HPO) is specifically used for semantic comparisons for diagnostic purposes. All these functions require quality coverage of the domain of interest. The authors used natural language processing to capture craniofacial and oral phenotype signatures from electronic health records and then used these signatures for evaluation of existing oral phenotype ontology coverage. METHODS The authors applied a text-processing pipeline based on the clinical Text Analysis and Knowledge Extraction System to annotate the clinical notes with Unified Medical Language System codes. The authors extracted the disease or disorder phenotype terms, which were then compared with HPO terms and their synonyms. RESULTS The authors retrieved 2,153 deidentified clinical notes from 558 patients. Finally, 2,416 unique diseases or disorders phenotype terms were extracted, which included 210 craniofacial or oral phenotype terms. Twenty-six of these phenotypes were not found in the HPO. CONCLUSIONS The authors demonstrated that natural language processing tools could extract relevant phenotype terms from clinical narratives, which could help identify gaps in existing ontologies and enhance craniofacial and dental phenotyping vocabularies. PRACTICAL IMPLICATIONS The expansion of terms in the dental, oral, and craniofacial domains in the HPO is particularly important as the dental community moves toward electronic health records.
|
45
|
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2019; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Haiyue Zhang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Huan Yang
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Miaoxin Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Zhi Xie
  - State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
- Weizhong Li
  - Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
|
46
|
Hyung D, Mallon AM, Kyung DS, Cho SY, Seong JK. TarGo: network based target gene selection system for human disease related mouse models. Lab Anim Res 2019; 35:23. [PMID: 32257911 PMCID: PMC7081697 DOI: 10.1186/s42826-019-0023-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022] Open
Abstract
Genetically engineered mouse models are used in high-throughput phenotyping screens to understand genotype-phenotype associations and their relevance to human diseases. However, not all mutant mouse lines with detectable phenotypes are associated with human diseases. Here, we propose the “Target gene selection system for Genetically engineered mouse models” (TarGo). Using a combination of human disease descriptions, network topology, and genotype-phenotype correlations, novel genes that are potentially related to human diseases are suggested. We constructed a gene interaction network using protein-protein interactions, molecular pathways, and co-expression data. Several repositories for human disease signatures were used to obtain information on human disease-related genes. We calculated disease- or phenotype-specific gene ranks using network topology and disease signatures. In conclusion, TarGo provides many novel features for gene function prediction.
Affiliation(s)
- Daejin Hyung
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea
- Ann-Marie Mallon
  - MRC Harwell Institute, Mammalian Genetics Unit, Oxfordshire, OX11 0RD UK
- Dong Soo Kyung
  - Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea
  - Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
  - Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
- Soo Young Cho
  - National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea
  - Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
- Je Kyung Seong
  - Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea
  - Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
  - Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
|
47
|
Yan CK, Wang WX, Zhang G, Wang JL, Patel A. BiRWDDA: A Novel Drug Repositioning Method Based on Multisimilarity Fusion. J Comput Biol 2019; 26:1230-1242. [DOI: 10.1089/cmb.2019.0063] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022] Open
Affiliation(s)
- Chao-Kun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Wen-Xiu Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jian-Lin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
|
48
|
Peng Y, Jiang Y, Radivojac P. Enumerating consistent sub-graphs of directed acyclic graphs: an insight into biomedical ontologies. Bioinformatics 2019; 34:i313-i322. [PMID: 29949985 PMCID: PMC6022688 DOI: 10.1093/bioinformatics/bty268] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022] Open
Abstract
Motivation Modern problems of concept annotation associate an object of interest (gene, individual, text document) with a set of interrelated textual descriptors (functions, diseases, topics), often organized in concept hierarchies or ontologies. Most ontologies can be seen as directed acyclic graphs (DAGs), where nodes represent concepts and edges represent relational ties between these concepts. Given an ontology graph, each object can only be annotated by a consistent sub-graph; that is, a sub-graph such that if an object is annotated by a particular concept, it must also be annotated by all other concepts that generalize it. Ontologies therefore provide a compact representation of a large space of possible consistent sub-graphs; however, until now we have not been aware of a practical algorithm that can enumerate such annotation spaces for a given ontology. Results We propose an algorithm for enumerating consistent sub-graphs of DAGs. The algorithm recursively partitions the graph into strictly smaller graphs until the resulting graph becomes a rooted tree (forest), for which a linear-time solution is computed. It then combines the tallies from graphs created in the recursion to obtain the final count. We prove the correctness of this algorithm, propose several practical accelerations, evaluate it on random graphs and then apply it to characterize four major biomedical ontologies. We believe this work provides valuable insights into the complexity of concept annotation spaces and its potential influence on the predictability of ontological annotation. Availability and implementation https://github.com/shawn-peng/counting-consistent-sub-DAG Supplementary information Supplementary data are available at Bioinformatics online.
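The rooted-tree base case mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation (see their repository for that); it is a minimal illustration under the assumption that a consistent sub-graph is a node set closed under "all ancestors included". The function names and the toy graphs are hypothetical. For a rooted tree, the number of consistent sub-graphs containing node `v` is the product of `1 + count(child)` over the children of `v`, which takes linear time; a brute-force enumerator over a small DAG is included only as a cross-check.

```python
from itertools import combinations


def count_consistent_tree(children, root):
    """Count non-empty ancestor-closed node sets of a rooted tree.

    `children` maps each node to a list of its child nodes.
    Every non-empty consistent set must contain the root.
    """
    # Build a pre-order traversal iteratively, then process it in reverse
    # so that every child is counted before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    count = {}
    for node in reversed(order):
        total = 1
        for child in children.get(node, []):
            total *= 1 + count[child]  # exclude the child subtree, or pick one of its sets
        count[node] = total
    return count[root]


def count_consistent_bruteforce(parents):
    """Exponential-time check for small DAGs; the empty set is counted too.

    `parents` maps each node to a list of its direct parents.
    """
    nodes = list(parents)
    found = 0
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            # Closure under direct parents implies closure under all ancestors.
            if all(p in chosen for v in chosen for p in parents[v]):
                found += 1
    return found


# Toy tree: a -> b, a -> c, b -> d
tree_children = {"a": ["b", "c"], "b": ["d"]}
print(count_consistent_tree(tree_children, "a"))  # prints 6
```

The brute-force count for the same tree is 7, because it also includes the empty annotation.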
Affiliation(s)
- Yisu Peng
  - Department of Computer Science, Indiana University, Bloomington, USA
- Yuxiang Jiang
  - Department of Computer Science, Indiana University, Bloomington, USA
- Predrag Radivojac
  - Department of Computer Science, Indiana University, Bloomington, USA
|
49
|
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
Affiliation(s)
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hengqiang Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Pingping Wang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyang Zhou
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianxin Li
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junwei Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Shulin Liu
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada.
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
|
50
|
Köhler S, Øien NC, Buske OJ, Groza T, Jacobsen JOB, McNamara C, Vasilevsky N, Carmody LC, Gourdine JP, Gargano M, McMurry J, Danis D, Mungall CJ, Smedley D, Haendel M, Robinson PN. Encoding Clinical Data with the Human Phenotype Ontology for Computational Differential Diagnostics. CURRENT PROTOCOLS IN HUMAN GENETICS 2019; 103:e92. [PMID: 31479590 PMCID: PMC6814016 DOI: 10.1002/cphg.92] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/25/2022]
Abstract
The Human Phenotype Ontology (HPO) is a standardized set of phenotypic terms that are organized in a hierarchical fashion. It is a widely used resource for capturing human disease phenotypes for computational analysis to support differential diagnostics. The HPO is frequently used to create a set of terms that accurately describe the observed clinical abnormalities of an individual being evaluated for suspected rare genetic disease. This profile is compared with computational disease profiles in the HPO database with the aim of identifying genetic diseases with comparable phenotypic profiles. The computational analysis can be coupled with the analysis of whole-exome or whole-genome sequencing data through applications such as Exomiser. This article explains how to choose an optimal set of HPO terms for these cases and enter them with software, such as PhenoTips and PatientArchive, and demonstrates how to use Phenomizer and Exomiser to generate a computational differential diagnosis. © 2019 by John Wiley & Sons, Inc.
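The profile comparison described in this abstract can be illustrated with a minimal sketch. It is not the scoring used by Phenomizer or Exomiser; it assumes a plain Jaccard overlap between ancestor-expanded term sets. The term names, the `PARENTS` hierarchy, and the disease profiles are hypothetical toy values, not real HPO identifiers or annotations.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy hierarchy (term -> parent terms); NOT real HPO content.
PARENTS: Dict[str, List[str]] = {
    "abnormality": [],
    "neuro": ["abnormality"],
    "seizure": ["neuro"],
    "ataxia": ["neuro"],
    "cardiac": ["abnormality"],
    "arrhythmia": ["cardiac"],
}


def with_ancestors(terms: Iterable[str]) -> Set[str]:
    """Return the given terms together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(
    patient_terms: Iterable[str],
    disease_profiles: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Score each disease profile against the patient profile, best first."""
    patient = with_ancestors(patient_terms)
    scored = [
        (name, jaccard(patient, with_ancestors(terms)))
        for name, terms in disease_profiles.items()
    ]
    # Sort by descending score, then by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical disease profiles used only for illustration.
DISEASES = {
    "disease_A": ["seizure", "ataxia"],
    "disease_B": ["arrhythmia"],
}

ranking = rank_diseases(["seizure"], DISEASES)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Expanding each profile to include ancestor terms lets two profiles share credit for related but non-identical terms, which is one simple way a hierarchical vocabulary can inform the comparison.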
Affiliation(s)
- Sebastian Köhler
  - Charité Centrum für Therapieforschung, Charité-Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin 10117, Germany
  - Einstein Center Digital Future, Berlin 10117, Germany
  - Monarch Initiative, monarchinitiative.org
- Julius OB Jacobsen
  - Monarch Initiative, monarchinitiative.org
  - Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Nicole Vasilevsky
  - Monarch Initiative, monarchinitiative.org
  - Oregon Health & Science University, Portland, OR 97217, USA
- Leigh C Carmody
  - Monarch Initiative, monarchinitiative.org
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- JP Gourdine
  - Monarch Initiative, monarchinitiative.org
  - Oregon Health & Science University, Portland, OR 97217, USA
- Michael Gargano
  - Monarch Initiative, monarchinitiative.org
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Julie McMurry
  - Monarch Initiative, monarchinitiative.org
  - Oregon State University, Corvallis, OR, USA
- Daniel Danis
  - Monarch Initiative, monarchinitiative.org
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Christopher J Mungall
  - Monarch Initiative, monarchinitiative.org
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Damian Smedley
  - Monarch Initiative, monarchinitiative.org
  - Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Melissa Haendel
  - Monarch Initiative, monarchinitiative.org
  - Oregon Health & Science University, Portland, OR 97217, USA
  - Oregon State University, Corvallis, OR, USA
- Peter N Robinson
  - Monarch Initiative, monarchinitiative.org
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
|