201
Konte B, Leicht G, Giegling I, Pogarell O, Karch S, Hartmann AM, Friedl M, Hegerl U, Rujescu D, Mulert C. A genome-wide association study of early gamma-band response in a schizophrenia case-control sample. World J Biol Psychiatry 2018; 19:602-609. [PMID: 28922980 DOI: 10.1080/15622975.2017.1366054] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
OBJECTIVES Disturbances in the gamma-frequency band of electroencephalography (EEG) measures are among the most consistently observed intermediate phenotypes in schizophrenia. We assessed whether genetic variations are associated with gamma-band activity. METHODS We performed a genome-wide association analysis of the early auditory evoked gamma-band response in schizophrenia affected subjects and healthy control individuals (in total N = 315). RESULTS No marker surpassed the threshold for genome-wide significant association. Several of the markers that were closest to significance mapped to genes involved in neuronal development and the Neuregulin-ErbB signalling network, such as NRG2 and KALRN. Using a gene-set enrichment analysis, we found suggestive evidence for association with genes involved in EEG abnormality (P = .048). CONCLUSIONS We identified no marker genome-wide significantly associating with gamma response; independent replication of the gene-set analysis result and larger sample sizes will be required to provide leads to cellular pathways involved in gamma-band activity.
Affiliation(s)
- Bettina Konte
- Department of Psychiatry, Psychotherapy and Psychosomatics, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Gregor Leicht
- Psychiatry Neuroimaging Branch, Imaging Center NeuroImage Nord and Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Ina Giegling
- Department of Psychiatry, Psychotherapy and Psychosomatics, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Oliver Pogarell
- Department of Psychiatry and Psychotherapy, Ludwig-Maximilians-University, Munich, Germany
- Susanne Karch
- Department of Psychiatry and Psychotherapy, Ludwig-Maximilians-University, Munich, Germany
- Annette M Hartmann
- Department of Psychiatry, Psychotherapy and Psychosomatics, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Marion Friedl
- Department of Psychiatry, Psychotherapy and Psychosomatics, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Ulrich Hegerl
- Department of Psychiatry and Psychotherapy, University of Leipzig, Leipzig, Germany
- Dan Rujescu
- Department of Psychiatry, Psychotherapy and Psychosomatics, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Christoph Mulert
- Psychiatry Neuroimaging Branch, Imaging Center NeuroImage Nord and Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
202
Zhang W, Lei X. Two-step Random Walk Algorithm to Identify Cancer Genes Based on Various Biological Data. 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) 2018:1296-1301. [DOI: 10.1109/bibm.2018.8621448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
203
Weustenfeld M, Eidelpes R, Schmuth M, Rizzo WB, Zschocke J, Keller MA. Genotype and phenotype variability in Sjögren-Larsson syndrome. Hum Mutat 2018; 40:177-186. [PMID: 30372562 PMCID: PMC6587760 DOI: 10.1002/humu.23679] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/28/2018] [Revised: 10/10/2018] [Accepted: 10/25/2018] [Indexed: 12/24/2022]
Abstract
The Sjögren-Larsson syndrome (SLS) is a rare autosomal recessive disorder caused by pathogenic variants in the ALDH3A2 gene, which codes for fatty aldehyde dehydrogenase (FALDH). FALDH prevents the accumulation of toxic fatty aldehydes by converting them into fatty acids. Pathogenic ALDH3A2 variants cause symptoms such as ichthyosis, spasticity, intellectual disability, and a wide range of less common clinical features. Interpreting patient-to-patient variability is often complicated by inconsistent reporting and negatively impacts on establishing robust criteria to measure the success of SLS treatments. Thus, with this study, patient-centered literature data was merged into a concise genotype-based, open-access database (www.LOVD.nl/ALDH3A2). One hundred and seventy eight individuals with 90 unique SLS-causing variants were included with phenotypic data being available for more than 90%. While the three lead symptoms did occur in almost all cases, more heterogeneity was observed for other frequent clinical manifestations of SLS. However, a stringent genotype-phenotype correlation analysis was hampered by the considerable variability in reporting phenotypic features. Consequently, we compiled a set of recommendations of how to generate comprehensive SLS patient descriptions in the future. This will be of benefit on multiple levels, for example, in clinical diagnosis, basic research, and the development of novel treatment options for SLS.
Affiliation(s)
- Reiner Eidelpes
- Center for Molecular Biosciences Innsbruck (CMBI), Institute of Organic Chemistry, University of Innsbruck, Innsbruck, Austria
- Matthias Schmuth
- Department of Dermatology, Venereology and Allergology, Medical University of Innsbruck, Innsbruck, Austria
- William B Rizzo
- Department of Pediatrics, UNMC Child Health Research Institute, University of Nebraska Medical Center, Omaha, NE, USA
- Johannes Zschocke
- Division of Human Genetics, Medical University of Innsbruck, Innsbruck, Austria
- Markus A Keller
- Division of Human Genetics, Medical University of Innsbruck, Innsbruck, Austria
204
Comparative expression profiling reveals widespread coordinated evolution of gene expression across eukaryotes. Nat Commun 2018; 9:4963. [PMID: 30470754 PMCID: PMC6251915 DOI: 10.1038/s41467-018-07436-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/16/2018] [Accepted: 10/24/2018] [Indexed: 12/17/2022]
Abstract
Comparative studies of gene expression across species have revealed many important insights, but have also been limited by the number of species represented. Here we develop an approach to identify orthologs between highly diverged transcriptome assemblies, and apply this to 657 RNA-seq gene expression profiles from 309 diverse unicellular eukaryotes. We analyzed the resulting data for coevolutionary patterns, and identify several hundred protein complexes and pathways whose expression levels have evolved in a coordinated fashion across the trillions of generations separating these species, including many gene sets with little or no within-species co-expression across environmental or genetic perturbations. We also detect examples of adaptive evolution, for example of tRNA ligase levels to match genome-wide codon usage. In sum, we find that comparative studies from extremely diverse organisms can reveal new insights into the evolution of gene expression, including coordinated evolution of some of the most conserved protein complexes in eukaryotes. Gene pairs that are coexpressed across various environmental conditions in multiple species suggest functional similarity. Here the authors analyze patterns of gene expression co-evolution across diverse eukaryotes, and identify hundreds of protein complexes and pathways whose gene expression levels have co-evolved since their ancient divergence.
205
Kolberg L, Kuzmin I, Adler P, Vilo J, Peterson H. funcExplorer: a tool for fast data-driven functional characterisation of high-throughput expression data. BMC Genomics 2018; 19:817. [PMID: 30428831 PMCID: PMC6236982 DOI: 10.1186/s12864-018-5176-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/02/2018] [Accepted: 10/16/2018] [Indexed: 12/25/2022]
Abstract
BACKGROUND A widely applied approach to extract knowledge from high-throughput genomic data is clustering of gene expression profiles followed by functional enrichment analysis. This type of analysis, when done manually, is highly subjective and has limited reproducibility. Moreover, this pipeline can be very time-consuming and resource-demanding as enrichment analysis is done for tens to hundreds of clusters at a time. Thus, the task often needs programming skills to form a pipeline of different software tools or R packages to enable an automated approach. Furthermore, visualising the results can be challenging. RESULTS We developed a web tool, funcExplorer, which automatically combines hierarchical clustering and enrichment analysis to detect functionally related gene clusters. The functional characterisation is achieved using structured knowledge from data sources such as Gene Ontology, KEGG and Reactome pathways, Human Protein Atlas, and Human Phenotype Ontology. funcExplorer includes various measures for finding biologically meaningful clusters, provides a modern graphical user interface, and has wide-ranging data export and sharing options as well as software transparency by open-source code. The results are presented in a visually compact and interactive format, enabling users to explore the biological essence of the data. We compared our results with previously published gene clusters to demonstrate that funcExplorer can perform the data characterisation equally well, but without requiring labour-intensive manual interference. CONCLUSIONS The open-source web tool funcExplorer enables scientists with high-throughput genomic data to obtain a preliminary interactive overview of the expression patterns, gene names, and shared functionalities in their dataset in a visually pleasing format. funcExplorer is publicly available at https://biit.cs.ut.ee/funcexplorer.
Affiliation(s)
- Liis Kolberg
- Institute of Computer Science, University of Tartu, Juhan Liivi 2, Tartu, Estonia
- Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Juhan Liivi 2, Tartu, Estonia
- Priit Adler
- Institute of Computer Science, University of Tartu, Juhan Liivi 2, Tartu, Estonia
- Quretec Ltd, Ülikooli 6a, Tartu, Estonia
- Jaak Vilo
- Institute of Computer Science, University of Tartu, Juhan Liivi 2, Tartu, Estonia
- Quretec Ltd, Ülikooli 6a, Tartu, Estonia
- Hedi Peterson
- Institute of Computer Science, University of Tartu, Juhan Liivi 2, Tartu, Estonia
- Quretec Ltd, Ülikooli 6a, Tartu, Estonia
206
Zhu L, Hua G, Zafar S, Pan Y. Fundamental ideas and mathematical basis of ontology learning algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-169769] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Affiliation(s)
- Linli Zhu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- School of Computer Engineering, Jiangsu University of Technology, Changzhou, China
- Gang Hua
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Sohail Zafar
- University of Management and Technology (UMT), Lahore, Pakistan
- Yu Pan
- School of Computer Engineering, Jiangsu University of Technology, Changzhou, China
207
Köhler S. Improved ontology-based similarity calculations using a study-wise annotation model. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:4953405. [PMID: 29688377 PMCID: PMC5868182 DOI: 10.1093/database/bay026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/01/2017] [Accepted: 02/20/2018] [Indexed: 11/13/2022]
Abstract
A typical use case of ontologies is the calculation of similarity scores between items that are annotated with classes of the ontology. For example, in differential diagnostics and disease gene prioritization, the human phenotype ontology (HPO) is often used to compare a query phenotype profile against gold-standard phenotype profiles of diseases or genes. The latter have long been constructed as flat lists of ontology classes, which, as we show in this work, can be improved by exploiting existing structure and information in annotation datasets or full text disease descriptions. We derive a study-wise annotation model of diseases and genes and show that this can improve the performance of semantic similarity measures. Inferred weights of individual annotations are one reason for this improvement, but more importantly using the study-wise structure further boosts the results of the algorithms according to precision-recall analyses. We test the study-wise annotation model for diseases annotated with classes from the HPO and for genes annotated with gene ontology (GO) classes. We incorporate this annotation model into similarity algorithms and show how this leads to improved performance. This work adds weight to the need for enhancing simple list-based representations of disease or gene annotations. We show how study-wise annotations can be automatically derived from full text summaries of disease descriptions and from the annotation data provided by the GO Consortium and how semantic similarity measure can utilize this extended annotation model. Database URL: https://phenomics.github.io/
Affiliation(s)
- Sebastian Köhler
- NeuroCure Clinical Research Center, Charité Universitätsklinikum, Charitéplatz 1, 10117 Berlin, Germany
208
Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits. Am J Hum Genet 2018; 103:535-552. [PMID: 30290150 DOI: 10.1016/j.ajhg.2018.08.017] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 05/16/2018] [Accepted: 08/28/2018] [Indexed: 01/29/2023]
Abstract
Although recent studies provide evidence for a common genetic basis between complex traits and Mendelian disorders, a thorough quantification of their overlap in a phenotype-specific manner remains elusive. Here, we have quantified the overlap of genes identified through large-scale genome-wide association studies (GWASs) for 62 complex traits and diseases with genes containing mutations known to cause 20 broad categories of Mendelian disorders. We identified a significant enrichment of genes linked to phenotypically matched Mendelian disorders in GWAS gene sets; of the total 1,240 comparisons, a higher proportion of phenotypically matched or related pairs (n = 50 of 92 [54%]) than phenotypically unmatched pairs (n = 27 of 1,148 [2%]) demonstrated significant overlap, confirming a phenotype-specific enrichment pattern. Further, we observed elevated GWAS effect sizes near genes linked to phenotypically matched Mendelian disorders. Finally, we report examples of GWAS variants localized at the transcription start site or physically interacting with the promoters of genes linked to phenotypically matched Mendelian disorders. Our results are consistent with the hypothesis that genes that are disrupted in Mendelian disorders are dysregulated by non-coding variants in complex traits and demonstrate how leveraging findings from related Mendelian disorders and functional genomic datasets can prioritize genes that are putatively dysregulated by local and distal non-coding GWAS variants.
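The headline comparison in the abstract above (50 of 92 phenotypically matched pairs versus 27 of 1,148 unmatched pairs showing significant overlap) can be re-derived with a short calculation. The sketch below is only an illustration: it recomputes the two proportions and applies a one-sided hypergeometric tail test to the resulting 2x2 table. The abstract does not say which test the authors used for this particular contrast, so the choice of test and the helper name `hypergeom_upper_tail` are assumptions.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, draws, successes, population):
    """P(X >= k) when `draws` items are sampled without replacement from a
    population of size `population` containing `successes` marked items."""
    total = comb(population, draws)
    upper = min(draws, successes)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    # Exact rational arithmetic first, then a single conversion to float.
    return float(Fraction(numerator, total))


# Counts quoted in the abstract.
matched_sig, matched_total = 50, 92
unmatched_sig, unmatched_total = 27, 1148

matched_rate = matched_sig / matched_total        # about 54%
unmatched_rate = unmatched_sig / unmatched_total  # about 2%

all_pairs = matched_total + unmatched_total       # 1,240 comparisons
all_sig = matched_sig + unmatched_sig             # 77 significant overlaps

# Probability of seeing at least 50 significant pairs among the 92 matched
# ones if significance were unrelated to phenotype matching (assumed null).
p_value = hypergeom_upper_tail(matched_sig, matched_total, all_sig, all_pairs)

print(f"matched: {matched_rate:.0%}, unmatched: {unmatched_rate:.0%}")
print(f"one-sided hypergeometric p-value: {p_value:.3g}")
```

The exact-arithmetic route avoids any external dependency; with SciPy available, a standard Fisher's exact test on the same table would serve the same purpose.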
209
Gao W, L.G. Guirao J, Basavanagoud B, Wu J. Partial multi-dividing ontology learning algorithm. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.049] [Citation(s) in RCA: 148] [Impact Index Per Article: 21.1] [Indexed: 01/08/2023]
210
Normand EA, Braxton A, Nassef S, Ward PA, Vetrini F, He W, Patel V, Qu C, Westerfield LE, Stover S, Dharmadhikari AV, Muzny DM, Gibbs RA, Dai H, Meng L, Wang X, Xiao R, Liu P, Bi W, Xia F, Walkiewicz M, Van den Veyver IB, Eng CM, Yang Y. Clinical exome sequencing for fetuses with ultrasound abnormalities and a suspected Mendelian disorder. Genome Med 2018; 10:74. [PMID: 30266093 PMCID: PMC6162951 DOI: 10.1186/s13073-018-0582-x] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 03/09/2018] [Accepted: 09/12/2018] [Indexed: 12/11/2022]
Abstract
Background Exome sequencing is now being incorporated into clinical care for pediatric and adult populations, but its integration into prenatal diagnosis has been more limited. One reason for this is the paucity of information about the clinical utility of exome sequencing in the prenatal setting. Methods We retrospectively reviewed indications, results, time to results (turnaround time, TAT), and impact of exome results for 146 consecutive “fetal exomes” performed in a clinical diagnostic laboratory between March 2012 and November 2017. We define a fetal exome as one performed on a sample obtained from a fetus or a product of conception with at least one structural anomaly detected by prenatal imaging or autopsy. Statistical comparisons were performed using Fisher’s exact test. Results Prenatal exome yielded an overall molecular diagnostic rate of 32% (n = 46/146). Of the 46 molecular diagnoses, 50% were autosomal dominant disorders (n = 23/46), 41% were autosomal recessive disorders (n = 19/46), and 9% were X-linked disorders (n = 4/46). The molecular diagnostic rate was highest for fetuses with anomalies affecting multiple organ systems and for fetuses with craniofacial anomalies. Out of 146 cases, a prenatal trio exome option designed for ongoing pregnancies was performed on 62 fetal specimens, resulting in a diagnostic yield of 35% with an average TAT of 14 days for initial reporting (excluding tissue culture time). The molecular diagnoses led to refined recurrence risk estimates, altered medical management, and informed reproductive planning for families. Conclusion Exome sequencing is a useful diagnostic tool when fetal structural anomalies suggest a genetic etiology, but other standard prenatal genetic tests did not provide a diagnosis. Electronic supplementary material The online version of this article (10.1186/s13073-018-0582-x) contains supplementary material, which is available to authorized users.
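As a quick consistency check on the figures quoted in the abstract above, the percentages can be recomputed from the raw counts. This is a minimal sketch; the helper `percent` and the variable names are illustrative and do not come from the paper.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts quoted in the abstract.
total_cases = 146
diagnosed = 46
by_inheritance = {
    "autosomal dominant": 23,
    "autosomal recessive": 19,
    "X-linked": 4,
}

# Overall molecular diagnostic rate: 46 of 146 fetal exomes.
overall_yield = percent(diagnosed, total_cases)

# Share of each inheritance mode among the 46 molecular diagnoses.
inheritance_share = {
    mode: percent(count, diagnosed) for mode, count in by_inheritance.items()
}

# The three inheritance categories should account for every diagnosis.
assert sum(by_inheritance.values()) == diagnosed

print(f"overall diagnostic rate: {overall_yield}%")
for mode, share in inheritance_share.items():
    print(f"{mode}: {share}%")
```

The recomputed values agree with the reported 32%, 50%, 41%, and 9%.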
Affiliation(s)
- Elizabeth A Normand
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Alicia Braxton
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Salma Nassef
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Patricia A Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Lauren E Westerfield
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Samantha Stover
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Donna M Muzny
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Hongzheng Dai
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Linyan Meng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Xia Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Rui Xiao
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Pengfei Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Weimin Bi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Fan Xia
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Magdalena Walkiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA; Present address: The National Institute of Allergy and Infectious Disease, NIH, Bethesda, MD, USA
- Ignatia B Van den Veyver
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX, USA
- Christine M Eng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Yaping Yang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
211
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
212
Meeting Patients' Right to the Correct Diagnosis: Ongoing International Initiatives on Undiagnosed Rare Diseases and Ethical and Social Issues. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2018; 15:ijerph15102072. [PMID: 30248891 PMCID: PMC6210164 DOI: 10.3390/ijerph15102072] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 07/02/2018] [Revised: 09/14/2018] [Accepted: 09/18/2018] [Indexed: 12/19/2022]
Abstract
The time required to reach a correct diagnosis is a key concern for rare disease (RD) patients. Diagnostic delay can be intolerably long, often described as an “odyssey” and, for some, a diagnosis may remain frustratingly elusive. The International Rare Disease Research Consortium proposed, as ultimate goal for 2017–2027, to enable all people with a suspected RD to be diagnosed within one year of presentation, if the disorder is known. Subsequently, unsolved cases would enter a globally coordinated diagnostic and research pipeline. In-depth analysis of the genotype through next generation sequencing, together with a standardized in-depth phenotype description and sophisticated high-throughput approaches, have been applied as diagnostic tools to increase the chance of a timely and accurate diagnosis. The success of this approach is evident in the Orphanet database. From 2010 to March 2017 over 600 new RDs and roughly 3600 linked genes have been described and identified. However, combination of -omics and phenotype data, as well as international sharing of this information, has raised ethical concerns. Values to be assessed include not only patient autonomy but also family implications, beneficence, non-maleficence, justice, solidarity and reciprocity, which must be respected and promoted and, at the same time, balanced among each other. In this work we suggest that, to maximize patients’ involvement in the search for a diagnosis and identification of new causative genes, undiagnosed patients should have the possibility to: (1) actively participate in the description of their phenotype; (2) choose the level of visibility of their profile in matchmaking databases; (3) express their preferences regarding return of new findings, in particular which level of Variant of Unknown Significance (VUS) significance should be considered relevant to them. 
The quality of the relationship between individual patients and physicians, and between the patient community and the scientific community, is critically important for optimizing the use of available data and enabling international collaboration in order to provide a diagnosis, and the attached support, to unsolved cases. The contribution of patients to collecting and coding data comprehensively is critical for efficient use of data downstream of data collection.
213
Taeubner J, Wieczorek D, Yasin L, Brozou T, Borkhardt A, Kuhlen M. Penetrance and Expressivity in Inherited Cancer Predisposing Syndromes. Trends Cancer 2018; 4:718-728. [PMID: 30352675 DOI: 10.1016/j.trecan.2018.09.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/09/2018] [Revised: 09/01/2018] [Accepted: 09/06/2018] [Indexed: 02/07/2023]
Abstract
Inherited diseases are not always expressed in the same way in every individual that carries the same variant in a disease-causing gene. This phenomenon is known as reduced or incomplete penetrance. Variable and incomplete penetrance may explain why inherited diseases are occasionally transmitted through unaffected parents, but also why clinically healthy individuals can carry potentially pathogenic variants without expressing features of the disease. Here, we will provide an overview of factors that play a fundamental role in the concept of penetrance and expressivity of cancer predisposing genes in children with malignancies. These findings are important to understand the complexity of inherited diseases and cancer development and to improve genetic counselling for the affected families.
Affiliation(s)
- Julia Taeubner
- Department of Pediatric Oncology, Hematology and Clinical Immunology, University Children's Hospital, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Dagmar Wieczorek
- Institute of Human Genetics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Layal Yasin
- Department of Pediatric Oncology, Hematology and Clinical Immunology, University Children's Hospital, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Triantafyllia Brozou
- Department of Pediatric Oncology, Hematology and Clinical Immunology, University Children's Hospital, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Arndt Borkhardt
- Department of Pediatric Oncology, Hematology and Clinical Immunology, University Children's Hospital, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Michaela Kuhlen
- Department of Pediatric Oncology, Hematology and Clinical Immunology, University Children's Hospital, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
214
Wood L, Bassez G, Bleyenheuft C, Campbell C, Cossette L, Jimenez-Moreno AC, Dai Y, Dawkins H, Manera JAD, Dogan C, el Sherif R, Fossati B, Graham C, Hilbert J, Kastreva K, Kimura E, Korngut L, Kostera-Pruszczyk A, Lindberg C, Lindvall B, Luebbe E, Lusakowska A, Mazanec R, Meola G, Orlando L, Takahashi MP, Peric S, Puymirat J, Rakocevic-Stojanovic V, Rodrigues M, Roxburgh R, Schoser B, Segovia S, Shatillo A, Thiele S, Tournev I, van Engelen B, Vohanka S, Lochmüller H. Eight years after an international workshop on myotonic dystrophy patient registries: case study of a global collaboration for a rare disease. Orphanet J Rare Dis 2018; 13:155. [PMID: 30185236 PMCID: PMC6126043 DOI: 10.1186/s13023-018-0889-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 02/27/2018] [Accepted: 08/12/2018] [Indexed: 11/12/2022]
Abstract
BACKGROUND Myotonic Dystrophy is the most common form of muscular dystrophy in adults, affecting an estimated 10 per 100,000 people. It is a multisystemic disorder affecting multiple generations with increasing severity. There are currently no licenced therapies to reverse, slow down or cure its symptoms. In 2009 TREAT-NMD (a global alliance with the mission of improving trial readiness for neuromuscular diseases) and the Marigold Foundation held a workshop of key opinion leaders to agree a minimal dataset for patient registries in myotonic dystrophy. Eight years after this workshop, we surveyed 22 registries collecting information on myotonic dystrophy patients to assess the proliferation and utility of the dataset agreed in 2009. These registries represent over 10,000 myotonic dystrophy patients worldwide (Europe, North America, Asia and Oceania). RESULTS The registries use a variety of data collection methods (e.g. online patient surveys or clinician-led entry) and have a variety of budgets (from being run by volunteers to annual budgets over €200,000). All registries collect at least some of the originally agreed data items, and a number of additional items have been suggested, in particular items on cognitive impact. CONCLUSIONS The community should consider how to maximise this collective resource in future therapeutic programmes.
Affiliation(s)
- Libby Wood
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
| | - Guillaume Bassez
- Centre de référence des maladies neuromusculaires, Hôpital Henri Mondor, Paris, France
| | | | | | - Louise Cossette
- Centre de recherche du CHU de Québec, Université Laval, Quebec, Canada
| | | | - Yi Dai
- Department of Neurology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, China
| | - Hugh Dawkins
- Office of Population Health Genomics, Perth, Western Australia
| | | | - Celine Dogan
- Centre de référence des maladies neuromusculaires, Hôpital Henri Mondor, Paris, France
| | - Rasha el Sherif
- Neuromuscular & Neuro-genetics Unit, Air Hospital, Cairo, Egypt
| | - Barbara Fossati
- U.O. Neurology and Stroke Unit, IRCCS Policlinico San Donato, San Donato Milanese, Milan, Italy
| | - Caroline Graham
- Office of Population Health Genomics, Perth, Western Australia
| | - James Hilbert
- Department of Neurology, University of Rochester Medical Center, Rochester, NY USA
| | - Kristinia Kastreva
- Department of Neurology, Alexandrovska University Hospital, Medical University, Sofia, Bulgaria
| | - En Kimura
- Department of Promoting Clinical Trial and Translational Medicine, National Center for Neurology and Psychiatry, Translational Medical Center, Kodaira, Japan
| | | | | | | | | | - Elizabeth Luebbe
- Department of Neurology, University of Rochester Medical Center, Rochester, NY USA
| | - Anna Lusakowska
- Department of Neurology, Medical University of Warsaw, Warszawa, Poland
| | - Radim Mazanec
- University Hospital Prague-Motol and Charles University Prague, Prague, Czech Republic
| | - Giovani Meola
- U.O. Neurology and Stroke Unit, IRCCS Policlinico San Donato, San Donato Milanese, Milan, Italy
| | | | - Masanori P. Takahashi
- Department of Functional Diagnostic Science, Osaka University Graduate School of Medicine, Suita, Japan
| | - Stojan Peric
- Neurology Clinic, School of Medicine, University of Belgrade, Belgrade, Serbia
| | - Jack Puymirat
- Centre de recherche du CHU de Québec, Université Laval, Quebec, Canada
| | | | - Miriam Rodrigues
- Neurology, Auckland City Hospital, Private Bag 92024, Auckland, 1142 New Zealand
| | - Richard Roxburgh
- Neurology, Auckland City Hospital, Private Bag 92024, Auckland, 1142 New Zealand
| | - Benedikt Schoser
- Friedrich-Baur-Institute, Department of Neurology, Klinikum München, Munich, Germany
| | - Sonia Segovia
- Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), Valencia, Spain
| | - Andriy Shatillo
- Institute of Neurology, Psychiatry and Narcology, Academy of medical science of Ukraine, Kharkiv, Ukraine
| | - Simone Thiele
- Friedrich-Baur-Institute, Department of Neurology, Klinikum München, Munich, Germany
| | - Ivailo Tournev
- Department of Neurology, Alexandrovska University Hospital, Medical University, Sofia, Bulgaria
| | | | - Stanislav Vohanka
- University Hospital and Masaryk University Brno, Brno, Czech Republic
| | - Hanns Lochmüller
- Department of Neuropediatrics and Muscle Disorders, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Centro Nacional de Análisis Genómico (CNAG-CRG), Center for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| |
|
215
|
Bianchi L, Liò P. Opportunities for community awareness platforms in personal genomics and bioinformatics education. Brief Bioinform 2018; 18:1082-1090. [PMID: 27580620 DOI: 10.1093/bib/bbw078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/12/2016] [Indexed: 01/16/2023]
Abstract
Precision and personalized medicine will be increasingly based on the integration of various types of information, particularly electronic health records and genome sequences. The availability of cheap genome sequencing services and information interoperability will increase the role of online bioinformatics analysis. Being on the Internet poses constant threats to security and privacy. While we are connected and share information, websites and internet services collect various types of personal data with or without the user's consent. It is likely that genomics will merge with the internet culture of connectivity. This process will increase incidental findings, exposure and vulnerability. Here we discuss the social vulnerability arising from the combined security and privacy weaknesses of the genome and the Internet. This calls for more effort in education and social awareness on how biomedical data are analysed and transferred through the internet and how inferential methods could integrate information from different sources. We propose that digital social platforms, used for raising collective awareness in different fields, could be developed for collaborative and bottom-up efforts in education. In this context, bioinformaticians could play a meaningful role in mitigating the future risk of a digital-genomic divide.
|
216
|
Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018; 19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Background The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, the existing tools are adapted from Gene Ontology (GO)-based term similarity measures, and the design of their models is not optimized for the unique features of HPO. In addition, existing tools only allow HPO terms as input and only provide pure text-based outputs. Results We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. In addition, we provide an approach that considers the unique properties of HPO. PhenoSimWeb also accepts text that describes phenotypes as input, since clinical phenotype data is always in text, and provides a graphical interface to visualize the resulting phenotype network. Conclusions PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype-associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.
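The abstract does not name the four GO-derived measures. As an illustration of the general idea only, the sketch below uses Resnik similarity, a widely used ontology measure that scores two terms by the information content (IC) of their most informative common ancestor. The toy ontology, the disease annotations and all function names are hypothetical and are not taken from PhenoSimWeb.

```python
import math

# Hypothetical toy ontology: child term -> list of parent terms.
PARENTS = {
    "cataract": ["abnormal_eye"],
    "myopia": ["abnormal_eye"],
    "abnormal_eye": ["root"],
    "seizure": ["root"],
}

# Hypothetical annotations: disease -> phenotype terms.
ANNOTATIONS = {
    "d1": {"cataract"},
    "d2": {"myopia"},
    "d3": {"seizure"},
    "d4": {"seizure", "cataract"},
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term, annotations, parents):
    """IC = log(N / n), where n counts diseases annotated with the term
    or any of its descendants, and N is the number of diseases."""
    total = len(annotations)
    hits = sum(
        1
        for terms in annotations.values()
        if any(term in ancestors(t, parents) for t in terms)
    )
    if hits == 0:
        # Simplification for this sketch: unannotated terms contribute 0.
        return 0.0
    return math.log(total / hits)


def resnik(term_a, term_b, annotations, parents):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    return max(
        (information_content(t, annotations, parents) for t in common),
        default=0.0,
    )


eye_pair = resnik("cataract", "myopia", ANNOTATIONS, PARENTS)
mixed_pair = resnik("cataract", "seizure", ANNOTATIONS, PARENTS)
```

Here the two eye phenotypes share the specific ancestor `abnormal_eye` and score higher than the cataract/seizure pair, whose only common ancestor is the uninformative root.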
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Hansheng Xue
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China.,School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| |
|
217
|
An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery. NPJ Genom Med 2018; 3:21. [PMID: 30131872 PMCID: PMC6089983 DOI: 10.1038/s41525-018-0060-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/18/2016] [Revised: 04/06/2018] [Accepted: 07/06/2018] [Indexed: 12/18/2022]
Abstract
Despite major progress in defining the genetic basis of Mendelian disorders, the molecular etiology of many cases remains unknown. Patients with these undiagnosed disorders often have complex presentations and require treatment by multiple health care specialists. Here, we describe an integrated clinical diagnostic and research program using whole-exome and whole-genome sequencing (WES/WGS) for Mendelian disease gene discovery. This program employs specific case ascertainment parameters, a WES/WGS computational analysis pipeline that is optimized for Mendelian disease gene discovery with variant callers tuned to specific inheritance modes, an interdisciplinary crowdsourcing strategy for genomic sequence analysis, matchmaking for additional cases, and integration of the findings regarding gene causality with the clinical management plan. The interdisciplinary gene discovery team includes clinical, computational, and experimental biomedical specialists who interact to identify the genetic etiology of the disease, and when so warranted, to devise improved or novel treatments for affected patients. This program effectively integrates the clinical and research missions of an academic medical center and affords both diagnostic and therapeutic options for patients suffering from genetic disease. It may therefore be germane to other academic medical institutions engaged in implementing genomic medicine programs.
|
218
|
Hu W, Qiu H, Huang J, Dumontier M. BioSearch: a semantic search engine for Bio2RDF. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2017:4079799. [PMID: 29220451 PMCID: PMC5569678 DOI: 10.1093/database/bax059] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/28/2016] [Accepted: 07/10/2017] [Indexed: 12/14/2022]
Abstract
Biomedical data are growing at an incredible pace, and organizing them so that they are easily findable, accessible, interoperable and reusable requires substantial expertise. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch, a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state-of-the-art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/
Affiliation(s)
- Wei Hu
- State Key Laboratory for Novel Software Technology, Nanjing University, China.,Institute of Data Science, Maastricht University, The Netherlands
| | - Honglei Qiu
- State Key Laboratory for Novel Software Technology, Nanjing University, China
| | - Jiacheng Huang
- State Key Laboratory for Novel Software Technology, Nanjing University, China
| | - Michel Dumontier
- Institute of Data Science, Maastricht University, The Netherlands
| |
|
219
|
Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, Cau P, Remy E, Baudot A. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 2018; 35:497-505. [DOI: 10.1093/bioinformatics/bty637] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Received: 09/20/2017] [Accepted: 07/16/2018] [Indexed: 01/04/2023]
Affiliation(s)
- Alberto Valdeolivas
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
- ProGeLife, Marseille
| | - Laurent Tichit
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| | - Claire Navarro
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Sophie Perrin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Gaëlle Odelin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Nicolas Levy
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Pierre Cau
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
| | - Elisabeth Remy
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| | - Anaïs Baudot
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
| |
|
220
|
Abstract
Diagnosing rare diseases can be challenging for clinicians. This article gives an overview of novel approaches that enable automated phenotype-driven analyses of differential diagnoses for rare diseases as well as of genomic variation data from affected individuals. The focus lies on reliable methods for collating clinical phenotypic data and new algorithms for precise and robust assessment of the similarity between phenotypic profiles. The Human Phenotype Ontology project (HPO; www.human-phenotype-ontology.org) provides an ontology for collating symptoms and clinical phenotypic abnormalities. Using ontologies makes it possible to capture these data in a precise and comprehensive fashion as well as to apply reliable and robust automated analyses. Tools such as the Phenomizer enable the algorithmic calculation of similarity values amongst patients or between patients and disease descriptions. Such digital tools represent a solid foundation for differential diagnostic applications. Many rare diseases have a strong genetic component, but the analysis of the coding DNA variants in rare disease patients is an enormously complex procedure, which often impedes successful molecular diagnostics. In this situation a combined analysis of the patient's HPO-coded phenotypic features and the genomic characteristics of the variants can be of substantial help. The HPO project and the associated algorithms are therefore an important component of phenotype-driven translational research and of the prioritization of disease-relevant genomic variations.
Affiliation(s)
- S Köhler
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178, Berlin, Germany.
- Einstein Center Digital Future, Wilhelmstr. 67, 10117, Berlin, Germany.
- NeuroCure Clinical Research Center, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany.
| |
|
221
|
Abstract
Due to the ubiquitous availability of information on the web, there is a great need for a standardized representation of this information. Therefore, developing an efficient algorithm for retrieving information from knowledge graphs is a key challenge for many semantic web applications. This article presents the spreading activation over ontology (SAOO) approach, which detects the relatedness between two human diseases by applying a spreading activation algorithm based on a bidirectional search technique. The proposed approach detects the relatedness of two diseases by considering semantic domain knowledge. The methodology of the proposed work is divided into two phases: Semantic Matching and Diseases Relatedness Detection. In semantic matching, diseases within the user-submitted query are semantically identified in the ontology graph. In diseases relatedness detection, the relatedness between the two diseases is detected by using bidirectional spreading activation on the ontology graph. The classification of these diseases is provided as well.
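The second phase can be pictured with a minimal sketch of bidirectional spreading activation. The abstract gives no algorithmic details, so everything here is an assumption for illustration: the toy graph, the decay factor, the hop limit and the scoring rule (the largest product of the two activation levels at any node reached from both sides) are hypothetical and should not be read as the SAOO implementation.

```python
# Hypothetical ontology fragment as an undirected adjacency list.
GRAPH = {
    "disease": ["metabolic disease", "infectious disease"],
    "metabolic disease": ["disease", "type 1 diabetes", "type 2 diabetes"],
    "infectious disease": ["disease", "influenza"],
    "type 1 diabetes": ["metabolic disease"],
    "type 2 diabetes": ["metabolic disease"],
    "influenza": ["infectious disease"],
}


def spread(graph, source, decay=0.5, max_hops=3):
    """Spread activation outward from source, attenuating by `decay` per hop.

    Each node keeps the strongest activation that reaches it.
    """
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            passed = energy * decay
            for neighbour in graph.get(node, ()):
                if passed > activation.get(neighbour, 0.0):
                    next_frontier[neighbour] = max(
                        next_frontier.get(neighbour, 0.0), passed
                    )
        activation.update(next_frontier)
        frontier = next_frontier
    return activation


def relatedness(graph, disease_a, disease_b, decay=0.5, max_hops=3):
    """Activate from both diseases and score where the two waves meet."""
    act_a = spread(graph, disease_a, decay, max_hops)
    act_b = spread(graph, disease_b, decay, max_hops)
    shared = set(act_a) & set(act_b)
    return max((act_a[n] * act_b[n] for n in shared), default=0.0)


close_pair = relatedness(GRAPH, "type 1 diabetes", "type 2 diabetes")
distant_pair = relatedness(GRAPH, "type 1 diabetes", "influenza")
```

In this toy graph the two diabetes terms meet after a single hop through their shared parent and score higher than the diabetes/influenza pair, whose activation waves only meet near the root.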
Affiliation(s)
- Said Fathalla
- Bonn University, Bonn, Germany & Alexandria University, Alexandria, Egypt
| |
|
222
|
Phenotype-loci associations in networks of patients with rare disorders: application to assist in the diagnosis of novel clinical cases. Eur J Hum Genet 2018; 26:1451-1461. [PMID: 29946186 PMCID: PMC6138686 DOI: 10.1038/s41431-018-0139-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/21/2017] [Revised: 02/06/2018] [Accepted: 03/06/2018] [Indexed: 12/29/2022]
Abstract
Copy number variations (CNVs) are genomic structural variations (deletions, duplications, or translocations) that represent 4.8-9.5% of human genome variation in healthy individuals. In some cases, CNVs can also lead to disease, being the etiology of many known rare genetic/genomic disorders. Despite recent advances in genomic sequencing and diagnosis, the pathological effects of many rare genetic variations remain unresolved, largely due to the low number of patients available for these cases, which makes it difficult to identify consistent patterns of genotype-phenotype relationships. We aimed to improve the identification of statistically consistent genotype-phenotype relationships by integrating all the genetic and clinical data of thousands of patients with rare genomic disorders (obtained from the DECIPHER database) into a phenotype-patient-genotype tripartite network. Then we assessed how our network approach could help in the characterization and diagnosis of novel cases in clinical genetics. The systematic approach implemented in this work is able to better define the relationships between phenotypes and specific loci by exploiting large-scale association networks of phenotypes and genotypes in thousands of rare disease patients. The application of the described methodology facilitated the diagnosis of novel clinical cases, ranking phenotypes by locus specificity and reporting putative new clinical features that may suggest additional clinical follow-ups. In this work, the proof of concept developed over a set of novel clinical cases demonstrates that this network-based methodology might help improve the precision of patient clinical records and the characterization of rare syndromes.
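A phenotype-patient-genotype tripartite network links phenotypes to loci only through the patients they share. The sketch below shows the simplest possible projection of such a network, a count of patients supporting each phenotype-locus pair. The abstract does not state which association statistic the authors use, so the raw count is a stand-in, and the patient identifiers, phenotype names and locus labels are all hypothetical.

```python
from collections import Counter


def phenotype_locus_counts(patient_phenotypes, patient_loci):
    """Project the tripartite network onto phenotype-locus pairs.

    Each patient who has both a phenotype and an affected locus adds
    one unit of support to that (phenotype, locus) pair.
    """
    counts = Counter()
    for patient, phenotypes in patient_phenotypes.items():
        for locus in patient_loci.get(patient, ()):
            for phenotype in phenotypes:
                counts[(phenotype, locus)] += 1
    return counts


# Hypothetical toy data: patient -> phenotypes, patient -> affected loci.
patient_phenotypes = {
    "p1": {"seizure", "microcephaly"},
    "p2": {"seizure"},
    "p3": {"cataract"},
}
patient_loci = {
    "p1": {"locus_A"},
    "p2": {"locus_A"},
    "p3": {"locus_B"},
}

support = phenotype_locus_counts(patient_phenotypes, patient_loci)
```

With so few patients per rare variant, counts like these are small, which is why pooling thousands of records into one network matters for finding consistent patterns.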
|
223
|
Jackson R, Kartoglu I, Stringer C, Gorrell G, Roberts A, Song X, Wu H, Agrawal A, Lui K, Groza T, Lewsley D, Northwood D, Folarin A, Stewart R, Dobson R. CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak 2018; 18:47. [PMID: 29941004 PMCID: PMC6020175 DOI: 10.1186/s12911-018-0623-9] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 03/20/2017] [Accepted: 06/01/2018] [Indexed: 03/05/2023]
Abstract
BACKGROUND Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low cost structured and unstructured information retrieval and extraction architecture within King's College Hospital, the management of governance concerns and the associated use cases and cost saving opportunities that such components present. RESULTS To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King's College London. On generated data designed to simulate real world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall. CONCLUSION We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.
Affiliation(s)
- Richard Jackson
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigne Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ UK
| | - Ismail Kartoglu
- InterDigital Communications, 64 Great Eastern Street, 1st Floor, London, EC2A 3QR UK
| | - Clive Stringer
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
| | | | - Angus Roberts
- University of Sheffield, Western Bank, Sheffield, S10 2TN UK
| | - Xingyi Song
- University of Sheffield, Western Bank, Sheffield, S10 2TN UK
| | - Honghan Wu
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigne Park, London, SE5 8AF UK
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, EH16 4UX UK
| | - Asha Agrawal
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
| | - Kenneth Lui
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT UK
| | - Tudor Groza
- Garvan Institute of Medical Research, Sydney, NSW 2010 Australia
| | - Damian Lewsley
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
| | - Doug Northwood
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
| | - Amos Folarin
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigne Park, London, SE5 8AF UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT UK
| | - Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigne Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ UK
| | - Richard Dobson
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigne Park, London, SE5 8AF UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT UK
| |
|
224
|
Cornish AJ, David A, Sternberg MJE. PhenoRank: reducing study bias in gene prioritization through simulation. Bioinformatics 2018; 34:2087-2095. [PMID: 29360927 PMCID: PMC5949213 DOI: 10.1093/bioinformatics/bty028] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/21/2017] [Revised: 01/10/2018] [Accepted: 01/16/2018] [Indexed: 02/07/2023]
Abstract
Motivation Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes, however, potentially influencing the performance of these methods. Results We demonstrate that whilst disease genes tend to be associated with more data, this may be at least partially a result of them being better studied. Based on this observation, we develop PhenoRank, which prioritizes disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10^-16). Availability and implementation PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Supplementary information Supplementary data are available at Bioinformatics online.
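The bias correction described in the Results can be illustrated with a small sketch: each gene's score for the query disease is compared with the scores the same gene receives for simulated phenotype term sets, giving an empirical p-value per gene. This is not the PhenoRank code. The function name, the toy scores and the add-one correction are assumptions made for illustration, and how the raw scores themselves are computed is left out.

```python
def empirical_p_values(observed, simulated):
    """Compare each gene's observed score with its simulated scores.

    observed:  dict mapping gene -> score for the query disease
    simulated: list of dicts mapping gene -> score, one per simulated
               set of phenotype terms
    Returns a dict mapping gene -> empirical p-value. The add-one
    correction keeps p-values strictly above zero.
    """
    n_sims = len(simulated)
    p_values = {}
    for gene, score in observed.items():
        as_extreme = sum(1 for sim in simulated if sim[gene] >= score)
        p_values[gene] = (as_extreme + 1) / (n_sims + 1)
    return p_values


# Hypothetical toy scores: the well-studied gene scores highly for any
# phenotype set, the sparsely annotated gene only for the query disease.
observed = {"WELL_STUDIED": 0.9, "SPARSE": 0.4}
simulated = [
    {"WELL_STUDIED": 0.95, "SPARSE": 0.1},
    {"WELL_STUDIED": 0.85, "SPARSE": 0.2},
    {"WELL_STUDIED": 0.92, "SPARSE": 0.05},
]

p_values = empirical_p_values(observed, simulated)
ranking = sorted(p_values, key=p_values.get)
```

Ranking by raw score would put the well-studied gene first. Ranking by empirical p-value puts the sparsely annotated gene first, because its score is unusual relative to what it receives for arbitrary phenotype sets.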
Affiliation(s)
- Alex J Cornish
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
| | - Alessia David
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
| | - Michael J E Sternberg
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
| |
|
225
|
Pontikos N, Yu J, Moghul I, Withington L, Blanco-Kelly F, Vulliamy T, Wong TLE, Murphy C, Cipriani V, Fiorentino A, Arno G, Greene D, Jacobsen JOB, Clark T, Gregory DS, Nemeth AM, Halford S, Inglehearn CF, Downes S, Black GC, Webster AR, Hardcastle AJ, Plagnol V. Phenopolis: an open platform for harmonization and analysis of genetic and phenotypic data. Bioinformatics 2018; 33:2421-2423. [PMID: 28334266 DOI: 10.1093/bioinformatics/btx147] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 11/17/2016] [Accepted: 03/14/2017] [Indexed: 11/12/2022]
Abstract
Summary Phenopolis is an open-source web server providing an intuitive interface to genetic and phenotypic databases. It integrates analysis tools such as variant filtering and gene prioritization based on phenotype. The Phenopolis platform will accelerate clinical diagnosis and gene discovery, and encourage wider adoption of the Human Phenotype Ontology in the study of rare genetic diseases. Availability and Implementation A demo of the website is available at https://phenopolis.github.io. If you wish to install a local copy, source code and installation instructions are available at https://github.com/phenopolis. The software is implemented using Python, MongoDB, HTML/JavaScript and various bash shell scripts. Contact n.pontikos@ucl.ac.uk. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nikolas Pontikos
- UCL Genetics Institute, University College London, London WC1E 6BT, UK.,Institute of Ophthalmology, University College London, London EC1V 9EL, UK.,Moorfields Eye Hospital, London EC1V 2PD, UK
| | - Jing Yu
- Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Ismail Moghul
- UCL Cancer Institute, University College London, London WC1E 6DD, UK
| | | | - Fiona Blanco-Kelly
- Institute of Ophthalmology, University College London, London EC1V 9EL, UK.,Moorfields Eye Hospital, London EC1V 2PD, UK
| | - Tom Vulliamy
- Barts and The London School of Medicine and Dentistry, Blizard Institute, Queen Mary University of London, London E1 2AT, UK
| | - Tsz Lun Ernest Wong
- Barts and The London School of Medicine and Dentistry, Blizard Institute, Queen Mary University of London, London E1 2AT, UK
| | - Cian Murphy
- UCL Genetics Institute, University College London, London WC1E 6BT, UK.,Warwick Medical School, The University of Warwick, Coventry CV4 7AL, UK
| | - Valentina Cipriani
- UCL Genetics Institute, University College London, London WC1E 6BT, UK.,Institute of Ophthalmology, University College London, London EC1V 9EL, UK.,Moorfields Eye Hospital, London EC1V 2PD, UK
| | - Alessia Fiorentino
- Institute of Ophthalmology, University College London, London EC1V 9EL, UK
| | - Gavin Arno
- Institute of Ophthalmology, University College London, London EC1V 9EL, UK.,Moorfields Eye Hospital, London EC1V 2PD, UK
| | - Daniel Greene
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK.,Medical Research Council Biostatistics Unit, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
| | - Julius O B Jacobsen
- Barts and The London School of Medicine and Dentistry, William Harvey Research Institute, Queen Mary University of London, John Vane Building, Charterhouse Square, London EC1M 6BQ, UK
| | - Tristan Clark
- Computer Science Department, University College London, London WC1E 6BT, UK
| | - David S Gregory
- Computer Science Department, University College London, London WC1E 6BT, UK
| | - Andrea M Nemeth
- Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Stephanie Halford
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Chris F Inglehearn
- Leeds Institute of Biomedical and Clinical Sciences, University of Leeds, St James's University Hospital, Leeds LS9 7TF, UK
| | - Susan Downes
- Oxford Eye Hospital, John Radcliffe Hospital, Oxford OX3 9DU, UK
| | - Graeme C Black
- Manchester Royal Eye Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester M13 9WL, UK
| | - Andrew R Webster
- Institute of Ophthalmology, University College London, London EC1V 9EL, UK.,Moorfields Eye Hospital, London EC1V 2PD, UK
| | | | | | - Vincent Plagnol
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| |
|
226
|
Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 2018; 33:2723-2730. [PMID: 28449114 PMCID: PMC5860058 DOI: 10.1093/bioinformatics/btx275] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 12/13/2016] [Accepted: 04/18/2017] [Indexed: 11/12/2022]
Abstract
Motivation Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In recent years, feature learning methods applicable to graph-structured data have become available, but they have not yet been widely applied and evaluated on structured biological knowledge. Results We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing number of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact robert.hoehndorf@kaust.edu.sa. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mona Alshahrani
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Mohammad Asif Khan
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Omar Maddouri
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia; Life Sciences Division, College of Science & Engineering, Hamad Bin Khalifa University, HBKU, Doha, Qatar
| | - Akira R Kinjo
- Institute for Protein Research, Osaka University 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
| | - Núria Queralt-Rosinach
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037 USA
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| |
|
227
|
Le DH, Dao LTM. Annotating Diseases Using Human Phenotype Ontology Improves Prediction of Disease-Associated Long Non-coding RNAs. J Mol Biol 2018; 430:2219-2230. [PMID: 29758261 DOI: 10.1016/j.jmb.2018.05.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/02/2017] [Revised: 04/28/2018] [Accepted: 05/05/2018] [Indexed: 01/13/2023]
Abstract
Recently, many long non-coding RNAs (lncRNAs) have been identified and their biological function has been characterized; however, our understanding of their underlying molecular mechanisms related to disease is still limited. To overcome the limitation in experimentally identifying disease-lncRNA associations, computational methods have been proposed as a powerful tool to predict such associations. These methods are usually based on the similarities between diseases or lncRNAs since it was reported that similar diseases are associated with functionally similar lncRNAs. Therefore, prediction performance is highly dependent on how well the similarities can be captured. Previous studies have calculated the similarity between two diseases by mapping each disease to exactly one Disease Ontology (DO) term and then using a semantic similarity measure to calculate the similarity between them. However, the problem with this approach is that a disease can be described by more than one DO term. To date, no annotation database of DO terms exists for diseases, only for genes. In contrast, Human Phenotype Ontology (HPO) is designed to fully annotate human disease phenotypes. Therefore, in this study, we constructed disease similarity networks/matrices using HPO instead of DO. Then, we used these networks/matrices as inputs of two representative machine learning-based and network-based ranking algorithms, that is, regularized least square and heterogeneous graph-based inference, respectively. The results showed that the prediction performance of the two algorithms on HPO-based networks/matrices is better than that on DO-based ones. In addition, our method can predict 11 novel cancer-associated lncRNAs, which are supported by literature evidence.
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam; Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam.
| | - Lan T M Dao
- Vinmec Research Institute of Stem Cell and Gene Technology, 458 Minh Khai, Hai Ba Trung, Hanoi, Vietnam
| |
|
228
|
Gong X, Jiang J, Duan Z, Lu H. A new method to measure the semantic similarity from query phenotypic abnormalities to diseases based on the human phenotype ontology. BMC Bioinformatics 2018; 19:162. [PMID: 29745853 PMCID: PMC5998886 DOI: 10.1186/s12859-018-2064-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022] Open
Abstract
Background: Although rapidly developing sequencing technologies make it possible for genotype data to be used in clinical diagnosis, it is still challenging for clinicians to understand the results of sequencing and make correct judgements based on them. Before this, diagnosis based on clinical features held a leading position. With the establishment of the Human Phenotype Ontology (HPO) and the enrichment of phenotype-disease annotations, much more attention has been paid to improving phenotype-based diagnosis. Results: In this study, we presented a novel method called RelativeBestPair to measure similarity from the query terms to hereditary diseases based on HPO and then rank the candidate diseases. To evaluate the performance, we simulated a set of patients based on 44 complex diseases. Besides, by adding noise or imprecision or both, cases closer to real clinical conditions were generated. Thus, four simulated datasets were used to compare RelativeBestPair with seven existing semantic similarity measures. RelativeBestPair ranked the underlying disease as top 1 on 93.73% of the simulated dataset without noise and imprecision, 93.64% of the simulated dataset with noise and without imprecision, 39.82% of the simulated dataset without noise and with imprecision, and 33.64% of the simulated dataset with both noise and imprecision. Conclusion: Compared with the seven existing semantic similarity measures, RelativeBestPair showed similar performance on the two datasets without imprecision. While RelativeBestPair appeared to be equal to Resnik and better than the other six methods in the simulated dataset without noise and with imprecision, it significantly outperformed all other seven methods in the simulated dataset with both noise and imprecision. These results suggest that RelativeBestPair might be of great help in clinical settings.
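For readers unfamiliar with the kind of measure being compared here, the following Python sketch shows a generic Resnik-style, query-to-disease score: each query term is matched to its best-scoring disease term, and the best matches are averaged. This is not RelativeBestPair, whose definition the abstract does not give. The toy ontology, term names, disease annotations, and function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real HPO identifiers.
PARENTS = {
    "root": set(),
    "abn_heart": {"root"},
    "abn_eye": {"root"},
    "septal_defect": {"abn_heart"},
    "arrhythmia": {"abn_heart"},
    "cataract": {"abn_eye"},
}

# Hypothetical disease annotations (disease -> annotated terms).
DISEASES = {
    "disease_A": {"septal_defect", "arrhythmia"},
    "disease_B": {"cataract"},
    "disease_C": {"arrhythmia", "cataract"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(disease_annotations):
    """IC(t) = -log(fraction of diseases annotated with t or one of its descendants)."""
    n = len(disease_annotations)
    counts = {t: 0 for t in PARENTS}
    for terms in disease_annotations.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] += 1
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic.get(t, 0.0) for t in common)


def query_to_disease(query, disease_terms, ic):
    """Asymmetric score: average, over query terms, of the best match in the disease."""
    return sum(max(resnik(q, d, ic) for d in disease_terms) for q in query) / len(query)


ic = information_content(DISEASES)
query = {"septal_defect"}
ranking = sorted(
    DISEASES,
    key=lambda d: query_to_disease(query, DISEASES[d], ic),
    reverse=True,
)
print(ranking)
```

The score is deliberately asymmetric (query to disease), mirroring the direction named in the entry's title; a symmetric variant would also average the best matches from disease terms back to the query.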
Affiliation(s)
- Xiaofeng Gong
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Jianping Jiang
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Zhongqu Duan
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Hui Lu
- Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiao Tong University, Shanghai, China.
| |
|
229
|
Gainotti S, Torreri P, Wang CM, Reihs R, Mueller H, Heslop E, Roos M, Badowska DM, de Paulis F, Kodra Y, Carta C, Martìn EL, Miller VR, Filocamo M, Mora M, Thompson M, Rubinstein Y, Posada de la Paz M, Monaco L, Lochmüller H, Taruscio D. The RD-Connect Registry & Biobank Finder: a tool for sharing aggregated data and metadata among rare disease researchers. Eur J Hum Genet 2018; 26:631-643. [PMID: 29396563 PMCID: PMC5945774 DOI: 10.1038/s41431-017-0085-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 02/28/2017] [Revised: 11/20/2017] [Accepted: 11/23/2017] [Indexed: 12/20/2022] Open
Abstract
In rare disease (RD) research, there is a huge need to systematically collect biomaterials, phenotypic, and genomic data in a standardized way and to make them findable, accessible, interoperable and reusable (FAIR). RD-Connect is a 6-year global infrastructure project initiated in November 2012 that links genomic data with patient registries, biobanks, and clinical bioinformatics tools to create a central research resource for RDs. Here, we present RD-Connect Registry & Biobank Finder, a tool that helps RD researchers to find RD biobanks and registries and provides information on the availability and accessibility of content in each database. The finder concentrates information that is currently scattered across different repositories (inventories, websites, scientific journals, technical reports, etc.), including aggregated data and metadata from participating databases. Aggregated data provided by the finder, if appropriately checked, can be used by researchers who are trying to estimate the prevalence of an RD, to organize a clinical trial on an RD, or to estimate the volume of patients seen by different clinical centers. The finder is also a portal to other RD-Connect tools, providing a link to the RD-Connect Sample Catalogue, a large inventory of RD biological samples available in participating biobanks for RD research. There are several kinds of users and potential uses for the RD-Connect Registry & Biobank Finder, including researchers collaborating with academia and the industry, dealing with the questions of basic, translational, and/or clinical research. As of November 2017, the finder is populated with aggregated data for 222 registries and 21 biobanks.
Affiliation(s)
- Sabina Gainotti
- Bioethics Unit, Office of the President, Istituto Superiore di Sanità, Rome, Italy.
- National Center for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy.
| | - Paola Torreri
- National Center for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
| | | | - Robert Reihs
- Institute of Pathology, Medical University of Graz, Graz, Austria
| | - Heimo Mueller
- Institute of Pathology, Medical University of Graz, Graz, Austria
| | - Emma Heslop
- John Walton Muscular Dystrophy Research Centre, Institute of Genetic Medicine, Newcastle University, Newcastle, UK
| | - Marco Roos
- Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
| | - Dorota Mazena Badowska
- John Walton Muscular Dystrophy Research Centre, Institute of Genetic Medicine, Newcastle University, Newcastle, UK
| | - Federico de Paulis
- National Center for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
| | - Yllka Kodra
- National Center for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
| | - Claudio Carta
- National Center for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
| | - Estrella Lopez Martìn
- Institute of Rare Diseases Research (IIER) & Centre for Biomedical Network Research on Rare Diseases (CIBERER), Institute of Health Carlos III, Madrid, Spain
| | | | - Mirella Filocamo
- Centro di diagnostica genetica e biochimica delle malattie metaboliche, Istituto Giannina Gaslini, Genoa, Italy
| | - Marina Mora
- Neuromuscular Diseases and Neuroimmunology Unit, Fondazione Istituto Neurologico C. Besta, Milan, Italy
| | - Mark Thompson
- Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
| | - Yaffa Rubinstein
- Office of Health Information Programs Development, National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Manuel Posada de la Paz
- Institute of Rare Diseases Research (IIER) & Centre for Biomedical Network Research on Rare Diseases (CIBERER), Institute of Health Carlos III, Madrid, Spain
| | | | - Hanns Lochmüller
- John Walton Muscular Dystrophy Research Centre, Institute of Genetic Medicine, Newcastle University, Newcastle, UK
| | - Domenica Taruscio
- National Center for Rare Diseases, Istituto Superiore di Sanità, Rome, Italy
| |
|
230
|
GC[Formula: see text]NMF: A Novel Matrix Factorization Framework for Gene-Phenotype Association Prediction. Interdiscip Sci 2018; 10:572-582. [PMID: 29691712 DOI: 10.1007/s12539-018-0296-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/30/2017] [Revised: 03/05/2018] [Accepted: 04/03/2018] [Indexed: 10/17/2022]
Abstract
Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as relationships among phenotypes and among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing approaches are limited in prediction accuracy and in detecting potential gene-phenotype associations. In this paper, we propose a novel method that builds on Non-negative Matrix Factorization (NMF) and exploits a weighted graph constraint learned from hierarchical structures of phenotype data together with group prior information among genes, called Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC[Formula: see text]NMF). Specifically, first we introduce the depth of parent-child relationships between two adjacent phenotypes in hierarchical phenotypic data as a weighted graph constraint for a better phenotype understanding. Second, we utilize intra-group correlation among genes in a gene group as a group constraint for gene understanding. Such information provides us with the intuition that genes in a group probably result in similar phenotypes. The model not only allows us to achieve a high-grade prediction performance, but also helps us to learn interpretable representations of genes and phenotypes simultaneously to facilitate future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC[Formula: see text]NMF can obtain superior prediction accuracy and good understandability for biological explanation over other state-of-the-art methods.
|
231
|
Abstract
Data, including information generated from them by processing and analysis, are an asset with measurable value. The assets that biological research funding produces are the data generated, the information derived from these data, and, ultimately, the discoveries and knowledge these lead to. From the time when Henry Oldenburg published the first scientific journal in 1665 (Proceedings of the Royal Society) to the founding of the United States National Library of Medicine in 1879 to the present, there has been a sustained drive to improve how researchers can record and discover what is known. Researchers’ experimental work builds upon years and (collectively) billions of dollars’ worth of earlier work. Today, researchers are generating data at ever-faster rates because of advances in instrumentation and technology, coupled with decreases in production costs. Unfortunately, the ability of researchers to manage and disseminate their results has not kept pace, so their work cannot achieve its maximal impact. Strides have recently been made, but more awareness is needed of the essential role that biological data resources, including biocuration, play in maintaining and linking this ever-growing flood of data and information. The aim of this paper is to describe the nature of data as an asset, the role biocurators play in increasing its value, and consistent, practical means to measure effectiveness that can guide planning and justify costs in biological research information resources’ development and management.
|
232
|
Arguello Casteleiro M, Demetriou G, Read W, Fernandez Prieto MJ, Maroto N, Maseda Fernandez D, Nenadic G, Klein J, Keane J, Stevens R. Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature. J Biomed Semantics 2018; 9:13. [PMID: 29650041 PMCID: PMC5896136 DOI: 10.1186/s13326-018-0181-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 09/27/2017] [Accepted: 03/06/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) if word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) if biological knowledge from the CVDO can improve such a list without modifying the word embeddings created. METHODS: We have manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 M PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We set up two experiments for a synonym detection task, each with four raters, and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels. RESULTS: In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%. CONCLUSIONS: This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.
Affiliation(s)
| | - George Demetriou
- School of Computer Science, University of Manchester, Manchester, UK
| | - Warren Read
- School of Computer Science, University of Manchester, Manchester, UK
| | | | - Nava Maroto
- Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología, Universidad Politécnica de Madrid, Madrid, Spain
| | | | - Goran Nenadic
- School of Computer Science, University of Manchester, Manchester, UK; Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
| | - Julie Klein
- Institut National de la Santé et de la Recherche Medicale (INSERM) U1048, Toulouse, France; Universite Toulouse III Paul Sabatier, route de Narbonne, Toulouse, France
| | - John Keane
- School of Computer Science, University of Manchester, Manchester, UK; Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
| | - Robert Stevens
- School of Computer Science, University of Manchester, Manchester, UK.
| |
|
233
|
A Systems Approach to Refine Disease Taxonomy by Integrating Phenotypic and Molecular Networks. EBioMedicine 2018; 31:79-91. [PMID: 29669699 PMCID: PMC6013753 DOI: 10.1016/j.ebiom.2018.04.002] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 11/27/2017] [Revised: 03/14/2018] [Accepted: 04/03/2018] [Indexed: 12/22/2022] Open
Abstract
The International Classification of Diseases (ICD) relies on clinical features and lags behind the current understanding of the molecular specificity of disease pathobiology, necessitating approaches that incorporate growing biomedical data for classifying diseases to meet the needs of precision medicine. Our analysis revealed that the heterogeneous molecular diversity of disease chapters and the blurred boundary between disease categories in ICD should be further investigated. Here, we propose a new classification of diseases (NCD) by developing an algorithm that predicts the additional categories of a disease by integrating multiple networks consisting of disease phenotypes and their molecular profiles. With statistical validations from phenotype-genotype associations and interactome networks, we demonstrate that NCD improves disease specificity owing to its overlapping categories and polyhierarchical structure. Furthermore, NCD captures the molecular diversity of diseases and defines clearer boundaries in terms of both phenotypic similarity and molecular associations, establishing a rational strategy to reform disease taxonomy. The International Classification of Diseases (ICD) lags behind the current molecular characteristics of disease. We quantified the limitations (specificity and blurred boundary) of ICD with integrated phenotypic and molecular profiles. An integrative disease network integrating phenotypic and genotypic profiles proposes a refined disease category framework.
Disease taxonomy is one of the foundations of medical science and healthcare solutions. The most widely used disease taxonomy in clinical settings is the International Classification of Diseases (ICD), a system established >100 years ago and maintained by the World Health Organization to track disease incidence. It is well recognized that ICD, which is based on clinical observations, largely lags behind the molecular achievements of this medical big data era. We quantified the limitations of ICD using integrated phenotypic and molecular profiles and proposed a refined disease taxonomy with possible applications for precision medicine.
|
234
|
Zepeda-Mendoza CJ, Menon S, Morton CC. Computational Prediction of Position Effects of Human Chromosome Rearrangements. Current Protocols in Human Genetics 2018; 97:e57. [PMID: 30038699 PMCID: PMC6054318 DOI: 10.1002/cphg.57] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/28/2022]
Abstract
Balanced and apparently balanced chromosome abnormalities (BCAs) have long been known to generate disease through position effects, either by altering local networks of gene regulation or positioning genes in architecturally different chromosome domains. Despite these observations, identification of distally affected genes by BCAs is oftentimes neglected, especially when predicted gene disruptions are found elsewhere in the genome. In this unit, we provide detailed instructions on how to run a computational pipeline that identifies relevant candidates of non-coding BCA position effects. This methodology facilitates quick identification of genes potentially involved in disease by non-coding BCAs and other types of rearrangements, and expands on the importance of considering the long-range consequences of genomic lesions.
Affiliation(s)
- Cinthya J. Zepeda-Mendoza
- Laboratory Genetics and Genomics, Mayo Clinic School of Graduate Medical Education, Mayo Clinic, Rochester, MN 55902, USA
| | - Shreya Menon
- Department of Obstetrics, Gynecology, and Reproductive Biology, Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Cynthia C. Morton
- Department of Obstetrics, Gynecology, and Reproductive Biology, Brigham and Women’s Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Pathology, Brigham and Women’s Hospital, Boston, MA 02115, USA; Division of Evolution and Genomic Science, School of Biological Sciences, Manchester Academic Health Science Centre, Manchester M13 9NT, UK; Corresponding author
| |
|
235
|
Bhattacharya S, Li J, Sockell A, Kan MJ, Bava FA, Chen SC, Ávila-Arcos MC, Ji X, Smith E, Asadi NB, Lachman RS, Lam HYK, Bustamante CD, Butte AJ, Nolan GP. Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia. Genome Res 2018; 28:423-431. [PMID: 29567674 PMCID: PMC5880234 DOI: 10.1101/gr.223693.117] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/21/2017] [Accepted: 02/21/2018] [Indexed: 12/30/2022]
Abstract
Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype (6-in stature, fewer than expected ribs, elongated cranium, and accelerated bone age), leading to speculation that this was a preserved nonhuman primate, human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6-8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs) were found as compared to the human reference genome, 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification.
Affiliation(s)
- Sanchita Bhattacharya
- Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California 94158, USA
| | - Jian Li
- Roche Sequencing Solutions, Belmont, California 94002, USA
| | - Alexandra Sockell
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Matthew J Kan
- Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California 94158, USA
| | - Felice A Bava
- Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, California 94305, USA
| | - Shann-Ching Chen
- Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California 94158, USA
| | - María C Ávila-Arcos
- International Laboratory for Human Genome Research, National Autonomous University of Mexico (UNAM) Santiago de Querétaro, Querétaro 76230, Mexico
| | - Xuhuai Ji
- Human Immune Monitoring Center and Functional Genomics Facility, Stanford University, Stanford, California 94305, USA
| | - Emery Smith
- Ultra Intelligence Corporation, Boulder, Colorado 80301, USA
| | - Narges B Asadi
- Roche Sequencing Solutions, Belmont, California 94002, USA
| | - Ralph S Lachman
- Department of Pediatric Radiology, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Hugo Y K Lam
- Roche Sequencing Solutions, Belmont, California 94002, USA
| | - Carlos D Bustamante
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Atul J Butte
- Institute for Computational Health Sciences, University of California San Francisco, San Francisco, California 94158, USA
| | - Garry P Nolan
- Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, California 94305, USA
| |
|
236
|
Bastarache L, Hughey JJ, Hebbring S, Marlo J, Zhao W, Ho WT, Van Driest SL, McGregor TL, Mosley JD, Wells QS, Temple M, Ramirez AH, Carroll R, Osterman T, Edwards T, Ruderfer D, Velez Edwards DR, Hamid R, Cogan J, Glazer A, Wei WQ, Feng Q, Brilliant M, Zhao ZJ, Cox NJ, Roden DM, Denny JC. Phenotype risk scores identify patients with unrecognized Mendelian disease patterns. Science 2018; 359:1233-1239. [PMID: 29590070 PMCID: PMC5959723 DOI: 10.1126/science.aal4043] [Citation(s) in RCA: 125] [Impact Index Per Article: 17.9] [Received: 11/14/2016] [Revised: 08/25/2017] [Accepted: 01/22/2018] [Indexed: 12/11/2022]
Abstract
Genetic association studies often examine features independently, potentially missing subpopulations with multiple phenotypes that share a single cause. We describe an approach that aggregates phenotypes on the basis of patterns described by Mendelian diseases. We mapped the clinical features of 1204 Mendelian diseases into phenotypes captured from the electronic health record (EHR) and summarized this evidence as phenotype risk scores (PheRSs). In an initial validation, PheRS distinguished cases and controls of five Mendelian diseases. Applying PheRS to 21,701 genotyped individuals uncovered 18 associations between rare variants and phenotypes consistent with Mendelian diseases. In 16 patients, the rare genetic variants were associated with severe outcomes such as organ transplants. PheRS can augment rare-variant interpretation and may identify subsets of patients with distinct genetic causes for common diseases.
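As a rough illustration of the phenotype risk score idea summarized in this abstract, the Python sketch below scores each patient against one Mendelian disease profile by summing weights over the disease features found in the patient's record. The weighting of each feature by the log of its inverse prevalence in the cohort, the function names, and the toy cohort are assumptions made for illustration; the abstract itself does not specify them.

```python
import math


def phenotype_weights(cohort):
    """Weight each phenotype by log(N / n), where n patients out of N carry it.

    Assumption for this sketch: rarer phenotypes contribute more to the score.
    """
    n_patients = len(cohort)
    counts = {}
    for phenotypes in cohort.values():
        for p in phenotypes:
            counts[p] = counts.get(p, 0) + 1
    return {p: math.log(n_patients / c) for p, c in counts.items()}


def phenotype_risk_score(patient_phenotypes, disease_features, weights):
    """Sum the weights of the disease features present in the patient's record."""
    matched = patient_phenotypes & disease_features
    return sum(weights.get(p, 0.0) for p in matched)


# Hypothetical toy cohort: patient id -> phenotypes extracted from the EHR.
cohort = {
    "p1": {"hearing_loss", "hypertension"},
    "p2": {"hypertension"},
    "p3": {"hearing_loss", "renal_failure", "hematuria"},
    "p4": {"hypertension", "asthma"},
}

# Hypothetical clinical feature set for one Mendelian disease.
disease = {"hearing_loss", "renal_failure", "hematuria"}

weights = phenotype_weights(cohort)
scores = {
    pid: phenotype_risk_score(phenotypes, disease, weights)
    for pid, phenotypes in cohort.items()
}
for pid, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pid, round(score, 3))
```

In this toy cohort the patient carrying all three disease features scores highest, and patients with none of them score zero, which is the pattern a case-control validation of such a score would look for.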
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jacob J Hughey
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Scott Hebbring
- Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, WI, USA
| | - Joy Marlo
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wanke Zhao
- Department of Pathology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
| | - Wanting T Ho
- Department of Pathology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
| | - Sara L Van Driest
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Tracy L McGregor
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jonathan D Mosley
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Quinn S Wells
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Michael Temple
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Andrea H Ramirez
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Robert Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Travis Osterman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Todd Edwards
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Douglas Ruderfer
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Digna R Velez Edwards
- Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Rizwan Hamid
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Joy Cogan
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Andrew Glazer
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - QiPing Feng
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Murray Brilliant
- Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, WI, USA
| | - Zhizhuang J Zhao
- Department of Pathology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
| | - Nancy J Cox
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
237
|
Saklatvala JR, Dand N, Simpson MA. Text-mined phenotype annotation and vector-based similarity to improve identification of similar phenotypes and causative genes in monogenic disease patients. Hum Mutat 2018; 39:643-652. [PMID: 29460986 DOI: 10.1002/humu.23413] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/10/2017] [Revised: 01/25/2018] [Accepted: 02/16/2018] [Indexed: 11/07/2022]
Abstract
The genetic diagnosis of rare monogenic diseases using exome/genome sequencing requires the true causal variant(s) to be identified from tens of thousands of observed variants. Typically a virtual gene panel approach is taken whereby only variants in genes known to cause phenotypes resembling the patient under investigation are considered. With the number of known monogenic gene-disease pairs exceeding 5,000, manual curation of personalized virtual panels using exhaustive knowledge of the genetic basis of the human monogenic phenotypic spectrum is challenging. We present improved probabilistic methods for estimating phenotypic similarity based on Human Phenotype Ontology annotation. A limitation of existing methods for evaluating a disease's similarity to a reference set is that reference diseases are typically represented as a series of binary (present/absent) observations of phenotypic terms. We evaluate a quantified disease reference set, using term frequency in phenotypic text descriptions to approximate term relevance. We demonstrate an improved ability to identify related diseases through the use of a quantified reference set, and that vector space similarity measures perform better than established information content-based measures. These improvements enable the generation of bespoke virtual gene panels, facilitating more accurate and efficient interpretation of genomic variant profiles from individuals with rare Mendelian disorders. These methods are available online at https://atlas.genetics.kcl.ac.uk/~jake/cgi-bin/patient_sim.py.
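The contrast drawn in this abstract between binary and quantified reference sets can be illustrated with a small vector-space sketch. It assumes that each disease is a vector of relative term frequencies and that similarity is the cosine between vectors. The abstract names neither the weighting nor the measure in this detail, so both are stand-ins, and the names `tf_vector`, `cosine` and `rank_diseases` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_vector(terms: List[str]) -> Dict[str, float]:
    """Turn a list of phenotype term mentions into relative frequencies."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_diseases(patient_terms: List[str],
                  reference: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Rank reference diseases by similarity to the patient, best first."""
    patient_vec = tf_vector(patient_terms)
    scored = [(name, cosine(patient_vec, tf_vector(terms)))
              for name, terms in reference.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The genes linked to the top-ranked diseases would then form a candidate virtual panel. That last step is not shown here because it depends on a gene-disease mapping the abstract does not describe.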
Affiliation(s)
- Jake R Saklatvala
- Department of Medical & Molecular Genetics, King's College London, London, United Kingdom
| | - Nick Dand
- Department of Medical & Molecular Genetics, King's College London, London, United Kingdom
| | - Michael A Simpson
- Department of Medical & Molecular Genetics, King's College London, London, United Kingdom
| |
|
238
|
Gawliński P, Pelc M, Ciara E, Jhangiani S, Jurkiewicz E, Gambin T, Różdżyńska-Świątkowska A, Dawidziuk M, Coban-Akdemir Z, Guilbride D, Muzny D, Lupski J, Krajewska-Walasek M. Phenotype expansion and development in Kosaki overgrowth syndrome. Clin Genet 2018; 93:919-924. [DOI: 10.1111/cge.13192] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/21/2017] [Revised: 11/19/2017] [Accepted: 11/20/2017] [Indexed: 01/09/2023]
Affiliation(s)
- P. Gawliński
- Department of Medical Genetics; Institute of Mother and Child; Warsaw Poland
| | - M. Pelc
- Department of Medical Genetics; The Children's Memorial Health Institute; Warsaw Poland
| | - E. Ciara
- Department of Medical Genetics; The Children's Memorial Health Institute; Warsaw Poland
| | - S. Jhangiani
- Human Genome Sequencing Center; Baylor College of Medicine; Houston Texas
| | - E. Jurkiewicz
- Department of Diagnostic Imaging; The Children's Memorial Health Institute; Warsaw Poland
| | - T. Gambin
- Department of Medical Genetics; Institute of Mother and Child; Warsaw Poland
- Institute of Computer Science; Warsaw University of Technology; Warsaw Poland
- Department of Molecular and Human Genetics; Baylor College of Medicine; Houston Texas
| | | | - M. Dawidziuk
- Department of Medical Genetics; Institute of Mother and Child; Warsaw Poland
| | - Z.H. Coban-Akdemir
- Department of Molecular and Human Genetics; Baylor College of Medicine; Houston Texas
| | | | - D. Muzny
- Human Genome Sequencing Center; Baylor College of Medicine; Houston Texas
| | - J.R. Lupski
- Department of Molecular and Human Genetics; Baylor College of Medicine; Houston Texas
- Human Genome Sequencing Center; Baylor College of Medicine; Houston Texas
- Texas Children's Hospital; Houston Texas
| | - M. Krajewska-Walasek
- Department of Medical Genetics; The Children's Memorial Health Institute; Warsaw Poland
| |
|
239
|
Shen Y, Yuan K, Chen D, Colloc J, Yang M, Li Y, Lei K. An ontology-driven clinical decision support system (IDDAP) for infectious disease diagnosis and antibiotic prescription. Artif Intell Med 2018; 86:20-32. [PMID: 29433958 DOI: 10.1016/j.artmed.2018.01.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 05/16/2017] [Revised: 01/20/2018] [Accepted: 01/22/2018] [Indexed: 12/17/2022]
Abstract
BACKGROUND The available antibiotic decision-making systems were developed from a physician's perspective. However, because infectious diseases are common, many patients desire access to knowledge via a search engine. Although the use of antibiotics should, in principle, be subject to a doctor's advice, many patients take them without authorization, and some people cannot easily or rapidly consult a doctor. In such cases, a reliable antibiotic prescription support system is needed. METHODS AND RESULTS This study describes the construction and optimization of the sensitivity and specificity of a decision support system named IDDAP, which is based on ontologies for infectious disease diagnosis and antibiotic therapy. The ontology for this system was constructed by collecting existing ontologies associated with infectious diseases, syndromes, bacteria and drugs into the ontology's hierarchical conceptual schema. First, IDDAP identifies a potential infectious disease based on a patient's self-described disease state. Then, the system searches for and proposes an appropriate antibiotic therapy specifically adapted to the patient based on factors such as the patient's body temperature, infection sites, symptoms/signs, complications, antibacterial spectrum, contraindications, drug-drug interactions between the proposed therapy and previously prescribed medication, and the route of therapy administration. The constructed domain ontology contains 1,267,004 classes, 7,608,725 axioms, and 1,266,993 members of "SubClassOf" that pertain to infectious diseases, bacteria, syndromes, anti-bacterial drugs and other relevant components. 
The system includes 507 infectious diseases and their therapy methods in combination with 332 different infection sites, 936 relevant symptoms of the digestive, reproductive, neurological and other systems, 371 types of complications, 838,407 types of bacteria, 341 types of antibiotics, 1504 pairs of reaction rates (antibacterial spectrum) between antibiotics and bacteria, 431 pairs of drug interaction relationships and 86 pairs of antibiotic-specific population contraindicated relationships. Compared with the existing infectious disease-relevant ontologies in the field of knowledge comprehension, this ontology is more complete. Analysis of IDDAP's performance in terms of classifiers based on receiver operating characteristic (ROC) curve results (89.91%) revealed IDDAP's advantages when combined with our ontology. CONCLUSIONS AND SIGNIFICANCE This study attempted to bridge the patient/caregiver gap by building a sophisticated application that uses artificial intelligence and machine learning computational techniques to perform data-driven decision-making at the point of primary care. The first level of decision-making is conducted by the IDDAP and provides the patient with a first-line therapy. Patients can then make a subjective judgment, and if any questions arise, should consult a physician for subsequent decisions, particularly in complicated cases or in cases in which the necessary information is not yet available in the knowledge base.
Affiliation(s)
- Ying Shen
- School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
| | - Kaiqi Yuan
- School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
| | - Daoyuan Chen
- School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
| | - Joël Colloc
- Laboratory CIRTAI/IDEES, Université du Havre, Le Havre Cedex, France
| | - Min Yang
- SIAT, Chinese Academy of Sciences, China
| | | | - Kai Lei
- School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China.
| |
|
240
|
Shakeel M, Irfan M, Khan IA. Estimating the mutational load for cardiovascular diseases in Pakistani population. PLoS One 2018; 13:e0192446. [PMID: 29420653 PMCID: PMC5805289 DOI: 10.1371/journal.pone.0192446] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/26/2017] [Accepted: 01/23/2018] [Indexed: 02/05/2023] Open
Abstract
The deleterious genetic variants contributing to certain diseases may differ in number and allele frequency from population to population, depending on their evolutionary background. Here, we prioritize the deleterious variants from the Pakistani population in a manually curated list of genes already reported to be associated with common, Mendelian, and congenital cardiovascular diseases (CVDs), using the genome/exome sequencing data of Pakistani individuals publicly available in the 1000 Genomes Project (PJL) and Exome Aggregation Consortium (ExAC) South Asia. By applying a set of tools such as Combined Annotation Dependent Depletion (CADD), ANNOVAR, and Variant Effect Predictor (VEP), we highlighted 561 potentially detrimental variants from PJL data and 7374 variants from ExAC South Asian data. Likewise, filtration from ClinVar for CVDs revealed 3 pathogenic and 2 likely pathogenic variants from PJL, and 112 pathogenic and 42 likely pathogenic variants from ExAC South Asians. The comparison of derived allele frequencies (DAF) revealed that many of these prioritized variants have a two-fold or higher DAF in Pakistani individuals than in other populations. The highest numbers of deleterious variants contributing to common CVDs were found, in descending order, for hypertension, atherosclerosis, heart failure, aneurysm, and coronary heart disease, and, among Mendelian and congenital CVDs, for cardiomyopathies, cardiac arrhythmias, and atrioventricular septal defects.
Affiliation(s)
- Muhammad Shakeel
- Jamil-ur-Rahman Center for Genome Research, Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
| | - Muhammad Irfan
- Jamil-ur-Rahman Center for Genome Research, Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
| | - Ishtiaq Ahmad Khan
- Jamil-ur-Rahman Center for Genome Research, Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan
| |
|
241
|
Anderson D, Lassmann T. A phenotype centric benchmark of variant prioritisation tools. NPJ Genom Med 2018; 3:5. [PMID: 29423277 PMCID: PMC5799157 DOI: 10.1038/s41525-018-0044-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 09/28/2017] [Revised: 01/09/2018] [Accepted: 01/10/2018] [Indexed: 01/08/2023] Open
Abstract
Next generation sequencing is a standard tool used in clinical diagnostics. In Mendelian diseases the challenge is to discover the single etiological variant among thousands of benign or functionally unrelated variants. After calling variants from aligned sequencing reads, variant prioritisation tools are used to examine the conservation or potential functional consequences of variants. We hypothesised that the performance of variant prioritisation tools may vary by disease phenotype. To test this we created benchmark data sets for variants associated with different disease phenotypes. We found that performance of 24 tested tools is highly variable and differs by disease phenotype. The task of identifying a causative variant amongst a large number of benign variants is challenging for all tools, highlighting the need for further development in the field. Based on our observations, we recommend use of five top performers found in this study (FATHMM, M-CAP, MetaLR, MetaSVM and VEST3). In addition we provide tables indicating which analytical approach works best in which disease context. Variant prioritisation tools are best suited to investigate variants associated with well-studied genetic diseases, as these variants are more readily available during algorithm development than variants associated with rare diseases. We anticipate that further development into disease focussed tools will lead to significant improvements.
Affiliation(s)
- Denise Anderson
- Telethon Kids Institute, The University of Western Australia, Subiaco, WA 6008 Australia
| | - Timo Lassmann
- Telethon Kids Institute, The University of Western Australia, Subiaco, WA 6008 Australia
| |
|
242
|
Abstract
The majority of rare diseases affect children, most of whom have an underlying genetic cause for their condition. However, making a molecular diagnosis with current technologies and knowledge is often still a challenge. Paediatric genomics is an immature but rapidly evolving field that tackles this issue by incorporating next-generation sequencing technologies, especially whole-exome sequencing and whole-genome sequencing, into research and clinical workflows. This complex multidisciplinary approach, coupled with the increasing availability of population genetic variation data, has already resulted in an increased discovery rate of causative genes and in improved diagnosis of rare paediatric disease. Importantly, for affected families, a better understanding of the genetic basis of rare disease translates to more accurate prognosis, management, surveillance and genetic advice; stimulates research into new therapies; and enables provision of better support.
|
243
|
Blankenburg H, Pramstaller PP, Domingues FS. A network-based meta-analysis for characterizing the genetic landscape of human aging. Biogerontology 2018; 19:81-94. [PMID: 29270911 PMCID: PMC5765210 DOI: 10.1007/s10522-017-9741-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/07/2017] [Accepted: 12/01/2017] [Indexed: 01/22/2023]
Abstract
Great amounts of omics data are generated in aging research, but their diverse and partly complementary nature requires integrative analysis approaches for investigating aging processes and connections to age-related diseases. To establish a broader picture of the genetic and epigenetic landscape of human aging we performed a large-scale meta-analysis of 6600 human genes by combining 35 datasets that cover aging hallmarks, longevity, changes in DNA methylation and gene expression, and different age-related diseases. To identify biological relationships between aging-associated genes we incorporated them into a protein interaction network and characterized their network neighborhoods. In particular, we computed a comprehensive landscape of more than 1000 human aging clusters, network regions where genes are highly connected and where gene products commonly participate in similar processes. In addition to clusters that capture known aging processes such as nutrient-sensing and mTOR signaling, we present a number of clusters with a putative functional role in linking different aging processes as promising candidates for follow-up studies. To enable their detailed exploration, all datasets and aging clusters are made freely available via an interactive website ( https://gemex.eurac.edu/bioinf/age/ ).
Affiliation(s)
- Hagen Blankenburg
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Viale Druso 1, 39100 Bolzano, Italy
| | - Peter P. Pramstaller
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Viale Druso 1, 39100 Bolzano, Italy
- Department of Neurology, General Central Hospital, Bolzano, Italy
- Department of Neurology, University of Lübeck, Lübeck, Germany
| | - Francisco S. Domingues
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Viale Druso 1, 39100 Bolzano, Italy
| |
|
244
|
Adler A, Kirchmeier P, Reinhard J, Brauner B, Dunger I, Fobo G, Frishman G, Montrone C, Mewes HW, Arnold M, Ruepp A. PhenoDis: a comprehensive database for phenotypic characterization of rare cardiac diseases. Orphanet J Rare Dis 2018; 13:22. [PMID: 29370821 PMCID: PMC5785853 DOI: 10.1186/s13023-018-0765-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/11/2017] [Accepted: 01/12/2018] [Indexed: 12/31/2022] Open
Abstract
Background Thoroughly annotated data resources are a key requirement in phenotype dependent analysis and diagnosis of diseases in the area of precision medicine. Recent work has shown that curation and systematic annotation of human phenome data can significantly improve the quality and selectivity for the interpretation of inherited diseases. We have therefore developed PhenoDis, a comprehensive, manually annotated database providing symptomatic, genetic and imprinting information about rare cardiac diseases. Results PhenoDis includes 214 rare cardiac diseases from Orphanet and 94 more from OMIM. For phenotypic characterization of the diseases, we performed manual annotation of diseases with articles from the biomedical literature. Detailed description of disease symptoms required the use of 2247 different terms from the Human Phenotype Ontology (HPO). Diseases listed in PhenoDis frequently cover a broad spectrum of symptoms with 28% from the branch of ‘cardiovascular abnormality’ and others from areas such as neurological (11.5%) and metabolism (6%). We collected extensive information on the frequency of symptoms in respective diseases as well as on disease-associated genes and imprinting data. The analysis of the abundance of symptoms in patient studies revealed that most of the annotated symptoms (71%) are found in less than half of the patients of a particular disease. Comprehensive and systematic characterization of symptoms including their frequency is a pivotal prerequisite for computer based prediction of diseases and disease causing genetic variants. To this end, PhenoDis provides in-depth annotation for a complete group of rare diseases, including information on pathogenic and likely pathogenic genetic variants for 206 diseases as listed in ClinVar. We integrated all results in an online database (http://mips.helmholtz-muenchen.de/phenodis/) with multiple search options and provide the complete dataset for download. 
Conclusion PhenoDis provides a comprehensive set of manually annotated rare cardiac diseases that enables computational approaches for disease prediction via decision support systems and phenotype-driven strategies for the identification of disease causing genes.
Affiliation(s)
- Angela Adler
- Technische Universität München, Chair of Genome Oriented Bioinformatics, Center of Life and Food Science, D-85350, Freising-Weihenstephan, Germany
| | - Pia Kirchmeier
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Julian Reinhard
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Barbara Brauner
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Irmtraud Dunger
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Gisela Fobo
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Goar Frishman
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Corinna Montrone
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - H-Werner Mewes
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
- Technische Universität München, Chair of Genome Oriented Bioinformatics, Center of Life and Food Science, D-85350, Freising-Weihenstephan, Germany
| | - Matthias Arnold
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany
| | - Andreas Ruepp
- Institute for Bioinformatics and Systems Biology (IBIS), Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), D-85764, Neuherberg, Germany.
| |
|
245
|
Cheng L, Jiang Y, Ju H, Sun J, Peng J, Zhou M, Hu Y. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk. BMC Genomics 2018; 19:919. [PMID: 29363423 PMCID: PMC5780854 DOI: 10.1186/s12864-017-4338-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Indexed: 02/08/2023] Open
Abstract
Background Since the establishment of the first biomedical ontology, Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. Over 300 ontologies have now been built, including the extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because it can identify novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Although similarities between terms within each ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method took advantage of a gene functional interaction network (GFIN) to explore such inter-ontology similarities of terms. However, it only used gene interactions and failed to make full use of the connectivity among gene nodes of the network. In addition, all existing methods are designed specifically for GO, and their performance on the wider ontology community remains unknown. Results We proposed a method, InfAcrOnt, to infer similarities between terms across ontologies utilizing the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and acquires similarities between terms across ontologies by modeling the information flow within the network with a random walk. In our benchmark experiments on sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) in both human and yeast benchmark datasets, exhibiting superior performance. Meanwhile, comparisons of InfAcrOnt results and prior knowledge on pair-wise DO-HPO terms and pair-wise DO-GO terms show high correlations. Conclusions The experiment results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies in the benchmark set.
Electronic supplementary material The online version of this article (10.1186/s12864-017-4338-6) contains supplementary material, which is available to authorized users.
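The information-flow idea in this abstract can be sketched with a generic random walk with restart on a toy term-gene-gene network. This is a stand-in, not the InfAcrOnt algorithm itself, whose exact update rule the abstract does not give. The function name `random_walk_with_restart`, the restart probability and the node labels in the usage note are all hypothetical.

```python
from typing import Dict, List


def random_walk_with_restart(graph: Dict[str, List[str]],
                             seed: str,
                             restart: float = 0.5,
                             tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state visiting probabilities of a walker that starts at `seed`.

    `graph` is an undirected adjacency list: every neighbour must also be a
    key, and every node is assumed to have at least one neighbour.
    """
    nodes = list(graph)
    start = {n: 1.0 if n == seed else 0.0 for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {n: restart * start[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            share = (1.0 - restart) * prob[n] / len(graph[n])
            for neighbour in graph[n]:
                nxt[neighbour] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob
```

Seeding the walk at a term from one ontology and reading off the scores of terms from another gives a cross-ontology proximity. For example, in a toy network where `do_term` and `hpo_term` are both annotated to gene `g1` while `go_term` is only reachable through `g2` and `g3`, a walk seeded at `do_term` assigns `hpo_term` a higher score than `go_term`.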
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China
| | - Yue Jiang
- Hospital for Sick Children, Toronto, M5G 1X8, Canada
| | - Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150081, People's Republic of China
| | - Jie Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xian, 710072, People's Republic of China
| | - Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China.
| | - Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150088, People's Republic of China.
| |
|
246
|
Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders. Genet Med 2018; 20:1216-1223. [PMID: 29323667 PMCID: PMC5912505 DOI: 10.1038/gim.2017.246] [Citation(s) in RCA: 223] [Impact Index Per Article: 31.9] [Received: 07/11/2017] [Accepted: 11/20/2017] [Indexed: 12/15/2022] Open
Abstract
Purpose Given the rapid pace of discovery in rare disease genomics, it is likely that improvements in diagnostic yield can be made by systematically reanalysing previously generated genomic sequence data in light of new knowledge. Methods We tested this hypothesis in the UK-wide Deciphering Developmental Disorders Study, where in 2014 we reported a diagnostic yield of 27% through whole exome sequencing of 1133 children with severe developmental disorders and their parents. We reanalysed existing data using improved variant calling methodologies, novel variant detection algorithms, updated variant annotation, evidence-based filtering strategies, and newly discovered disease-associated genes. Results We are now able to diagnose an additional 182 individuals, taking our overall diagnostic yield to 454/1133 (40%), and another 43 (4%) have a finding of uncertain clinical significance. The majority of these new diagnoses are due to novel developmental disorder-associated genes discovered since our original publication. Conclusion This study highlights the importance of coupling large-scale research with clinical practice, and of discussing the possibility of iterative reanalysis and recontact with patients and health professionals at an early stage. We estimate that implementing parent-offspring whole exome sequencing as a first line diagnostic test for developmental disorders would diagnose >50% of patients.
|
247
|
Oliveira D, Pesquita C. Improving the interoperability of biomedical ontologies with compound alignments. J Biomed Semantics 2018; 9:1. [PMID: 29316968 PMCID: PMC5761129 DOI: 10.1186/s13326-017-0171-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/06/2017] [Accepted: 12/21/2017] [Indexed: 12/29/2022] Open
Abstract
Background Ontologies are commonly used to annotate and help process life sciences data. Although their original goal is to facilitate integration and interoperability among heterogeneous data sources, when these sources are annotated with distinct ontologies, bridging this gap can be challenging. In the last decade, ontology matching systems have been evolving and are now capable of producing high-quality mappings for life sciences ontologies, usually limited to the equivalence between two ontologies. However, life sciences research is becoming increasingly transdisciplinary and integrative, fostering the need to develop matching strategies that are able to handle multiple ontologies and more complex relations between their concepts. Results We have developed ontology matching algorithms that are able to find compound mappings between multiple biomedical ontologies, in the form of ternary mappings, finding for instance that “aortic valve stenosis”(HP:0001650) is equivalent to the intersection between “aortic valve”(FMA:7236) and “constricted” (PATO:0001847). The algorithms take advantage of search space filtering based on partial mappings between ontology pairs, to be able to handle the increased computational demands. The evaluation of the algorithms has shown that they are able to produce meaningful results, with precision in the range of 60-92% for new mappings. The algorithms were also applied to the potential extension of logical definitions of the OBO and the matching of several plant-related ontologies. Conclusions This work is a first step towards finding more complex relations between multiple ontologies. The evaluation shows that the results produced are significant and that the algorithms could satisfy specific integration needs.
Affiliation(s)
- Daniela Oliveira
  - Insight Centre for Data Analytics, NUI Galway, Galway Business Park, Dangan, Galway, H91 AEX4, Ireland
  - LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Catia Pesquita
  - LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
|
248
|
Abstract
PURPOSE OF REVIEW The development of next-generation sequencing (NGS) technologies is transforming the practice of medical genetics and revolutionizing the approach to heterogeneous hereditary conditions, including skeletal muscle disorders. Here, we review the different NGS approaches described in the literature so far for the characterization of myopathic patients and the results obtained from the implementation of such approaches in a clinical setting. RECENT FINDINGS The overall diagnostic rate of NGS strategies for patients affected by skeletal muscle disorders is higher than the success rate obtained using the traditional gene-by-gene approach. Moreover, many recent articles have been expanding the clinical phenotypes associated with already known disease genes. SUMMARY NGS applications will soon be the first-tier test for skeletal muscle disorders. They will improve the diagnosis in myopathic patients, promoting their inclusion into novel therapeutic trials. At the same time, they will improve our knowledge about the molecular mechanisms causing skeletal muscle disorders, favoring the development of novel therapeutic approaches.
|
249
|
Nair NU, Das A, Amit U, Robinson W, Park SG, Basu M, Lugo A, Leor J, Ruppin E, Hannenhalli S. Putative functional genes in idiopathic dilated cardiomyopathy. Sci Rep 2018; 8:66. [PMID: 29311597 PMCID: PMC5758757 DOI: 10.1038/s41598-017-18524-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/12/2017] [Accepted: 12/12/2017] [Indexed: 12/16/2022]
Abstract
Idiopathic dilated cardiomyopathy (DCM) is a complex disorder with a genetic and an environmental component involving multiple genes, many of which are yet to be discovered. We integrate genetic, epigenetic, transcriptomic, phenotypic, and evolutionary features into a method, Hridaya, to infer putative functional genes underlying DCM in a genome-wide fashion, using 213 human heart genomes and transcriptomes. Many genes identified by Hridaya are experimentally shown to cause cardiac complications. We validate the top predicted genes via five different genome-wide analyses. First, the predicted genes are associated with cardiovascular functions. Second, their knockdowns in mice induce cardiac abnormalities. Third, their inhibition by drugs causes cardiac side effects in humans. Fourth, they tend to have differential exon usage between DCM and normal samples. Fifth, analyzing 213 individual genotypes, we show that regulatory polymorphisms of the predicted genes are associated with elevated risk of cardiomyopathy. The stratification of DCM patients based on cardiac expression of the functional genes reveals two subgroups differing in key cardiac phenotypes. Integrating predicted functional genes with cardiomyocyte drug treatment experiments reveals novel potential drug targets. We provide a list of investigational drugs that target the newly identified functional genes and may lead to cardiac side effects.
Affiliation(s)
- Nishanth Ulhas Nair
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
- Avinash Das
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
- Uri Amit
  - The Neufeld Cardiac Research Institute, Tel Aviv University, Tel Aviv-Yafo, Israel
  - Tamman Cardiovascular Research Institute, Sheba Medical Center, Ramat Gan, Israel
  - The Dr. Pinchas Borenstein Talpiot Medical Leadership Program, Sheba Medical Center, Tel-Hashomer, Israel
  - Department of Radiation Oncology, Sheba Medical Center, Tel-Hashomer, Israel
- Welles Robinson
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
- Seung Gu Park
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
- Mahashweta Basu
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
- Alex Lugo
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
- Jonathan Leor
  - The Neufeld Cardiac Research Institute, Tel Aviv University, Tel Aviv-Yafo, Israel
  - Tamman Cardiovascular Research Institute, Sheba Medical Center, Ramat Gan, Israel
- Eytan Ruppin
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Sridhar Hannenhalli
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, 20742, USA
|
250
|
Hunt SE, McLaren W, Gil L, Thormann A, Schuilenburg H, Sheppard D, Parton A, Armean IM, Trevanion SJ, Flicek P, Cunningham F. Ensembl variation resources. Database (Oxford) 2018; 2018:5255129. [PMID: 30576484 PMCID: PMC6310513 DOI: 10.1093/database/bay119] [Citation(s) in RCA: 280] [Impact Index Per Article: 40.0] [Received: 07/31/2018] [Revised: 09/25/2018] [Accepted: 10/04/2018] [Indexed: 12/31/2022]
Abstract
The major goal of sequencing humans and many other species is to understand the link between genomic variation, phenotype and disease. There are numerous valuable and well-established variation resources, but collating and making sense of non-homogeneous, often large-scale data sets from disparate sources remains a challenge. Without a systematic catalogue of these data and appropriate query and annotation tools, understanding the genome sequence of an individual and assessing their disease risk is impossible. In Ensembl, we substantially solve this problem: we develop methods to facilitate data integration and broad access; aggregate information in a consistent manner and make it available in a variety of standard formats, both visually and programmatically; build analysis pipelines to compare variants to comprehensive genomic annotation sets; and make all tools and data publicly available.
Affiliation(s)
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- William McLaren
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Laurent Gil
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Anja Thormann
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Helen Schuilenburg
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Dan Sheppard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Andrew Parton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Irina M Armean
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Stephen J Trevanion
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
|