1
Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of non-coding region variations, yet understanding their biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the Angiotensin Converting Enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 2019 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant Single Nucleotide Variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs, rs147718775 and rs140394675, in the ACE2 promoter revealed that these co-occurring SNVs, when mutated, significantly enhance promoter activity, suggesting a possible increase in specific ACE2 isoform expression. This method proves effective in identifying and interpreting impactful non-coding variants, aiding in further studies and enhancing understanding of the molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
  - Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
  - Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
  - Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
  - Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
2
Villani RM, McKenzie ME, Davidson AL, Spurdle AB. Regional-specific calibration enables application of computational evidence for clinical classification of 5' cis-regulatory variants in Mendelian disease. Am J Hum Genet 2024; 111:1301-1315. [PMID: 38815586 PMCID: PMC11267523 DOI: 10.1016/j.ajhg.2024.05.002] [Received: 12/21/2023] [Revised: 05/02/2024] [Accepted: 05/03/2024] [Indexed: 06/01/2024] Open
Abstract
To date, clinical genetic testing for Mendelian disease variants has focused heavily on exonic coding and intronic gene regions. This multi-step study was undertaken to provide an evidence base for selecting and applying computational approaches for use in clinical classification of 5' cis-regulatory region variants. Curated datasets of clinically reported disease-causing 5' cis-regulatory region variants and variants from matched genomic regions in population controls were used to calibrate six bioinformatic tools as predictors of variant pathogenicity. Likelihood ratio estimates were aligned to code weights following ClinGen recommendations for application of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) classification scheme. Considering code assignment across all reference dataset variants, performance was best for CADD (81.2%) and REMM (81.5%). Optimized thresholds provided moderate evidence toward pathogenicity (CADD, REMM) and moderate (CADD) or supporting (REMM) evidence against pathogenicity. Both sensitivity and specificity of prediction were improved when further categorizing variants based on location in an EPDnew-defined promoter region. Combining predictions (CADD, REMM, and location in a promoter region) increased specificity at the expense of sensitivity. Importantly, the optimal CADD thresholds for assigning ACMG/AMP codes PP3 (≥10) and BP4 (≤8) were vastly different from recommendations for protein-coding variants (PP3 ≥25.3; BP4 ≤22.7); CADD <22.7 would incorrectly assign BP4 for >90% of reported disease-causing cis-regulatory region variants. Our results demonstrate the need to consider a tiered approach and tailored score thresholds to optimize bioinformatic impact prediction for clinical classification of 5' cis-regulatory region variants.
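The gap between the two sets of CADD thresholds quoted in the abstract can be made concrete with a small sketch. The function and dictionary names below are hypothetical, the evidence strengths (moderate versus supporting) are not modeled, and this is an illustration of the reported cut-offs rather than the authors' calibration code.

```python
from typing import Optional

# (PP3 lower bound, BP4 upper bound) as quoted in the abstract.
# The region keys are illustrative labels, not identifiers from the study.
CADD_THRESHOLDS = {
    "cis_regulatory_5prime": (10.0, 8.0),
    "protein_coding": (25.3, 22.7),
}


def cadd_evidence_code(score: float, region: str) -> Optional[str]:
    """Return 'PP3', 'BP4', or None for a CADD score under region-specific thresholds."""
    pp3_min, bp4_max = CADD_THRESHOLDS[region]
    if score >= pp3_min:
        return "PP3"  # computational evidence toward pathogenicity
    if score <= bp4_max:
        return "BP4"  # computational evidence against pathogenicity
    return None  # score falls in the indeterminate band between thresholds


if __name__ == "__main__":
    # The same score lands on opposite sides depending on the threshold set.
    print(cadd_evidence_code(15.0, "cis_regulatory_5prime"))
    print(cadd_evidence_code(15.0, "protein_coding"))
```

A score of 15 illustrates the abstract's warning: it meets PP3 under the regulatory thresholds but would be assigned BP4 if the protein-coding thresholds were applied.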
Affiliation(s)
- Rehan M Villani
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Maddison E McKenzie
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Aimee L Davidson
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Amanda B Spurdle
  - Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia; University of Queensland, Brisbane, Queensland, Australia.
3
Groza T, Gration D, Baynam G, Robinson PN. FastHPOCR: pragmatic, fast, and accurate concept recognition using the human phenotype ontology. Bioinformatics 2024; 40:btae406. [PMID: 38913850 PMCID: PMC11227366 DOI: 10.1093/bioinformatics/btae406] [Received: 03/25/2024] [Revised: 05/18/2024] [Accepted: 06/19/2024] [Indexed: 06/26/2024] Open
Abstract
MOTIVATION Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts as well as efficient re-analysis of past data. RESULTS We developed a dictionary-based approach that uses a large pre-built collection of clusters of morphologically equivalent tokens to address lexical variability, together with a more effective CR step that restricts entity boundary detection to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10 000 publication abstracts in 5 s. AVAILABILITY AND IMPLEMENTATION FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
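The dictionary-based idea described in the abstract can be sketched in a few lines. This is a toy illustration and not the FastHPOCR API: the three-term ontology, the `TOKEN_CLUSTERS` table, and all function names are assumptions made for the example. It normalizes tokens through clusters of morphological variants and only tries to match runs of tokens that belong to ontology labels.

```python
import re
from typing import Dict, FrozenSet, List

# Tiny illustrative ontology subset (HPO id -> label).
ONTOLOGY = {
    "HP:0001250": "Seizure",
    "HP:0000252": "Microcephaly",
    "HP:0001263": "Global developmental delay",
}

# Hypothetical clusters of morphologically equivalent tokens,
# stored as variant -> cluster representative.
TOKEN_CLUSTERS = {
    "seizures": "seizure",
    "delayed": "delay",
    "delays": "delay",
    "developmental": "development",
}


def normalize(token: str) -> str:
    """Lower-case a token and map it to its cluster representative."""
    token = token.lower()
    return TOKEN_CLUSTERS.get(token, token)


def build_index(ontology: Dict[str, str]) -> Dict[FrozenSet[str], str]:
    """Index each label by its set of normalized tokens (order-insensitive)."""
    index = {}
    for hpo_id, label in ontology.items():
        key = frozenset(normalize(t) for t in re.findall(r"[A-Za-z]+", label))
        index[key] = hpo_id
    return index


def recognize(text: str, ontology: Dict[str, str] = ONTOLOGY) -> List[str]:
    """Return HPO ids found in text, scanning only runs of ontology tokens."""
    index = build_index(ontology)
    vocab = set().union(*index.keys())
    tokens = [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in vocab:
            i += 1
            continue
        # Find the end of the current run of in-vocabulary tokens.
        j = i
        while j < len(tokens) and tokens[j] in vocab:
            j += 1
        # Inside the run, prefer the longest candidate starting at each position.
        start = i
        while start < j:
            for end in range(j, start, -1):
                key = frozenset(tokens[start:end])
                if key in index:
                    hits.append(index[key])
                    start = end
                    break
            else:
                start += 1
        i = j
    return hits


if __name__ == "__main__":
    print(recognize("The patient had seizures and delayed global development."))
```

Because the index is rebuilt from the ontology dictionary on each call, swapping in a newer ontology release only requires passing a different mapping, which mirrors the "immediate refresh" argument made in the abstract.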
Affiliation(s)
- Tudor Groza
  - Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
  - Telethon Kids Institute, Nedlands, WA 6009, Australia
  - School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Bentley, WA 6102, Australia
  - SingHealth Duke-NUS Institute of Precision Medicine, Singapore 169609, Singapore
- Dylan Gration
  - Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Subiaco, WA 6008, Australia
- Gareth Baynam
  - Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
  - Telethon Kids Institute, Nedlands, WA 6009, Australia
  - Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Subiaco, WA 6008, Australia
  - Faculty of Health and Medical Sciences, University of Western Australia, Crawley, WA 6009, Australia
- Peter N Robinson
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
4
Jin W, Xia Y, Thela SR, Liu Y, Chen L. In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder. bioRxiv 2024:2024.06.25.600715. [PMID: 38979263 PMCID: PMC11230389 DOI: 10.1101/2024.06.25.600715] [Indexed: 07/10/2024]
Abstract
Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), an in vitro high-throughput method, can simultaneously test thousands of variants by evaluating the existence of allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, which show differential allelic regulatory effects on gene expression, usually number only in the hundreds, limiting their potential as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, to generate labelled variants in silico and augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves the prediction performance for MPRA regulatory variants compared to the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as one example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter, enhancer activity and binding sites of cMyC and Pol II that regulate gene expression. Importantly, predicted regulatory variants are found to link immune-related genes by leveraging chromatin loop and accessible chromatin, demonstrating the importance of MpraVAE in genetic and gene discovery for complex traits.
Affiliation(s)
- Weijia Jin
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yi Xia
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Sai Ritesh Thela
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Li Chen
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
5
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
6
Santen RJ, Karaguzel G, Livaoglu M, Yue W, Cline JM, Ratan A, Sasano H. Role of ERα and Aromatase in Juvenile Gigantomastia. J Clin Endocrinol Metab 2024; 109:1765-1772. [PMID: 38227777 DOI: 10.1210/clinem/dgae019] [Received: 11/15/2023] [Revised: 01/04/2024] [Accepted: 01/09/2024] [Indexed: 01/18/2024]
Abstract
CONTEXT Approximately 150 patients with juvenile gigantomastia have been reported in the literature but the underlying biologic mechanisms remain unknown. OBJECTIVE To conduct extensive clinical, biochemical, immunochemical, and genetic studies in 3 patients with juvenile gigantomastia to determine causative biologic factors. METHODS We examined clinical effects of estrogen by blockading estrogen synthesis or its action. Breast tissue aromatase expression and activity were quantitated in 1 patient and 5 controls. Other biochemical markers, including estrogen receptor α (ERα), cyclin D1 and E, p-RB, p-MAPK, p-AKT, BCL-2, EGF-R, IGF-IR β, and p-EGFR were assayed by Western blot. Immunohistochemical analyses for aromatase, ERα and β, PgR, Ki67, sulfotransferase, estrone sulfatase, and 17βHD were performed in all 3 patients. The entire genomes of the mother, father, and patient in the 3 families were sequenced. RESULTS Blockade of estrogen synthesis or action in patients resulted in demonstrable clinical effects. Biochemical studies on fresh frozen tissue revealed no differences between patients and controls, presumably due to tissue dilution from the large proportion of stroma. However, immunohistochemical analysis of ductal breast cells in the 3 patients revealed a high percent of ERα (64.1% ± 7.8% vs reference women 9.6%, range 2.3-15%); aromatase score of 4 (76%-100% of cells positive vs 30.4% ± 5.6%); PgR (69.5% ± 15.2% vs 6.0%, range 2.7%-11.9%) and Ki67 (23.7% ± 0.54% vs 4.2%). Genetic studies were inconclusive although some intriguing variants were identified. CONCLUSION The data implicate an important biologic role for ERα to increase tissue sensitivity to estrogen and aromatase to enhance local tissue production as biologic factors involved in juvenile gigantomastia.
Affiliation(s)
- Richard J Santen
  - Division of Endocrinology and Metabolism, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
- Gulay Karaguzel
  - Department of Pediatric Endocrinology, Karadeniz Technical University, School of Medicine, 61080 Trabzon, Turkey
- Murat Livaoglu
  - Department of Plastic Surgery, Karadeniz Technical University, 61080 Trabzon, Turkey
- Wei Yue
  - Division of Endocrinology and Metabolism, University of Virginia School of Medicine, Charlottesville, VA 22903, USA
- J Mark Cline
  - Department of Pathology, Section of Comparative Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
- Aakrosh Ratan
  - Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA 22908, USA
- Hironobu Sasano
  - Department of Pathology, Tohoku University School of Medicine, Sendai, Miyagi 980-8575, Japan
7
Muhammad SS, Shoaib M, Pervez MT. An Integrated Framework for Analysis and Prediction of Impact of Single Nucleotide Polymorphism Associated with Human Diseases. Evol Bioinform Online 2024; 20:11769343241249916. [PMID: 38737438 PMCID: PMC11088291 DOI: 10.1177/11769343241249916] [Received: 11/28/2023] [Accepted: 04/10/2024] [Indexed: 05/14/2024] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome. Analyzing genetic variants can help us better understand the genetic basis of diseases and develop predictive models that identify individuals at increased risk for certain diseases. Several SNP analysis tools have already been developed. To run these tools, the user needs to collect data from various databases. In addition, researchers often have to use multiple variant analysis tools to cross-validate their results and increase confidence in their findings. Extracting data from multiple databases and running multiple tools at a time increases the complexity and time required for analysis. Some web-based tools integrate multiple genetic variant databases and provide variant annotations for a few tools, but these approaches have limitations in tasks such as retrieving annotation information and filtering common pathogenic variants. The proposed web-based tool, IPSNP (An Integrated Platform for Predicting Impact of SNPs), is written in Django, a Python-based framework. It uses the RESTful API of MyVariant.info to extract annotation information for 29 tools on variants associated with a given gene, rsID, or HGVS-format variant specified in a VCF file. The results comprise (1) a CSV file of predictions derived from the consensus decision, (2) a file with annotations for the variants associated with the given gene, (3) a file showing variants commonly declared pathogenic by the selected tools, and (4) a CSV file containing chromosome coordinates based on the GRCh37 and GRCh38 genome assemblies, rsIDs, and proteomic data, so that users may run tools of their choice without manually collecting parameters for each tool. IPSNP is a valuable resource for researchers and clinicians, and it can help save time and effort in discovering novel disease-associated variants and developing personalized treatments.
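The abstract does not say how the consensus decision is computed, so the following is only a minimal sketch under an assumed majority-vote rule, with ties reported as "uncertain". The function names, the variant identifiers, and the per-tool calls are all made up for illustration and are not IPSNP output.

```python
from collections import Counter
from typing import Dict, List


def consensus(calls: Dict[str, str]) -> str:
    """Majority vote over per-tool calls; ties and empty input give 'uncertain'."""
    if not calls:
        return "uncertain"
    top = Counter(calls.values()).most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "uncertain"
    return top[0][0]


def commonly_pathogenic(variants: Dict[str, Dict[str, str]], tools: List[str]) -> List[str]:
    """Variants that every selected tool calls 'pathogenic'."""
    return [
        variant
        for variant, calls in variants.items()
        if all(calls.get(tool) == "pathogenic" for tool in tools)
    ]


if __name__ == "__main__":
    # Hypothetical per-tool calls keyed by placeholder variant ids.
    variants = {
        "var1": {"toolA": "pathogenic", "toolB": "pathogenic", "toolC": "pathogenic"},
        "var2": {"toolA": "pathogenic", "toolB": "benign", "toolC": "benign"},
    }
    for variant, calls in variants.items():
        print(variant, consensus(calls))
    print(commonly_pathogenic(variants, ["toolA", "toolB", "toolC"]))
```

Keeping the majority vote and the unanimous filter as separate functions mirrors the abstract's distinction between the consensus file and the file of variants declared pathogenic by all selected tools.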
Affiliation(s)
- Syed Shah Muhammad
  - Department of Computer Science, University of Engineering & Technology, Lahore, Punjab, Pakistan
- Muhammad Shoaib
  - Department of Computer Science, University of Engineering & Technology, Lahore, Punjab, Pakistan
- Muhammad Tariq Pervez
  - Department of Biological Sciences, Virtual University of Pakistan, Lahore, Punjab, Pakistan
8
Mao D, Liu C, Wang L, Ai-Ouran R, Deisseroth C, Pasupuleti S, Kim SY, Li L, Rosenfeld JA, Meng L, Burrage LC, Wangler MF, Yamamoto S, Santana M, Perez V, Shukla P, Eng CM, Lee B, Yuan B, Xia F, Bellen HJ, Liu P, Liu Z. AI-MARRVEL - A Knowledge-Driven AI System for Diagnosing Mendelian Disorders. NEJM AI 2024; 1:10.1056/aioa2300009. [PMID: 38962029 PMCID: PMC11221788 DOI: 10.1056/aioa2300009] [Indexed: 07/05/2024]
Abstract
BACKGROUND Diagnosing genetic disorders requires extensive manual curation and interpretation of candidate variants, a labor-intensive task even for trained geneticists. Although artificial intelligence (AI) shows promise in aiding these diagnoses, existing AI tools have only achieved moderate success for primary diagnosis. METHODS AI-MARRVEL (AIM) uses a random-forest machine-learning classifier trained on over 3.5 million variants from thousands of diagnosed cases. AIM additionally incorporates expert-engineered features into training to recapitulate the intricate decision-making processes in molecular diagnosis. The online version of AIM is available at https://ai.marrvel.org. To evaluate AIM, we benchmarked it with diagnosed patients from three independent cohorts. RESULTS AIM improved the rate of accurate genetic diagnosis, doubling the number of solved cases as compared with benchmarked methods, across three distinct real-world cohorts. To better identify diagnosable cases from the unsolved pools accumulated over time, we designed a confidence metric on which AIM achieved a precision rate of 98% and identified 57% of diagnosable cases out of a collection of 871 cases. Furthermore, AIM's performance improved after being fine-tuned for targeted settings including recessive disorders and trio analysis. Finally, AIM demonstrated potential for novel disease gene discovery by correctly predicting two newly reported disease genes from the Undiagnosed Diseases Network. CONCLUSIONS AIM achieved superior accuracy compared with existing methods for genetic diagnosis. We anticipate that this tool may aid in primary diagnosis, reanalysis of unsolved cases, and the discovery of novel disease genes. (Funded by the NIH Common Fund and others.).
Affiliation(s)
- Dongxue Mao
  - Department of Pediatrics, Baylor College of Medicine, Houston
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Chaozhong Liu
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
  - Graduate School of Biomedical Sciences, Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston
- Linhua Wang
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
  - Graduate School of Biomedical Sciences, Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston
- Rami Ai-Ouran
  - Department of Pediatrics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
  - Department of Data Science and AI, Al Hussein Technical University, Amman, Jordan
- Cole Deisseroth
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Sasidhar Pasupuleti
  - Department of Pediatrics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Seon Young Kim
  - Department of Pediatrics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Lucian Li
  - Department of Pediatrics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Jill A Rosenfeld
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
- Linyan Meng
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Baylor Genetics, Houston
- Lindsay C Burrage
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
- Michael F Wangler
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Shinya Yamamoto
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
- Christine M Eng
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Baylor Genetics, Houston
- Brendan Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
- Bo Yuan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston
- Fan Xia
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Baylor Genetics, Houston
- Hugo J Bellen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
  - Department of Neuroscience, Baylor College of Medicine, Houston
- Pengfei Liu
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston
  - Baylor Genetics, Houston
- Zhandong Liu
  - Department of Pediatrics, Baylor College of Medicine, Houston
  - Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston
9
Stenton SL, O'Leary MC, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O'Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson MW, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JOB, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O'Donnell-Luria A. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project. Hum Genomics 2024; 18:44. [PMID: 38685113 PMCID: PMC11057178 DOI: 10.1186/s40246-024-00604-w] [Received: 08/11/2023] [Accepted: 04/02/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
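The maximum F-measure mentioned in the METHODS can be illustrated with a short sketch. It assumes the simplest reading of the abstract: each distinct EPCR value is used as a threshold, precision and recall of causal variants are computed at that threshold, and the best resulting F1 is reported. The challenge's exact scoring details (for example, how families are aggregated) are not given here, and the function name and data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def max_f_measure(predictions: Iterable[Tuple[str, float]], causal: Set[str]) -> float:
    """Best F1 over all EPCR thresholds.

    predictions: (variant_id, epcr) pairs submitted by a model.
    causal: ids of the true causal variants.
    """
    predictions = list(predictions)
    best = 0.0
    for threshold in sorted({epcr for _, epcr in predictions}):
        # Variants the model "calls" causal at this threshold.
        called = {variant for variant, epcr in predictions if epcr >= threshold}
        true_positives = len(called & causal)
        if true_positives == 0:
            continue  # F1 is zero at this threshold
        precision = true_positives / len(called)
        recall = true_positives / len(causal)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    # Made-up predictions; "a" and "c" are the causal variants.
    preds = [("a", 0.9), ("b", 0.8), ("c", 0.4), ("d", 0.1)]
    print(round(max_f_measure(preds, {"a", "c"}), 3))
```

In this toy case the best trade-off is at the 0.4 threshold, where both causal variants are recovered at the cost of one false positive.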
Affiliation(s)
- Sarah L Stenton
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Melanie C O'Leary
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gabrielle Lemire
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Grace E VanNoy
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stephanie DiTroia
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vijay S Ganesh
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Emily Groopman
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emily O'Heir
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Brian Mangilog
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ikeoluwa Osei-Owusu
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lynn S Pais
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jillian Serrano
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Moriel Singer-Berk
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ben Weisburd
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael W Wilson
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christina Austin-Tse
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Marwa Abdelhakim
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Azza Althagafi
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Riccardo Bellazzi
  - enGenome Srl, Pavia, Italy
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Samuele Bovo
  - Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Maria Giulia Carta
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Matteo Floris
  - Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Manavalan Gajapathy
  - Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Julius O B Jacobsen
  - William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Thomas Joseph
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Akash Kamandula
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Structural and Computational Biology and Molecular Biophysics Program, Baylor College of Medicine, Houston, TX, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- Yulan Lu
  - Center for Molecular Medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China
- Paolo Magni
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tarun Karthik Kumar Mamidi
  - Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Marta Mulargia
  - Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Giovanna Nicora
  - enGenome Srl, Pavia, Italy
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Vikas Pejaver
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yisu Peng
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Maurizio S Podda
  - Department of Biomedical Sciences, University of Sassari, Sassari, Italy
  - Institute of Clinical Physiology (IFC), CNR, Via Moruzzi 1, 56124, Pisa, Italy
  - University of Siena, Siena, Italy
  - CTGLab, Institute of Informatics and Telematics (IIT), CNR, Via Moruzzi 1, 56124, Pisa, Italy
- Aditya Rao
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Vangala G Saipradeep
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Peter Schols
  - Invitae, San Francisco, CA, USA
  - Codon One, Louvain, EU, Belgium
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
  - Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
  - Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX, USA
- Naveen Sivadasan
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Damian Smedley
  - William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Rajgopal Srinivasan
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Uma Sunderam
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Wuwei Tan
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Naina Tiwari
  - TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Xiao Wang
  - Center for Molecular Medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China
- Yaqiong Wang
  - Center for Molecular Medicine, Pediatric Research Institute, Children's Hospital of Fudan University, Shanghai, China
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Elizabeth A Worthey
  - Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Rujie Yin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Yuning You
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Daniel Zeiberg
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Constantina Bakolitsa
  - Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Steven E Brenner
  - Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Stephanie M Fullerton
  - Department of Bioethics and Humanities, University of Washington School of Medicine, Seattle, WA, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Anne O'Donnell-Luria
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
10
Benegas G, Albors C, Aw AJ, Ye C, Song YS. GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction. bioRxiv 2024:2023.10.10.561776. [PMID: 37873118 PMCID: PMC10592768 DOI: 10.1101/2023.10.10.561776] [Indexed: 10/25/2023]
Abstract
Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, OMIM), experimental functional assays (DMS, DepMap), and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants.
Affiliation(s)
- Gonzalo Benegas
  - Graduate Group in Computational Biology, University of California, Berkeley
- Carlos Albors
  - Computer Science Division, University of California, Berkeley
- Alan J. Aw
  - Department of Statistics, University of California, Berkeley
- Chengzhong Ye
  - Department of Statistics, University of California, Berkeley
- Yun S. Song
  - Computer Science Division, University of California, Berkeley
  - Department of Statistics, University of California, Berkeley
  - Center for Computational Biology, University of California, Berkeley
11
Ding M, Chen K, Yang Y, Zhao H. Prioritizing genomic variants pathogenicity via DNA, RNA, and protein-level features based on extreme gradient boosting. Hum Genet 2024:10.1007/s00439-024-02667-0. [PMID: 38575818 DOI: 10.1007/s00439-024-02667-0] [Received: 05/11/2023] [Accepted: 03/05/2024] [Indexed: 04/06/2024]
Abstract
Genetic diseases are mostly associated with genetic variants, including missense, synonymous, nonsense, and copy number variants. Previous studies indicate that these different kinds of variants affect phenotypes in various ways. It remains essential but challenging to understand the functional consequences of these genetic variants, especially the noncoding ones, due to the lack of corresponding annotations. While many computational methods have been proposed to identify risk variants, most of them have curated only DNA-level and protein-level annotations to predict variant pathogenicity, and others have been restricted to missense variants exclusively. In this study, we curated DNA-, RNA-, and protein-level features to discriminate disease-causing variants in both coding and noncoding regions. Features of protein sequences and protein structures proved essential for analyzing missense variants in coding regions, while features related to RNA splicing and RBP binding were significant for variants in noncoding regions and synonymous variants in coding regions. By integrating these features, we formulated the Multi-level feature Genomic Variants Predictor (ML-GVP) using the gradient boosting tree. The method was trained on more than 400,000 variants in the Sherloc training set from the 6th Critical Assessment of Genome Interpretation, achieving superior performance. The method is one of the two best-performing predictors on the blind test in the Sherloc assessment, and its performance was further confirmed on another independent test dataset of de novo variants.
Affiliation(s)
- Maolin Ding
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
- Ken Chen
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China.
  - Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-Sen University), Ministry of Education, Guangzhou, China.
- Huiying Zhao
  - Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, 510000, China.
12
Nizomov J, Jin W, Xia Y, Liu Y, Li Z, Chen L. MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants. bioRxiv 2024:2024.04.02.587790. [PMID: 38617248 PMCID: PMC11014600 DOI: 10.1101/2024.04.02.587790] [Indexed: 04/16/2024]
Abstract
The massively parallel reporter assay (MPRA) is an important technology for evaluating the impact of genetic variants on gene regulation. Here, we present MPRAVarDB, an online database and web server for exploring the regulatory effects of genetic variants. MPRAVarDB harbors 18 MPRA experiments designed to assess the regulatory effects of genetic variants associated with GWAS loci, eQTLs, and various genomic features, resulting in a total of 242,818 variants tested across more than 30 cell lines and 30 human diseases or traits. MPRAVarDB supports querying MPRA variants by genomic region, disease, and cell line, or by any combination of these terms. Notably, MPRAVarDB offers a suite of pretrained machine learning models tailored to specific diseases and cell lines, facilitating the genome-wide prediction of regulatory variants. MPRAVarDB is user-friendly: only a few clicks are needed to obtain query and prediction results.
Affiliation(s)
- Javlon Nizomov
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Weijia Jin
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Yi Xia
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202
- Zhigang Li
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Li Chen
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
13
Seo Y, Joo K, Lee J, Diaz A, Jang S, Cherry TJ, Bujakowska KM, Han J, Woo SJ, Small KW. Two novel non-coding single nucleotide variants in the DNase1 hypersensitivity site of PRDM13 causing North Carolina macular dystrophy in Korea. Mol Vis 2024; 30:58-66. [PMID: 38601016 PMCID: PMC11006008] [Received: 07/09/2023] [Accepted: 02/17/2024] [Indexed: 04/12/2024] Open
Abstract
Purpose: Pathogenic variants in North Carolina macular dystrophy (NCMD) have rarely been reported in the East Asian population. Herein, we report novel NCMD variants in two Korean families.
Methods: The regions associated with NCMD were analyzed with genome sequencing, and variants were filtered based on minor allele frequency (0.5%) and heterozygosity. Non-coding variants were functionally annotated using multiple computational tools.
Results: We identified two rare novel variants, chr6:g.99,598,914T>C (hg38; V17) and chr6:g.99,598,926G>A (hg38; V18), upstream of PRDM13 in families A and B, respectively. In Family A, Grade 2 NCMD and best-corrected visual acuities of 20/25 and 20/200 in the right and left eyes, respectively, were observed. In Family B, all affected individuals had Grade 1 NCMD with characteristic confluent drusen at the fovea and a best-corrected visual acuity of 20/20 in both eyes. These two variants are 10-22 bp downstream of the reported V10 variant within the DNase1 hypersensitivity site. This site is associated with progressive bifocal chorioretinal atrophy and congenital posterior polar chorioretinal hypertrophy and lies in the putative enhancer site of PRDM13.
Conclusion: We identified two novel NCMD variants in the Korean population and further validated the regulatory role of the DNase1 hypersensitivity site upstream of PRDM13.
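The filtering step in the Methods (minor allele frequency and heterozygosity) could look roughly like the following Python sketch. The abstract does not state the direction of the cutoff or the tooling used, so this assumes variants are kept when the population MAF is below 0.5% and the genotype is heterozygous. The `Variant` class, the function names, and the toy records (with placeholder coordinates) are hypothetical and are not data from the study.

```python
from dataclasses import dataclass
from typing import List

MAF_CUTOFF = 0.005  # 0.5%, assumed to be an upper bound on population frequency


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    maf: float      # population minor allele frequency as a fraction
    genotype: str   # VCF-style genotype string, e.g. "0/1"


def is_heterozygous(genotype: str) -> bool:
    """True for a called diploid genotype with two different alleles."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def filter_candidates(variants: List[Variant], maf_cutoff: float = MAF_CUTOFF) -> List[Variant]:
    """Keep rare, heterozygous variants (the assumed reading of the Methods)."""
    return [v for v in variants if v.maf < maf_cutoff and is_heterozygous(v.genotype)]


# Toy records with placeholder coordinates; values are illustrative only.
toy_variants = [
    Variant("chrN", 1000, "T", "C", maf=0.0, genotype="0/1"),     # rare, heterozygous
    Variant("chrN", 2000, "G", "A", maf=0.12, genotype="0/1"),    # too common
    Variant("chrN", 3000, "A", "G", maf=0.0001, genotype="1/1"),  # homozygous
    Variant("chrN", 4000, "C", "T", maf=0.001, genotype="1|0"),   # rare, phased heterozygous
]
kept_positions = [v.pos for v in filter_candidates(toy_variants)]
```

Restricting to heterozygous calls matches a dominant inheritance model; a different model would need a different genotype test.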
Affiliation(s)
- Yuri Seo
  - Institute of Vision Research, Department of Ophthalmology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin-si, Gyeonggi-do, South Korea
- Kwangsic Joo
  - Department of Ophthalmology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
- Junwon Lee
  - Institute of Vision Research, Department of Ophthalmology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Amber Diaz
  - Macula and Retina Institute, Glendale and Los Angeles, CA
  - Molecular Insight Research Foundation, Glendale and Los Angeles, CA
- Timothy J. Cherry
  - Center for Developmental Biology and Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA
  - Brotman Baty Institute, Seattle, WA
  - Department of Pediatrics, University of Washington School of Medicine, Seattle, WA
- Kinga M. Bujakowska
  - Ocular Genomic Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, MA
- Jinu Han
  - Institute of Vision Research, Department of Ophthalmology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
  - Ocular Genomic Institute, Massachusetts Eye and Ear Infirmary, Department of Ophthalmology, Harvard Medical School, Boston, MA
- Se Joon Woo
  - Department of Ophthalmology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
- Kent W. Small
  - Macula and Retina Institute, Glendale and Los Angeles, CA
  - Molecular Insight Research Foundation, Glendale and Los Angeles, CA
14
Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. Cell Genomics 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Affiliation(s)
- Takumi Nakamura
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Junko Ueda
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
- Shota Mizuno
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Kurara Honda
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- An-A Kazuno
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Hirona Yamamoto
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Tomonori Hara
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Atsushi Takata
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.
15
Feng X, Liu S, Li K, Bu F, Yuan H. NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024; 51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Affiliation(s)
- Xiaoshu Feng
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Sihan Liu
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Ke Li
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Fengxiao Bu
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China.
- Huijun Yuan
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China.
16
Balachandran S, Prada-Medina CA, Mensah MA, Kakar N, Nagel I, Pozojevic J, Audain E, Hitz MP, Kircher M, Sreenivasan VKA, Spielmann M. STIGMA: Single-cell tissue-specific gene prioritization using machine learning. Am J Hum Genet 2024; 111:338-349. [PMID: 38228144 PMCID: PMC10870135 DOI: 10.1016/j.ajhg.2023.12.011] [Received: 08/04/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/18/2024] Open
Abstract
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
Affiliation(s)
- Saranya Balachandran
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Cesar A Prada-Medina
  - Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Martin A Mensah
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; BIH Charité Digital Clinician Scientist Program, BIH Biomedical Innovation Academy, Anna-Louisa-Karsch-Strasse 2, 10178 Berlin, Germany; RG Development & Disease, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Naseebullah Kakar
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany; Department of Biotechnology, BUITEMS, Quetta, Pakistan
- Inga Nagel
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Jelena Pozojevic
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Enrique Audain
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck; Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Marc-Phillip Hitz
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck; Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Martin Kircher
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Varun K A Sreenivasan
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany.
- Malte Spielmann
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany; Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck.
|
17
|
Groza T, Caufield H, Gration D, Baynam G, Haendel MA, Robinson PN, Mungall CJ, Reese JT. An evaluation of GPT models for phenotype concept recognition. BMC Med Inform Decis Mak 2024; 24:30. [PMID: 38297371 PMCID: PMC10829255 DOI: 10.1186/s12911-024-02439-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/24/2024] [Indexed: 02/02/2024] Open
Abstract
OBJECTIVE Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital, 15 Hospital Avenue, Nedlands, WA, 6009, Australia.
- Telethon Kids Institute, 15 Hospital Avenue, Nedlands, WA, 6009, Australia.
- School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Kent St, Bentley, WA, 6102, Australia.
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore, 169609, Singapore.
| | - Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Dylan Gration
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA, 6008, Australia
| | - Gareth Baynam
- Rare Care Centre, Perth Children's Hospital, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Telethon Kids Institute, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA, 6008, Australia
- Faculty of Health and Medical Sciences, University of Western Australia, 35 Stirling Hwy, Crawley, WA, 6009, Australia
| | - Melissa A Haendel
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
|
18
|
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Zhaopo Zhu
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Yijing Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xudong Xiang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Shiyu Zhang
- Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
| | - Tengfei Luo
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Qiao Zhou
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jian Qiu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
| | - Kun Xia
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
|
19
|
Son Y, Chung J. Risk Factor Analysis of Cryopreserved Autologous Bone Flap Resorption in Adult Patients Undergoing Cranioplasty with Volumetry Measurement Using Conventional Statistics and Machine-Learning Technique. J Korean Neurosurg Soc 2024; 67:103-114. [PMID: 37709548 PMCID: PMC10788544 DOI: 10.3340/jkns.2023.0143] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2023] [Revised: 08/29/2023] [Accepted: 09/13/2023] [Indexed: 09/16/2023] Open
Abstract
OBJECTIVE Decompressive craniectomy (DC) with duroplasty is one of the common surgical treatments for life-threatening increased intracranial pressure (ICP). Once ICP is controlled, cranioplasty (CP) with reinsertion of the cryopreserved autologous bone flap or a synthetic implant is considered for protection and esthetics. Despite the risk of autologous bone flap resorption (BFR), the cryopreserved autologous bone flap remains an important material for CP because of its cost-effectiveness. In this article, we used conventional statistical analysis and a machine-learning technique to understand the risk factors for BFR. METHODS Patients aged >18 years who underwent autologous bone CP between January 2015 and December 2021 were reviewed. Demographic data, medical records, and volumetric measurements of the autologous bone flap from 94 patients were collected. BFR was defined using an absolute quantitative method (BFR-A) and a relative quantitative method (BFR%). Conventional statistical analysis and random forest with hyper-ensemble approach (RF with HEA) were performed, and overlapped partial dependence plots (PDPs) were generated. RESULTS Conventional statistical analysis showed that only the initial autologous bone flap volume was statistically significant for BFR-A. RF with HEA showed that the initial autologous bone flap volume, the interval between DC and CP, and bone quality contributed most to BFR-A, while trauma, bone quality, and initial autologous bone flap volume contributed most to BFR%. Overlapped PDPs of the initial autologous bone flap volume on BFR-A crossed at approximately 60 mL, and a relatively clear separation was found between the non-BFR and BFR groups. Therefore, an initial autologous bone flap volume of over 60 mL could be a possible risk factor for BFR. CONCLUSION From the present study, BFR in patients who underwent CP with autologous bone flap might be inevitable. However, the degree of BFR may differ from one patient to another. Therefore, considering artificial bone flaps as implants for patients with large DC could be reasonable. Still, the risk factors for BFR are not clearly understood. Therefore, chronological analysis and pathophysiologic studies are needed.
Affiliation(s)
- Yohan Son
- Department of Neurosurgery, Dankook University Hospital, Cheonan, Korea
| | - Jaewoo Chung
- Department of Neurosurgery, Dankook University Hospital, Cheonan, Korea
- Department of Neurosurgery, College of Medicine, Dankook University, Cheonan, Korea
|
20
|
Lee AS, Ayers LJ, Kosicki M, Chan WM, Fozo LN, Pratt BM, Collins TE, Zhao B, Rose MF, Sanchis-Juan A, Fu JM, Wong I, Zhao X, Tenney AP, Lee C, Laricchia KM, Barry BJ, Bradford VR, Lek M, MacArthur DG, Lee EA, Talkowski ME, Brand H, Pennacchio LA, Engle EC. A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders. medRxiv 2023:2023.12.22.23300468. [PMID: 38234731 PMCID: PMC10793524 DOI: 10.1101/2023.12.22.23300468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Unsolved Mendelian cases often lack obvious pathogenic coding variants, suggesting potential non-coding etiologies. Here, we present a single cell multi-omic framework integrating embryonic mouse chromatin accessibility, histone modification, and gene expression assays to discover cranial motor neuron (cMN) cis-regulatory elements and subsequently nominate candidate non-coding variants in the congenital cranial dysinnervation disorders (CCDDs), a set of Mendelian disorders altering cMN development. We generated single cell epigenomic profiles for ~86,000 cMNs and related cell types, identifying ~250,000 accessible regulatory elements with cognate gene predictions for ~145,000 putative enhancers. Seventy-five percent of elements (44 of 59) validated in an in vivo transgenic reporter assay, demonstrating that single cell accessibility is a strong predictor of enhancer activity. Applying our cMN atlas to 899 whole genome sequences from 270 genetically unsolved CCDD pedigrees, we achieved significant reduction in our variant search space and nominated candidate variants predicted to regulate known CCDD disease genes MAFB, PHOX2A, CHN1, and EBF3 - as well as new candidates in recurrently mutated enhancers through peak- and gene-centric allelic aggregation. This work provides novel non-coding variant discoveries of relevance to CCDDs and a generalizable framework for nominating non-coding variants of potentially high functional impact in other Mendelian disorders.
Affiliation(s)
- Arthur S Lee
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Lauren J Ayers
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
| | - Michael Kosicki
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA
| | - Wai-Man Chan
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Howard Hughes Medical Institute, Chevy Chase, MD
| | - Lydia N Fozo
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
| | - Brandon M Pratt
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
| | - Thomas E Collins
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
| | - Boxun Zhao
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
| | - Matthew F Rose
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Pathology, Boston Children's Hospital, Boston, MA
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA
- Medical Genetics Training Program, Harvard Medical School, Boston, MA
| | - Alba Sanchis-Juan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
| | - Jack M Fu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
| | - Isaac Wong
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
| | - Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
| | - Alan P Tenney
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Cassia Lee
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Harvard College, Cambridge, MA
| | - Kristen M Laricchia
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Brenda J Barry
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Howard Hughes Medical Institute, Chevy Chase, MD
| | - Victoria R Bradford
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
| | - Monkol Lek
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, NSW, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
| | - Eunjung Alice Lee
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Department of Genetics, Harvard Medical School, Boston, MA
| | - Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
| | - Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, MA
| | - Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA
| | - Elizabeth C Engle
- Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Howard Hughes Medical Institute, Chevy Chase, MD
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Medical Genetics Training Program, Harvard Medical School, Boston, MA
- Department of Ophthalmology, Boston Children's Hospital and Harvard Medical School, Boston, MA
|
21
|
Groza T, Wu H, Dinger ME, Danis D, Hilton C, Bagley A, Davids JR, Luo L, Lu Z, Robinson PN. Term-BLAST-like alignment tool for concept recognition in noisy clinical texts. Bioinformatics 2023; 39:btad716. [PMID: 38001031 PMCID: PMC10710372 DOI: 10.1093/bioinformatics/btad716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/20/2023] [Accepted: 11/23/2023] [Indexed: 11/26/2023] Open
Abstract
MOTIVATION Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
- Genetics and Rare Diseases Program, Telethon Kids Institute, Nedlands, WA 6009, Australia
| | - Honghan Wu
- Institute of Health Informatics, University College London, London WC1E 6BT, United Kingdom
| | - Marcel E Dinger
- Pryzm Health, Sydney, NSW 2089, Australia
- School of Life and Environmental Sciences, Faculty of Science, University of Sydney, NSW 2006, Australia
| | - Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
| | - Coleman Hilton
- Shriners Children’s Corporate Headquarters, Tampa, FL 33607, United States
| | - Anita Bagley
- Shriners Children's Northern California, Sacramento, CA 95817, United States
| | - Jon R Davids
- Shriners Children's Northern California, Sacramento, CA 95817, United States
| | - Ling Luo
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States
|
22
|
Chen Y, Paramo MI, Zhang Y, Yao L, Shah SR, Jin Y, Zhang J, Pan X, Yu H. Finding Needles in the Haystack: Strategies for Uncovering Noncoding Regulatory Variants. Annu Rev Genet 2023; 57:201-222. [PMID: 37562413 DOI: 10.1146/annurev-genet-030723-120717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Despite accumulating evidence implicating noncoding variants in human diseases, unraveling their functionality remains a significant challenge. Systematic annotations of the regulatory landscape and the growth of sequence variant data sets have fueled the development of tools and methods to identify causal noncoding variants and evaluate their regulatory effects. Here, we review the latest advances in the field and discuss potential future research avenues to gain a more in-depth understanding of noncoding regulatory variants.
Affiliation(s)
- You Chen
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Mauricio I Paramo
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Yingying Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Li Yao
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
| | - Sagar R Shah
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Yiyang Jin
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Junke Zhang
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
| | - Xiuqi Pan
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Haiyuan Yu
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
|
23
|
Pagnamenta AT, Camps C, Giacopuzzi E, Taylor JM, Hashim M, Calpena E, Kaisaki PJ, Hashimoto A, Yu J, Sanders E, Schwessinger R, Hughes JR, Lunter G, Dreau H, Ferla M, Lange L, Kesim Y, Ragoussis V, Vavoulis DV, Allroggen H, Ansorge O, Babbs C, Banka S, Baños-Piñero B, Beeson D, Ben-Ami T, Bennett DL, Bento C, Blair E, Brasch-Andersen C, Bull KR, Cario H, Cilliers D, Conti V, Davies EG, Dhalla F, Dacal BD, Dong Y, Dunford JE, Guerrini R, Harris AL, Hartley J, Hollander G, Javaid K, Kane M, Kelly D, Kelly D, Knight SJL, Kreins AY, Kvikstad EM, Langman CB, Lester T, Lines KE, Lord SR, Lu X, Mansour S, Manzur A, Maroofian R, Marsden B, Mason J, McGowan SJ, Mei D, Mlcochova H, Murakami Y, Németh AH, Okoli S, Ormondroyd E, Ousager LB, Palace J, Patel SY, Pentony MM, Pugh C, Rad A, Ramesh A, Riva SG, Roberts I, Roy N, Salminen O, Schilling KD, Scott C, Sen A, Smith C, Stevenson M, Thakker RV, Twigg SRF, Uhlig HH, van Wijk R, Vona B, Wall S, Wang J, Watkins H, Zak J, Schuh AH, Kini U, Wilkie AOM, Popitsch N, Taylor JC. Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases. Genome Med 2023; 15:94. [PMID: 37946251 PMCID: PMC10636885 DOI: 10.1186/s13073-023-01240-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/14/2022] [Accepted: 09/27/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND Whole genome sequencing (WGS) is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving. CONCLUSIONS Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.
Affiliation(s)
- Alistair T Pagnamenta
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Carme Camps
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Edoardo Giacopuzzi
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Human Technopole, Viale Rita Levi Montalcini 1, 20157, Milan, Italy
| | - John M Taylor
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
| | - Mona Hashim
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Eduardo Calpena
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Pamela J Kaisaki
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Akiko Hashimoto
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Jing Yu
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Edward Sanders
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Ron Schwessinger
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Jim R Hughes
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- University Medical Center Groningen, Groningen University, PO Box 72, 9700 AB, Groningen, The Netherlands
| | - Helene Dreau
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
| | - Matteo Ferla
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Lukas Lange
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Yesim Kesim
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Vassilis Ragoussis
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Dimitrios V Vavoulis
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
| | - Holger Allroggen
- Neurosciences Department, UHCW NHS Trust, Clifford Bridge Road, Coventry, CV2 2DX, UK
| | - Olaf Ansorge
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Christian Babbs
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Siddharth Banka
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Manchester Centre for Genomic Medicine, Saint Mary's Hospital, Oxford Road, Manchester, M13 9WL, UK
| | - Benito Baños-Piñero
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
| | - David Beeson
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Tal Ben-Ami
- Pediatric Hematology-Oncology Unit, Kaplan Medical Center, Rehovot, Israel
| | - David L Bennett
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Celeste Bento
- Hematology Department, Hospitais da Universidade de Coimbra, Coimbra, Portugal
| | - Edward Blair
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
| | - Charlotte Brasch-Andersen
- Department of Clinical Genetics, Odense University Hospital and Department of Clinical Research, University of Southern Denmark, Odense, Denmark
| | - Katherine R Bull
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7BN, UK
| | - Holger Cario
- Department of Pediatrics and Adolescent Medicine, University Medical Center, Eythstrasse 24, 89075, Ulm, Germany
| | - Deirdre Cilliers
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
| | - Valerio Conti
- Neuroscience Department, Meyer Children's Hospital IRCCS, Viale Pieraccini 24, 50139, Florence, Italy
| | - E Graham Davies
- Department of Immunology, Great Ormond Street Hospital for Children NHS Trust and UCL Great Ormond Street Institute of Child Health, Zayed Centre for Research, 2nd Floor, 20C Guilford Street, London, WC1N 1DZ, UK
| | - Fatima Dhalla
- Department of Paediatrics, Institute of Developmental and Regenerative Medicine, IMS-Tetsuya Nakamura Building, Old Road Campus, Roosevelt Drive, Oxford, OX3 7TY, UK
| | - Beatriz Diez Dacal
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
| | - Yin Dong
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - James E Dunford
- Oxford NIHR Musculoskeletal BRC and Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Nuffield Orthopaedic Centre, Old Road, Oxford, OX3 7HE, UK
| | - Renzo Guerrini
- Neuroscience Department, Meyer Children's Hospital IRCCS, Viale Pieraccini 24, 50139, Florence, Italy
| | - Adrian L Harris
- Department of Oncology, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
| | - Jane Hartley
- Liver Unit, Birmingham Women's & Children's Hospital and University of Birmingham, Steelhouse Lane, Birmingham, B4 6NH, UK
| | - Georg Hollander
- Department of Paediatrics, University of Oxford, Level 2, Children's Hospital, John Radcliffe Hospital, Oxford, OX3 9DU, UK
| | - Kassim Javaid
- Oxford NIHR Musculoskeletal BRC and Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Nuffield Orthopaedic Centre, Old Road, Oxford, OX3 7HE, UK
| | - Maureen Kane
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Pharmacy Hall North, Room 731, 20 N. Pine Street, Baltimore, MD, 21201, USA
| | - Deirdre Kelly
- Liver Unit, Birmingham Women's & Children's Hospital and University of Birmingham, Steelhouse Lane, Birmingham, B4 6NH, UK
| | - Dominic Kelly
- Children's Hospital, OUH NHS Foundation Trust, NIHR Oxford BRC, Headley Way, Oxford, OX3 9DU, UK
| | - Samantha J L Knight
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Alexandra Y Kreins
- Department of Immunology, Great Ormond Street Hospital for Children NHS Trust and UCL Great Ormond Street Institute of Child Health, Zayed Centre for Research, 2nd Floor, 20C Guilford Street, London, WC1N 1DZ, UK
| | - Erika M Kvikstad
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Craig B Langman
- Feinberg School of Medicine, Northwestern University, 211 E Chicago Avenue, Chicago, IL, MS37, USA
| | - Tracy Lester
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
| | - Kate E Lines
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- University of Oxford, Academic Endocrine Unit, OCDEM, Churchill Hospital, Oxford, OX3 7LJ, UK
| | - Simon R Lord
- Early Phase Clinical Trials Unit, Department of Oncology, University of Oxford, Cancer and Haematology Centre, Level 2 Administration Area, Churchill Hospital, Oxford, OX3 7LJ, UK
| | - Xin Lu
- Nuffield Department of Clinical Medicine, Ludwig Institute for Cancer Research, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
| | - Sahar Mansour
- St George's University Hospitals NHS Foundation Trust, Blackshaw Road, Tooting, London, SW17 0QT, UK
| | - Adnan Manzur
- MRC Centre for Neuromuscular Diseases, National Hospital for Neurology and Neurosurgery, Queen Square, London, WC1N 3BG, UK
| | - Reza Maroofian
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, WC1N 3BG, UK
| | - Brian Marsden
- Nuffield Department of Medicine, Kennedy Institute, University of Oxford, Oxford, OX3 7BN, UK
| | - Joanne Mason
- Yourgene Health Headquarters, Skelton House, Lloyd Street North, Manchester Science Park, Manchester, M15 6SH, UK
| | - Simon J McGowan
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Davide Mei
- Neuroscience Department, Meyer Children's Hospital IRCCS, Viale Pieraccini 24, 50139, Florence, Italy
| | - Hana Mlcochova
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Yoshiko Murakami
- Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka, 565-0871, Japan
| | - Andrea H Németh
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
| | - Steven Okoli
- Imperial College NHS Trust, Department of Haematology, Hammersmith Hospital, Du Cane Road, London, W12 0HS, UK
| | - Elizabeth Ormondroyd
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- University of Oxford, Level 6 West Wing, Oxford, OX3 9DU, JR, UK
| | - Lilian Bomme Ousager
- Department of Clinical Genetics, Odense University Hospital and Department of Clinical Research, University of Southern Denmark, Odense, Denmark
| | - Jacqueline Palace
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Smita Y Patel
- Clinical Immunology, John Radcliffe Hospital, Level 4A, Oxford, OX3 9DU, UK
| | - Melissa M Pentony
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
| | - Chris Pugh
- Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7BN, UK
| | - Aboulfazl Rad
- Department of Otolaryngology-Head & Neck Surgery, Tübingen Hearing Research Centre, Eberhard Karls University, Elfriede-Aulhorn-Str. 5, 72076, Tübingen, Germany
| | - Archana Ramesh
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Simone G Riva
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Irene Roberts
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
- Department of Paediatrics, University of Oxford, Level 2, Children's Hospital, John Radcliffe Hospital, Oxford, OX3 9DU, UK
| | - Noémi Roy
- Department of Haematology, Oxford University Hospitals NHS Foundation Trust, Level 4, Haematology, John Radcliffe Hospital, Oxford, OX3 9DU, UK
| | - Outi Salminen
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
| | - Kyleen D Schilling
- Ann & Robert H. Lurie Children's Hospital of Chicago, 225 E Chicago Avenue, Chicago, IL, 60611, USA
| | - Caroline Scott
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Arjune Sen
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Conrad Smith
- Oxford Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Churchill Hospital, Old Road, Oxford, OX3 7LE, UK
| | - Mark Stevenson
- University of Oxford, Academic Endocrine Unit, OCDEM, Churchill Hospital, Oxford, OX3 7LJ, UK
| | - Rajesh V Thakker
- University of Oxford, Academic Endocrine Unit, OCDEM, Churchill Hospital, Oxford, OX3 7LJ, UK
| | - Stephen R F Twigg
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Holm H Uhlig
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Paediatrics, University of Oxford, Level 2, Children's Hospital, John Radcliffe Hospital, Oxford, OX3 9DU, UK
- Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, OX3 9DU, UK
| | - Richard van Wijk
- UMC Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Barbara Vona
- Department of Otolaryngology-Head & Neck Surgery, Tübingen Hearing Research Centre, Eberhard Karls University, Elfriede-Aulhorn-Str. 5, 72076, Tübingen, Germany
- Institute of Human Genetics, University Medical Center Göttingen, Heinrich-Düker-Weg 12, 37073, Göttingen, Germany
- Institute for Auditory Neuroscience and InnerEarLab, University Medical Center Göttingen, Robert-Koch-Str. 40, 37075, Göttingen, Germany
| | - Steven Wall
- Oxford Craniofacial Unit, John Radcliffe Hospital, Level LG1, West Wing, Oxford, OX3 9DU, UK
| | - Jing Wang
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
| | - Hugh Watkins
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- University of Oxford, Level 6 West Wing, Oxford, OX3 9DU, JR, UK
| | - Jaroslav Zak
- Nuffield Department of Clinical Medicine, Ludwig Institute for Cancer Research, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
- Department of Immunology and Microbiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Anna H Schuh
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, UK
| | - Usha Kini
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 7LE, UK
| | - Andrew O M Wilkie
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK
| | - Niko Popitsch
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK
- Department of Biochemistry and Cell Biology, Max Perutz Labs, University of Vienna, Vienna BioCenter (VBC), Dr.-Bohr-Gasse 9, 1030, Vienna, Austria
| | - Jenny C Taylor
- Wellcome Centre for Human Genetics, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford, OX3 7BN, UK.
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, OX3 9DU, UK.
24
Malinverno L, Barros V, Ghisoni F, Visonà G, Kern R, Nickel PJ, Ventura BE, Šimić I, Stryeck S, Manni F, Ferri C, Jean-Quartier C, Genga L, Schweikert G, Lovrić M, Rosen-Zvi M. A historical perspective of biomedical explainable AI research. Patterns (New York, N.Y.) 2023; 4:100830. [PMID: 37720333 PMCID: PMC10500028 DOI: 10.1016/j.patter.2023.100830] [Indexed: 09/19/2023]
Abstract
The black-box nature of most artificial intelligence (AI) models encourages the development of explainability methods to engender trust in the AI decision-making process. Such methods can be broadly categorized into two main types: post hoc explanations and inherently interpretable algorithms. We aimed to analyze the possible associations between COVID-19 and the push of explainable AI (XAI) to the forefront of biomedical research. We automatically extracted from the PubMed database biomedical XAI studies related to concepts of causality or explainability and manually labeled 1,603 papers with respect to XAI categories. To compare the trends pre- and post-COVID-19, we fit a change point detection model and evaluated significant changes in publication rates. We show that the advent of COVID-19 at the beginning of 2020 could be the driving factor behind an increased focus on XAI, playing a crucial role in accelerating an already evolving trend. Finally, we discuss the future societal use and impact of XAI technologies and potential future directions for those who seek to foster clinical trust with interpretable machine learning models.
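The abstract does not say which change point model was fitted, so the following is only an illustrative stand-in: a minimal Python sketch that locates a single shift in the mean of a count series by least squares. The function name `best_change_point` and the `monthly` values are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def best_change_point(counts: Sequence[float]) -> Tuple[int, float]:
    """Return (k, cost) for the split index k that minimises the summed
    squared error when counts[:k] and counts[k:] are each modelled by
    their own mean (a single mean-shift change point)."""

    def sse(segment: Sequence[float]) -> float:
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = 1, float("inf")
    # Try every split that leaves at least one point on each side.
    for k in range(1, len(counts)):
        cost = sse(counts[:k]) + sse(counts[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical publication counts per period (illustrative only).
monthly = [10, 12, 11, 13, 30, 32, 31, 33]
k, cost = best_change_point(monthly)
```

With these toy counts the split falls between the fourth and fifth periods. A real analysis would also need a significance test for the detected change, which this sketch omits.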
Affiliation(s)
| | - Vesna Barros
- AI for Accelerated Healthcare & Life Sciences Discovery, IBM R&D Laboratories, University of Haifa Campus, Mount Carmel, Haifa 3498825, Israel
- The Hebrew University of Jerusalem, Ein Kerem Campus, 9112102, Jerusalem, Israel
| | | | - Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, 72076 Tübingen, Germany
| | - Roman Kern
- Institute of Interactive Systems and Data Science, Graz University of Technology, Sandgasse 36/III, 8010 Graz, Austria
- Know-Center GmbH, Sandgasse 36/4A 8010, Graz, Austria
| | - Philip J. Nickel
- Eindhoven University of Technology, 5135600 MB Eindhoven, The Netherlands
| | | | - Ilija Šimić
- Know-Center GmbH, Sandgasse 36/4A 8010, Graz, Austria
| | - Sarah Stryeck
- Research Center Pharmaceutical Engineering GmbH, Inffeldgasse 13, 8010 Graz, Austria
| | | | - Cesar Ferri
- VRAIN, Universitat Politècnica de València, Camino de Vera, s/n 46022 Valencia, Spain
| | - Claire Jean-Quartier
- Research Data Management, Graz University of Technology, Brockmanngasse 84, 8010 Graz, Austria
| | - Laura Genga
- Eindhoven University of Technology, 5135600 MB Eindhoven, The Netherlands
| | - Gabriele Schweikert
- School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
| | - Mario Lovrić
- Know-Center GmbH, Sandgasse 36/4A 8010, Graz, Austria
- Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
| | - Michal Rosen-Zvi
- AI for Accelerated Healthcare & Life Sciences Discovery, IBM R&D Laboratories, University of Haifa Campus, Mount Carmel, Haifa 3498825, Israel
- The Hebrew University of Jerusalem, Ein Kerem Campus, 9112102, Jerusalem, Israel
25
Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023; 14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023]
Abstract
MOTIVATION: Phenotypes are observable characteristics of an organism and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases, where it is used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena.
RESULTS: We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia.
- King Abdul-Aziz University, Faculty of Computing and Information Technology, 25732, Rabigh, Saudi Arabia.
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia.
26
Stenton SL, O’Leary M, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O’Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson M, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JO, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O’Donnell-Luria A. Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project. medRxiv 2023:2023.08.02.23293212. [PMID: 37577678 PMCID: PMC10418577 DOI: 10.1101/2023.08.02.23293212] [Indexed: 08/15/2023]
Abstract
Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge that places variant prioritization models head-to-head in a real-life clinical diagnostic setting.
Methods: Predictors were provided with a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of true positive causal variants, and the maximum F-measure, based on precision and recall of causal variants across EPCR thresholds.
Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency.
Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.
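The maximum F-measure named in the Methods can be illustrated with a minimal Python sketch. It assumes each submitted variant carries an EPCR value and a true/false causal label. The function name `max_f_measure`, the data layout, and the toy values are hypothetical and are not taken from the challenge's scoring code.

```python
from typing import List, Tuple


def max_f_measure(preds: List[Tuple[float, bool]], n_causal: int) -> float:
    """Best F-measure over all EPCR thresholds.

    preds    -- (epcr, is_causal) for every submitted variant
    n_causal -- total number of true causal variants, including any
                that were never submitted (needed for recall)
    """
    best = 0.0
    # Every distinct EPCR value is a candidate threshold.
    for threshold in sorted({epcr for epcr, _ in preds}):
        called = [is_causal for epcr, is_causal in preds if epcr >= threshold]
        true_pos = sum(called)
        if true_pos == 0:
            continue  # F-measure is 0 when nothing correct is called
        precision = true_pos / len(called)
        recall = true_pos / n_causal
        f_measure = 2 * precision * recall / (precision + recall)
        best = max(best, f_measure)
    return best


# Toy predictions (illustrative only): two causal variants, two false calls.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.2, False)]
score = max_f_measure(toy, n_causal=2)
```

In this toy case the best threshold is 0.7, where both causal variants are recalled at a precision of 2/3, giving an F-measure of 0.8. The rank-weighted score, the second metric, is not sketched here because the abstract does not give its weighting.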
Affiliation(s)
- Sarah L. Stenton
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Melanie O’Leary
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gabrielle Lemire
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Grace E. VanNoy
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stephanie DiTroia
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vijay S. Ganesh
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Emily Groopman
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emily O’Heir
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Brian Mangilog
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ikeoluwa Osei-Owusu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lynn S. Pais
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jillian Serrano
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christina Austin-Tse
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Marwa Abdelhakim
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Riccardo Bellazzi
- enGenome Srl, Pavia, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Samuele Bovo
- Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Maria Giulia Carta
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Matteo Floris
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Manavalan Gajapathy
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Julius O.B. Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Thomas Joseph
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Akash Kamandula
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Panagiotis Katsonis
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Olivier Lichtarge
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Structural and Computational Biology & Molecular Biophysics Program, Baylor College of Medicine, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- Yulan Lu
- Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, Shanghai, China
- Paolo Magni
- enGenome Srl, Pavia, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tarun Karthik Kumar Mamidi
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Marta Mulargia
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Thi Hong Cam Pham
- Anatomy and Surgical Training Department, University of Medicine and Pharmacy, Hue University, Vietnam
- Maurizio S. Podda
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Aditya Rao
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Vangala G Saipradeep
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX, USA
- Naveen Sivadasan
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Rajgopal Srinivasan
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Uma Sunderam
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Naina Tiwari
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Xiao Wang
- Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, Shanghai, China
- Yaqiong Wang
- Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, Shanghai, China
- Amanda Williams
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Elizabeth A. Worthey
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Rujie Yin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Yuning You
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Daniel Zeiberg
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Constantina Bakolitsa
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Anne O’Donnell-Luria
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
27
Wojcik MH, Reuter CM, Marwaha S, Mahmoud M, Duyzend MH, Barseghyan H, Yuan B, Boone PM, Groopman EE, Délot EC, Jain D, Sanchis-Juan A, Starita LM, Talkowski M, Montgomery SB, Bamshad MJ, Chong JX, Wheeler MT, Berger SI, O'Donnell-Luria A, Sedlazeck FJ, Miller DE. Beyond the exome: What's next in diagnostic testing for Mendelian conditions. Am J Hum Genet 2023; 110:1229-1248. [PMID: 37541186 PMCID: PMC10432150 DOI: 10.1016/j.ajhg.2023.06.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 04/02/2023] [Revised: 06/13/2023] [Accepted: 06/14/2023] [Indexed: 08/06/2023] Open
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.
Affiliation(s)
- Monica H Wojcik
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Division of Newborn Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Chloe M Reuter
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Shruti Marwaha
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Michael H Duyzend
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Hayk Barseghyan
- Center for Genetics Medicine Research, Children's National Research Institute, Children's National Hospital, Washington, DC 20010, USA; Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
- Bo Yuan
- Department of Molecular and Human Genetics and Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Philip M Boone
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Emily E Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Emmanuèle C Délot
- Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA; Center for Genetics Medicine Research, Children's National Research and Innovation Campus, Washington, DC, USA; Department of Pediatrics, George Washington University, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA
- Deepti Jain
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, USA
- Alba Sanchis-Juan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Lea M Starita
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Michael Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stephen B Montgomery
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Michael J Bamshad
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
- Jessica X Chong
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
- Matthew T Wheeler
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Seth I Berger
- Center for Genetics Medicine Research and Rare Disease Institute, Children's National Hospital, Washington, DC 20010, USA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA; Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Danny E Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA.
28
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. Genomics Proteomics Bioinformatics 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinically relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033, and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
- Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
- Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
- Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
- Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China.
29
Danis D, Jacobsen JOB, Wagner AH, Groza T, Beckwith MA, Rekerle L, Carmody LC, Reese J, Hegde H, Ladewig MS, Seitz B, Munoz-Torres M, Harris NL, Rambla J, Baudis M, Mungall CJ, Haendel MA, Robinson PN. Phenopacket-tools: Building and validating GA4GH Phenopackets. PLoS One 2023; 18:e0285433. [PMID: 37196000 PMCID: PMC10191354 DOI: 10.1371/journal.pone.0285433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/21/2023] [Indexed: 05/19/2023] Open
Abstract
The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, a comprehensive user guide, and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.
Affiliation(s)
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Julius O. B. Jacobsen
- William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Alex H. Wagner
- Departments of Pediatrics and Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, United States of America
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, United States of America
- Martha A. Beckwith
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Lauren Rekerle
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Leigh C. Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Justin Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Harshad Hegde
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Markus S. Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
- Berthold Seitz
- Department of Ophthalmology, Saarland University Medical Center, Homburg/Saar, Germany
- Monica Munoz-Torres
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Nomi L. Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Jordi Rambla
- European Genome-Phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Michael Baudis
- University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Melissa A. Haendel
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Institute for Systems Genomics, University of Connecticut, Farmington, CT, United States of America
30
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Istituto Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| |
31
Mendelian inheritance revisited: dominance and recessiveness in medical genetics. Nat Rev Genet 2023:10.1038/s41576-023-00574-0. [PMID: 36806206 DOI: 10.1038/s41576-023-00574-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Accepted: 12/14/2022] [Indexed: 02/22/2023]
Abstract
Understanding the consequences of genotype for phenotype (which ranges from molecule-level effects to whole-organism traits) is at the core of genetic diagnostics in medicine. Many measures of the deleteriousness of individual alleles exist, but these have limitations for predicting the clinical consequences. Various mechanisms can protect the organism from the adverse effects of functional variants, especially when the variant is paired with a wild type allele. Understanding why some alleles are harmful in the heterozygous state - representing dominant inheritance - but others only with the biallelic presence of pathogenic variants - representing recessive inheritance - is particularly important when faced with the deluge of rare genetic alterations identified by high throughput DNA sequencing. Both awareness of the specific quantitative and/or qualitative effects of individual variants and the elucidation of allelic and non-allelic interactions are essential to optimize genetic diagnosis and counselling.
32
Hocking LJ, Andrews C, Armstrong C, Ansari M, Baty D, Berg J, Bradley T, Clark C, Diamond A, Doherty J, Lampe A, McGowan R, Moore DJ, O'Sullivan D, Purvis A, Santoyo-Lopez J, Westwood P, Abbott M, Williams N, Aitman TJ, Miedzybrodzka Z, Humphrey WI, Martin S, Meynert A, Murphy F, Nourse C, Semple CA, Williams N, Dean J, Foley P, Robertson L, Ross A, Williamson K, Berg J, Goudie D, McWilliam C, Fitzpatrick D, Fletcher E, Jackson A, Lam W, Porteous M, Barr K, Bradshaw N, Davidson R, Gardiner C, Gorrie J, Hague R, Hamilton M, Joss S, Kinning E, Longman C, Martin N, McGowan R, Paterson J, Pilz D, Snadden L, Tobias E, Wedderburn S, Whiteford M, Aitman TJ, Miedzybrodzka Z. Genome sequencing with gene panel-based analysis for rare inherited conditions in a publicly funded healthcare system: implications for future testing. Eur J Hum Genet 2023; 31:231-238. [PMID: 36474026 PMCID: PMC9905562 DOI: 10.1038/s41431-022-01226-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Revised: 10/14/2022] [Accepted: 10/25/2022] [Indexed: 12/12/2022] Open
Abstract
NHS genetics centres in Scotland sought to investigate the diagnostic utility of the Genomics England 100,000 Genomes Project in order to evaluate genome sequencing for rare, inherited conditions. Four regional services recruited 999 individuals from 394 families in 200 rare phenotype categories, with negative historic genetic testing. Genome sequencing was performed at Edinburgh Genomics, and phenotype and sequence data were transferred to Genomics England for variant calling, gene-based filtering and variant prioritisation. NHS Scotland genetics laboratories performed interpretation, validation and reporting. New diagnoses were made in 23% of cases: 19% in genes implicated in disease at the time of variant prioritisation, and 4% from later review of additional genes. Diagnostic yield varied considerably between phenotype categories and was minimal in cases with prior exome testing. Genome sequencing with gene panel filtering and reporting achieved improved diagnostic yield over historic testing but not over now routine trio-exome sequence tests. Re-interpretation of genomic data with updated gene panels modestly improved diagnostic yield at minimal cost. However, to justify the additional costs of genome vs exome sequencing, efficient methods for analysis of structural variation will be required and/or the cost of genome analysis and storage will need to decrease.
Affiliation(s)
- Lynne J Hocking
  - Institute of Medical Sciences, University of Aberdeen, Aberdeen, Scotland, UK
- Claire Andrews
  - East of Scotland Regional Genetics Service, NHS Tayside, Ninewells Hospital, Dundee, Scotland, UK
- Christine Armstrong
  - North of Scotland Medical Genetic Service, NHS Grampian, Polwarth Building, Foresterhill, Aberdeen, Scotland, UK
- Morad Ansari
  - South East Scotland Genetic Service, NHS Lothian, Western General Hospital, Edinburgh, Scotland, UK
- David Baty
  - East of Scotland Regional Genetics Service, NHS Tayside, Ninewells Hospital, Dundee, Scotland, UK
- Jonathan Berg
  - East of Scotland Regional Genetics Service, NHS Tayside, Ninewells Hospital, Dundee, Scotland, UK
  - School of Medicine, University of Dundee, Dundee, Scotland, UK
- Therese Bradley
  - West of Scotland Centre for Genomic Medicine, NHS Greater Glasgow & Clyde, Queen Elizabeth University Hospital, Glasgow, Scotland, UK
- Caroline Clark
  - North of Scotland Medical Genetic Service, NHS Grampian, Polwarth Building, Foresterhill, Aberdeen, Scotland, UK
- Austin Diamond
  - South East Scotland Genetic Service, NHS Lothian, Western General Hospital, Edinburgh, Scotland, UK
- Jill Doherty
  - West of Scotland Centre for Genomic Medicine, NHS Greater Glasgow & Clyde, Queen Elizabeth University Hospital, Glasgow, Scotland, UK
- Anne Lampe
  - South East Scotland Genetic Service, NHS Lothian, Western General Hospital, Edinburgh, Scotland, UK
- Ruth McGowan
  - West of Scotland Centre for Genomic Medicine, NHS Greater Glasgow & Clyde, Queen Elizabeth University Hospital, Glasgow, Scotland, UK
  - School of Medicine, Dentistry & Nursing, University of Glasgow, Glasgow, Scotland, UK
- David J Moore
  - South East Scotland Genetic Service, NHS Lothian, Western General Hospital, Edinburgh, Scotland, UK
- Dawn O'Sullivan
  - North of Scotland Medical Genetic Service, NHS Grampian, Polwarth Building, Foresterhill, Aberdeen, Scotland, UK
- Andrew Purvis
  - West of Scotland Centre for Genomic Medicine, NHS Greater Glasgow & Clyde, Queen Elizabeth University Hospital, Glasgow, Scotland, UK
- Paul Westwood
  - West of Scotland Centre for Genomic Medicine, NHS Greater Glasgow & Clyde, Queen Elizabeth University Hospital, Glasgow, Scotland, UK
- Michael Abbott
  - Health Economics Research Unit, University of Aberdeen, Aberdeen, Scotland, UK
- Nicola Williams
  - West of Scotland Centre for Genomic Medicine, NHS Greater Glasgow & Clyde, Queen Elizabeth University Hospital, Glasgow, Scotland, UK
- Timothy J Aitman
  - Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, Scotland, UK
- Zosia Miedzybrodzka
  - Institute of Medical Sciences, University of Aberdeen, Aberdeen, Scotland, UK
  - North of Scotland Medical Genetic Service, NHS Grampian, Polwarth Building, Foresterhill, Aberdeen, Scotland, UK
  - North of Scotland Regional Genetic Service, NHS Grampian, Ashgrove House, Foresterhill, Aberdeen, Scotland, UK
33
Fan C, Chen K, Wang Y, Ball EV, Stenson PD, Mort M, Bacolla A, Kehrer-Sawatzki H, Tainer JA, Cooper DN, Zhao H. Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections. Hum Genet 2023; 142:245-274. [PMID: 36344696 PMCID: PMC10290229 DOI: 10.1007/s00439-022-02500-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 10/24/2022] [Indexed: 11/09/2022]
Abstract
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and with genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5'genes but were not significantly different from controls in introns, 3'UTRs and 3'genes. Additionally, pathogenic repeat expansions were found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5'genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
Affiliation(s)
- Cong Fan
  - Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
- Ken Chen
  - School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 500001, China
- Yukai Wang
  - School of Life Science, Sun Yat-Sen University, Guangzhou, 500001, China
- Edward V Ball
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Albino Bacolla
  - Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- John A Tainer
  - Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Huiying Zhao
  - Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, 107 Yan Jiang West Road, Guangzhou, 500001, People's Republic of China
34
Wojcik MH, Reuter CM, Marwaha S, Mahmoud M, Duyzend MH, Barseghyan H, Yuan B, Boone PM, Groopman EE, Délot EC, Jain D, Sanchis-Juan A, Starita LM, Talkowski M, Montgomery SB, Bamshad MJ, Chong JX, Wheeler MT, Berger SI, O’Donnell-Luria A, Sedlazeck FJ, Miller DE. Beyond the exome: what's next in diagnostic testing for Mendelian conditions. ARXIV 2023:arXiv:2301.07363v1. [PMID: 36713248 PMCID: PMC9882576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order and emerging technologies, such as optical genome mapping and long-read DNA or RNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to a consortium such as GREGoR, which is focused on elucidating the underlying cause of rare unsolved genetic disorders.
Affiliation(s)
- Monica H. Wojcik
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
  - Division of Newborn Medicine, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Chloe M. Reuter
  - Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305 USA
- Shruti Marwaha
  - Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305 USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030 USA
- Michael H. Duyzend
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114 USA
- Hayk Barseghyan
  - Center for Genetics Medicine Research, Children’s National Research Institute, Children’s National Hospital, Washington, DC 20010 USA
  - Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037 USA
- Bo Yuan
  - Department of Molecular and Human Genetics and Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030 USA
- Philip M. Boone
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114 USA
- Emily E. Groopman
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114 USA
- Emmanuèle C. Délot
  - Department of Genomics and Precision Medicine, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037 USA
  - Center for Genetics Medicine Research, Children’s National Research and Innovation Campus, Washington, DC, USA
  - Department of Pediatrics, George Washington University, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037 USA
- Deepti Jain
  - Department of Biostatistics, School of Public Health, University of Washington, Seattle WA 98195 USA
- Alba Sanchis-Juan
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114 USA
- Lea M. Starita
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195 USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195 USA
- Michael Talkowski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114 USA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA
- Stephen B. Montgomery
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305 USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305 USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA
- Michael J. Bamshad
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195 USA
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195 USA
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195 USA
- Jessica X. Chong
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195 USA
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195 USA
- Matthew T. Wheeler
  - Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA 94305 USA
- Seth I. Berger
  - Center for Genetics Medicine Research and Rare Disease Institute, Children’s National Hospital, Washington, DC 20010 USA
- Anne O’Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142 USA
  - Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
  - Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114 USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030 USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005 USA
- Danny E. Miller
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195 USA
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195 USA
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195 USA
35
Schubach M, Nazaretyan L, Kircher M. The Regulatory Mendelian Mutation score for GRCh38. Gigascience 2022; 12:giad024. [PMID: 37083939 PMCID: PMC10120424 DOI: 10.1093/gigascience/giad024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Revised: 01/10/2023] [Accepted: 03/21/2023] [Indexed: 04/22/2023] Open
Abstract
BACKGROUND: Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease causal noncoding variants. Some tools for assessment of variant pathogenicity as well as annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software, and pipelines was slow. RESULTS: Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup. CONCLUSIONS: Scores of the GRCh38 genome build are highly correlated to the prior release with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files of GRCh37 and GRCh38 genome builds are cited in the article and on the website; UCSC genome browser tracks and an API are available at https://remm.bihealth.org.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, 23562 Lübeck, Germany
36
Barbosa P, Savisaar R, Carmo-Fonseca M, Fonseca A. Computational prediction of human deep intronic variation. Gigascience 2022; 12:giad085. [PMID: 37878682 PMCID: PMC10599398 DOI: 10.1093/gigascience/giad085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND: The adoption of whole-genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to discriminate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce. RESULTS: In this study, we have provided an overview of the computational methods available and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that potentially affect splicing regulatory elements. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground-truth information, but the use of these tools results in decreased predictive power when compared to black box methods. CONCLUSIONS: Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.
Affiliation(s)
- Pedro Barbosa
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Maria Carmo-Fonseca
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Alcides Fonseca
  - LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
37
Niazi Y, Paramasivam N, Blocka J, Kumar A, Huhn S, Schlesner M, Weinhold N, Sijmons R, De Jong M, Durie B, Goldschmidt H, Hemminki K, Försti A. Investigation of Rare Non-Coding Variants in Familial Multiple Myeloma. Cells 2022; 12:cells12010096. [PMID: 36611892 PMCID: PMC9818386 DOI: 10.3390/cells12010096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 12/16/2022] [Accepted: 12/22/2022] [Indexed: 12/28/2022] Open
Abstract
Multiple myeloma (MM) is a plasma cell malignancy whereby a single clone of plasma cells over-propagates in the bone marrow, resulting in the increased production of monoclonal immunoglobulin. While the complex genetic architecture of MM is well characterized, much less is known about germline variants predisposing to MM. Genome-wide sequencing approaches in MM families have started to identify rare high-penetrance coding risk alleles. In addition, genome-wide association studies have discovered several common low-penetrance risk alleles, which are mainly located in the non-coding genome. Here, we further explored the genetic basis in familial MM within the non-coding genome in whole-genome sequencing data. We prioritized and characterized 150 upstream, 5' untranslated region (UTR) and 3' UTR variants from 14 MM families, including 20 top-scoring variants. These variants confirmed previously implicated biological pathways in MM development. Most importantly, protein network and pathway enrichment analyses also identified 10 genes involved in mitogen-activated protein kinase (MAPK) signaling pathways, which have previously been established as important MM pathways.
Affiliation(s)
- Yasmeen Niazi
  - Hopp Children’s Cancer Center (KiTZ), 69120 Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
  - Correspondence: Y.N.; K.H.
- Nagarajan Paramasivam
  - Computational Oncology, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany
- Joanna Blocka
  - Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
  - Department of Medical Oncology, Jerome Lipper Multiple Myeloma Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Abhishek Kumar
  - Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
  - Manipal Academy of Higher Education (MAHE), Manipal 576104, India
- Stefanie Huhn
  - Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
  - National Center for Tumor Diseases Heidelberg (NCT), 69120 Heidelberg, Germany
- Matthias Schlesner
  - Bioinformatics and Omics Data Analytics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Niels Weinhold
  - Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
- Rolf Sijmons
  - University Medical Center Groningen, University of Groningen, 9712 Groningen, The Netherlands
- Mirjam De Jong
  - University Medical Center Groningen, University of Groningen, 9712 Groningen, The Netherlands
- Brian Durie
  - Cedars Sinai Cancer Center, Los Angeles, CA 90048, USA
- Hartmut Goldschmidt
  - Computational Oncology, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany
  - Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
- Kari Hemminki
  - Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, 323 00 Pilsen, Czech Republic
  - Correspondence: Y.N.; K.H.
- Asta Försti
  - Hopp Children’s Cancer Center (KiTZ), 69120 Heidelberg, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
38
Zieger HK, Weinhold L, Schmidt A, Holtgrewe M, Juranek SA, Siewert A, Scheer AB, Thieme F, Mangold E, Ishorst N, Brand FU, Welzenbach J, Beule D, Paeschke K, Krawitz PM, Ludwig KU. Prioritization of non-coding elements involved in non-syndromic cleft lip with/without cleft palate through genome-wide analysis of de novo mutations. HGG ADVANCES 2022; 4:100166. [PMID: 36589413 PMCID: PMC9795529 DOI: 10.1016/j.xhgg.2022.100166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022] Open
Abstract
Non-syndromic cleft lip with/without cleft palate (nsCL/P) is a highly heritable facial disorder. To date, systematic investigations of the contribution of rare variants in non-coding regions to nsCL/P etiology are sparse. Here, we re-analyzed available whole-genome sequence (WGS) data from 211 European case-parent trios with nsCL/P and identified 13,522 de novo mutations (DNMs) in nsCL/P cases, 13,055 of which mapped to non-coding regions. We integrated these data with DNMs from a reference cohort, with results of previous genome-wide association studies (GWASs), and functional and epigenetic datasets of relevance to embryonic facial development. A significant enrichment of nsCL/P DNMs was observed at two GWAS risk loci (4q28.1 (p = 8 × 10⁻⁴) and 2p21 (p = 0.02)), suggesting a convergence of both common and rare variants at these loci. We also mapped the DNMs to 810 position weight matrices indicative of transcription factor (TF) binding, and quantified the effect of the allelic changes in silico. This revealed a nominally significant overrepresentation of DNMs (p = 0.037), and a stronger effect on binding strength, for DNMs located in the sequence of the core binding region of the TF Musculin (MSC). Notably, MSC is involved in facial muscle development, together with a set of nsCL/P genes located at GWAS loci. Supported by additional results from single-cell transcriptomic data and molecular binding assays, this suggests that variation in MSC binding sites contributes to nsCL/P etiology. Our study describes a set of approaches that can be applied to increase the added value of WGS data.
Affiliation(s)
- Hanna K. Zieger
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Leonie Weinhold
- Institute for Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn 53127, Germany
| | - Axel Schmidt
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Manuel Holtgrewe
- Core Unit Bioinformatics, Berlin Institute of Health, Berlin 10117, Germany
| | - Stefan A. Juranek
- Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn 53127, Germany
| | - Anna Siewert
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Annika B. Scheer
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Frederic Thieme
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Elisabeth Mangold
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Nina Ishorst
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Fabian U. Brand
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn 53127, Germany
| | - Julia Welzenbach
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany
| | - Dieter Beule
- Core Unit Bioinformatics, Berlin Institute of Health, Berlin 10117, Germany; Max Delbrück Center for Molecular Medicine, Berlin 13125, Germany
| | - Katrin Paeschke
- Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn 53127, Germany
| | - Peter M. Krawitz
- Institute for Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn 53127, Germany
| | - Kerstin U. Ludwig
- Institute of Human Genetics, University of Bonn, School of Medicine and University Hospital Bonn, Bonn 53127, Germany (corresponding author)
|
39
|
Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M. Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu Rev Genet 2022; 56:441-465. [PMID: 36055970 DOI: 10.1146/annurev-genet-072920-032107] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 12/24/2022]
Abstract
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Affiliation(s)
- Daniel Tabet
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada;
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
| | - Victoria Parikh
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
| | - Prashant Mali
- Department of Bioengineering, University of California, San Diego, California, USA
| | - Frederick P Roth
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada;
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
| | - Melina Claussnitzer
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Center for Genomic Medicine and Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Harvard University, Boston, Massachusetts, USA;
|
40
|
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
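As a rough illustration of the selection step that knockoff-based procedures share, the sketch below implements the generic knockoff+ threshold from the knockoff filter literature on a vector of feature statistics W. It is not GhostKnockoff itself: how W is built from GWAS Z-scores, and the handling of multiple knockoffs, are not shown, and the function names and toy values are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q; infinity if none exists."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_features(w, q):
    """Indices whose statistic reaches the knockoff+ threshold at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Toy statistics: large positive values favour the original feature over its knockoff.
W = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.3, -0.2, 2.0, 1.5, 1.0, 0.8]
print(select_features(W, q=0.25))
```

Large positive W values indicate that the original variable looks more important than its synthetic knockoff copy; negative values act as a built-in estimate of false discoveries, which is what the ratio in the threshold controls.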
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
| | - Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA; Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
| | - Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
| | - Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
| | - Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA; Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
| | - Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
|
41
|
Pakkasjärvi N, Syvänen J, Wiro M, Koskimies‐Virta E. Amelia and phocomelia in Finland: Characteristics and prevalences in a nationwide population‐based study. Birth Defects Res 2022; 114:1427-1433. [PMID: 36353751 PMCID: PMC10100479 DOI: 10.1002/bdr2.2123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 10/23/2022] [Accepted: 10/27/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND Amelia and phocomelia represent severe limb reduction defects. Specific epidemiologic data on these defects are scarce. We conducted a descriptive analysis of prevalence data in Finland during 1993-2008 to clarify the epidemiology nationwide in a population-based register study. We hypothesized that increasing maternal age would affect the total prevalence of each disorder. MATERIALS AND METHODS We collected information on all fetuses and infants affected by amelia and phocomelia during 1993-2008 from the National Register of Congenital Malformations in Finland. The clinical, laboratory, autopsy, and imaging data were re-evaluated where available for all cases found. RESULTS A total of 23 amelia and 7 phocomelia patients were identified. Thalidomide was not an etiological factor in any of the cases. The total prevalence of amelia was 2.43 per 100,000 births. The live birth prevalence was 0.63 per 100,000 live births. The total prevalence of phocomelia was 0.74 per 100,000 births, and the live birth prevalence was 0.53 per 100,000 live births. Infant mortality in amelia and phocomelia was 67% and 60%, respectively. CONCLUSIONS Infant mortality is high among amelia and phocomelia. Most cases had other major associated anomalies, but syndromic amelia cases were rare. Total prevalences were higher than previously reported and showed an increase in prevalence toward the end of the study period. The percentage of elective terminations of pregnancy for these disorders is high. While isolated cases are rare, they most likely present a better prognosis. Thus, correct diagnosis is essential in counseling for possible elective termination.
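The reported total prevalences can be sanity-checked with simple arithmetic. The sketch below assumes that the 23 amelia and 7 phocomelia cases are the numerators of the respective total prevalences (the abstract does not state the denominator), and back-calculates the number of births each figure implies; the function name `implied_births` is hypothetical.

```python
def implied_births(cases, prevalence_per_100k):
    """Denominator implied by a case count and a prevalence per 100,000 births."""
    return cases / prevalence_per_100k * 100_000


# Figures taken from the abstract: 23 cases at 2.43 and 7 cases at 0.74 per 100,000 births.
amelia_births = implied_births(23, 2.43)
phocomelia_births = implied_births(7, 0.74)

print(round(amelia_births), round(phocomelia_births))
```

Both back-calculations land at roughly 946,000 births, so the two prevalence figures are consistent with a single shared denominator for the 1993-2008 study period.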
Affiliation(s)
- Niklas Pakkasjärvi
- Department of Pediatric Surgery, Turku University Hospital and University of Turku, Turku, Finland
- New Children's Hospital, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Department of Pediatric Surgery, Uppsala Akademiska Barnsjukhuset, Uppsala, Sweden
| | - Johanna Syvänen
- Department of Pediatric Surgery, Turku University Hospital and University of Turku, Turku, Finland
| | - Markus Wiro
- Department of Pediatric Surgery, Turku University Hospital and University of Turku, Turku, Finland
| | - Eeva Koskimies‐Virta
- Department of Women's and Children's Health, Karolinska Institutet, Solna, Sweden
- Section of Pediatric Orthopaedic Surgery, Karolinska University Hospital, Solna, Sweden
|
42
|
Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO, Gee HY, Oh J, Jang IJ, Lee S, Baek D, Koh Y, Yoon SS, Kim YJ, Chae JH, Park WY, Bhak JH, Choi M. A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East Asian population. Exp Mol Med 2022; 54:1862-1871. [PMID: 36323850 PMCID: PMC9628380 DOI: 10.1038/s12276-022-00871-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/30/2022] [Revised: 07/21/2022] [Accepted: 08/08/2022] [Indexed: 11/29/2022] Open
Abstract
Despite substantial advances in disease genetics, studies to date have largely focused on individuals of European descent. This limits further discoveries of novel functional genetic variants in other ethnic groups. To alleviate the paucity of East Asian population genome resources, we established the Korean Variant Archive 2 (KOVA 2), which is composed of 1896 whole-genome sequences and 3409 whole-exome sequences from healthy individuals of Korean ethnicity. This is the largest genome database from the ethnic Korean population to date, surpassing the 1909 Korean individuals deposited in gnomAD. The variants in KOVA 2 displayed all the known genetic features of those from previous genome databases, and we compiled data from Korean-specific runs of homozygosity, positively selected intervals, and structural variants. In doing so, we found loci, such as the loci of ADH1A/1B and UHRF1BP1, that are strongly selected in the Korean population relative to other East Asian populations. Our analysis of allele ages revealed a correlation between variant functionality and evolutionary age. The data can be browsed and downloaded from a public website (https://www.kobic.re.kr/kova/). We anticipate that KOVA 2 will serve as a valuable resource for genetic studies involving East Asian populations.
Affiliation(s)
- Jeongeun Lee
- Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, 03080 Republic of Korea
| | - Jean Lee
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Sungwon Jeon
- Department of Biomedical Engineering, College of Information and Biotechnology, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919 Republic of Korea
| | - Jeongha Lee
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Insu Jang
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
| | - Jin Ok Yang
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea; Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
| | - Soojin Park
- Department of Pediatrics, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Byungwook Lee
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
| | - Jinwook Choi
- Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, 03080 Republic of Korea; Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Byung-Ok Choi
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351 Republic of Korea
| | - Heon Yung Gee
- Department of Pharmacology, Brain Korea 21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, Seoul, 03722 Republic of Korea
| | - Jaeseong Oh
- Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, 03080 Republic of Korea
| | - In-Jin Jang
- Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, 03080 Republic of Korea
| | - Sanghyuk Lee
- Department of Bio-Information Science, Ewha Womans University, Seoul, 03760 Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, 08826 Republic of Korea
| | - Youngil Koh
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Sung-Soo Yoon
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Young-Joon Kim
- Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, 03722 Republic of Korea
| | - Jong-Hee Chae
- Department of Pediatrics, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea; Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Woong-Yang Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, 06351 Republic of Korea
| | - Jong Hwa Bhak
- Department of Biomedical Engineering, College of Information and Biotechnology, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919 Republic of Korea
| | - Murim Choi
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
|
43
|
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023] Open
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
|
44
|
Nayara Góes de Araújo J, Fernandes de Oliveira V, Bassani Borges J, Dagli-Hernandez C, da Silva Rodrigues Marçal E, Caroline Costa de Freitas R, Medeiros Bastos G, Marques Gonçalves R, Arpad Faludi A, Elim Jannes C, da Costa Pereira A, Dominguez Crespo Hirata R, Hiroyuki Hirata M, Ducati Luchessi A, Nogueira Silbiger V. In silico analysis of upstream variants in Brazilian patients with Familial Hypercholesterolemia. Gene X 2022; 849:146908. [PMID: 36167182 DOI: 10.1016/j.gene.2022.146908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 08/16/2022] [Accepted: 09/19/2022] [Indexed: 10/14/2022] Open
Abstract
Familial hypercholesterolemia (FH) is a prevalent autosomal genetic disease associated with increased risk of early cardiovascular events and death due to chronic exposure to very high levels of low-density lipoprotein cholesterol (LDL-c). Pathogenic variants in the coding regions of LDLR, APOB and PCSK9 account for most FH cases, and variants in non-coding regions may be involved in FH as well. Variants in the upstream region of LDLR, APOB and PCSK9 were screened by targeted next-generation sequencing and their effects were explored using in silico tools. Twenty-five patients without pathogenic variants in FH-related genes were selected. 3 kb upstream regions of LDLR, APOB and PCSK9 were sequenced using the AmpliSeq (Illumina) and MiSeq Reagent Nano Kit v2 (Illumina). Sequencing data were analyzed using variant discovery and functional annotation tools. Potentially regulatory variants were selected by integrating data from public databases, published data and a context-dependent regulatory prediction score. Thirty-four single nucleotide variants (SNVs) in upstream regions were identified (6 in LDLR, 15 in APOB, and 13 in PCSK9). Five SNVs were prioritized as potentially regulatory variants (rs934197, rs9282606, rs36218923, rs538300761, g.55038486A>G). APOB rs934197 was previously associated with an increased rate of transcription, which in silico analysis suggests could be due to reduced binding affinity of a transcriptional repressor. Our findings highlight the importance of variant screening outside of coding regions of all relevant genes. Further functional studies are necessary to confirm that prioritized variants could impact gene regulation and contribute to the FH phenotype.
Affiliation(s)
- Jéssica Nayara Góes de Araújo
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil
| | - Victor Fernandes de Oliveira
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - Jéssica Bassani Borges
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil; Laboratory of Molecular Research in Cardiology, Institute Dante Pazzanese of Cardiology, Sao Paulo, 04012-909, Brazil
| | - Carolina Dagli-Hernandez
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | | | - Renata Caroline Costa de Freitas
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - Gisele Medeiros Bastos
- Laboratory of Molecular Research in Cardiology, Institute Dante Pazzanese of Cardiology, Sao Paulo, 04012-909, Brazil; Medical Clinic Division, Institute Dante Pazzanese of Cardiology, Sao Paulo 04012-909, Brazil
| | | | - André Arpad Faludi
- Medical Clinic Division, Institute Dante Pazzanese of Cardiology, Sao Paulo 04012-909, Brazil
| | - Cinthia Elim Jannes
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of Sao Paulo 05403-900, Brazil
| | - Alexandre da Costa Pereira
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of Sao Paulo 05403-900, Brazil
| | - Rosario Dominguez Crespo Hirata
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - Mario Hiroyuki Hirata
- Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of Sao Paulo, Sao Paulo 05508-000, Brazil
| | - André Ducati Luchessi
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil; Department of Clinical and Toxicological Analyses, Federal University of Rio Grande do Norte, Natal 59012-570, Brazil
| | - Vivian Nogueira Silbiger
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal 59078-900, Brazil; Department of Clinical and Toxicological Analyses, Federal University of Rio Grande do Norte, Natal 59012-570, Brazil.
|
45
|
Barbosa P, Ribeiro M, Carmo-Fonseca M, Fonseca A. Clinical significance of genetic variation in hypertrophic cardiomyopathy: comparison of computational tools to prioritize missense variants. Front Cardiovasc Med 2022; 9:975478. [PMID: 36061567 PMCID: PMC9433717 DOI: 10.3389/fcvm.2022.975478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022] Open
Abstract
Hypertrophic cardiomyopathy (HCM) is a common heart disease associated with sudden cardiac death. Early diagnosis is critical to identify patients who may benefit from implantable cardioverter defibrillator therapy. Although genetic testing is an integral part of the clinical evaluation and management of patients with HCM and their families, in many cases the genetic analysis fails to identify a disease-causing mutation. This is in part due to difficulties in classifying newly detected rare genetic variants as well as variants of unknown significance (VUS). Multiple computational algorithms have been developed to predict the potential pathogenicity of genetic variants, but their relative performance in HCM has not been comprehensively assessed. Here, we compared the performance of 39 currently available prediction tools in distinguishing between high-confidence HCM-causing missense variants and benign variants, and we developed an easy-to-use tool to perform variant prediction benchmarks based on annotated VCF files (VETA). Our results show that tool performance increases after HCM-specific calibration of thresholds. After excluding potential biases due to circularity type I issues, we identified ClinPred, MISTIC, FATHMM, MPC and MetaLR as the five best-performing tools in discriminating HCM-associated variants. We propose combining these tools in order to prioritize unknown HCM missense variants that should be closely followed up in the clinic.
Affiliation(s)
- Pedro Barbosa
- LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
| | - Marta Ribeiro
- Department of Bioengineering and iBB-Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
| | - Maria Carmo-Fonseca
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
- *Correspondence: Maria Carmo-Fonseca
| | - Alcides Fonseca
- LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
- GenoMed - Diagnósticos de Medicina Molecular, Lisboa, Portugal
|
46
|
Jacobsen JOB, Kelly C, Cipriani V, Research Consortium GE, Mungall CJ, Reese J, Danis D, Robinson PN, Smedley D. Phenotype-driven approaches to enhance variant prioritization and diagnosis of rare disease. Hum Mutat 2022; 43:1071-1081. [PMID: 35391505 PMCID: PMC9288531 DOI: 10.1002/humu.24380] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/13/2021] [Revised: 01/25/2022] [Accepted: 04/03/2022] [Indexed: 11/20/2022]
Abstract
Rare disease diagnostics and disease gene discovery have been revolutionized by whole-exome and genome sequencing but identifying the causative variant(s) from the millions in each individual remains challenging. The use of deep phenotyping of patients and reference genotype-phenotype knowledge, alongside variant data such as allele frequency, segregation, and predicted pathogenicity, has proved an effective strategy to tackle this issue. Here we review the numerous tools that have been developed to automate this approach and demonstrate the power of such an approach on several thousand diagnosed cases from the 100,000 Genomes Project. Finally, we discuss the challenges that need to be overcome if we are going to improve detection rates and help the majority of patients that still remain without a molecular diagnosis after state-of-the-art genomic interpretation.
Affiliation(s)
- Julius O. B. Jacobsen
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Catherine Kelly
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Valentina Cipriani
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | | | - Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Justin Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
| | | | - Damian Smedley
- William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
|
47
|
Seaby EG, Smedley D, Taylor Tavares AL, Brittain H, van Jaarsveld RH, Baralle D, Rehm HL, O'Donnell-Luria A, Ennis S. A gene-to-patient approach uplifts novel disease gene discovery and identifies 18 putative novel disease genes. Genet Med 2022; 24:1697-1707. [PMID: 35532742 DOI: 10.1016/j.gim.2022.04.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/22/2022] [Revised: 04/14/2022] [Accepted: 04/14/2022] [Indexed: 12/14/2022] Open
Abstract
PURPOSE Exome and genome sequencing have drastically accelerated novel disease gene discoveries. However, discovery is still hindered by myriad variants of uncertain significance found in genes of undetermined biological function. This necessitates intensive functional experiments on genes of equal predicted causality, leading to a major bottleneck. METHODS We apply the loss-of-function observed/expected upper-bound fraction metric of intolerance to gene inactivation to curate a list of predicted haploinsufficient disease genes. Using data from the 100,000 Genomes Project, we adopt a gene-to-patient approach that matches de novo loss-of-function variants in constrained genes to patients with rare disease. Through large-scale aggregation of data, we reduce excess analytical noise currently hindering novel discoveries. RESULTS Results from 13,949 trios revealed 643 rare, de novo predicted loss-of-function events filtered from 1044 loss-of-function observed/expected upper-bound fraction-constrained genes. A total of 168 variants occurred within 126 genes without a known disease-gene relationship. Of these, 27 genes had >1 kindred affected, and for 18 of these genes, multiple kindreds had overlapping phenotypes. Two years after initial analysis, 11 of 18 (61%) of these genes have been independently published as novel disease gene discoveries. CONCLUSION Using large cohorts and adopting gene-based approaches can rapidly and objectively accelerate dominantly inherited novel gene discovery by targeting the most appropriate genes for functional validation.
Affiliation(s)
- Eleanor G Seaby
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA; Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA.
| | - Damian Smedley
- Genomics England, Dawson Hall, Charterhouse Square, London, EC1M 6BQ, United Kingdom
| | | | - Helen Brittain
- Genomics England, Dawson Hall, Charterhouse Square, London, EC1M 6BQ, United Kingdom
| | | | - Diana Baralle
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA; Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA; Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
| | - Sarah Ennis
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
| | | |
|
48
|
Ellingford JM, Ahn JW, Bagnall RD, Baralle D, Barton S, Campbell C, Downes K, Ellard S, Duff-Farrier C, FitzPatrick DR, Greally JM, Ingles J, Krishnan N, Lord J, Martin HC, Newman WG, O’Donnell-Luria A, Ramsden SC, Rehm HL, Richardson E, Singer-Berk M, Taylor JC, Williams M, Wood JC, Wright CF, Harrison SM, Whiffin N. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med 2022; 14:73. [PMID: 35850704 PMCID: PMC9295495 DOI: 10.1186/s13073-022-01073-3] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Received: 02/09/2022] [Accepted: 06/16/2022] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND The majority of clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. METHODS We convened a panel of nine clinical and research scientists with wide-ranging expertise in clinical variant interpretation, with specific experience in variants within non-coding regions. This panel discussed and refined an initial draft of the guidelines which were then extensively tested and reviewed by external groups. RESULTS We discuss considerations specifically for variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for the interpretation of these variants. CONCLUSIONS These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.
Affiliation(s)
- Jamie M. Ellingford
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK; Genomics England, London, UK
- Joo Wook Ahn
- Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Richard D. Bagnall
- Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, University of Sydney, Sydney, Australia
- Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK; Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Stephanie Barton
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Chris Campbell
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Kate Downes
- Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Sian Ellard
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK; South West Genomic Laboratory Hub, Exeter Genomic Laboratory, Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
- Celia Duff-Farrier
- South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- David R. FitzPatrick
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
- John M. Greally
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children’s Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
- Jodie Ingles
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Neesha Krishnan
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Jenny Lord
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
- Hilary C. Martin
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- William G. Newman
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT, UK; Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Simon C. Ramsden
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Ebony Richardson
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jenny C. Taylor
- National Institute for Health Research Oxford Biomedical Research Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK; Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Maggie Williams
- South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- Jordan C. Wood
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caroline F. Wright
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Steven M. Harrison
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Ambry Genetics, Aliso Viejo, CA, USA
- Nicola Whiffin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
|
49
|
Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022; 15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022] Open
Abstract
Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene–disease associations. We found that mouse genotype–phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper. Editor's choice: We investigated the use of model organism phenotypes in the computational identification of disease genes, identifying several data biases and concluding that mouse model phenotypes contribute most to computational disease gene identification.
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
|
50
|
Lewis MA, Schulte BA, Dubno JR, Steel KP. Investigating the characteristics of genes and variants associated with self-reported hearing difficulty in older adults in the UK Biobank. BMC Biol 2022; 20:150. [PMID: 35761239 PMCID: PMC9238072 DOI: 10.1186/s12915-022-01349-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/31/2022] [Accepted: 06/10/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Age-related hearing loss is a common, heterogeneous disease with a strong genetic component. More than 100 loci have been reported to be involved in human hearing impairment to date, but most of the genes underlying human adult-onset hearing loss remain unknown. Most genetic studies have focussed on very rare variants (such as family studies and patient cohort screens) or very common variants (genome-wide association studies). However, the contribution of variants present in the human population at intermediate frequencies is hard to quantify using these methods, and as a result, the landscape of variation associated with adult-onset hearing loss remains largely unknown. RESULTS Here we present a study based on exome sequencing and self-reported hearing difficulty in the UK Biobank, a large-scale biomedical database. We have carried out variant load analyses using different minor allele frequency and impact filters, and compared the resulting gene lists to a manually curated list of nearly 700 genes known to be involved in hearing in humans and/or mice. An allele frequency cutoff of 0.1, combined with a high predicted variant impact, was found to be the most effective filter setting for our analysis. We also found that separating the participants by sex produced markedly different gene lists. The gene lists obtained were investigated using gene ontology annotation, functional prioritisation and expression analysis, and this identified good candidates for further study. CONCLUSIONS Our results suggest that relatively common as well as rare variants with a high predicted impact contribute to age-related hearing impairment and that the genetic contributions to adult hearing difficulty may differ between the sexes. Our manually curated list of deafness genes is a useful resource for candidate gene prioritisation in hearing loss.
Affiliation(s)
- Morag A Lewis
- Wolfson Centre for Age-Related Diseases, King's College London, London, SE1 1UL, UK.
- Judy R Dubno
- The Medical University of South Carolina, Charleston, SC, USA
- Karen P Steel
- Wolfson Centre for Age-Related Diseases, King's College London, London, SE1 1UL, UK
|