1
Lemire G, Sanchis-Juan A, Russell K, Baxter S, Chao KR, Singer-Berk M, Groopman E, Wong I, England E, Goodrich J, Pais L, Austin-Tse C, DiTroia S, O'Heir E, Ganesh VS, Wojcik MH, Evangelista E, Snow H, Osei-Owusu I, Fu J, Singh M, Mostovoy Y, Huang S, Garimella K, Kirkham SL, Neil JE, Shao DD, Walsh CA, Argilli E, Le C, Sherr EH, Gleeson JG, Shril S, Schneider R, Hildebrandt F, Sankaran VG, Madden JA, Genetti CA, Beggs AH, Agrawal PB, Bujakowska KM, Place E, Pierce EA, Donkervoort S, Bönnemann CG, Gallacher L, Stark Z, Tan TY, White SM, Töpf A, Straub V, Fleming MD, Pollak MR, Õunap K, Pajusalu S, Donald KA, Bruwer Z, Ravenscroft G, Laing NG, MacArthur DG, Rehm HL, Talkowski ME, Brand H, O'Donnell-Luria A. Exome copy number variant detection, analysis, and classification in a large cohort of families with undiagnosed rare genetic disease. Am J Hum Genet 2024; 111:863-876. [PMID: 38565148 PMCID: PMC11080278 DOI: 10.1016/j.ajhg.2024.03.008] [Received: 10/05/2023] [Revised: 03/09/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024]
Abstract
Copy number variants (CNVs) are significant contributors to the pathogenicity of rare genetic diseases and, with innovative methods, can now reliably be identified from exome sequencing. Challenges remain in accurate classification of CNV pathogenicity. CNV calling using GATK-gCNV was performed on exomes from a cohort of 6,633 families (15,759 individuals) with heterogeneous phenotypes and variable prior genetic testing collected at the Broad Institute Center for Mendelian Genomics of the Genomics Research to Elucidate the Genetics of Rare Diseases consortium and analyzed using the seqr platform. The addition of CNV detection to exome analysis identified causal CNVs for 171 families (2.6%). The estimated sizes of CNVs ranged from 293 bp to 80 Mb. The causal CNVs consisted of 140 deletions, 15 duplications, 3 suspected complex structural variants (SVs), 3 insertions, and 10 complex SVs, the latter two groups being identified by orthogonal confirmation methods. To classify CNV variant pathogenicity, we used the 2020 American College of Medical Genetics and Genomics/ClinGen CNV interpretation standards and developed additional criteria to evaluate allelic and functional data as well as variants on the X chromosome to further advance the framework. We interpreted 151 CNVs as likely pathogenic/pathogenic and 20 CNVs as high-interest variants of uncertain significance. Calling CNVs from existing exome data increases the diagnostic yield for individuals undiagnosed after standard testing approaches, providing a higher-resolution alternative to arrays at a fraction of the cost of genome sequencing. Our improvements to the classification approach advance the systematic framework to assess the pathogenicity of CNVs.
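The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and not part of the cited study; the variable names are hypothetical, and the numbers are copied from the abstract text.

```python
# Figures copied from the abstract; names are illustrative, not from the paper.
families = 6633
causal_by_type = {
    "deletion": 140,
    "duplication": 15,
    "suspected complex SV": 3,
    "insertion": 3,
    "complex SV": 10,
}
classified = {
    "likely pathogenic/pathogenic": 151,
    "high-interest VUS": 20,
}

# Both breakdowns should describe the same set of causal CNVs.
total_causal = sum(causal_by_type.values())
total_classified = sum(classified.values())

# Diagnostic yield added by CNV calling, as a percentage of families.
yield_pct = round(100 * total_causal / families, 1)

print(total_causal, total_classified, yield_pct)
```

Both breakdowns sum to the 171 families reported, and 171 of 6,633 families rounds to the stated 2.6%.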
Affiliation(s)
- Gabrielle Lemire
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Alba Sanchis-Juan
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Kathryn Russell
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Samantha Baxter
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Katherine R Chao
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Moriel Singer-Berk
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Emily Groopman
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Isaac Wong
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Eleina England
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Julia Goodrich
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Lynn Pais
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Christina Austin-Tse
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stephanie DiTroia
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Emily O'Heir
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Vijay S Ganesh
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA
- Monica H Wojcik
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Emily Evangelista
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hana Snow
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ikeoluwa Osei-Owusu
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jack Fu
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Mugdha Singh
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Yulia Mostovoy
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Steve Huang
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kiran Garimella
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Samantha L Kirkham
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Jennifer E Neil
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA
- Diane D Shao
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Neurology, Boston Children's Hospital, Boston, MA, USA
- Christopher A Walsh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA
- Emanuela Argilli
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA; Institute of Human Genetics and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Carolyn Le
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA; Institute of Human Genetics and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Elliott H Sherr
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA; Institute of Human Genetics and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Joseph G Gleeson
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA; Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Shirlee Shril
- Harvard Medical School, Boston, MA, USA; Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Ronen Schneider
- Harvard Medical School, Boston, MA, USA; Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Friedhelm Hildebrandt
- Harvard Medical School, Boston, MA, USA; Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Vijay G Sankaran
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Jill A Madden
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Casie A Genetti
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Alan H Beggs
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Pankaj B Agrawal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Kinga M Bujakowska
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Emily Place
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Eric A Pierce
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Sandra Donkervoort
- Neuromuscular and Neurogenetic Disorders of Childhood Section, Neurogenetics Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Carsten G Bönnemann
- Neuromuscular and Neurogenetic Disorders of Childhood Section, Neurogenetics Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Lyndon Gallacher
- Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia; Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Zornitza Stark
- Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia; Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Tiong Yang Tan
- Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia; Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Susan M White
- Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia; Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Parkville, VIC, Australia
- Ana Töpf
- John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Volker Straub
- John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Mark D Fleming
- Harvard Medical School, Boston, MA, USA; Department of Pathology, Boston Children's Hospital, Boston, MA, USA
- Martin R Pollak
- Harvard Medical School, Boston, MA, USA; Division of Nephrology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Katrin Õunap
- Department of Clinical Genetics, Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia; Department of Genetics and Personalized Medicine, Institute of Clinical Medicine, Faculty of Medicine, University of Tartu, Tartu, Estonia
- Sander Pajusalu
- Department of Clinical Genetics, Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia; Department of Genetics and Personalized Medicine, Institute of Clinical Medicine, Faculty of Medicine, University of Tartu, Tartu, Estonia
- Kirsten A Donald
- Department of Paediatrics and Child Health, Red Cross War Memorial Children's Hospital, Cape Town, South Africa; Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Zandre Bruwer
- Department of Paediatrics and Child Health, Red Cross War Memorial Children's Hospital, Cape Town, South Africa; Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Gianina Ravenscroft
- University of Western Australia Centre for Medical Research, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, Australia
- Nigel G Laing
- University of Western Australia Centre for Medical Research, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA, Australia
- Daniel G MacArthur
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Centre for Population Genomics, Garvan Institute of Medical Research and UNSW, Sydney, NSW, Australia; Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Heidi L Rehm
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Michael E Talkowski
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Harrison Brand
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Anne O'Donnell-Luria
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA.
2
Schuetz RJ, Ceyhan D, Antoniou AA, Chaudhari BP, White P. CNVoyant: A Highly Performant and Explainable Multi-Classifier Machine Learning Approach for Determining the Clinical Significance of Copy Number Variants. Research Square 2024:rs.3.rs-4308324. [PMID: 38746157 PMCID: PMC11092842 DOI: 10.21203/rs.3.rs-4308324/v1] [Indexed: 05/16/2024]
Abstract
The precise classification of copy number variants (CNVs) presents a significant challenge in genomic medicine, primarily due to the complex nature of CNVs and their diverse impact on genetic disorders. This complexity is compounded by the limitations of existing methods in accurately distinguishing between benign, uncertain, and pathogenic CNVs. Addressing this gap, we introduce CNVoyant, a machine learning-based multi-class framework designed to enhance the clinical significance classification of CNVs. Trained on a comprehensive dataset of 52,176 ClinVar entries across pathogenic, uncertain, and benign classifications, CNVoyant incorporates a broad spectrum of genomic features, including genome position, disease-gene annotations, dosage sensitivity, and conservation scores. Models to predict the clinical significance of copy number gains and losses were trained independently. Final models were selected after testing 29 machine learning architectures and 10,000 hyperparameter combinations each for deletions and duplications via 5-fold cross-validation. We validate the performance of CNVoyant by leveraging a comprehensive set of 21,574 CNVs from the DECIPHER database, a highly regarded resource known for its extensive catalog of chromosomal imbalances linked to clinical outcomes. Compared to alternative approaches, CNVoyant shows marked improvements in precision-recall and ROC AUC metrics for binary pathogenic classifications while going one step further, offering multi-classification of clinical significance and corresponding SHAP explainability plots. This large-scale validation demonstrates CNVoyant's superior accuracy and underscores its potential to aid genomic researchers and clinical geneticists in interpreting the clinical implications of real CNVs.
Affiliation(s)
- Robert J Schuetz
- The Abigail Wexner Research Institute at Nationwide Children's Hospital
- Defne Ceyhan
- The Abigail Wexner Research Institute at Nationwide Children's Hospital
- Austin A Antoniou
- The Abigail Wexner Research Institute at Nationwide Children's Hospital
- Bimal P Chaudhari
- The Abigail Wexner Research Institute at Nationwide Children's Hospital
- Peter White
- The Abigail Wexner Research Institute at Nationwide Children's Hospital
3
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
4
Fazal S, Danzi MC, Xu I, Kobren SN, Sunyaev S, Reuter C, Marwaha S, Wheeler M, Dolzhenko E, Lucas F, Wuchty S, Tekin M, Züchner S, Aguiar-Pulido V. RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci. Genome Biol 2024; 25:39. [PMID: 38297326 PMCID: PMC10832122 DOI: 10.1186/s13059-024-03171-4] [Received: 04/03/2023] [Accepted: 01/10/2024] [Indexed: 02/02/2024]
Abstract
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
Affiliation(s)
- Sarah Fazal
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Matt C Danzi
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Isaac Xu
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02155, USA
- Chloe Reuter
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Shruti Marwaha
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Matthew Wheeler
- Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, 94305, USA
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Francesca Lucas
- Department of Computer Science, Delft University of Technology, Delft, The Netherlands
- Stefan Wuchty
- Department of Computer Science, University of Miami, Miami, FL, USA
- Department of Biology, University of Miami, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
- Mustafa Tekin
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA
- Stephan Züchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
5
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Zhaopo Zhu
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Yijing Wang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xudong Xiang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Shiyu Zhang
- Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
| | - Tengfei Luo
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Qiao Zhou
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jian Qiu
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Beisha Tang
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
| | - Kun Xia
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Bin Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jinchen Li
- National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| |
|
6
|
Xu Z, Li Q, Marchionni L, Wang K. PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 2023; 14:7805. [PMID: 38016949 PMCID: PMC10684511 DOI: 10.1038/s41467-023-43651-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023] Open
Abstract
Structural variants (SVs) represent a major source of genetic variation associated with phenotypic diversity and disease susceptibility. While long-read sequencing can discover over 20,000 SVs per human genome, interpreting their functional consequences remains challenging. Existing methods for identifying disease-related SVs focus on deletion/duplication only and cannot prioritize individual genes affected by SVs, especially for noncoding SVs. Here, we introduce PhenoSV, a phenotype-aware machine-learning model that interprets all major types of SVs and genes affected. PhenoSV segments and annotates SVs with diverse genomic features and employs a transformer-based architecture to predict their impacts under a multiple-instance learning framework. With phenotype information, PhenoSV further utilizes gene-phenotype associations to prioritize phenotype-related SVs. Evaluation on extensive human SV datasets covering all SV types demonstrates PhenoSV's superior performance over competing methods. Applications in diseases suggest that PhenoSV can determine disease-related genes from SVs. A web server and a command-line tool for PhenoSV are available at https://phenosv.wglab.org.
Affiliation(s)
- Zhuoran Xu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Quan Li
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
7
|
Lemire G, Sanchis-Juan A, Russell K, Baxter S, Chao KR, Singer-Berk M, Groopman E, Wong I, England E, Goodrich J, Pais L, Austin-Tse C, DiTroia S, O’Heir E, Ganesh VS, Wojcik MH, Evangelista E, Snow H, Osei-Owusu I, Fu J, Singh M, Mostovoy Y, Huang S, Garimella K, Kirkham SL, Neil JE, Shao DD, Walsh CA, Argili E, Le C, Sherr EH, Gleeson J, Shril S, Schneider R, Hildebrandt F, Sankaran VG, Madden JA, Genetti CA, Beggs AH, Agrawal PB, Bujakowska KM, Place E, Pierce EA, Donkervoort S, Bönnemann CG, Gallacher L, Stark Z, Tan T, White SM, Töpf A, Straub V, Fleming MD, Pollak MR, Õunap K, Pajusalu S, Donald KA, Bruwer Z, Ravenscroft G, Laing NG, MacArthur DG, Rehm HL, Talkowski ME, Brand H, O’Donnell-Luria A. Exome copy number variant detection, analysis and classification in a large cohort of families with undiagnosed rare genetic disease. medRxiv 2023:2023.10.05.23296595. [PMID: 37873196 PMCID: PMC10593084 DOI: 10.1101/2023.10.05.23296595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Copy number variants (CNVs) are significant contributors to the pathogenicity of rare genetic diseases and, with new innovative methods, can now reliably be identified from exome sequencing. Challenges still remain in accurate classification of CNV pathogenicity. CNV calling using GATK-gCNV was performed on exomes from a cohort of 6,633 families (15,759 individuals) with heterogeneous phenotypes and variable prior genetic testing collected at the Broad Institute Center for Mendelian Genomics of the GREGoR consortium. Each family's CNV data was analyzed using the seqr platform and candidate CNVs were classified using the 2020 ACMG/ClinGen CNV interpretation standards. We developed additional evidence criteria to address situations not covered by the current standards. The addition of CNV calling to exome analysis identified causal CNVs for 173 families (2.6%). The estimated sizes of CNVs ranged from 293 bp to 80 Mb, with estimates that 44% would not have been detected by standard chromosomal microarrays. The causal CNVs consisted of 141 deletions, 15 duplications, 4 suspected complex structural variants (SVs), 3 insertions and 10 complex SVs, the latter two groups being identified by orthogonal validation methods. We interpreted 153 CNVs as likely pathogenic/pathogenic and 20 CNVs as high-interest variants of uncertain significance. Calling CNVs from existing exome data increases the diagnostic yield for individuals undiagnosed after standard testing approaches, providing a higher-resolution alternative to arrays at a fraction of the cost of genome sequencing. Our improvements to the classification approach advance the systematic framework to assess the pathogenicity of CNVs.
Affiliation(s)
- Gabrielle Lemire
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- These authors contributed equally
| | - Alba Sanchis-Juan
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- These authors contributed equally
| | - Kathryn Russell
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samantha Baxter
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Katherine R. Chao
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Moriel Singer-Berk
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Emily Groopman
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
| | - Isaac Wong
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Eleina England
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Julia Goodrich
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Lynn Pais
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Christina Austin-Tse
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Stephanie DiTroia
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Emily O’Heir
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Vijay S. Ganesh
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Brigham and Women’s Hospital, Boston, MA, USA
| | - Monica H. Wojcik
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Emily Evangelista
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Hana Snow
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ikeoluwa Osei-Owusu
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Jack Fu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Mugdha Singh
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Yulia Mostovoy
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Steve Huang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kiran Garimella
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samantha L. Kirkham
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
| | - Jennifer E. Neil
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Howard Hughes Medical Institute, Boston Children’s Hospital, Boston, MA, USA
| | - Diane D. Shao
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
| | - Christopher A. Walsh
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Boston Children’s Hospital, Boston, MA, USA
| | - Emanuela Argili
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute of Human Genetics and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Carolyn Le
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute of Human Genetics and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Elliott H. Sherr
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute of Human Genetics and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Joseph Gleeson
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
| | - Shirlee Shril
- Harvard Medical School, Boston, MA, USA
- Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
| | - Ronen Schneider
- Harvard Medical School, Boston, MA, USA
- Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
| | - Friedhelm Hildebrandt
- Harvard Medical School, Boston, MA, USA
- Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
| | - Vijay G. Sankaran
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
| | - Jill A. Madden
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA, USA
| | - Casie A. Genetti
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA, USA
| | - Alan H. Beggs
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA, USA
| | - Pankaj B. Agrawal
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Boston, MA, USA
| | - Kinga M. Bujakowska
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
| | - Emily Place
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
| | - Eric A. Pierce
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
| | - Sandra Donkervoort
- Neuromuscular and Neurogenetic Disorders of Childhood Section, Neurogenetics Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Carsten G. Bönnemann
- Neuromuscular and Neurogenetic Disorders of Childhood Section, Neurogenetics Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Lyndon Gallacher
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Parkville, Victoria, Australia
| | - Zornitza Stark
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Parkville, Victoria, Australia
| | - Tiong Tan
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Parkville, Victoria, Australia
| | - Susan M. White
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Parkville, Victoria, Australia
| | - Ana Töpf
- John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
| | - Volker Straub
- John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
| | - Mark D. Fleming
- Harvard Medical School, Boston, MA, USA
- Department of Pathology, Boston Children’s Hospital, Boston, MA, USA
| | - Martin R. Pollak
- Harvard Medical School, Boston, MA, USA
- Division of Nephrology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Katrin Õunap
- Department of Clinical Genetics, Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia
- Department of Clinical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Tartu, Tartu, Estonia
| | - Sander Pajusalu
- Department of Clinical Genetics, Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia
- Department of Clinical Genetics, Institute of Clinical Medicine, Faculty of Medicine, University of Tartu, Tartu, Estonia
| | - Kirsten A. Donald
- Department of Paediatrics and Child Health, Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
- University of Cape Town, Cape Town, South Africa
| | - Zandre Bruwer
- Department of Paediatrics and Child Health, Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
- University of Cape Town, Cape Town, South Africa
| | - Gianina Ravenscroft
- University of Western Australia, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, Australia
| | - Nigel G. Laing
- University of Western Australia, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, Australia
| | - Daniel G. MacArthur
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute, Sydney, Australia
- Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
| | - Heidi L. Rehm
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Michael E. Talkowski
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Harrison Brand
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Senior authors
| | - Anne O’Donnell-Luria
- Broad Institute Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Senior authors
| |
|
8
|
Liu Z, Huang YF. Deep multiple-instance learning accurately predicts gene haploinsufficiency and deletion pathogenicity. bioRxiv 2023:2023.08.29.555384. [PMID: 37693607 PMCID: PMC10491176 DOI: 10.1101/2023.08.29.555384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Copy number losses (deletions) are a major contributor to the etiology of severe genetic disorders. Although haploinsufficient genes play a critical role in deletion pathogenicity, current methods for deletion pathogenicity prediction fail to integrate multiple lines of evidence for haploinsufficiency at the gene level, limiting their power to pinpoint deleterious deletions associated with genetic disorders. Here we introduce DosaCNV, a deep multiple-instance learning framework that, for the first time, models deletion pathogenicity jointly with gene haploinsufficiency. By integrating over 30 gene-level features potentially predictive of haploinsufficiency, DosaCNV shows unmatched performance in prioritizing pathogenic deletions associated with a broad spectrum of genetic disorders. Furthermore, DosaCNV outperforms existing methods in predicting gene haploinsufficiency even though it is not trained on known haploinsufficient genes. Finally, DosaCNV leverages a state-of-the-art technique to quantify the contributions of individual gene-level features to haploinsufficiency, allowing for human-understandable explanations of model predictions. Altogether, DosaCNV is a powerful computational tool for both fundamental and translational research.
Affiliation(s)
- Zhihan Liu
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Molecular, Cellular, and Integrative Biosciences Program, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| |
|
9
|
Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. Frontiers in Bioinformatics 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
| | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
| |
|
10
|
Alibutud R, Hansali S, Cao X, Zhou A, Mahaganapathy V, Azaro M, Gwin C, Wilson S, Buyske S, Bartlett CW, Flax JF, Brzustowicz LM, Xing J. Structural Variations Contribute to the Genetic Etiology of Autism Spectrum Disorder and Language Impairments. Int J Mol Sci 2023; 24:13248. [PMID: 37686052 PMCID: PMC10487745 DOI: 10.3390/ijms241713248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Revised: 08/24/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023] Open
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by restrictive interests and/or repetitive behaviors and deficits in social interaction and communication. ASD is a multifactorial disease with a complex polygenic genetic architecture. Its genetic contributing factors are not yet fully understood, especially the role of large structural variations (SVs). In this study, we aimed to assess the contribution of SVs, including copy number variants (CNVs), insertions, deletions, duplications, and mobile element insertions, to ASD and related language impairments in the New Jersey Language and Autism Genetics Study (NJLAGS) cohort. Within the cohort, ~77% of the families contain SVs that followed expected segregation or de novo patterns and passed our filtering criteria. These SVs affected 344 brain-expressed genes and can potentially contribute to the genetic etiology of the disorders. Gene Ontology and protein-protein interaction network analysis suggested several clusters of genes in different functional categories, such as neuronal development and histone modification machinery. Genes and biological processes identified in this study contribute to the understanding of ASD and related neurodevelopmental disorders.
Affiliation(s)
- Rohan Alibutud
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sammy Hansali
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Xiaolong Cao
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Anbo Zhou
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Vaidhyanathan Mahaganapathy
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Marco Azaro
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Christine Gwin
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sherri Wilson
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Steven Buyske
- Department of Statistics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Christopher W. Bartlett
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, College of Medicine, The Ohio State University, Columbus, OH 43205, USA
| | - Judy F. Flax
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Linda M. Brzustowicz
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- The Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- The Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| |
|
11
|
Sharo AG, Zou Y, Adhikari AN, Brenner SE. ClinVar and HGMD genomic variant classification accuracy has improved over time, as measured by implied disease burden. Genome Med 2023; 15:51. [PMID: 37443081 PMCID: PMC10347827 DOI: 10.1186/s13073-023-01199-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/12/2022] [Accepted: 05/31/2023] [Indexed: 07/15/2023]
Abstract
BACKGROUND Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines. METHODS Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years, across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were classified by the databases as pathogenic. Due to the rarity of IEMs, nearly all such classified pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD. RESULTS While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that African ancestry individuals have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than DM or DM? variants in HGMD, which has likely resulted in ClinVar's lower false-positive rate. CONCLUSIONS Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.
Affiliation(s)
- Andrew G. Sharo
- Biophysics Graduate Group, University of California, Berkeley, CA 94720 USA
- Center for Computational Biology, University of California, Berkeley, CA 94720 USA
- Department of Ecology and Evolutionary Biology, University of California, 124 Biomed Building, 1156 High St., Santa Cruz, CA 95064 USA
| | - Yangyun Zou
- Center for Computational Biology, University of California, Berkeley, CA 94720 USA
- Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, Berkeley, CA 94720 USA
- Currently at: Department of Clinical Research, Yikon Genomics Company, Ltd., Shanghai, China
| | - Aashish N. Adhikari
- Center for Computational Biology, University of California, Berkeley, CA 94720 USA
- Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, Berkeley, CA 94720 USA
- Currently at: Illumina, Foster City, CA 94404 USA
| | - Steven E. Brenner
- Biophysics Graduate Group, University of California, Berkeley, CA 94720 USA
- Center for Computational Biology, University of California, Berkeley, CA 94720 USA
- Department of Plant and Microbial Biology, University of California, 461 Koshland Hall, Berkeley, CA 94720 USA
| |
|
12
|
Sládeček T, Gažiová M, Kucharík M, Zaťková A, Pös Z, Pös O, Krampl W, Tomková E, Hýblová M, Minárik G, Radvánszky J, Budiš J, Szemes T. Combination of expert guidelines-based and machine learning-based approaches leads to superior accuracy of automated prediction of clinical effect of copy number variations. Sci Rep 2023; 13:10531. [PMID: 37386017 PMCID: PMC10310736 DOI: 10.1038/s41598-023-37352-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Accepted: 06/20/2023] [Indexed: 07/01/2023]
Abstract
Clinical interpretation of copy number variants (CNVs) is a complex process that requires skilled clinical professionals. General recommendations have recently been released to guide CNV interpretation based on predefined criteria and to standardize the decision process. Several semiautomatic computational methods have been proposed to recommend appropriate choices, relieving clinicians of tedious searching in vast genomic databases. We have developed and evaluated such a tool, called MarCNV, and tested it on CNV records collected from the ClinVar database. Alternatively, emerging machine learning-based tools, such as the recently published ISV (Interpretation of Structural Variants), have shown promising ways of making even fully automated predictions using a broader characterization of affected genomic elements. Such tools utilize features beyond the ACMG criteria, thus providing supporting evidence and the potential to improve CNV classification. Since both approaches contribute to the evaluation of the clinical impact of CNVs, we propose a combined solution in the form of a decision support tool based on automated ACMG guidelines (MarCNV) supplemented by a machine learning-based pathogenicity prediction (ISV) for the classification of CNVs. We provide evidence that such a combined approach is able to reduce the number of uncertain classifications and reveal potentially incorrect classifications using automated guidelines. CNV interpretation using MarCNV, ISV, and the combined approach is available for non-commercial use at https://predict.genovisio.com/ .
Affiliation(s)
- Tomáš Sládeček
- Geneton Ltd., Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
| | - Michaela Gažiová
- Geneton Ltd., Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
| | - Marcel Kucharík
- Geneton Ltd., Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
| | - Andrea Zaťková
- Geneton Ltd., Bratislava, Slovakia
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Zuzana Pös
- Geneton Ltd., Bratislava, Slovakia
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Ondrej Pös
- Geneton Ltd., Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
| | - Werner Krampl
- Geneton Ltd., Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
| | | | - Michaela Hýblová
- Medirex Group Academy NPO, Nitra, Slovakia
- Trisomy Ltd., Nitra, Slovakia
| | - Gabriel Minárik
- Medirex Group Academy NPO, Nitra, Slovakia
- Trisomy Ltd., Nitra, Slovakia
| | - Ján Radvánszky
- Geneton Ltd., Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Jaroslav Budiš
- Geneton Ltd., Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
- Slovak Center of Scientific and Technical Information, Bratislava, Slovakia
| | - Tomáš Szemes
- Geneton Ltd., Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
- Comenius University Science Park, Bratislava, Slovakia
| |
|
13
|
Geoffroy V, Lamouche JB, Guignard T, Nicaise S, Kress A, Scheidecker S, Le Béchec A, Muller J. The AnnotSV webserver in 2023: updated visualization and ranking. Nucleic Acids Res 2023:7175348. [PMID: 37216590 DOI: 10.1093/nar/gkad426] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/26/2023] [Revised: 04/20/2023] [Accepted: 05/09/2023] [Indexed: 05/24/2023]
Abstract
Much of the human genetic variant repertoire is composed of single nucleotide variants (SNVs) and small insertions/deletions (indels), but structural variants (SVs) remain a major part of our modified DNA. SV detection has often been a complex question to answer, either because of the necessity to use different technologies (array CGH, SNP array, Karyotype, Optical Genome Mapping…) to detect each category of SV or to get an appropriate resolution (Whole Genome Sequencing). Thanks to the deluge of pangenomic analysis, human geneticists are accumulating SVs, and their interpretation remains time-consuming and challenging. The AnnotSV webserver (https://www.lbgi.fr/AnnotSV/) aims at being an efficient tool to (i) annotate and interpret SV potential pathogenicity in the context of human diseases, (ii) recognize potential false positive variants among all the SVs identified and (iii) visualize the patient's variant repertoire. The most recent developments in the AnnotSV webserver are: (i) updated annotation sources and ranking, (ii) three novel output formats to allow diverse utilization (analysis, pipelines), as well as (iii) two novel user interfaces including an interactive circos view.
Affiliation(s)
- Véronique Geoffroy
- Université de Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
| | - Jean-Baptiste Lamouche
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | | | - Samuel Nicaise
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | - Arnaud Kress
- Complex Systems and Translational Bioinformatics, ICube, UMR 7357, University of Strasbourg, CNRS, FMTS, Strasbourg, France
| | - Sophie Scheidecker
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Laboratoires de Diagnostic Génétique, IGMA, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | - Antony Le Béchec
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | - Jean Muller
- Laboratoire de Génétique Médicale, UMR 1112, INSERM, IGMA, Université de Strasbourg, Strasbourg, France
- Unité Fonctionnelle de Bioinformatique Médicale appliquée au diagnostic (UF7363), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Laboratoires de Diagnostic Génétique, IGMA, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| |
|
14
|
Lv K, Chen D, Xiong D, Tang H, Ou T, Kan L, Zhang X. dbCNV: deleteriousness-based model to predict pathogenicity of copy number variations. BMC Genomics 2023; 24:131. [PMID: 36941551 PMCID: PMC10029177 DOI: 10.1186/s12864-023-09225-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 03/06/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND Copy number variation (CNV) is a type of structural variation, which is a gain or loss event with abnormal changes in copy number. Methods to predict the pathogenicity of CNVs are needed to understand the relationship between these variants and clinical phenotypes. ClassifyCNV, X-CNV, StrVCTVRE, etc. have been trained to predict the pathogenicity of CNVs, but few studies based on the deleterious significance of features have been reported. RESULTS From the single nucleotide polymorphism (SNP), gene and region dimensions, we collected 79 informative features that quantitatively describe the characteristics of a CNV, such as CNV length, the number of protein-coding genes, and the number of three prime untranslated regions. Then, according to the deleterious significance, we formulated quantitative methods for features, which fall into two categories: the first is variable type, including maximum, minimum and mean; the second is attribute type, which is measured by numerical sum. We used the Gradient Boosted Trees (GBT) algorithm to construct dbCNV, which can be used to predict pathogenicity for five-tier classification and binary classification of CNVs. We demonstrated that the distribution of most feature values was consistent with the deleterious significance. The five-tier classification model achieved accuracies of 0.85 and 0.79 for loss and gain CNVs, respectively, which indicates that it has high discrimination power in predicting the five-tier pathogenicity classification of CNVs. The binary model achieved area under curve (AUC) values of 0.96 and 0.81 in the validation set for gain and loss CNVs, respectively. CONCLUSION The performance of dbCNV suggests that a functional deleteriousness-based model of CNVs is a promising approach to support classification prediction and to further understand the pathogenic mechanism.
Affiliation(s)
- Kangqi Lv
- Xinxiang Medical University, 453003, Xinxiang, China
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| | - Dayang Chen
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| | - Dan Xiong
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| | - Huamei Tang
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| | - Tong Ou
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| | - Lijuan Kan
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| | - Xiuming Zhang
- Xinxiang Medical University, 453003, Xinxiang, China
- Medical Laboratory of the Third Affiliated Hospital of Shenzhen University, No. 47 of Youyi Road, 518001, Shenzhen City, Guangdong Province, China
| |
|
15
|
Stoltze UK, Hagen CM, van Overeem Hansen T, Byrjalsen A, Gerdes AM, Yakimov V, Rasmussen S, Bækvad-Hansen M, Hougaard DM, Schmiegelow K, Hjalgrim H, Wadt K, Bybjerg-Grauholm J. Combinatorial batching of DNA for ultralow-cost detection of pathogenic variants. Genome Med 2023; 15:17. [PMID: 36918911 PMCID: PMC10013285 DOI: 10.1186/s13073-023-01167-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Accepted: 02/28/2023] [Indexed: 03/16/2023]
Abstract
BACKGROUND Next-generation sequencing (NGS) based population screening holds great promise for disease prevention and earlier diagnosis, but the costs associated with screening millions of humans remain prohibitive. New methods for population genetic testing that lower the costs of NGS without compromising diagnostic power are needed. METHODS We developed double batched sequencing where DNA samples are batch-sequenced twice - directly pinpointing individuals with rare variants. We sequenced batches of at-birth blood spot DNA using a commercial 113-gene panel in an explorative (n = 100) and a validation (n = 100) cohort of children who went on to develop pediatric cancers. All results were benchmarked against individual whole genome sequencing data. RESULTS We demonstrated fully replicable detection of cancer-causing germline variants, with positive and negative predictive values of 100% (95% CI, 0.91-1.00 and 95% CI, 0.98-1.00, respectively). Pathogenic and clinically actionable variants were detected in RB1, TP53, BRCA2, APC, and 19 other genes. Analyses of larger batches indicated that our approach is highly scalable, yielding more than 95% cost reduction or less than 3 cents per gene screened for rare disease-causing mutations. We also show that double batched sequencing could cost-effectively prevent childhood cancer deaths through broad genomic testing. CONCLUSIONS Our ultracheap genetic diagnostic method, which uses existing sequencing hardware and standard newborn blood spots, should readily open up opportunities for population-wide risk stratification using genetic screening across many fields of clinical genetics and genomics.
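The double batching idea described in this abstract can be sketched in a few lines. The following is a minimal illustration only, assuming a simple row/column grid design in which every sample is sequenced once in a row batch and once in a column batch; the function names (`make_batches`, `positive_batches`, `decode`), the 10 x 10 layout, and the sample identifiers are hypothetical and are not taken from the paper, whose actual batching scheme may differ.

```python
def make_batches(sample_ids, n_cols):
    """Place samples on a grid; each sample joins one row batch and one column batch."""
    rows, cols = {}, {}
    for i, sid in enumerate(sample_ids):
        r, c = divmod(i, n_cols)
        rows.setdefault(r, []).append(sid)
        cols.setdefault(c, []).append(sid)
    return rows, cols


def positive_batches(batches, carriers):
    """Simulate sequencing: a batch is positive if any member carries the variant."""
    return {b for b, members in batches.items() if carriers & set(members)}


def decode(rows, cols, pos_rows, pos_cols):
    """Candidate carriers sit where a positive row batch meets a positive column batch."""
    candidates = set()
    for r in pos_rows:
        for c in pos_cols:
            candidates |= set(rows[r]) & set(cols[c])
    return sorted(candidates)


# Hypothetical cohort of 100 samples on a 10 x 10 grid: 20 batches instead of 100 runs.
samples = [f"S{i:02d}" for i in range(100)]
rows, cols = make_batches(samples, n_cols=10)

carriers = {"S37"}  # assumed single carrier of a rare variant
found = decode(
    rows,
    cols,
    positive_batches(rows, carriers),
    positive_batches(cols, carriers),
)
print(len(rows) + len(cols), found)
```

With a single carrier, the intersection identifies exactly one sample. If two carriers of the same variant fall in different rows and columns, this simple grid yields four candidates, so a design of this kind works best for variants that are rare within a batch.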
Affiliation(s)
- Ulrik Kristoffer Stoltze
- Department of Pediatrics and Adolescent Medicine, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
- Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
| | - Christian Munch Hagen
- Department of Congenital Disorders, Statens Serum Institute, 2300, KBH S, Artillerivej 5, Denmark
| | - Thomas van Overeem Hansen
- Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
- Department of Clinical Medicine, Copenhagen University, Blegdamsvej 3B, 2200, KBH N, Denmark
| | - Anna Byrjalsen
- Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
| | - Anne-Marie Gerdes
- Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
| | - Victor Yakimov
- Department of Congenital Disorders, Statens Serum Institute, 2300, KBH S, Artillerivej 5, Denmark
| | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Copenhagen University, Blegdamsvej 3B, 2200, KBH N, Denmark
| | - Marie Bækvad-Hansen
- Department of Congenital Disorders, Statens Serum Institute, 2300, KBH S, Artillerivej 5, Denmark
| | - David Michael Hougaard
- Department of Congenital Disorders, Statens Serum Institute, 2300, KBH S, Artillerivej 5, Denmark
| | - Kjeld Schmiegelow
- Department of Pediatrics and Adolescent Medicine, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
- Department of Clinical Medicine, Copenhagen University, Blegdamsvej 3B, 2200, KBH N, Denmark
| | - Henrik Hjalgrim
- Department of Clinical Medicine, Copenhagen University, Blegdamsvej 3B, 2200, KBH N, Denmark
- Danish Cancer Society Research Centre, Danish Cancer Society, Strandboulevarden 49, 2100, KBH Ø, Denmark
- Department of Epidemiology Research, Statens Serum Institut, 2300, KBH S, Artillerivej 5, Denmark
- Department of Haematology, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen Ø, Denmark
| | - Karin Wadt
- Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, 2100, KBH Ø, Denmark
| | - Jonas Bybjerg-Grauholm
- Department of Congenital Disorders, Statens Serum Institute, 2300, KBH S, Artillerivej 5, Denmark
| |
|
16
|
Agarwal I, Fuller ZL, Myers SR, Przeworski M. Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs. eLife 2023; 12:83172. [PMID: 36648429 PMCID: PMC9937649 DOI: 10.7554/elife.83172] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/01/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023]
Abstract
Causal loss-of-function (LOF) variants for Mendelian and severe complex diseases are enriched in 'mutation intolerant' genes. We show how such observations can be interpreted in light of a model of mutation-selection balance and use the model to relate the pathogenic consequences of LOF mutations at present to their evolutionary fitness effects. To this end, we first infer posterior distributions for the fitness costs of LOF mutations in 17,318 autosomal and 679 X-linked genes from exome sequences in 56,855 individuals. Estimated fitness costs for the loss of a gene copy are typically above 1%; they tend to be largest for X-linked genes, whether or not they have a Y homolog, followed by autosomal genes and genes in the pseudoautosomal region. We compare inferred fitness effects for all possible de novo LOF mutations to those of de novo mutations identified in individuals diagnosed with one of six severe, complex diseases or developmental disorders. Probands carry an excess of mutations with estimated fitness effects above 10%; as we show by simulation, when sampled in the population, such highly deleterious mutations are typically only a couple of generations old. Moreover, the proportion of highly deleterious mutations carried by probands reflects the typical age of onset of the disease. The study design also has a discernible influence: a greater proportion of highly deleterious mutations is detected in pedigree than case-control studies, and for autism, in simplex than multiplex families and in female versus male probands. Thus, anchoring observations in human genetics to a population genetic model allows us to learn about the fitness effects of mutations identified by different mapping strategies and for different traits.
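As a rough illustration of the mutation-selection balance idea this abstract refers to, the textbook deterministic approximation for a deleterious allele whose cost is felt mainly in heterozygotes is given below. This is the standard approximation and not necessarily the exact model fitted in the paper; the symbols $u$ (LOF mutation rate per gene copy per generation) and $hs$ (heterozygous fitness cost), and the numeric values, are assumptions chosen for illustration.

```latex
% Equilibrium frequency q of a deleterious allele under mutation-selection balance
% (assumes selection acts mainly on heterozygotes and hs is much larger than u)
\[
  q \approx \frac{u}{hs}
\]
% Illustrative values only: u = 10^{-6}, hs = 0.01 (a 1% cost for losing one gene copy)
\[
  q \approx \frac{10^{-6}}{10^{-2}} = 10^{-4}
\]
```

Under this approximation, a larger fitness cost implies a lower expected population frequency of LOF alleles, which is the intuition behind inferring fitness costs from how depleted a gene is for LOF variants.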
Affiliation(s)
- Ipsita Agarwal
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Zachary L Fuller
- Department of Biological Sciences, Columbia University, New York, United States
| | - Simon R Myers
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, United States
- Department of Systems Biology, Columbia University, New York, United States
| |
|
17
|
Nicholas TJ, Cormier MJ, Quinlan AR. Annotation of structural variants with reported allele frequencies and related metrics from multiple datasets using SVAFotate. BMC Bioinformatics 2022; 23:490. [PMID: 36384437 PMCID: PMC9670370 DOI: 10.1186/s12859-022-05008-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 10/25/2022] [Indexed: 11/17/2022]
Abstract
BACKGROUND Identification of deleterious genetic variants using DNA sequencing data relies on increasingly detailed filtering strategies to isolate the small subset of variants that are more likely to underlie a disease phenotype. Datasets reflecting population allele frequencies of different types of variants serve as powerful filtering tools, especially in the context of rare disease analysis. While such population-scale allele frequency datasets now exist for structural variants (SVs), it remains a challenge to match SV calls between multiple datasets, thereby complicating estimates of a putative SV's population allele frequency. RESULTS We introduce SVAFotate, a software tool that enables the annotation of SVs with variant allele frequency and related information from existing SV datasets. As a result, VCF files annotated by SVAFotate offer a variety of metrics to aid in the stratification of SVs as common or rare in the broader human population. CONCLUSIONS Here we demonstrate the use of SVAFotate in the classification of SVs with regards to their population frequency and illustrate how SVAFotate's annotations can be used to filter and prioritize SVs. Lastly, we detail how best to utilize these SV annotations in the analysis of genetic variation in studies of rare disease.
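The matching problem mentioned in this abstract (deciding whether an SV call corresponds to an entry in a population dataset) is commonly approached with a reciprocal-overlap test. The sketch below illustrates that general technique only; it is not SVAFotate's code or interface, and the function names, the tuple layout, the 0.8 threshold, and the example coordinates and frequencies are all assumptions made for illustration.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions between intervals A and B (0.0 if disjoint)."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    if overlap == 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def max_matching_af(query, reference, min_overlap=0.8):
    """Highest allele frequency among reference SVs that match the query SV.

    query: (chrom, start, end, svtype)
    reference: iterable of (chrom, start, end, svtype, allele_frequency)
    A match requires the same chromosome, the same SV type, and a reciprocal
    overlap of at least min_overlap (a hypothetical threshold).
    """
    q_chrom, q_start, q_end, q_type = query
    best = 0.0
    for chrom, start, end, svtype, af in reference:
        if chrom != q_chrom or svtype != q_type:
            continue
        if reciprocal_overlap(q_start, q_end, start, end) >= min_overlap:
            best = max(best, af)
    return best


# Made-up example data, not drawn from any real dataset.
reference = [
    ("chr1", 1100, 2050, "DEL", 0.02),  # overlaps the query closely
    ("chr1", 1500, 5000, "DEL", 0.30),  # much larger event, poor reciprocal overlap
    ("chr1", 1000, 2000, "DUP", 0.10),  # same coordinates but a different SV type
]
query = ("chr1", 1000, 2000, "DEL")
print(max_matching_af(query, reference))
```

Taking the maximum frequency across matches is a conservative choice for rare disease filtering, since an SV is kept as "rare" only if no matching reference entry reports it as common.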
Affiliation(s)
- Thomas J. Nicholas
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112 USA
| | - Michael J. Cormier
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112 USA
| | - Aaron R. Quinlan
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112 USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112 USA
| |
|
18
|
Kuksa PP, Greenfest-Allen E, Cifello J, Ionita M, Wang H, Nicaretta H, Cheng PL, Lee WP, Wang LS, Leung YY. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Hum Mol Genet 2022; 31:R62-R72. [PMID: 35943817 PMCID: PMC9585666 DOI: 10.1093/hmg/ddac191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022]
Abstract
Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the library of and offers unique opportunities for investigating both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional challenges to predicting functional impacts in terms of data analysis and interpretation. This review focuses on the recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions which will enhance the data and tools necessary for the effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.
Affiliation(s)
- Pavel P Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Emily Greenfest-Allen
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jeffrey Cifello
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Matei Ionita
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Hui Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Heather Nicaretta
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Po-Liang Cheng
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
19
|
Mendoza-Alvarez A, Tosco-Herrera E, Muñoz-Barrera A, Rubio-Rodríguez LA, Alonso-Gonzalez A, Corrales A, Iñigo-Campos A, Almeida-Quintana L, Martin-Fernandez E, Martinez-Beltran D, Perez-Rodriguez E, Callero A, Garcia-Robaina JC, González-Montelongo R, Marcelino-Rodriguez I, Lorenzo-Salazar JM, Flores C. A catalog of the genetic causes of hereditary angioedema in the Canary Islands (Spain). Front Immunol 2022; 13:997148. [PMID: 36203598 PMCID: PMC9531158 DOI: 10.3389/fimmu.2022.997148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 08/23/2022] [Indexed: 11/13/2022]
Abstract
Hereditary angioedema (HAE) is a rare disease whose known causes involve C1 inhibitor dysfunction or dysregulation of the kinin cascade. The updated HAE management guidelines recommend performing genetic tests to reach a precise diagnosis. Unfortunately, genetic tests are still uncommon in routine diagnosis. Here, we characterized for the first time the genetic causes of HAE in affected families from the Canary Islands (Spain). Whole-exome sequencing data were obtained from 41 affected patients and unaffected relatives from 29 unrelated families identified in the archipelago. The Hereditary Angioedema Database Annotation (HADA) tool was used for pathogenicity classification and causal variant prioritization among the genes known to cause HAE. Manual reclassification of prioritized variants was used in those families lacking known causal variants. We detected a total of eight different variants causing HAE in this patient series, mainly affecting the SERPING1 and F12 genes. One of them was a novel SERPING1 variant (c.686-12A>G) with a predicted splicing effect, which was reclassified as likely pathogenic in one family. Altogether, the diagnostic yield obtained by assessing previously reported causal genes and considering variant reclassifications according to the American College of Medical Genetics guidelines reached 66.7% (95% Confidence Interval [CI]: 30.1-91.0) in families with more than one affected member and 10.0% (95% CI: 1.8-33.1) among cases without family information for the disease. Although the genetic causes in many patients remain to be identified, our results reinforce the need for genetic tests as a first-tier diagnostic tool in this disease, as recommended by the international WAO/EAACI guidelines for the management of HAE.
Affiliation(s)
| | - Eva Tosco-Herrera
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
| | - Adrian Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables, Santa Cruz de Tenerife, Spain
| | - Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables, Santa Cruz de Tenerife, Spain
| | - Aitana Alonso-Gonzalez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
- Universidad de Santiago de Compostela, Santiago de Compostela, Spain
| | - Almudena Corrales
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Antonio Iñigo-Campos
- Genomics Division, Instituto Tecnológico y de Energías Renovables, Santa Cruz de Tenerife, Spain
| | - Lourdes Almeida-Quintana
- Allergy Service, Hospital Universitario de Gran Canaria Dr. Negrín, Las Palmas de Gran Canaria, Spain
| | - Elena Martin-Fernandez
- Allergy Service, Hospital Universitario Dr. Molina Orosa, Las Palmas de Gran Canaria, Spain
| | - Dara Martinez-Beltran
- Allergy Service, Hospital Universitario Insular-Materno Infantil, Las Palmas de Gran Canaria, Spain
| | - Eva Perez-Rodriguez
- Allergy Service, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
| | - Ariel Callero
- Allergy Service, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
| | - Jose C. Garcia-Robaina
- Allergy Service, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
| | | | - Itahisa Marcelino-Rodriguez
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
- Public Health and Preventive Medicine Area, Universidad de La Laguna, Santa Cruz de Tenerife, Spain
| | - Jose M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables, Santa Cruz de Tenerife, Spain
| | - Carlos Flores
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Santa Cruz de Tenerife, Spain
- Genomics Division, Instituto Tecnológico y de Energías Renovables, Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Facultad de Ciencias de la Salud, Universidad Fernando Pessoa Canarias, Las Palmas de Gran Canaria, Spain
- *Correspondence: Carlos Flores,
| |
|
20
|
Koczwara KE, Lake NJ, DeSimone AM, Lek M. Neuromuscular disorders: finding the missing genetic diagnoses. Trends Genet 2022; 38:956-971. [PMID: 35908999 DOI: 10.1016/j.tig.2022.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 07/01/2022] [Accepted: 07/04/2022] [Indexed: 11/24/2022]
Abstract
Neuromuscular disorders (NMDs) are a wide-ranging group of diseases that seriously affect the quality of life of affected individuals. The development of next-generation sequencing revolutionized the diagnosis of NMD, enabling the discovery of hundreds of NMD genes and many more pathogenic variants. However, the diagnostic yield of genetic testing in NMD cohorts remains incomplete, indicating a large number of genetic diagnoses are not identified through current methods. Fortunately, recent advancements in sequencing technologies, analytical tools, and high-throughput functional screening provide an opportunity to circumvent current challenges. Here, we discuss reasons for missing genetic diagnoses in NMD, how emerging technologies and tools can overcome these hurdles, and examine future approaches to improving diagnostic yields in NMD.
Affiliation(s)
- Katherine E Koczwara
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Nicole J Lake
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Alec M DeSimone
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Monkol Lek
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA.
| |
|
21
|
New Developments and Possibilities in Reanalysis and Reinterpretation of Whole Exome Sequencing Datasets for Unsolved Rare Diseases Using Machine Learning Approaches. Int J Mol Sci 2022; 23:ijms23126792. [PMID: 35743235 PMCID: PMC9224427 DOI: 10.3390/ijms23126792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 11/21/2022] Open
Abstract
Rare diseases impact the lives of 300 million people in the world. Rapid advances in bioinformatics and genomic technologies have enabled the discovery of causes of 20–30% of rare diseases. However, most rare diseases have remained as unsolved enigmas to date. Newer tools and availability of high throughput sequencing data have enabled the reanalysis of previously undiagnosed patients. In this review, we have systematically compiled the latest developments in the discovery of the genetic causes of rare diseases using machine learning methods. Importantly, we have detailed methods available to reanalyze existing whole exome sequencing data of unsolved rare diseases. We have identified different reanalysis methodologies to solve problems associated with sequence alterations/mutations, variation re-annotation, protein stability, splice isoform malfunctions and oligogenic analysis. In addition, we give an overview of new developments in the field of rare disease research using whole genome sequencing data and other omics.
|
22
|
Spielmann M, Kircher M. Computational and experimental methods for classifying variants of unknown clinical significance. Cold Spring Harb Mol Case Stud 2022; 8:mcs.a006196. [PMID: 35483875 PMCID: PMC9059783 DOI: 10.1101/mcs.a006196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022] Open
Abstract
The increase in sequencing capacity, reduction in costs, and national and international coordinated efforts have led to the widespread introduction of next-generation sequencing (NGS) technologies in patient care. More generally, human genetics and genomic medicine are gaining importance for more and more patients. Some communities are already discussing the prospect of sequencing each individual's genome at the time of birth. Together with digital health records, this is expected to enable individualized treatments and preventive measures, so-called precision medicine. A central step in this process is the identification of disease-causing mutations or variant combinations that make us more susceptible to diseases. Although various technological advances have improved the identification of genetic alterations, the interpretation and ranking of the identified variants remain a major challenge. Based on our knowledge of molecular processes or previously identified disease variants, we can identify potentially functional genetic variants and, using different lines of evidence, we are sometimes able to demonstrate their pathogenicity directly. However, the vast majority of variants are classified as variants of uncertain clinical significance (VUSs), with not enough experimental evidence to determine their pathogenicity. In these cases, computational methods may be used to improve the prioritization, and an increasing toolbox of experimental methods is emerging that can be used to assay the molecular effects of VUSs. Here, we discuss how computational and experimental methods can be used to create catalogs of variant effects for a variety of molecular and cellular phenotypes. We discuss the prospects of integrating large-scale functional data with machine learning and clinical knowledge for the development of accurate pathogenicity predictions for clinical applications.
Affiliation(s)
- Malte Spielmann
- Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany; Institute of Human Genetics, Christian-Albrechts-Universität, 24105 Kiel, Germany; Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
| | - Martin Kircher
- Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany; Berlin Institute of Health at Charité—Universitätsmedizin Berlin, 10117 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Berlin, 10115 Berlin, Germany
| |
|
23
|
A framework to score the effects of structural variants in health and disease. Genome Res 2022; 32:766-777. [PMID: 35197310 PMCID: PMC8997355 DOI: 10.1101/gr.275995.121] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/13/2021] [Accepted: 02/22/2022] [Indexed: 11/25/2022]
Abstract
While technological advances improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a comprehensive tool using the broad spectrum of available annotations is missing. Here, we describe CADD-SV, a method to retrieve and integrate a wide set of annotations to predict the effects of SVs. Previously, supervised learning approaches were limited due to a small number and biased set of annotated pathogenic or benign SVs. We overcome this problem by using a surrogate training-objective, the Combined Annotation Dependent Depletion (CADD) of functional variants. We use human and chimpanzee derived SVs as proxy-neutral and contrast them with matched simulated variants as proxy-deleterious, an approach that has proven powerful for short sequence variants. Our tool computes summary statistics over diverse variant annotations and uses random forest models to prioritize deleterious structural variants. The resulting CADD-SV scores correlate with known pathogenic and rare population variants. We further show that we can prioritize somatic cancer variants as well as noncoding variants known to affect gene expression. We provide a website and offline-scoring tool for easy application of CADD-SV.
|