1. Performance evaluation of six popular short-read simulators. Heredity (Edinb) 2023; 130:55-63. [PMID: 36496447 PMCID: PMC9905089 DOI: 10.1038/s41437-022-00577-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 12/14/2022] Open
Abstract
High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design; yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.

2. Caporali L, Fiorini C, Palombo F, Romagnoli M, Baccari F, Zenesini C, Visconti P, Posar A, Scaduto MC, Ormanbekova D, Battaglia A, Tancredi R, Cameli C, Viggiano M, Olivieri A, Torroni A, Maestrini E, Rochat MJ, Bacchelli E, Carelli V, Maresca A. Dissecting the multifaceted contribution of the mitochondrial genome to autism spectrum disorder. Front Genet 2022; 13:953762. [PMID: 36419830 PMCID: PMC9676943 DOI: 10.3389/fgene.2022.953762] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Accepted: 10/12/2022] [Indexed: 11/15/2023] Open
Abstract
Autism spectrum disorder (ASD) is a clinically heterogeneous class of neurodevelopmental conditions with a strong, albeit complex, genetic basis. The genetic architecture of ASD includes different genetic models, from monogenic transmission at one end, to polygenic risk given by thousands of common variants with small effects at the other end. The mitochondrial DNA (mtDNA) has also been proposed as a genetic modifier for ASD, mostly focusing on maternal mtDNA, since the paternal mitogenome is not transmitted to offspring. We extensively studied the potential contribution of mtDNA to ASD pathogenesis and risk through deep next generation sequencing and quantitative PCR in a cohort of 98 families. While the maternally inherited mtDNA did not seem to predispose to ASD, neither for haplogroups nor for the presence of pathogenic mutations, an unexpected influence of paternal mtDNA, apparently centered on haplogroup U, came from the Italian families extrapolated from the test cohort (n = 74) when compared to the control population. However, this result was not replicated in an independent Italian cohort of 127 families and it is likely due to the elevated paternal age at time of conception. In addition, ASD probands showed a reduced mtDNA content when compared to their unaffected siblings. Multivariable regression analyses indicated that variants with 15%-5% heteroplasmy in probands are associated with a greater severity of ASD based on ADOS-2 criteria, whereas paternal super-haplogroups H and JT were associated with milder phenotypes. In conclusion, our results suggest that mtDNA affects ASD, significantly modifying the phenotypic expression in the Italian population. The unexpected finding of protection induced by the paternal mitogenome in terms of severity may derive from a role of mtDNA in influencing the accumulation of nuclear de novo mutations or epigenetic alterations in fathers' germinal cells, affecting neurodevelopment in the offspring. This result remains preliminary and needs further confirmation in independent cohorts of larger size. If confirmed, it potentially opens a different perspective on how paternal non-inherited mtDNA may predispose to or modulate other complex diseases.
Affiliation(s)
- Leonardo Caporali
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy
- Claudio Fiorini
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy
- Flavia Palombo
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy
- Martina Romagnoli
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy
- Flavia Baccari
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, UOSI Epidemiologia e Statistica, Bologna, Italy
- Corrado Zenesini
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, UOSI Epidemiologia e Statistica, Bologna, Italy
- Paola Visconti
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, UOSI Disturbi dello Spettro Autistico, Bologna, Italy
- Annio Posar
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, UOSI Disturbi dello Spettro Autistico, Bologna, Italy
  - Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Maria Cristina Scaduto
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, UOSI Disturbi dello Spettro Autistico, Bologna, Italy
- Danara Ormanbekova
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy
- Agatino Battaglia
  - IRCCS Stella Maris Foundation, Department of Developmental Neuroscience, Pisa, Italy
- Raffaella Tancredi
  - IRCCS Stella Maris Foundation, Department of Developmental Neuroscience, Pisa, Italy
- Cinzia Cameli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Marta Viggiano
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Anna Olivieri
  - Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia, Pavia, Italy
- Antonio Torroni
  - Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia, Pavia, Italy
- Elena Maestrini
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Magali Jane Rochat
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma Diagnostica Funzionale Neuroradiologica, Bologna, Italy
- Elena Bacchelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Valerio Carelli
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy
  - Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Alessandra Maresca
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, Italy

3. Markello C, Huang C, Rodriguez A, Carroll A, Chang PC, Eizenga J, Markello T, Haussler D, Paten B. A complete pedigree-based graph workflow for rare candidate variant analysis. Genome Res 2022; 32:893-903. [PMID: 35483961 PMCID: PMC9104704 DOI: 10.1101/gr.276387.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 03/24/2022] [Indexed: 11/24/2022]
Abstract
Methods that use a linear genome reference for genome sequencing data analysis are reference-biased. In the field of clinical genetics for rare diseases, a resulting reduction in genotyping accuracy in some regions has likely prevented the resolution of some cases. Pangenome graphs embed population variation into a reference structure. Although pangenome graphs have helped to reduce reference mapping bias, further performance improvements are possible. We introduce VG-Pedigree, a pedigree-aware workflow based on the pangenome-mapping tool Giraffe and the variant calling tool DeepTrio using a specially trained model for Giraffe-based alignments. We demonstrate mapping and variant calling improvements in both single-nucleotide variants (SNVs) and insertion and deletion (indel) variants over those produced by alignments created using BWA-MEM to a linear reference and Giraffe mapping to a pangenome graph containing data from the 1000 Genomes Project. We have also adapted and upgraded deleterious-variant (DV) detecting methods and programs into a streamlined workflow. We used these workflows in combination to detect small lists of candidate DVs among 15 family quartets and quintets of the Undiagnosed Diseases Program (UDP). All candidate DVs that were previously diagnosed using the Mendelian models covered by the previously published methods were recapitulated by these workflows. The results of these experiments indicate that a slightly greater absolute count of DVs is detected in the proband population than in their matched unaffected siblings.
Affiliation(s)
- Charles Markello
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
- Charles Huang
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Alex Rodriguez
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Andrew Carroll
  - Google Incorporated, Mountain View, California 94043, USA
- Pi-Chuan Chang
  - Google Incorporated, Mountain View, California 94043, USA
- Jordan Eizenga
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
- Thomas Markello
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA

4. Lähnemann D, Köster J, Fischer U, Borkhardt A, McHardy AC, Schönhuth A. Accurate and scalable variant calling from single cell DNA sequencing data with ProSolo. Nat Commun 2021; 12:6744. [PMID: 34795237 PMCID: PMC8602313 DOI: 10.1038/s41467-021-26938-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/26/2020] [Accepted: 10/22/2021] [Indexed: 01/14/2023] Open
Abstract
Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violates assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves a higher accuracy in calling and genotyping single nucleotide variants in single cells in comparison to state-of-the-art tools and supports imputation of insufficiently covered genotypes, when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo.
Affiliation(s)
- David Lähnemann
  - Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Algorithmic Bioinformatics, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
  - Department of Paediatric Oncology, Haematology and Immunology, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
  - Algorithms for Reproducible Bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, 45147 Essen, Germany
- Johannes Köster
  - Algorithms for Reproducible Bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, 45147 Essen, Germany
  - Genome Data Science, Life Sciences Group, Centrum Wiskunde & Informatica, 1098 XG Amsterdam, The Netherlands
- Ute Fischer
  - Department of Paediatric Oncology, Haematology and Immunology, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Arndt Borkhardt
  - Department of Paediatric Oncology, Haematology and Immunology, University Hospital, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Alice C. McHardy
  - Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Algorithmic Bioinformatics, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Alexander Schönhuth
  - Genome Data Science, Life Sciences Group, Centrum Wiskunde & Informatica, 1098 XG Amsterdam, The Netherlands
  - Genome Data Science, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany

5. Rajabli F, Feliciano-Astacio BE, Cukier HN, Wang L, Griswold AJ, Hamilton-Nelson KL, Adams LD, Rodriguez VC, Mena PR, Tejada S, Celis K, Whitehead PL, Van Booven DJ, Hofmann NK, Bussies PL, Prough M, Chinea A, Feliciano NI, Vardarajan BN, Reitz C, Lee JH, Prince MJ, Jimenez IZ, Mayeux RP, Acosta H, Dalgard CL, Haines JL, Vance JM, Cuccaro ML, Beecham GW, Pericak-Vance MA. Linkage of Alzheimer disease families with Puerto Rican ancestry identifies a chromosome 9 locus. Neurobiol Aging 2021; 104:115.e1-115.e7. [PMID: 33902942 DOI: 10.1016/j.neurobiolaging.2021.02.019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/25/2020] [Revised: 02/08/2021] [Accepted: 02/23/2021] [Indexed: 12/30/2022]
Abstract
The genetic admixture of Caribbean Hispanics provides an opportunity to discover novel genetic factors in Alzheimer disease (AD). We sought to identify genetic variants for AD through a family-based design using the Puerto Rican (PR) Alzheimer Disease Initiative (PRADI). Whole-genome sequencing (WGS) and parametric linkage analysis were performed for 100 individuals from 23 multiplex PRADI families. Variants were prioritized by minor allele frequency (<0.01), functional potential [combined annotation dependent depletion score (CADD) >10], and co-segregation with AD. Variants were further ranked using an independent PR case-control WGS dataset (PR10/66). A genome-wide significant linkage peak was found in 9p21 with a heterogeneity logarithm of the odds score (HLOD) >5.1, which overlaps with an AD linkage region from two published independent studies. The region harbors C9orf72, but no expanded repeats were observed in the families. Seven variants prioritized by the PRADI families also displayed evidence for association in the PR10/66 (p < 0.05), including a missense variant in UNC13B. Our study demonstrated the importance of family-based design and WGS in genetic study of AD.
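The prioritization criteria in this abstract amount to a simple conjunctive filter. The following is a minimal sketch of that filter, not the authors' pipeline: the `Variant` record, its field names, and the `prioritize` helper are hypothetical, and only the two thresholds (minor allele frequency < 0.01, CADD > 10) and the co-segregation requirement are taken from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds quoted in the abstract.
MAF_MAX = 0.01
CADD_MIN = 10.0


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative assumptions."""
    variant_id: str
    minor_allele_freq: float
    cadd_score: float
    cosegregates_with_ad: bool


def prioritize(variants: Iterable[Variant]) -> List[Variant]:
    """Keep variants that pass all three criteria, preserving input order."""
    return [
        v for v in variants
        if v.minor_allele_freq < MAF_MAX
        and v.cadd_score > CADD_MIN
        and v.cosegregates_with_ad
    ]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        Variant("var_rare_damaging", 0.002, 24.1, True),
        Variant("var_common", 0.150, 24.1, True),
        Variant("var_low_cadd", 0.002, 3.2, True),
        Variant("var_no_cosegregation", 0.002, 24.1, False),
    ]
    for v in prioritize(example):
        print(v.variant_id)
```

The subsequent ranking against the independent PR10/66 case-control dataset is a separate step and is not modeled here.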
Affiliation(s)
- Farid Rajabli
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Holly N Cukier
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Liyong Wang
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Anthony J Griswold
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Kara L Hamilton-Nelson
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Larry D Adams
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Vanessa C Rodriguez
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Pedro R Mena
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Sergio Tejada
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Katrina Celis
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Patrice L Whitehead
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Derek J Van Booven
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Natalia K Hofmann
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Parker L Bussies
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Michael Prough
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Angel Chinea
  - Universidad Central del Caribe, Bayamón, PR, USA
- Nereida I Feliciano
  - Hospital De Psiquiatría Estatal Dr. Ramón Fernández Marina-Centro Médico, San Juan, PR, USA
- Badri N Vardarajan
  - Departments of Neurology, Psychiatry, and Epidemiology, Gertrude H. Sergievsky Center, Taub Institute for Research on the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Christiane Reitz
  - Departments of Neurology, Psychiatry, and Epidemiology, Gertrude H. Sergievsky Center, Taub Institute for Research on the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Joseph H Lee
  - Departments of Neurology, Psychiatry, and Epidemiology, Gertrude H. Sergievsky Center, Taub Institute for Research on the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Martin J Prince
  - Department of Epidemiological Psychiatry, Centre for Public Mental Health, Institute of Psychiatry, King's College, London, UK
- Richard P Mayeux
  - Departments of Neurology, Psychiatry, and Epidemiology, Gertrude H. Sergievsky Center, Taub Institute for Research on the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Clifton L Dalgard
  - Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Jonathan L Haines
  - Department of Population & Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Jeffery M Vance
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Michael L Cuccaro
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Gary W Beecham
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Margaret A Pericak-Vance
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Neurology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA

6. Venkataraman GR, Rivas MA. Rare and common variant discovery in complex disease: the IBD case study. Hum Mol Genet 2020; 28:R162-R169. [PMID: 31363759 DOI: 10.1093/hmg/ddz189] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/19/2019] [Revised: 07/24/2019] [Accepted: 07/25/2019] [Indexed: 12/15/2022] Open
Abstract
Complex diseases such as inflammatory bowel disease (IBD), which consists of ulcerative colitis and Crohn's disease, are a significant medical burden: 70 000 new cases of IBD are diagnosed in the United States annually. In this review, we examine the history of genetic variant discovery in complex disease with a focus on IBD. We cover methods that have been applied to microsatellite, common variant, targeted resequencing and whole-exome and -genome data, specifically focusing on the progression of technologies towards rare-variant discovery. The inception of these methods combined with better availability of population level variation data has led to rapid discovery of IBD-causative and/or -associated variants at over 200 loci; over time, these methods have grown exponentially in both power and ascertainment to detect rare variation. We highlight rare-variant discoveries critical to the elucidation of the pathogenesis of IBD, including those in NOD2, IL23R, CARD9, RNF186 and ADCY7. We additionally identify the major areas of rare-variant discovery that will evolve in the coming years. A better understanding of the genetic basis of IBD and other complex diseases will lead to improved diagnosis, prognosis, treatment and surveillance.
Affiliation(s)
- Guhan R Venkataraman
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
- Manuel A Rivas
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA

7. Mohanty AK, Vuzman D, Francioli L, Cassa C, Toth-Petroczy A, Sunyaev S. novoCaller: a Bayesian network approach for de novo variant calling from pedigree and population sequence data. Bioinformatics 2020; 35:1174-1180. [PMID: 30169785 PMCID: PMC6449753 DOI: 10.1093/bioinformatics/bty749] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/22/2018] [Revised: 06/19/2018] [Accepted: 08/29/2018] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION: De novo mutations (i.e. newly occurring mutations) are a predominant cause of sporadic dominant monogenic diseases and play a significant role in the genetics of complex disorders. De novo mutation studies also inform population genetics models and shed light on the biology of DNA replication and repair. Despite the broad interest, there is room for improvement with regard to the accuracy of de novo mutation calling. RESULTS: We designed novoCaller, a Bayesian variant calling algorithm that uses information from read-level data both in the pedigree and in unrelated samples. The method was extensively tested using large trio-sequencing studies, and it consistently achieved over 97% sensitivity. We applied the algorithm to 48 trio cases of suspected rare Mendelian disorders as part of the Brigham Genomic Medicine gene discovery initiative. Its application resulted in a significant reduction in the resources required for manual inspection and experimental validation of the calls. Three de novo variants were found in known genes associated with rare disorders, leading to rapid genetic diagnosis of the probands. Another 14 variants were found in genes that are likely to explain the phenotype, and could lead to novel disease-gene discovery. AVAILABILITY AND IMPLEMENTATION: Source code implemented in C++ and Python can be downloaded from https://github.com/bgm-cwg/novoCaller. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
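For orientation only: at its simplest, a de novo candidate is an allele seen in the child and in neither parent. The sketch below implements that naive genotype-level rule. It is not novoCaller's Bayesian network, which according to the abstract works from read-level data in the pedigree and in unrelated samples; the function name and the genotype encoding (count of alternate alleles, 0 to 2, or `None` for a missing call) are assumptions made for illustration.

```python
from typing import Optional

# Assumed encoding: number of alternate alleles (0, 1 or 2); None = no call.
Genotype = Optional[int]


def is_naive_de_novo_candidate(child: Genotype,
                               mother: Genotype,
                               father: Genotype) -> bool:
    """Naive trio rule: child carries the alternate allele, both parents do not.

    Sites with any missing call are rejected rather than guessed at.
    """
    if child is None or mother is None or father is None:
        return False
    return child > 0 and mother == 0 and father == 0


if __name__ == "__main__":
    print(is_naive_de_novo_candidate(1, 0, 0))     # child het, parents hom-ref
    print(is_naive_de_novo_candidate(1, 1, 0))     # allele inherited from mother
    print(is_naive_de_novo_candidate(1, None, 0))  # missing maternal call
```

A hard-genotype rule like this tends to produce false positives whenever a parental allele is missed, for example at low coverage, which is one reason a probabilistic model over read-level evidence may be preferred.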
Affiliation(s)
- Anwoy Kumar Mohanty
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Dana Vuzman
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Laurent Francioli
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Christopher Cassa
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Agnes Toth-Petroczy
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

8. Yun Y, Hong SA, Kim KK, Baek D, Lee D, Londhe AM, Lee M, Yu J, McEachin ZT, Bassell GJ, Bowser R, Hales CM, Cho SR, Kim J, Pae AN, Cheong E, Kim S, Boulis NM, Bae S, Ha Y. CRISPR-mediated gene correction links the ATP7A M1311V mutations with amyotrophic lateral sclerosis pathogenesis in one individual. Commun Biol 2020; 3:33. [PMID: 31959876 PMCID: PMC6970999 DOI: 10.1038/s42003-020-0755-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/01/2019] [Accepted: 12/17/2019] [Indexed: 12/11/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a severe disease causing motor neuron death, but a complete cure has not been developed and related genes have not been defined in more than 80% of cases. Here we compared whole genome sequencing results from a male ALS patient and his healthy parents to identify relevant variants, and chose one variant in the X-linked ATP7A gene, M1311V, as a strong disease-linked candidate after in-depth examination. Although this variant is not rare in the Ashkenazi Jewish population according to the Genome Aggregation Database (gnomAD), CRISPR-mediated gene correction of this mutation in patient-derived and re-differentiated motor neurons drastically rescued neuronal activities and functions. These results suggest that the ATP7A M1311V mutation may contribute to ALS in this patient and might be a potential therapeutic target, revealed here by a personalized medicine strategy.
Collapse
Affiliation(s)
- Yeomin Yun
- Department of Neurosurgery, Spine & Spinal Cord Institute, College of Medicine, Yonsei University, Seoul, 03722, South Korea
- Brain Korea 21 PLUS Project for Medical Science, College of Medicine, Yonsei University, Seoul, 03722, South Korea
| | - Sung-Ah Hong
- Department of Chemistry, Hanyang University, Seoul, 04763, South Korea
- Research Institute for Natural Sciences, Hanyang University, Seoul, 04763, South Korea
| | - Ka-Kyung Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, South Korea
| | - Daye Baek
- Department of Neurosurgery, Spine & Spinal Cord Institute, College of Medicine, Yonsei University, Seoul, 03722, South Korea
- Brain Korea 21 PLUS Project for Medical Science, College of Medicine, Yonsei University, Seoul, 03722, South Korea
| | - Dongsu Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, 03722, South Korea
| | - Ashwini M Londhe
- Convergence Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology, PO Box 131, Cheongryang, Seoul, 130-650, South Korea
- Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology, Seoul, 02792, South Korea
| | - Minhyung Lee
- Stem Cell Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, South Korea
- Department of Functional Genomics, KRIBB School of Bioscience, Korea University of Science and Technology, Daejeon, 34113, South Korea
| | - Jihyeon Yu
- Department of Chemistry, Hanyang University, Seoul, 04763, South Korea
| | - Zachary T McEachin
- Laboratory of Translational Cell Biology, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Gary J Bassell
- Laboratory of Translational Cell Biology, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Cell Biology, Emory University, Atlanta, GA, 30322, USA
| | - Robert Bowser
- Department of Neurobiology, Barrow Neurological Institute and St. Joseph's Hospital and Medical Center, Phoenix, AZ, 85013, USA
| | - Chadwick M Hales
- Department of Neurology, Emory University, Atlanta, GA, 30322, USA
| | - Sung-Rae Cho
- Brain Korea 21 PLUS Project for Medical Science, College of Medicine, Yonsei University, Seoul, 03722, South Korea
- Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, 03722, South Korea
| | - Janghwan Kim
- Stem Cell Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, South Korea
- Department of Functional Genomics, KRIBB School of Bioscience, Korea University of Science and Technology, Daejeon, 34113, South Korea
| | - Ae Nim Pae
- Convergence Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology, PO Box 131, Cheongryang, Seoul, 130-650, South Korea
- Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology, Seoul, 02792, South Korea
| | - Eunji Cheong
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, 03722, South Korea
| | - Sangwoo Kim
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, South Korea
| | - Nicholas M Boulis
- Department of Neurosurgery, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Sangsu Bae
- Department of Chemistry, Hanyang University, Seoul, 04763, South Korea.
- Research Institute for Natural Sciences, Hanyang University, Seoul, 04763, South Korea.
| | - Yoon Ha
- Department of Neurosurgery, Spine & Spinal Cord Institute, College of Medicine, Yonsei University, Seoul, 03722, South Korea.
- Brain Korea 21 PLUS Project for Medical Science, College of Medicine, Yonsei University, Seoul, 03722, South Korea.
|
9
|
Kottyan LC, Parameswaran S, Weirauch MT, Rothenberg ME, Martin LJ. The genetic etiology of eosinophilic esophagitis. J Allergy Clin Immunol 2020; 145:9-15. [PMID: 31910986 PMCID: PMC6984394 DOI: 10.1016/j.jaci.2019.11.013] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 09/27/2019] [Revised: 11/15/2019] [Accepted: 11/15/2019] [Indexed: 12/13/2022]
Abstract
Eosinophilic esophagitis (EoE) is a chronic allergic disease associated with marked mucosal eosinophil accumulation. Multiple studies have reported a strong familial component to EoE, with the presence of EoE in one family member increasing the risk of EoE in other family members. Epidemiologic studies support an important role for environmental risk factors as modulators of genetic risk. In a small percentage of cases, including patients who have Mendelian diseases with co-occurrent EoE, rare genetic variation with large effect sizes could mediate EoE and explain multigenerational incidence in families. Common genetic risk variants mediate genetic risk for the majority of patients with EoE. Across the 31 reported independent EoE risk loci (P < 10⁻⁵), most of the EoE risk variants are located between genes (36.7%) or within the introns of genes (42.4%). Although some variants do change the amino acid sequence of genes (2.2%), only 3 of the 31 EoE risk loci harbor an amino acid-changing variant. Thus most EoE risk loci are outside of the coding regions of genes, suggesting a key role for gene regulation in patients with EoE, which is consistent with most other complex diseases.
Affiliation(s)
- Leah C Kottyan
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
| | - Sreeja Parameswaran
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Matthew T Weirauch
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Marc E Rothenberg
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Lisa J Martin
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
|
10
|
Kómár P, Kural D. geck: trio-based comparative benchmarking of variant calls. Bioinformatics 2019; 34:3488-3495. [PMID: 29850774 PMCID: PMC6184596 DOI: 10.1093/bioinformatics/bty415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/03/2017] [Accepted: 05/22/2018] [Indexed: 12/30/2022] Open
Abstract
Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations. Results: We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty. Availability and implementation: The Python library geck, and usage examples are available at the following URL: https://github.com/sbg/geck, under the GNU General Public License v3. Supplementary information: Supplementary data are available at Bioinformatics online.
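The Mendelian constraint that this kind of benchmarking relies on can be illustrated with a minimal sketch. This is not the geck mixture model or its API; the function names and the toy genotype calls below are hypothetical, and the sketch assumes diploid autosomal sites with genotypes written as allele pairs.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking one
    allele from each parent (diploid, autosomal site assumed)."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((pa, ma))) == child_sorted
        for pa, ma in product(father, mother)
    )


def violation_rate(trios):
    """Fraction of trio genotype calls that violate Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        return 0.0
    violations = sum(
        not is_mendelian_consistent(f, m, c) for f, m, c in trios
    )
    return violations / len(trios)


# Hypothetical calls: (father, mother, child), alleles coded 0 = ref, 1 = alt.
calls = [
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # violation (genotyping error or de novo event)
    ((1, 1), (0, 1), (1, 1)),  # consistent
    ((1, 1), (1, 1), (0, 0)),  # violation
]
rate = violation_rate(calls)
print(rate)
```

A raw violation rate like this cannot tell which trio member was miscalled, which is one reason a statistical model over the joint trio genotypes is needed to turn violations into precision and recall estimates.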
|
11
|
Müller H, Jimenez-Heredia R, Krolo A, Hirschmugl T, Dmytrus J, Boztug K, Bock C. VCF.Filter: interactive prioritization of disease-linked genetic variants from sequencing data. Nucleic Acids Res 2019; 45:W567-W572. [PMID: 28520890 PMCID: PMC5570181 DOI: 10.1093/nar/gkx425] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/28/2017] [Accepted: 05/04/2017] [Indexed: 02/07/2023] Open
Abstract
Next generation sequencing is widely used to link genetic variants to diseases, and it has massively accelerated the diagnosis and characterization of rare genetic diseases. After initial bioinformatic data processing, the interactive analysis of genome, exome, and panel sequencing data typically starts from lists of genetic variants in VCF format. Medical geneticists filter and annotate these lists to identify variants that may be relevant for the disease under investigation, or to select variants that are reported in a clinical diagnostics setting. We developed VCF.Filter to facilitate the search for disease-linked variants, providing a standalone Java program with a user-friendly interface for interactive variant filtering and annotation. VCF.Filter allows the user to define a broad range of filtering criteria through a graphical interface. Common workflows such as trio analysis and cohort-based filtering are pre-configured, and more complex analyses can be performed using VCF.Filter's support for custom annotations and filtering criteria. All filtering is documented in the results file, thus providing traceability of the interactive variant prioritization. VCF.Filter is an open source tool that is freely and openly available at http://vcffilter.rarediseases.at.
Affiliation(s)
- Heiko Müller
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria.,Fondazione Istituto Italiano di Tecnologia, 16163 Genoa, Italy
| | - Raul Jimenez-Heredia
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
| | - Ana Krolo
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
| | - Tatjana Hirschmugl
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
| | - Jasmin Dmytrus
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria
| | - Kaan Boztug
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria.,CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria.,Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, 1090 Vienna, Austria.,St. Anna Kinderspital and Children's Cancer Research Institute, Department of Pediatrics, Medical University of Vienna, 1090 Vienna, Austria
| | - Christoph Bock
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, 1090 Vienna, Austria.,CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria.,Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria.,Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
|
12
|
Liang Y, He L, Zhao Y, Hao Y, Zhou Y, Li M, Li C, Pu X, Wen Z. Comparative Analysis for the Performance of Variant Calling Pipelines on Detecting the de novo Mutations in Humans. Front Pharmacol 2019; 10:358. [PMID: 31105557 PMCID: PMC6499170 DOI: 10.3389/fphar.2019.00358] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/11/2018] [Accepted: 03/21/2019] [Indexed: 01/22/2023] Open
Abstract
Despite their low rate of occurrence across the genome, de novo mutations have been shown to be deleterious and can lead to severe genetic diseases by disrupting gene function. Because traditional family-based linkage approaches and genome-wide association studies are unsuitable for identifying de novo mutations, several pipelines have been proposed in recent years to detect them from whole-genome or whole-exome sequencing data, and these have been used to call de novo mutations in rare diseases. However, the performance of these variant calling pipelines in detecting de novo mutations remains unexplored. To facilitate an appropriate choice of pipeline and to reduce the false positive rate, we thoroughly evaluated the performance of commonly used trio calling methods in detecting de novo single-nucleotide variants (DNSNVs) by comparing their calling results. Our results showed that different pipelines each tend to detect DNSNVs in genomic regions with different GC contents. Additionally, our proposed filter for refining the calling results of a single pipeline achieved satisfactory results, indicating that read coverage at the mutation positions can be used as an effective index for identifying high-confidence DNSNVs. Our findings should help the community choose an appropriate way to explore de novo mutations in rare diseases.
Affiliation(s)
- Yu Liang
- College of Chemistry, Sichuan University, Chengdu, China
| | - Li He
- Biogas Appliance Quality Supervision and Inspection Center, Biogas Institute of Ministry of Agriculture, Chengdu, China
| | - Yiru Zhao
- College of Computer Science, Sichuan University, Chengdu, China
| | - Yinyi Hao
- College of Chemistry, Sichuan University, Chengdu, China
| | - Yifan Zhou
- College of Chemistry, Sichuan University, Chengdu, China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, China
| | - Chuan Li
- College of Computer Science, Sichuan University, Chengdu, China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, China
| | - Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, China
|
13
|
Mortensen Ó, Lydersen LN, Apol KD, Andorsdóttir G, Steig BÁ, Gregersen NO. Using dried blood spot samples from a trio for linked-read whole-exome sequencing. Eur J Hum Genet 2019; 27:980-988. [PMID: 30765883 PMCID: PMC6777531 DOI: 10.1038/s41431-019-0343-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/14/2018] [Revised: 12/21/2018] [Accepted: 01/05/2019] [Indexed: 01/22/2023] Open
Abstract
Long-term collection of dried blood spot (DBS) samples through newborn screening may have retrospective and prospective advantages, especially in combination with advanced analytical techniques. This work concerns whether linked-reads may overcome some of the limitations of short-read sequencing of DBS samples, such as performing molecular phasing. We performed whole-exome sequencing of DNA extracted from DBS and corresponding whole blood (WB) reference samples, belonging to a trio with unaffected parents and a proband affected by primary carnitine deficiency (PCD). For the DBS samples we were able to phase >21% of the genes under 100 kb, >40% of the SNPs, and the longest phase block was >72 kb. Corresponding results for the WB reference samples were >85%, >75%, and >915 kb, respectively. Concerning the PCD-causing variant (rs72552725:A > G) in the SLC22A5 gene, we observed full genotype concordance between DBS and WB for all three samples. Furthermore, we were able to phase all variants within the SLC22A5 gene in the proband’s WB data, which shows that linked-read sequencing may replace the trio information for haplotype detection. However, due to smaller molecular lengths in the DBS data, only small phase blocks were observed in the proband’s DBS sample. Therefore, further optimisation of the DBS workflow is needed in order to explore the full potential of DBS samples as a test bed for molecular phasing.
Affiliation(s)
- Ólavur Mortensen
- FarGen, The Genetic Biobank of the Faroe Islands, Tórshavn, Faroe Islands
| | - Bjarni Á Steig
- General Medical Department, National Hospital of the Faroe Islands, Tórshavn, Faroe Islands
|
14
|
Zhou X, Batzoglou S, Sidow A, Zhang L. HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data. BMC Genomics 2018; 19:467. [PMID: 29914369 PMCID: PMC6006847 DOI: 10.1186/s12864-018-4867-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/17/2017] [Accepted: 06/13/2018] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo have been developed to detect DNMs, but at good sensitivity they still produce many false positive calls. RESULTS: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked read sequencing, to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to each of the two haplotypes followed by generating a haploid genotype for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes because they are almost certainly false positives. Our experiments on 10X Chromium linked read sequencing trio data reveal that HAPDeNovo eliminates 80 to 99% of false positives regardless of how large the candidate DNM set is. CONCLUSIONS: HAPDeNovo leverages the haplotype information from linked read sequencing to remove spurious false positive DNMs effectively, and it increases accuracy of DNM detection dramatically without sacrificing sensitivity.
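The filtering rule described in this abstract (discard a candidate DNM when either haplotype yields a heterozygous call, since a single haplotype should carry only one allele) can be sketched as follows. This is an illustrative toy, not HAPDeNovo's code or interface: the data layout, the "ref"/"alt"/"het" labels, and the function name are assumptions, and every candidate is assumed to lie inside a phase block.

```python
def filter_candidates(candidates, haploid_calls):
    """Keep candidate de novo sites whose per-haplotype genotype is a single
    allele on both haplotypes; drop sites called 'het' within a haplotype."""
    kept = []
    for site in candidates:
        hap1, hap2 = haploid_calls[site]
        if "het" in (hap1, hap2):
            continue  # two alleles on one haplotype: treated as an artifact
        kept.append(site)
    return kept


# Hypothetical candidate list and per-haplotype genotype calls.
candidates = [("chr1", 1000), ("chr1", 2500), ("chr2", 42)]
haploid_calls = {
    ("chr1", 1000): ("ref", "alt"),  # clean heterozygous DNM: one allele each
    ("chr1", 2500): ("het", "ref"),  # haplotype 1 shows both alleles
    ("chr2", 42): ("alt", "het"),    # haplotype 2 shows both alleles
}
kept = filter_candidates(candidates, haploid_calls)
print(kept)
```

The filter only removes calls, so in this sketch sensitivity is bounded by the upstream candidate list, which matches the abstract's framing of the tool as a post-processing step.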
Affiliation(s)
- Xin Zhou
- Department of Computer Science, Stanford University, Stanford, California, 94305, USA
| | - Serafim Batzoglou
- Department of Computer Science, Stanford University, Stanford, California, 94305, USA
| | - Arend Sidow
- Department of Pathology, Stanford University School of Medicine, Stanford, California, 94305, USA.,Department of Genetics, Stanford University School of Medicine, Stanford, California, 94305, USA
| | - Lu Zhang
- Department of Computer Science, Stanford University, Stanford, California, 94305, USA. .,Department of Pathology, Stanford University School of Medicine, Stanford, California, 94305, USA.
|
15
|
Abstract
The precise location of variants in the human genome is of utmost importance. We present a unique approach, coverage-based single nucleotide variant (SNV) identification (COBASI), which uses only perfect matches between the reads of a sequence project and a reference genome to detect and accurately identify de novo SNVs. From the perfect matches, a representation of the read coverage per nucleotide along the genome, the variation landscape, is generated. SNVs are then pinpointed as significant changes in coverage and de novo SNVs can be identified with high precision. The performance of COBASI was analyzed using simulations and experimentally validated by sequencing de novo SNVs identified from a parent–offspring trio. We propose this pipeline as a useful tool for different genomic applications.
The precise determination of de novo genetic variants has enormous implications across different fields of biology and medicine, particularly personalized medicine. Currently, de novo variations are identified by mapping sample reads from a parent–offspring trio to a reference genome, allowing for a certain degree of differences. While widely used, this approach often introduces false-positive (FP) results due to misaligned reads and mischaracterized sequencing errors. In a previous study, we developed an alternative approach to accurately identify single nucleotide variants (SNVs) using only perfect matches. However, this approach could be applied only to haploid regions of the genome and was computationally intensive. In this study, we present a unique approach, coverage-based single nucleotide variant identification (COBASI), which allows the exploration of the entire genome using second-generation short sequence reads without extensive computing requirements. COBASI identifies SNVs using changes in coverage of exactly matching unique substrings, and is particularly suited for pinpointing de novo SNVs.
Unlike other approaches that require population frequencies across hundreds of samples to filter out any methodological biases, COBASI can be applied to detect de novo SNVs within isolated families. We demonstrate this capability through extensive simulation studies and by studying a parent–offspring trio we sequenced using short reads. Experimental validation of all 58 candidate de novo SNVs and a selection of non-de novo SNVs found in the trio confirmed zero FP calls. COBASI is available as open source at https://github.com/Laura-Gomez/COBASI for any researcher to use.
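The idea of reading variants off a coverage profile built only from perfect matches can be shown with a toy example. This is a simplified sketch, not the COBASI pipeline: the sequences are made up, reads are error-free substrings of a haploid sample, every read-length substring of the toy reference is unique, and a drop to zero coverage stands in for the "significant change in coverage" that the real method tests for.

```python
def perfect_match_coverage(reference, reads):
    """Per-position coverage counting only reads that match the reference
    exactly (no mismatches or gaps allowed)."""
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        while start != -1:
            for i in range(start, start + len(read)):
                coverage[i] += 1
            start = reference.find(read, start + 1)
    return coverage


# Toy reference and a sample carrying one substitution at index 8 (T -> A).
reference = "ACGTACGGTCATTGCA"
sample = "ACGTACGGACATTGCA"
read_len = 5

# Error-free reads: every window of the sample sequence.
reads = [sample[i:i + read_len] for i in range(len(sample) - read_len + 1)]

coverage = perfect_match_coverage(reference, reads)

# Reads spanning the variant cannot match perfectly, so coverage collapses
# at the variant position while the flanking positions stay covered.
zero_positions = [i for i, depth in enumerate(coverage) if depth == 0]
print(coverage)
print(zero_positions)
```

In a diploid sample a heterozygous SNV would reduce coverage rather than remove it, which is why a real implementation needs a statistical threshold instead of a zero test.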
|
16
|
John J, Kukshal P, Bhatia T, Chowdari KV, Nimgaonkar VL, Deshpande SN, Thelma BK. Possible role of rare variants in Trace amine associated receptor 1 in schizophrenia. Schizophr Res 2017; 189:190-195. [PMID: 28242106 PMCID: PMC5569002 DOI: 10.1016/j.schres.2017.02.020] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 10/14/2016] [Revised: 02/15/2017] [Accepted: 02/16/2017] [Indexed: 10/20/2022]
Abstract
Schizophrenia (SZ) is a chronic mental illness with behavioral abnormalities. Recent common variant based genome wide association studies and rare variant detection using next generation sequencing approaches have identified numerous variants that confer risk for SZ, but etiology remains unclear, propelling continuing investigations. Using whole exome sequencing, we identified a rare heterozygous variant (c.545G>T; p.Cys182Phe) in the Trace amine associated receptor 1 gene (TAAR1, 6q23.2) in three affected members of a small SZ family. The variant, predicted to be damaging by 15 prediction tools, causes breakage of a conserved disulfide bond in this G-protein-coupled receptor. On screening this intronless gene for additional variant(s) in ~800 sporadic SZ patients, we identified six rare protein altering variants (MAF<0.001), namely p.Ser47Cys, p.Phe51Leu, p.Tyr294Ter, p.Leu295Ser in four unrelated north Indian cases (n=475); p.Ala109Thr and p.Val250Ala in two independent Caucasian/African-American patients (n=310). Five of these variants were also predicted to be damaging. Besides, a rare synonymous variant was observed in SZ patients. These rare variants were absent in north Indian healthy controls (n=410) but significantly enriched in patients (p=0.036). Conversely, three common coding SNPs (rs8192621, rs8192620 and rs8192619) and a promoter SNP (rs60266355) tested for association with SZ in the north Indian cohort were not significant (P>0.05). TAAR1 is a modulator of monoaminergic pathways and interacts with AKT signaling pathways. Substantial animal model based pharmacological and functional data implying its relevance in SZ are also available. However, this is the first report suggestive of the likely contribution of rare variants in this gene to SZ.
Affiliation(s)
- Jibin John
- Department of Genetics, University of Delhi South Campus, Benito Juarez Road, New Delhi 110 021, India
| | - Prachi Kukshal
- Department of Genetics, University of Delhi South Campus, Benito Juarez Road, New Delhi 110 021, India
| | - Triptish Bhatia
- Department of Psychiatry, PGIMER-Dr. RML Hospital, New Delhi 110 001, India
| | - K V Chowdari
- Department of Psychiatry, Western Psychiatric Institute and Clinic, University of Pittsburgh School of Medicine, 3811 O'Hara Street,Pittsburgh, PA 15213, USA
| | - V L Nimgaonkar
- Department of Psychiatry, Western Psychiatric Institute and Clinic, University of Pittsburgh School of Medicine, 3811 O'Hara Street,Pittsburgh, PA 15213, USA; Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, DeSoto St, Pittsburgh, PA 15213, USA
| | - S N Deshpande
- Department of Psychiatry, PGIMER-Dr. RML Hospital, New Delhi 110 001, India
| | - B K Thelma
- Department of Genetics, University of Delhi South Campus, Benito Juarez Road, New Delhi 110 021, India.
|
17
|
Wu SH, Schwartz RS, Winter DJ, Conrad DF, Cartwright RA. Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373 PMCID: PMC5860108 DOI: 10.1093/bioinformatics/btx133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/19/2016] [Revised: 01/22/2017] [Accepted: 03/07/2017] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION: Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. RESULTS: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was comprised of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT: cartwright@asu.edu. SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.
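For readers unfamiliar with the distribution named in this abstract, the sketch below evaluates the standard Dirichlet-multinomial log-probability of a vector of read counts and contrasts a tight component with an overdispersed one. It shows a single component only, not the authors' fitted mixture; the parameter values and the function name are illustrative assumptions.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of allele read counts under a Dirichlet-multinomial
    distribution with concentration parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    log_p = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        log_p += lgamma(x + al) - lgamma(x + 1) - lgamma(al)
    return log_p


# Hypothetical heterozygous site with a skewed 18:2 split of 20 reads.
skewed_counts = (18, 2)

# Large, balanced alpha behaves almost like a binomial with p = 0.5;
# small alpha spreads probability over unbalanced splits (overdispersion).
tight = exp(dirichlet_multinomial_logpmf(skewed_counts, (500.0, 500.0)))
overdispersed = exp(dirichlet_multinomial_logpmf(skewed_counts, (1.0, 1.0)))
print(tight, overdispersed)
```

The skewed site is far more probable under the overdispersed parameters, which is the kind of contrast a two-component mixture can use to separate well-behaved sites from problematic ones.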
Affiliation(s)
- Steven H Wu
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Rachel S Schwartz
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Department of Biological Sciences, The University of Rhode Island, Kingston, RI, USA
| | - David J Winter
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Donald F Conrad
- Department of Genetics, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, USA
| | - Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
18
|
Molecular genetic findings and clinical correlations in 100 patients with Joubert syndrome and related disorders prospectively evaluated at a single center. Genet Med 2017; 19:875-882. [PMID: 28125082 DOI: 10.1038/gim.2016.204] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 07/05/2016] [Accepted: 11/08/2016] [Indexed: 11/08/2022] Open
Abstract
PURPOSE: Joubert syndrome (JS) is a genetically and clinically heterogeneous ciliopathy characterized by distinct cerebellar and brainstem malformations resulting in the diagnostic "molar tooth sign" on brain imaging. To date, more than 30 JS genes have been identified, but these do not account for all patients. METHODS: In our cohort of 100 patients with JS from 86 families, we prospectively performed extensive clinical evaluation and provided molecular diagnosis using a targeted 27-gene Molecular Inversion Probes panel followed by whole-exome sequencing (WES). RESULTS: We identified the causative gene in 94% of the families; 126 (27 novel) unique potentially pathogenic variants were found in 20 genes, including KIAA0753 and CELSR2, which had not previously been associated with JS. Genotype-phenotype correlation revealed the absence of retinal degeneration in patients with TMEM67, C5orf52, or KIAA0586 variants. Chorioretinal coloboma was associated with a decreased risk for retinal degeneration and increased risk for liver disease. TMEM67 was frequently associated with kidney disease. CONCLUSION: In JS, WES significantly increases the yield for molecular diagnosis, which is essential for reproductive counseling and the option of preimplantation and prenatal diagnosis as well as medical management and prognostic counseling for the age-dependent and progressive organ-specific manifestations, including retinal, liver, and kidney disease. Genet Med advance online publication 26 January 2017.
|
19
|
Chang LC, Li B, Fang Z, Vrieze S, McGue M, Iacono WG, Tseng GC, Chen W. A computational method for genotype calling in family-based sequencing data. BMC Bioinformatics 2016; 17:37. [PMID: 26772743 PMCID: PMC4715317 DOI: 10.1186/s12859-016-0880-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/26/2015] [Accepted: 01/06/2016] [Indexed: 12/12/2022] Open
Abstract
Background: As sequencing technologies can help researchers detect common and rare variants across the human genome in many individuals, it is known that jointly calling genotypes across multiple individuals based on linkage disequilibrium (LD) can facilitate the analysis of low to modest coverage sequence data. However, genotype-calling methods for family-based sequence data, particularly for complex families beyond parent-offspring trios, are still lacking. Results: In this study, first, we proposed an algorithm that considers both linkage disequilibrium (LD) patterns and familial transmission in nuclear and multi-generational families while retaining the computational efficiency. Second, we extended our method to incorporate external reference panels to analyze family-based sequence data with a small sample size. In simulation studies, we show that modeling multiple offspring can dramatically increase genotype calling accuracy and reduce phasing and Mendelian errors, especially at low to modest coverage. In addition, we show that using external panels can greatly facilitate genotype calling of sequencing data with a small number of individuals. We applied our method to a whole genome sequencing study of 1339 individuals at ~10X coverage from the Minnesota Center for Twin and Family Research. Conclusions: The aggregated results show that our methods significantly outperform existing ones that ignore family constraints or LD information. We anticipate that our method will be useful for many ongoing family-based sequencing projects. We have implemented our methods efficiently in a C++ program FamLDCaller, which is available from http://www.pitt.edu/~wec47/famldcaller.html. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0880-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Lun-Ching Chang
- Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, 20892, USA.
| | - Bingshan Li
- Department of Molecular Physiology & Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
| | - Zhou Fang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
| | - Scott Vrieze
- Department of Psychology & Neuroscience, Institute for Behavioral Genetics, University of Colorado, Boulder, CO, 80309, USA.
| | - Matt McGue
- Department of Psychology, University of Minnesota, Minneapolis, MN, 55455, USA.
| | - William G Iacono
- Department of Psychology, University of Minnesota, Minneapolis, MN, 55455, USA.
| | - George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
| | - Wei Chen
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA. .,Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, PA, 15224, USA.
|
20
|
Liu Y, Liu J, Lu J, Peng J, Juan L, Zhu X, Li B, Wang Y. Joint detection of copy number variations in parent-offspring trios. Bioinformatics 2015; 32:1130-7. [PMID: 26644415 DOI: 10.1093/bioinformatics/btv707] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/19/2015] [Accepted: 11/27/2015] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Whole genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes via detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low-detection accuracy. Joint modeling approaches leveraging Mendelian transmission within the parent-offspring trio can be an efficient strategy to improve CNV detection accuracy. RESULTS In this study, we developed TrioCNV, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. Using negative binomial regression, we modeled the read depth signal while considering both GC content bias and mappability bias. Moreover, we incorporated the family relationship and used a hidden Markov model to jointly infer CNVs for three samples of a parent-offspring trio. Through application to both simulated data and a trio from 1000 Genomes Project, we showed that TrioCNV achieved superior performance than existing approaches. AVAILABILITY AND IMPLEMENTATION The software TrioCNV implemented using a combination of Java and R is freely available from the website at https://github.com/yongzhuang/TrioCNV CONTACT: ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jianguo Lu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Liran Juan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Xiaolin Zhu
- Institute for Genomic Medicine, Columbia University, New York, NY 10032; University Program in Genetics and Genomics, Duke University Medical School, Durham, NC 27708
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235 and Center for Quantitative Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
21
|
SeqMule: automated pipeline for analysis of human exome/genome sequencing data. Sci Rep 2015; 5:14283. [PMID: 26381817 PMCID: PMC4585643 DOI: 10.1038/srep14283] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 02/25/2015] [Accepted: 08/21/2015] [Indexed: 11/16/2022]
Abstract
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate consensus set with high confidence. SeqMule integrates 5 alignment tools, 5 variant calling algorithms and accepts various combinations all by one-line command, therefore allowing highly flexible yet fully automated variant calling. In a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers turn-key solution for deployment on Amazon Web Services, allows quality check, Mendelian error check, consistency evaluation, HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
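The abstract above uses Mendelian error rate as one measure of call-set quality. As a minimal sketch of that metric, and not SeqMule's own code, the Python below counts trio sites where the child's genotype cannot be formed from one paternal and one maternal allele. The function names and the tuple-of-alleles genotype encoding are assumptions made for illustration.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by taking
    one allele from the father and one allele from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def mendelian_error_rate(trio_calls):
    """Fraction of sites whose (father, mother, child) genotypes violate
    Mendelian inheritance. Returns 0.0 for an empty call list."""
    if not trio_calls:
        return 0.0
    errors = sum(
        not is_mendelian_consistent(f, m, c) for f, m, c in trio_calls
    )
    return errors / len(trio_calls)


# Hypothetical diploid calls: 0 = reference allele, 1 = alternate allele.
calls = [
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # inconsistent: calling error or de novo event
    ((1, 1), (0, 1), (1, 1)),  # consistent
    ((1, 1), (1, 1), (0, 1)),  # inconsistent
]
print(mendelian_error_rate(calls))  # 0.5
```

An inconsistent site is not necessarily a calling error, since true de novo mutations also violate the rule; the rate is useful mainly for comparing call sets produced from the same samples.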
|
22
|
Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 2015; 11:e1005271. [PMID: 26043085 PMCID: PMC4456389 DOI: 10.1371/journal.pgen.1005271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/28/2014] [Accepted: 05/12/2015] [Indexed: 12/23/2022]
Abstract
Sequencing family DNA samples provides an attractive alternative to population based designs to identify rare variants associated with human disease due to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.

To identify disease variants that occur less frequently in the population, sequencing families in which multiple individuals are affected is more powerful due to the enrichment of causal variants. An important step in such studies is to infer individual genotypes from sequencing data. Existing methods do not utilize full familial transmission information and therefore result in reduced accuracy of inferred genotypes. In this study we describe a new method that infers shared genetic material among family members and then incorporates the shared genomic information in a novel algorithm that can accurately infer genotypes. Our method is particularly advantageous when inferring low frequency variants with fewer sequence data, making it effective in analyzing genome-wide sequence data. We implemented the algorithm in a computationally efficient tool to facilitate cost-effective sequencing in families for identifying disease genetic variants.
|
23
|
Zhi D, Liu N, Zhang K. On the design and analysis of next-generation sequencing genotyping for a cohort with haplotype-informative reads. Methods 2015; 79-80:41-6. [PMID: 25644447 PMCID: PMC4437872 DOI: 10.1016/j.ymeth.2015.01.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/15/2014] [Revised: 12/19/2014] [Accepted: 01/23/2015] [Indexed: 12/30/2022]
Abstract
Next-generation sequencing (NGS) technologies, which can provide base-pair resolution genetic information for all types of genetic variations, are increasingly used in genetics research. However, due to the complex nature of NGS technologies and analytics and their relatively high cost, investigators face practical challenges for both design and analysis. These challenges are further complicated by recent methodological developments that make it possible to use haplotype information in sequencing reads. In light of these developments, we conducted comprehensive simulations to evaluate the effects of sequencing coverage, insert size of paired-end reads, and sample size on genotype calling and haplotype phasing in NGS studies. In contrast to previous studies that typically use idealized scenarios to tease out the effects of individual design and analytic decisions, we used a complete analytical pipeline from read mapping and variant detection to genotype calling and haplotype phasing so that we can assess the joint effects of multiple decisions and thus make more realistic recommendations to investigators. Consistent with previous studies, we found that the use of haplotype information in reads can improve the accuracy of genotype calling and haplotype phasing, and we also found that a mixture of short and long insert sizes of paired-end reads may offer even greater accuracy. However, this benefit is only clear in high coverage sequencing where variant detection is close to perfect. Finally, we observed that LD-based refinement methods do not always outperform single site based methods for genotype calling. Therefore, we should choose analytical methods that are appropriate to the sequencing coverage and sample size in order to use haplotype information in sequencing reads.
Affiliation(s)
- Degui Zhi
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, United States
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, United States
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, United States
|
24
|
Zeng P, Zhao Y, Liu J, Liu L, Zhang L, Wang T, Huang S, Chen F. Likelihood ratio tests in rare variant detection for continuous phenotypes. Ann Hum Genet 2015; 78:320-32. [PMID: 25117149 DOI: 10.1111/ahg.12071] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/27/2013] [Accepted: 04/22/2014] [Indexed: 12/30/2022]
Abstract
It is believed that rare variants play an important role in human phenotypes; however, the detection of rare variants is extremely challenging due to their very low minor allele frequency. In this paper, the likelihood ratio test (LRT) and restricted likelihood ratio test (ReLRT) are proposed to test the association of rare variants based on the linear mixed effects model, where a group of rare variants are treated as random effects. Like the sequence kernel association test (SKAT), a state-of-the-art method for rare variant detection, LRT and ReLRT can effectively overcome the problem of directionality of effect inherent in the burden test in practice. By taking full advantage of the spectral decomposition, exact finite sample null distributions for LRT and ReLRT are obtained by simulation. We perform extensive numerical studies to evaluate the performance of LRT and ReLRT, and compare to the burden test, SKAT and SKAT-O. The simulations have shown that LRT and ReLRT can correctly control the type I error, and the controls are robust to the weights chosen and the number of rare variants under study. LRT and ReLRT behave similarly to the burden test when all the causal rare variants share the same direction of effect, and outperform SKAT across various situations. When both positive and negative effects exist, LRT and ReLRT suffer from few power reductions compared to the other two competing methods; under this case, an additional finding from our simulations is that SKAT-O is no longer the optimal test, and its power is even lower than that of SKAT. The exome sequencing SNP data from Genetic Analysis Workshop 17 were employed to illustrate the proposed methods, and interesting results are described.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, 211166, P. R. China; Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
|
25
|
Juan L, Liu Y, Wang Y, Teng M, Zang T, Wang Y. Family genome browser: visualizing genomes with pedigree information. Bioinformatics 2015; 31:2262-8. [PMID: 25788626 DOI: 10.1093/bioinformatics/btv151] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/10/2014] [Accepted: 03/11/2015] [Indexed: 02/06/2023]
Abstract
MOTIVATION Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing has become increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. RESULTS We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes at both the individual level and the variant level effectively, through integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibriums, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. AVAILABILITY AND IMPLEMENTATION The FGB is available at http://mlg.hit.edu.cn/FGB/.
Affiliation(s)
- Liran Juan
- Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yongzhuang Liu
- Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yongtian Wang
- Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Mingxiang Teng
- Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tianyi Zang
- Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
- Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
|
26
|
Cleary JG, Braithwaite R, Gaastra K, Hilbush BS, Inglis S, Irvine SA, Jackson A, Littin R, Nohzadeh-Malakshah S, Rathod M, Ware D, Trigg L, De La Vega FM. Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data. J Comput Biol 2015; 21:405-19. [PMID: 24874280 DOI: 10.1089/cmb.2014.0029] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Indexed: 12/30/2022]
Abstract
The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we present a Bayesian network framework that jointly analyzes data from all members of a pedigree simultaneously using Mendelian segregation priors, yet providing the ability to detect de novo mutations in offspring, and is scalable to large pedigrees. We evaluated our method by simulations and analysis of whole-genome sequencing (WGS) data from a 17-individual, 3-generation CEPH pedigree sequenced to 50× average depth. Compared with singleton calling, our family caller produced more high-quality variants and eliminated spurious calls as judged by common quality metrics such as Ti/Tv, Het/Hom ratios, and dbSNP/SNP array data concordance, and by comparing to ground truth variant sets available for this sample. We identify all previously validated de novo mutations in NA12878, concurrent with a 7× precision improvement. Our results show that our method is scalable to large genomics and human disease studies.
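Ti/Tv, one of the quality metrics named in the abstract above, is the ratio of transitions (purine to purine or pyrimidine to pyrimidine changes) to transversions among single-nucleotide variants. The Python below is a minimal sketch of that calculation; the function names and the `(ref, alt)` input format are assumptions for illustration and are not taken from the authors' software.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Compute Ti/Tv over (ref, alt) pairs, skipping anything that is not a
    single-base substitution. Returns None when there are no transversions."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # indels, multi-base alleles, or non-variant records
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None


# Hypothetical calls: three transitions, one transversion, one indel (ignored).
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]
print(ti_tv_ratio(example))  # 3.0
```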
|
27
|
Li J, Jiang Y, Wang T, Chen H, Xie Q, Shao Q, Ran X, Xia K, Sun ZS, Wu J. mirTrios: an integrated pipeline for detection of de novo and rare inherited mutations from trios-based next-generation sequencing. J Med Genet 2015; 52:275-81. [PMID: 25596308 DOI: 10.1136/jmedgenet-2014-102656] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Abstract
OBJECTIVES Recently, several studies documented that de novo mutations (DNMs) play important roles in the aetiology of sporadic diseases. Next-generation sequencing (NGS) enables variant calling at single-base resolution on a genome-wide scale. However, accurate identification of DNMs from NGS data still remains a major challenge. We developed mirTrios, a web server, to accurately detect DNMs and rare inherited mutations from NGS data in sporadic diseases. METHODS The expectation-maximisation (EM) model was adopted to accurately identify DNMs from variant call files of a trio generated by GATK (Genome Analysis Toolkit). The GATK results, which contain certain basic properties (such as PL, PRT and PART), are iteratively integrated into the EM model to strike a threshold for DNMs detection. Training sets of true and false positive DNMs in the EM model were built from whole genome sequencing data of 64 trios. RESULTS With our in-house whole exome sequencing datasets from 20 trios, mirTrios totally identified 27 DNMs in the coding region, 25 of which (92.6%) are validated as true positives. In addition, to facilitate the interpretation of diverse mutations, mirTrios can also be employed in the identification of rare inherited mutations. Embedded with abundant annotation of DNMs and rare inherited mutations, mirTrios also supports known diagnostic variants and causative gene identification, as well as the prioritisation of novel and promising candidate genes. CONCLUSIONS mirTrios provides an intuitive interface for the general geneticist and clinician, and can be widely used for detection of DNMs and rare inherited mutations, and annotation in sporadic diseases. mirTrios is freely available at http://centre.bioinformatics.zj.cn/mirTrios/.
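For orientation only, the Python below shows the simplest genotype-based screen for de novo mutation (DNM) candidates in a trio, together with the arithmetic behind the validation rate reported above (25 of 27). It is a naive rule and not the EM model that mirTrios uses; the function names, site labels, and genotype encoding are assumptions for illustration.

```python
def is_denovo_candidate(father, mother, child):
    """Naive rule: the child carries at least one allele seen in neither parent."""
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child)


def validation_rate(validated, called):
    """Percentage of called DNMs that were confirmed as true positives."""
    return 100.0 * validated / called


# Hypothetical trio genotypes per site: 0 = reference allele, 1 = alternate.
sites = {
    "site_a": ((0, 0), (0, 0), (0, 1)),  # alternate allele absent from both parents
    "site_b": ((0, 1), (0, 0), (0, 1)),  # inherited from the father
}
candidates = [
    name for name, (f, m, c) in sites.items() if is_denovo_candidate(f, m, c)
]
print(candidates)                          # ['site_a']
print(round(validation_rate(25, 27), 1))   # 92.6
```

A rule this simple flags every parental genotyping error as a candidate, which is why model-based scoring of the kind described in the abstract is needed in practice.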
Affiliation(s)
- Jinchen Li
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Yi Jiang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Tao Wang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Huiqian Chen
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Qing Xie
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Qianzhi Shao
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Xia Ran
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Kun Xia
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Zhong Sheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Jinyu Wu
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
|
28
|
Peng G, Fan Y, Wang W. FamSeq: a variant calling program for family-based sequencing data using graphics processing units. PLoS Comput Biol 2014; 10:e1003880. [PMID: 25357123 PMCID: PMC4214554 DOI: 10.1371/journal.pcbi.1003880] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/25/2014] [Accepted: 08/20/2014] [Indexed: 12/30/2022]
Abstract
Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.
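To make concrete the idea of incorporating pedigree information from a Mendelian genetic model into variant calling, the Python below computes a child's posterior genotype in a single trio by brute-force enumeration of the 27 joint genotype configurations at a biallelic site. It is a sketch under stated assumptions, not FamSeq's implementation: it uses none of the four algorithms named above, it ignores de novo mutation, and the Hardy-Weinberg founder prior, the allele frequency, and all function names are illustrative choices.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def founder_prior(genotype, alt_freq):
    """Hardy-Weinberg prior for a founder (assumed population model)."""
    p, q = 1.0 - alt_freq, alt_freq
    return (p * p, 2 * p * q, q * q)[genotype]


def transmission_prob(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    pf, pm = father / 2.0, mother / 2.0  # chance each parent passes the alt allele
    return (
        (1 - pf) * (1 - pm),
        pf * (1 - pm) + (1 - pf) * pm,
        pf * pm,
    )[child]


def child_posterior(father_lik, mother_lik, child_lik, alt_freq=0.01):
    """Marginal posterior over the child's genotype, summing over both parents.
    Each *_lik argument holds P(reads | genotype) for genotypes 0, 1, 2."""
    weights = [0.0, 0.0, 0.0]
    for f, m, c in product(GENOTYPES, repeat=3):
        weights[c] += (
            founder_prior(f, alt_freq) * father_lik[f]
            * founder_prior(m, alt_freq) * mother_lik[m]
            * transmission_prob(c, f, m) * child_lik[c]
        )
    total = sum(weights)
    if total == 0.0:
        raise ValueError("likelihoods are incompatible with the pedigree model")
    return [w / total for w in weights]


# Hypothetical likelihoods: both parents look homozygous reference, while the
# child's reads alone slightly favor a heterozygous call.
parent_lik = (0.998, 0.001, 0.001)
posterior = child_posterior(parent_lik, parent_lik, (0.4, 0.6, 0.0))
print(posterior)  # the family model shifts the call toward genotype 0
```

Here the child's own reads would give a heterozygous call, but confident homozygous-reference parents make that genotype unlikely under Mendelian transmission, so the joint model favors homozygous reference. This is the mechanism by which pedigree-aware calling can reduce false positives.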
Affiliation(s)
- Gang Peng
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Yu Fan
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
- Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
|
29
|
Salzberg SL, Pertea M, Fahrner JA, Sobreira N. DIAMUND: direct comparison of genomes to detect mutations. Hum Mutat 2014; 35:283-8. [PMID: 24375697 PMCID: PMC4031744 DOI: 10.1002/humu.22503] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/09/2013] [Accepted: 12/19/2013] [Indexed: 12/30/2022]
Abstract
DNA sequencing has become a powerful method to discover the genetic basis of disease. Standard, widely used protocols for analysis usually begin by comparing each individual to the human reference genome. When applied to a set of related individuals, this approach reveals millions of differences, most of which are shared among the individuals and unrelated to the disease being investigated. We have developed a novel algorithm for variant detection, one that compares DNA sequences directly to one another, without aligning them to the reference genome. When used to find de novo mutations in exome sequences from family trios, or to compare normal and diseased samples from the same individual, the new method, direct alignment for mutation discovery (DIAMUND), produces a dramatically smaller list of candidate mutations than previous methods, without losing sensitivity to detect the true cause of a genetic disease. We demonstrate our results on several example cases, including two family trios in which it correctly found the disease-causing variant while excluding thousands of harmless variants that standard methods had identified.
Affiliation(s)
- Steven L Salzberg
- Center for Computational Biology, Johns Hopkins School of Medicine, Baltimore, Maryland, 21205; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, 21205
|
30
|
Bao R, Huang L, Andrade J, Tan W, Kibbe WA, Jiang H, Feng G. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Inform 2014; 13:67-82. [PMID: 25288881 PMCID: PMC4179624 DOI: 10.4137/cin.s13779] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 04/22/2014] [Revised: 07/06/2014] [Accepted: 07/07/2014] [Indexed: 12/21/2022]
Abstract
The advent of next-generation sequencing technologies has greatly promoted advances in the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, where the coding region of the genome is captured and sequenced at a deep level, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, pre-processing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics.
Affiliation(s)
- Riyue Bao
- Center for Research Informatics, The University of Chicago, Chicago, IL, USA
- Lei Huang
- Center for Research Informatics, The University of Chicago, Chicago, IL, USA
- Jorge Andrade
- Center for Research Informatics, The University of Chicago, Chicago, IL, USA
- Wei Tan
- IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA
- Warren A Kibbe
- Biomedical Informatics Center (NUBIC), Clinical and Translational Sciences Institute (NUCATS), Northwestern University, Chicago, IL, USA
- Hongmei Jiang
- Department of Statistics, Northwestern University, Evanston, IL, USA
- Gang Feng
- Biomedical Informatics Center (NUBIC), Clinical and Translational Sciences Institute (NUCATS), Northwestern University, Chicago, IL, USA
|
31
|
Al-Sinani S, Hassan MO, Zadjali F, Al-Yahyaee S, Albarwani S, Rizvi S, Jaju D, Comuzzie A, Voruganti VS, Bayoumi R. Utility of large consanguineous family-based model for investigating the genetics of type 2 diabetes mellitus. Gene 2014; 548:22-8. [DOI: 10.1016/j.gene.2014.06.053] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/08/2013] [Revised: 06/16/2014] [Accepted: 06/23/2014] [Indexed: 12/24/2022]
|
32
|
Bahlo M, Tankard R, Lukic V, Oliver KL, Smith KR. Using familial information for variant filtering in high-throughput sequencing studies. Hum Genet 2014; 133:1331-41. [PMID: 25129038 PMCID: PMC4185103 DOI: 10.1007/s00439-014-1479-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/01/2014] [Accepted: 08/07/2014] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing studies (HTS) have been highly successful in identifying the genetic causes of human disease, particularly those following Mendelian inheritance. Many HTS studies to date have been performed without utilizing available family relationships between samples. Here, we discuss the many merits and occasional pitfalls of using identity by descent information in conjunction with HTS studies. These methods are not only applicable to family studies but are also useful in cohorts of apparently unrelated, ‘sporadic’ cases and small families underpowered for linkage and allow inference of relationships between individuals. Incorporating familial/pedigree information not only provides powerful filtering options for the extensive variant lists that are usually produced by HTS but also allows valuable quality control checks, insights into the genetic model and the genotypic status of individuals of interest. In particular, these methods are valuable for challenging discovery scenarios in HTS analysis, such as in the study of populations poorly represented in variant databases typically used for filtering, and in the case of poor-quality HTS data.
Affiliation(s)
- Melanie Bahlo
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
|
33
|
Teare MD, Santibañez Koref MF. Linkage analysis and the study of Mendelian disease in the era of whole exome and genome sequencing. Brief Funct Genomics 2014; 13:378-83. [PMID: 25024279 DOI: 10.1093/bfgp/elu024] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/25/2022]
Abstract
Whole exome and whole genome sequencing are now routinely used in the study of inherited disease, and some of their major successes have been the identification of genes involved in disease predisposition in pedigrees where disease seems to follow Mendelian inheritance patterns. These successes include scenarios where only a single individual was sequenced and raise the question whether linkage analysis has become superfluous. Linkage analysis requires genome-wide genotyping on family-based data, and traditionally the linkage analysis was performed before the targeting sequencing stage. However, methods are emerging that seek to exploit the capability of linkage analysis to integrate data both across individuals and across pedigrees. This ability has been exploited to select samples used for sequencing studies and to identify among the variants uncovered by sequencing those mapping to regions likely to contain the gene of interest and, more generally, to improve variant detection. So, although the formal isolated linkage analysis stage is less commonly seen, when uncovering the genetic basis of Mendelian disease, methods relying heavily on genetic linkage analysis principles are being integrated directly into the whole mapping process ranging from sample selection to variant calling and filtering.
|
34
|
Alemán A, Garcia-Garcia F, Salavert F, Medina I, Dopazo J. A web-based interactive framework to assist in the prioritization of disease candidate genes in whole-exome sequencing studies. Nucleic Acids Res 2014; 42:W88-93. [PMID: 24803668 PMCID: PMC4086071 DOI: 10.1093/nar/gku407] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 12/30/2022]
Abstract
Whole-exome sequencing has become a fundamental tool for the discovery of disease-related genes of familial diseases and the identification of somatic driver variants in cancer. However, finding the causal mutation among the enormous background of individual variability in a small number of samples is still a big challenge. Here we describe a web-based tool, BiERapp, which efficiently helps in the identification of causative variants in family and sporadic genetic diseases. The program reads lists of predicted variants (nucleotide substitutions and indels) in affected individuals or tumor samples and controls. In family studies, different modes of inheritance can easily be defined to filter out variants that do not segregate with the disease along the family. Moreover, BiERapp integrates additional information such as allelic frequencies in the general population and the most popular damaging scores to further narrow down the number of putative variants in successive filtering steps. BiERapp provides an interactive and user-friendly interface that implements the filtering strategy used in the context of a large-scale genomic project carried out by the Spanish Network for Research in Rare Diseases (CIBERER) in which more than 800 exomes have been analyzed. BiERapp is freely available at: http://bierapp.babelomics.org/
Affiliation(s)
- Alejandro Alemán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia 46012, Spain
- Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia 46010, Spain
- Francisco Garcia-Garcia
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia 46012, Spain
- Francisco Salavert
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia 46012, Spain
- Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia 46010, Spain
- Ignacio Medina
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia 46012, Spain
- Joaquín Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia 46012, Spain
- Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia 46010, Spain
- Functional Genomics Node (INB) at CIPF, Valencia 46012, Spain
|
35
|
Zeng P, Zhao Y, Zhang L, Huang S, Chen F. Rare variants detection with kernel machine learning based on likelihood ratio test. PLoS One 2014; 9:e93355. [PMID: 24675868 PMCID: PMC3968153 DOI: 10.1371/journal.pone.0093355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/16/2013] [Accepted: 03/03/2014] [Indexed: 11/18/2022]
Abstract
This paper uses likelihood-based tests to detect rare variants associated with a continuous phenotype under the framework of kernel machine learning. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated. The relationship between kernel machine learning and the mixed effects model is discussed. Using the eigenvalue representation of the LRT and ReLRT, their exact finite sample distributions are obtained by simulation. Numerical studies are performed to evaluate the performance of the proposed approaches in the contexts of the standard mixed effects model and kernel machine learning. The results show that the LRT and ReLRT control the type I error correctly at the given α level. The LRT and ReLRT consistently outperform the SKAT, regardless of the sample size and the proportion of negative causal rare variants, and suffer smaller power reductions than the SKAT when both positive and negative effects of rare variants are present. The LRT and ReLRT performed in the context of kernel machine learning have slightly higher power than those performed in the context of the standard mixed effects model. We use the Genetic Analysis Workshop 17 exome sequencing SNP data as an illustrative example. Some interesting results are observed from the analysis. Finally, we discuss the findings.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, China
- Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, China
- Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
|
36
|
Strom SP, Lee H, Das K, Vilain E, Nelson SF, Grody WW, Deignan JL. Assessing the necessity of confirmatory testing for exome-sequencing results in a clinical molecular diagnostic laboratory. Genet Med 2014; 16:510-5. [PMID: 24406459 PMCID: PMC4079763 DOI: 10.1038/gim.2013.183] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Received: 07/10/2013] [Accepted: 10/18/2013] [Indexed: 02/07/2023]
Abstract
Purpose: Sanger sequencing is currently considered the gold standard methodology for clinical molecular diagnostic testing. However, next generation sequencing (NGS) has already emerged as a much more efficient means to identify genetic variants within gene panels, the exome, or the genome. We sought to assess the accuracy of NGS variant identification in our clinical genomics laboratory with the goal of establishing a quality score threshold for confirmatory Sanger-based testing. Methods: Confirmation data for reported results from 144 sequential clinical exome sequencing cases (94 unique variants) and an additional set of 16 variants from comparable research samples were analyzed. Results: 103 of 110 total SNVs analyzed had a quality score ≥Q500, 103 (100%) of which were confirmed by Sanger sequencing. Of the remaining 7 variants with quality scores <Q500, 6 were confirmed by Sanger sequencing (85%). Conclusions: For single nucleotide variants, we predict we will be able to reduce our Sanger confirmation workload going forward by 70–80%. This serves as a proof of principle that as long as sufficient validation and quality control measures are implemented, the volume of Sanger confirmation can be reduced, alleviating a significant amount of the labor and cost burden on clinical laboratories wishing to utilize NGS technology. However, Sanger confirmation of low quality single nucleotide variants and all indels (insertions or deletions less than 10 bp) remains necessary at this time in our laboratory.
Affiliation(s)
- Samuel P Strom
- [1] Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA [2] Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Hane Lee
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Kingshuk Das
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Eric Vilain
- [1] Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA [2] Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Stanley F Nelson
- [1] Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA [2] Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Wayne W Grody
- [1] Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA [2] Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA [3] Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
- Joshua L Deignan
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
|
37
|
Santoni FA, Makrythanasis P, Nikolaev S, Guipponi M, Robyr D, Bottani A, Antonarakis SE. Simultaneous identification and prioritization of variants in familial, de novo, and somatic genetic disorders with VariantMaster. Genome Res 2014; 24:349-55. [PMID: 24389049 PMCID: PMC3912425 DOI: 10.1101/gr.163832.113] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 01/25/2023]
Abstract
There is increasing interest in clinical genetics pertaining to the utilization of high-throughput sequencing data for accurate diagnoses of monogenic diseases. Moreover, massive whole-exome sequencing of tumors has provided significant advances in the understanding of cancer development through the recognition of somatic driver variants. To improve the identification of the variants from HTS, we developed VariantMaster, an original program that accurately and efficiently extracts causative variants in familial and sporadic genetic diseases. The algorithm takes into account predicted variants (SNPs and indels) in affected individuals or tumor samples and utilizes the raw (BAM) data to robustly estimate the conditional probability of segregation in a family, as well as the probability of it being de novo or somatic. In familial cases, various modes of inheritance are considered: X-linked, autosomal dominant, and recessive (homozygosity or compound heterozygosity). Moreover, VariantMaster integrates phenotypes and genotypes, and employs Annovar to produce additional information such as allelic frequencies in the general population and damaging scores to further reduce the number of putative variants. As a proof of concept, we successfully applied VariantMaster to identify (1) de novo mutations in a previously described data set, (2) causative variants in a rare Mendelian genetic disease, and (3) known and new “driver” mutations in previously reported cancer data sets. Our results demonstrate that VariantMaster is considerably more accurate in terms of precision and sensitivity compared with previously published algorithms.
Affiliation(s)
- Federico A Santoni
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva 4, Switzerland
|
38
|
Abstract
Recent advances in genetic analysis, especially DNA sequencing technology, open a new strategy for adult disease prevention by genetic screening. Physicians presently treat disease pathology with less emphasis on disease risk prevention/reduction. Genetic screening has reduced the incidence of untreatable childhood genetic diseases and improved the care of newborns. The opportunity exists to expand screening programs and reduce the incidence of adult-onset diseases via genetic risk identification and disease intervention. This article outlines the approach, challenges, and benefits of such screening for adult genetic disease risks.
|
39
|
Punetha J, Hoffman EP. Short read (next-generation) sequencing: a tutorial with cardiomyopathy diagnostics as an exemplar. Circ Cardiovasc Genet 2013; 6:427-34. [PMID: 23852418 DOI: 10.1161/circgenetics.113.000085] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023]
Affiliation(s)
- Jaya Punetha
- Department of Integrative Systems Biology, The George Washington University School of Medicine, Washington, DC, USA
|
40
|
Rare variants run in the family. Nat Methods 2013. [DOI: 10.1038/nmeth.2419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|