1
Tan R, Shen Y. Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning. Nucleic Acids Res 2022; 50:e123. [PMID: 36124672 PMCID: PMC9756945 DOI: 10.1093/nar/gkac788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/08/2022] [Accepted: 09/01/2022] [Indexed: 12/24/2022]
Abstract
Exome sequencing is widely used in genetic studies of human diseases and in clinical genetic diagnosis. Accurate detection of copy number variants (CNVs) is important to make full use of exome sequencing data. However, exome data are noisy, and no existing method on its own achieves both high precision and high recall. A common practice is to apply heuristic filters and then manually inspect the read depth of putative CNVs, an approach that does not scale to large studies. To address this issue, we developed CNV-espresso, a transfer learning method for in silico confirmation of rare CNVs from exome sequencing data. CNV-espresso encodes candidate CNVs from exome data as images and uses pretrained convolutional neural network models to classify copy number states. We trained CNV-espresso on an offspring-parents trio exome sequencing dataset, using inherited CNVs as positives and CNVs with Mendelian errors as negatives. We evaluated its performance on additional samples that have both exome and whole-genome sequencing (WGS) data. Using CNVs detected from WGS data as a proxy for ground truth, CNV-espresso significantly improves precision while leaving recall almost intact, especially for CNVs that span a small number of exons. CNV-espresso can effectively replace manual inspection of CNVs in large-scale exome sequencing studies.
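The trio-based labeling scheme described above (inherited CNVs as positives, CNVs with Mendelian errors as negatives) can be illustrated with a minimal Python sketch. It assumes that an offspring CNV counts as inherited when either parent carries a CNV of the same type with at least 50% reciprocal overlap, and that an offspring CNV with no matching parental CNV is a Mendelian error; this also places the occasional true de novo CNV in the negative class. The `CNV` record, the overlap threshold and the function names are hypothetical and are not taken from CNV-espresso, whose actual matching rules may differ. Only the labeling step is shown, not the image encoding or the CNN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """A minimal CNV call record (hypothetical, for illustration only)."""
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int
    cn_type: str  # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def label_offspring_cnvs(child, mother, father, min_ro=0.5):
    """Pair each offspring CNV with a training label.

    1 = inherited: a parent carries a same-type CNV with reciprocal
        overlap >= min_ro (assumed matching rule).
    0 = Mendelian error: no parental CNV matches.
    """
    parents = list(mother) + list(father)
    labeled = []
    for cnv in child:
        inherited = any(
            p.cn_type == cnv.cn_type and reciprocal_overlap(cnv, p) >= min_ro
            for p in parents
        )
        labeled.append((cnv, 1 if inherited else 0))
    return labeled


# Toy trio: the chr1 deletion is shared with the mother, the chr2 duplication is not.
child = [CNV("chr1", 1_000, 5_000, "DEL"), CNV("chr2", 100, 900, "DUP")]
mother = [CNV("chr1", 1_200, 5_200, "DEL")]
father = []
labels = label_offspring_cnvs(child, mother, father)
for cnv, label in labels:
    print(cnv, label)
```

Reciprocal overlap is used here because it prevents a small call from "matching" a much larger parental call that merely contains it.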
Affiliation(s)
- Renjie Tan
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032, USA
2
Demarest S, Calhoun J, Eschbach K, Yu HC, Mirsky D, Angione K, Shaikh TH, Carvill GL, Benke TA, Gunti J, Vanderveen G. Whole-exome sequencing and adrenocorticotropic hormone therapy in individuals with infantile spasms. Dev Med Child Neurol 2022; 64:633-640. [PMID: 35830182 DOI: 10.1111/dmcn.15109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/18/2021] [Revised: 10/22/2021] [Accepted: 10/25/2021] [Indexed: 12/20/2022]
Abstract
AIM To identify additional genes associated with infantile spasms in a cohort with defined infantile spasms. METHOD Whole-exome sequencing (WES) was performed on 21 consented individuals with infantile spasms and their unaffected parents (a trio-based study). Clinical histories and imaging were reviewed. Potentially deleterious exonic variants were identified and tested for segregation within each trio. To refine the list of candidates, variants were further prioritized on the basis of evidence for relevance to the disease phenotype or known associations with infantile spasms, epilepsy, or neurological disease. RESULTS Likely pathogenic de novo variants were identified in NR2F1, GNB1, NEUROD2, GABRA2, and NDUFAF5. Suggestive dominant and recessive candidate variants were identified in PEMT, DYNC1I1, ASXL1, RALGAPB, and STRADA; further confirmation is required to establish their relevance to disease etiology. INTERPRETATION This study supports the utility of WES in uncovering the genetic etiology in undiagnosed individuals with infantile spasms, with an overall diagnostic yield of 5 of 21 individuals. High-priority candidates were identified in an additional five individuals. WES provides additional support for previously described disease-associated genes and expands their already broad mutational and phenotypic spectra.
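The segregation step in a trio design can be illustrated with a minimal Python sketch that classifies a single biallelic autosomal variant from genotype calls in the child and both parents. Genotypes are encoded as alt-allele counts (0, 1 or 2). The function names and category labels are hypothetical; the sketch ignores genotype quality, sex chromosomes and compound heterozygosity (which needs several variants per gene), and it is not the pipeline used in the study.

```python
def transmissible_alt_counts(genotype: int) -> set:
    """Alt-allele counts (0 or 1) that a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def classify_trio_variant(child: int, mother: int, father: int) -> str:
    """Classify one biallelic autosomal variant from alt-allele counts (0, 1, 2)."""
    for gt in (child, mother, father):
        if gt not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {gt}")
    if child == 0:
        return "absent"
    # All child genotypes reachable by taking one allele from each parent.
    expected = {
        m + f
        for m in transmissible_alt_counts(mother)
        for f in transmissible_alt_counts(father)
    }
    if child not in expected:
        # Genotypes are Mendelian-inconsistent; a de novo call is the special
        # case where the child carries an alt allele absent from both parents.
        return "de_novo" if mother == 0 and father == 0 else "mendelian_error"
    # Consistent with inheritance: homozygous children received one alt allele
    # from each parent (a recessive candidate), heterozygous ones from one parent.
    return "homozygous_inherited" if child == 2 else "heterozygous_inherited"


for trio in [(1, 0, 0), (2, 1, 1), (1, 1, 0), (2, 1, 0)]:
    print(trio, classify_trio_variant(*trio))
```

In practice, de novo calls from such a rule are usually re-checked against parental read evidence, since a missed parental genotype produces the same pattern.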
Affiliation(s)
- Scott Demarest
- Children's Hospital Colorado, Aurora, CO, USA
- Department of Pediatrics, University of Colorado, School of Medicine, Aurora, CO, USA
- Jeff Calhoun
- Ken and Ruth Davee Department of Neurology, Northwestern University, School of Medicine, Chicago, IL, USA
- Krista Eschbach
- Children's Hospital Colorado, Aurora, CO, USA
- Department of Pediatrics, University of Colorado, School of Medicine, Aurora, CO, USA
- Hung-Chun Yu
- Department of Pediatrics, University of Colorado, School of Medicine, Aurora, CO, USA
- David Mirsky
- Children's Hospital Colorado, Aurora, CO, USA
- Department of Radiology, University of Colorado, School of Medicine, Aurora, CO, USA
- Katie Angione
- Children's Hospital Colorado, Aurora, CO, USA
- Department of Pediatrics, University of Colorado, School of Medicine, Aurora, CO, USA
- Tamim H Shaikh
- Department of Pediatrics, University of Colorado, School of Medicine, Aurora, CO, USA
- Gemma L Carvill
- Ken and Ruth Davee Department of Neurology, Northwestern University, School of Medicine, Chicago, IL, USA
- Department of Pharmacology, Northwestern University, School of Medicine, Chicago, IL, USA
- Department of Pediatrics, Northwestern University, School of Medicine, Chicago, IL, USA
- Tim A Benke
- Children's Hospital Colorado, Aurora, CO, USA
- Department of Pediatrics, University of Colorado, School of Medicine, Aurora, CO, USA
- Department of Pharmacology, University of Colorado, School of Medicine, Aurora, CO, USA
- Department of Neurology, University of Colorado, School of Medicine, Aurora, CO, USA
- Department of Otolaryngology, University of Colorado, School of Medicine, Aurora, CO, USA
3
Combining callers improves the detection of copy number variants from whole-genome sequencing. Eur J Hum Genet 2022; 30:178-186. [PMID: 34744167 PMCID: PMC8821561 DOI: 10.1038/s41431-021-00983-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/18/2020] [Revised: 09/23/2021] [Accepted: 10/04/2021] [Indexed: 01/03/2023]
Abstract
Copy number variants (CNVs) are deletions, duplications or insertions larger than 50 base pairs. They account for a large share of normal genomic variation and play major roles in human pathology. While array-based approaches have long been used to detect them in clinical practice, whole-genome sequencing (WGS) promises to allow concurrent exploration of CNVs and smaller variants. However, accurately calling CNVs from WGS remains a difficult computational task on which there is still no consensus. In this paper, we explore practical calling options to reach the best compromise between sensitivity and specificity. We show that callers based on different signals (paired-end reads, split reads, coverage depth) yield complementary results. We propose approaches that combine four selected callers (Manta, Delly, ERDS, CNVnator) with a regenotyping tool (SV2), and show that they are practical for everyday use in terms of computation time and downstream interpretation. We demonstrate that these approaches outperform array-based comparative genomic hybridization (aCGH), specifically in breakpoint resolution, which is limited with aCGH, and in the detection of potentially relevant CNVs. Finally, we confirm our results on the NA12878 benchmark genome and on one clinically validated sample. In conclusion, we suggest that WGS is a timely and economically viable alternative to the combination of aCGH and whole-exome sequencing.
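The observation that callers relying on different signals are complementary can be illustrated with a simple consensus rule. The minimal Python sketch below keeps a CNV when same-type calls from at least two callers agree with at least 50% reciprocal overlap, and collapses matching calls into one record listing the supporting callers. Both thresholds, the greedy deduplication and the function names are assumptions made for illustration; the paper's actual combination strategy and the SV2 regenotyping step are not reproduced here, and caller names appear only as dictionary keys (no caller output format is parsed).

```python
Call = tuple  # (chrom, start, end, svtype), 0-based half-open coordinates


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def matches(a: Call, b: Call, min_ro: float) -> bool:
    """Same CNV type and sufficient reciprocal overlap."""
    return a[3] == b[3] and reciprocal_overlap(a, b) >= min_ro


def consensus_calls(calls_by_caller: dict, min_callers: int = 2, min_ro: float = 0.5):
    """Keep calls supported by >= min_callers callers, one record per matching cluster."""
    kept = []
    for calls in calls_by_caller.values():
        for call in calls:
            # Callers (including this call's own caller) with a matching call.
            support = {
                other
                for other, other_calls in calls_by_caller.items()
                if any(matches(call, c, min_ro) for c in other_calls)
            }
            already_kept = any(matches(call, k, min_ro) for k, _ in kept)
            if len(support) >= min_callers and not already_kept:
                kept.append((call, sorted(support)))
    return kept


calls = {
    "Manta": [("chr1", 10_000, 20_000, "DEL")],
    "Delly": [("chr1", 10_500, 20_200, "DEL"), ("chr3", 5_000, 6_000, "DUP")],
    "CNVnator": [("chr1", 9_800, 19_900, "DEL")],
}
consensus = consensus_calls(calls)
for call, support in consensus:
    print(call, support)
```

Requiring agreement between callers trades some sensitivity for specificity; lowering `min_callers` to 1 turns the same function into a deduplicated union of all calls.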
4
Garcia-Rosa S, de Amorim MG, Valieris R, Marques VD, Lorenzi JCC, Toller VB, do Olival GS, da Silva Júnior WA, da Silva IT, Barreira AA, Nunes DN, Dias-Neto E. Exome sequencing of multiple-sclerosis patients and their unaffected first-degree relatives. BMC Res Notes 2017; 10:735. [PMID: 29233175 PMCID: PMC5727932 DOI: 10.1186/s13104-017-3072-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/04/2017] [Accepted: 12/06/2017] [Indexed: 12/30/2022]
Abstract
OBJECTIVES Understanding complex multifactorial diseases requires a variety of data from a large number of affected individuals. In this data note, we provide whole-exome sequencing data from a set of non-familial multiple sclerosis (MS) patients and their unaffected first-degree relatives. These data may help identify genomic alterations that may affect this disease, including single-nucleotide polymorphisms, de novo variants and structural genomic variants such as copy-number alterations. DATA DESCRIPTION This dataset comprises the full exomes of 28 Brazilian subjects from eight distinct families: four complete trios (mother-patient-father) and four further complete trios, each with one additional unaffected sibling. In total, we present full exome data for eight patients diagnosed with relapsing-remitting multiple sclerosis. Diagnoses were made by experienced neurologists, and all enrolled patients had at least 5 years of follow-up and specific MS treatment. Exomes were sequenced on the Ion Proton platform from leukocyte-derived DNA after exon capture with biotinylated probes. For each exome, we generated an average of 66.1 million good-quality mapped reads with an average length of ~160 nt. On average, 90% of the exome was covered at a depth above 20×.
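The coverage figure reported above (90% of the exome above 20×, averaged over exomes) is the kind of statistic computed from per-base read depths over the capture targets. The minimal Python sketch below assumes per-base depths have already been extracted for the target regions of each exome and counts a position as covered when its depth is at least the threshold; whether the data note used a "≥" or ">" cut-off is not stated, and the function names and toy depths are hypothetical.

```python
from statistics import mean


def fraction_covered(depths, min_depth=20):
    """Share of target positions whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


def mean_fraction_covered(samples, min_depth=20):
    """Average of the per-exome covered fractions across samples."""
    return mean(fraction_covered(depths, min_depth) for depths in samples.values())


# Toy per-base depths over four target positions for two exomes.
samples = {"s1": [25, 30, 10, 40], "s2": [20, 21, 22, 5]}
for name, depths in samples.items():
    print(name, fraction_covered(depths))
print("mean", mean_fraction_covered(samples))
```

Averaging per-sample fractions, rather than pooling all positions, gives each exome equal weight, which matches the phrasing "on average" across exomes.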
Affiliation(s)
- Sheila Garcia-Rosa
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Maria Galli de Amorim
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Renan Valieris
- Laboratory of Computational Biology and Bioinformatics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Vanessa Daccach Marques
- Department of Neurosciences, Clinical Neuroimmunology Division, Medical School and Hospital das Clínicas of Ribeirão Preto, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Julio Cesar Cetrulo Lorenzi
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Vania Balardin Toller
- Neurosciences Research Group, Faculdade de Ciências Médicas da Santa Casa de São Paulo, Rua Doutor Cesário Motta Júnior, 61 - Vila Buarque, São Paulo, SP 01221-020 Brazil
- Guilherme Sciascia do Olival
- Neurosciences Research Group, Faculdade de Ciências Médicas da Santa Casa de São Paulo, Rua Doutor Cesário Motta Júnior, 61 - Vila Buarque, São Paulo, SP 01221-020 Brazil
- Wilson Araújo da Silva Júnior
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Israel Tojal da Silva
- Laboratory of Computational Biology and Bioinformatics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Amilton Antunes Barreira
- Department of Neurosciences, Clinical Neuroimmunology Division, Medical School and Hospital das Clínicas of Ribeirão Preto, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Diana Noronha Nunes
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Emmanuel Dias-Neto
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Lab. of Neurosciences (LIM-27), Institute of Psychiatry, Faculdade de Medicina, Universidade de São Paulo, São Paulo, SP Brazil