1
Tsai CY, Hsu JSJ, Chen PL, Wu CC. Implementing next-generation sequencing for diagnosis and management of hereditary hearing impairment: a comprehensive review. Expert Rev Mol Diagn 2024; 24:753-765. [PMID: 39194060 DOI: 10.1080/14737159.2024.2396866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 08/22/2024] [Indexed: 08/29/2024]
Abstract
INTRODUCTION Sensorineural hearing impairment (SNHI), a common childhood disorder with heterogeneous genetic causes, can lead to delayed language development and psychosocial problems. Next-generation sequencing (NGS) offers high-throughput screening and high-sensitivity detection of genetic etiologies of SNHI, enabling clinicians to make informed medical decisions, provide tailored treatments, and improve prognostic outcomes. AREAS COVERED This review covers the diverse etiologies of hereditary hearing impairment (HHI) and the utility of different NGS modalities (targeted sequencing and whole exome/genome sequencing), and includes HHI-related studies on newborn screening, genetic counseling, prognostic prediction, and personalized treatment. Challenges such as the trade-off between cost and diagnostic yield, detection of structural variants, and exploration of the non-coding genome are also highlighted. EXPERT OPINION In the current landscape of NGS-based diagnostics for HHI, there are both challenges (e.g. detection of structural variants and non-coding genome variants) and opportunities (e.g. the emergence of medical artificial intelligence tools). The authors advocate the use of technological advances such as long-read sequencing for structural variant detection, multi-omics analysis for non-coding variant exploration, and medical artificial intelligence for pathogenicity assessment and outcome prediction. By integrating these innovations into clinical practice, precision medicine in the diagnosis and management of HHI can be further improved.
Affiliation(s)
- Cheng-Yu Tsai
  - Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
  - Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
- Jacob Shu-Jui Hsu
  - Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
- Pei-Lung Chen
  - Graduate Institute of Medical Genomics and Proteomics, National Taiwan University College of Medicine, Taipei, Taiwan
  - Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
  - Institute of Molecular Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
  - Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Chen-Chi Wu
  - Department of Otolaryngology, National Taiwan University Hospital, Taipei, Taiwan
  - Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
  - Department of Medical Research, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
  - Department of Otolaryngology, National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
2
Duan J, Zhao X, Wu X. LoRA-TV: read depth profile-based clustering of tumor cells in single-cell sequencing. Brief Bioinform 2024; 25:bbae277. [PMID: 38877886 PMCID: PMC11179121 DOI: 10.1093/bib/bbae277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/17/2024] [Accepted: 05/29/2024] [Indexed: 06/18/2024] [Open access]
Abstract
Single-cell sequencing has revolutionized our ability to dissect the heterogeneity within tumor populations. In this study, we present LoRA-TV (Low Rank Approximation with Total Variation), a novel method for clustering tumor cells based on the read depth profiles derived from single-cell sequencing data. Traditional analysis pipelines process read depth profiles of each cell individually. By aggregating shared genomic signatures distributed among individual cells using low-rank optimization and robust smoothing, the proposed method enhances clustering performance. Results from analyses of both simulated and real data demonstrate its effectiveness compared with state-of-the-art alternatives, as supported by improvements in the adjusted Rand index and computational efficiency.
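The abstract above reports clustering quality with the adjusted Rand index (ARI). As a rough illustration of that metric only (not of LoRA-TV itself), the sketch below computes the ARI between a reference labelling and a predicted clustering using the standard pair-counting formula. The function name `adjusted_rand_index` and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two cells: no pairs to disagree on

    # Contingency counts: cells per (true cluster, predicted cluster) pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs placed together in both / in each labeling.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Toy labels (hypothetical): cluster names differ but the partition is identical.
print(adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"]))
```

An ARI of 1 means the two partitions agree exactly, values near 0 are what random assignment would give, and negative values indicate agreement worse than chance.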
Affiliation(s)
- Junbo Duan
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Xinrui Zhao
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Xiaoming Wu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
3
Myers MA, Arnold BJ, Bansal V, Balaban M, Mullen KM, Zaccaria S, Raphael BJ. HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data. Genome Biol 2024; 25:130. [PMID: 38773520 PMCID: PMC11110434 DOI: 10.1186/s13059-024-03267-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 05/03/2024] [Indexed: 05/24/2024] [Open access]
Abstract
Bulk DNA sequencing of multiple samples from the same tumor is becoming common, yet most methods to infer copy-number aberrations (CNAs) from this data analyze individual samples independently. We introduce HATCHet2, an algorithm to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 extends the earlier HATCHet method by improving identification of focal CNAs and introducing a novel statistic, the minor haplotype B-allele frequency (mhBAF), that enables identification of mirrored-subclonal CNAs. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 10 prostate cancer patients reveals previously unreported mirrored-subclonal CNAs affecting cancer genes.
Affiliation(s)
- Matthew A Myers
  - Department of Computer Science, Princeton University, Princeton, USA
- Brian J Arnold
  - Center for Statistics and Machine Learning, Princeton University, Princeton, USA
- Vineet Bansal
  - Princeton Research Computing, Princeton University, Princeton, NJ, USA
- Metin Balaban
  - Department of Computer Science, Princeton University, Princeton, USA
- Katelyn M Mullen
  - Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Simone Zaccaria
  - Computational Cancer Genomics Research Group, University College London Cancer Institute, London, UK
4
Lynch A, Bradford S, Burkard ME. The reckoning of chromosomal instability: past, present, future. Chromosome Res 2024; 32:2. [PMID: 38367036 DOI: 10.1007/s10577-024-09746-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/11/2024] [Revised: 01/11/2024] [Accepted: 01/27/2024] [Indexed: 02/19/2024]
Abstract
Quantitative measures of chromosomal instability (CIN) are crucial to our understanding of its role in cancer. Technological advances have changed the way CIN is quantified, offering increased accuracy and insight. Here, we review measures of CIN through its rise as a field, discuss considerations for its measurement, and look forward to future quantification of CIN.
Affiliation(s)
- Andrew Lynch
  - UW Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI, USA
  - Division of Hematology/Oncology, Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA
- Shermineh Bradford
  - UW Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI, USA
  - Division of Hematology/Oncology, Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA
- Mark E Burkard
  - UW Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
  - McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI, USA
  - Division of Hematology/Oncology, Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA
5
Zhang Y, Liu W, Duan J. On the core segmentation algorithms of copy number variation detection tools. Brief Bioinform 2024; 25:bbae022. [PMID: 38340093 PMCID: PMC10858679 DOI: 10.1093/bib/bbae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 10/26/2023] [Indexed: 02/12/2024] [Open access]
Abstract
Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools from the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of these two core segmentation algorithms in CNV detection, considering varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which allows bias correction (e.g. for guanine-cytosine (GC) content) to be excluded in advance from the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing data, HMM is more robust than CBS. (4) When facing large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These findings can provide important guidance and a reference for researchers developing new tools for CNV detection.
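To make the "segment the read depth signal" step concrete, here is a minimal sketch of plain recursive binary segmentation on a toy depth profile. It is a deliberate simplification, not CBS as implemented in the tools the abstract surveys: it tests only single change points, omits the circular (arc) splits, and replaces the permutation test with a fixed gain threshold. The names `best_split`, `segment`, and `min_gain`, and the toy signal, are assumptions made for this example.

```python
from statistics import mean


def best_split(signal):
    """Return (index, gain) of the single split that most reduces squared error."""
    n = len(signal)
    total = sum(signal)
    best_index, best_gain = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += signal[i - 1]
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Drop in the sum of squared errors when fitting two means instead of one.
        gain = (i * (n - i) / n) * (left_mean - right_mean) ** 2
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain


def segment(signal, min_gain, start=0):
    """Recursively split a non-empty signal; return (start, end, mean), end exclusive."""
    index, gain = best_split(signal)
    if index is None or gain < min_gain:
        return [(start, start + len(signal), mean(signal))]
    left = segment(signal[:index], min_gain, start)
    right = segment(signal[index:], min_gain, start + index)
    return left + right


# Toy profile (hypothetical): copy number 2, a 5-bin gain to 3, then back to 2.
depth = [2.0] * 10 + [3.0] * 5 + [2.0] * 10
for seg in segment(depth, min_gain=0.5):
    print(seg)
```

In practice `min_gain` would have to be tied to the noise level of the depth signal; CBS handles this with a significance test rather than a fixed threshold.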
Affiliation(s)
- Yibo Zhang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Wenyu Liu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Junbo Duan
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
6
Petrini S, Righi C, Mészáros I, D’Errico F, Tamás V, Pela M, Olasz F, Gallardo C, Fernandez-Pinero J, Göltl E, Magyar T, Feliziani F, Zádori Z. The Production of Recombinant African Swine Fever Virus Lv17/WB/Rie1 Strains and Their In Vitro and In Vivo Characterizations. Vaccines (Basel) 2023; 11:1860. [PMID: 38140263 PMCID: PMC10748256 DOI: 10.3390/vaccines11121860] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2023] [Revised: 12/08/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] [Open access]
Abstract
Lv17/WB/Rie1-Δ24 was produced via illegitimate recombination mediated by low-dilution serial passage in the Cos7 cell line and isolated on PAM cell culture. The virus contains a large ~26.4 kb deletion at the left end of its genome. Lv17/WB/Rie1-ΔCD-ΔGL was generated via homologous recombination, crossing two ASFV strains (Lv17/WB/Rie1-ΔCD and Lv17/WB/Rie1-ΔGL, containing eGFP and mCherry markers) during PAM co-infection. The presence of unique parental markers in the Lv17/WB/Rie1-ΔCD-ΔGL genome indicates at least two recombination events during the crossing, suggesting that homologous recombination is a relatively frequent event in the ASFV genome during replication in PAM. Pigs infected with the Lv17/WB/Rie1-Δ24 and Lv17/WB/Rie1-ΔCD-ΔGL strains showed mild clinical signs, although ASFV could not be detected in their sera until a challenge infection with the Armenia/07 ASFV strain. Neither virus was able to induce protective immunity in pigs against a virulent Armenia/07 challenge.
Affiliation(s)
- Stefano Petrini
  - National Reference Centre for Pestiviruses and Asfivirus, Istituto Zooprofilattico Sperimentale Umbria-Marche “Togo Rosati”, Via Gaetano Salvemini, 1, 06126 Perugia, Italy
- Cecilia Righi
  - National Reference Centre for Pestiviruses and Asfivirus, Istituto Zooprofilattico Sperimentale Umbria-Marche “Togo Rosati”, Via Gaetano Salvemini, 1, 06126 Perugia, Italy
- István Mészáros
  - HUN-REN Veterinary Medical Research Institute (VMRI), Hungária krt. 21, 1143 Budapest, Hungary
- Federica D’Errico
  - National Reference Centre for Pestiviruses and Asfivirus, Istituto Zooprofilattico Sperimentale Umbria-Marche “Togo Rosati”, Via Gaetano Salvemini, 1, 06126 Perugia, Italy
- Vivien Tamás
  - HUN-REN Veterinary Medical Research Institute (VMRI), Hungária krt. 21, 1143 Budapest, Hungary
- Michela Pela
  - National Reference Centre for Pestiviruses and Asfivirus, Istituto Zooprofilattico Sperimentale Umbria-Marche “Togo Rosati”, Via Gaetano Salvemini, 1, 06126 Perugia, Italy
- Ferenc Olasz
  - HUN-REN Veterinary Medical Research Institute (VMRI), Hungária krt. 21, 1143 Budapest, Hungary
- Carmina Gallardo
  - European Union Reference Laboratory for ASF (EURL-ASF), Centro de Investigación en Sanidad Animal (CISA-INIA, CSIC), Valdeolmos, 28130 Madrid, Spain
- Jovita Fernandez-Pinero
  - European Union Reference Laboratory for ASF (EURL-ASF), Centro de Investigación en Sanidad Animal (CISA-INIA, CSIC), Valdeolmos, 28130 Madrid, Spain
- Eszter Göltl
  - HUN-REN Veterinary Medical Research Institute (VMRI), Hungária krt. 21, 1143 Budapest, Hungary
- Tibor Magyar
  - HUN-REN Veterinary Medical Research Institute (VMRI), Hungária krt. 21, 1143 Budapest, Hungary
- Francesco Feliziani
  - National Reference Centre for Pestiviruses and Asfivirus, Istituto Zooprofilattico Sperimentale Umbria-Marche “Togo Rosati”, Via Gaetano Salvemini, 1, 06126 Perugia, Italy
- Zoltán Zádori
  - HUN-REN Veterinary Medical Research Institute (VMRI), Hungária krt. 21, 1143 Budapest, Hungary
7
Schlebusch SA, Rídl J, Poignet M, Ruiz-Ruano FJ, Reif J, Pajer P, Pačes J, Albrecht T, Suh A, Reifová R. Rapid gene content turnover on the germline-restricted chromosome in songbirds. Nat Commun 2023; 14:4579. [PMID: 37516764 PMCID: PMC10387091 DOI: 10.1038/s41467-023-40308-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/14/2022] [Accepted: 07/20/2023] [Indexed: 07/31/2023] [Open access]
Abstract
The germline-restricted chromosome (GRC) of songbirds represents a taxonomically widespread example of programmed DNA elimination. Despite its apparent indispensability, we still know very little about the GRC's genetic composition, function, and evolutionary significance. Here we assemble the GRC in two closely related species, the common and thrush nightingale. In total we identify 192 genes across the two GRCs, with many of them present in multiple copies. Interestingly, the GRC appears to be under little selective pressure, with the genetic content differing dramatically between the two species and many GRC genes appearing to be pseudogenized fragments. Only one gene, cpeb1, has a complete coding region in all examined individuals of the two species and shows no copy number variation. The acquisition of this gene by the GRC corresponds with the earliest estimates of the GRC origin, making it a good candidate for the functional indispensability of the GRC in songbirds.
Affiliation(s)
- Stephen A Schlebusch
  - Department of Zoology, Faculty of Science, Charles University, Prague, Czech Republic
- Jakub Rídl
  - Department of Zoology, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
- Manon Poignet
  - Department of Zoology, Faculty of Science, Charles University, Prague, Czech Republic
- Francisco J Ruiz-Ruano
  - School of Biological Sciences, University of East Anglia, Norwich, UK
  - Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Science for Life Laboratory, Uppsala University, Norbyvägen 18D, 752 36, Uppsala, Sweden
  - Institute of Evolutionary Biology and Ecology, University of Bonn, An der Immenburg 1, 53121, Bonn, Germany
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 127, 53113, Bonn, Germany
- Jiří Reif
  - Institute for Environmental Studies, Faculty of Science, Charles University, Prague, Czech Republic
  - Department of Zoology, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Petr Pajer
  - Military Health Institute, Military Medical Agency, Tychonova 1, 160 01, Prague 6, Czech Republic
- Jan Pačes
  - Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
- Tomáš Albrecht
  - Department of Zoology, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic
- Alexander Suh
  - School of Biological Sciences, University of East Anglia, Norwich, UK
  - Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Science for Life Laboratory, Uppsala University, Norbyvägen 18D, 752 36, Uppsala, Sweden
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 127, 53113, Bonn, Germany
- Radka Reifová
  - Department of Zoology, Faculty of Science, Charles University, Prague, Czech Republic
8
Myers MA, Arnold BJ, Bansal V, Mullen KM, Zaccaria S, Raphael BJ. HATCHet2: clone- and haplotype-specific copy number inference from bulk tumor sequencing data. bioRxiv 2023:2023.07.13.548855. [PMID: 37502835 PMCID: PMC10370020 DOI: 10.1101/2023.07.13.548855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Multi-region DNA sequencing of primary tumors and metastases from individual patients helps identify somatic aberrations driving cancer development. However, most methods to infer copy-number aberrations (CNAs) analyze individual samples. We introduce HATCHet2 to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 introduces a novel statistic, the mirrored haplotype B-allele frequency (mhBAF), to identify mirrored-subclonal CNAs having different numbers of copies of parental haplotypes in different tumor clones. HATCHet2 also has high accuracy in identifying focal CNAs and extends the earlier HATCHet method in several directions. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 50 prostate cancer samples from 10 patients reveals previously-unreported mirrored-subclonal CNAs affecting cancer genes.
Affiliation(s)
- Matthew A. Myers
  - Department of Computer Science, Princeton University, Princeton, USA
- Brian J. Arnold
  - Center for Statistics and Machine Learning, Princeton University, Princeton, USA
- Vineet Bansal
  - Princeton Research Computing, Princeton University, Princeton, NJ, USA
- Katelyn M. Mullen
  - Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Simone Zaccaria
  - Computational Cancer Genomics Research Group, University College London Cancer Institute, London, UK
9
Noninvasive Prenatal Screening for Common Fetal Aneuploidies Using Single-Molecule Sequencing. Lab Invest 2023; 103:100043. [PMID: 36870287 DOI: 10.1016/j.labinv.2022.100043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 12/07/2022] [Accepted: 12/10/2022] [Indexed: 01/11/2023] [Open access]
Abstract
Amplification biases caused by next-generation sequencing (NGS) for noninvasive prenatal screening (NIPS) may be reduced using single-molecule sequencing (SMS), during which PCR is omitted. Therefore, the performance of SMS-based NIPS was evaluated. We used SMS-based NIPS to screen for common fetal aneuploidies in 477 pregnant women. The sensitivity, specificity, positive predictive value, and negative predictive value were estimated. The GC-induced bias was compared between the SMS- and NGS-based NIPS methods. Notably, a sensitivity of 100% was achieved for fetal trisomy 13 (T13), trisomy 18 (T18), and trisomy 21 (T21). The positive predictive value was 46.15% for T13, 96.77% for T18, and 99.07% for T21. The overall specificity was 100% (334/334). Compared with NGS, SMS (without PCR) had less GC bias, a better distinction between T21 or T18 and euploidies, and better diagnostic performance. Overall, our results suggest that SMS improves the performance of NIPS for common fetal aneuploidies by reducing the GC bias introduced during library preparation and sequencing.
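The four screening statistics named in the abstract above all derive from a 2×2 table of test results against confirmed outcomes. The sketch below shows the arithmetic; the function name `screening_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # An empty denominator means the metric is undefined for this sample.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # affected pregnancies flagged
        "specificity": ratio(tn, tn + fp),  # unaffected pregnancies cleared
        "ppv": ratio(tp, tp + fp),          # positive calls that are true
        "npv": ratio(tn, tn + fn),          # negative calls that are true
    }


# Hypothetical counts for one trisomy: no missed cases, a few false positives.
metrics = screening_metrics(tp=6, fp=7, fn=0, tn=987)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, sensitivity is 100% while PPV is 6/13, because PPV depends on how many positive calls are false, which in turn depends on how rare the condition is in the screened population.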
10
Söylev A, Çokoglu SS, Koptekin D, Alkan C, Somel M. CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data. PLoS Comput Biol 2022; 18:e1010788. [PMID: 36516232 PMCID: PMC9873172 DOI: 10.1371/journal.pcbi.1010788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Revised: 01/24/2023] [Accepted: 12/03/2022] [Indexed: 12/15/2022] [Open access]
Abstract
To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1×) and short fragments (<80 bps), which prevent standard CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbps with F-scores >0.75 at ≥1×, and distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44×-26× (median 4×) and average read lengths of 52-121 bps (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity measured using SNPs and that measured using CONGA-genotyped deletions are highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
Affiliation(s)
- Arda Söylev
  - Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Dilek Koptekin
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Can Alkan
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
- Mehmet Somel
  - Department of Biology, Middle East Technical University, Ankara, Turkey
11
Ming C, Wang M, Wang Q, Neff R, Wang E, Shen Q, Reddy JS, Wang X, Allen M, Ertekin‐Taner N, De Jager PL, Bennett DA, Haroutunian V, Schadt E, Zhang B. Whole genome sequencing-based copy number variations reveal novel pathways and targets in Alzheimer's disease. Alzheimers Dement 2022; 18:1846-1867. [PMID: 34918867 PMCID: PMC9264340 DOI: 10.1002/alz.12507] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/02/2021] [Revised: 09/21/2021] [Accepted: 09/21/2021] [Indexed: 01/28/2023]
Abstract
INTRODUCTION A few copy number variations (CNVs) have been reported for Alzheimer's disease (AD). However, there is a lack of a systematic investigation of CNVs in AD based on whole genome sequencing (WGS) data. METHODS We used four methods to identify consensus CNVs from the WGS data of 1,411 individuals and further investigated their functional roles in AD using the matched transcriptomic and clinicopathological data. RESULTS We identified 3,012 rare AD-specific CNVs whose residing genes are enriched for cellular glucuronidation and neuron projection pathways. Genes whose mRNA expressions are significantly correlated with common CNVs are involved in major histocompatibility complex class II receptor activity. Integration of CNVs, gene expression, and clinical and pathological traits further pinpoints a key CNV that potentially regulates immune response in AD. DISCUSSION We identify CNVs as potential genetic regulators of immune response in AD. The identified CNVs and their downstream gene networks reveal novel pathways and targets for AD.
Affiliation(s)
- Chen Ming
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Minghui Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Qian Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ryan Neff
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Erming Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Qi Shen
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Joseph S. Reddy
  - Department of Quantitative Health Sciences, Mayo Clinic Florida, Jacksonville, Florida, USA
- Xue Wang
  - Department of Quantitative Health Sciences, Mayo Clinic Florida, Jacksonville, Florida, USA
- Mariet Allen
  - Department of Neuroscience, Mayo Clinic Florida, Jacksonville, Florida, USA
- Nilüfer Ertekin‐Taner
  - Department of Neuroscience, Mayo Clinic Florida, Jacksonville, Florida, USA
  - Department of Neurology, Mayo Clinic Florida, Jacksonville, Florida, USA
- Philip L. De Jager
  - Center for Translational & Computational Neuroimmunology, Department of Neurology and the Taub Institute, Columbia University Medical Center, New York, New York, USA
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- David A. Bennett
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois, USA
- Vahram Haroutunian
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Alzheimer's Disease Research Center, Icahn School of Medicine at Mount Sinai, New York, New York
  - Psychiatry, JJ Peters VA Medical Center, Bronx, New York, USA
- Eric Schadt
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
12
Cosenza MR, Rodriguez-Martin B, Korbel JO. Structural Variation in Cancer: Role, Prevalence, and Mechanisms. Annu Rev Genomics Hum Genet 2022; 23:123-152. [DOI: 10.1146/annurev-genom-120121-101149] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Somatic rearrangements resulting in genomic structural variation drive malignant phenotypes by altering the expression or function of cancer genes. Pan-cancer studies have revealed that structural variants (SVs) are the predominant class of driver mutation in most cancer types, but because they are difficult to discover, they remain understudied when compared with point mutations. This review provides an overview of the current knowledge of somatic SVs, discussing their primary roles, prevalence in different contexts, and mutational mechanisms. SVs arise throughout the life history of cancer, and 55% of driver mutations uncovered by the Pan-Cancer Analysis of Whole Genomes project represent SVs. Leveraging the convergence of cell biology and genomics, we propose a mechanistic classification of somatic SVs, from simple to highly complex DNA rearrangement classes. The actions of DNA repair and DNA replication processes together with mitotic errors result in a rich spectrum of SV formation processes, with cascading effects mediating extensive structural diversity after an initiating DNA lesion has formed. Thanks to new sequencing technologies, including the sequencing of single-cell genomes, open questions about the molecular triggers and the biomolecules involved in SV formation as well as their mutational rates can now be addressed. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 23 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
13
Identification of Copy Number Alterations from Next-Generation Sequencing Data. Advances in Experimental Medicine and Biology 2022; 1361:55-74. [DOI: 10.1007/978-3-030-91836-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
14
Tu J, Yang Z, Lu N, Lu Z. Improving the efficiency of single-cell genome sequencing based on overlapping pooling strategy and CNV analysis. Royal Society Open Science 2022; 9:211330. [PMID: 35116153 PMCID: PMC8790377 DOI: 10.1098/rsos.211330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2021] [Accepted: 01/04/2022] [Indexed: 06/14/2023]
Abstract
Single-cell genome sequencing has become a useful tool in medicine and biology. However, it requires an independent library for each cell, so the cost grows with the number of cells. In this study, we report a method that efficiently analyses single-cell copy number variation (CNV) using an overlapping pooling strategy and a branch and bound (B&B) algorithm. Single cells were pooled with overlap before sequencing and were later assigned to specific types by estimating their CNV patterns with the B&B algorithm. Instead of one library per cell, only one library per pool is required. Because there are fewer pools than cells, fewer libraries are needed, which lowers the cost. In computer simulations, we pooled 80 cells with overlap into 40 or 27 pools and classified them into cell types based on CNV pattern. On average, 84% of cells in 40 pools and 76.5% of cells in 27 pools were correctly classified, while only half or one-third of the sequencing libraries were required. Combined with traditional approaches, our method is expected to significantly improve the efficiency of single-cell genome sequencing.
Affiliation(s)
- Jing Tu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China
- Zengyan Yang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China
- Na Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China
- Zuhong Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China
15
Baslan T, Kovaka S, Sedlazeck FJ, Zhang Y, Wappel R, Tian S, Lowe SW, Goodwin S, Schatz MC. High resolution copy number inference in cancer using short-molecule nanopore sequencing. Nucleic Acids Res 2021; 49:e124. [PMID: 34551429 PMCID: PMC8643650 DOI: 10.1093/nar/gkab812] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/02/2021] [Revised: 07/19/2021] [Accepted: 09/09/2021] [Indexed: 01/23/2023]
Abstract
Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference accuracy. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data shows that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost effective and expeditious manner, including for multiplex samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, which includes but is not limited to, CNAs.
Affiliation(s)
- Timour Baslan
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yanming Zhang
- Cytogenetics Laboratory, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Sha Tian
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA
16
Rosa-Teijeiro C, Wagner V, Corbeil A, d'Annessa I, Leprohon P, do Monte-Neto RL, Fernandez-Prada C. Three different mutations in the DNA topoisomerase 1B in Leishmania infantum contribute to resistance to antitumor drug topotecan. Parasit Vectors 2021; 14:438. [PMID: 34454601 PMCID: PMC8399852 DOI: 10.1186/s13071-021-04947-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/14/2021] [Accepted: 08/11/2021] [Indexed: 11/30/2022]
Abstract
Background: The evolution of drug resistance is one of the biggest challenges in leishmaniasis and has prompted the need for new antileishmanial drugs. Repurposing of approved drugs is a faster and very attractive strategy that is gaining supporters worldwide. Different anticancer topoisomerase 1B (TOP1B) inhibitors have shown strong antileishmanial activity and promising selective indices, supporting the potential repurposing of these drugs. However, cancer cells and Leishmania share the ability to become rapidly resistant. The aim of this study was to complete a whole-genome exploration of the effects caused by exposure to topotecan in order to highlight the potential mechanisms deployed by Leishmania to favor its survival in the presence of a TOP1B inhibitor. Methods: We used a combination of stepwise drug resistance selection, whole-genome sequencing, functional validation, and theoretical approaches to explore the propensity of and potential mechanisms deployed by three independent clones of L. infantum to resist the action of TOP1B inhibitor topotecan. Results: We demonstrated that L. infantum is capable of becoming resistant to high concentrations of topotecan without impaired growth ability. No gene deletions or amplifications were identified from the next-generation sequencing data in any of the three resistant lines, ruling out the overexpression of efflux pumps as the preferred mechanism of topotecan resistance. We identified three different mutations in the large subunit of the leishmanial TOP1B (Top1BF187Y, Top1BG191A, and Top1BW232R). Overexpression of these mutated alleles in the wild-type background led to high levels of resistance to topotecan. Computational molecular dynamics simulations, in both covalent and non-covalent complexes, showed that these mutations have an effect on the arrangement of the catalytic pentad and on the interaction of these residues with surrounding amino acids and DNA. This altered architecture of the binding pocket results in decreased persistence of topotecan in the ternary complex. Conclusions: This work helps elucidate the previously unclear potential mechanisms of topotecan resistance in Leishmania by mutations in the large subunit of TOP1B and provides a valuable clue for the design of improved inhibitors to combat resistance in both leishmaniasis and cancer. Our data highlight the importance of including drug resistance evaluation in drug discovery cascades.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13071-021-04947-4.
Affiliation(s)
- Chloé Rosa-Teijeiro
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada; The Research Group on Infectious Diseases in Production Animals (GREMIP), Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada
- Victoria Wagner
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada; The Research Group on Infectious Diseases in Production Animals (GREMIP), Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada
- Audrey Corbeil
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada; The Research Group on Infectious Diseases in Production Animals (GREMIP), Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada
- Ilda d'Annessa
- Medtronic EMEA, Study and Scientific Solutions, Milan, Italy
- Philippe Leprohon
- Centre de Recherche en Infectiologie du Centre de Recherche du Centre Hospitalier Universitaire de Québec, Université Laval, Quebec City, Canada
- Christopher Fernandez-Prada
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada; The Research Group on Infectious Diseases in Production Animals (GREMIP), Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, QC, Canada; Department of Microbiology and Immunology, Faculty of Medicine, McGill University, Montréal, QC, Canada
17
Filer DL, Kuo F, Brandt AT, Tilley CR, Mieczkowski PA, Berg JS, Robasky K, Li Y, Bizon C, Tilson JL, Powell BC, Bost DM, Jeffries CD, Wilhelmsen KC. Pre-capture multiplexing provides additional power to detect copy number variation in exome sequencing. BMC Bioinformatics 2021; 22:374. [PMID: 34284719 PMCID: PMC8293537 DOI: 10.1186/s12859-021-04246-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/18/2021] [Accepted: 05/18/2021] [Indexed: 11/10/2022]
Abstract
Background: As exome sequencing (ES) integrates into clinical practice, we should make every effort to utilize all information generated. Copy-number variation can lead to Mendelian disorders, but small copy-number variants (CNVs) often get overlooked or obscured by under-powered data collection. Many groups have developed methodology for detecting CNVs from ES, but existing methods often perform poorly for small CNVs and rely on large numbers of samples not always available to clinical laboratories. Furthermore, methods often rely on Bayesian approaches requiring user-defined priors in the setting of insufficient prior knowledge. This report first demonstrates the benefit of multiplexed exome capture (pooling samples prior to capture), then presents a novel detection algorithm, mcCNV ("multiplexed capture CNV"), built around multiplexed capture. Results: We demonstrate: (1) multiplexed capture reduces inter-sample variance; (2) our mcCNV method, a novel depth-based algorithm for detecting CNVs from multiplexed capture ES data, improves the detection of small CNVs. We contrast our novel approach, agnostic to prior information, with the commonly used ExomeDepth. In a simulation study, mcCNV demonstrated a favorable false discovery rate (FDR). When compared to calls made from matched genome sequencing, we find the mcCNV algorithm performs comparably to ExomeDepth. Conclusion: Implementing multiplexed capture increases power to detect single-exon CNVs. The novel mcCNV algorithm may provide a more favorable FDR than ExomeDepth. The greatest benefits of our approach derive from (1) not requiring a database of reference samples and (2) not requiring prior information about the prevalence or size of variants. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04246-w.
Affiliation(s)
- Dayne L Filer
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA; Renaissance Computing Institute, Chapel Hill, USA
- Fengshen Kuo
- Renaissance Computing Institute, Chapel Hill, USA
- Alicia T Brandt
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA
- Jonathan S Berg
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA
- Kimberly Robasky
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA; Renaissance Computing Institute, Chapel Hill, USA; UNC School of Information and Library Science, Chapel Hill, USA
- Yun Li
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA; Department of Biostatistics, UNC Gillings School of Global Public Health, Chapel Hill, USA
- Chris Bizon
- Renaissance Computing Institute, Chapel Hill, USA
- Bradford C Powell
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA; Renaissance Computing Institute, Chapel Hill, USA
- Darius M Bost
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA; Renaissance Computing Institute, Chapel Hill, USA
- Kirk C Wilhelmsen
- Department of Genetics, UNC School of Medicine, Chapel Hill, USA; Renaissance Computing Institute, Chapel Hill, USA; Department of Neurology, UNC School of Medicine, Chapel Hill, USA
18
Liu G, Zhang J. A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 12:699510. [PMID: 34262604 PMCID: PMC8273656 DOI: 10.3389/fgene.2021.699510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022]
Abstract
Next-generation sequencing technology offers a wealth of data for detecting copy number variations (CNVs) at high resolution. However, it is still challenging to correctly detect CNVs of different lengths, and new CNV detection tools are needed to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey's fences method is adopted in CBCNV to call CNVs. The performance of the proposed method is evaluated on simulated data sets and compared with that of several existing methods. The experimental results show that CBCNV performs better than several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
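As a rough illustration of the Tukey's fences step mentioned in this abstract, the sketch below flags segments whose score lies outside the interquartile fences. It is not CBCNV's implementation: the function names, the example scores, the two-sided test, and the multiplier `k=1.5` are all assumptions made for illustration.

```python
from statistics import quantiles


def tukey_fences(scores, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of scores."""
    # Quartiles computed with the inclusive method from the standard library.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outlier_segments(scores, k=1.5):
    """Return indices of segments whose score falls outside the fences."""
    lower, upper = tukey_fences(scores, k)
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


# Hypothetical abnormal scores for ten read-depth segments (not real data).
example_scores = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 3.2, 1.0, 0.1, 1.02]
flagged = flag_outlier_segments(example_scores)
print(flagged)  # segments 6 and 8 lie outside the fences
```

A larger `k` makes the fences wider and the calls more conservative; which value suits real read-depth data is not stated in the abstract.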
Affiliation(s)
- Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
19
Hadi K, Yao X, Behr JM, Deshpande A, Xanthopoulakis C, Tian H, Kudman S, Rosiene J, Darmofal M, DeRose J, Mortensen R, Adney EM, Shaiber A, Gajic Z, Sigouros M, Eng K, Wala JA, Wrzeszczyński KO, Arora K, Shah M, Emde AK, Felice V, Frank MO, Darnell RB, Ghandi M, Huang F, Dewhurst S, Maciejowski J, de Lange T, Setton J, Riaz N, Reis-Filho JS, Powell S, Knowles DA, Reznik E, Mishra B, Beroukhim R, Zody MC, Robine N, Oman KM, Sanchez CA, Kuhner MK, Smith LP, Galipeau PC, Paulson TG, Reid BJ, Li X, Wilkes D, Sboner A, Mosquera JM, Elemento O, Imielinski M. Distinct Classes of Complex Structural Variation Uncovered across Thousands of Cancer Genome Graphs. Cell 2021; 183:197-210.e32. [PMID: 33007263 DOI: 10.1016/j.cell.2020.08.006] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 11/09/2019] [Revised: 04/08/2020] [Accepted: 08/03/2020] [Indexed: 12/12/2022]
Abstract
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
Affiliation(s)
- Kevin Hadi
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Xiaotong Yao
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Julie M Behr
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Aditya Deshpande
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Huasong Tian
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Sarah Kudman
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Joel Rosiene
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Madison Darmofal
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Tri-institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Emily M Adney
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA
- Alon Shaiber
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Zoran Gajic
- New York Genome Center, New York, NY 10013, USA
- Michael Sigouros
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Kenneth Eng
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Jeremiah A Wala
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Departments of Medical Oncology and Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Minita Shah
- New York Genome Center, New York, NY 10013, USA
- Mayu O Frank
- New York Genome Center, New York, NY 10013, USA; Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Robert B Darnell
- New York Genome Center, New York, NY 10013, USA; Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10065, USA
- Mahmoud Ghandi
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Franklin Huang
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Sally Dewhurst
- Laboratory of Cell Biology and Genetics, The Rockefeller University, New York, NY 10065, USA
- John Maciejowski
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Titia de Lange
- Laboratory of Cell Biology and Genetics, The Rockefeller University, New York, NY 10065, USA
- Jeremy Setton
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Nadeem Riaz
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Immunogenomics and Precision Oncology Platform, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jorge S Reis-Filho
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Simon Powell
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- David A Knowles
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10027, USA
- Ed Reznik
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Bud Mishra
- Departments of Computer Science, Mathematics and Cell Biology, Courant Institute and NYU School of Medicine, New York University, New York, NY 10012, USA
- Rameen Beroukhim
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Departments of Medical Oncology and Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Kenji M Oman
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Carissa A Sanchez
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Mary K Kuhner
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Lucian P Smith
- Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
- Patricia C Galipeau
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Thomas G Paulson
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Brian J Reid
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Xiaohong Li
- Divisions of Human Biology and Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- David Wilkes
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Andrea Sboner
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Juan Miguel Mosquera
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Olivier Elemento
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Marcin Imielinski
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA; New York Genome Center, New York, NY 10013, USA; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
20
Halliwell JA, Baker D, Judge K, Quail MA, Oliver K, Betteridge E, Skelton J, Andrews PW, Barbaric I. Nanopore Sequencing Indicates That Tandem Amplification of Chromosome 20q11.21 in Human Pluripotent Stem Cells Is Driven by Break-Induced Replication. Stem Cells Dev 2021; 30:578-586. [PMID: 33757297 PMCID: PMC8165465 DOI: 10.1089/scd.2021.0013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022]
Abstract
Copy number variants (CNVs) are genomic rearrangements implicated in numerous congenital and acquired diseases, including cancer. The appearance of culture-acquired CNVs in human pluripotent stem cells (PSCs) has prompted concerns for their use in regenerative medicine. A particular problem in PSC is the frequent occurrence of CNVs in the q11.21 region of chromosome 20. However, the exact mechanism of origin of this amplicon remains elusive due to the difficulty in delineating its sequence and breakpoints. Here, we have addressed this problem using long-read Nanopore sequencing of two examples of this CNV, present as duplication and as triplication. In both cases, the CNVs were arranged in a head-to-tail orientation, with microhomology sequences flanking or overlapping the proximal and distal breakpoints. These breakpoint signatures point to a mechanism of microhomology-mediated break-induced replication in CNV formation, with surrounding Alu sequences likely contributing to the instability of this genomic region.
Affiliation(s)
- Jason A Halliwell
- Department of Biomedical Science, University of Sheffield, Sheffield, United Kingdom
- Duncan Baker
- Sheffield Diagnostic Genetic Services, Sheffield Children's Hospital, Sheffield, United Kingdom
- Kim Judge
- Department of Sequencing R & D, Wellcome Sanger Institute, Hinxton, United Kingdom
- Michael A Quail
- Department of Sequencing R & D, Wellcome Sanger Institute, Hinxton, United Kingdom
- Karen Oliver
- Department of Sequencing R & D, Wellcome Sanger Institute, Hinxton, United Kingdom
- Emma Betteridge
- Department of Sequencing R & D, Wellcome Sanger Institute, Hinxton, United Kingdom
- Jason Skelton
- Department of Sequencing R & D, Wellcome Sanger Institute, Hinxton, United Kingdom
- Peter W Andrews
- Department of Biomedical Science, University of Sheffield, Sheffield, United Kingdom
- Ivana Barbaric
- Department of Biomedical Science, University of Sheffield, Sheffield, United Kingdom
21
Yan C, He J, Luo J, Wang J, Zhang G, Luo H. SIns: A Novel Insertion Detection Approach Based on Soft-Clipped Reads. Front Genet 2021; 12:665812. [PMID: 33995493 PMCID: PMC8120196 DOI: 10.3389/fgene.2021.665812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Accepted: 04/06/2021] [Indexed: 11/13/2022]
Abstract
As a common type of structural variation, an insertion refers to the addition of a DNA sequence into an individual genome and is usually associated with some inherited diseases. In recent years, many methods have been proposed for detecting insertions. However, accurately calling insertions remains a challenging task. In this study, we propose a novel insertion detection approach based on soft-clipped reads, which is called SIns. First, based on the alignments between paired reads and the reference genome, SIns extracts breakpoints from soft-clipped reads and determines insertion locations. The insert size information about paired reads is then further clustered to determine the genotype, and SIns subsequently adopts Minia to assemble the insertion sequences. Experimental results show that SIns can achieve better performance than other methods in terms of the F-score value for simulated and true datasets.
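The first step described in this abstract, extracting breakpoints from soft-clipped reads, can be sketched as follows. This is not the SIns implementation: the function names, the `min_support` threshold, and the example alignments are hypothetical, and the only inputs assumed are the standard SAM `POS` (1-based leftmost aligned position) and `CIGAR` fields.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance the reference


def soft_clip_breakpoints(pos, cigar):
    """Return (side, reference position) pairs implied by soft clips."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips sit outside any soft clip, so drop them first.
    ops = [(n, op) for n, op in ops if op != "H"]
    breakpoints = []
    if ops and ops[0][1] == "S":
        # Read start is clipped: the breakpoint is the first aligned base.
        breakpoints.append(("left", pos))
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    if len(ops) > 1 and ops[-1][1] == "S":
        # Read end is clipped: the breakpoint is the last aligned base.
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints


def candidate_sites(alignments, min_support=2):
    """Return positions supported by at least min_support clipped reads."""
    counts = Counter(
        bp for pos, cigar in alignments
        for _, bp in soft_clip_breakpoints(pos, cigar)
    )
    return sorted(p for p, c in counts.items() if c >= min_support)


# Hypothetical (POS, CIGAR) pairs, not taken from any real dataset.
example = [(100, "60M40S"), (110, "50M50S"), (160, "30S70M"), (300, "100M")]
print(candidate_sites(example))
```

Exact-position counting is the simplest possible grouping; a practical caller would likely tolerate a few bases of slack between clipped reads and would pair right-clipped and left-clipped evidence on either side of a site.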
Affiliation(s)
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junyi He
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
22
Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 06/09/2023]
Abstract
Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained with the isolation forest algorithm on subsamples of the measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data, even at low coverage and low tumor purity. The detection results on tumor samples can help evaluate known cancer genes and predict target drugs for disease diagnosis.
23
Tarabichi M, Salcedo A, Deshwar AG, Leathlobhair MN, Wintersinger J, Wedge DC, Loo PV, Morris QD, Boutros PC. A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat Methods 2021; 18:144-155. [PMID: 33398189 PMCID: PMC7867630 DOI: 10.1038/s41592-020-01013-2] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 09/28/2019] [Accepted: 11/09/2020] [Indexed: 01/28/2023]
Abstract
Subclonal reconstruction from bulk tumor DNA sequencing has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes. We provide an outline of the complex computational approaches used for subclonal reconstruction from single and multiple tumor samples. We identify the underlying assumptions and uncertainties in each step and suggest best practices for analysis and quality assessment. This guide provides a pragmatic resource for the growing user community of subclonal reconstruction methods.
Affiliation(s)
- Maxime Tarabichi
  - The Francis Crick Institute, London, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Adriana Salcedo
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Department of Human Genetics, University of California, Los Angeles
  - Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles
  - Institute for Precision Health, University of California, Los Angeles
  - Ontario Institute for Cancer Research, Toronto, Canada
- Amit G. Deshwar
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto, Toronto, Canada
- Máire Ni Leathlobhair
  - Big Data Institute, University of Oxford, Oxford, United Kingdom
  - Ludwig Institute for Cancer Research, University of Oxford, Oxford, United Kingdom
- Jeff Wintersinger
  - Department of Computer Science, University of Toronto, Toronto, Canada
- David C. Wedge
  - Big Data Institute, University of Oxford, Oxford, United Kingdom
  - Oxford NIHR Biomedical Research Centre, Oxford, United Kingdom
  - Manchester Cancer Research Centre, University of Manchester, Manchester, United Kingdom
- Quaid D. Morris
  - Ontario Institute for Cancer Research, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Vector Institute, Toronto, Canada
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Paul C. Boutros
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Department of Human Genetics, University of California, Los Angeles
  - Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles
  - Institute for Precision Health, University of California, Los Angeles
  - Vector Institute, Toronto, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada
  - Department of Urology, David Geffen School of Medicine, University of California, Los Angeles
24
Statistical Considerations on NGS Data for Inferring Copy Number Variations. Methods Mol Biol 2021; 2243:27-58. [PMID: 33606251 DOI: 10.1007/978-1-0716-1103-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technology has revolutionized research in genetics and genomics, producing massive amounts of NGS data and opening more fronts on which to answer unresolved issues in genetics. NGS data are usually stored at three levels: image files, sequence tags, and aligned reads. The sizes of these types of data usually range from several hundred gigabytes to several terabytes. Biostatisticians and bioinformaticians typically work with the aligned NGS read count data (the last level of NGS data) for data modeling and interpretation. Focusing on one use of NGS technology, researchers profile the whole genome to study DNA copy number variations (CNVs) for an individual subject (or patient) as well as for groups of subjects (or patients). The resulting aligned NGS read count data are then modeled with appropriate mathematical and statistical approaches so that the loci of CNVs can be accurately detected. This book chapter summarizes the most widely used statistical methods for detecting CNVs using NGS data. The goal is to provide readers with a comprehensive resource of available statistical approaches for inferring DNA copy number variations using NGS data.
25
Mu Q, Wang J. CNAPE: A Machine Learning Method for Copy Number Alteration Prediction from Gene Expression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:306-311. [PMID: 31581092 DOI: 10.1109/tcbb.2019.2944827] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/10/2023]
Abstract
Detection of DNA copy number alteration in cancer cells is critical to understanding cancer initiation and progression. Widely used methods, such as DNA arrays and genomic DNA sequencing, are relatively expensive and require DNA samples at the microgram level, which are not available in certain situations such as clinical biopsies or single-cell genomes. Here, we developed an alternative method, CNAPE, to computationally infer copy number alterations from gene expression data. A prior knowledge-aided machine learning model was proposed, trained and tested on 9,740 cancer samples from The Cancer Genome Atlas. We then applied CNAPE to study gliomas, the most common and aggressive brain cancer in adults. In particular, using RNA sequencing data, CNAPE predicted the DNA copy number of chromosomes, chromosomal arms, and 12 commonly altered genes, and achieved over 80 percent accuracy in almost all broad regions and some focal regions. CNAPE was developed as an easy-to-use tool at https://github.com/WangLabHKUST/CNAPE.
26
Liu G, Zhang J, Yuan X, Wei C. RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data. Front Genet 2020; 11:569227. [PMID: 33329705 PMCID: PMC7673372 DOI: 10.3389/fgene.2020.569227] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/03/2020] [Accepted: 09/04/2020] [Indexed: 12/04/2022]
Abstract
Copy number variations (CNVs) are significant causes of many human cancers and genetic diseases. The detection of CNVs from next-generation sequencing (NGS) data has become a common way to analyze human diseases. However, effective detection of insignificant CNVs is still a challenging task. In this study, we propose a new detection method, RKDOSCNV, to meet this need. RKDOSCNV uses kernel density estimation to evaluate the local kernel density distribution of each read depth segment (RDS) based on an expanded nearest neighbor data set (the k-nearest neighbors, reverse nearest neighbors, and shared nearest neighbors of each RDS), and assigns a relative kernel density outlier score (RKDOS) to each RDS. From the RKDOS profile, RKDOSCNV predicts candidate CNVs by choosing a reasonable threshold, and then uses a split-read approach to correct the boundaries of the candidate CNVs. The performance of RKDOSCNV is assessed by comparing it with several currently popular methods in experiments with simulated and real data at different tumor purity levels. The experimental results show that RKDOSCNV outperforms several other methods. In summary, RKDOSCNV is a simple and effective method for the detection of CNVs from whole genome sequencing (WGS) data, especially for samples with low tumor purity.
Affiliation(s)
- Guojun Liu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Junying Zhang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Chao Wei
  - School of Computer Science and Technology, Xidian University, Xi'an, China
27
S100A9 Upregulation Contributes to Learning and Memory Impairments by Promoting Microglia M1 Polarization in Sepsis Survivor Mice. Inflammation 2020; 44:307-320. [PMID: 32918665 DOI: 10.1007/s10753-020-01334-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/20/2020] [Revised: 08/16/2020] [Accepted: 08/25/2020] [Indexed: 12/17/2022]
Abstract
Sepsis-associated encephalopathy (SAE) is a clinical syndrome of brain dysfunction secondary to sepsis, which is characterized by long-term neurocognitive deficits such as memory, attention, and executive dysfunction. However, the mechanisms underlying SAE remain unclear. Using a transcriptome sequencing approach, we showed that hippocampal S100A9 was significantly increased in sepsis induced by cecal ligation and puncture (CLP) or lipopolysaccharide (LPS) challenge. Thus, we used the S100A9 inhibitor Paquinimod to study the role of S100A9 in cognitive impairments in CLP-induced and LPS-induced mouse models of SAE. Sepsis survivor mice underwent behavioral tests, or their hippocampal tissues were subjected to Western blotting, real-time quantitative PCR, and immunohistochemistry. Our results showed that CLP-induced and LPS-induced memory impairments were accompanied by increased expression of hippocampal microglia Iba1 and CD86 (M1 markers), but reduced expression of Arg1 (M2 marker). Notably, S100A9 inhibition significantly improved the survival rate and learning and memory impairments in sepsis survivors, with a shift from the M1 to the M2 phenotype. Taken together, our study suggests that S100A9 upregulation might contribute to learning and memory impairments by promoting microglia M1 polarization in sepsis survivors, whereas S100A9 inhibition might provide a potential therapeutic target for SAE.
28
Zaccaria S, Raphael BJ. Accurate quantification of copy-number aberrations and whole-genome duplications in multi-sample tumor sequencing data. Nat Commun 2020; 11:4301. [PMID: 32879317 PMCID: PMC7468132 DOI: 10.1038/s41467-020-17967-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 02/06/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022]
Abstract
Copy-number aberrations (CNAs) and whole-genome duplications (WGDs) are frequent somatic mutations in cancer but their quantification from DNA sequencing of bulk tumor samples is challenging. Standard methods for CNA inference analyze tumor samples individually; however, DNA sequencing of multiple samples from a cancer patient has recently become more common. We introduce HATCHet (Holistic Allele-specific Tumor Copy-number Heterogeneity), an algorithm that infers allele- and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient. We show that HATCHet outperforms current state-of-the-art methods on multi-sample DNA sequencing data that we simulate using MASCoTE (Multiple Allele-specific Simulation of Copy-number Tumor Evolution). Applying HATCHet to 84 tumor samples from 14 prostate and pancreas cancer patients, we identify subclonal CNAs and WGDs that are more plausible than previously published analyses and more consistent with somatic single-nucleotide variants (SNVs) and small indels in the same samples.
Affiliation(s)
- Simone Zaccaria
  - Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
29
Thermodynamic energetics underlying genomic instability and whole-genome doubling in cancer. Proc Natl Acad Sci U S A 2020; 117:18880-18890. [PMID: 32694208 DOI: 10.1073/pnas.1920870117] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Genomic instability contributes to tumorigenesis through the amplification and deletion of cancer driver genes. DNA copy number (CN) profiling of ensembles of tumors allows a thermodynamic analysis of the profile for each tumor. The free energy of the distribution of CNs is found to be a monotonically increasing function of the average chromosomal ploidy. The dependence is universal across several cancer types. Surprisal analysis distinguishes two main known subgroups: tumors with cells that have or have not undergone whole-genome duplication (WGD). The analysis uncovers that CN states having a narrower distribution are energetically more favorable toward the WGD transition. Surprisal analysis also determines the deviations from a fully stable-state distribution. These deviations reflect constraints imposed by tumor fitness selection pressures. The results point to CN changes that are more common in high-ploidy tumors and thus support altered selection pressures upon WGD.
30
Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data. PLoS Comput Biol 2020; 16:e1008012. [PMID: 32658894 PMCID: PMC7377518 DOI: 10.1371/journal.pcbi.1008012] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/07/2019] [Revised: 07/23/2020] [Accepted: 06/03/2020] [Indexed: 12/22/2022]
Abstract
Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations that are unique to each of these methods is very important for obtaining accurate copy number profiles from single-cell DNA sequencing data. We benchmarked three widely used methods–Ginkgo, HMMcopy, and CopyNumber–on simulated as well as real datasets. To facilitate this, we developed a novel simulator of single-cell genome evolution in the presence of CNAs. Furthermore, to assess performance on empirical data where the ground truth is unknown, we introduce a phylogeny-based measure for identifying potentially erroneous inferences. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, our findings show that even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, with the large datasets being generated, the methods must be computationally efficient. Copy number aberrations, or CNAs, refer to evolutionary events that act on cancer genomes by deleting segments of the genomes or introducing new copies of existing segments. These events have been implicated in various types of cancer; consequently, their accurate detection could shed light on the initiation and progression of tumor, as well as on the development of potential targeted therapeutics. Single-cell DNA sequencing technologies are now producing the type of data that would allow such detection at the resolution of individual cells. 
However, to achieve this detection task, methods have to implement several steps of “data wrangling” and dealing with technical artifacts. In this work, we benchmarked three widely used methods for CNA detection from single-cell DNA data, namely Ginkgo, HMMcopy, and CopyNumber. To accomplish this study, we developed a novel simulator and devised a phylogeny-based measure of potentially erroneous CNA calls. We find that none of these methods has high accuracy, and all of them can be computationally very demanding. These findings call for the development of more accurate and more efficient methods for CNA detection from single-cell DNA data.
31
Zhang H, Song Y, Du Z, Li X, Zhang J, Chen S, Chen F, Li T, Zhan Q. Exome sequencing identifies new somatic alterations and mutation patterns of tongue squamous cell carcinoma in a Chinese population. J Pathol 2020; 251:353-364. [PMID: 32432340 DOI: 10.1002/path.5467] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/02/2019] [Revised: 04/08/2020] [Accepted: 05/07/2020] [Indexed: 12/21/2022]
Abstract
Tongue squamous cell carcinoma (TSCC) is an aggressive group of tumors characterized by high rates of regional lymph node metastasis and local recurrence. Emerging evidence has revealed genetic variations of TSCC across different geographical regions due to the impact of multiple risk factors such as chewing betel-quid. However, we know little of the mutational processes of TSCC in the Chinese population without the history of chewing betel-quid/tobacco. To explore the mutational spectrum of this disease, we performed whole-exome sequencing of sample pairs, comprising tumors and normal tissue, from 82 TSCC patients. In addition to identifying seven previously known TSCC-associated genes (TP53, CDKN2A, PIK3CA, NOTCH1, ASXL1, USH2A, and CSMD3), the analysis revealed six new genes (GNAQ, PRG4, RP1, ZNF16, NBEA, and PTPRC) that had not been reported previously in TSCC. Our in vitro experiments identified ZNF16 for the first time as a solid tumor associated gene to promote malignancy of TSCC cells. We also identified a microRNA (miR-585-5p) encoded by the 5q35.1 region and characterized it as a tumor suppressor by targeting SOX9. At least one non-silent mutation of genes involved in the 10 canonical oncogenic pathways (Notch, RTK-RAS, PI3K, Wnt, Cell cycle, p53, Myc, Hippo, TGFβ, and Nrf2) was found in 82.9% of cases. Collectively, our data extend the spectrum of TSCC mutations and define novel diagnosis markers and potential clinical targets for TSCC. © 2020 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Affiliation(s)
- Heyu Zhang
  - Central Laboratory, Peking University School and Hospital of Stomatology, Beijing, PR China
  - National Clinical Research Center for Oral Diseases, Peking University School and Hospital of Stomatology, Beijing, PR China
  - Research Unit of Precision Pathologic Diagnosis in Tumors of the Oral and Maxillofacial Regions, Chinese Academy of Medical Sciences (2019RU034), Beijing, PR China
- Yongmei Song
  - State Key Laboratory of Molecular Oncology, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
- Zhenglin Du
  - China National Center for Bioinformation & National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China
- Xuefen Li
  - Central Laboratory, Peking University School and Hospital of Stomatology, Beijing, PR China
- Jianyun Zhang
  - Department of Oral Pathology, Peking University School and Hospital of Stomatology, Beijing, PR China
  - Research Unit of Precision Pathologic Diagnosis in Tumors of the Oral and Maxillofacial Regions, Chinese Academy of Medical Sciences (2019RU034), Beijing, PR China
- Shuai Chen
  - Department of Oral Pathology, Peking University School and Hospital of Stomatology, Beijing, PR China
- Feng Chen
  - Central Laboratory, Peking University School and Hospital of Stomatology, Beijing, PR China
- Tiejun Li
  - Central Laboratory, Peking University School and Hospital of Stomatology, Beijing, PR China
  - Department of Oral Pathology, Peking University School and Hospital of Stomatology, Beijing, PR China
  - National Clinical Research Center for Oral Diseases, Peking University School and Hospital of Stomatology, Beijing, PR China
  - Research Unit of Precision Pathologic Diagnosis in Tumors of the Oral and Maxillofacial Regions, Chinese Academy of Medical Sciences (2019RU034), Beijing, PR China
- Qimin Zhan
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Laboratory of Molecular Oncology, Peking University Cancer Hospital & Institute, Beijing, PR China
  - State Key Laboratory of Molecular Oncology, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, PR China
32
Yuan X, Bai J, Zhang J, Yang L, Duan J, Li Y, Gao M. CONDEL: Detecting Copy Number Variation and Genotyping Deletion Zygosity from Single Tumor Samples Using Sequence Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1141-1153. [PMID: 30489272 DOI: 10.1109/tcbb.2018.2883333] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 06/09/2023]
Abstract
Characterizing copy number variations (CNVs) from sequenced genomes is both a feasible and cost-effective way to search for driver genes in cancer diagnosis. A number of existing algorithms for CNV detection explore only part of the features underlying sequence data and copy number structures, resulting in limited performance. Here, we describe CONDEL, a method for detecting CNVs from single tumor samples using high-throughput sequence data. CONDEL utilizes a novel statistic in combination with a peel-off scheme to assess the statistical significance of genome bins, and adopts a Bayesian approach to infer copy number gains, losses, and deletion zygosity based on statistical mixture models. We compare CONDEL to six peer methods on a large number of simulation datasets, showing improved performance in terms of true positive and false positive rates, and further validate CONDEL on three real datasets derived from the 1000 Genomes Project and the EGA archive. CONDEL obtained more consistent results than three other single-sample methods, and was the only method to identify a number of CNVs that were previously associated with cancers. We conclude that CONDEL is a powerful tool for detecting copy number variations in single tumor samples, even when these are sequenced at low coverage.
33
Wei YC, Huang GH. CONY: A Bayesian procedure for detecting copy number variations from sequencing read depths. Sci Rep 2020; 10:10493. [PMID: 32591545 PMCID: PMC7319969 DOI: 10.1038/s41598-020-64353-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/11/2019] [Accepted: 04/15/2020] [Indexed: 12/26/2022]
Abstract
Copy number variations (CNVs) are genomic structural mutations consisting of abnormal numbers of fragment copies. Read-depth signals from next-generation sequencing reflect these variants. Some tools that predict CNVs from read depth have been published, but most of them can be applied to only a specific data type due to modeling limitations. We develop CONY, a tool for copy number variation detection that adopts a Bayesian hierarchical model and an efficient reversible-jump Markov chain Monte Carlo inference algorithm for read-depth data from whole genome sequencing. CONY can be applied not only to individual samples for estimating the absolute number of copies but also to case-control pairs for detecting patient-specific variations. We evaluate the performance of CONY and compare it with competing approaches through simulations and by using experimental data from the 1000 Genomes Project. CONY outperforms the other methods in terms of accuracy in both single-sample and paired-sample analyses. In addition, CONY performs well regardless of whether the data coverage is high or low. CONY is useful for detecting both absolute and relative CNVs from read-depth data. The package is available at https://github.com/weiyuchung/CONY.
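The distinction this abstract draws between absolute (single-sample) and relative (case-control) read-depth signals can be made concrete with a small sketch. This is not CONY's Bayesian hierarchical model or its reversible-jump MCMC sampler; it swaps in plain depth normalization. The function names, the diploid baseline, the library-size scaling by total depth, and the `pseudocount` value are all assumptions chosen for illustration.

```python
import math
from statistics import median


def absolute_copy_number(depths, baseline_ploidy=2):
    """Single-sample estimate of integer copies per bin.

    Assumes the median bin is at the baseline ploidy (diploid by default),
    so each bin is scaled relative to the median depth and rounded.
    """
    ref = median(depths)
    if ref <= 0:
        raise ValueError("median depth must be positive")
    return [round(baseline_ploidy * d / ref) for d in depths]


def relative_log2_ratio(case, control, pseudocount=0.5):
    """Case-control signal: per-bin log2(case / control).

    The case sample is first rescaled to the control's total depth, a crude
    library-size correction. The pseudocount avoids division by zero.
    """
    if len(case) != len(control):
        raise ValueError("case and control must cover the same bins")
    scale = sum(control) / sum(case)
    return [
        math.log2((c * scale + pseudocount) / (k + pseudocount))
        for c, k in zip(case, control)
    ]


# Toy per-bin read depths, invented for illustration only.
single_sample = [30, 31, 29, 60, 15, 30, 0]
print(absolute_copy_number(single_sample))  # [2, 2, 2, 4, 1, 2, 0]

case = [100, 100, 200, 100]
control = [50, 50, 50, 50]
print(relative_log2_ratio(case, control))
```

In the paired example the third bin stands out with a positive log2 ratio, while the other bins dip slightly below zero because the gain itself inflates the case library size. That bias is one reason published tools rely on more careful statistical models than this normalization.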
Affiliation(s)
- Yu-Chung Wei
  - Graduate Institute of Statistics and Information Science, National Changhua University of Education, No.1 Jinde Road, Changhua City, Changhua County, 50007, Taiwan
- Guan-Hua Huang
  - Institute of Statistics, National Chiao Tung University, 1001 University Road, Hsinchu, 30010, Taiwan
34
Jian M, Ren L, He G, Lin Q, Tang W, Chen Y, Chen J, Liu T, Ji M, Wei Y, Chang W, Xu J. A novel patient-derived organoids-based xenografts model for preclinical drug response testing in patients with colorectal liver metastases. J Transl Med 2020; 18:234. [PMID: 32532289 PMCID: PMC7291745 DOI: 10.1186/s12967-020-02407-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/20/2020] [Accepted: 06/05/2020] [Indexed: 12/11/2022]
Abstract
Background: Cancer-related mortality in patients with colorectal cancer (CRC) is predominantly caused by the development of colorectal liver metastases (CLMs). Identifying effective chemotherapy and targeted therapy is key to improving the prognosis of patients with CLMs. This study aims to develop a patient-derived organoids-based xenografted liver metastases (PDOX-LM) model of CRC that recapitulates the clinical drug response. Methods: We transplanted organoids derived from human CRC primary tumors into the murine spleen to obtain xenografted liver metastases in the murine liver. Immunohistochemistry (IHC) staining, whole-exome and RNA sequencing, and drug response testing were used to assess the similarity in biological and genetic characteristics, and in drug response, between the PDOX-LM models and donor liver metastases. Results: We successfully established PDOX-LM models from patients with CLMs. IHC staining showed that positive expression of CEA, Ki67, VEGF, and FGFR2 in donor liver metastases was well preserved in matched xenografted liver metastases. Whole-exome sequencing and transcriptome analysis showed that xenografted and donor liver metastases were highly concordant in somatic variants (≥ 0.90 frequency of concordance) and co-expression of driver genes (Pearson’s correlation coefficient reaching up to 0.99, P = 0.001). Furthermore, drug response testing showed that the PDOX-LM models closely recapitulated the clinical response to mFOLFOX6 regimens. Conclusions: This PDOX-LM model provides a more convenient and informative platform for preclinical testing of individual tumors by retaining the histologic and genetic features of donor liver metastases. This technology holds great promise for predicting treatment sensitivity in patients with CLMs undergoing chemotherapy.
Affiliation(s)
- Mi Jian
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
- Li Ren
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Guodong He
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Qi Lin
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Wentao Tang
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Yijiao Chen
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
- Jingwen Chen
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Tianyu Liu
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
- Meiling Ji
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Ye Wei
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Wenju Chang
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
- Jianmin Xu
  - Department of General Surgery, Zhongshan Hospital, Fudan University, No. 180 Fenglin Road, Shanghai, 200030, China
  - Shanghai Engineering Research Center of Colorectal Cancer Minimally Invasive, Shanghai, 200030, China
|
35
|
Douanne N, Wagner V, Roy G, Leprohon P, Ouellette M, Fernandez-Prada C. MRPA-independent mechanisms of antimony resistance in Leishmania infantum. International Journal for Parasitology-Drugs and Drug Resistance 2020; 13:28-37. [PMID: 32413766 PMCID: PMC7225602 DOI: 10.1016/j.ijpddr.2020.03.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/08/2020] [Revised: 03/25/2020] [Accepted: 03/27/2020] [Indexed: 12/30/2022]
Abstract
Control of both human and canine leishmaniasis is based on a very short list of chemotherapeutic agents, headed by antimonial derivatives (Sb). The utility of these molecules is severely threatened by high rates of drug resistance. The ABC transporter MRPA is one of the few key Sb resistance proteins described to date, and its role in detoxification has been thoroughly studied in Leishmania parasites. Nonetheless, its rapid amplification during drug selection complicates the discovery of other mechanisms potentially involved in Sb resistance. In this study, stepwise drug-resistance selection and next-generation sequencing were combined in the search for novel Sb-resistance mechanisms deployed by parasites when MRPA is abolished by targeted gene disruption. The gene mrpA is not essential in L. infantum, and its disruption leads to an Sb-hypersensitive phenotype in both promastigotes and amastigotes. Five independent mrpA-/- mutants were selected for antimony resistance. These mutants displayed major changes in their ploidy, as well as extrachromosomal linear amplifications of the subtelomeric region of chromosome 23, which includes the genes coding for ABCC1 and ABCC2. Overexpression of ABCC2, but not of ABCC1, resulted in increased Sb tolerance in the mrpA-/- mutant. SNP analyses revealed three different heterozygous mutations in the gene coding for a serine acetyltransferase (SAT) involved in de novo cysteine synthesis in Leishmania. Overexpression of the satQ390K, satG321R and satG325R variants led to a 2- to 3.2-fold increase in Sb resistance in mrpA-/- parasites. Only satG321R and satG325R induced increased Sb resistance in wild-type parasites. These results reinforce and expand knowledge on the complex nature of Sb resistance in Leishmania parasites.
Affiliation(s)
- Noélie Douanne
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire Université de Montréal, Saint-Hyacinthe, Québec, Canada
| | - Victoria Wagner
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire Université de Montréal, Saint-Hyacinthe, Québec, Canada
| | - Gaetan Roy
- Centre de Recherche en Infectiologie du Centre de Recherche du CHU Québec and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec, Québec, Canada
| | - Philippe Leprohon
- Centre de Recherche en Infectiologie du Centre de Recherche du CHU Québec and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec, Québec, Canada
| | - Marc Ouellette
- Centre de Recherche en Infectiologie du Centre de Recherche du CHU Québec and Département de Microbiologie, Infectiologie et Immunologie, Faculté de Médecine, Université Laval, Québec, Québec, Canada
| | - Christopher Fernandez-Prada
- Département de Pathologie et Microbiologie, Faculté de Médecine Vétérinaire Université de Montréal, Saint-Hyacinthe, Québec, Canada; Department of Microbiology and Immunology, Faculty of Medicine, McGill University, Montréal, Québec, Canada.
|
36
|
Lee J, Chen J. A modified information criterion for tuning parameter selection in 1d fused LASSO for inference on multiple change points. J Stat Comput Sim 2020. [DOI: 10.1080/00949655.2020.1732379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Affiliation(s)
- J. Lee
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
| | - J. Chen
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
|
37
|
Xi J, Li A, Wang M. HetRCNA: A Novel Method to Identify Recurrent Copy Number Alternations from Heterogeneous Tumor Samples Based on Matrix Decomposition Framework. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:422-434. [PMID: 29994262 DOI: 10.1109/tcbb.2018.2846599] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/08/2023]
Abstract
A common strategy for discovering cancer-associated copy number aberrations (CNAs) from a cohort of cancer samples is to detect recurrent CNAs (RCNAs). Although previous methods can successfully identify communal RCNAs shared by nearly all tumor samples, they cannot detect subgroup-specific RCNAs and their related subgroup samples in heterogeneous cancer cohorts. In this paper, we introduce a novel integrated method called HetRCNA, which can identify statistically significant subgroup-specific RCNAs and their related subgroup samples. Based on a matrix decomposition framework with a weight constraint, HetRCNA identifies the subgroup samples from the coefficients of the left vectors under the weight constraint, and the subgroup-specific RCNAs from the coefficients of the right vectors together with a significance test. When we evaluate HetRCNA on a simulated dataset, the results show that HetRCNA gives the best performance among the competing methods and is robust to the noise factors of the simulated data. When HetRCNA is applied to a real breast cancer dataset, our approach successfully identifies a number of RCNA regions, and the result is highly correlated with the results of the other two investigated approaches. Notably, the genomic regions identified by HetRCNA harbor many breast cancer-related genes reported in previous research.
|
38
|
Li LJ, Wang YB, Qu PF, Ma L, Liu K, Yang L, Nie SJ, Xi YM, Jia PL, Tang X, Sun ZC, Huang WL, Li YH, Dong Y, Lei PP. Genetic analysis of Yunnan sudden unexplained death by whole genome sequencing in Southwest of China. J Forensic Leg Med 2020; 70:101896. [PMID: 32090967 DOI: 10.1016/j.jflm.2020.101896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2019] [Revised: 12/03/2019] [Accepted: 01/05/2020] [Indexed: 11/16/2022]
Affiliation(s)
- Lan-Jiang Li
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Yue-Bing Wang
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Peng-Fei Qu
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Lin Ma
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Kai Liu
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Lin Yang
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Sheng-Jie Nie
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Yan-Mei Xi
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Peng-Lin Jia
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Xue Tang
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Zhong-Chun Sun
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Wen-Li Huang
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Yu-Hua Li
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
| | - Yi Dong
- Yunnan Institute of Endemic Disease Control and Prevention, Yunnan Province, China.
| | - Pu-Ping Lei
- Department of Forensic Medicine, Kunming Medical University, Yunnan Province, China.
|
39
|
Development and Validation of a 34-Gene Inherited Cancer Predisposition Panel Using Next-Generation Sequencing. BioMed Research International 2020; 2020:3289023. [PMID: 32090079 PMCID: PMC6998746 DOI: 10.1155/2020/3289023] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/27/2019] [Accepted: 08/04/2019] [Indexed: 12/18/2022]
Abstract
The use of genetic testing to identify individuals with hereditary cancer syndromes has been widely adopted by clinicians for management of inherited cancer risk. The objective of this study was to develop and validate a 34-gene inherited cancer predisposition panel using targeted capture-based next-generation sequencing (NGS). The panel incorporates genes underlying well-characterized cancer syndromes, such as BRCA1 and BRCA2 (BRCA1/2), along with more recently discovered genes associated with increased cancer risk. We performed a validation study on 133 unique specimens, including 33 with known variant status; known variants included single nucleotide variants (SNVs) and small insertions and deletions (Indels), as well as copy-number variants (CNVs). The analytical validation study achieved 100% sensitivity and specificity for SNVs and small Indels, and 100% sensitivity and 98.0% specificity for CNVs using an in-house developed CNV flagging algorithm. For confirmation, we applied a microarray comparative genomic hybridization (aCGH) method to all specimens that the algorithm flagged as CNV-positive. In combination with aCGH confirmation, CNV detection specificity improved to 100%. We additionally report results of the first 500 consecutive specimens submitted for clinical testing with the 34-gene panel, identifying 53 deleterious variants in 13 genes in 49 individuals. Half of the detected pathogenic/likely pathogenic variants were found in BRCA1 (23%), BRCA2 (23%), or the Lynch syndrome-associated genes PMS2 (4%) and MLH1 (2%). The other half were detected in 9 other genes: MUTYH (17%), CHEK2 (15%), ATM (4%), PALB2 (4%), BARD1 (2%), CDH1 (2%), CDKN2A (2%), RAD51C (2%), and RET (2%). Our validation studies and initial clinical data demonstrate that a 34-gene inherited cancer predisposition panel can provide clinically significant information for cancer risk assessment.
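As a reminder of how the sensitivity and specificity figures quoted in this abstract are defined, the following minimal sketch computes both from confusion-matrix counts. The counts are hypothetical, chosen only so that specificity comes out at 98.0%; the abstract does not report the underlying numbers, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive specimens that the assay flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly negative specimens that the assay leaves unflagged."""
    return tn / (tn + fp)


# Hypothetical counts (NOT from the study), for illustration only.
cnv_counts = {"tp": 10, "fn": 0, "tn": 98, "fp": 2}

cnv_sensitivity = sensitivity(cnv_counts["tp"], cnv_counts["fn"])
cnv_specificity = specificity(cnv_counts["tn"], cnv_counts["fp"])

# If an orthogonal confirmation step rejects every false-positive flag,
# those specimens are reported as negative and specificity rises to 100%.
confirmed_specificity = specificity(cnv_counts["tn"] + cnv_counts["fp"], 0)

print(f"sensitivity={cnv_sensitivity:.3f}, specificity={cnv_specificity:.3f}, "
      f"after confirmation={confirmed_specificity:.3f}")
```

The last step mirrors the two-stage design described in the abstract, where a confirmatory assay is applied only to specimens flagged as CNV-positive.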
|
40
|
Luo F. A systematic evaluation of copy number alterations detection methods on real SNP array and deep sequencing data. BMC Bioinformatics 2019; 20:692. [PMID: 31874603 PMCID: PMC6929333 DOI: 10.1186/s12859-019-3266-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Copy number alterations (CNAs) are tightly associated with cancers, so detecting them accurately is one of the most important tasks in cancer genomics. A series of CNA detection methods have been proposed, and new ones are still being developed. Because of the complexity of CNAs in cancers, no CNA detection method has been accepted as the gold-standard caller. Several evaluation studies have attempted to characterize the performance of typical CNA detection methods. Limited by the scale of their evaluation data, these comparisons have not reached a consensus, and researchers remain unsure how to choose a suitable CNA caller for their analysis. A more comprehensive evaluation of typical CNA detection methods is therefore needed. RESULTS In this work, we use a large-scale real dataset from the CAGEKID consortium to evaluate a total of 12 typical CNA detection methods. These methods are the most widely used in cancer research and are routinely used as benchmarks for newly proposed CNA detection methods. The dataset comprises SNP array data on 94 samples and whole genome sequencing data on 10 samples. Evaluations cover the current scenarios of CNA detection: on SNP array data, on sequencing data with matched tumor and normal samples, and on sequencing data with a single tumor sample. First, the three SNP-based methods are ranked. Subsequently, the results of the best SNP-based method are used as the benchmark to compare six matched-sample methods and three single-tumor-sample methods in terms of preprocessing, recall rate, Jaccard index and segmentation characteristics. CONCLUSIONS Our survey thoroughly reveals the strengths and weaknesses of 12 typical methods. We explain, from a methodological standpoint, why the methods show specific characteristics. Finally, we present guiding principles for choosing a suitable CNA detection method under specific conditions. Some unsolved problems and expectations for upcoming CNA detection methods are also addressed.
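The recall rate and Jaccard index named in this abstract can be illustrated with a small sketch that compares a caller's segments against benchmark segments. It assumes a base-pair-level definition on a single chromosome with half-open `(start, end)` intervals; the abstract does not say whether the study measures overlap per base or per segment, and all function names here are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: List[Interval]) -> int:
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(calls: List[Interval], benchmark: List[Interval]) -> float:
    inter = overlap_length(calls, benchmark)
    union = covered_length(calls) + covered_length(benchmark) - inter
    return inter / union if union else 1.0


def recall(calls: List[Interval], benchmark: List[Interval]) -> float:
    bench = covered_length(benchmark)
    return overlap_length(calls, benchmark) / bench if bench else 1.0


# Toy coordinates, not real data.
benchmark_segments = [(100, 200), (300, 400)]
called_segments = [(150, 250), (300, 350)]
print(jaccard(called_segments, benchmark_segments))  # 100 / 250 = 0.4
print(recall(called_segments, benchmark_segments))   # 100 / 200 = 0.5
```

Recall rewards covering the benchmark, while the Jaccard index also penalizes calls that extend beyond it, which is why the two are usually reported together.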
Affiliation(s)
- Fei Luo
- School of Computer Science, Wuhan University, Wuhan, China.
|
41
|
Zhou Z, Wang W, Wang LS, Zhang NR. Integrative DNA copy number detection and genotyping from sequencing and array-based platforms. Bioinformatics 2019; 34:2349-2355. [PMID: 29992253 DOI: 10.1093/bioinformatics/bty104] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/12/2017] [Accepted: 02/22/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous single-nucleotide polymorphism (SNP)-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by the utilization of allele-specific reads. Results We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, utilizes allele-specific reads from sequencing and integrates matched NGS and SNP-array data by a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to naïve intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only and show that the utilization of allele-specific reads improves CNV detection accuracy compared to existing methods. Availability and implementation https://github.com/zhouzilu/iCNV. Supplementary information Supplementary data are available at Bioinformatics online.
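To make the hidden Markov model idea in this abstract concrete, here is a deliberately simplified sketch: a three-state HMM (deletion, normal, duplication) decoded with the Viterbi algorithm over a series of log2 coverage ratios. It is not the iCNV model, which additionally uses platform-specific normalization, allele-specific reads and array data; the state means, standard deviation and transition probability below are all assumed values for illustration.

```python
import math
from typing import Dict, List, Sequence

STATES = ("deletion", "normal", "duplication")
# Assumed emission parameters: mean log2 ratio per state, shared std dev.
MEANS = {"deletion": -1.0, "normal": 0.0, "duplication": 0.58}
SD = 0.3
STAY = 0.98  # assumed probability of remaining in the same state


def log_emission(x: float, state: str) -> float:
    """Log density of a Gaussian emission for the given state."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(prev: str, cur: str) -> float:
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios: Sequence[float]) -> List[str]:
    """Most likely state path under the toy HMM (uniform initial state)."""
    if not log_ratios:
        return []
    score: Dict[str, float] = {
        s: math.log(1.0 / len(STATES)) + log_emission(log_ratios[0], s)
        for s in STATES
    }
    backpointers: List[Dict[str, str]] = []
    for x in log_ratios[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(x, cur))
            pointers[cur] = best_prev
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy log2 ratios with an apparent three-window deletion in the middle.
toy_ratios = [0.02, -0.05, 0.1, -0.9, -1.1, -1.0, 0.05, 0.0]
toy_path = viterbi(toy_ratios)
print(toy_path)
```

The high self-transition probability is what lets the model call contiguous segments instead of reacting to every noisy window; a real integrative caller would replace the single Gaussian emission with a likelihood that combines evidence from each platform.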
Affiliation(s)
- Zilu Zhou
- Graduate Group in Genomics and Computational Biology
| | - Weixin Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine
| | - Nancy Ruonan Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
|
42
|
Detection of de novo genetic variants in Mayer-Rokitansky-Küster-Hauser syndrome by whole genome sequencing. Eur J Obstet Gynecol Reprod Biol X 2019; 4:100089. [PMID: 31517310 PMCID: PMC6728744 DOI: 10.1016/j.eurox.2019.100089] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/07/2019] [Revised: 06/09/2019] [Accepted: 07/28/2019] [Indexed: 11/22/2022] Open
Abstract
Objective The aim of this study was to use whole genome sequencing (WGS) to help detect de novo mutations or pathogenic genes in Mayer-Rokitansky-Küster-Hauser syndrome type 1 (MRKH syndrome type 1). Study design This was a case-parent trio study. Nine unrelated probands with MRKH syndrome type 1 and their parents were enrolled. Enrollment, sequencing, establishment of the de novo mutation detection procedure and the experimental work were performed over a 2-year period. Results We detected 632 de novo single nucleotide variants (SNVs), 267 de novo small insertions/deletions (indels), 39 de novo structural variations (SVs) and 28 de novo copy number alterations (CNAs). Three novel damaging coding de novo SNVs in three genes (PIK3CD, SLC4A10 and TNK2) were revealed. Two SNVs were annotated in the promoter region of NBPF10 and the 3'UTR of NOTCH2NL, potentially contributing to the pathogenesis of MRKH. Conclusion We identified five de novo mutations in BAZ2B, KLHL18, PIK3CD, SLC4A10 and TNK2 by performing WGS; the functional involvement of all deleterious mutations in MRKH candidate genes of the trios warrants further study. WGS may complement conventional arrays to capture the complete landscape of the genome in MRKH.
|
43
|
Prabakar RK, Xu L, Hicks J, Smith AD. SMURF-seq: efficient copy number profiling on long-read sequencers. Genome Biol 2019; 20:134. [PMID: 31287019 PMCID: PMC6615205 DOI: 10.1186/s13059-019-1732-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/30/2018] [Accepted: 06/06/2019] [Indexed: 12/21/2022] Open
Abstract
We present SMURF-seq, a protocol to efficiently sequence short DNA molecules on a long-read sequencer by randomly ligating them to form long molecules. Applying SMURF-seq using the Oxford Nanopore MinION yields up to 30 fragments per read, providing an average of 6.2 and up to 7.5 million mappable fragments per run, increasing information throughput for read-counting applications. We apply SMURF-seq on the MinION to generate copy number profiles. A comparison with profiles from Illumina sequencing reveals that SMURF-seq attains similar accuracy. More broadly, SMURF-seq expands the utility of long-read sequencers for read-counting applications.
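The "read-counting" use case mentioned in this abstract can be sketched as follows: mapped fragment positions are counted in fixed-width genomic bins, and each bin count is converted to a log2 ratio against the median bin. This is a generic illustration, not the SMURF-seq pipeline; the fixed bin width, the pseudocount and the median normalization are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Sequence


def bin_counts(positions: Sequence[int], bin_size: int, n_bins: int) -> List[int]:
    """Count mapped fragment start positions per fixed-width bin.

    Positions falling beyond the last bin are ignored.
    """
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def log2_ratio_profile(counts: Sequence[int], pseudocount: float = 1.0) -> List[float]:
    """Log2 ratio of each bin to the median bin (pseudocount avoids log of 0)."""
    centre = median(counts) + pseudocount
    return [math.log2((c + pseudocount) / centre) for c in counts]


# Toy fragment positions on one chromosome: the third bin has extra coverage.
toy_positions = [5, 50, 99, 100, 150, 199,
                 200, 210, 220, 230, 240, 250, 260,
                 300, 350, 399]
toy_counts = bin_counts(toy_positions, bin_size=100, n_bins=4)
toy_profile = log2_ratio_profile(toy_counts)
print(toy_counts)
print(toy_profile)
```

Because only the number of mappable fragments per bin matters here, and not read length, concatenating many short fragments into each long read raises the information obtained per run, which is the point the abstract makes.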
Affiliation(s)
- Rishvanth K. Prabakar
- Quantitative and Computational Biology Section, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, 90089 USA
| | - Liya Xu
- Michelson Center for Convergent Bioscience, University of Southern California, 1002 Childs Way, Los Angeles, 90089 USA
| | - James Hicks
- Michelson Center for Convergent Bioscience, University of Southern California, 1002 Childs Way, Los Angeles, 90089 USA
| | - Andrew D. Smith
- Quantitative and Computational Biology Section, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, 90089 USA
|
44
|
Filia A, Droop A, Harland M, Thygesen H, Randerson-Moor J, Snowden H, Taylor C, Diaz JMS, Pozniak J, Nsengimana J, Laye J, Newton-Bishop JA, Bishop DT. High-Resolution Copy Number Patterns From Clinically Relevant FFPE Material. Sci Rep 2019; 9:8908. [PMID: 31222134 PMCID: PMC6586881 DOI: 10.1038/s41598-019-45210-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/03/2018] [Accepted: 05/07/2019] [Indexed: 11/09/2022] Open
Abstract
Systematic tumour profiling is essential for biomarker research and clinically for assessing response to therapy. Solving the challenge of delivering informative copy number (CN) profiles from formalin-fixed paraffin embedded (FFPE) material, the only likely readily available biospecimen for most cancers, involves successful processing of small quantities of degraded DNA. To investigate the potential for analysis of such lesions, whole-genome CNVseq was applied to 300 FFPE primary tumour samples, obtained from a large-scale epidemiological study of melanoma. The quality and the discriminatory power of CNVseq was assessed. Libraries were successfully generated for 93% of blocks, with input DNA quantity being the only predictor of success (success rate dropped to 65% if <20 ng available); 3% of libraries were dropped because of low sequence alignment rates. Technical replicates showed high reproducibility. Comparison with targeted CN assessment showed consistency with the Next Generation Sequencing (NGS) analysis. We were able to detect and distinguish CN changes with a resolution of ≤10 kb. To demonstrate performance, we report the spectrum of genomic CN alterations (CNAs) detected at 9p21, the major site of CN change in melanoma. This successful analysis of CN in FFPE material using NGS provides proof of principle for intensive examination of population-based samples.
Affiliation(s)
- Anastasia Filia
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
- Centre for Translational Research, Biomedical Research Foundation of the Academy of Athens (BRFAA), Athens, Greece
| | - Alastair Droop
- MRC Medical Bioinformatics Centre, Leeds Institute of Data Analytics, University of Leeds, Leeds, United Kingdom
| | - Mark Harland
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Helene Thygesen
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Juliette Randerson-Moor
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Helen Snowden
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Claire Taylor
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Joey Mark S Diaz
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Joanna Pozniak
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Jérémie Nsengimana
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Jon Laye
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - Julia A Newton-Bishop
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
| | - D Timothy Bishop
- Section of Epidemiology and Biostatistics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom.
|
45
|
Lee J, Chen J. A penalized regression approach for DNA copy number study using the sequencing data. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0001. [PMID: 31145697 DOI: 10.1515/sagmb-2018-0001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Modeling high-throughput next-generation sequencing (NGS) data from experiments that profile tumor and control samples to study DNA copy number variants (CNVs) remains challenging in various ways. In this application work, we provide an efficient method for detecting multiple CNVs using NGS reads ratio data. The method is based on a multiple statistical change-point model with a penalized regression approach, the 1d fused LASSO, which is designed for ordered data in a one-dimensional structure. In addition, since the path algorithm traces the solution as a function of a tuning parameter, the number and locations of potential CNV region boundaries can be estimated simultaneously and efficiently. For tuning parameter selection, we then propose a new modified Bayesian information criterion, called JMIC, and compare it with three different Bayesian information criteria used in the literature. Simulation results show that JMIC performs better for tuning parameter selection than the other three criteria. We applied our approach to sequencing reads ratio data between the breast tumor cell line HCC1954 and its matched normal cell line BL 1954, and the results are in line with those reported in the literature.
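For orientation, the 1d fused LASSO referred to in this abstract is commonly written as the penalized least-squares problem below, where $y_i$ would be the reads ratio (typically on a log scale) in the $i$-th ordered genomic window and $\lambda \ge 0$ is the tuning parameter. This is the standard textbook form and may differ in detail from the formulation used in the paper. The second line is a generic BIC-type criterion with $k(\lambda)$ the number of constant segments in the fitted signal; it is shown only to indicate the kind of quantity being minimized. The exact form of JMIC is not given in the abstract and is not reproduced here.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta \in \mathbb{R}^{n}}
    \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \beta_i \right)^2
    + \lambda \sum_{i=1}^{n-1} \left| \beta_{i+1} - \beta_i \right|

\mathrm{BIC}(\lambda)
  = n \log\!\left( \frac{1}{n} \sum_{i=1}^{n}
      \left( y_i - \hat{\beta}_i(\lambda) \right)^2 \right)
    + k(\lambda) \log n
```

The penalty on adjacent differences forces $\hat{\beta}$ to be piecewise constant, so each index where $\hat{\beta}_{i+1} \neq \hat{\beta}_i$ is a candidate CNV boundary; larger $\lambda$ yields fewer change points, which is why the choice of $\lambda$ determines both their number and their locations.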
Affiliation(s)
- Jaeeun Lee
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
| | - Jie Chen
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
|
46
|
Jin Y, Chen G, Xiao W, Hong H, Xu J, Guo Y, Xiao W, Shi T, Shi L, Tong W, Ning B. Sequencing XMET genes to promote genotype-guided risk assessment and precision medicine. Science China-Life Sciences 2019; 62:895-904. [PMID: 31114935 DOI: 10.1007/s11427-018-9479-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/05/2018] [Accepted: 12/06/2018] [Indexed: 12/26/2022]
Abstract
High-throughput next-generation sequencing (NGS) is a shotgun approach applied in a parallel fashion, in which the genome is fragmented, sequenced in small pieces, and then analyzed either by alignment to a known reference genome or by de novo assembly without a reference genome. This technology has led to an explosion of sequencing-related projects across multidisciplinary fields of science. However, owing to the limitations of sequencing chemistry, the length of sequencing reads and the complexity of genes, it is difficult to determine the sequences of some portions of the human genome, leaving gaps in genomic data that frustrate further analysis. In particular, some complex genes are difficult to sequence or map accurately because they contain high-GC-content and/or low-complexity regions and have complicated pseudogenes; examples include the genes encoding xenobiotic metabolizing enzymes and transporters (XMETs). Genetic variants in XMET genes are critical for predicting inter-individual variability in drug efficacy, drug safety and susceptibility to environmental toxicity. We summarize and discuss the challenges, wet-lab methods, and bioinformatics algorithms involved in sequencing "complex" XMET genes, which may provide insight for the application of NGS technology in toxicogenomics and pharmacogenomics.
Affiliation(s)
- Yaqiong Jin
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Geng Chen
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Wenming Xiao
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Yongli Guo
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Wenzhong Xiao
- Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Cancer Center; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, 200433, China
| | - Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Baitang Ning
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
|
47
|
Li J, Du P, Ye AY, Zhang Y, Song C, Zeng H, Chen C. GPA: A Microbial Genetic Polymorphisms Assignments Tool in Metagenomic Analysis by Bayesian Estimation. Genomics Proteomics & Bioinformatics 2019; 17:106-117. [PMID: 31026578 PMCID: PMC6520909 DOI: 10.1016/j.gpb.2018.12.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/08/2018] [Revised: 10/09/2018] [Accepted: 12/25/2018] [Indexed: 11/21/2022]
Abstract
Identifying antimicrobial-resistant (AMR) bacteria in metagenomic samples is essential for public health and food safety. Next-generation sequencing (NGS) technology has provided a powerful tool for identifying genetic variation and establishing correlations between genotype and phenotype in humans and other species. However, for complex bacterial samples, a powerful bioinformatic tool to identify genetic polymorphisms or copy number variations (CNVs) for given genes is lacking. Here we provide a Bayesian framework for genotype estimation in mixtures of multiple bacteria, named Genetic Polymorphisms Assignments (GPA). Simulation results showed that GPA reduced the false discovery rate (FDR) and mean absolute error (MAE) in CNV and single nucleotide variant (SNV) identification. The framework was validated with whole-genome sequencing and Pool-seq data from Klebsiella pneumoniae using multiple-bacteria mixture models, and showed high accuracy in detecting the allele fractions of CNVs and SNVs in AMR genes between two populations. The quantitative analysis of changes in AMR gene fractions between two samples was consistent with the AMR pattern observed in the individual strains. The framework, together with genome annotation and population comparison tools, has also been integrated into an application that could provide a complete solution for AMR gene identification and quantification in unculturable clinical samples. The GPA package is available at https://github.com/IID-DTH/GPA-package.
Affiliation(s)
- Jiarui Li
  - Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
- Pengcheng Du
  - Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
- Adam Yongxin Ye
  - Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Yuanyuan Zhang
  - Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
- Chuan Song
  - Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
- Hui Zeng
  - Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
- Chen Chen
  - Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
|
48
|
Rajaby R, Sung WK. SurVIndel: improving CNV calling from high-throughput sequencing data through statistical testing. Bioinformatics 2019; 37:1497-1505. [PMID: 30989231 DOI: 10.1093/bioinformatics/btz261] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/20/2018] [Revised: 02/15/2019] [Accepted: 04/09/2019] [Indexed: 12/25/2022]
Abstract
MOTIVATION Structural variations (SVs) are large-scale mutations in a genome; although less frequent than point mutations, their large size makes them responsible for more heritable differences between individuals. Two prominent classes of SVs are deletions and tandem duplications. They play important roles in many devastating genetic diseases, such as Smith-Magenis syndrome, Potocki-Lupski syndrome and Williams-Beuren syndrome. Since paired-end whole genome sequencing data has become widespread and affordable, reliably calling deletions and tandem duplications has been a major target in bioinformatics; unfortunately, the problem is far from solved, since existing solutions often give poor results when applied to real data. RESULTS We developed a novel caller, SurVIndel, which focuses on detecting deletions and tandem duplications from paired next-generation sequencing data. SurVIndel uses discordant paired reads and clipped reads as well as statistical methods. We show that SurVIndel outperforms existing methods on both simulated and real biological datasets. AVAILABILITY SurVIndel is available at https://github.com/Mesh89/SurVIndel.
Affiliation(s)
- Ramesh Rajaby
  - School of Computing, National University of Singapore, 13 Computing Drive, Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, 13 Computing Drive, Singapore; Genome Institute of Singapore, 60 Biopolis Street, Genome, Singapore
|
49
|
Biswas B, Lai Y. A distance-type measure approach to the analysis of copy number variation in DNA sequencing data. BMC Genomics 2019; 20:195. [PMID: 30967117 PMCID: PMC6456939 DOI: 10.1186/s12864-019-5491-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Next-generation sequencing technology allows us to obtain a large number of short DNA sequence (DNA-seq) reads at a genome-wide level. DNA-seq data have been increasingly collected in recent years. Count-type data analysis is a widely used approach for DNA-seq data. However, the related data pre-processing is based on the moving window method, in which a window size needs to be defined in order to obtain count-type data. Furthermore, useful information can be lost when the data are pre-processed into counts. RESULTS In this study, we propose to analyze DNA-seq data based on the related distance-type measure. Distances are measured in base pairs (bps) between two adjacent alignments of short reads mapped to a reference genome. Our simulation study, based on experimental data, confirms the advantages of the distance-type measure approach in both detection power and detection accuracy. Furthermore, we propose artificial censoring for the distance data so that distances larger than a given value are considered potential outliers. Our purpose is to simplify the pre-processing of DNA-seq data. Statistically, we consider a mixture of right-censored geometric distributions to model the distance data. Additionally, to reduce the GC-content bias, we extend the mixture model to a mixture of generalized linear models (GLMs). The model can be estimated by the Newton-Raphson algorithm as well as the Expectation-Maximization (E-M) algorithm. We have conducted simulations to evaluate the performance of our approach. Based on the rank-based inverse normal transformation of distance data, we can obtain the related z-values for a follow-up analysis. As an illustration, an application to DNA-seq data from a pair of normal and tumor cell lines is presented, with a change-point analysis of z-values to detect DNA copy number alterations. CONCLUSION Our distance-type measure approach is novel. It does not require either a fixed or a sliding window procedure for generating count-type data. Its advantages have been demonstrated by our simulation studies and its practical usefulness has been illustrated by an experimental data application.
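The pre-processing steps named in this abstract (distances between adjacent alignments, artificial censoring, and a rank-based inverse normal transformation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(value, is_censored)` encoding of censored distances, and the `(rank - 0.5) / n` offset are all assumptions made for illustration, and the mixture-model fitting itself is not shown.

```python
from statistics import NormalDist


def adjacent_distances(positions):
    """Distances in bp between adjacent alignment start positions (hypothetical helper)."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def censor(distances, limit):
    """Artificial right-censoring: cap each distance at `limit` and flag capped values.

    The (value, is_censored) pair encoding is an assumption, not taken from the paper.
    """
    return [(min(d, limit), d > limit) for d in distances]


def rank_inverse_normal(values, offset=0.5):
    """Rank-based inverse normal transform; tied values share their average rank.

    The (rank - offset) / n convention is one common choice and is assumed here.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


if __name__ == "__main__":
    starts = [100, 105, 130, 131, 400]            # toy alignment start positions
    distances = adjacent_distances(starts)
    capped = censor(distances, limit=100)
    z_values = rank_inverse_normal([value for value, _ in capped])
    print(distances, capped, z_values)
```

Working on distances rather than window counts means no window size has to be chosen up front; the censoring limit is the only tuning value in this sketch.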
Affiliation(s)
- Bipasa Biswas
  - Diagnostics Devices Branch 1, FDA/CDRH/OSB-DBS, White Oak Bldg #66, Room 2222, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Yinglei Lai
  - Department of Statistics and Biostatistics Center, The George Washington University, Rome Hall, 7th Floor, 801, 22nd Street NW, Washington D.C, 20052, USA
|
50
|
Detection of False-Positive Deletions from the Database of Genomic Variants. BIOMED RESEARCH INTERNATIONAL 2019; 2019:8420547. [PMID: 31080831 PMCID: PMC6475568 DOI: 10.1155/2019/8420547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2018] [Revised: 02/24/2019] [Accepted: 03/04/2019] [Indexed: 11/24/2022]
Abstract
Next-generation sequencing is an emerging technology that has been widely used in the detection of genomic variants. However, because depth of coverage, a main signature used for variant calling, is greatly affected by biases such as GC content and mappability, some calls are false positives. In this study, we used paired-end read mapping, another signature that is not affected by these biases, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that may be false positives and then conducted validation studies on each suspicious variant, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample, and size. We hope that correcting these false positives in public repositories will help downstream studies avoid incorrect documentation and annotations.
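The abstract does not spell out how paired-end evidence was evaluated, so the following Python sketch shows only a generic discordant-pair heuristic of the kind commonly used for deletions, not the study's procedure. A candidate deletion is treated as supported when enough read pairs flank it and map farther apart than the library's insert size would allow. The function names, the `(left_start, right_end)` pair encoding, and the `k` and `min_pairs` thresholds are hypothetical.

```python
def spanning_discordant_pairs(pairs, del_start, del_end,
                              mean_insert, sd_insert, k=3.0):
    """Count read pairs that flank the candidate deletion and map too far apart.

    `pairs` holds (left_start, right_end) reference coordinates per read pair;
    this encoding and the mean + k * sd cutoff are illustrative assumptions.
    """
    threshold = mean_insert + k * sd_insert
    count = 0
    for left_start, right_end in pairs:
        flanks = left_start < del_start and right_end > del_end
        too_long = (right_end - left_start) > threshold
        if flanks and too_long:
            count += 1
    return count


def deletion_is_supported(pairs, del_start, del_end,
                          mean_insert, sd_insert, k=3.0, min_pairs=2):
    """Return True when at least `min_pairs` discordant pairs span the deletion."""
    n = spanning_discordant_pairs(pairs, del_start, del_end,
                                  mean_insert, sd_insert, k)
    return n >= min_pairs


if __name__ == "__main__":
    toy_pairs = [(900, 2400), (950, 2450), (1000, 1300), (3000, 3350)]
    print(deletion_is_supported(toy_pairs, 1200, 2200,
                                mean_insert=350, sd_insert=30))
```

A deletion reported from read depth alone that has no such spanning pairs would be a candidate false positive under this heuristic; real pipelines would also need to consider mapping quality and breakpoint uncertainty.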
|