51
Wold J, Koepfli KP, Galla SJ, Eccles D, Hogg CJ, Le Lec MF, Guhlin J, Santure AW, Steeves TE. Expanding the conservation genomics toolbox: Incorporating structural variants to enhance genomic studies for species of conservation concern. Mol Ecol 2021; 30:5949-5965. [PMID: 34424587 PMCID: PMC9290615 DOI: 10.1111/mec.16141] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/20/2021] [Revised: 07/28/2021] [Accepted: 08/18/2021] [Indexed: 12/28/2022]
Abstract
Structural variants (SVs) are large rearrangements (>50 bp) within the genome that impact gene function and the content and structure of chromosomes. As a result, SVs are a significant source of functional genomic variation, that is, variation at genomic regions underpinning phenotype differences, that can have large effects on individual and population fitness. While there are increasing opportunities to investigate functional genomic variation in threatened species via single nucleotide polymorphism (SNP) data sets, SVs remain understudied despite their potential influence on fitness traits of conservation interest. In this future-focused Opinion, we contend that characterizing SVs offers the conservation genomics community an exciting opportunity to complement SNP-based approaches to enhance species recovery. We also leverage the existing literature (predominantly in human health, agriculture and ecoevolutionary biology) to identify approaches for readily characterizing SVs and consider how integrating these into the conservation genomics toolbox may transform the way we manage some of the world's most threatened species.
Affiliation(s)
- Jana Wold
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, Front Royal, Virginia, USA.,Centre for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA.,Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
| | - Stephanie J Galla
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand.,Department of Biological Sciences, Boise State University, Boise, Idaho, USA
| | - David Eccles
- Malaghan Institute of Medical Research, Wellington, New Zealand
| | - Carolyn J Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
| | - Marissa F Le Lec
- Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand
| | - Joseph Guhlin
- Department of Biochemistry, University of Otago, Dunedin, Otago, New Zealand.,Genomics Aotearoa, Dunedin, Otago, New Zealand
| | - Anna W Santure
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand
| | - Tammy E Steeves
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
52
SILO: A Computational Method for Detecting Copy Number Gain in Clinical Specimens Analyzed on a Next-Generation Sequencing Platform. J Mol Diagn 2021; 23:1241-1248. [PMID: 34365010 DOI: 10.1016/j.jmoldx.2021.07.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/06/2020] [Revised: 05/07/2021] [Accepted: 07/07/2021] [Indexed: 12/28/2022]
Abstract
Next-generation sequencing (NGS) has proved to be a beneficial approach for genotyping solid tumor specimens and for identifying clinically actionable mutations. However, copy number variations (CNVs), which can be equally important, are often challenging to detect from NGS data. Current bioinformatics methods for CNV detection from NGS often require comparison of tumor/normal pairs and/or the sequencing of whole genome or whole exome. These approaches are currently impractical for routine clinical practice. However, clinical practice does involve repeated use of the same gene panel on a large number of specimens over a long period of time. We take advantage of this repetitiveness and present SILO: a procedure for CNV detection based on NGS on a gene panel. The SILO algorithm analyzes coverage depth of the aligned reads from a sample and predicts CNV by comparing this depth to the average depth seen in a large training set of other samples. Such comparison is robust and can reliably detect copy number gain, although it is found to be unreliable in detecting copy number losses. Successful validation of SILO on NGS data from the Ion Torrent platform with two panels is presented: a small hotspot panel and a larger cancer gene panel.
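The depth comparison described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SILO implementation: the function name `call_gains`, the normalisation of each sample by its total depth, the log2 ratio, and the z-score threshold are all hypothetical choices made for the example.

```python
import math
from statistics import mean, stdev

def call_gains(sample_depth, training_depths, z_threshold=3.0):
    """Flag amplicons whose normalised depth is unusually high.

    sample_depth:    dict mapping amplicon name -> read depth (test sample)
    training_depths: list of dicts with the same keys (training samples)
    z_threshold:     assumed cut-off, not a value taken from the paper
    """
    def normalise(depths):
        # Express each amplicon as a fraction of the sample's total depth
        total = sum(depths.values())
        return {name: depth / total for name, depth in depths.items()}

    sample = normalise(sample_depth)
    training = [normalise(d) for d in training_depths]

    gains = {}
    for amplicon, fraction in sample.items():
        reference = [t[amplicon] for t in training]
        mu, sigma = mean(reference), stdev(reference)
        if sigma == 0:
            continue  # no spread in the training set, z-score undefined
        z = (fraction - mu) / sigma
        if z > z_threshold:
            gains[amplicon] = {"z": z, "log2_ratio": math.log2(fraction / mu)}
    return gains

training = [
    {"A": 100, "B": 100, "C": 100},
    {"A": 110, "B": 100, "C": 90},
    {"A": 90, "B": 100, "C": 110},
    {"A": 105, "B": 95, "C": 100},
    {"A": 95, "B": 105, "C": 100},
]
sample = {"A": 100, "B": 400, "C": 100}
print(sorted(call_gains(sample, training)))
```

Only the upper tail is tested here, which mirrors the abstract's statement that gains are detected reliably while losses are not. Normalising by total depth also dilutes the unamplified amplicons, one plausible reason a loss call based on the same statistic would be less trustworthy.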
53
Jugas R, Sedlar K, Vitek M, Nykrynova M, Barton V, Bezdicek M, Lengerova M, Skutkova H. CNproScan: Hybrid CNV detection for bacterial genomes. Genomics 2021; 113:3103-3111. [PMID: 34224809 DOI: 10.1016/j.ygeno.2021.06.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2020] [Revised: 06/13/2021] [Accepted: 06/30/2021] [Indexed: 10/20/2022]
Abstract
Copy number variation (CNV) detection in bacteria has received far less attention than CNV detection in eukaryotes. However, challenges arising from bacterial drug resistance bring further interest to the topic of CNV and its role in drug resistance. General CNV detection methods do not consider the features of bacterial genomes, and there is room to improve detection accuracy. Here, we present a CNV detection method called CNproScan focused on bacterial genomes. CNproScan implements a hybrid approach and other bacteria-focused features and depends only on NGS data. We benchmarked our method against previously published methods; it achieved a higher detection rate and provides other beneficial features, such as CNV classification. Compared with other methods, CNproScan can detect much shorter CNV events.
Affiliation(s)
- Robin Jugas
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic.
| | - Karel Sedlar
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
| | - Martin Vitek
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
| | - Marketa Nykrynova
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
| | - Vojtech Barton
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
| | - Matej Bezdicek
- Department of Internal Medicine-Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
| | - Martina Lengerova
- Department of Internal Medicine-Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
| | - Helena Skutkova
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
54
Liu G, Zhang J. A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 12:699510. [PMID: 34262604 PMCID: PMC8273656 DOI: 10.3389/fgene.2021.699510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Accepted: 06/07/2021] [Indexed: 11/13/2022]
Abstract
The next-generation sequencing technology offers a wealth of data resources for the detection of copy number variations (CNVs) at a high resolution. However, it is still challenging to correctly detect CNVs of different lengths. It is necessary to develop new CNV detection tools to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey's fences method is adopted in CBCNV to forecast CNVs. The performance of the proposed method is evaluated on simulated data sets, and is compared with those of several existing methods. The experimental results prove that the performance of CBCNV is better than those of several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.
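Tukey's fences, the outlier rule named in this abstract, can be illustrated with a short sketch. The function name `tukey_outliers`, the conventional multiplier k = 1.5, and the two-sided test are assumptions for illustration; the abstract does not say which multiplier or which tail CBCNV uses, and the scores below are invented.

```python
from statistics import quantiles

def tukey_outliers(scores, k=1.5):
    """Return indices of scores outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates within the observed data range
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]

# Hypothetical abnormal scores for nine read depth segments
scores = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.05, 0.95, 5.0]
print(tukey_outliers(scores))  # -> [8]
```

Here Q1 = 0.95 and Q3 = 1.1, so the fences sit at 0.725 and 1.325 and only the last segment is flagged. Because the fences come from quartiles, a handful of extreme segments barely moves them, a useful property when most of the genome is copy-neutral.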
Affiliation(s)
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, China
55
Zhao HY, Li Q, Tian Y, Chen YH, Alvi HAK, Yuan XG. CIRCNV: Detection of CNVs Based on a Circular Profile of Read Depth from Sequencing Data. Biology 2021; 10:584. [PMID: 34202028 PMCID: PMC8301091 DOI: 10.3390/biology10070584] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Revised: 06/10/2021] [Accepted: 06/21/2021] [Indexed: 12/29/2022]
Abstract
Simple Summary: In this study, we propose a copy number variation (CNV) detection method called CIRCNV, which is based on a circular profile of the read depth from sequencing data. The proposed method is an extended version of our previously developed method CNV-LOF. The main difference of CIRCNV from CNV-LOF lies in its two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation to generate a meaningful two-dimensional dataset for CNV analysis and promote fairness between the ends and middle part of the genome, and (2) it performs two rounds of CNV declaration via estimating tumor purity and recovering the truth circular RD profile. We test and evaluate the performance of CIRCNV via conducting simulation studies and real sequencing tumor sample applications. The experimental results show that CIRCNV outperforms peer methods with respect to sensitivity, precision, and the F1-score. The experiments prove that the proposed method is a reliable and effective tool in the field of variation analysis of tumor genomes.
Abstract: Copy number variation (CNV) is a common type of structural variation in the human genome. Accurate detection of CNVs from tumor genomes can provide crucial information for the study of tumor genesis and cancer precision diagnosis. However, the contamination of normal genomes in tumor genomes and the crude profiles of the read depth make such a task difficult. In this paper, we propose an alternative approach, called CIRCNV, for the detection of CNVs from sequencing data. CIRCNV is an extension of our previously developed method CNV-LOF, which uses local outlier factors to predict CNVs. Comparatively, CIRCNV can be performed on individual tumor samples and has the following two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation, in order to improve the efficiency of the read depth (RD) profile for the detection of CNVs; and (2) it performs a second round of CNV declaration based on the truth circular RD profile, which is recovered by estimating tumor purity. We test and validate the performance of CIRCNV based on simulation and real sequencing data and perform comparisons with several peer methods. The results demonstrate that CIRCNV can obtain superior performance in terms of sensitivity and precision. We expect that our proposed method will be a supplement to existing methods and become a routine tool in the field of variation analysis of tumor genomes.
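A polar coordinate transformation of a linear read depth profile might look like the sketch below. The abstract does not give the exact mapping, so the choices here are assumptions: bin index is mapped to an angle over one full turn, read depth becomes the radius, and the function name `to_circular_profile` is hypothetical.

```python
import math

def to_circular_profile(read_depths):
    """Map a linear RD profile to 2-D points on a circle-like curve.

    Bin i of n gets angle 2*pi*i/n and radius equal to its read depth,
    so the first and last bins become neighbours in the plane.
    """
    n = len(read_depths)
    points = []
    for i, rd in enumerate(read_depths):
        theta = 2.0 * math.pi * i / n
        points.append((rd * math.cos(theta), rd * math.sin(theta)))
    return points

# A flat profile maps to points on a circle of radius 10
flat = to_circular_profile([10, 10, 10, 10])
# A gain in one bin pushes that point further from the origin
gain = to_circular_profile([10, 10, 20, 10])
```

Under this mapping, copy-neutral bins lie near a common circle and altered bins deviate radially, which produces a two-dimensional dataset of the kind the abstract says is passed to outlier analysis. Closing the profile into a loop is also one way bins at the ends of the genome could acquire neighbours on both sides.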
Affiliation(s)
- Hai-Yong Zhao
- School of Computer Science and Technology, Liaocheng University, Liaocheng 252000, China;
| | - Qi Li
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Q.L.); (Y.T.); (H.A.K.A.)
| | - Ye Tian
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Q.L.); (Y.T.); (H.A.K.A.)
| | - Yue-Hui Chen
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Ji’nan 250022, China;
| | - Haque A. K. Alvi
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Q.L.); (Y.T.); (H.A.K.A.)
| | - Xi-Guo Yuan
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Q.L.); (Y.T.); (H.A.K.A.)
56
Guzel F, Romano M, Keles E, Piskin D, Ozen S, Poyrazoglu H, Kasapcopur O, Demirkaya E. Next Generation Sequencing Based Multiplex Long-Range PCR for Routine Genotyping of Autoinflammatory Disorders. Front Immunol 2021; 12:666273. [PMID: 34177904 PMCID: PMC8219981 DOI: 10.3389/fimmu.2021.666273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/09/2021] [Accepted: 05/14/2021] [Indexed: 01/06/2023]
Abstract
Background: During the last decade, next-generation sequencing (NGS) technologies have enabled remarkable progress in identifying disease-associated genes for autoinflammatory disorders (AIDs). An international group of experts described the ideal genetic screening method, which should give information about SNVs, InDels, copy number variations (CNVs), and GC-rich regions. We aimed to develop and validate a molecular diagnostic method, used in conjunction with the NGS platform, as an inexpensive and fast screening tool with extended, uniform coverage of nine genes known to be associated with various AIDs.
Methods: For the validation of basic and expanded panels, long-range multiplex models were set up on healthy samples without any known variations for the MEFV, MVK, TNFRSF1A, NLRP3, PSTPIP1, IL1RN, NOD2, NLRP12 and LPIN2 genes. Patients with AIDs who had known causative variants in these genes were sequenced for analytical validation. As a last step, multiplex models were validated on patients with a pre-diagnosis of AIDs. All sequencing steps were performed on the Illumina NGS platform. Validity steps included the selection of related candidate genes, primer design, development of screening methods, validation and verification of the product. The GDPE (Gentera) bioinformatics pipeline was followed.
Results: Although there was no nonsynonymous variation in 21 healthy samples, 107 synonymous variant alleles and some intronic and UTR variants were detected. In 10 patients who underwent analytical validation, besides the 11 known nonsynonymous variant alleles, 11 additional nonsynonymous variant alleles and a total of 81 synonymous variants were found. In the clinical validation phase, 46 patients were sequenced with multiplex panels, and genetic and clinical findings were combined for diagnosis.
Conclusion: In this study, we describe the development and validation of an NGS-based multiplex array enabling the "long-amplicon" approach for targeted sequencing of nine genes associated with common AIDs. This screening tool is less expensive and more comprehensive compared to other methods and more informative than traditional sequencing. The proposed panel offers advantages over WES or hybridization probe equivalents in terms of CNV analysis, high sensitivity and uniformity, GC-rich region sequencing, InDel detection and intron coverage.
Affiliation(s)
- Ferhat Guzel
- Department of Research and Development, Gentera Biotechnology, Istanbul, Turkey
| | - Micol Romano
- Department of Paediatrics, Division of Paediatric Rheumatology, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
| | - Erdi Keles
- Department of Research and Development, Gentera Biotechnology, Istanbul, Turkey
| | - David Piskin
- Department of Paediatrics, Division of Paediatric Rheumatology, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada.,Department of Epidemiology and Biostatistics, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
| | - Seza Ozen
- Department of Paediatrics, Division of Paediatric Rheumatology, Hacettepe University, Ankara, Turkey
| | - Hakan Poyrazoglu
- Department of Paediatrics, Division of Paediatric Rheumatology, Erciyes University, Kayseri, Turkey
| | - Ozgur Kasapcopur
- Department of Paediatrics, Division of Paediatric Rheumatology, Cerrahpasa Medical School, Istanbul University, Istanbul, Turkey
| | - Erkan Demirkaya
- Department of Paediatrics, Division of Paediatric Rheumatology, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada.,Department of Epidemiology and Biostatistics, Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada
57
Guo Y, Wang S, Yuan X. HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 12:642473. [PMID: 34163521 PMCID: PMC8215577 DOI: 10.3389/fgene.2021.642473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 05/05/2021] [Indexed: 11/13/2022]
Abstract
Copy number variation (CNV) is a genomic mutation that plays an important role in tumor evolution and tumor genesis. Accurate detection of CNVs from next-generation sequencing (NGS) data is still a challenging task due to artifacts such as uneven mapped reads and unbalanced amplitudes of gains and losses. This study proposes a new approach called HBOS-CNV to detect CNVs from NGS data. The central point of HBOS-CNV is that it uses a new statistic, the histogram-based outlier score (HBOS), to evaluate the fluctuation of genome bins and determine which have changed copy numbers. In comparison with existing statistics in the evaluation of CNVs, HBOS is a non-linearly transformed value of the observed read depth (RD) value of each genome bin, which has the potential to relieve the effects resulting from the above artifacts. In the calculation of HBOS values, a dynamic-width histogram is used to depict the density of bins on the genome being analyzed, which can reduce the effects of noise partially contributed by mapping and sequencing errors. Evaluating genome bins with this statistic means that CNVs with less extreme signals still have a high probability of detection. We evaluated this method using a large number of simulation datasets and compared it with four existing methods (CNVnator, CNV-IFTV, CNV-LOF, and iCopyDav). The results demonstrated that our proposed method outperforms the others in terms of sensitivity, precision, and F1-measure. Furthermore, we applied the proposed method to a set of real sequencing samples from the 1000 Genomes Project and determined a number of CNVs with biological meanings. Thus, the proposed method can be regarded as a routine approach in the field of genome mutation analysis for cancer samples.
Affiliation(s)
- Yang Guo
- The School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Shuzhen Wang
- The School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- The School of Computer Science and Technology, Xidian University, Xi'an, China
58
Nandolo W, Mészáros G, Wurzinger M, Banda LJ, Gondwe TN, Mulindwa HA, Nakimbugwe HN, Clark EL, Woodward-Greene MJ, Liu M, Liu GE, Van Tassell CP, Rosen BD, Sölkner J. Detection of copy number variants in African goats using whole genome sequence data. BMC Genomics 2021; 22:398. [PMID: 34051743 PMCID: PMC8164248 DOI: 10.1186/s12864-021-07703-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/24/2020] [Accepted: 05/11/2021] [Indexed: 12/21/2022]
Abstract
Background: Copy number variations (CNV) are a significant source of variation in the genome and are therefore essential to the understanding of genetic characterization. The aim of this study was to develop a fine-scaled copy number variation map for African goats. We used sequence data from multiple breeds and from multiple African countries.
Results: A total of 253,553 CNV (244,876 deletions and 8677 duplications) were identified, corresponding to an overall average of 1393 CNV per animal. The mean CNV length was 3.3 kb, with a median of 1.3 kb. There was substantial differentiation between the populations for some CNV, suggestive of the effect of population-specific selective pressures. A total of 6231 global CNV regions (CNVR) were found across all animals, representing 59.2 Mb (2.4%) of the goat genome. About 1.6% of the CNVR were present in all 34 breeds and 28.7% were present in all 5 geographical areas across Africa where animals had been sampled. The CNVR contained genes that were highly enriched in important biological functions, molecular functions, and cellular components, including retrograde endocannabinoid signaling, glutamatergic synapse and circadian entrainment.
Conclusions: This study presents the first fine-scaled CNV map of African goats based on WGS data and adds to the growing body of knowledge on the genetic characterization of goats.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07703-1.
Affiliation(s)
- Wilson Nandolo
- University of Natural Resources and Life Sciences, Vienna, Austria.,Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
| | - Gábor Mészáros
- University of Natural Resources and Life Sciences, Vienna, Austria
| | - Maria Wurzinger
- University of Natural Resources and Life Sciences, Vienna, Austria
| | - Liveness J Banda
- Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
| | - Timothy N Gondwe
- Lilongwe University of Agriculture and Natural Resources, Lilongwe, Malawi
| | - Emily L Clark
- The Roslin Institute, University of Edinburgh, Edinburgh, Scotland, UK
| | - M Jennifer Woodward-Greene
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA.,National Agricultural Library, USDA-ARS, Beltsville, MD, USA
| | - Mei Liu
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
| | - George E Liu
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA.
| | - Johann Sölkner
- University of Natural Resources and Life Sciences, Vienna, Austria
59
Zhang Q, Qin Z, Yi S, Wei H, Zhou XZ, Su J. Clinical application of whole-exome sequencing: A retrospective, single-center study. Exp Ther Med 2021; 22:753. [PMID: 34035850 PMCID: PMC8135134 DOI: 10.3892/etm.2021.10185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2020] [Accepted: 08/26/2020] [Indexed: 12/13/2022]
Abstract
The aim of the present study was to assess the practical diagnostic value of whole-exome sequencing (WES) in patients with different phenotypes and to explore possible strategies to increase the capability of WES in identifying disease-causing genes. A total of 1,360 patients (aged from 1 day to 42 years old) with manifestations of genetic diseases were genotyped using WES and statistical analysis was performed on the results obtained. Within this cohort, the overall positive rate of identification of a disease-causing gene alteration was 44.41%. The positive identification rate where trio-samples were used (from the proband and both parents) was higher than that where a single proband sample was used (50.00 vs. 43.71%), and 604 positive cases with 150 genetic syndromes, 510 genes and 718 mutations were detected. Missense mutations were the most common variations (n=335, 45.27%) and visual or auditory abnormalities (58.51%) had the highest rate of association with a genetic abnormality. The positive detection rate of WES was elevated with the increase in the number of clinical symptoms from 1 to 8. The present study indicated that WES may be used as a valuable tool in the clinic and the positive rate depends more on the professional experience of clinicians rather than on the analytical capabilities of the data analyst. At the same time, particular attention must be paid to certain possible factors (such as the age of the patients as well as possible exon deletions), which may affect the diagnostic rate while applying this process.
Affiliation(s)
- Qiang Zhang
- Laboratory of Genetic and Metabolism, Department of Paediatric Endocrine and Metabolism, Maternal and Child Health Hospital of Guangxi, Nanning, Guangxi 530000, P.R. China
| | - Zailong Qin
- Laboratory of Genetic and Metabolism, Department of Paediatric Endocrine and Metabolism, Maternal and Child Health Hospital of Guangxi, Nanning, Guangxi 530000, P.R. China
| | - Shang Yi
- Laboratory of Genetic and Metabolism, Department of Paediatric Endocrine and Metabolism, Maternal and Child Health Hospital of Guangxi, Nanning, Guangxi 530000, P.R. China
| | - Hao Wei
- Laboratory of Genetic and Metabolism, Department of Paediatric Endocrine and Metabolism, Maternal and Child Health Hospital of Guangxi, Nanning, Guangxi 530000, P.R. China
| | - Xun Zhao Zhou
- Laboratory of Genetic and Metabolism, Department of Paediatric Endocrine and Metabolism, Maternal and Child Health Hospital of Guangxi, Nanning, Guangxi 530000, P.R. China
| | - Jiasun Su
- Laboratory of Genetic and Metabolism, Department of Paediatric Endocrine and Metabolism, Maternal and Child Health Hospital of Guangxi, Nanning, Guangxi 530000, P.R. China
60
Yan C, He J, Luo J, Wang J, Zhang G, Luo H. SIns: A Novel Insertion Detection Approach Based on Soft-Clipped Reads. Front Genet 2021; 12:665812. [PMID: 33995493 PMCID: PMC8120196 DOI: 10.3389/fgene.2021.665812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Accepted: 04/06/2021] [Indexed: 11/13/2022]
Abstract
As a common type of structural variation, an insertion refers to the addition of a DNA sequence into an individual genome and is usually associated with some inherited diseases. In recent years, many methods have been proposed for detecting insertions. However, the accurate calling of insertions remains a challenging task. In this study, we propose a novel insertion detection approach based on soft-clipped reads, which is called SIns. First, based on the alignments between paired reads and the reference genome, SIns extracts breakpoints from soft-clipped reads and determines insertion locations. The insert size information about paired reads is then further clustered to determine the genotype, and SIns subsequently adopts Minia to assemble the insertion sequences. Experimental results show that SIns can achieve better performance than other methods in terms of the F-score value for simulated and true datasets.
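The first step described here, extracting breakpoints from soft-clipped reads, can be sketched from SAM alignment fields alone. This is an illustrative sketch, not the SIns code: the function names, the minimum clip length, and the minimum read support are hypothetical parameters, and the input is simplified to (POS, CIGAR) pairs with 1-based positions as in the SAM format.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def softclip_breakpoints(pos, cigar, min_clip=5):
    """Return reference positions adjacent to soft-clipped read ends."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips may flank soft clips
    breakpoints = []
    # Leading soft clip: the breakpoint is the first aligned base
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append(pos)
    # Trailing soft clip: the breakpoint is the last aligned base
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        breakpoints.append(pos + ref_len - 1)
    return breakpoints

def candidate_sites(alignments, min_support=2):
    """Keep breakpoints supported by at least min_support reads."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(softclip_breakpoints(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_support)

alignments = [(100, "60M40S"), (130, "30M70S"), (160, "45S55M"), (500, "100M")]
print(candidate_sites(alignments))  # -> [159]
```

In this toy input two reads are clipped after reference base 159 and a third is clipped before base 160, the pattern expected when reads run off the reference into an inserted sequence from both sides. A real caller would also merge nearby positions and use the clipped sequence itself, which this sketch omits.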
Affiliation(s)
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Junyi He
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
61
Copy Number Variant Detection with Low-Coverage Whole-Genome Sequencing Represents a Viable Alternative to the Conventional Array-CGH. Diagnostics (Basel) 2021; 11:708. [PMID: 33920867 PMCID: PMC8071346 DOI: 10.3390/diagnostics11040708] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/20/2021] [Revised: 04/09/2021] [Accepted: 04/13/2021] [Indexed: 12/13/2022]
Abstract
Copy number variations (CNVs) represent a type of structural variant involving alterations in the number of copies of specific regions of DNA that can either be deleted or duplicated. CNVs contribute substantially to normal population variability; however, abnormal CNVs cause numerous genetic disorders. At present, several methods for CNV detection are applied, ranging from conventional cytogenetic analysis, through microarray-based methods (aCGH), to next-generation sequencing (NGS). In this paper, we present GenomeScreen, an NGS-based CNV detection method for low-coverage, whole-genome sequencing. We determined the theoretical limits of its accuracy and obtained confirmation in an extensive in silico study and in real patient samples with known genotypes. In theory, at least 6 M uniquely mapped reads are required to detect a CNV with a length of 100 kilobases (kb) or more with high confidence (Z-score > 7). In practice, the in silico analysis required at least 8 M reads to obtain >99% accuracy (for 100 kb deviations). We compared GenomeScreen with one of the aCGH methods currently used in diagnostic laboratories, which has a mean resolution of 200 kb. GenomeScreen and aCGH both detected 59 deviations, while GenomeScreen furthermore detected 134 other (usually smaller) variations. When compared to aCGH, the overall performance of the proposed GenomeScreen tool is comparable or superior in terms of accuracy, turn-around time, and cost-effectiveness, thus providing reasonable benefits, particularly in a prenatal diagnosis setting.
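The relationship between read count, CNV length and Z-score quoted in this abstract can be explored with a simple back-of-the-envelope model. The model below is an assumption, not the authors' derivation: reads are taken to fall uniformly on a genome of about 3.0e9 bp, the count in a window is treated as Poisson, and a single-copy gain or loss in a diploid genome is taken to shift the expected count by 50%. The function names are hypothetical.

```python
import math

def expected_z(total_reads, cnv_length, genome_size=3.0e9, copy_change=0.5):
    """Z-score of a single-copy change under a Poisson read-count model."""
    lam = total_reads * cnv_length / genome_size  # expected reads in the window
    # The shift is copy_change * lam; the Poisson standard deviation is sqrt(lam)
    return copy_change * lam / math.sqrt(lam)

def reads_required(z_target, cnv_length, genome_size=3.0e9, copy_change=0.5):
    """Reads needed so that a single-copy change reaches z_target."""
    lam_needed = (z_target / copy_change) ** 2
    return lam_needed * genome_size / cnv_length

print(round(expected_z(6e6, 100_000), 2))  # -> 7.07
print(round(reads_required(7, 100_000)))   # -> 5880000
```

Under these assumptions about 5.9 M reads give Z = 7 for a 100 kb event, which is in line with the "at least 6 M" figure in the abstract. The higher requirement reported for the in silico analysis (8 M) is consistent with real data being noisier than an ideal Poisson model, though the abstract does not state the cause.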
62
Belyeu JR, Brand H, Wang H, Zhao X, Pedersen BS, Feusier J, Gupta M, Nicholas TJ, Brown J, Baird L, Devlin B, Sanders SJ, Jorde LB, Talkowski ME, Quinlan AR. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2,396 families. Am J Hum Genet 2021; 108:597-607. [PMID: 33675682 PMCID: PMC8059337 DOI: 10.1016/j.ajhg.2021.02.012] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 11/04/2020] [Accepted: 02/12/2021] [Indexed: 01/05/2023]
Abstract
Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.
Affiliation(s)
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
| | - Harold Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Julie Feusier
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
| | - Meenal Gupta
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Joseph Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Lisa Baird
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Stephan J Sanders
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02114, USA.
| | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA; Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA.
|
63
|
Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 06/09/2023]
Abstract
Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained based on isolation forest algorithm through conducting subsampling from measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data even for low-level coverage and tumor purity. The detection results on tumor samples can aid to evaluate known cancer genes and to predict target drugs for disease diagnosis.
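The bin-scoring step described in this abstract can be illustrated with a minimal, standard-library sketch of a one-dimensional isolation forest over read depths. This is not the CNV_IFTV implementation: the function names, the subsample size of 64, the 100 trees, and the simulated depths are all assumptions chosen for illustration, and the total variation smoothing and statistical testing steps are omitted.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split a sample of read depths at random cut points."""
    lo, hi = min(values), max(values)
    if depth >= max_depth or len(values) <= 1 or lo == hi:
        return ("leaf", len(values))
    split = rng.uniform(lo, hi)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    if not left or not right:
        return ("leaf", len(values))
    return ("node", split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus the usual leaf-size correction."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, split, left, right = tree
    return path_length(left if x < split else right, x, depth + 1)


def anomaly_scores(depths, n_trees=100, sample_size=64, seed=0):
    """Return one score per bin; scores closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(depths))  # assumes at least two bins
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(depths, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in depths:
        mean_path = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm))
    return scores


# Simulated read depths (hypothetical): 200 bins near 100x, with a gain in bins 90-94.
demo_rng = random.Random(1)
depths = [demo_rng.gauss(100, 5) for _ in range(200)]
for i in range(90, 95):
    depths[i] = 200.0
scores = anomaly_scores(depths)
```

In a pipeline of the kind the abstract outlines, the per-bin scores would then be smoothed across adjacent bins before any calling decision is made.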
|
64
|
Abstract
Gains and losses of large segments of genomic DNA, known as copy number variants (CNVs), have recently gained considerable interest in clinical diagnostics, as particular forms may lead to inherited genetic diseases. In recent decades, researchers have developed a wide variety of cytogenetic and molecular methods with different detection capabilities to detect clinically relevant CNVs. In this review, we summarize methodological progress from conventional approaches to current state-of-the-art techniques capable of detecting CNVs from a few bases up to several megabases. Although the recent rapid progress of sequencing methods has enabled precise detection of CNVs, determining their functional effect on cellular and whole-body physiology remains a challenge. Here, we provide a comprehensive list of databases and bioinformatics tools that may serve as useful assets for researchers, laboratory diagnosticians, and clinical geneticists facing the challenge of CNV detection and interpretation.
|
65
|
Xie K, Tian Y, Yuan X. A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 11:632311. [PMID: 33519925 PMCID: PMC7838601 DOI: 10.3389/fgene.2020.632311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/23/2020] [Accepted: 12/21/2020] [Indexed: 11/29/2022]
Abstract
Copy number variation (CNV) is a common type of structural variation in the human genome and contributes to human complex diseases. Detection of CNVs is an important step for a systematic analysis of CNVs in medical research of complex diseases. The recent development of next-generation sequencing (NGS) platforms provides unprecedented opportunities for the detection of CNVs at a base-level resolution. However, due to the intrinsic characteristics of NGS data, accurate detection of CNVs is still a challenging task. In this article, we propose a new density peak-based method, called dpCNV, for the detection of CNVs from NGS data. The dpCNV algorithm is based on the density peak clustering algorithm. It extracts two features, i.e., local density and minimum distance, from the sequencing read depth (RD) profile and generates a two-dimensional data set. Based on the generated data, a two-dimensional null distribution is constructed to test the significance of each genome bin, and the significant genome bins are then declared as CNVs. We test the performance of the dpCNV method on a number of simulated datasets and compare it with several existing methods. The experimental results demonstrate that our proposed method outperforms others in terms of sensitivity and F1-score. We further apply it to a set of real sequencing samples and the results demonstrate the validity of dpCNV. Therefore, we expect that dpCNV can be used as a supplement to existing methods and may become a routine tool in the field of genome mutation analysis.
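The two features named in this abstract, local density and minimum distance, come from density peak clustering and can be sketched as follows. This is an illustrative reading rather than the dpCNV code: the cutoff-based density, the use of absolute read depth difference as the distance, the index-based tie-break, and the example values are all assumptions, and the null-distribution significance test is not shown.

```python
def density_peak_features(read_depths, cutoff):
    """Compute (rho, delta) per bin in the style of density peak clustering.

    rho[i]   : number of other bins whose read depth is within `cutoff` of bin i.
    delta[i] : smallest read depth difference to any bin ranked denser than bin i
               (ties broken by index); the densest bin gets its largest difference.
    """
    n = len(read_depths)
    rho = [
        sum(1 for j in range(n)
            if j != i and abs(read_depths[i] - read_depths[j]) < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        denser = [
            abs(read_depths[i] - read_depths[j])
            for j in range(n)
            if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)
        ]
        if denser:
            delta.append(min(denser))
        else:
            # The top-ranked bin has no denser neighbour; use its maximum distance.
            delta.append(max(abs(read_depths[i] - read_depths[j]) for j in range(n)))
    return rho, delta


# Hypothetical per-bin read depths; bin 6 mimics a copy number gain.
read_depths = [100, 101, 99, 102, 98, 100, 160, 101, 99, 100]
rho, delta = density_peak_features(read_depths, cutoff=5)
```

A bin that combines a low local density with a large minimum distance, such as bin 6 here, is the kind of point a significance test on the two-dimensional (rho, delta) data would be expected to flag.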
Affiliation(s)
- Kun Xie
- The School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Ye Tian
- The School of Computer Science and Technology, Xidian University, Xi'an, China.,Xi'an Key Laboratory of Computational Bioinformatics, The School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- The School of Computer Science and Technology, Xidian University, Xi'an, China.,Xi'an Key Laboratory of Computational Bioinformatics, The School of Computer Science and Technology, Xidian University, Xi'an, China
|
66
|
Kohsaka S, Hirata M, Ikegami M, Ueno T, Kojima S, Sakai T, Ito K, Naka N, Ogura K, Kawai A, Iwata S, Okuma T, Yonemoto T, Kobayashi H, Suehara Y, Hiraga H, Kawamoto T, Motoi T, Oda Y, Matsubara D, Matsuda K, Nishida Y, Mano H. Comprehensive molecular and clinicopathological profiling of desmoid tumours. Eur J Cancer 2021; 145:109-120. [PMID: 33444924 DOI: 10.1016/j.ejca.2020.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/28/2020] [Accepted: 12/02/2020] [Indexed: 10/22/2022]
Abstract
Previous studies have not clearly identified a prognostic factor for desmoid tumours (DT). Whole-exome sequencing (WES) and/or RNA sequencing (RNA-seq) were performed in 64 cases of DT to investigate the molecular profiles in combination with the clinicopathological characteristics. CTNNB1 mutations with specific hotspots were identified in 56 cases (87.5%). A copy number loss in chromosome 6 (chr6) was identified in 14 cases (21.9%). Clustering based on the mRNA expression profiles was predictive of the patients' prognoses. The risk score generated by the expression of a three-gene set (IFI6, LGMN, and CKLF) was a strong prognostic marker for recurrence-free survival (RFS) in our cohort. In risk groups stratified by the expression of IFI6, the hazard ratio for recurrence-free survival in the high-risk group relative to the low-risk group was 12.12 (95% confidence interval: 1.56-94.2; p = 8.0 × 10^-6). In conclusion, CTNNB1 mutations and a chr6 copy number loss are likely the causative mutations underlying the tumorigenesis of DT while the gene expression profiles may help to differentiate patients who would be good candidates for wait-and-see management and those who might benefit from additional systemic or radiation therapies.
Affiliation(s)
- Shinji Kohsaka
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan.
| | - Makoto Hirata
- Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
| | - Masachika Ikegami
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
| | - Toshihide Ueno
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
| | - Shinya Kojima
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
| | - Tomohisa Sakai
- Department of Orthopaedic Surgery, Nagoya University Hospital, Nagoya, 466-8550, Japan
| | - Kan Ito
- Department of Orthopaedic Surgery, Nagoya University Hospital, Nagoya, 466-8550, Japan
| | - Norifumi Naka
- Musculoskeletal Oncology Service, Osaka International Cancer Institute, Osaka, 541-8567, Japan
| | - Koichi Ogura
- Department of Musculoskeletal Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
| | - Akira Kawai
- Department of Musculoskeletal Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
| | - Shintaro Iwata
- Department of Musculoskeletal Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan; Division of Orthopaedic Surgery, Chiba Cancer Center, Chiba, 260-8717, Japan
| | - Tomotake Okuma
- Department of Musculoskeletal Oncology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Tokyo, 113-0021, Japan
| | - Tsukasa Yonemoto
- Division of Orthopaedic Surgery, Chiba Cancer Center, Chiba, 260-8717, Japan
| | - Hiroshi Kobayashi
- Department of Orthopaedic Surgery, Faculty of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
| | - Yoshiyuki Suehara
- Department of Orthopedic Surgery, Juntendo University, Graduate School of Medicine, Tokyo, 113-8431, Japan
| | - Hiroaki Hiraga
- Department of Orthopaedic Surgery, Hokkaido Cancer Center, Sapporo, 003-0804, Japan
| | - Teruya Kawamoto
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, 650-0017, Japan
| | - Toru Motoi
- Department of Pathology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Tokyo, 113-0021, Japan
| | - Yoshinao Oda
- Department of Anatomic Pathology, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, 812-8582, Japan
| | - Daisuke Matsubara
- Division of Integrative Pathology, Jichi Medical University, Shimotsuke, 329-0498, Japan
| | - Koichi Matsuda
- Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
| | - Yoshihiro Nishida
- Department of Orthopaedic Surgery, Nagoya University Hospital, Nagoya, 466-8550, Japan.
| | - Hiroyuki Mano
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan.
|
67
|
Statistical Considerations on NGS Data for Inferring Copy Number Variations. Methods Mol Biol 2021; 2243:27-58. [PMID: 33606251 DOI: 10.1007/978-1-0716-1103-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technology has revolutionized research in genetics and genomics, resulting in massive NGS data and opening more fronts to answer unresolved issues in genetics. NGS data are usually stored at three levels: image files, sequence tags, and alignment reads. The sizes of these types of data usually range from several hundred gigabytes to several terabytes. Biostatisticians and bioinformaticians typically work with the aligned NGS read count data (hence the last level of NGS data) for data modeling and interpretation. To home in on the use of NGS technology, researchers utilize it to profile the whole genome to study DNA copy number variations (CNVs) for an individual subject (or patient) as well as groups of subjects (or patients). The resulting aligned NGS read count data are then modeled by proper mathematical and statistical approaches so that the loci of CNVs can be accurately detected. In this book chapter, a summary of the most widely used statistical methods for detecting CNVs using NGS data is given. The goal is to provide readers with a comprehensive resource of available statistical approaches for inferring DNA copy number variations using NGS data.
|
68
|
Binversie EE, Baker LA, Engelman CD, Hao Z, Moran JJ, Piazza AM, Sample SJ, Muir P. Analysis of copy number variation in dogs implicates genomic structural variation in the development of anterior cruciate ligament rupture. PLoS One 2020; 15:e0244075. [PMID: 33382735 PMCID: PMC7774950 DOI: 10.1371/journal.pone.0244075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2019] [Accepted: 12/02/2020] [Indexed: 11/19/2022]
Abstract
Anterior cruciate ligament (ACL) rupture is an important condition of the human knee. Second ruptures are common and societal costs are substantial. Canine cranial cruciate ligament (CCL) rupture closely models the human disease. CCL rupture is common in the Labrador Retriever (5.79% prevalence), ~100-fold more prevalent than in humans. Labrador Retriever CCL rupture is a polygenic complex disease, based on genome-wide association study (GWAS) of single nucleotide polymorphism (SNP) markers. Dissection of genetic variation in complex traits can be enhanced by studying structural variation, including copy number variants (CNVs). Dogs are an ideal model for CNV research because of reduced genetic variability within breeds and extensive phenotypic diversity across breeds. We studied the genetic etiology of CCL rupture by association analysis of CNV regions (CNVRs) using 110 case and 164 control Labrador Retrievers. CNVs were called from SNPs using three different programs (PennCNV, CNVPartition, and QuantiSNP). After quality control, CNV calls were combined to create CNVRs using ParseCNV and an association analysis was performed. We found no strong effect CNVRs but found 46 small effect (max(T) permutation P<0.05) CCL rupture associated CNVRs in 22 autosomes; 25 were deletions and 21 were duplications. Of the 46 CCL rupture associated CNVRs, we identified 39 unique regions. Thirty four were identified by a single calling algorithm, 3 were identified by two calling algorithms, and 2 were identified by all three algorithms. For 42 of the associated CNVRs, frequency in the population was <10% while 4 occurred at a frequency in the population ranging from 10–25%. Average CNVR length was 198,872bp and CNVRs covered 0.11 to 0.15% of the genome. All CNVRs were associated with case status. CNVRs did not overlap previous canine CCL rupture risk loci identified by GWAS. Associated CNVRs contained 152 annotated genes; 12 CNVRs did not have genes mapped to CanFam3.1. 
Using pathway analysis, a cluster of 19 homeobox domain transcript regulator genes was associated with CCL rupture (P = 6.6E-13). This gene cluster influences cranial-caudal body pattern formation during embryonic limb development. Clustered genes were found in 3 CNVRs on chromosome 14 (HoxA), 28 (NKX6-2), and 36 (HoxD). When analysis was limited to deletion CNVRs, the association was strengthened (P = 8.7E-16). This study suggests a component of the polygenic risk of CCL rupture in Labrador Retrievers is associated with small effect CNVs and may include aspects of stifle morphology regulated by homeobox domain transcript regulator genes.
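The step of combining calls from several programs into CNV regions can be pictured as an interval union that also records which callers support each region. The sketch below is a generic illustration under stated assumptions, not ParseCNV's procedure: the function name, the tuple layout, and the coordinates are hypothetical, and deletions and duplications are not kept separate.

```python
def merge_cnv_calls(calls):
    """Merge overlapping CNV calls into regions.

    calls   : iterable of (chrom, start, end, caller) tuples.
    returns : list of (chrom, start, end, callers), where callers is the set of
              programs that contributed at least one call to the region.
    """
    regions = []
    for chrom, start, end, caller in sorted(calls):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the current region: extend it and record the caller.
            last[2] = max(last[2], end)
            last[3].add(caller)
        else:
            regions.append([chrom, start, end, {caller}])
    return [tuple(r) for r in regions]


# Hypothetical calls from the three programs named in the abstract.
calls = [
    ("chr14", 100, 500, "PennCNV"),
    ("chr14", 400, 900, "QuantiSNP"),
    ("chr14", 2000, 2500, "CNVPartition"),
    ("chr28", 50, 300, "PennCNV"),
    ("chr28", 60, 250, "CNVPartition"),
    ("chr28", 100, 320, "QuantiSNP"),
]
regions = merge_cnv_calls(calls)
support = [len(callers) for _, _, _, callers in regions]
```

Counting the callers per region gives the kind of breakdown the abstract reports, where regions are identified by one, two, or all three calling algorithms.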
Affiliation(s)
- Emily E. Binversie
- Comparative Orthopaedic and Genetics Research Laboratory, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Lauren A. Baker
- Comparative Orthopaedic and Genetics Research Laboratory, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Corinne D. Engelman
- Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Zhengling Hao
- Comparative Orthopaedic and Genetics Research Laboratory, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - John J. Moran
- Department of Comparative Biosciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Alexander M. Piazza
- Comparative Orthopaedic and Genetics Research Laboratory, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Susannah J. Sample
- Comparative Orthopaedic and Genetics Research Laboratory, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Peter Muir
- Comparative Orthopaedic and Genetics Research Laboratory, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
|
69
|
Liu G, Zhang J, Yuan X, Wei C. RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data. Front Genet 2020; 11:569227. [PMID: 33329705 PMCID: PMC7673372 DOI: 10.3389/fgene.2020.569227] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/03/2020] [Accepted: 09/04/2020] [Indexed: 12/04/2022]
Abstract
Copy number variations (CNVs) are significant causes of many human cancers and genetic diseases. The detection of CNVs has become a common method by which to analyze human diseases using next-generation sequencing (NGS) data. However, effective detection of insignificant CNVs is still a challenging task. In this study, we propose a new detection method, RKDOSCNV, to meet this need. RKDOSCNV uses a kernel density estimation method to evaluate the local kernel density distribution of each read depth segment (RDS) based on an expanded nearest neighbor (k-nearest neighbors, reverse nearest neighbors, and shared nearest neighbors of each RDS) data set, and assigns a relative kernel density outlier score (RKDOS) to each RDS. According to the RKDOS profile, RKDOSCNV predicts the candidate CNVs by choosing a reasonable threshold, and then uses a split-read approach to correct the boundaries of candidate CNVs. The performance of RKDOSCNV is assessed by comparing it with several current popular methods via experiments with simulated and real data at different tumor purity levels. The experimental results verify that the performance of RKDOSCNV is superior to that of several other methods. In summary, RKDOSCNV is a simple and effective method for the detection of CNVs from whole genome sequencing (WGS) data, especially for samples with low tumor purity.
Affiliation(s)
- Guojun Liu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Chao Wei
- School of Computer Science and Technology, Xidian University, Xi'an, China
|
70
|
Hu Y, Xia H, Li M, Xu C, Ye X, Su R, Zhang M, Nash O, Sonstegard TS, Yang L, Liu GE, Zhou Y. Comparative analyses of copy number variations between Bos taurus and Bos indicus. BMC Genomics 2020; 21:682. [PMID: 33004001 PMCID: PMC7528262 DOI: 10.1186/s12864-020-07097-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/21/2020] [Accepted: 09/23/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Bos taurus and Bos indicus are two main sub-species of cattle. However, the differential copy number variations (CNVs) between them are not yet well studied. RESULTS: Based on the new high-quality cattle reference genome ARS-UCD1.2, we identified 13,234 non-redundant CNV regions (CNVRs) from 73 animals of 10 cattle breeds (4 Bos taurus and 6 Bos indicus), by integrating three detection strategies. While 6990 CNVRs (52.82%) were shared by Bos taurus and Bos indicus, large CNV differences were discovered between them and these differences could be used to successfully separate animals into two subspecies. We found that 2212 and 538 genes uniquely overlapped with indicine-specific CNVRs and taurine-specific CNVRs, respectively. Based on FST, we detected 16 candidate lineage-differential CNV segments (top 0.1%) under selection, which overlapped with eight genes (CTNNA1, ENSBTAG00000004415, PKN2, BMPER, PDE1C, DNAJC18, MUSK, and PLCXD3). Moreover, we obtained 1.74 Mbp of indicine-specific sequences, which could only be mapped on the Bos indicus reference genome UOA_Brahman_1. We found these sequences and their associated genes were related to heat resistance, lipid and ATP metabolic process, and muscle development under selection. We further analyzed and validated the top significant lineage-differential CNV. This CNV overlapped genes related to muscle cell differentiation, which might be generated from a retropseudogene of CTH but was deleted along the Bos indicus lineage. CONCLUSIONS: This study presents a genome-wide CNV comparison between Bos taurus and Bos indicus. It supplies essential genome diversity information for understanding adaptation and phenotype differences between the Bos taurus and Bos indicus populations.
Affiliation(s)
- Yan Hu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Han Xia
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Mingxun Li
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Building 306, Room 111, BARC-East, Beltsville, MD, 20705, USA
- College of Animal Science and Technology, Yangzhou University, Yangzhou, 225009, China
| | - Chang Xu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xiaowei Ye
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Ruixue Su
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Mai Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Oyekanmi Nash
- Centre for Genomics Research and Innovation, National Biotechnology Development Agency, Abuja, Nigeria
| | | | - Liguo Yang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Building 306, Room 111, BARC-East, Beltsville, MD, 20705, USA.
| | - Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China.
|
71
|
Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020; 11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022]
Abstract
Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumor genesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made it possible for the study of TDs. However, detection is still challenging due to the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs using short sequencing reads. The major principle of the proposed method is that it first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find the possible TD regions. The total variation penalized least squares model is fitted with read depth and mapping quality signals to denoise signals. A 2D binary search tree is used to search the neighbor points effectively. To further identify the exact breakpoints of the TD regions, split-read signals are integrated into DINTD. The experimental results of DINTD on simulated data sets showed that DINTD can outperform other methods for sensitivity, precision, F1-score, and boundary bias. DINTD is further validated on real samples, and the experiment results indicate that it is consistent with other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
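The clustering step named in this abstract, DBSCAN over read depth and mapping quality signals, can be sketched with the standard library alone. This is a textbook DBSCAN, not the DINTD code: the brute-force neighbour search stands in for the 2D binary search tree the abstract mentions, and the scaled feature values, `eps`, and `min_pts` are hypothetical choices for illustration. The denoising and split-read breakpoint refinement steps are not shown.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # Brute-force search; a point counts as its own neighbour.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels


# Hypothetical per-bin features: (normalized read depth, scaled mapping quality).
points = [
    (1.00, 0.98), (1.02, 1.00), (0.97, 0.99), (1.01, 0.97), (0.99, 1.00),
    (1.52, 0.95), (1.49, 0.96), (1.51, 0.94),
    (1.03, 0.99),
    (2.40, 0.30),
]
labels = dbscan(points, eps=0.1, min_pts=3)
```

In this toy input the bins with roughly 1.5x depth form their own cluster apart from the background bins, which is one way elevated-depth candidate regions could be separated before breakpoints are refined.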
Affiliation(s)
- Jinxin Dong
- School of Computer Science and Technology, Xidian University, Xi'an, China.,School of Computer Science and Technology, Liaocheng University, Liaocheng, China
| | - Minyong Qi
- School of Computer Science and Technology, Xidian University, Xi'an, China.,School of Computer Science and Technology, Liaocheng University, Liaocheng, China
| | - Shaoqiang Wang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
|
72
|
Mallory XF, Edrisi M, Navin N, Nakhleh L. Methods for copy number aberration detection from single-cell DNA-sequencing data. Genome Biol 2020; 21:208. [PMID: 32807205 PMCID: PMC7433197 DOI: 10.1186/s13059-020-02119-8] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 03/25/2020] [Accepted: 07/23/2020] [Indexed: 02/06/2023]
Abstract
Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that is ideal for inferring CNAs. In this review, we review eight methods that have been developed for detecting CNAs in scDNAseq data, and categorize them according to the steps of a seven-step pipeline that they employ. Furthermore, we review models and methods for evolutionary analyses of CNAs from scDNAseq data and highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.
Affiliation(s)
- Xian F. Mallory
- Department of Computer Science, Rice University, Houston, TX USA
- Department of Computer Science, Florida State University, Tallahassee, FL USA
| | | | - Nicholas Navin
- Department of Genetics, the University of Texas M.D. Anderson Cancer Center, Houston, TX USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX USA
|
73
|
Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020; 21:189. [PMID: 32746918 PMCID: PMC7477834 DOI: 10.1186/s13059-020-02107-y] [Citation(s) in RCA: 166] [Impact Index Per Article: 41.5] [Received: 12/11/2019] [Accepted: 07/14/2020] [Indexed: 01/01/2023]
Abstract
Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yongzhuang Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yue Jiang
- Nebula Genomics, Harbin, 150030, Heilongjiang, China
| | - Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
| | - Yan Gao
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Zhe Cui
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yadong Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
|
74
|
Xu ML, Qin JC, Chen BY, Yang XX, Liu HP, Yuan WX, Zhong JM, Huang LM, Zhou WJ. Characterization of a Novel 71.8 kb α0-Thalassemia Deletion and Subsequent Summary of a Practical Procedure for Thalassemia Molecular Diagnosis. Hemoglobin 2020; 44:259-263. [PMID: 32646243 DOI: 10.1080/03630269.2020.1790385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
Thalassemia is the most common monogenic disorder around the world. Based on the principle of genotype-phenotype correlation, identification of thalassemia mutations is the essential prerequisite for clinical diagnosis and management. Because only common mutations are routinely detected, the identification of rare or undetermined mutations is a challenge for clinical laboratories. Herein, a proband presenting with inconsistent phenotype-genotype correlation after routine molecular screening was investigated by multiplex ligation-dependent probe amplification (MLPA), targeted next-generation sequencing (targeted-NGS), gap-polymerase chain reaction (gap-PCR) and Sanger sequencing. Eventually, a novel 71.8 kb deletion (- -71.8) was identified and characterized; it included the HBZ (ζ), HBA2 (α2), and HBA1 (α1) genes and caused α0-thalassemia (α0-thal). Furthermore, we summarized a practical procedure based on accumulated experience in studies and clinical practice, which can serve as a guide for molecular screening and clinical diagnosis of thalassemia, especially for the identification of undetermined or novel mutations.
Affiliation(s)
- Ming-Li Xu
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong Province, People's Republic of China
| | - Jia-Chun Qin
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong Province, People's Republic of China
| | - Bi-Yan Chen
- Department of Genetic Metabolism, Prenatal Diagnostic Center, Maternal and Child Health Hospital of Guangxi Zhuang Autonomous Region, Nanning, People's Republic of China
| | - Xue-Xi Yang
- School of Laboratory Medical and Biotechnology, Southern Medical University, Guangzhou, Guangdong Province, People's Republic of China
| | - Hai-Ping Liu
- Neonatal Screening Center, Maternal and Child Health Hospital of Foshan, Foshan, Guangdong Province, People's Republic of China
| | - Wei-Xi Yuan
- Neonatal Screening Center, Maternal and Child Health Hospital of Foshan, Foshan, Guangdong Province, People's Republic of China
| | - Jian-Mei Zhong
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong Province, People's Republic of China
| | - Li-Min Huang
- School of Laboratory Medical and Biotechnology, Southern Medical University, Guangzhou, Guangdong Province, People's Republic of China
| | - Wan-Jun Zhou
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong Province, People's Republic of China
|
75
|
Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, Suresh H, Ramakrishnan S, Maumus F, Ciren D, Levy Y, Harel TH, Shalev-Schlosser G, Amsellem Z, Razifard H, Caicedo AL, Tieman DM, Klee H, Kirsche M, Aganezov S, Ranallo-Benavidez TR, Lemmon ZH, Kim J, Robitaille G, Kramer M, Goodwin S, McCombie WR, Hutton S, Van Eck J, Gillis J, Eshed Y, Sedlazeck FJ, van der Knaap E, Schatz MC, Lippman ZB. Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato. Cell 2020; 182:145-161.e23. [PMID: 32553272 PMCID: PMC7354227 DOI: 10.1016/j.cell.2020.05.021] [Citation(s) in RCA: 400] [Impact Index Per Article: 100.0] [Received: 02/06/2020] [Revised: 04/10/2020] [Accepted: 05/12/2020] [Indexed: 12/22/2022]
Abstract
Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.
Affiliation(s)
- Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Xingang Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Matthias Benoit
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sebastian Soyk
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Lara Pereira
- Center for Applied Genetic Technologies, Genetics & Genomics, University of Georgia, Athens, GA 30602, USA
| | - Lei Zhang
- Center for Applied Genetic Technologies, Genetics & Genomics, University of Georgia, Athens, GA 30602, USA
| | - Hamsini Suresh
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | - Florian Maumus
- URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
| | - Danielle Ciren
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yuval Levy
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Tom Hai Harel
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Gili Shalev-Schlosser
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Ziva Amsellem
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Hamid Razifard
- Institute for Applied Life Sciences, University of Massachusetts Amherst, Amherst, MA 01003, USA; Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Ana L Caicedo
- Institute for Applied Life Sciences, University of Massachusetts Amherst, Amherst, MA 01003, USA; Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Denise M Tieman
- Horticultural Sciences, Plant Innovation Center, University of Florida, Gainesville, FL 32611, USA
| | - Harry Klee
- Horticultural Sciences, Plant Innovation Center, University of Florida, Gainesville, FL 32611, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | | | - Zachary H Lemmon
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Jennifer Kim
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Gina Robitaille
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Melissa Kramer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - W Richard McCombie
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Samuel Hutton
- Gulf Coast Research and Education Center, University of Florida, Wimauma, FL 33598, USA
| | - Joyce Van Eck
- Boyce Thompson Institute, Ithaca, NY 14853, USA; Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yuval Eshed
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Esther van der Knaap
- Center for Applied Genetic Technologies, Genetics & Genomics, University of Georgia, Athens, GA 30602, USA; Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, GA 30602, USA; Department of Horticulture, University of Georgia, Athens, GA 30602, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA.
| | - Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
|
76
|
Yuan X, Bai J, Zhang J, Yang L, Duan J, Li Y, Gao M. CONDEL: Detecting Copy Number Variation and Genotyping Deletion Zygosity from Single Tumor Samples Using Sequence Data. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1141-1153. [PMID: 30489272 DOI: 10.1109/tcbb.2018.2883333] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 06/09/2023]
Abstract
Characterizing copy number variations (CNVs) from sequenced genomes is both a feasible and a cost-effective way to search for driver genes in cancer diagnosis. A number of existing algorithms for CNV detection explore only part of the features underlying sequence data and copy number structures, resulting in limited performance. Here, we describe CONDEL, a method for detecting CNVs from single tumor samples using high-throughput sequence data. CONDEL utilizes a novel statistic in combination with a peel-off scheme to assess the statistical significance of genome bins, and adopts a Bayesian approach to infer copy number gains, losses, and deletion zygosity based on statistical mixture models. We compare CONDEL to six peer methods on a large number of simulation datasets, showing improved performance in terms of true positive and false positive rates, and further validate CONDEL on three real datasets derived from the 1000 Genomes Project and the EGA archive. CONDEL obtained more consistent results than three other single-sample-based methods, and exclusively identified a number of CNVs that were previously associated with cancers. We conclude that CONDEL is a powerful tool for detecting copy number variations in single tumor samples, even if these are sequenced at low coverage.
|
77
|
Wei YC, Huang GH. CONY: A Bayesian procedure for detecting copy number variations from sequencing read depths. Sci Rep 2020; 10:10493. [PMID: 32591545 PMCID: PMC7319969 DOI: 10.1038/s41598-020-64353-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/11/2019] [Accepted: 04/15/2020] [Indexed: 12/26/2022] Open
Abstract
Copy number variations (CNVs) are genomic structural mutations consisting of abnormal numbers of fragment copies. Read-depth signals from next-generation sequencing mirror these variants. Some tools that predict CNVs from read depth have been published, but most of them can be applied to only a specific data type due to modeling limitations. We develop CONY, a tool for copy number variation detection that adopts a Bayesian hierarchical model and an efficient reversible-jump Markov chain Monte Carlo inference algorithm for read-depth data from whole-genome sequencing. CONY can be applied not only to individual samples for estimating the absolute number of copies but also to case-control pairs for detecting patient-specific variations. We evaluate the performance of CONY and compare it with competing approaches through simulations and by using experimental data from the 1000 Genomes Project. CONY outperforms the other methods in terms of accuracy in both single-sample and paired-sample analyses. In addition, CONY performs well regardless of whether the data coverage is high or low. CONY is useful for detecting both absolute and relative CNVs from read-depth data. The package is available at https://github.com/weiyuchung/CONY.
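To make the read-depth idea concrete, the following is a minimal sketch of a naive copy-number estimate, assuming fixed-size genomic windows and a diploid baseline taken as the median window depth. It is not CONY's Bayesian hierarchical model or its reversible-jump MCMC; the function name `estimate_copy_number` and the example depths are hypothetical.

```python
from statistics import median


def estimate_copy_number(depths, ploidy=2):
    """Naive per-window copy-number estimate from read depth.

    Assumption (not from the abstract): the median window depth
    corresponds to the normal copy number given by `ploidy`.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    # Scale each window's depth to the baseline and round to an integer copy number.
    return [round(ploidy * depth / baseline) for depth in depths]


# Hypothetical mean read depths for nine consecutive windows.
window_depths = [30, 31, 29, 15, 14, 30, 45, 46, 30]
print(estimate_copy_number(window_depths))  # [2, 2, 2, 1, 1, 2, 3, 3, 2]
```

A ratio-and-round rule like this ignores noise, GC bias, and segment boundaries, which is the gap that model-based procedures such as the one described above aim to close.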
Affiliation(s)
- Yu-Chung Wei
- Graduate Institute of Statistics and Information Science, National Changhua University of Education, No.1 Jinde Road, Changhua City, Changhua County, 50007, Taiwan
| | - Guan-Hua Huang
- Institute of Statistics, National Chiao Tung University, 1001 University Road, Hsinchu, 30010, Taiwan.
|
78
|
A genome-wide survey of copy number variations reveals an asymmetric evolution of duplicated genes in rice. BMC Biol 2020; 18:73. [PMID: 32591023 PMCID: PMC7318451 DOI: 10.1186/s12915-020-00798-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/24/2020] [Accepted: 05/20/2020] [Indexed: 11/21/2022] Open
Abstract
Background Copy number variations (CNVs) are an important type of structural variation in the genome that usually affect gene expression levels through a gene dosage effect. Understanding CNVs as part of genome evolution may provide insights into the genetic basis of important agricultural traits and contribute to crop breeding in the future. While available methods to detect CNVs utilizing next-generation sequencing technology have helped shed light on the prevalence and effects of CNVs, the complexity of crop genomes poses a major challenge and requires the development of additional tools. Results Here, we generated genomic and transcriptomic data of 93 rice (Oryza sativa L.) accessions and developed a comprehensive pipeline to call CNVs in this large-scale dataset. We analyzed the correlation between CNVs and gene expression levels and found that approximately 13% of the identified genes showed a significant correlation between their expression levels and copy numbers. Further analysis showed that about 36% of duplicate pairs were involved in pseudogenetic events while only 5% of them showed functional differentiation. Moreover, the offspring copy mainly contributed to the expression levels and seemed more likely to become a pseudogene, whereas the parent copy tended to maintain the function of the ancestral gene. Conclusion We provide a high-accuracy CNV dataset that will contribute to functional genomics studies and molecular breeding in rice. We also showed that the gene dosage effect of CNVs in rice is neither exponential nor linear. Our work demonstrates that the evolution of duplicated genes is asymmetric in both expression levels and gene fates, providing new insight into the evolution of duplicated genes.
|
79
|
Berberich AJ, Wang J, Cao H, McIntyre AD, Spaic T, Miller DB, Stock S, Huot C, Stein R, Knoll J, Yang P, Robinson JF, Hegele RA. Simplifying Detection of Copy-Number Variations in Maturity-Onset Diabetes of the Young. Can J Diabetes 2020; 45:71-77. [PMID: 33011132 DOI: 10.1016/j.jcjd.2020.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2020] [Revised: 05/29/2020] [Accepted: 06/01/2020] [Indexed: 11/30/2022]
Abstract
OBJECTIVES Copy-number variations (CNVs) are large-scale deletions or duplications of DNA that have required specialized detection methods, such as microarray-based genomic hybridization or multiplex ligation probe amplification. However, recent advances in bioinformatics have made it possible to detect CNVs from next-generation DNA sequencing (NGS) data. Maturity-onset diabetes of the young (MODY) 5 is a subtype of autosomal-dominant diabetes that is often caused by heterozygous deletions involving the HNF1B gene on chromosome 17q12. We evaluated the utility of bioinformatic processing of raw NGS data to detect chromosome 17q12 deletions in MODY5 patients. METHODS NGS data from 57 patients clinically suspected to have MODY but who were negative for pathogenic mutations using a targeted panel were re-examined using a CNV calling tool (CNV Caller, VarSeq version 1.4.3). Potential CNVs for MODY5 were then confirmed using whole-exome sequencing, cytogenetic analysis and breakpoint analysis when possible. RESULTS Whole-gene deletions in HNF1B, ranging from 1.46 to 1.85 million basepairs in size, were detected in 3 individuals with features of MODY5. These were confirmed by independent methods to be part of a more extensive 17q12 deletion syndrome. Two additional patients carrying a 17q12 deletion were subsequently diagnosed using this method. CONCLUSIONS Large-scale deletions are the most common cause of MODY5 and can be detected directly from NGS data, without the need for additional methods.
Affiliation(s)
- Amanda J Berberich
- Department of Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada.
| | - Jian Wang
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Henian Cao
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Adam D McIntyre
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Tamara Spaic
- Department of Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - David B Miller
- Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Suzanne Stock
- Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Celine Huot
- Department of Pediatrics, CHU Sainte-Justine, University of Montreal, Montréal, Quebec, Canada
| | - Robert Stein
- Department of Pediatrics, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Joan Knoll
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Ping Yang
- Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - John F Robinson
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Robert A Hegele
- Department of Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
|
80
|
Zmienko A, Marszalek-Zenczak M, Wojciechowski P, Samelak-Czajka A, Luczak M, Kozlowski P, Karlowski WM, Figlerowicz M. AthCNV: A Map of DNA Copy Number Variations in the Arabidopsis Genome. Plant Cell 2020; 32:1797-1819. [PMID: 32265262 PMCID: PMC7268809 DOI: 10.1105/tpc.19.00640] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/19/2019] [Revised: 03/09/2020] [Accepted: 03/30/2020] [Indexed: 05/13/2023]
Abstract
Copy number variations (CNVs) greatly contribute to intraspecies genetic polymorphism and phenotypic diversity. Recent analyses of sequencing data for >1000 Arabidopsis (Arabidopsis thaliana) accessions focused on small variations and did not include CNVs. Here, we performed genome-wide analysis and identified large indels (50 to 499 bp) and CNVs (500 bp and larger) in these accessions. The CNVs fully overlap with 18.3% of protein-coding genes, with enrichment for evolutionarily young genes and genes involved in stress and defense. By combining analysis of both genes and transposable elements (TEs) affected by CNVs, we revealed that the variation statuses of genes and TEs are tightly linked and jointly contribute to the unequal distribution of these elements in the genome. We also determined the gene copy numbers in a set of 1060 accessions and experimentally validated the accuracy of our predictions by multiplex ligation-dependent probe amplification assays. We then successfully used the CNVs as markers to analyze population structure and migration patterns. Finally, we examined the impact of gene dosage variation triggered by a CNV spanning the SEC10 gene on SEC10 expression at both the transcript and protein levels. The catalog of CNVs, CNV-overlapping genes, and their genotypes in a top model dicot will stimulate the exploration of the genetic basis of phenotypic variation.
Affiliation(s)
- Agnieszka Zmienko
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Institute of Computing Science, Faculty of Computing Science, Poznan University of Technology, Poznan, Poland
| | | | - Pawel Wojciechowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Institute of Computing Science, Faculty of Computing Science, Poznan University of Technology, Poznan, Poland
| | - Anna Samelak-Czajka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Magdalena Luczak
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Piotr Kozlowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Wojciech M Karlowski
- Department of Computational Biology, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland
| | - Marek Figlerowicz
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Institute of Computing Science, Faculty of Computing Science, Poznan University of Technology, Poznan, Poland
|
81
|
Gao B, Baudis M. Minimum error calibration and normalization for genomic copy number analysis. Genomics 2020; 112:3331-3341. [PMID: 32413400 DOI: 10.1016/j.ygeno.2020.05.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/20/2020] [Revised: 05/05/2020] [Accepted: 05/06/2020] [Indexed: 11/25/2022]
Abstract
BACKGROUND Copy number variations (CNVs) are regional deviations from the normal autosomal bi-allelic DNA content. While germline CNVs are a major contributor to genomic syndromes and inherited diseases, the majority of cancers accumulate extensive "somatic" CNVs (sCNV or CNA) during the process of oncogenetic transformation and progression. While specific sCNVs have been closely associated with tumorigenesis, intriguingly many neoplasias exhibit recurrent sCNV patterns beyond the involvement of a few cancer driver genes. Currently, CNV profiles of tumor samples are generated using genomic micro-arrays or high-throughput DNA sequencing. Regardless of the underlying technology, genomic copy number data is derived from the relative assessment and integration of multiple signals, with the data generation process being prone to contamination from several sources. Estimated copy number values have no absolute or strictly linear correlation to their corresponding DNA levels, and the extent of deviation differs between sample profiles, which poses a great challenge for data integration and comparison in large-scale genome analysis. RESULTS In this study, we present a novel method named "Minimum Error Calibration and Normalization for Copy Numbers Analysis" (Mecan4CNA). It only requires CNV segmentation files as input, is platform independent, and has a high performance with limited hardware requirements. For a given multi-sample copy number dataset, Mecan4CNA can batch-normalize all samples to the corresponding true copy number levels of the main tumor clones. Experiments with Mecan4CNA on simulated data showed an overall accuracy of 93% and 91% in determining the normal level and single copy alteration (i.e. duplication or loss of one allele), respectively. Comparison of estimated normal levels and single copy alterations with existing methods and karyotyping data on the NCI-60 tumor cell line produced coherent results.
To estimate the method's impact on downstream analyses, we performed GISTIC analyses on the original and Mecan4CNA-normalized data from The Cancer Genome Atlas (TCGA), where the normalized data showed prominent improvements in both sensitivity and specificity in detecting focal regions. CONCLUSIONS Mecan4CNA provides an advanced method for CNA data normalization, especially in meta-analyses involving large profile numbers and heterogeneous source data quality. With its informative output and visualization options, Mecan4CNA can also improve the interpretation of individual CNA profiles. Mecan4CNA is freely available as a Python package and through its code repository on GitHub.
Affiliation(s)
- Bo Gao
- Department of Molecular Life Sciences, University of Zurich, Switzerland; Swiss Institute of Bioinformatics, Switzerland
| | - Michael Baudis
- Department of Molecular Life Sciences, University of Zurich, Switzerland; Swiss Institute of Bioinformatics, Switzerland.
|
82
|
Zare F, Ansari S, Najarian K, Nabavi S. Preprocessing Sequence Coverage Data for More Precise Detection of Copy Number Variations. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:868-876. [PMID: 30222580 PMCID: PMC7278033 DOI: 10.1109/tcbb.2018.2869738] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 05/10/2023]
Abstract
Copy number variation (CNV) is a type of genomic/genetic variation that plays an important role in phenotypic diversity, evolution, and disease susceptibility. Next generation sequencing (NGS) technologies have created an opportunity for more accurate detection of CNVs with higher resolution. However, efficient and precise detection of CNVs remains challenging due to high levels of noise and biases, data heterogeneity, and the "big data" nature of NGS data. Sequence coverage (readcount) data are mostly used for detecting CNVs, especially for whole exome sequencing data. Readcount data are contaminated with several types of biases and noise that hinder accurate detection of CNVs. In this work, we introduce a novel preprocessing pipeline for reducing noise and biases to improve the detection accuracy of CNVs in heterogeneous NGS data, such as cancer whole exome sequencing data. We have employed several normalization methods to reduce readcount biases that are due to the GC content of reads, read alignment problems, and sample impurity. We have also developed a novel, efficient, and effective smoothing approach based on Taut String to reduce noise and increase CNV detection power. Using simulated and real data, we showed that employing the proposed preprocessing pipeline significantly improves the accuracy of CNV detection.
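As an illustration of one bias the abstract names, the sketch below applies a generic median-based GC-content correction to per-bin read counts. It is an assumption-laden example, not the authors' pipeline: it does not cover alignment or purity corrections or the Taut String smoothing, and the function name `gc_normalize`, the 5% GC strata, and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_percents, stratum_width=5):
    """Median-based GC correction of per-bin read counts.

    Each count is rescaled by (global median) / (median of bins whose
    GC percentage falls in the same stratum). GC content is given as
    integer percentages; the stratum width is an illustrative choice.
    """
    if len(read_counts) != len(gc_percents):
        raise ValueError("read_counts and gc_percents must have equal length")
    global_median = median(read_counts)

    # Group read counts by GC stratum.
    strata = defaultdict(list)
    for count, gc in zip(read_counts, gc_percents):
        strata[gc // stratum_width].append(count)
    stratum_median = {key: median(values) for key, values in strata.items()}

    normalized = []
    for count, gc in zip(read_counts, gc_percents):
        m = stratum_median[gc // stratum_width]
        # Bins in a stratum with zero median coverage carry no signal.
        normalized.append(count * global_median / m if m > 0 else 0.0)
    return normalized


# Hypothetical bins: high-GC bins (60-62%) are under-covered relative to 40-42% bins.
counts = [100, 110, 90, 50, 55, 45]
gc = [40, 41, 42, 60, 61, 62]
print(gc_normalize(counts, gc))
```

After correction, bins from the two GC strata share the same median, so a depth difference that tracks GC content alone is no longer mistaken for a copy number change.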
|
83
|
Jia L, Liu N, Huang F, Zhou Z, He X, Li H, Wang Z, Yao W. intansv: an R package for integrative analysis of structural variations. PeerJ 2020; 8:e8867. [PMID: 32377445 PMCID: PMC7194084 DOI: 10.7717/peerj.8867] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/24/2019] [Accepted: 03/09/2020] [Indexed: 01/31/2023] Open
Abstract
Identification of structural variations between individuals is very important for the understanding of phenotype variations and diseases. Despite the existence of dozens of programs for the prediction of structural variations, none of them is the gold standard in this field, and the results of multiple programs are usually integrated to get more reliable predictions. Annotation and visualization of structural variations are important for the understanding of their functions. However, to our knowledge, no program currently provides these functions. We report an R package, intansv, which can integrate the predictions of multiple programs as well as annotate and visualize structural variations. The source code and the help manual of intansv are freely available at https://github.com/venyao/intansv and http://www.bioconductor.org/packages/devel/bioc/html/intansv.html.
Affiliation(s)
- Lihua Jia
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China.,National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou, Henan, China
| | - Na Liu
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China
| | - Fangfang Huang
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China
| | - Zhengfu Zhou
- Wheat Research Institute, Henan Academy of Agricultural Sciences, Zhengzhou, Henan, China
| | - Xin He
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China
| | - Haoran Li
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China
| | - Zhizhan Wang
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China
| | - Wen Yao
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan, China.,National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, Hubei, China
| |
Collapse
|
84
|
Zhao L, Liu H, Yuan X, Gao K, Duan J. Comparative study of whole exome sequencing-based copy number variation detection tools. BMC Bioinformatics 2020; 21:97. [PMID: 32138645 PMCID: PMC7059689 DOI: 10.1186/s12859-020-3421-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 09/08/2019] [Accepted: 02/17/2020] [Indexed: 02/23/2023] Open
Abstract
Background: With the rapid development of whole exome sequencing (WES), an increasing number of tools are being proposed for copy number variation (CNV) detection based on this technique. However, no comprehensive guide is available for the use of these tools in clinical settings, which renders them inapplicable in practice. To resolve this problem, in this study, we evaluated the performances of four WES-based CNV tools, and established a guideline for the recommendation of a suitable tool according to the application requirements. Results: In this study, first, we selected four WES-based CNV detection tools: CoNIFER, cn.MOPS, CNVkit and exomeCopy. Then, we evaluated their performances in terms of three aspects: sensitivity and specificity, overlapping consistency and computational costs. From this evaluation, we obtained four main results: (1) The sensitivity increases and subsequently stabilizes as the coverage or CNV size increases, while the specificity decreases. (2) CoNIFER performs better for CNV insertions than for CNV deletions, while the remaining tools exhibit the opposite trend. (3) CoNIFER, cn.MOPS and CNVkit realize satisfactory overlapping consistency, which indicates their results are trustworthy. (4) CoNIFER has the best space complexity and cn.MOPS has the best time complexity among these four tools. Finally, we established a guideline for tools’ usage according to these results. Conclusion: No available tool performs excellently under all conditions; however, some tools perform excellently in some scenarios. Users can obtain a CNV tool recommendation from our paper according to the targeted CNV size, the CNV type or computational costs of their projects, as presented in Table 1, which is helpful even for users with limited knowledge of computer science.
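As a reminder of what the two headline metrics measure, here is a minimal Python sketch of sensitivity and specificity. It assumes the unit of evaluation is the capture target (for example, an exon), which the abstract does not state; the function name and the exon labels are hypothetical.

```python
def sensitivity_specificity(all_targets, true_cnv_targets, called_cnv_targets):
    """Per-target sensitivity and specificity of a CNV call set.

    all_targets: every evaluated target (e.g. exon identifiers).
    true_cnv_targets: targets that carry a CNV in the reference truth set.
    called_cnv_targets: targets that the tool reported as CNV.
    """
    all_targets = set(all_targets)
    truth = set(true_cnv_targets) & all_targets
    called = set(called_cnv_targets) & all_targets

    tp = len(truth & called)            # true CNV, called
    fn = len(truth - called)            # true CNV, missed
    fp = len(called - truth)            # no CNV, but called
    tn = len(all_targets) - tp - fn - fp  # no CNV, not called

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example with ten exons.
exons = [f"exon{i}" for i in range(1, 11)]
sens, spec = sensitivity_specificity(
    exons,
    true_cnv_targets={"exon1", "exon2", "exon3", "exon4"},
    called_cnv_targets={"exon1", "exon2", "exon3", "exon5"},
)
```

Here three of four true CNV exons are recovered (sensitivity 0.75) and five of six unaffected exons are correctly left uncalled (specificity about 0.83).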
Affiliation(s)
- Lanling Zhao
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Han Liu
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Kun Gao
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Junbo Duan
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
|
85
|
Roca I, González-Castro L, Maynou J, Palacios L, Fernández H, Couce ML, Fernández-Marmiesse A. PattRec: An easy-to-use CNV detection tool optimized for targeted NGS assays with diagnostic purposes. Genomics 2020; 112:1245-1256. [DOI: 10.1016/j.ygeno.2019.07.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/31/2019] [Revised: 05/25/2019] [Accepted: 07/21/2019] [Indexed: 12/17/2022]
|
86
|
Falque M, Jebreen K, Paux E, Knaak C, Mezmouk S, Martin OC. CNVmap: A Method and Software To Detect and Map Copy Number Variants from Segregation Data. Genetics 2020; 214:561-576. [PMID: 31882400 PMCID: PMC7054022 DOI: 10.1534/genetics.119.302881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2019] [Accepted: 12/23/2019] [Indexed: 01/22/2023] Open
Abstract
Single nucleotide polymorphisms (SNPs) are used widely for detecting quantitative trait loci, or for searching for causal variants of diseases. Nevertheless, structural variations such as copy-number variants (CNVs) represent a large part of natural genetic diversity, and contribute significantly to trait variation. Numerous methods and softwares based on different technologies (amplicons, CGH, tiling, or SNP arrays, or sequencing) have already been developed to detect CNVs, but they bypass a wealth of information such as genotyping data from segregating populations, produced, e.g., for QTL mapping. Here, we propose an original method to both detect and genetically map CNVs using mapping panels. Specifically, we exploit the apparent heterozygous state of duplicated loci: peaks in appropriately defined genome-wide allelic profiles provide highly specific signatures that identify the nature and position of the CNVs. Our original method and software can detect and map automatically up to 33 different predefined types of CNVs based on segregation data only. We validate this approach on simulated and experimental biparental mapping panels in two maize populations and one wheat population. Most of the events found correspond to having just one extra copy in one of the parental lines, but the corresponding allelic value can be that of either parent. We also find cases with two or more additional copies, especially in wheat, where these copies locate to homeologues. More generally, our computational tool can be used to give additional value, at no cost, to many datasets produced over the past decade from genetic mapping panels.
Affiliation(s)
- Matthieu Falque
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Kamel Jebreen
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Department of Mathematics, An-Najah National University, Nablus, Palestine
- Etienne Paux
- Université Clermont Auvergne, INRAE, GDEC, 63000 Clermont-Ferrand, France
- Olivier C Martin
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
|
87
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
|
88
|
Identification of Novel CD74-NRG2α Fusion From Comprehensive Profiling of Lung Adenocarcinoma in Japanese Never or Light Smokers. J Thorac Oncol 2020; 15:948-961. [PMID: 32036070 DOI: 10.1016/j.jtho.2020.01.021] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 10/08/2019] [Revised: 01/23/2020] [Accepted: 01/24/2020] [Indexed: 11/23/2022]
Abstract
INTRODUCTION: Studies are yet to characterize the differences in molecular profiles of lung adenocarcinoma (LUAD) among divergent ethnic groups. Herein, we conducted comprehensive molecular profiling of LUAD in never or light smokers from Asia to discover novel targetable mutations and prognostic biomarkers of this distinct disease entity. METHODS: We analyzed 996 cases of Japanese LUAD and performed whole-exome sequencing and RNA-seq in 125 cases of Japanese LUAD negative for the driver oncogenes defined by conventional laboratory testing. We also investigated the clinical and pathologic characteristics among the 996 cases. RESULTS: Driver oncogenes were identified in 88 cases (70.4%) with specific hotspot mutations differing from those in The Cancer Genome Atlas study. Two actionable novel fusions of FGFR2 and NRG2α were also identified. Clustering on the basis of mRNA expression profiles, but not genetic mutational ones, could predict patient prognosis. The risk score generated by the expression of a three-gene set was a strong prognostic marker for overall survival and progression-free survival in our cohort, and was further validated using The Cancer Genome Atlas cohort. Among the 996 cases, each driver alteration is distributed across all histologic subtypes. Adenocarcinoma in situ was identified to harbor driver mutations, suggesting that these alterations are early events in the pathogenesis of LUAD. ERBB2 mutations were over-represented in young adults. CONCLUSIONS: This study indicates the value of applying gene expression profiling for predicting the prognosis after a surgical operation, and that the identification of actionable mutations is important for optimizing targeted drugs in Japanese LUAD.
|
89
|
Välipakka S, Savarese M, Sagath L, Arumilli M, Giugliano T, Udd B, Hackman P. Improving Copy Number Variant Detection from Sequencing Data with a Combination of Programs and a Predictive Model. J Mol Diagn 2020; 22:40-49. [DOI: 10.1016/j.jmoldx.2019.08.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/01/2019] [Revised: 06/25/2019] [Accepted: 08/08/2019] [Indexed: 12/18/2022] Open
|
90
|
Yang H, Zhu D. Combinatorial Detection Algorithm for Copy Number Variations Using High-throughput Sequencing Reads. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001419500228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Copy number variation (CNV) is a prevalent kind of genetic structural variation which leads to an abnormal number of copies of large genomic regions, such as gain or loss of DNA segments larger than 1 kb. CNV exists not only in human genome but also in plant genome. Current researches have testified that CNV is associated with many complex diseases. In this paper, guanine-cytosine (GC) bias, mappability and their effect on read depth signals in sequencing data are discussed first. Subsequently, a new correction method for GC bias and an improved combinatorial detection algorithm for CNV using high-throughput sequencing reads based on hidden Markov model (CNV-HMM) are proposed. The corrected read depth signals have lower correlation with GC content, mappability of reads and the width of analysis window. Then we create a hidden Markov model which maps the reads onto the reference genome and records the unmapped reads. The unmapped reads are counted and normalized. The CNV-HMM detects the abnormal signal of read count and gains the candidate CNVs using the expectation maximization (EM) algorithm. Finally, we filter the candidate CNVs using split reads to promote the performance of our algorithm. The experiment result indicates that the CNV-HMM algorithm has higher accuracy and sensitivity for CNVs detection than most current detection algorithms.
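For orientation, GC-bias correction of read-depth signals can be sketched in Python as follows. This is not the new correction method proposed in the paper, whose details the abstract does not give. It is the widely used median-ratio approach, assumed here for illustration: each window's read count is rescaled by the ratio of the global median count to the median count of windows with similar GC content. The function name and the `bin_width` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(read_counts, gc_fractions, bin_width=0.05):
    """Rescale per-window read counts so that GC-content bins share one median.

    read_counts: read count per genomic window.
    gc_fractions: GC fraction (0.0 to 1.0) of the same windows, same order.
    bin_width: width of the GC bins used to group windows (an assumption).
    """
    if len(read_counts) != len(gc_fractions):
        raise ValueError("read_counts and gc_fractions must have equal length")

    global_median = median(read_counts)

    # Group window counts by GC bin.
    bins = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        bins[int(gc / bin_width)].append(count)
    bin_medians = {b: median(values) for b, values in bins.items()}

    corrected = []
    for count, gc in zip(read_counts, gc_fractions):
        bin_median = bin_medians[int(gc / bin_width)]
        # Windows in an all-zero bin cannot be rescaled; report zero depth.
        corrected.append(count * global_median / bin_median if bin_median > 0 else 0.0)
    return corrected


# Hypothetical windows: the GC-rich pair shows roughly double the raw depth.
raw = [100, 120, 200, 240]
gc = [0.32, 0.33, 0.52, 0.53]
flat = gc_correct(raw, gc)
```

After correction the low-GC and high-GC windows sit on the same scale, so remaining depth differences are more likely to reflect copy number than GC content.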
Affiliation(s)
- Hai Yang
- School of Computer Science and Technology, Shandong University, Qingdao 266237, P. R. China
- Daming Zhu
- School of Computer Science and Technology, Shandong University, Qingdao 266237, P. R. China
|
91
|
Luo F. A systematic evaluation of copy number alterations detection methods on real SNP array and deep sequencing data. BMC Bioinformatics 2019; 20:692. [PMID: 31874603 PMCID: PMC6929333 DOI: 10.1186/s12859-019-3266-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND: The Copy Number Alterations (CNAs) are discovered to be tightly associated with cancers, so accurately detecting them is one of the most important tasks in the cancer genomics. A series of CNAs detection methods have been proposed and new ones are still being developed. Due to the complexity of CNAs in cancers, no CNAs detection method has been accepted as the gold standard caller. Several evaluation works have made attempts to reveal typical CNAs detection methods' performance. Limited by the scale of evaluation data, these different comparison works don't reach a consensus and the researchers are still confused on how to choose one proper CNAs caller for their analysis. Therefore, it needs a more comprehensive evaluation of typical CNAs detection methods' performance. RESULTS: In this work, we use a large-scale real dataset from CAGEKID consortium to evaluate total 12 typical CNAs detection methods. These methods are most widely used in cancer researches and always used as benchmark for the newly proposed CNAs detection methods. This large-scale dataset comprises of SNP array data on 94 samples and the whole genome sequencing data on 10 samples. Evaluations are comprehensively implemented in current scenarios of CNAs detection, which include that detect CNAs on SNP array data, on sequencing data with tumor and normal matched samples and on sequencing data with single tumor sample. Three SNP based methods are firstly ranked. Subsequently, the best SNP based method's results are used as benchmark to compare six matched samples based methods and three single tumor sample based methods in terms of the preprocessing, recall rate, Jaccard index and segmentation characteristics. CONCLUSIONS: Our survey thoroughly reveals 12 typical methods' superiority and inferiority. We explain why methods show specific characteristics from a methodological standpoint.
Finally, we present the guiding principle for choosing one proper CNAs detection method under specific conditions. Some unsolved problems and expectations are also addressed for upcoming CNAs detection methods.
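The recall rate and Jaccard index mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes both metrics are computed at base-pair level on half-open intervals from a single chromosome; the paper may define them differently (for example, per segment). All function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total number of bases covered by already-merged intervals."""
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def jaccard_and_recall(calls, benchmark):
    """Base-pair Jaccard index and recall of calls against a benchmark set."""
    a, b = merge(calls), merge(benchmark)
    shared = intersection_length(a, b)
    union = total_length(a) + total_length(b) - shared
    jaccard = shared / union if union else 0.0
    recall = shared / total_length(b) if b else 0.0
    return jaccard, recall


# Hypothetical segments on one chromosome.
jac, rec = jaccard_and_recall(calls=[(100, 200), (150, 300)], benchmark=[(250, 400)])
```

In this example the calls cover 200 bases and the benchmark 150, with 50 shared, giving a Jaccard index of 50/300 and a recall of 50/150.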
Affiliation(s)
- Fei Luo
- School of Computer Science, Wuhan University, Wuhan, China
|
92
|
Yohe LR, Davies KTJ, Simmons NB, Sears KE, Dumont ER, Rossiter SJ, Dávalos LM. Evaluating the performance of targeted sequence capture, RNA-Seq, and degenerate-primer PCR cloning for sequencing the largest mammalian multigene family. Mol Ecol Resour 2019; 20:140-153. [PMID: 31523924 DOI: 10.1111/1755-0998.13093] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/20/2019] [Revised: 08/27/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022]
Abstract
Multigene families evolve from single-copy ancestral genes via duplication, and typically encode proteins critical to key biological processes. Molecular analyses of these gene families require high-confidence sequences, but the high sequence similarity of the members can create challenges for sequencing and downstream analyses. Focusing on the common vampire bat, Desmodus rotundus, we evaluated how different sequencing approaches performed in recovering the largest mammalian protein-coding multigene family: olfactory receptors (OR). Using the genome as a reference, we determined the proportion of intact protein-coding receptors recovered by: (a) amplicons from degenerate primers sequenced via Sanger technology, (b) RNA-Seq of the main olfactory epithelium, and (c) those genes captured with probes designed from transcriptomes of closely-related species. Our initial re-annotation of the high-quality vampire bat genome resulted in >400 intact OR genes, more than doubling the original estimate. Sanger-sequenced amplicons performed the poorest among the three approaches, detecting <33% of receptors in the genome. In contrast, the transcriptome reliably recovered >50% of the annotated genomic ORs, and targeted sequence capture recovered nearly 75% of annotated genes. Each sequencing approach assembled high-quality sequences, even if it did not recover all receptors in the genome. While some variation may be due to limitations of the study design (e.g., different individuals), variation among approaches was mostly caused by low coverage of some receptors rather than high rates of assembly error. Given this variability, we caution against using the counts of intact receptors per species to model the birth-death process of multigene families. Instead, our results support the use of orthologous sequences to explore and model the evolutionary processes shaping these genes.
Affiliation(s)
- Laurel R Yohe
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA; Department of Geology and Geophysics, Yale University, Stony Brook, NY, USA
- Kalina T J Davies
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Nancy B Simmons
- Department of Mammalogy, Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Karen E Sears
- Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, CA, USA
- Elizabeth R Dumont
- School of Natural Sciences, University of California Merced, Merced, CA, USA
- Stephen J Rossiter
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Liliana M Dávalos
- Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA; Consortium for Inter-Disciplinary Environmental Research, Stony Brook University, Stony Brook, NY, USA
|
93
|
Vegesna R, Tomaszkiewicz M, Medvedev P, Makova KD. Dosage regulation, and variation in gene expression and copy number of human Y chromosome ampliconic genes. PLoS Genet 2019; 15:e1008369. [PMID: 31525193 PMCID: PMC6772104 DOI: 10.1371/journal.pgen.1008369] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/30/2019] [Revised: 10/01/2019] [Accepted: 08/13/2019] [Indexed: 12/28/2022] Open
Abstract
The Y chromosome harbors nine multi-copy ampliconic gene families expressed exclusively in testis. The gene copies within each family are >99% identical to each other, which poses a major challenge in evaluating their copy number. Recent studies demonstrated high variation in Y ampliconic gene copy number among humans. However, how this variation affects expression levels in human testis remains understudied. Here we developed a novel computational tool Ampliconic Copy Number Estimator (AmpliCoNE) that utilizes read sequencing depth information to estimate Y ampliconic gene copy number per family. We applied this tool to whole-genome sequencing data of 149 men with matched testis expression data whose samples are part of the Genotype-Tissue Expression (GTEx) project. We found that the Y ampliconic gene families with low copy number in humans were deleted or pseudogenized in non-human great apes, suggesting relaxation of functional constraints. Among the Y ampliconic gene families, higher copy number leads to higher expression. Within the Y ampliconic gene families, copy number does not influence gene expression, rather a high tolerance for variation in gene expression was observed in testis of presumably healthy men. No differences in gene expression levels were found among major Y haplogroups. Age positively correlated with expression levels of the HSFY and PRY gene families in the African subhaplogroup E1b, but not in the European subhaplogroups R1b and I1. We also found that expression of five Y ampliconic gene families is coordinated with that of their non-Y (i.e. X or autosomal) homologs. Indeed, five ampliconic gene families had consistently lower expression levels when compared to their non-Y homologs suggesting dosage regulation, while the HSFY family had higher expression levels than its X homolog and thus lacked dosage regulation.
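The core idea of estimating copy number from read depth can be sketched in Python. This is not AmpliCoNE's implementation, whose normalization steps are not described in the abstract. The sketch assumes the simplest form of the idea: the mean depth over positions assigned to a gene family is divided by the mean depth over a single-copy control region from the same sample. The function name and the depth values are hypothetical.

```python
from statistics import mean


def estimate_copy_number(family_depths, control_depths):
    """Estimate a gene family's copy number from read depth.

    family_depths: per-position depth at positions assigned to the gene family.
    control_depths: per-position depth in a region assumed to be single copy.
    """
    control = mean(control_depths)
    if control <= 0:
        raise ValueError("control region has no coverage")
    return mean(family_depths) / control


# Hypothetical depths: the family region is covered three times as deeply.
copies = estimate_copy_number(family_depths=[60, 62, 58], control_depths=[20, 20, 20])
```

Because the copies within a family are more than 99% identical, reads from all copies pile up on the same reference positions, which is why relative depth tracks the number of copies.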
MESH Headings
- Animals
- Chromosomes, Human, Y/genetics
- Chromosomes, Human, Y/physiology
- DNA Copy Number Variations/genetics
- Databases, Genetic
- Dosage Compensation, Genetic/genetics
- Dosage Compensation, Genetic/physiology
- Epigenesis, Genetic/genetics
- Gene Dosage/genetics
- Gene Expression/genetics
- Gene Expression Regulation/genetics
- Genes, Y-Linked/genetics
- Genes, Y-Linked/physiology
- Heat Shock Transcription Factors/genetics
- Heat Shock Transcription Factors/metabolism
- Humans
- Male
- Multigene Family/genetics
- Sequence Analysis, DNA/methods
- Testis/metabolism
Affiliation(s)
- Rahulsimham Vegesna
- Bioinformatics and Genomics Graduate Program, The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park, PA, United States of America
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States of America
- Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, United States of America
- Paul Medvedev
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States of America
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, United States of America
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, United States of America
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, United States of America
- Kateryna D. Makova
- Bioinformatics and Genomics Graduate Program, The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park, PA, United States of America
- Department of Biology, Pennsylvania State University, University Park, PA, United States of America
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, United States of America
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, United States of America
|
94
|
Ried T, Meijer GA, Harrison DJ, Grech G, Franch-Expósito S, Briffa R, Carvalho B, Camps J. The landscape of genomic copy number alterations in colorectal cancer and their consequences on gene expression levels and disease outcome. Mol Aspects Med 2019; 69:48-61. [PMID: 31365882 DOI: 10.1016/j.mam.2019.07.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/19/2019] [Revised: 07/23/2019] [Accepted: 07/26/2019] [Indexed: 12/18/2022]
Abstract
Aneuploidy, the unbalanced state of the chromosome content, represents a hallmark of most solid tumors, including colorectal cancer. Such aneuploidies result in tumor specific genomic imbalances, which emerge in premalignant precursor lesions. Moreover, increasing levels of chromosomal instability have been observed in adenocarcinomas and are maintained in distant metastases. A number of studies have systematically integrated copy number alterations with gene expression changes in primary carcinomas, cell lines, and experimental models of aneuploidy. In fact, chromosomal aneuploidies target a number of genes conferring a selective advantage for the metabolism of the cancer cell. Copy number alterations not only have a positive correlation with expression changes of the majority of genes on the altered genomic segment, but also have effects on the transcriptional levels of genes genome-wide. Finally, copy number alterations have been associated with disease outcome; nevertheless, the translational applicability in clinical practice requires further studies. Here, we (i) review the spectrum of genetic alterations that lead to colorectal cancer, (ii) describe the most frequent copy number alterations at different stages of colorectal carcinogenesis, (iii) exemplify their positive correlation with gene expression levels, and (iv) discuss copy number alterations that are potentially involved in disease outcome of individual patients.
Affiliation(s)
- Thomas Ried
- Genetics Branch, Center for Cancer Research, National Cancer Institute/National Institutes of Health, Bethesda, MD, USA
- Gerrit A Meijer
- Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- David J Harrison
- School of Medicine, University of St Andrews, St Andrews, Scotland, UK
- Godfrey Grech
- Laboratory of Molecular Pathology, Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- Sebastià Franch-Expósito
- Gastrointestinal and Pancreatic Oncology Group, Institut D'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), CIBEREHD, Barcelona, Spain
- Romina Briffa
- School of Medicine, University of St Andrews, St Andrews, Scotland, UK; Laboratory of Molecular Pathology, Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- Beatriz Carvalho
- Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Jordi Camps
- Gastrointestinal and Pancreatic Oncology Group, Institut D'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), CIBEREHD, Barcelona, Spain; Unitat de Biologia Cel·lular i Genètica Mèdica, Departament de Biologia Cel·lular, Fisiologia i Immunologia, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain
|
95
|
Zhang L, Bai W, Yuan N, Du Z. Comprehensively benchmarking applications for detecting copy number variation. PLoS Comput Biol 2019; 15:e1007069. [PMID: 31136576 PMCID: PMC6555534 DOI: 10.1371/journal.pcbi.1007069] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/23/2019] [Revised: 06/07/2019] [Accepted: 05/06/2019] [Indexed: 12/15/2022] Open
Abstract
Motivation: Recently, copy number variation (CNV) has gained considerable interest as a type of genomic variation that plays an important role in complex phenotypes and disease susceptibility. Since a number of CNV detection methods have recently been developed, it is necessary to help investigators choose suitable methods for CNV detection depending on their objectives. For this reason, this study compared ten commonly used CNV detection applications, including CNVnator, ReadDepth, RDXplorer, LUMPY and Control-FREEC, benchmarking the applications by sensitivity, specificity and computational demands. Taking the DGV gold standard variants as a standard dataset, we evaluated the ten applications with real sequencing data at sequencing depths from 5X to 50X. Among the ten methods benchmarked, LUMPY performs the best for both high sensitivity and specificity at each sequencing depth. For the purpose of high specificity, Canvas is also a good choice. If high sensitivity is preferred, CNVnator and RDXplorer are better choices. Additionally, CNVnator and GROM-RD perform well for low-depth sequencing data. Our results provide a comprehensive performance evaluation for these selected CNV detection methods and facilitate future development and improvement in CNV prediction methods. As an important type of genomic structural variation, CNVs are associated with complex phenotypes because they change the number of copies of genes in cells, affecting coding sequences and playing an important role in the susceptibility or resistance to human diseases. To identify CNVs, several experimental methods have been developed, but their resolution is very low, and the detection of short CNVs presents a bottleneck. In recent years, the advancement of high-throughput sequencing techniques has made it possible to precisely detect CNVs, especially short ones. Many CNV detection applications were developed based on the availability of high-throughput sequencing data. 
Due to different CNV detection algorithms, the CNVs identified by different applications vary greatly. Therefore, it is necessary to help investigators choose suitable applications for CNV detection depending upon their objectives. For this reason, we not only compared ten commonly used CNV detection applications but also benchmarked the applications by sensitivity, specificity and computational demands. Our results show that the sequencing depth can strongly affect CNV detection. Among the ten applications benchmarked, LUMPY performs best for both high sensitivity and specificity for each sequencing depth. We also give recommended applications for specific purposes, for example, CNVnator and RDXplorer for high sensitivity and CNVnator and GROM-RD for low-depth sequencing data.
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
- Medical Big Data Center, Sichuan University, Chengdu, China
- Zdmedical, Information polytron Technologies Inc. Chongqing, Chongqing, China
- Wanyu Bai
- College of Computer Science, Sichuan University, Chengdu, China
- Na Yuan
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China
- Zhenglin Du
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China
|
96
|
Jin Y, Chen G, Xiao W, Hong H, Xu J, Guo Y, Xiao W, Shi T, Shi L, Tong W, Ning B. Sequencing XMET genes to promote genotype-guided risk assessment and precision medicine. SCIENCE CHINA-LIFE SCIENCES 2019; 62:895-904. [PMID: 31114935 DOI: 10.1007/s11427-018-9479-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/05/2018] [Accepted: 12/06/2018] [Indexed: 12/26/2022]
Abstract
High-throughput next generation sequencing (NGS) is a shotgun approach applied in a parallel fashion by which the genome is fragmented and sequenced through small pieces and then analyzed either by aligning to a known reference genome or by de novo assembly without reference genome. This technology has led researchers to conduct an explosion of sequencing related projects in multidisciplinary fields of science. However, due to the limitations of sequencing-based chemistry, length of sequencing reads and the complexity of genes, it is difficult to determine the sequences of some portions of the human genome, leaving gaps in genomic data that frustrate further analysis. Particularly, some complex genes are difficult to be accurately sequenced or mapped because they contain high GC-content and/or low complexity regions, and complicated pseudogenes, such as the genes encoding xenobiotic metabolizing enzymes and transporters (XMETs). The genetic variants in XMET genes are critical to predicate inter-individual variability in drug efficacy, drug safety and susceptibility to environmental toxicity. We summarized and discussed challenges, wet-lab methods, and bioinformatics algorithms in sequencing "complex" XMET genes, which may provide insightful information in the application of NGS technology for implementation in toxicogenomics and pharmacogenomics.
Affiliation(s)
- Yaqiong Jin
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Geng Chen
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Wenming Xiao
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Yongli Guo
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Wenzhong Xiao
- Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Cancer Center; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, 200433, China
| | - Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Baitang Ning
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
| |
|
97
|
Detection of False-Positive Deletions from the Database of Genomic Variants. BIOMED RESEARCH INTERNATIONAL 2019; 2019:8420547. [PMID: 31080831 PMCID: PMC6475568 DOI: 10.1155/2019/8420547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2018] [Revised: 02/24/2019] [Accepted: 03/04/2019] [Indexed: 11/24/2022]
Abstract
Next generation sequencing is an emerging technology that has been widely used in the detection of genomic variants. However, since its depth of coverage, a main signature used for variant calling, is affected greatly by biases such as GC content and mappability, some calls are false positives. In this study, we utilized paired-end read mapping, another signature that is not affected by the aforementioned biases, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that may be false positives and then conducted validation studies on each suspicious variant, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample, and size. Hopefully, incorrect documentation and annotations in downstream studies can be avoided by correcting these false positives in public repositories.
|
98
|
Sharim H, Grunwald A, Gabrieli T, Michaeli Y, Margalit S, Torchinsky D, Arielly R, Nifker G, Juhasz M, Gularek F, Almalvez M, Dufault B, Chandra SS, Liu A, Bhattacharya S, Chen YW, Vilain E, Wagner KR, Pevsner J, Reifenberger J, Lam ET, Hastie AR, Cao H, Barseghyan H, Weinhold E, Ebenstein Y. Long-read single-molecule maps of the functional methylome. Genome Res 2019; 29:646-656. [PMID: 30846530 PMCID: PMC6442387 DOI: 10.1101/gr.240739.118] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/14/2018] [Accepted: 02/25/2019] [Indexed: 01/23/2023]
Abstract
We report on the development of a methylation analysis workflow for optical detection of fluorescent methylation profiles along chromosomal DNA molecules. In combination with Bionano Genomics genome mapping technology, these profiles provide a hybrid genetic/epigenetic genome-wide map composed of DNA molecules spanning hundreds of kilobase pairs. The method provides kilobase pair–scale genomic methylation patterns comparable to whole-genome bisulfite sequencing (WGBS) along genes and regulatory elements. These long single-molecule reads allow for methylation variation calling and analysis of large structural aberrations such as pathogenic macrosatellite arrays not accessible to single-cell second-generation sequencing. The method is applied here to study facioscapulohumeral muscular dystrophy (FSHD), simultaneously recording the haplotype, copy number, and methylation status of the disease-associated, highly repetitive locus on Chromosome 4q.
Affiliation(s)
- Hila Sharim
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Assaf Grunwald
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Tslil Gabrieli
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Yael Michaeli
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Sapir Margalit
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Dmitry Torchinsky
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Rani Arielly
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Gil Nifker
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| | - Matyas Juhasz
- Institute of Organic Chemistry RWTH Aachen University, D-52056 Aachen, Germany
| | - Felix Gularek
- Institute of Organic Chemistry RWTH Aachen University, D-52056 Aachen, Germany
| | - Miguel Almalvez
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Brandon Dufault
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Sreetama Sen Chandra
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Alexander Liu
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Surajit Bhattacharya
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Yi-Wen Chen
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Eric Vilain
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Kathryn R Wagner
- Kennedy Krieger Institute and Departments of Neurology and Neuroscience, The Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
| | - Jonathan Pevsner
- Kennedy Krieger Institute and Departments of Neurology and Neuroscience, The Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
| | - Ernest T Lam
- Bionano Genomics, Incorporated, San Diego, California 92121, USA
| | - Alex R Hastie
- Bionano Genomics, Incorporated, San Diego, California 92121, USA
| | - Han Cao
- Bionano Genomics, Incorporated, San Diego, California 92121, USA
| | - Hayk Barseghyan
- Center for Genetic Medicine Research, Children's National Health System, Children's Research Institute, Washington, DC 20010, USA
| | - Elmar Weinhold
- Institute of Organic Chemistry RWTH Aachen University, D-52056 Aachen, Germany
| | - Yuval Ebenstein
- School of Chemistry, Center for Nanoscience and Nanotechnology, Center for Light-Matter Interaction, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Ramat Aviv 6997801, Israel
| |
|
99
|
Marcionetti A, Rossier V, Roux N, Salis P, Laudet V, Salamin N. Insights into the Genomics of Clownfish Adaptive Radiation: Genetic Basis of the Mutualism with Sea Anemones. Genome Biol Evol 2019; 11:869-882. [PMID: 30830203 PMCID: PMC6430985 DOI: 10.1093/gbe/evz042] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Accepted: 02/28/2019] [Indexed: 02/06/2023] Open
Abstract
Clownfishes are an iconic group of coral reef fishes, especially known for their mutualism with sea anemones. This mutualism is particularly interesting as it likely acted as the key innovation that triggered clownfish adaptive radiation. Indeed, after the acquisition of the mutualism, clownfishes diversified into multiple ecological niches linked with host and habitat use. However, despite the importance of this mutualism, the genetic mechanisms allowing clownfishes to interact with sea anemones are still unclear. Here, we used comparative genomics and molecular evolutionary analyses to investigate the genetic basis of clownfish mutualism with sea anemones. We assembled and annotated the genomes of nine clownfish species and one closely related outgroup. Orthologous genes inferred between these species and additional publicly available teleost genomes resulted in almost 16,000 genes that were tested for positively selected substitutions potentially involved in the adaptation of clownfishes to live in sea anemones. We identified 17 genes with a signal of positive selection at the origin of clownfish radiation. Two of them (Versican core protein and Protein O-GlcNAse) show particularly interesting functions associated with N-acetylated sugars, which are known to be involved in sea anemone discharge of toxins. This study provides the first insights into the genetic mechanisms of clownfish mutualism with sea anemones. Indeed, we identified the first candidate genes likely to be associated with clownfish protection from sea anemones, and thus the evolution of their mutualism. Additionally, the genomic resources acquired represent a valuable resource for further investigation of the genomic basis of clownfish adaptive radiation.
Affiliation(s)
- Anna Marcionetti
- Department of Computational Biology, Génopode, University of Lausanne, Switzerland
| | - Victor Rossier
- Department of Computational Biology, Génopode, University of Lausanne, Switzerland
| | - Natacha Roux
- Observatoire Océanologique de Banyuls-sur-Mer, UMR CNRS 7232 BIOM, Sorbonne University, Banyuls-sur-Mer, France
| | - Pauline Salis
- Observatoire Océanologique de Banyuls-sur-Mer, UMR CNRS 7232 BIOM, Sorbonne University, Banyuls-sur-Mer, France
| | - Vincent Laudet
- Observatoire Océanologique de Banyuls-sur-Mer, UMR CNRS 7232 BIOM, Sorbonne University, Banyuls-sur-Mer, France
| | - Nicolas Salamin
- Department of Computational Biology, Génopode, University of Lausanne, Switzerland
| |
|
100
|
Characterization and evolutionary dynamics of complex regions in eukaryotic genomes. SCIENCE CHINA-LIFE SCIENCES 2019; 62:467-488. [PMID: 30810961 DOI: 10.1007/s11427-018-9458-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 01/07/2023]
Abstract
Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes repeated in a tandem array or in relatively close proximity. Nevertheless, the repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization, often being misassembled or completely absent in genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions. This is becoming increasingly relevant as evidence continues to support a central role for complex genomic regions in explaining human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenomic-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow for accurate functional and evolutionary studies of complex genomic regions, underlying the generation of genotype-phenotype maps of unprecedented resolution.
|