1
Zhang Y, Liu W, Duan J. On the core segmentation algorithms of copy number variation detection tools. Brief Bioinform 2024; 25:bbae022. [PMID: 38340093 PMCID: PMC10858679 DOI: 10.1093/bib/bbae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 10/26/2023] [Indexed: 02/12/2024] Open
Abstract
Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools from the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of these two core segmentation algorithms in CNV detection, considering varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which allows bias corrections such as guanine-cytosine (GC) content correction to be excluded in advance from the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing data, HMM is more robust than CBS. (4) When facing large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These findings can provide important guidance and a reference for researchers developing new tools for CNV detection.
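The shared rationale described in this abstract (compute a read depth signal, then segment it) can be illustrated with a minimal Python sketch. It is not the CBS or HMM implementation evaluated in the paper: it bins read start positions into a depth signal and then finds a single change point by plain binary segmentation. The function names, bin size and toy data are hypothetical and chosen only for illustration.

```python
import random


def read_depth_signal(read_starts, genome_length, bin_size):
    """Count the reads whose start position falls in each fixed-width bin."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    depth = [0] * n_bins
    for pos in read_starts:
        depth[pos // bin_size] += 1
    return depth


def best_split(signal):
    """Return the index i that best splits signal into signal[:i] and signal[i:].

    Plain binary segmentation: choose the split that maximizes the
    between-segment sum of squares. CBS additionally searches over arcs of a
    circularized signal and tests significance; neither step is done here.
    """
    n = len(signal)
    total = sum(signal)
    best_i, best_score = None, 0.0
    left = 0.0
    for i in range(1, n):
        left += signal[i - 1]
        right = total - left
        diff = left / i - right / (n - i)       # difference of segment means
        score = diff * diff * i * (n - i) / n   # between-segment sum of squares
        if score > best_score:
            best_i, best_score = i, score
    return best_i


# Hypothetical toy data: 100 bins of 100 bp, with a duplication over bins 60..99.
random.seed(0)
read_starts = []
for b in range(100):
    n_reads = 40 if b >= 60 else 20  # duplicated region has doubled depth
    read_starts += [b * 100 + random.randrange(100) for _ in range(n_reads)]

depth = read_depth_signal(read_starts, genome_length=10_000, bin_size=100)
change_point = best_split(depth)
```

In this toy case the depth is exactly 20 reads per bin before the duplication and 40 after it, so the split lands on the first duplicated bin. Real tools apply the split recursively and add a stopping rule, which this sketch omits.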
Affiliation(s)
- Yibo Zhang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Wenyu Liu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
- Junbo Duan
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
2
Borchers NS, Santos-Valente E, Toncheva AA, Wehkamp J, Franke A, Gaertner VD, Nordkild P, Genuneit J, Jensen BAH, Kabesch M. Human β-Defensin 2 Mutations Are Associated With Asthma and Atopy in Children and Its Application Prevents Atopic Asthma in a Mouse Model. Front Immunol 2021; 12:636061. [PMID: 33717182 PMCID: PMC7946850 DOI: 10.3389/fimmu.2021.636061] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/30/2020] [Accepted: 01/14/2021] [Indexed: 11/13/2022] Open
Abstract
Asthma and allergies are complex, chronic inflammatory diseases in which genetic and environmental factors are crucial. Protection against asthma and allergy development in the context of farming environment is established by early animal contact, unpasteurized milk consumption and gut microbiota maturation. The human β-defensin 2 (hBD-2) is a host defense peptide present almost exclusively in epithelial tissues, with pronounced immunomodulatory properties, which has recently been shown to ameliorate asthma and IBD in animal models. We hypothesized that adequate hBD-2 secretion plays a role in the protection against asthma and allergy development and that genetic variations in the complex gene locus coding for hBD-2 may be a risk factor for developing these diseases, if as a consequence, hBD-2 is insufficiently produced. We used MALDI-TOF MS genotyping, sequencing and a RFLP assay to study the genetic variation including mutations, polymorphisms and copy number variations in the locus harboring both genes coding for hBD-2 (DEFB4A and DEFB4B). We administered hBD-2 orally in a mouse model of house dust mite (HDM)-asthma before allergy challenge to explore its prophylactic potential, thereby mimicking a protective farm effect. Despite the high complexity of the region harboring DEFB4A and DEFB4B we identified numerous genetic variants to be associated with asthma and allergy in the GABRIELA Ulm population of 1,238 children living in rural areas, including rare mutations, polymorphisms and a lack of the DEFB4A. Furthermore, we found that prophylactic oral administration of hBD-2 significantly curbed lung resistance and pulmonary inflammation in our HDM mouse model. These data indicate that inadequate genetic capacity for hBD-2 is associated with increased asthma and allergy risk while adequate and early hBD-2 administration (in a mouse model) prevents atopic asthma. 
This suggests that hBD-2 could be involved in the protective farm effect and may be an excellent candidate to confer protection against asthma development.
Affiliation(s)
- Natascha S. Borchers
  - Department of Pediatric Pneumology and Allergy, University Children’s Hospital Regensburg (KUNO) at Hospital St. Hedwig of the Order of St. John, Regensburg, Germany
- Elisangela Santos-Valente
  - Department of Pediatric Pneumology and Allergy, University Children’s Hospital Regensburg (KUNO) at Hospital St. Hedwig of the Order of St. John, Regensburg, Germany
- Antoaneta A. Toncheva
  - Department of Pediatric Pneumology and Allergy, University Children’s Hospital Regensburg (KUNO) at Hospital St. Hedwig of the Order of St. John, Regensburg, Germany
- Jan Wehkamp
  - Department of Internal Medicine II, University Hospital Tübingen, University of Tübingen, Tübingen, Germany
- Andre Franke
  - Institute of Clinical Molecular Biology (IKMB), Kiel University, Kiel, Germany
- Vincent D. Gaertner
  - Department of Pediatric Pneumology and Allergy, University Children’s Hospital Regensburg (KUNO) at Hospital St. Hedwig of the Order of St. John, Regensburg, Germany
  - Newborn Research Zürich, University Hospital and University of Zürich, Zürich, Switzerland
- Jon Genuneit
  - Pediatric Epidemiology, Department of Pediatrics, Medical Faculty, Leipzig University, Leipzig, Germany
- Benjamin A. H. Jensen
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Michael Kabesch
  - Department of Pediatric Pneumology and Allergy, University Children’s Hospital Regensburg (KUNO) at Hospital St. Hedwig of the Order of St. John, Regensburg, Germany
3
Statistical Considerations on NGS Data for Inferring Copy Number Variations. Methods Mol Biol 2021; 2243:27-58. [PMID: 33606251 DOI: 10.1007/978-1-0716-1103-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technology has revolutionized research in genetics and genomics, resulting in massive NGS data and opening more fronts to answer unresolved issues in genetics. NGS data are usually stored at three levels: image files, sequence tags, and aligned reads. The sizes of these types of data usually range from several hundred gigabytes to several terabytes. Biostatisticians and bioinformaticians typically work with the aligned NGS read count data (hence the last level of NGS data) for data modeling and interpretation. Researchers use NGS technology to profile the whole genome to study DNA copy number variations (CNVs) for an individual subject (or patient) as well as groups of subjects (or patients). The resulting aligned NGS read count data are then modeled by proper mathematical and statistical approaches so that the loci of CNVs can be accurately detected. In this book chapter, a summary of the most widely used statistical methods for detecting CNVs using NGS data is given. The goal is to provide readers with a comprehensive resource of available statistical approaches for inferring DNA copy number variations using NGS data.
4
Shebanits K, Günther T, Johansson ACV, Maqbool K, Feuk L, Jakobsson M, Larhammar D. Copy number determination of the gene for the human pancreatic polypeptide receptor NPY4R using read depth analysis and droplet digital PCR. BMC Biotechnol 2019; 19:31. [PMID: 31164119 PMCID: PMC6549351 DOI: 10.1186/s12896-019-0523-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2018] [Accepted: 04/30/2019] [Indexed: 01/08/2023] Open
Abstract
Background: Copy number variation (CNV) plays an important role in human genetic diversity and has been associated with multiple complex disorders. Here we investigate a CNV on chromosome 10q11.22 that spans NPY4R, the gene for the appetite-regulating pancreatic polypeptide receptor Y4. This genomic region has been challenging to map due to multiple repeated elements and its precise organization has not yet been resolved. Previous studies using microarrays were interpreted to show that the most common copy number was 2 per genome. Results: We have investigated 18 individuals from the 1000 Genomes project using the well-established method of read depth analysis and the new droplet digital PCR (ddPCR) method. We find that the most common copy number for NPY4R is 4. The estimated number of copies ranged from three to seven based on read depth analyses with Control-FREEC and CNVnator, and from four to seven based on ddPCR. We suggest that the difference between our results and those published previously can be explained by methodological differences such as reference gene choice, data normalization and method reliability. Three high-quality archaic human genomes (two Neanderthal and one Denisova) display four copies of the NPY4R gene, indicating that a duplication occurred prior to the human-Neanderthal/Denisova split. Conclusions: We conclude that ddPCR is a sensitive and reliable method for CNV determination, that it can be used for read depth calibration in CNV studies based on already available whole-genome sequencing data, and that further investigation of NPY4R copy number variation and its consequences is necessary due to the role of the Y4 receptor in food intake regulation.
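The read depth analysis mentioned in this abstract rests on a simple proportionality: depth over a region scales with its copy number relative to a reference region of known copy number. The Python sketch below shows only that generic normalization. It is not the internal procedure of Control-FREEC or CNVnator, and the function name and depth values are hypothetical.

```python
def estimate_copy_number(region_depth, reference_depth, reference_copies=2):
    """Scale the depth ratio by the known copy number of the reference region.

    Assumes depth is proportional to copy number and that biases such as GC
    content have already been corrected; both are simplifying assumptions.
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    return region_depth / reference_depth * reference_copies


# Hypothetical mean depths: 61x over the gene, 30x over a two-copy reference.
raw_estimate = estimate_copy_number(region_depth=61.0, reference_depth=30.0)
copies = round(raw_estimate)  # nearest integer copy number
```

The choice of reference region matters here: the abstract names reference gene choice and data normalization among the likely sources of disagreement between studies.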
Affiliation(s)
- Kateryna Shebanits
  - Department of Neuroscience, SciLifeLab, Uppsala University, Uppsala, Sweden
- Torsten Günther
  - Human Evolution, Department of Organismal Biology, SciLifeLab, Uppsala University, Uppsala, Sweden
- Anna C V Johansson
  - Department of Cell and Molecular Biology, SciLifeLab, Uppsala University, Uppsala, Sweden
- Khurram Maqbool
  - Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, Uppsala, Sweden
- Lars Feuk
  - Department of Immunology, Genetics and Pathology, SciLifeLab, Uppsala University, Uppsala, Sweden
- Mattias Jakobsson
  - Human Evolution, Department of Organismal Biology, SciLifeLab, Uppsala University, Uppsala, Sweden
  - Centre for Anthropological Research and Department of Anthropology and Development Studies, University of Johannesburg, Johannesburg, South Africa
- Dan Larhammar
  - Department of Neuroscience, SciLifeLab, Uppsala University, Uppsala, Sweden
5
Zhang L, Bai W, Yuan N, Du Z. Comprehensively benchmarking applications for detecting copy number variation. PLoS Comput Biol 2019; 15:e1007069. [PMID: 31136576 PMCID: PMC6555534 DOI: 10.1371/journal.pcbi.1007069] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 01/23/2019] [Revised: 06/07/2019] [Accepted: 05/06/2019] [Indexed: 12/15/2022] Open
Abstract
Motivation: Recently, copy number variation (CNV) has gained considerable interest as a type of genomic variation that plays an important role in complex phenotypes and disease susceptibility. Since a number of CNV detection methods have recently been developed, it is necessary to help investigators choose suitable methods for CNV detection depending on their objectives. For this reason, this study compared ten commonly used CNV detection applications, including CNVnator, ReadDepth, RDXplorer, LUMPY and Control-FREEC, benchmarking the applications by sensitivity, specificity and computational demands. Taking the DGV gold standard variants as a standard dataset, we evaluated the ten applications with real sequencing data at sequencing depths from 5X to 50X. Among the ten methods benchmarked, LUMPY performs the best for both high sensitivity and specificity at each sequencing depth. For the purpose of high specificity, Canvas is also a good choice. If high sensitivity is preferred, CNVnator and RDXplorer are better choices. Additionally, CNVnator and GROM-RD perform well for low-depth sequencing data. Our results provide a comprehensive performance evaluation for these selected CNV detection methods and facilitate future development and improvement in CNV prediction methods. As an important type of genomic structural variation, CNVs are associated with complex phenotypes because they change the number of copies of genes in cells, affecting coding sequences and playing an important role in the susceptibility or resistance to human diseases. To identify CNVs, several experimental methods have been developed, but their resolution is very low, and the detection of short CNVs presents a bottleneck. In recent years, the advancement of high-throughput sequencing techniques has made it possible to precisely detect CNVs, especially short ones. Many CNV detection applications were developed based on the availability of high-throughput sequencing data. 
Due to different CNV detection algorithms, the CNVs identified by different applications vary greatly. Therefore, it is necessary to help investigators choose suitable applications for CNV detection depending upon their objectives. For this reason, we not only compared ten commonly used CNV detection applications but also benchmarked the applications by sensitivity, specificity and computational demands. Our results show that the sequencing depth can strongly affect CNV detection. Among the ten applications benchmarked, LUMPY performs best for both high sensitivity and specificity for each sequencing depth. We also give recommended applications for specific purposes, for example, CNVnator and RDXplorer for high sensitivity and CNVnator and GROM-RD for low-depth sequencing data.
Affiliation(s)
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu, China
  - Medical Big Data Center, Sichuan University, Chengdu, China
  - Zdmedical, Information polytron Technologies Inc. Chongqing, Chongqing, China
- Wanyu Bai
  - College of Computer Science, Sichuan University, Chengdu, China
- Na Yuan
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China
- Zhenglin Du
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China
6
Jin Y, Chen G, Xiao W, Hong H, Xu J, Guo Y, Xiao W, Shi T, Shi L, Tong W, Ning B. Sequencing XMET genes to promote genotype-guided risk assessment and precision medicine. SCIENCE CHINA-LIFE SCIENCES 2019; 62:895-904. [PMID: 31114935 DOI: 10.1007/s11427-018-9479-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/05/2018] [Accepted: 12/06/2018] [Indexed: 12/26/2022]
Abstract
High-throughput next generation sequencing (NGS) is a shotgun approach applied in a parallel fashion, in which the genome is fragmented, sequenced in small pieces, and then analyzed either by aligning to a known reference genome or by de novo assembly without a reference genome. This technology has led to an explosion of sequencing-related projects in multidisciplinary fields of science. However, due to the limitations of sequencing chemistry, the length of sequencing reads and the complexity of genes, it is difficult to determine the sequences of some portions of the human genome, leaving gaps in genomic data that frustrate further analysis. In particular, some complex genes, such as the genes encoding xenobiotic metabolizing enzymes and transporters (XMETs), are difficult to sequence or map accurately because they contain high GC-content and/or low-complexity regions and complicated pseudogenes. The genetic variants in XMET genes are critical for predicting inter-individual variability in drug efficacy, drug safety and susceptibility to environmental toxicity. We summarize and discuss challenges, wet-lab methods, and bioinformatics algorithms in sequencing "complex" XMET genes, which may provide insight into the application of NGS technology in toxicogenomics and pharmacogenomics.
Affiliation(s)
- Yaqiong Jin
  - Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Geng Chen
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Wenming Xiao
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Joshua Xu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Yongli Guo
  - Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Wenzhong Xiao
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
- Tieliu Shi
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Cancer Center; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, 200433, China
- Weida Tong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Baitang Ning
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
7
SM-RCNV: a statistical method to detect recurrent copy number variations in sequenced samples. Genes Genomics 2019; 41:529-536. [PMID: 30779024 DOI: 10.1007/s13258-019-00788-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/19/2018] [Accepted: 01/21/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Copy number variation (CNV) is an important form of genomic structural variation and is linked to dozens of human diseases. Using next-generation sequencing (NGS) data and developing computational methods to characterize such structural variants is significant for understanding the mechanisms of diseases. OBJECTIVE: The objective of this study is to develop a new statistical method for detecting recurrent CNVs across multiple samples from genomic sequences. METHODS: A statistical method, referred to as SM-RCNV, is developed to detect recurrent CNVs. This method uses a statistic associated with each location, combining the frequency of variation at that location across all samples and the correlation among consecutive locations. The weights of the frequency and correlation are trained using real datasets with known CNVs. A P-value is assessed for each location on the genome by permutation testing. RESULTS: Compared with six peer methods, SM-RCNV outperforms them in terms of receiver operating characteristic curves. SM-RCNV successfully identifies many consistent recurrent CNVs, most of which are known to be of biological significance and associated with disease genes. The validation rates of SM-RCNV in the CEU call set and YRI call set against the Database of Genomic Variants are 258/328 (79%) and 157/309 (51%), respectively. CONCLUSION: SM-RCNV is a well-grounded statistical framework for detecting recurrent CNVs from multiple genomic sequences, providing valuable information to study genomes in human diseases. The source code is freely available at https://sourceforge.net/projects/sm-rcnv/ .
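The permutation testing step named in this abstract can be sketched in Python under strong simplifications. The statistic below is only the frequency of variant calls at one location across samples. The correlation term and the trained weights of SM-RCNV are omitted, and the shuffling scheme, function names and toy call matrix are assumptions made for illustration rather than the published procedure.

```python
import random


def location_frequency(calls, loc):
    """Fraction of samples that carry a variant call at location loc."""
    return sum(row[loc] for row in calls) / len(calls)


def permutation_p_value(calls, loc, n_perm=1000, seed=0):
    """Estimate how often a shuffled genome matches or exceeds the observed frequency."""
    rng = random.Random(seed)
    observed = location_frequency(calls, loc)
    hits = 0
    for _ in range(n_perm):
        # Shuffle each sample's calls independently along the genome. This keeps
        # every sample's total number of calls but breaks recurrence across samples.
        shuffled = [rng.sample(row, len(row)) for row in calls]
        if location_frequency(shuffled, loc) >= observed:
            hits += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy call matrix: 6 samples x 10 locations, 1 = variant call.
calls = [[0] * 10 for _ in range(6)]
for row in calls:
    row[3] = 1       # every sample carries a variant at location 3
calls[0][7] = 1      # one sporadic call elsewhere

p_recurrent = permutation_p_value(calls, loc=3)
p_sporadic = permutation_p_value(calls, loc=7)
```

A location shared by every sample should receive a much smaller estimated P-value than the sporadic one, which is the behaviour a recurrence test is meant to capture.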
8
Roca I, González-Castro L, Fernández H, Couce ML, Fernández-Marmiesse A. Free-access copy-number variant detection tools for targeted next-generation sequencing data. MUTATION RESEARCH-REVIEWS IN MUTATION RESEARCH 2019; 779:114-125. [DOI: 10.1016/j.mrrev.2019.02.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/13/2018] [Revised: 12/25/2018] [Accepted: 02/22/2019] [Indexed: 01/23/2023]
9
Adewoye AB, Shrine N, Odenthal-Hesse L, Welsh S, Malarstig A, Jelinsky S, Kilty I, Tobin MD, Hollox EJ, Wain LV. Human CCL3L1 copy number variation, gene expression, and the role of the CCL3L1-CCR5 axis in lung function. Wellcome Open Res 2018; 3:13. [PMID: 29682616 PMCID: PMC5883389 DOI: 10.12688/wellcomeopenres.13902.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 02/09/2018] [Indexed: 01/21/2023] Open
Abstract
Background: The CCL3L1-CCR5 signaling axis is important in a number of inflammatory responses, including macrophage function, and T-cell-dependent immune responses. Small molecule CCR5 antagonists exist, including the approved antiretroviral drug maraviroc, and therapeutic monoclonal antibodies are in development. Repositioning of drugs and targets into new disease areas can accelerate the availability of new therapies and substantially reduce costs. As it has been shown that drug targets with genetic evidence supporting their involvement in the disease are more likely to be successful in clinical development, using genetic association studies to identify new target repurposing opportunities could be fruitful. Here we investigate the potential of perturbation of the CCL3L1-CCR5 axis as treatment for respiratory disease. Europeans typically carry between 0 and 5 copies of CCL3L1 and this multi-allelic variation is not detected by widely used genome-wide single nucleotide polymorphism studies. Methods: We directly measured the complex structural variation of CCL3L1 using the Paralogue Ratio Test and imputed (with validation) CCR5del32 genotypes in 5,000 individuals from UK Biobank, selected from the extremes of the lung function distribution, and analysed DNA and RNAseq data for CCL3L1 from the 1000 Genomes Project. Results: We confirmed the gene dosage effect of CCL3L1 copy number on CCL3L1 mRNA expression levels. We found no evidence for association of CCL3L1 copy number or CCR5del32 genotype with lung function. Conclusions: These results suggest that repositioning CCR5 antagonists is unlikely to be successful for the treatment of airflow obstruction.
Affiliation(s)
- Adeolu B. Adewoye
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Nick Shrine
  - Department of Health Sciences, University of Leicester, Leicester, UK
- Linda Odenthal-Hesse
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Scott Jelinsky
  - Pfizer Worldwide Research and Development, Cambridge, MA, USA
- Iain Kilty
  - Pfizer Worldwide Research and Development, Cambridge, MA, USA
- Martin D. Tobin
  - Department of Health Sciences, University of Leicester, Leicester, UK
  - National Institute of Health Research Biomedical Research Centre, University of Leicester, Leicester, UK
- Edward J. Hollox
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Louise V. Wain
  - Department of Health Sciences, University of Leicester, Leicester, UK
  - National Institute of Health Research Biomedical Research Centre, University of Leicester, Leicester, UK
10
Dharanipragada P, Vogeti S, Parekh N. iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One 2018; 13:e0195334. [PMID: 29621297 PMCID: PMC5886540 DOI: 10.1371/journal.pone.0195334] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/29/2017] [Accepted: 03/20/2018] [Indexed: 12/14/2022] Open
Abstract
The discovery of copy number variations (CNVs), a major category of structural variations, has dramatically changed our understanding of differences between individuals and provides an alternate paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events, and their genome-wide detection is now possible using high-throughput, low-cost next generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward due to non-uniform coverage of reads resulting from various systemic biases. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection in whole genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module that enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease-associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform accessible even on a desktop. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approaches on accurate detection of the complete spectrum of CNVs. Performance of iCopyDAV is evaluated on both simulated data and real data for different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as a Docker image at http://bioinf.iiit.ac.in/icopydav/.
Affiliation(s)
- Prashanthi Dharanipragada
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Sriharsha Vogeti
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
11
Adewoye AB, Shrine N, Odenthal-Hesse L, Welsh S, Malarstig A, Jelinsky S, Kilty I, Tobin MD, Hollox EJ, Wain LV. Human CCL3L1 copy number variation, gene expression, and the role of the CCL3L1-CCR5 axis in lung function. Wellcome Open Res 2018. [DOI: 10.12688/wellcomeopenres.13902.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
12
Abstract
Differences between genomes can be due to single nucleotide variants (SNPs), translocations, inversions and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 250 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease or phenotypic traits. While the link between SNPs and disease susceptibility has been well studied, to date there are still very few published CNV genome-wide association studies, probably because CNV analysis remains a slightly more complex task than SNP analysis (both in terms of bioinformatics workflow and uncertainty in CNV calling, which leads to high false positive rates and unknown false negative rates). This chapter aims to explain computational methods for the analysis of CNVs, ranging from study design, data processing and quality control, up to genome-wide association studies with clinical traits.
Affiliation(s)
- Aurélien Macé
- Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Zoltán Kutalik
- Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | |
|
13
|
Dolatabadian A, Patel DA, Edwards D, Batley J. Copy number variation and disease resistance in plants. Theor Appl Genet 2017; 130:2479-2490. [PMID: 29043379 DOI: 10.1007/s00122-017-2993-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 02/09/2017] [Accepted: 09/27/2017] [Indexed: 05/06/2023]
Abstract
Plant genome diversity varies from single nucleotide polymorphisms to large-scale deletions, insertions, duplications, or re-arrangements. These re-arrangements of sequences resulting from duplication, gains or losses of DNA segments are termed copy number variations (CNVs). During the last decade, numerous studies have emphasized the importance of CNVs as a factor affecting human phenotype; in particular, CNVs have been associated with risks for several severe diseases. In plants, the exploration of the extent and role of CNVs in resistance against pathogens and pests is just beginning. Since CNVs are likely to be associated with disease resistance in plants, an understanding of the distribution of CNVs could assist in the identification of novel plant disease-resistance genes. In this paper, we review existing information about CNVs; their importance, role and function, as well as their association with disease resistance in plants.
Affiliation(s)
- Aria Dolatabadian
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
| | - Dhwani Apurva Patel
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
| | - Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia.
| |
|
14
|
Ji T, Chen J. Statistical models for DNA copy number variation detection using read-depth data from next generation sequencing experiments. Aust N Z J Stat 2016. [DOI: 10.1111/anzs.12175] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/24/2023]
Affiliation(s)
- Tieming Ji
- Department of Statistics, University of Missouri at Columbia, Columbia, MO 65211, USA
| | - Jie Chen
- Department of Biostatistics and Epidemiology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
| |
|
15
|
Galla SJ, Buckley TR, Elshire R, Hale ML, Knapp M, McCallum J, Moraga R, Santure AW, Wilcox P, Steeves TE. Building strong relationships between conservation genetics and primary industry leads to mutually beneficial genomic advances. Mol Ecol 2016; 25:5267-5281. [PMID: 27641156 DOI: 10.1111/mec.13837] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/22/2016] [Revised: 08/23/2016] [Accepted: 08/24/2016] [Indexed: 02/06/2023]
Abstract
Several reviews in the past decade have heralded the benefits of embracing high-throughput sequencing technologies to inform conservation policy and the management of threatened species, but few have offered practical advice on how to expedite the transition from conservation genetics to conservation genomics. Here, we argue that an effective and efficient way to navigate this transition is to capitalize on emerging synergies between conservation genetics and primary industry (e.g., agriculture, fisheries, forestry and horticulture). Here, we demonstrate how building strong relationships between conservation geneticists and primary industry scientists is leading to mutually-beneficial outcomes for both disciplines. Based on our collective experience as collaborative New Zealand-based scientists, we also provide insight for forging these cross-sector relationships.
Affiliation(s)
- Stephanie J Galla
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, 8140, New Zealand.
| | - Thomas R Buckley
- Landcare Research, Private Bag 92170, Auckland Mail Centre, Auckland, 1142, New Zealand; School of Biological Sciences, University of Auckland, Auckland, 1010, New Zealand
| | - Rob Elshire
- The Elshire Group, Ltd., 52 Victoria Avenue, Palmerston North, 4410, New Zealand
| | - Marie L Hale
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, 8140, New Zealand
| | - Michael Knapp
- Department of Anatomy, University of Otago, P.O. Box 913, Dunedin, 9054, New Zealand
| | - John McCallum
- Breeding and Genomics, New Zealand Institute for Plant and Food Research, Private Bag 4704, Christchurch, 8140, New Zealand
| | - Roger Moraga
- AgResearch, Ruakura Research Centre, Bisley Road, Private Bag 3115, Hamilton, 3240, New Zealand
| | - Anna W Santure
- School of Biological Sciences, University of Auckland, Auckland, 1010, New Zealand
| | - Phillip Wilcox
- Department of Mathematics and Statistics, University of Otago, P.O. Box 56, 710 Cumberland Street, Dunedin, 9054, New Zealand
| | - Tammy E Steeves
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, 8140, New Zealand
| |
|
16
|
Nguyen HT, Boocock J, Merriman TR, Black MA. SRBreak: A Read-Depth and Split-Read Framework to Identify Breakpoints of Different Events Inside Simple Copy-Number Variable Regions. Front Genet 2016; 7:160. [PMID: 27695476 PMCID: PMC5023681 DOI: 10.3389/fgene.2016.00160] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/17/2016] [Accepted: 08/24/2016] [Indexed: 12/28/2022] Open
Abstract
Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints. This helps in understanding genome structure as well as their evolution process. Various approaches have been proposed for detecting CNV breakpoints, but currently it is still challenging for tools based on a single analysis method to identify breakpoints of CNVs. It has been shown, however, that pipelines which integrate multiple approaches are able to report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, using information from multiple samples to allow an imputation approach to be taken. The main steps involve using a normal mixture model to cluster samples into different groups, followed by simple kernel-based approaches to maximize information obtained from read-depth and split-read approaches, after which common breakpoints of groups are inferred. The pipeline uses split-read information directly from CIGAR strings of BAM files, without using a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM) the pipeline obtained good concordance with results from the 1000 Genomes Project (92, 100, and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
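To illustrate how split-read evidence can be read straight from CIGAR strings without re-alignment, the following is a minimal sketch and not SRBreak's actual code. The function name `soft_clip_breakpoints` and its return convention are assumptions made for this example; only the CIGAR operation letters follow the SAM format.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def soft_clip_breakpoints(pos, cigar):
    """Return candidate breakpoints (1-based reference coordinates) implied by
    soft-clipped read ends. `pos` is the SAM POS field (leftmost aligned base).
    Hypothetical helper for illustration only."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips carry no sequence, so drop them before inspecting the ends.
    core = [item for item in ops if item[1] != "H"]
    breakpoints = []
    if core and core[0][1] == "S":
        # Left soft clip: the read stops matching just before the first aligned base.
        breakpoints.append(pos)
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if len(core) > 1 and core[-1][1] == "S":
        # Right soft clip: the read stops matching after the last aligned base.
        breakpoints.append(pos + ref_len - 1)
    return breakpoints
```

Pooling such candidate positions across many reads and samples would give the kind of split-read signal that the abstract describes combining with read depth; how SRBreak itself weights or smooths them is not shown here.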
Affiliation(s)
- Hoang T Nguyen
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand; Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA; Department of Mathematics, Cao Thang College of Technology, Ho Chi Minh City, Vietnam
| | - James Boocock
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand; Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA
| | - Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand
| |
|
17
|
Boocock J, Chagné D, Merriman TR, Black MA. The distribution and impact of common copy-number variation in the genome of the domesticated apple, Malus x domestica Borkh. BMC Genomics 2015; 16:848. [PMID: 26493398 PMCID: PMC4618995 DOI: 10.1186/s12864-015-2096-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/13/2015] [Accepted: 10/15/2015] [Indexed: 11/14/2022] Open
Abstract
Background: Copy number variation (CNV) is a common feature of eukaryotic genomes, and a growing body of evidence suggests that genes affected by CNV are enriched in processes that are associated with environmental responses. Here we use next generation sequence (NGS) data to detect copy-number variable regions (CNVRs) within the Malus x domestica genome, as well as to examine their distribution and impact. Methods: CNVRs were detected using NGS data derived from 30 accessions of M. x domestica analyzed using the read-depth method, as implemented in the CNVrd2 software. To improve the reliability of our results, we developed a quality control and analysis procedure that involved checking for organelle DNA, omitting repeat masking, and the determination of CNVR identity using a permutation testing procedure. Results: Overall, we identified 876 CNVRs, which spanned 3.5% of the apple genome. To verify that detected CNVRs were not artifacts, we analyzed the B-allele frequencies (BAF) within a single nucleotide polymorphism (SNP) array dataset derived from a screening of 185 individual apple accessions and found the CNVRs were enriched for SNPs having aberrant BAFs (P < 1e-13, Fisher’s exact test). Putative CNVRs overlapped 845 gene models and were enriched for resistance (R) gene models (P < 1e-22, Fisher’s exact test). Of note was a cluster of resistance gene models on chromosome 2 near a region containing multiple major gene loci conferring resistance to apple scab. Conclusion: We present the first analysis and catalogue of CNVRs in the M. x domestica genome. The enrichment of the CNVRs with R gene models and their overlap with gene loci of agricultural significance draw attention to a form of unexplored genetic variation in apple. This research will underpin further investigation of the role that CNV plays within the apple genome.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2096-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- James Boocock
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; The Virtual Institute of Statistical Genetics (VISG), Rotorua, New Zealand
| | - David Chagné
- The Virtual Institute of Statistical Genetics (VISG), Rotorua, New Zealand; The New Zealand Institute for Plant & Food Research Ltd, Palmerston North, New Zealand
| | - Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; The Virtual Institute of Statistical Genetics (VISG), Rotorua, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; The Virtual Institute of Statistical Genetics (VISG), Rotorua, New Zealand
| |
|
18
|
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023] Open
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Affiliation(s)
- Mehdi Pirooznia
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Fernando S Goes
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Peter P Zandi
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| |
|
19
|
Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Front Genet 2015; 6:45. [PMID: 25741365 PMCID: PMC4330915 DOI: 10.3389/fgene.2015.00045] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/30/2014] [Accepted: 01/30/2015] [Indexed: 12/20/2022] Open
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
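The normalize-then-segment idea in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the three copy-ratio states (0.5, 1.0, 1.5), the shared Gaussian noise level `sd`, and the `stay` probability are all hypothetical choices made for the example.

```python
import math


def normalize_to_profile(coverage, profile):
    """Divide per-bin coverage by a reference profile of expected diploid depth.
    Bins with zero expected depth are set to 1.0 (treated as neutral); this
    handling is an assumption of the sketch."""
    return [c / p if p > 0 else 1.0 for c, p in zip(coverage, profile)]


def viterbi_copy_state(ratios, means=(0.5, 1.0, 1.5), sd=0.15, stay=0.99):
    """Most likely state path for a small HMM with Gaussian emissions.
    State i emits ratios centred on means[i]; all states share `sd`."""
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]

    def log_emit(x, mu):
        # Gaussian log-density up to a constant shared by every state.
        return -0.5 * ((x - mu) / sd) ** 2

    # Uniform prior over the starting state.
    score = [math.log(1.0 / n_states) + log_emit(ratios[0], m) for m in means]
    back = []
    for x in ratios[1:]:
        prev, score, ptr = score, [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j] + log_emit(x, means[j]))
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


coverage = [30, 31, 29, 15, 14, 16, 15, 30, 29]
profile = [30] * 9
states = viterbi_copy_state(normalize_to_profile(coverage, profile))
print(states)  # state 0 marks bins consistent with a hemizygous deletion
```

The high `stay` probability is what keeps a single noisy bin from being called as its own segment; real tools would estimate these parameters from data rather than fix them by hand.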
Affiliation(s)
| | | | | | | | | | | | | | | | | | | | - Dale L Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
| | - Joseph G Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
| | | | - John E Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
| | - Leroy Hood
- Institute for Systems Biology, Seattle, WA, USA
| |
|