1
|
Watanabe D, Okamoto N, Kobayashi Y, Suzuki H, Kato M, Saitoh S, Kanemura Y, Takenouchi T, Yamada M, Nakato D, Sato M, Tsunoda T, Kosaki K, Miya F. Biallelic structural variants in three patients with ERCC8-related Cockayne syndrome and a potential pitfall of copy number variation analysis. Sci Rep 2024; 14:19741. [PMID: 39187681 PMCID: PMC11347644 DOI: 10.1038/s41598-024-70831-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Accepted: 08/21/2024] [Indexed: 08/28/2024] Open
Abstract
Cockayne syndrome (CS) is a rare autosomal recessive disorder caused by mutations in ERCC8 or ERCC6. Most pathogenic variants in ERCC8 are single nucleotide substitutions. Structural variants (SVs) have been reported in patients with ERCC8-related CS. However, comprehensive molecular detection, including SVs of ERCC8, in CS patients remains problematic. Herein, we present three Japanese patients with ERCC8-related CS in whom causative SVs were identified using whole-exome-based copy number variation (CNV) detection tools. One patient showed compound heterozygosity for a 259-kb deletion and a deletion of exon 4 which has previously been reported as an Asia-specific variant. The other two patients were homozygous for the same exon 4 deletion. The exon 4 deletion was detected only by the ExomeDepth software. Intrigued by the discrepancy in the detection capability of various tools for the SVs, we evaluated the analytic performance of four whole-exome-based CNV detection tools using an exome data set from 337 healthy individuals. A total of 1,278,141 exons were predicted as being affected by the 4 CNV tools. Interestingly 95.1% of these affected exons were detected by one tool alone. Thus, we expect that the use of multiple tools may improve the detection rate of SVs from aligned exome data.
Collapse
Affiliation(s)
- Daisuke Watanabe
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan
- Department of Pediatrics, Yamanashi University, Yamanashi, Japan
| | - Nobuhiko Okamoto
- Department of Medical Genetics, Osaka Women's and Children's Hospital, Osaka, Japan
| | - Yuichi Kobayashi
- Professional Development Center, Tokyo Medical and Dental University (TMDU), Tokyo, Japan
| | - Hisato Suzuki
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan
- Department of Clinical Medicine, Institute of Medicine, University of Tsukuba, Ibaraki, Japan
| | - Mitsuhiro Kato
- Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan
- Epilepsy Medical Center, Showa University Hospital, Tokyo, Japan
| | - Shinji Saitoh
- Department of Pediatrics and Neonatology, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
| | - Yonehiro Kanemura
- Department of Biomedical Research and Innovation, Institute for Clinical Research, NHO Osaka National Hospital, Osaka, Japan
- Department of Neurosurgery, NHO Osaka National Hospital, Osaka, Japan
| | - Toshiki Takenouchi
- Department of Pediatrics, Keio University School of Medicine, Tokyo, Japan
| | - Mamiko Yamada
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan
| | - Daisuke Nakato
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan
| | - Masayuki Sato
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan
| | - Fuyuki Miya
- Center for Medical Genetics, Keio University School of Medicine, 35 Shinanomachi, Shinjuku, Tokyo, 160-8582, Japan.
- Innovative Human Resource Development Division, Institute of Education, Tokyo Medical and Dental University (TMDU), Tokyo, Japan.
| |
Collapse
|
2
|
Duan J, Zhao X, Wu X. LoRA-TV: read depth profile-based clustering of tumor cells in single-cell sequencing. Brief Bioinform 2024; 25:bbae277. [PMID: 38877886 PMCID: PMC11179121 DOI: 10.1093/bib/bbae277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2024] [Revised: 05/17/2024] [Accepted: 05/29/2024] [Indexed: 06/18/2024] Open
Abstract
Single-cell sequencing has revolutionized our ability to dissect the heterogeneity within tumor populations. In this study, we present LoRA-TV (Low Rank Approximation with Total Variation), a novel method for clustering tumor cells based on the read depth profiles derived from single-cell sequencing data. Traditional analysis pipelines process read depth profiles of each cell individually. By aggregating shared genomic signatures distributed among individual cells using low-rank optimization and robust smoothing, the proposed method enhances clustering performance. Results from analyses of both simulated and real data demonstrate its effectiveness compared with state-of-the-art alternatives, as supported by improvements in the adjusted Rand index and computational efficiency.
Collapse
Affiliation(s)
- Junbo Duan
- Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
| | - Xinrui Zhao
- Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
| | - Xiaoming Wu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
| |
Collapse
|
3
|
Benfica LF, Brito LF, do Bem RD, Mulim HA, Glessner J, Braga LG, Gloria LS, Cyrillo JNSG, Bonilha SFM, Mercadante MEZ. Genome-wide association study between copy number variation and feeding behavior, feed efficiency, and growth traits in Nellore cattle. BMC Genomics 2024; 25:54. [PMID: 38212678 PMCID: PMC10785391 DOI: 10.1186/s12864-024-09976-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Accepted: 01/04/2024] [Indexed: 01/13/2024] Open
Abstract
BACKGROUND Feeding costs represent the largest expenditures in beef production. Therefore, the animal efficiency in converting feed in high-quality protein for human consumption plays a major role in the environmental impact of the beef industry and in the beef producers' profitability. In this context, breeding animals for improved feed efficiency through genomic selection has been considered as a strategic practice in modern breeding programs around the world. Copy number variation (CNV) is a less-studied source of genetic variation that can contribute to phenotypic variability in complex traits. In this context, this study aimed to: (1) identify CNV and CNV regions (CNVRs) in the genome of Nellore cattle (Bos taurus indicus); (2) assess potential associations between the identified CNVR and weaning weight (W210), body weight measured at the time of selection (WSel), average daily gain (ADG), dry matter intake (DMI), residual feed intake (RFI), time spent at the feed bunk (TF), and frequency of visits to the feed bunk (FF); and, (3) perform functional enrichment analyses of the significant CNVR identified for each of the traits evaluated. RESULTS A total of 3,161 CNVs and 561 CNVRs ranging from 4,973 bp to 3,215,394 bp were identified. The CNVRs covered up to 99,221,894 bp (3.99%) of the Nellore autosomal genome. Seventeen CNVR were significantly associated with dry matter intake and feeding frequency (number of daily visits to the feed bunk). The functional annotation of the associated CNVRs revealed important candidate genes related to metabolism that may be associated with the phenotypic expression of the evaluated traits. Furthermore, Gene Ontology (GO) analyses revealed 19 enrichment processes associated with FF. CONCLUSIONS A total of 3,161 CNVs and 561 CNVRs were identified and characterized in a Nellore cattle population. Various CNVRs were significantly associated with DMI and FF, indicating that CNVs play an important role in key biological pathways and in the phenotypic expression of feeding behavior and growth traits in Nellore cattle.
Collapse
Affiliation(s)
- Lorena F Benfica
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA.
- Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University, Jaboticabal, SP, Brazil.
| | - Luiz F Brito
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
| | - Ricardo D do Bem
- Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University, Jaboticabal, SP, Brazil
| | - Henrique A Mulim
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
| | - Joseph Glessner
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Larissa G Braga
- Department of Animal Science, Faculty of Agricultural and Veterinary Sciences, Sao Paulo State University, Jaboticabal, SP, Brazil
| | - Leonardo S Gloria
- Department of Animal Sciences, Purdue University, 270 S. Russell Street, West Lafayette, IN, 47907, USA
| | | | | | | |
Collapse
|
4
|
Liu G, Yang H, He Z. Detection of copy number variations based on a local distance using next-generation sequencing data. Front Genet 2023; 14:1147761. [PMID: 37811148 PMCID: PMC10556732 DOI: 10.3389/fgene.2023.1147761] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023] Open
Abstract
As one of the main types of structural variation in the human genome, copy number variation (CNV) plays an important role in the occurrence and development of human cancers. Next-generation sequencing (NGS) technology can provide base-level resolution, which provides favorable conditions for the accurate detection of CNVs. However, it is still a very challenging task to accurately detect CNVs from cancer samples with different purity and low sequencing coverage. Local distance-based CNV detection (LDCNV), an innovative computational approach to predict CNVs using NGS data, is proposed in this work. LDCNV calculates the average distance between each read depth (RD) and its k nearest neighbors (KNNs) to define the distance of KNNs of each RD, and the average distance between the KNNs for each RD to define their internal distance. Based on the above definitions, a local distance score is constructed using the ratio between the distance of KNNs and the internal distance of KNNs for each RD. The local distance scores are used to fit a normal distribution to evaluate the significance level of each RDS, and then use the hypothesis test method to predict the CNVs. The performance of the proposed method is verified with simulated and real data and compared with several popular methods. The experimental results show that the proposed method is superior to various other techniques. Therefore, the proposed method can be helpful for cancer diagnosis and targeted drug development.
Collapse
Affiliation(s)
- Guojun Liu
- School of Mathematics, Xi’an University of Finance and Economics, Xi’an, China
| | - Hongzhi Yang
- Department of Radiology, XD Group Hospital, Xi’an, China
| | - Zongzhen He
- School of Mathematics, Xi’an University of Finance and Economics, Xi’an, China
| |
Collapse
|
5
|
Liu G, Yang H, Yuan X. A shortest path-based approach for copy number variation detection from next-generation sequencing data. Front Genet 2023; 13:1084974. [PMID: 36733945 PMCID: PMC9887524 DOI: 10.3389/fgene.2022.1084974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/18/2023] Open
Abstract
Copy number variation (CNV) is one of the main structural variations in the human genome and accounts for a considerable proportion of variations. As CNVs can directly or indirectly cause cancer, mental illness, and genetic disease in humans, their effective detection in humans is of great interest in the fields of oncogene discovery, clinical decision-making, bioinformatics, and drug discovery. The advent of next-generation sequencing data makes CNV detection possible, and a large number of CNV detection tools are based on next-generation sequencing data. Due to the complexity (e.g., bias, noise, alignment errors) of next-generation sequencing data and CNV structures, the accuracy of existing methods in detecting CNVs remains low. In this work, we design a new CNV detection approach, called shortest path-based Copy number variation (SPCNV), to improve the detection accuracy of CNVs. SPCNV calculates the k nearest neighbors of each read depth and defines the shortest path, shortest path relation, and shortest path cost sets based on which further calculates the mean shortest path cost of each read depth and its k nearest neighbors. We utilize the ratio between the mean shortest path cost for each read depth and the mean of the mean shortest path cost of its k nearest neighbors to construct a relative shortest path score formula that is able to determine a score for each read depth. Based on the score profile, a boxplot is then applied to predict CNVs. The performance of the proposed method is verified by simulation data experiments and compared against several popular methods of the same type. Experimental results show that the proposed method achieves the best balance between recall and precision in each set of simulated samples. To further verify the performance of the proposed method in real application scenarios, we then select real sample data from the 1,000 Genomes Project to conduct experiments. The proposed method achieves the best F1-scores in almost all samples. Therefore, the proposed method can be used as a more reliable tool for the routine detection of CNVs.
Collapse
Affiliation(s)
- Guojun Liu
- School of Statistics, Xi’an University of Finance and Economics, Xi’an, China,*Correspondence: Guojun Liu, ; Xiguo Yuan,
| | - Hongzhi Yang
- Medical Imaging Center, Xidian Group Hospital, Xi’an, China
| | - Xiguo Yuan
- Hangzhou Institute of Technology, Xidian University, Hangzhou, China,*Correspondence: Guojun Liu, ; Xiguo Yuan,
| |
Collapse
|
6
|
Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained based on isolation forest algorithm through conducting subsampling from measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data even for low-level coverage and tumor purity. The detection results on tumor samples can aid to evaluate known cancer genes and to predict target drugs for disease diagnosis.
Collapse
|
7
|
Liu G, Zhang J, Yuan X, Wei C. RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data. Front Genet 2020; 11:569227. [PMID: 33329705 PMCID: PMC7673372 DOI: 10.3389/fgene.2020.569227] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2020] [Accepted: 09/04/2020] [Indexed: 12/04/2022] Open
Abstract
Copy number variations (CNVs) are significant causes of many human cancers and genetic diseases. The detection of CNVs has become a common method by which to analyze human diseases using next-generation sequencing (NGS) data. However, effective detection of insignificant CNVs is still a challenging task. In this study, we propose a new detection method, RKDOSCNV, to meet the need. RKDOSCNV uses kernel density estimation method to evaluate the local kernel density distribution of each read depth segment (RDS) based on an expanded nearest neighbor (k-nearest neighbors, reverse nearest neighbors, and shared nearest neighbors of each RDS) data set, and assigns a relative kernel density outlier score (RKDOS) for each RDS. According to the RKDOS profile, RKDOSCNV predicts the candidate CNVs by choosing a reasonable threshold, which it uses split read approach to correct the boundaries of candidate CNVs. The performance of RKDOSCNV is assessed by comparing it with several current popular methods via experiments with simulated and real data at different tumor purity levels. The experimental results verify that the performance of RKDOSCNV is superior to that of several other methods. In summary, RKDOSCNV is a simple and effective method for the detection of CNVs from whole genome sequencing (WGS) data, especially for samples with low tumor purity.
Collapse
Affiliation(s)
- Guojun Liu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Chao Wei
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
Collapse
|
8
|
Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020; 11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022] Open
Abstract
Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumor genesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made it possible for the study of TDs. However, detection is still challenging due to the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs using short sequencing reads. The major principle of the proposed method is that it first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find the possible TD regions. The total variation penalized least squares model is fitted with read depth and mapping quality signals to denoise signals. A 2D binary search tree is used to search the neighbor points effectively. To further identify the exact breakpoints of the TD regions, split-read signals are integrated into DINTD. The experimental results of DINTD on simulated data sets showed that DINTD can outperform other methods for sensitivity, precision, F1-score, and boundary bias. DINTD is further validated on real samples, and the experiment results indicate that it is consistent with other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
Collapse
Affiliation(s)
- Jinxin Dong
- School of Computer Science and Technology, Xidian University, Xi'an, China.,School of Computer Science and Technology, Liaocheng University, Liaocheng, China
| | - Minyong Qi
- School of Computer Science and Technology, Xidian University, Xi'an, China.,School of Computer Science and Technology, Liaocheng University, Liaocheng, China
| | - Shaoqiang Wang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
Collapse
|
9
|
Mohammadi M. A Projection Neural Network for the Generalized Lasso. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2217-2221. [PMID: 31398133 DOI: 10.1109/tnnls.2019.2927282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
The generalized lasso (GLasso) is an extension of the lasso regression in which there is an l1 penalty term (or regularization) of the linearly transformed coefficient vector. Finding the optimal solution of GLasso is not straightforward since the penalty term is not differentiable. This brief presents a novel one-layer neural network to solve the generalized lasso for a wide range of penalty transformation matrices. The proposed neural network is proven to be stable in the sense of Lyapunov and converges globally to the optimal solution of the GLasso. It is also shown that the proposed neural solution can solve many optimization problems, including sparse and weighted sparse representations, (weighted) total variation denoising, fused lasso signal approximator, and trend filtering. Disparate experiments on the above problems illustrate and confirm the excellent performance of the proposed neural network in comparison to other competing techniques.
Collapse
|
10
|
Zhao L, Liu H, Yuan X, Gao K, Duan J. Comparative study of whole exome sequencing-based copy number variation detection tools. BMC Bioinformatics 2020; 21:97. [PMID: 32138645 PMCID: PMC7059689 DOI: 10.1186/s12859-020-3421-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2019] [Accepted: 02/17/2020] [Indexed: 02/23/2023] Open
Abstract
Background With the rapid development of whole exome sequencing (WES), an increasing number of tools are being proposed for copy number variation (CNV) detection based on this technique. However, no comprehensive guide is available for the use of these tools in clinical settings, which renders them inapplicable in practice. To resolve this problem, in this study, we evaluated the performances of four WES-based CNV tools, and established a guideline for the recommendation of a suitable tool according to the application requirements. Results In this study, first, we selected four WES-based CNV detection tools: CoNIFER, cn.MOPS, CNVkit and exomeCopy. Then, we evaluated their performances in terms of three aspects: sensitivity and specificity, overlapping consistency and computational costs. From this evaluation, we obtained four main results: (1) The sensitivity increases and subsequently stabilizes as the coverage or CNV size increases, while the specificity decreases. (2) CoNIFER performs better for CNV insertions than for CNV deletions, while the remaining tools exhibit the opposite trend. (3) CoNIFER, cn.MOPS and CNVkit realize satisfactory overlapping consistency, which indicates their results are trustworthy. (4) CoNIFER has the best space complexity and cn.MOPS has the best time complexity among these four tools. Finally, we established a guideline for tools’ usage according to these results. Conclusion No available tool performs excellently under all conditions; however, some tools perform excellently in some scenarios. Users can obtain a CNV tool recommendation from our paper according to the targeted CNV size, the CNV type or computational costs of their projects, as presented in Table 1, which is helpful even for users with limited knowledge of computer science.
Collapse
Affiliation(s)
- Lanling Zhao
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Han Liu
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Kun Gao
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Junbo Duan
- Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
| |
Collapse
|
11
|
Lee J, Chen J. A modified information criterion for tuning parameter selection in 1d fused LASSO for inference on multiple change points. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2020.1732379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Affiliation(s)
- J. Lee
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
| | - J. Chen
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
| |
Collapse
|
12
|
Zare F, Nabavi S. Copy Number Variation Detection Using Total Variation. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2019; 2019:423-428. [PMID: 32515750 PMCID: PMC7278034 DOI: 10.1145/3307339.3342181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Next-generation sequencing (NGS) technologies offer new opportunities for precise and accurate identification of genomic aberrations, including copy number variations (CNVs). For high-throughput NGS data, using depth of coverage has become a major approach to identify CNVs, especially for whole exome sequencing (WES) data. Due to the high level of noise and biases of read-count data and complexity of the WES data, existing CNV detection tools identify many false CNV segments. Besides, NGS generates a huge amount of data, requiring to use effective and efficient methods. In this work, we propose a novel segmentation algorithm based on the total variation approach to detect CNVs more precisely and efficiently using WES data. The proposed method also filters out outlier read-counts and identifies significant change points to reduce false positives. We used real and simulated data to evaluate the performance of the proposed method and compare its performance with those of other commonly used CNV detection methods. Using simulated and real data, we show that the proposed method outperforms the existing CNV detection methods in terms of accuracy and false discovery rate and has a faster runtime compared to the circular binary segmentation method.
Collapse
Affiliation(s)
| | - Sheida Nabavi
- Corresponding author. This study was supported by a grant from the National Institutes of Health (NIH, R00LM011595, PI: Nabavi).
| |
Collapse
|
13
|
Lee J, Chen J. A penalized regression approach for DNA copy number study using the sequencing data. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0001. [PMID: 31145697 DOI: 10.1515/sagmb-2018-0001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Modeling the high-throughput next generation sequencing (NGS) data, resulting from experiments with the goal of profiling tumor and control samples for the study of DNA copy number variants (CNVs), remains to be a challenge in various ways. In this application work, we provide an efficient method for detecting multiple CNVs using NGS reads ratio data. This method is based on a multiple statistical change-points model with the penalized regression approach, 1d fused LASSO, that is designed for ordered data in a one-dimensional structure. In addition, since the path algorithm traces the solution as a function of a tuning parameter, the number and locations of potential CNV region boundaries can be estimated simultaneously in an efficient way. For tuning parameter selection, we then propose a new modified Bayesian information criterion, called JMIC, and compare the proposed JMIC with three different Bayes information criteria used in the literature. Simulation results have shown the better performance of JMIC for tuning parameter selection, in comparison with the other three criterion. We applied our approach to the sequencing data of reads ratio between the breast tumor cell lines HCC1954 and its matched normal cell line BL 1954 and the results are in-line with those discovered in the literature.
Collapse
Affiliation(s)
- Jaeeun Lee
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
| | - Jie Chen
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
| |
Collapse
|
14
|
SM-RCNV: a statistical method to detect recurrent copy number variations in sequenced samples. Genes Genomics 2019; 41:529-536. [PMID: 30779024 DOI: 10.1007/s13258-019-00788-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2018] [Accepted: 01/21/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND Copy number variation (CNV) is an important form of genomic structural variation and is linked to dozens of human diseases. Using next-generation sequencing (NGS) data and developing computational methods to characterize such structural variants is significant for understanding the mechanisms of diseases. OBJECTIVE The objective of this study is to develop a new statistical method of detection recurrent CNVs across multiple samples from genomic sequences. METHODS A statistical method is carried out to detect recurrent CNVs, referred to as SM-RCNV. This method uses a statistic associated with each location by combining the frequency of variation at one location across whole samples and the correlation among consecutive locations. The weights of the frequency and correlation are trained using real datasets with known CNVs. P-value is assessed for each location on the genome by permutation testing. RESULTS Compared with six peer methods, SM-RCNV outperforms the peer methods under receiver operating characteristic curves. SM-RCNV successfully identifies many consistent recurrent CNVs, most of which are known to be of biological significance and associated with diseased genes. The validation rate of SM-RCNV in the CEU call set and YRI call set with Database of Genomic Variants are 258/328 (79%) and (157/309) 51%, respectively. CONCLUSION SM-RCNV is a well-grounded statistical framework for detecting recurrent CNVs from multiple genomic sequences, providing valuable information to study genomes in human diseases. The source code is freely available at https://sourceforge.net/projects/sm-rcnv/ .
Collapse
|
15
|
Cai H, Chen P, Chen J, Cai J, Song Y, Han G. WaveDec: A Wavelet Approach to Identify Both Shared and Individual Patterns of Copy-Number Variations. IEEE Trans Biomed Eng 2019; 65:353-364. [PMID: 29346103 DOI: 10.1109/tbme.2017.2769677] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Copy-number variations (CNVs) are associated with complex diseases and particular tumor types. Array-based comparative genomic hybridization (aCGH) is a common approach for the detection of CNVs. Traditional CNV detection methods for multiple aCGH samples mainly use batch samples to find common variations, not accounting for the individual characteristics of each sample. Accurately differentiating both the commonly shared and the individual CNV patterns is pivotal to identify cell populations, or to distinguish cell growth (as in cancer) from invasion of new cells. Our preliminary results have now demonstrated that both the shared and individual CNV patterns have distinctive characteristics after wavelet transform. METHODS To exploit these characteristics, we propose to formulate a quadratic data-separation problem within the wavelet space to discriminate the shared and individual CNVs from raw data. We have elaborated a numerical solution and shown that the solution can be obtained by solving decoupled subproblems. By this approach, computational costs can be limited, enabling efficient application in the analysis of large sequencing datasets. RESULTS The advantages of our proposed method, called WaveDec, have been demonstrated by comparison with popular CNV-detection methods using synthetic and empirical aCGH data. The performance of WaveDec was further validated by experiments with single-cell-sequencing data. CONCLUSION WaveDec can successfully differentiate shared and individual patterns, and performs well even in data contaminated with high levels of noise. SIGNIFICANCE Both the shared and individual patterns can be uniquely characterized as well as effectively decomposed within the wavelet space.
Collapse
|
16
|
Mohammadi M, Mansoori A. A Projection Neural Network for Identifying Copy Number Variants. IEEE J Biomed Health Inform 2018; 23:2182-2188. [PMID: 30235154 DOI: 10.1109/jbhi.2018.2871619] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
The identification of copy number variations (CNVs) helps the diagnosis of many diseases. One major hurdle in the path of CNVs discovery is that the boundaries of normal and aberrant regions cannot be distinguished from the raw data, since various types of noise contaminate them. To tackle this challenge, the total variation regularization is mostly used in the optimization problems to approximate the noise-free data from corrupted observations. The minimization using such regularization is challenging to deal with since it is non-differentiable. In this paper, we propose a projection neural network to solve the non-smooth problem. The proposed neural network has a simple one-layer structure and is theoretically assured to have the global exponential convergence to the solution of the total variation-regularized problem. The experiments on several real and simulated datasets illustrate the reasonable performance of the proposed neural network and show that its performance is comparable with those of more sophisticated algorithms.
Collapse
|
17
|
Dharanipragada P, Vogeti S, Parekh N. iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One 2018; 13:e0195334. [PMID: 29621297 PMCID: PMC5886540 DOI: 10.1371/journal.pone.0195334] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2017] [Accepted: 03/20/2018] [Indexed: 12/14/2022] Open
Abstract
Discovery of copy number variations (CNVs), a major category of structural variations, have dramatically changed our understanding of differences between individuals and provide an alternate paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events and their detection genome-wide is now possible using high-throughput, low-cost next generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward due to non-uniform coverage of reads resulting from various systemic biases. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection in whole genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module that enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease-associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform even accessible on a desktop. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approaches on accurate detection of the complete spectrum of CNVs. Performance of iCopyDAV is evaluated on both simulated data and real data for different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as Docker’s image at http://bioinf.iiit.ac.in/icopydav/.
Collapse
Affiliation(s)
- Prashanthi Dharanipragada
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| | - Sriharsha Vogeti
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| | - Nita Parekh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- * E-mail:
| |
Collapse
|
18
|
Yuan X, Zhang J, Yang L, Bai J, Fan P. Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data. IEEE Trans Nanobioscience 2018; 17:12-20. [PMID: 29570071 DOI: 10.1109/tnb.2017.2783910] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Analyzing copy number variations (CNVs) from next-generation sequencing (NGS) data has become a common approach to detect disease susceptibility genes. The main challenge is how to utilize the NGS data with limited coverage depth to detect significant CNVs. Here, we introduce a new statistical method, the derivative of correlation coefficient (DCC), to detect significant CNVs that recurrently occur in multiple samples using read depth signals. We use a sliding window to calculate a correlation coefficient for each genome bin, and compute corresponding derivatives by fitting curves to the correlation coefficient. Then, the detection of significant CNVs was transformed into a problem of detecting significant derivatives reflecting genome breakpoints that can be solved using statistical hypothesis testing. We tested and compared the performance of DCC against several peer methods using a large number of simulation data sets, and validated DCC using several real sequencing data sets derived from the European Genome-Phenome archive, DNA Data Bank of Japan, and the 1000 Genomes Project. Experimental results suggest that DCC is an effective approach for identifying CNVs, outperforming peer methods in the terms of detection power and accuracy. DCC can be used to detect significant or recurrent CNVs in various NGS data sets, thus providing useful information to study genomic mutations and find disease susceptibility genes.
Collapse
|
19
|
Calling Chromosome Alterations, DNA Methylation Statuses, and Mutations in Tumors by Simple Targeted Next-Generation Sequencing. J Mol Diagn 2017; 19:776-787. [DOI: 10.1016/j.jmoldx.2017.06.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2017] [Accepted: 06/01/2017] [Indexed: 12/16/2022] Open
|
20
|
A Total-variation Constrained Permutation Model for Revealing Common Copy Number Patterns. Sci Rep 2017; 7:9666. [PMID: 28851906 PMCID: PMC5575355 DOI: 10.1038/s41598-017-09139-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2016] [Accepted: 07/24/2017] [Indexed: 01/20/2023] Open
Abstract
Variations in DNA copy number carry important information on genome evolution and regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology enables exploration of gene-expression heterogeneity among single cells, providing important information on cell evolution. Evolutionary relationships in accumulated sequence data can be visualized by adjacent positioning of similar cells so that similar copy-number profiles are shown by block patterns. However, single-cell DNA sequencing data usually have low amount of starting genome, which requires an extra step of amplification to accumulate sufficient samples, introducing noise and making regular pattern-finding challenging. In this paper, we will propose to tackle this issue of recovering the hidden blocks within single-cell DNA-sequencing data through continuous sample permutations such that similar samples are positioned adjacently. The permutation is guided by the total variational norm of the recovered copy number profiles, and is continued until the total variational norm is minimized when similar samples are stacked together to reveal block patterns. An efficient numerical scheme for finding this permutation is designed, tailored from the alternating direction method of multipliers. Application of this method to both simulated and real data demonstrates its ability to recover the hidden structures of single-cell DNA sequences.
Collapse
|
21
|
Zhang Y, Cheung YM, Xu B, Su W. Detection Copy Number Variants from NGS with Sparse and Smooth Constraints. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:856-867. [PMID: 27164604 DOI: 10.1109/tcbb.2016.2561933] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
It is known that copy number variations (CNVs) are associated with complex diseases and particular tumor types, thus reliable identification of CNVs is of great potential value. Recent advances in next generation sequencing (NGS) data analysis have helped manifest the richness of CNV information. However, the performances of these methods are not consistent. Reliably finding CNVs in NGS data in an efficient way remains a challenging topic, worthy of further investigation. Accordingly, we tackle the problem by formulating CNVs identification into a quadratic optimization problem involving two constraints. By imposing the constraints of sparsity and smoothness, the reconstructed read depth signal from NGS is anticipated to fit the CNVs patterns more accurately. An efficient numerical solution tailored from alternating direction minimization (ADM) framework is elaborated. We demonstrate the advantages of the proposed method, namely ADM-CNV, by comparing it with six popular CNV detection methods using synthetic, simulated, and empirical sequencing data. It is shown that the proposed approach can successfully reconstruct CNV patterns from raw data, and achieve superior or comparable performance in detection of the CNVs compared to the existing counterparts.
Collapse
|
22
|
Zhang C, Cai H, Huang J, Song Y. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data. BMC Bioinformatics 2016; 17:384. [PMID: 27639558 PMCID: PMC5027123 DOI: 10.1186/s12859-016-1239-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2016] [Accepted: 09/04/2016] [Indexed: 02/02/2023] Open
Abstract
Background Variations in DNA copy number have an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersing portion of the sequencing data. Recent study has manifested that the over-dispersed portion of the single-cell sequencing data could be well modelled by negative binomial distributions. Results We developed a read-depth based method, nbCNV to detect the copy number variants (CNVs). The nbCNV method uses two constraints-sparsity and smoothness to fit the CNV patterns under the assumption that the read signals are negatively binomially distributed. The problem of CNV detection was formulated as a quadratic optimization problem, and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Conclusions Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1239-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Changsheng Zhang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Hongmin Cai
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China.
| | - Jingying Huang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Yan Song
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
| |
Collapse
|
23
|
Xu B, Cai H, Zhang C, Yang X, Han G. Copy number variants calling for single cell sequencing data by multi-constrained optimization. Comput Biol Chem 2016; 63:15-20. [DOI: 10.1016/j.compbiolchem.2016.02.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2016] [Accepted: 02/01/2016] [Indexed: 01/20/2023]
|
24
|
Duan J, Soussen C, Brie D, Idier J, Wang YP, Wan M. An optimal method to segment piecewise poisson distributed signals with application to sequencing data. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2015:6465-8. [PMID: 26737773 DOI: 10.1109/embc.2015.7319873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
To analyze the next generation sequencing data, the so-called read depth signal is often segmented with standard segmentation tools. However, these tools usually assume the signal to be a piecewise constant signal and contaminated with zero mean Gaussian noise, and therefore modeling error occurs. This paper models the read depth signal with piecewise Poisson distribution, which is more appropriate to the next generation sequencing mechanism. Based on the proposed model, an opti- mal dynamic programming algorithm with parallel computing is proposed to segment the piecewise signal, and furthermore detect the copy number variation.
Collapse
|
25
|
Anjum S, Morganella S, D'Angelo F, Iavarone A, Ceccarelli M. VEGAWES: variational segmentation on whole exome sequencing for copy number detection. BMC Bioinformatics 2015; 16:315. [PMID: 26416038 PMCID: PMC4587906 DOI: 10.1186/s12859-015-0748-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 09/16/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Copy number variations are important in the detection and progression of significant tumors and diseases. Recently, Whole Exome Sequencing is gaining popularity with copy number variations detection due to low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations on WES data. VEGAWES is an extension to a variational based segmentation algorithm, VEGA: Variational estimator for genomic aberrations, which has previously outperformed several algorithms on segmenting array comparative genomic hybridization data. RESULTS We tested this algorithm on synthetic data and 100 Glioblastoma Multiforme primary tumor samples. The results on the real data were analyzed with segmentation obtained from Single-nucleotide polymorphism data as ground truth. We compared our results with two other segmentation algorithms and assessed the performance based on accuracy and time. CONCLUSIONS In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples demonstrating its potential in robust detection of aberrant regions in the genome.
Collapse
Affiliation(s)
- Samreen Anjum
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar.
| | - Sandro Morganella
- European Molecular Biology Laboratory, European Bioinformatics Institute, (EMBL -EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
| | | | - Antonio Iavarone
- Institute for Cancer Genetics, Columbia University, New York, 10027, USA.
| | - Michele Ceccarelli
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar. .,Department of Science and Technology, University of Sannio, Benevento, 82100, Italy.
| |
Collapse
|
26
|
CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data. PLoS One 2015; 10:e0135895. [PMID: 26291322 PMCID: PMC4546278 DOI: 10.1371/journal.pone.0135895] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2014] [Accepted: 07/28/2015] [Indexed: 11/19/2022] Open
Abstract
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which are associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides us a new dimension towards detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs, is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach resulting in a low false discovery rate with high precision.
Collapse
|
27
|
Yokoyama T, Miura F, Araki H, Okamura K, Ito T. Changepoint detection in base-resolution methylome data reveals a robust signature of methylated domain landscape. BMC Genomics 2015; 16:594. [PMID: 26265481 PMCID: PMC4534107 DOI: 10.1186/s12864-015-1809-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2014] [Accepted: 08/03/2015] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Base-resolution methylome data generated by whole-genome bisulfite sequencing (WGBS) is often used to segment the genome into domains with distinct methylation levels. However, most segmentation methods include many parameters to be carefully tuned and/or fail to exploit the unsurpassed resolution of the data. Furthermore, there is no simple method that displays the composition of the domains to grasp global trends in each methylome. RESULTS We propose to use changepoint detection for domain demarcation based on base-resolution methylome data. While the proposed method segments the methylome in a largely comparable manner to conventional approaches, it has only a single parameter to be tuned. Furthermore, it fully exploits the base-resolution of the data to enable simultaneous detection of methylation changes in even contrasting size ranges, such as focal hypermethylation and global hypomethylation in cancer methylomes. We also propose a simple plot termed methylated domain landscape (MDL) that globally displays the size, the methylation level and the number of the domains thus defined, thereby enabling one to intuitively grasp trends in each methylome. Since the pattern of MDL often reflects cell lineages and is largely unaffected by data size, it can serve as a novel signature of methylome. CONCLUSIONS Changepoint detection in base-resolution methylome data followed by MDL plotting provides a novel method for methylome characterization and will facilitate global comparison among various WGBS data differing in size and even species origin.
Collapse
Affiliation(s)
- Takao Yokoyama
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan.
| | - Fumihito Miura
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan. .,Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan.
| | - Hiromitsu Araki
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan.
| | - Kohji Okamura
- Department of Systems Biomedicine, National Research Institute for Child Health and Development, National Center for Child Health and Development, 2-10-1 Okura, Setagaya-ku, Tokyo 157-8535, Japan.
| | - Takashi Ito
- Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan. .,Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan.
| |
Collapse
|
28
|
Duan J, Wan M, Deng HW, Wang YP. A Sparse Model Based Detection of Copy Number Variations From Exome Sequencing Data. IEEE Trans Biomed Eng 2015; 63:496-505. [PMID: 26258935 DOI: 10.1109/tbme.2015.2464674] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
GOAL Whole-exome sequencing provides a more cost-effective way than whole-genome sequencing for detecting genetic variants, such as copy number variations (CNVs). Although a number of approaches have been proposed to detect CNVs from whole-genome sequencing, a direct adoption of these approaches to whole-exome sequencing will often fail because exons are separately located along a genome. Therefore, an appropriate method is needed to target the specific features of exome sequencing data. METHODS In this paper, a novel sparse model based method is proposed to discover CNVs from multiple exome sequencing data. First, exome sequencing data are represented with a penalized matrix approximation, and technical variability and random sequencing errors are assumed to follow a generalized Gaussian distribution. Second, an iteratively reweighted least squares algorithm is used to estimate the solution. RESULTS The method is tested and validated on both synthetic and real data, and compared with other approaches including CoNIFER, XHMM, and cn.MOPS. The test demonstrates that the proposed method outperform other approaches. CONCLUSION The proposed sparse model can detect CNVs from exome sequencing data with high power and precision. Significance: Sparse model can target the specific features of exome sequencing data. The software codes are freely available at http://www.tulane.edu/ wyp/software/Exon_CNV.m.
Collapse
|
29
|
Duan J, Zhang JG, Wan M, Deng HW, Wang YP. Population clustering based on copy number variations detected from next generation sequencing data. J Bioinform Comput Biol 2014; 12:1450021. [PMID: 25152046 DOI: 10.1142/s0219720014500218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Copy number variations (CNVs) can be used as significant bio-markers and next generation sequencing (NGS) provides a high resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering have become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into the source matrix and weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data set from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data analysis show that the proposed method can cluster two family trio with different ancestries into two ethnic groups and the results on the second real data analysis show that the proposed method can be applied to the whole-genome with large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
Collapse
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Xi'an Jiaotong University, Xi'an, P. R. China
| | | | | | | | | |
Collapse
|
30
|
Abstract
Common copy number variations (CNVs) are small regions of genomic variations at the same loci across multiple samples, which can be detected with high resolution from next-generation sequencing (NGS) technique. Multiple sequencing data samples are often available from genomic studies; examples include sequences from multiple platforms and sequences from multiple individuals. By integrating complementary information from multiple data samples, detection power can be potentially improved. However, most of current CNV detection methods often process an individual sequence sample, or two samples in an abnormal versus matched normal study; researches on detecting common CNVs across multiple samples have been very limited but are much needed. In this paper, we propose a novel method to detect common CNVs from multiple sequencing samples by exploiting the concurrency of genomic variations in read depth signals derived from multiple NGS data. We use a penalized sparse regression model to fit multiple read depth profiles, based on which common CNV identification is formulated as a change-point detection problem. Finally, we validate the proposed method on both simulation and real data, showing that it can give both higher detection power and better break point estimation over several published CNV detection methods.
Collapse
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Xi’an Jiaotong University, Xi’an 710049, China
| | - Hong-Wen Deng
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118 USA
| | - Yu-Ping Wang
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118 USA
| |
Collapse
|