1
|
He W, Chen J, Zhou Y, Deng T, Feng Y, Luo X, Zhang C, Huang H, Liu J. Mitophagy genes in ovarian cancer: a comprehensive analysis for improved immunotherapy. Discov Oncol 2023; 14:221. [PMID: 38038814 PMCID: PMC10692064 DOI: 10.1007/s12672-023-00750-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 07/07/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND Mitophagy is the selective degradation of damaged mitochondria and has been linked to immunity, tumorigenesis, tumor progression, and metastasis. However, the role of mitophagy-related genes (MRGs) in the tumor microenvironment (TME) of ovarian cancer (OV) remains largely unexplored. METHODS We analyzed the expression, prognostic value, and genetic alterations of 29 MRGs in 480 OV samples. Unsupervised clustering was used to classify OV into two subtypes (clusters A and B) based on MRG changes. We compared the clinical features, differentially expressed genes (DEGs), pathways, and immune cell infiltration between the two clusters. We constructed a mitophagy scoring system (MRG_score) based on the DEGs and validated its ability to predict overall survival of OV patients. RESULTS We found that patients with high MRG_scores had better survival and increased infiltration by immune cells. Further analysis showed that these patients may be more sensitive to immune checkpoint inhibitor (ICI) treatment. Additionally, the MRG_score correlated significantly with sensitivity to chemotherapeutic drugs and targeted inhibitors. CONCLUSION Our comprehensive analysis of MRGs in the TME, clinical features, and patient prognosis revealed that the MRG_score is a potentially effective prognostic biomarker and predictor of treatment response. This study provides new insights into the role of MRGs in OV and identifies patients who may benefit from ICI treatment, chemotherapy, or targeted treatment.
Affiliation(s)
- Wenting He: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Jieping Chen: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Yun Zhou: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Ting Deng: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Yanling Feng: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Xiaolin Luo: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Chuyao Zhang: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- He Huang: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Jihong Liu: Department of Gynecologic Oncology, State Key Laboratory of Oncology in South China, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
|
2
|
Xie K, Liu K, Alvi HAK, Chen Y, Wang S, Yuan X. KNNCNV: A K-Nearest Neighbor Based Method for Detection of Copy Number Variations Using NGS Data. Front Cell Dev Biol 2022; 9:796249. [PMID: 35004691 PMCID: PMC8728060 DOI: 10.3389/fcell.2021.796249] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/16/2021] [Accepted: 11/23/2021] [Indexed: 11/19/2022]
Abstract
Copy number variation (CNV) is a well-known type of genomic mutation that is associated with the development of human cancers. Detecting CNVs in the human genome is a crucial step in the pipeline that runs from mutation analysis to cancer diagnosis and treatment. Next-generation sequencing (NGS) data provide an unprecedented opportunity for CNV detection at base-level resolution, and many methods have been developed for CNV detection using NGS data. However, due to the intrinsic complexity of CNV structures and of NGS data itself, accurate detection of CNVs still faces many challenges. In this paper, we present an alternative method, called KNNCNV (K-Nearest Neighbor based CNV detection), for the detection of CNVs using NGS data. Compared to current methods, KNNCNV has two distinctive features: 1) it assigns an outlier score to each genome segment based solely on its first k nearest-neighbor distances, which is not only easy to extend to other data types but also improves the power to discover CNVs, especially local CNVs that are likely to be masked by their surrounding regions; 2) it employs the variational Bayesian Gaussian mixture model (VBGMM) to transform these scores into a series of binary labels without a user-defined threshold. To evaluate the performance of KNNCNV, we conduct both simulation and real sequencing data experiments and make comparisons with peer methods. The experimental results show that KNNCNV achieves better performance than the other methods in terms of F1-score.
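The k-nearest-neighbor scoring idea in this abstract can be illustrated with a minimal sketch. It assumes each genome segment is summarized by a single read-depth value and that the score is the mean distance to the k closest other segments; the function name `knn_outlier_scores` and the sample depths are hypothetical, and the VBGMM labeling step is not reproduced here.

```python
def knn_outlier_scores(values, k=3):
    """Score each value by the mean distance to its k nearest neighbours.

    Assumption: one read-depth value per segment; the published method
    may use different features and a different distance aggregation.
    """
    if not 1 <= k < len(values):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    scores = []
    for i, v in enumerate(values):
        # Distances from segment i to every other segment, smallest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical per-segment read depths; index 4 mimics a copy number gain.
segment_depths = [30.0, 31.0, 29.5, 30.5, 60.0, 30.2, 29.8]
scores = knn_outlier_scores(segment_depths, k=3)
top = max(range(len(scores)), key=scores.__getitem__)
```

A segment whose depth differs from all of its nearest neighbors receives a large score, which is the property a later labeling step can exploit.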
Affiliation(s)
- Kun Xie: School of Computer Science and Technology, Xidian University, Xi'an, China; Hangzhou Institute of Technology, Xidian University, Hangzhou, China
- Kang Liu: School of Computer Science and Technology, Xidian University, Xi'an, China
- Haque A K Alvi: School of Computer Science and Technology, Xidian University, Xi'an, China
- Yuehui Chen: Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Shuzhen Wang: School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan: School of Computer Science and Technology, Xidian University, Xi'an, China; Hangzhou Institute of Technology, Xidian University, Hangzhou, China
|
3
|
Yuan X, Ma C, Zhao H, Yang L, Wang S, Xi J. STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2692-2701. [PMID: 32086221 DOI: 10.1109/tcbb.2020.2975181] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Single nucleotide variants (SNVs) play an important role in cellular proliferation and tumorigenesis in various types of human cancer. Next-generation sequencing (NGS) has provided high-throughput data at an unprecedented resolution to predict SNVs. Many computational methods currently exist for either germline or somatic SNV discovery from NGS data, but very few of them are versatile enough to adapt to every situation. In the absence of matched normal samples, the prediction of somatic SNVs from single-tumor samples becomes considerably more challenging, especially when the tumor purity is unknown. Here, we propose a new approach, STIC, to predict somatic SNVs and estimate tumor purity from NGS data without matched normal samples. The main features of STIC include: (1) extracting a set of SNV-relevant features at each site and training the BP neural network algorithm on the features to predict SNVs; (2) creating an iterative process to distinguish somatic SNVs from germline ones by disturbing allele frequency; and (3) establishing a reasonable relationship between tumor purity and allele frequencies of somatic SNVs to accurately estimate the purity. We quantitatively evaluate the performance of STIC on both simulated and real sequencing datasets; the results indicate that STIC outperforms competing methods.
|
4
|
Yuan X, Li J, Bai J, Xi J. A Local Outlier Factor-Based Detection of Copy Number Variations From NGS Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1811-1820. [PMID: 31880558 DOI: 10.1109/tcbb.2019.2961886] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]
Abstract
Copy number variation (CNV) is a major type of genomic structural variation that plays an important role in human disorders. Next-generation sequencing (NGS) has fueled advances in algorithm design for detecting CNVs at base-pair resolution. However, accurate detection of low-amplitude CNVs remains a challenging task. This paper proposes a new computational method, CNV-LOF, to identify CNVs of full-range amplitudes from NGS data. CNV-LOF is distinctly different from traditional methods, which mainly consider aberrations from a global perspective and rely on some assumed distribution of NGS read depths. In contrast, CNV-LOF takes a local view of the read depths and assigns an outlier factor to each genome segment. With the outlier factor profile, CNV-LOF uses a boxplot procedure to declare CNVs without relying on any distribution assumptions. Simulation experiments indicate that CNV-LOF outperforms five existing methods with respect to F1-measure, sensitivity, and precision. CNV-LOF is further validated on real sequencing samples, yielding results highly consistent with peer methods. CNV-LOF is able to detect CNVs of low and moderate amplitudes where the other existing methods fail, and it is expected to become a routine approach for the discovery of novel CNVs in whole-genome sequencing data.
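The distribution-free boxplot step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the outlier factors are taken as given rather than computed, the conventional Tukey fence of 1.5 times the interquartile range is assumed (the abstract does not state the whisker length), and `boxplot_outliers` is a hypothetical helper name.

```python
import statistics


def boxplot_outliers(scores, whisker=1.5):
    """Return indices of scores above the upper boxplot fence.

    Assumption: the upper fence is Q3 + whisker * IQR with whisker = 1.5;
    the published procedure may use a different rule.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    upper_fence = q3 + whisker * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > upper_fence]


# Hypothetical outlier-factor profile; values near 1 behave like inliers.
outlier_factors = [1.0, 1.1, 0.9, 1.05, 0.95, 3.2, 1.0, 1.02]
flagged = boxplot_outliers(outlier_factors)
```

Because the fence is derived from quartiles of the observed profile, no parametric model of the read depths is needed to declare a segment abnormal.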
|
5
|
Zhao HY, Li Q, Tian Y, Chen YH, Alvi HAK, Yuan XG. CIRCNV: Detection of CNVs Based on a Circular Profile of Read Depth from Sequencing Data. Biology 2021; 10:584. [PMID: 34202028 PMCID: PMC8301091 DOI: 10.3390/biology10070584] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Revised: 06/10/2021] [Accepted: 06/21/2021] [Indexed: 12/29/2022]
Abstract
Simple Summary: In this study, we propose a copy number variation (CNV) detection method called CIRCNV, which is based on a circular profile of the read depth from sequencing data. The proposed method is an extended version of our previously developed method CNV-LOF. CIRCNV differs from CNV-LOF in two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation to generate a meaningful two-dimensional dataset for CNV analysis and to promote fairness between the ends and the middle part of the genome, and (2) it performs two rounds of CNV declaration by estimating tumor purity and recovering the true circular RD profile. We test and evaluate the performance of CIRCNV through simulation studies and applications to real sequenced tumor samples. The experimental results show that CIRCNV outperforms peer methods with respect to sensitivity, precision, and F1-score. The experiments show that the proposed method is a reliable and effective tool in the field of variation analysis of tumor genomes.
Abstract: Copy number variation (CNV) is a common type of structural variation in the human genome. Accurate detection of CNVs from tumor genomes can provide crucial information for the study of tumor genesis and cancer precision diagnosis. However, the contamination of tumor genomes by normal genomes and the crude profiles of the read depth make this task difficult. In this paper, we propose an alternative approach, called CIRCNV, for the detection of CNVs from sequencing data. CIRCNV is an extension of our previously developed method CNV-LOF, which uses local outlier factors to predict CNVs. In comparison, CIRCNV can be applied to individual tumor samples and has the following two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation, in order to improve the efficiency of the read depth (RD) profile for the detection of CNVs; and (2) it performs a second round of CNV declaration based on the true circular RD profile, which is recovered by estimating tumor purity. We test and validate the performance of CIRCNV on simulated and real sequencing data and perform comparisons with several peer methods. The results demonstrate that CIRCNV can obtain superior performance in terms of sensitivity and precision. We expect that our proposed method will be a supplement to existing methods and become a routine tool in the field of variation analysis of tumor genomes.
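A line-to-circle mapping of a read depth profile can be sketched as below. The mapping shown is an assumption made for illustration: bin i of n is placed at angle 2πi/n with its read depth as the radius, which may differ from the exact transformation used by CIRCNV. The name `to_circular_profile` and the sample profile are hypothetical.

```python
import math


def to_circular_profile(read_depths):
    """Map a linear read-depth profile onto 2D points around a circle.

    Assumption: bin i gets angle 2*pi*i/n and radius equal to its read
    depth; this is one plausible polar mapping, not the published one.
    """
    n = len(read_depths)
    points = []
    for i, rd in enumerate(read_depths):
        theta = 2.0 * math.pi * i / n
        points.append((rd * math.cos(theta), rd * math.sin(theta)))
    return points


# Hypothetical per-bin read depths; the fourth bin mimics a gain.
rd_profile = [30.0, 31.0, 29.0, 60.0, 30.5, 29.5]
circle = to_circular_profile(rd_profile)
```

Under this mapping the first and last bins become neighbors on the circle, so bins at the ends of the profile have neighbors on both sides just like bins in the middle.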
Affiliation(s)
- Hai-Yong Zhao: School of Computer Science and Technology, Liaocheng University, Liaocheng 252000, China
- Qi Li: School of Computer Science and Technology, Xidian University, Xi’an 710071, China
- Ye Tian: School of Computer Science and Technology, Xidian University, Xi’an 710071, China
- Yue-Hui Chen: Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Ji’nan 250022, China
- Haque A. K. Alvi: School of Computer Science and Technology, Xidian University, Xi’an 710071, China
- Xi-Guo Yuan: School of Computer Science and Technology, Xidian University, Xi’an 710071, China
|
6
|
Liu Y, Ye X, Zhan X, Yu CY, Zhang J, Huang K. TPQCI: A topology potential-based method to quantify functional influence of copy number variations. Methods 2021; 192:46-56. [PMID: 33894380 DOI: 10.1016/j.ymeth.2021.04.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/23/2020] [Revised: 04/18/2021] [Accepted: 04/19/2021] [Indexed: 12/21/2022]
Abstract
Copy number variation (CNV) is a major type of chromosomal structural variation that plays important roles in many diseases, including cancers. Due to genome instability, a large number of CNV events can be detected in diseases such as cancer. Therefore, it is important to identify the functionally important CNVs in diseases, which still poses a challenge in genomics. One of the critical steps in solving this problem is to define the influence of a CNV. In this paper, we provide a topology potential-based method, TPQCI, to quantify this kind of influence by integrating statistics, gene regulatory associations, and biological function information. We used this metric to detect functionally enriched genes on genomic segments with CNV in breast cancer and multiple myeloma and discovered biological functions influenced by CNV. Our results demonstrate that, by using our proposed TPQCI metric, we can detect disease-specific genes that are influenced by CNVs. Source code for TPQCI is available on GitHub (https://github.com/usos/TPQCI).
Affiliation(s)
- Yusong Liu: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, Heilongjiang 150001, China; Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Xiufen Ye: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, Heilongjiang 150001, China
- Xiaohui Zhan: Indiana University School of Medicine, Indianapolis, IN 46202, USA; National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, Guangdong 518037, China; Department of Bioinformatics, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Christina Y Yu: Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- Jie Zhang: Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Kun Huang: Indiana University School of Medicine, Indianapolis, IN 46202, USA; Regenstrief Institute, Indianapolis, IN 46202, USA
|
7
|
Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 06/09/2023]
Abstract
Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained with the isolation forest algorithm by subsampling the measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data, even at low coverage and low tumor purity. The detection results on tumor samples can help evaluate known cancer genes and predict target drugs for disease diagnosis.
|
8
|
Zhang C, Chen X, Chen Y, Cao M, Tang J, Zhong B, He M. The PITX gene family as potential biomarkers and therapeutic targets in lung adenocarcinoma. Medicine (Baltimore) 2021; 100:e23936. [PMID: 33530195 PMCID: PMC7850728 DOI: 10.1097/md.0000000000023936] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/20/2019] [Revised: 07/06/2020] [Accepted: 11/25/2020] [Indexed: 01/05/2023]
Abstract
The PITX gene family of transcription factors has been reported to regulate the development of multiple organs. This study was designed to investigate the role of PITXs in lung adenocarcinoma (LUAD). In this study, the transcriptional levels of the 3 identified PITXs in patients with LUAD were examined using the Gene Expression Profiling Interactive Analysis web server. Meanwhile, the immunohistochemical data of the 3 PITXs were obtained from the Human Protein Atlas website, and western blotting was additionally conducted for further verification. Moreover, the association between the levels of PITXs and the stage plot as well as overall survival of patients with LUAD was analyzed. We found that the mRNA and protein levels of PITX1 and PITX2 were higher in LUAD tissues than in normal lung tissues, while those of PITX3 displayed no significant differences. Additionally, PITX1 and PITX3 were found to be significantly associated with the stage of LUAD. The Kaplan-Meier plot showed that a high level of PITX1 conferred better overall survival in patients with LUAD, while a high level of PITX3 was associated with poor prognosis. Our study implies that PITX1 and PITX3 are potential targets of precision therapy for patients with LUAD, while PITX1 and PITX2 may serve as novel biomarkers for the diagnosis of LUAD.
|
9
|
Xie K, Tian Y, Yuan X. A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021; 11:632311. [PMID: 33519925 PMCID: PMC7838601 DOI: 10.3389/fgene.2020.632311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/23/2020] [Accepted: 12/21/2020] [Indexed: 11/29/2022]
Abstract
Copy number variation (CNV) is a common type of structural variation in the human genome and is biologically relevant to complex human diseases. Detection of CNVs is an important step in the systematic analysis of CNVs in medical research on complex diseases. The recent development of next-generation sequencing (NGS) platforms provides unprecedented opportunities for the detection of CNVs at base-level resolution. However, due to the intrinsic characteristics of NGS data, accurate detection of CNVs is still a challenging task. In this article, we propose a new density peak-based method, called dpCNV, for the detection of CNVs from NGS data. The dpCNV algorithm is based on the density peak clustering algorithm. It extracts two features, i.e., local density and minimum distance, from the sequencing read depth (RD) profile and generates a two-dimensional dataset. Based on the generated data, a two-dimensional null distribution is constructed to test the significance of each genome bin, and the significant genome bins are then declared as CNVs. We test the performance of the dpCNV method on a number of simulated datasets and make comparisons with several existing methods. The experimental results demonstrate that our proposed method outperforms the others in terms of sensitivity and F1-score. We further apply it to a set of real sequencing samples, and the results demonstrate the validity of dpCNV. Therefore, we expect that dpCNV can be used as a supplement to existing methods and may become a routine tool in the field of genome mutation analysis.
Affiliation(s)
- Kun Xie: The School of Computer Science and Technology, Xidian University, Xi'an, China
- Ye Tian: The School of Computer Science and Technology, Xidian University, Xi'an, China; Xi'an Key Laboratory of Computational Bioinformatics, The School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan: The School of Computer Science and Technology, Xidian University, Xi'an, China; Xi'an Key Laboratory of Computational Bioinformatics, The School of Computer Science and Technology, Xidian University, Xi'an, China
|
10
|
Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020; 11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022]
Abstract
Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumor genesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made the study of TDs possible. However, detection is still challenging due to the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs using short sequencing reads. The proposed method first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find the possible TD regions. A total variation penalized least squares model is fitted to the read depth and mapping quality signals to denoise them. A 2D binary search tree is used to search for neighbor points efficiently. To further identify the exact breakpoints of the TD regions, split-read signals are integrated into DINTD. Experimental results on simulated data sets show that DINTD can outperform other methods in sensitivity, precision, F1-score, and boundary bias. DINTD is further validated on real samples, and the experimental results indicate that it is consistent with other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
Affiliation(s)
- Jinxin Dong: School of Computer Science and Technology, Xidian University, Xi'an, China; School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Minyong Qi: School of Computer Science and Technology, Xidian University, Xi'an, China; School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Shaoqiang Wang: School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan: School of Computer Science and Technology, Xidian University, Xi'an, China
|
11
|
Yuan X, Bai J, Zhang J, Yang L, Duan J, Li Y, Gao M. CONDEL: Detecting Copy Number Variation and Genotyping Deletion Zygosity from Single Tumor Samples Using Sequence Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1141-1153. [PMID: 30489272 DOI: 10.1109/tcbb.2018.2883333] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 06/09/2023]
Abstract
Characterizing copy number variations (CNVs) from sequenced genomes is a feasible and cost-effective way to search for driver genes in cancer diagnosis. A number of existing algorithms for CNV detection explore only part of the features underlying sequence data and copy number structures, resulting in limited performance. Here, we describe CONDEL, a method for detecting CNVs from single tumor samples using high-throughput sequence data. CONDEL utilizes a novel statistic in combination with a peel-off scheme to assess the statistical significance of genome bins, and adopts a Bayesian approach to infer copy number gains, losses, and deletion zygosity based on statistical mixture models. We compare CONDEL to six peer methods on a large number of simulated datasets, showing improved performance in terms of true positive and false positive rates, and further validate CONDEL on three real datasets derived from the 1000 Genomes Project and the EGA archive. CONDEL obtained more consistent results than three other single sample-based methods, and exclusively identified a number of CNVs that were previously associated with cancers. We conclude that CONDEL is a powerful tool for detecting copy number variations in single tumor samples, even if these are sequenced at low coverage.
|
12
|
Zhao H, Huang T, Li J, Liu G, Yuan X. MFCNV: A New Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2020; 11:434. [PMID: 32499814 PMCID: PMC7243272 DOI: 10.3389/fgene.2020.00434] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/19/2020] [Accepted: 04/08/2020] [Indexed: 11/13/2022]
Abstract
Copy number variation (CNV) is a very important phenomenon in tumor genomes and plays a significant role in tumor genesis. Accurate detection of CNVs has become a routine and necessary procedure for in-depth investigation of tumor cells and diagnosis of tumor patients. Next-generation sequencing (NGS) has provided a wealth of data for the detection of CNVs at base-pair resolution. However, such a task is usually influenced by a number of factors, including GC-content bias, sequencing errors, and correlations among adjacent positions within CNVs. Although many existing methods have dealt with some of these artifacts by designing their own strategies, a comprehensive consideration of all the factors is still lacking. In this paper, we propose a new method, MFCNV, for accurate detection of CNVs from NGS data. Compared with existing methods, the characteristics of the proposed method include the following: (1) it fully considers the intrinsic correlations among adjacent positions in the genome to be analyzed, (2) it calculates read depth, GC-content bias, base quality, and correlation value for each genome bin and combines them as multiple features for the evaluation of genome bins, and (3) it addresses the joint effect among the factors by training a neural network algorithm for the prediction of CNVs. We test the performance of the MFCNV method using simulated and real sequencing data and make comparisons with several peer methods. The results demonstrate that our method is superior to other methods in terms of sensitivity, precision, and F1-score and can detect many CNVs that other methods have not discovered. MFCNV is expected to be a complementary tool in the analysis of mutations in tumor genomes and can be extended to the analysis of single-cell sequencing data.
Affiliation(s)
- Haiyong Zhao: School of Computer Science and Technology, Liaocheng University, Liaocheng, China; The School of Computer Science and Technology, Xidian University, Xi'an, China
- Tihao Huang: School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Junqing Li: School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Guojun Liu: The School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan: The School of Computer Science and Technology, Xidian University, Xi'an, China
|
13
|
Yuan X, Gao M, Bai J, Duan J. SVSR: A Program to Simulate Structural Variations and Generate Sequencing Reads for Multiple Platforms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1082-1091. [PMID: 30334804 DOI: 10.1109/tcbb.2018.2876527] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
Structural variation accounts for a major fraction of mutations in the human genome and confers susceptibility to complex diseases. Next-generation sequencing, along with the rapid development of computational methods, provides a cost-effective procedure to detect such variations. Simulation of structural variations and sequencing reads with realistic characteristics is essential for benchmarking the computational methods. Here, we develop a new program, SVSR, to simulate five types of structural variations (indels, tandem duplications, CNVs, inversions, and translocations) and SNPs for the human genome and to generate sequencing reads with features from popular platforms (Illumina, SOLiD, 454, and Ion Torrent). We adopt a selection model trained on real data to predict copy number states, starting from the first site of a particular genome and proceeding to the end. Furthermore, we utilize references of microbial genomes to produce insertion fragments and design probabilistic models to imitate inversions and translocations. Moreover, we create platform-specific errors and base quality profiles to generate normal, tumor, or normal-tumor mixture reads. Experimental results show that SVSR can capture more realistic features and generate datasets with satisfactory quality scores. SVSR can be used to evaluate the performance of structural variation detection methods and to guide the development of new computational methods.
14
Yuan X, Li Z, Zhao H, Bai J, Zhang J. Accurate Inference of Tumor Purity and Absolute Copy Numbers From High-Throughput Sequencing Data. Front Genet 2020; 11:458. [PMID: 32425990 PMCID: PMC7205152 DOI: 10.3389/fgene.2020.00458] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023] Open
Abstract
Inferring absolute copy numbers in tumor genomes is a key step in the study of tumorigenesis. However, the mixture of tumor and normal cells makes this task challenging, and accurate estimation of tumor purity (i.e., the fraction of tumor cells) is a necessary step toward solving it. In this paper, we propose a new approach, AITAC, to accurately infer tumor purity and absolute copy numbers in a tumor sample by using high-throughput sequencing (HTS) data. In contrast to many existing algorithms for estimating tumor purity, which usually rely on pre-detected mutation genotypes (heterogeneity and homogeneity), AITAC requires only the read depths (RDs) observed at regions with copy number losses. AITAC creates a non-linear model relating tumor purity to observed and expected RDs. It adopts an exhaustive search strategy that scans tumor purity over a wide range and chooses, as the optimal solution, the purity that minimizes the deviation between observed and expected RDs. We apply the proposed approach to both simulated and real sequencing data sets and demonstrate its performance by comparing it with two classical approaches. AITAC is freely available at https://github.com/BDanalysis/aitac and is expected to be a useful tool for researchers analyzing copy numbers in cancer genomes.
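The exhaustive purity scan described in this abstract can be sketched as a grid search. This is an illustration under stated assumptions, not AITAC's implementation: it swaps the paper's non-linear model for a simple linear mixture in which a region with tumor copy number c at purity p has expected read-depth ratio (p*c + 2*(1-p))/2 against a diploid baseline. The function names, the candidate copy numbers (0 and 1) and the 0.01 grid step are hypothetical.

```python
def expected_ratio(purity, tumor_cn, normal_cn=2):
    """Expected read-depth ratio versus a diploid baseline (assumed linear mixture)."""
    return (purity * tumor_cn + (1.0 - purity) * normal_cn) / normal_cn


def estimate_purity(observed_ratios, candidate_cns=(0, 1), step=0.01):
    """Scan purity over (0, 1] and return the value with the smallest squared deviation.

    Each observed loss region is matched to whichever candidate copy number
    explains it best at the purity currently being tested.
    """
    best_purity, best_cost = None, float("inf")
    n_steps = int(round(1.0 / step))
    for i in range(1, n_steps + 1):
        purity = i * step
        cost = sum(
            min((r - expected_ratio(purity, cn)) ** 2 for cn in candidate_cns)
            for r in observed_ratios
        )
        if cost < best_cost:  # strict comparison keeps the first minimum found
            best_purity, best_cost = purity, cost
    return best_purity


if __name__ == "__main__":
    # Ratios observed at loss regions; two distinct loss levels are present.
    print(estimate_purity([0.7, 0.4, 0.7]))
```

Under this simplified model a single loss level is ambiguous (a ratio of 0.7 fits purity 0.6 with one copy lost or purity 0.3 with both copies lost), so the example input contains two distinct levels.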
Affiliation(s)
- Xiguo Yuan: School of Computer Science and Technology, Xidian University, Xi'an, China
- Zhe Li: School of Computer Science and Technology, Xidian University, Xi'an, China
- Haiyong Zhao: School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Jun Bai: Department of Medical Oncology, Shaanxi Provincial People's Hospital, Xi'an, China
- Junying Zhang: School of Computer Science and Technology, Xidian University, Xi'an, China
15
Xi J, Li A, Wang M. HetRCNA: A Novel Method to Identify Recurrent Copy Number Alternations from Heterogeneous Tumor Samples Based on Matrix Decomposition Framework. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:422-434. [PMID: 29994262 DOI: 10.1109/tcbb.2018.2846599] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/08/2023]
Abstract
A common strategy for discovering cancer-associated copy number aberrations (CNAs) from a cohort of cancer samples is to detect recurrent CNAs (RCNAs). Although previous methods can successfully identify communal RCNAs shared by nearly all tumor samples, they cannot detect subgroup-specific RCNAs and their related subgroup samples in heterogeneous cancer cohorts. In this paper, we introduce a novel integrated method called HetRCNA, which can identify statistically significant subgroup-specific RCNAs and their related subgroup samples. Based on a matrix decomposition framework with a weight constraint, HetRCNA measures the subgroup samples by the coefficients of the left vectors under the weight constraint, and the subgroup-specific RCNAs by the coefficients of the right vectors together with a significance test. When evaluated on a simulated dataset, HetRCNA gives the best performance among the competing methods and is robust to the noise factors of the simulated data. When applied to a real breast cancer dataset, our approach successfully identifies a number of RCNA regions, and the result is highly correlated with the results of the other two investigated approaches. Notably, the genomic regions identified by HetRCNA harbor many breast cancer-related genes reported in previous studies.
16
da Cruz RS, Carney EJ, Clarke J, Cao H, Cruz MI, Benitez C, Jin L, Fu Y, Cheng Z, Wang Y, de Assis S. Paternal malnutrition programs breast cancer risk and tumor metabolism in offspring. Breast Cancer Res 2018; 20:99. [PMID: 30165877 PMCID: PMC6117960 DOI: 10.1186/s13058-018-1034-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/23/2017] [Accepted: 07/31/2018] [Indexed: 12/15/2022] Open
Abstract
Background While many studies have shown that maternal factors in pregnancy affect the cancer risk for offspring, few studies have investigated the impact of paternal exposures on their progeny’s risk of this disease. Population studies generally show a U-shaped association between birthweight and breast cancer risk, with both high and low birthweight increasing the risk compared with average birthweight. Here, we investigated whether paternal malnutrition would modulate the birthweight and later breast cancer risk of daughters. Methods Male mice were fed AIN93G-based diets containing either 17.7% (control) or 8.9% (low-protein (LP)) energy from protein from 3 to 10 weeks of age. Males on either group were mated to females raised on a control diet. Female offspring from control and LP fathers were treated with 7,12-dimethylbenz[a]anthracene (DMBA) to initiate mammary carcinogenesis. Mature sperm from fathers and mammary tissue and tumors from female offspring were used for epigenetic and other molecular analyses. Results We found that paternal malnutrition reduces the birthweight of daughters and leads to epigenetic and metabolic reprogramming of their mammary tissue and tumors. Daughters of LP fathers have higher rates of mammary cancer, with tumors arising earlier and growing faster than in controls. The energy sensor, the AMP-activated protein kinase (AMPK) pathway, is suppressed in both mammary glands and tumors of LP daughters, with consequent activation of mammalian target of rapamycin (mTOR) signaling. Furthermore, LP mammary tumors show altered amino-acid metabolism with increased glutamine utilization. These changes are linked to alterations in noncoding RNAs regulating those pathways in mammary glands and tumors. Importantly, we detect alterations in some of the same microRNAs/target genes found in our animal model in breast tumors of women from populations where low birthweight is prevalent. 
Conclusions Our study suggests that ancestral paternal malnutrition plays a role in programming offspring cancer risk and phenotype by likely providing a metabolic advantage to cancer cells.
Affiliation(s)
- Raquel Santana da Cruz: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- Elissa J Carney: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- Johan Clarke: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- Hong Cao: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- M Idalia Cruz: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- Carlos Benitez: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- Lu Jin: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
- Yi Fu: The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Research Center, Arlington, VA, USA
- Zuolin Cheng: The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Research Center, Arlington, VA, USA
- Yue Wang: The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Research Center, Arlington, VA, USA
- Sonia de Assis: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University, 3970 Reservoir Road, NW, The Research Building, Room E410, Washington, DC, 20057, USA
17
Yuan X, Zhang J, Yang L, Bai J, Fan P. Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data. IEEE Trans Nanobioscience 2018; 17:12-20. [PMID: 29570071 DOI: 10.1109/tnb.2017.2783910] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/10/2022]
Abstract
Analyzing copy number variations (CNVs) from next-generation sequencing (NGS) data has become a common approach to detect disease susceptibility genes. The main challenge is how to utilize the NGS data with limited coverage depth to detect significant CNVs. Here, we introduce a new statistical method, the derivative of correlation coefficient (DCC), to detect significant CNVs that recurrently occur in multiple samples using read depth signals. We use a sliding window to calculate a correlation coefficient for each genome bin, and compute corresponding derivatives by fitting curves to the correlation coefficient. Then, the detection of significant CNVs was transformed into a problem of detecting significant derivatives reflecting genome breakpoints that can be solved using statistical hypothesis testing. We tested and compared the performance of DCC against several peer methods using a large number of simulation data sets, and validated DCC using several real sequencing data sets derived from the European Genome-Phenome archive, DNA Data Bank of Japan, and the 1000 Genomes Project. Experimental results suggest that DCC is an effective approach for identifying CNVs, outperforming peer methods in the terms of detection power and accuracy. DCC can be used to detect significant or recurrent CNVs in various NGS data sets, thus providing useful information to study genomic mutations and find disease susceptibility genes.
18
Giunti L, Buccoliero AM, Pantaleo M, Lucchesi M, Provenzano A, Palazzo V, Guarducci S, Guidi M, Genitori L, Zuffardi O, Sardi I, Giglio S. Molecular characterization of paediatric glioneuronal tumours with neuropil-like islands: a genome-wide copy number analysis. Am J Cancer Res 2016; 6:2910-2918. [PMID: 28042510 PMCID: PMC5199764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2016] [Accepted: 07/23/2016] [Indexed: 06/06/2023] Open
Abstract
Paediatric glioneuronal tumour with neuropil-like islands (GTNI) is a rare neoplasm of neuronal differentiation and diffusely infiltrating astroglial and oligodendrocyte-like components. The 2007 World Health Organization classification of central nervous system tumours considered it a pattern variation of anaplastic astrocytoma. There are few data on paediatric GTNI, probably because of both their rarity and their variable clinical aggressiveness. Using SNP/CGH arrays, we studied four GTNI tumour samples from two males and two females (one newborn and three children aged 4 to 8 years) in order to identify any common genomic alteration. All patients received chemo- and radiotherapy after surgical treatment. Neither genomic instability nor recurrent alterations were detected in two of our GTNI cases. In the remaining two, we detected a mosaic trisomy 8 (15-20%) in one case, and an amplification at 5q14.1 involving the DMGDH (partially), BHMT2 and BHMT genes, with the distal breakpoint falling 23 kbp from the 5'UTR of JMY, a p53 cofactor. Although the small sample size precludes any clinical-histological correlation, GTNI appear different at the molecular level, with genomic imbalances possibly playing a role in at least some of them. Our work contributes to the knowledge and classification of this family of tumours.
Affiliation(s)
- Laura Giunti: Medical Genetics Unit, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Anna Maria Buccoliero: Anatomic Pathology Unit, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Marilena Pantaleo: Medical Genetics Unit, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Maurizio Lucchesi: Neuro-Oncology Unit, Department of Pediatric Oncology, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Aldesia Provenzano: Medical Genetics Unit, Department of Clinical and Experimental Biomedical Sciences “Mario Serio”, University of Florence, Viale Morgagni 50, 50134 Florence, Italy (S.G.)
- Viviana Palazzo: Medical Genetics Unit, Department of Clinical and Experimental Biomedical Sciences “Mario Serio”, University of Florence, Viale Morgagni 50, 50134 Florence, Italy (S.G.)
- Silvia Guarducci: Medical Genetics Unit, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Milena Guidi: Neuro-Oncology Unit, Department of Pediatric Oncology, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Lorenzo Genitori: Neurosurgery Unit, Department of Neuroscience, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Orsetta Zuffardi: Department of Molecular Medicine, University of Pavia, Viale Forlanini 14, 27100 Pavia, Italy
- Iacopo Sardi: Neuro-Oncology Unit, Department of Pediatric Oncology, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy
- Sabrina Giglio: Medical Genetics Unit, Meyer Children’s University Hospital, Viale Pieraccini 24, 50139 Florence, Italy; Medical Genetics Unit, Department of Clinical and Experimental Biomedical Sciences “Mario Serio”, University of Florence, Viale Morgagni 50, 50134 Florence, Italy (S.G.)
19
Xi J, Li A. Discovering Recurrent Copy Number Aberrations in Complex Patterns via Non-Negative Sparse Singular Value Decomposition. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:656-668. [PMID: 26372614 DOI: 10.1109/tcbb.2015.2474404] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Recurrent copy number aberrations (RCNAs) in multiple cancer samples are strongly associated with tumorigenesis, and RCNA discovery is helpful to cancer research and treatment. Despite the emergence of numerous RCNA discovery methods, most of them are unable to detect RCNAs in complex patterns that are influenced by complicating factors, including aberrations present in only part of the samples, coexistence of gains and losses, and normal-like tumor samples. Here, we propose a novel computational method, called non-negative sparse singular value decomposition (NN-SSVD), to address the RCNA discovery problem in complex patterns. In NN-SSVD, the measurement of RCNA is based on the aberration frequency in a part of the samples rather than in all samples, which can circumvent the complexity of different RCNA patterns. We evaluate NN-SSVD on a synthetic dataset by comparing detection scores and Receiver Operating Characteristic curves, and the results show that NN-SSVD outperforms existing methods in RCNA discovery and is more robust to RCNA complicating factors. Applying our approach to a breast cancer dataset, we successfully identify a number of genomic regions that are strongly correlated with previous studies and that harbor a number of known breast cancer-associated genes.
20
Zhang L, Yuan Y, Lu KH, Zhang L. Identification of recurrent focal copy number variations and their putative targeted driver genes in ovarian cancer. BMC Bioinformatics 2016; 17:222. [PMID: 27230211 PMCID: PMC4881176 DOI: 10.1186/s12859-016-1085-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 12/04/2015] [Accepted: 05/14/2016] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Genomic regions with recurrent DNA copy number variations (CNVs) are generally believed to encode oncogenes and tumor suppressor genes (TSGs) that drive cancer growth. However, it remains a challenge to delineate the key cancer driver genes from the regions encoding a large number of genes. RESULTS In this study, we developed a new approach to CNV analysis based on spectral decomposition of CNV profiles into focal CNVs and broad CNVs. We performed an analysis of CNV data of 587 serous ovarian cancer samples on multiple platforms. We identified a number of novel focal regions, such as focal gain of ESR1, focal loss of LSAMP, prognostic site at 3q26.2 and losses of sub-telomere regions in multiple chromosomes. Furthermore, we performed network modularity analysis to examine the relationships among genes encoded in the focal CNV regions. Our results also showed that the recurrent focal gains were significantly associated with the known oncogenes and recurrent losses associated with TSGs and the CNVs had a greater effect on the mRNA expression of the driver genes than that of the non-driver genes. CONCLUSIONS Our results demonstrate that spectral decomposition of CNV profiles offers a new way of understanding the role of CNVs in cancer.
Affiliation(s)
- Liangcai Zhang: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler St, Unit 1410, Houston, TX, 77401, USA; Department of Statistics, Rice University, Houston, TX, USA; Department of Biophysics, College of Bioinformatics Sciences and Technology, Harbin Medical University, Harbin, China
- Ying Yuan: Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1400 Pressler St, Unit 1410, Houston, TX, 77401, USA; Department of Statistics, Rice University, Houston, TX, USA
- Karen H Lu: Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Li Zhang: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler St, Unit 1410, Houston, TX, 77401, USA
21
Yuan X, Zhang J, Yang L. IntSIM: An Integrated Simulator of Next-Generation Sequencing Data. IEEE Trans Biomed Eng 2016; 64:441-451. [PMID: 27164567 DOI: 10.1109/tbme.2016.2560939] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 11/08/2022]
Abstract
OBJECTIVE Next-generation sequencing data have been widely used for DNA variant discovery and tumor studies through computational tools. Effective simulation of such data with many realistic features is necessary for testing existing tools and guiding the development of new ones. METHODS We present an integrated simulation system, IntSIM, to simulate common DNA variants and to generate sequencing reads for mixture genomes. IntSIM has three novel features in comparison with other simulation programs: 1) it is able to simulate both germline and somatic variants in the same sequence, 2) it accounts for tumor purity so as to generate reads corresponding to heterogeneous genomes and also produce tumor-normal matched samples, and 3) it simulates correlations among SNPs and among CNVs/CNAs based on HMM models trained from real sequencing genomes, and can simulate broad and focal CNV/CNA events. RESULTS The simulation data of IntSIM reflect characteristics observed in real data and are consistent with input parameters. The IntSIM software package is freely available at http://intsim.sourceforge.net/. CONCLUSION Based on a large number of experiments, IntSIM performs better than other programs in some scenarios, such as simulation of heterozygous SNPs and CNVs/CNAs, and provides some functions that other programs lack. SIGNIFICANCE Simulation with IntSIM can be used to evaluate the performance of methods for detecting various types of variants and analyzing tumor samples, and especially to provide a realistic assessment of the effect of tumor purity on the identification of somatic mutations.
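The idea of generating correlated copy number states along a genome can be sketched with a first-order Markov chain over bins. This is only the hidden-state layer of an HMM and is not IntSIM's model: the state set, the transition probabilities and the function names below are hypothetical, whereas the abstract states that IntSIM trains its HMMs on real sequenced genomes.

```python
import random

# Hypothetical copy-number states per bin: 1 = loss, 2 = neutral, 3 = gain.
# Hypothetical transition probabilities; high self-transition values give long runs.
TRANSITIONS = {
    1: {1: 0.95, 2: 0.05, 3: 0.00},
    2: {1: 0.01, 2: 0.98, 3: 0.01},
    3: {1: 0.00, 2: 0.05, 3: 0.95},
}


def simulate_copy_numbers(n_bins, seed=0, start_state=2):
    """Sample a copy-number state for each of n_bins consecutive genome bins."""
    if n_bins <= 0:
        return []
    rng = random.Random(seed)
    states = [start_state]
    for _ in range(n_bins - 1):
        row = TRANSITIONS[states[-1]]
        states.append(rng.choices(list(row), weights=list(row.values()))[0])
    return states


def segments(states):
    """Collapse a per-bin path into (start, end, state) runs, end-exclusive."""
    runs, start = [], 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            runs.append((start, i, states[start]))
            start = i
    return runs


if __name__ == "__main__":
    path = simulate_copy_numbers(500, seed=1)
    for start, end, state in segments(path):
        print(start, end, state)
```

Lowering the self-transition probabilities shortens the runs, which is one simple way to move from broad toward focal events in a sketch like this.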
22
Fu Y, Yu G, Levine DA, Wang N, Shih IM, Zhang Z, Clarke R, Wang Y. BACOM2.0 facilitates absolute normalization and quantification of somatic copy number alterations in heterogeneous tumor. Sci Rep 2015; 5:13955. [PMID: 26350498 PMCID: PMC4563570 DOI: 10.1038/srep13955] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/30/2015] [Accepted: 08/07/2015] [Indexed: 11/18/2022] Open
Abstract
Most published copy number datasets on solid tumors were obtained from specimens comprised of mixed cell populations, for which the varying tumor-stroma proportions are unknown or unreported. The inability to correct for signal mixing represents a major limitation on the use of these datasets for subsequent analyses, such as discerning deletion types or detecting driver aberrations. We describe the BACOM2.0 method with enhanced accuracy and functionality to normalize copy number signals, detect deletion types, estimate tumor purity, quantify true copy numbers, and calculate average-ploidy value. While BACOM has been validated and used with promising results, subsequent BACOM analysis of the TCGA ovarian cancer dataset found that the estimated average tumor purity was lower than expected. In this report, we first show that this lowered estimate of tumor purity is the combined result of imprecise signal normalization and parameter estimation. Then, we describe effective allele-specific absolute normalization and quantification methods that can enhance BACOM applications in many biological contexts while in the presence of various confounders. Finally, we discuss the advantages of BACOM in relation to alternative approaches. Here we detail this revised computational approach, BACOM2.0, and validate its performance in real and simulated datasets.
Affiliation(s)
- Yi Fu: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Guoqiang Yu: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Douglas A Levine: Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA
- Niya Wang: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Ie-Ming Shih: Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21231, USA
- Zhen Zhang: Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21231, USA
- Robert Clarke: Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA
- Yue Wang: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
23
Tian Y, Wang SS, Zhang Z, Rodriguez OC, Petricoin E, Shih IM, Chan D, Avantaggiati M, Yu G, Ye S, Clarke R, Wang C, Zhang B, Wang Y, Albanese C. Integration of Network Biology and Imaging to Study Cancer Phenotypes and Responses. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:1009-19. [PMID: 25750594 PMCID: PMC4348060 DOI: 10.1109/tcbb.2014.2338304] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 05/26/2023]
Abstract
Ever growing "omics" data and continuously accumulated biological knowledge provide an unprecedented opportunity to identify molecular biomarkers and their interactions that are responsible for cancer phenotypes that can be accurately defined by clinical measurements such as in vivo imaging. Since signaling or regulatory networks are dynamic and context-specific, systematic efforts to characterize such structural alterations must effectively distinguish significant network rewiring from random background fluctuations. Here we introduced a novel integration of network biology and imaging to study cancer phenotypes and responses to treatments at the molecular systems level. Specifically, Differential Dependence Network (DDN) analysis was used to detect statistically significant topological rewiring in molecular networks between two phenotypic conditions, and in vivo Magnetic Resonance Imaging (MRI) was used to more accurately define phenotypic sample groups for such differential analysis. We applied DDN to analyze two distinct phenotypic groups of breast cancer and study how genomic instability affects the molecular network topologies in high-grade ovarian cancer. Further, FDA-approved arsenic trioxide (ATO) and the ND2-SmoA1 mouse model of Medulloblastoma (MB) were used to extend our analyses of combined MRI and Reverse Phase Protein Microarray (RPMA) data to assess tumor responses to ATO and to uncover the complexity of therapeutic molecular biology.
Affiliation(s)
- Ye Tian: Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203
- Sean S. Wang: Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
- Zhen Zhang: Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231
- Olga C. Rodriguez: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
- Emanuel Petricoin: Center for Applied Proteomics and Molecular Medicine, George Mason University, Manassas, VA 22030
- Ie-Ming Shih: Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231
- Daniel Chan: Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231
- Maria Avantaggiati: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
- Guoqiang Yu: Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203
- Shaozhen Ye: College of Mathematics and Computer Science, Fuzhou University, Fuzhou, P. R. China
- Robert Clarke: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
- Chao Wang: Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Bai Zhang: Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231
- Yue Wang: Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203
- Chris Albanese: Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
24
General assessment of copy number variation in normal and tumor tissues of the domestic dog (Canis lupus familiaris). J Appl Genet 2014; 55:353-63. [PMID: 24573641 DOI: 10.1007/s13353-014-0201-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/19/2013] [Revised: 01/10/2014] [Accepted: 02/04/2014] [Indexed: 12/22/2022]
Abstract
In recent years, characterization of a copy number variation (CNV) of the genomic DNA has provided evidence for the relationship of this type of genetic variation with the occurrence of a broad spectrum of diseases, including cancer lesions. Copy number variants (CNVs) also occur in the genomes of healthy individuals as a result of abnormal recombination processes in germ cells and have a hereditary character contributing to the natural genetic diversity. Recent image analysis methods and advanced computational techniques allow for identification of CNVs using SNPs genotyping microarrays based on the analysis of signal intensity observed for markers located in the specific genomic regions. In this study we used CanineHD BeadChip assay (Illumina) to identify both natural and cancer-induced CNVs in the genomes of different dog breeds and in different cancer types occurring in this species. The obtained results showed that structural aberrations are a common phenomenon arising during a tumor progression and are more complex and widespread in tumors of mesenchymal tissue origin than in epithelial tissue originating tumors. The tumor derived CNVs, in comparison to healthy samples, were characterized by larger sizes of regions, higher number of amplifications, and in some cases encompassed genes with potential effect on tumor progression.
25
Genome-wide identification of somatic aberrations from paired normal-tumor samples. PLoS One 2014; 9:e87212. [PMID: 24498045 PMCID: PMC3907544 DOI: 10.1371/journal.pone.0087212] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/30/2013] [Accepted: 12/26/2013] [Indexed: 12/13/2022] Open
Abstract
Genomic copy number alteration and allelic imbalance are distinct features of cancer cells, and recent advances in the genotyping technology have greatly boosted the research in the cancer genome. However, the complicated nature of tumor usually hampers the dissection of the SNP arrays. In this study, we describe a bioinformatic tool, named GIANT, for genome-wide identification of somatic aberrations from paired normal-tumor samples measured with SNP arrays. By efficiently incorporating genotype information of matched normal sample, it accurately detects different types of aberrations in cancer genome, even for aneuploid tumor samples with severe normal cell contamination. Furthermore, it allows for discovery of recurrent aberrations with critical biological properties in tumorigenesis by using statistical significance test. We demonstrate the superior performance of the proposed method on various datasets including tumor replicate pairs, simulated SNP arrays and dilution series of normal-cancer cell lines. Results show that GIANT has the potential to detect the genomic aberration even when the cancer cell proportion is as low as 5∼10%. Application on a large number of paired tumor samples delivers a genome-wide profile of the statistical significance of the various aberrations, including amplification, deletion and LOH. We believe that GIANT represents a powerful bioinformatic tool for interpreting the complex genomic aberration, and thus assisting both academic study and the clinical treatment of cancer.
|
26
|
Zhang B, Hou X, Yuan X, Shih IM, Zhang Z, Clarke R, Wang RR, Fu Y, Madhavan S, Wang Y, Yu G. AISAIC: a software suite for accurate identification of significant aberrations in cancers. Bioinformatics 2013; 30:431-3. [PMID: 24292941 DOI: 10.1093/bioinformatics/btt693] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
Accurate identification of significant aberrations in cancers (AISAIC) is a systematic effort to discover potential cancer-driving genes such as oncogenes and tumor suppressors. Two major confounding factors against this goal are normal cell contamination and random background aberrations in tumor samples. We describe a Java AISAIC package that provides comprehensive analytic functions and a graphical user interface for integrating two statistically principled in silico approaches to address the aforementioned challenges in DNA copy number analyses. In addition, the package provides a command-line interface for users with scripting and programming needs to incorporate or extend AISAIC in their customized analysis pipelines. This open-source multiplatform software offers several attractive features: (i) it implements a user-friendly, complete pipeline from processing raw data to reporting analytic results; (ii) it detects deletion types directly from copy number signals using a Bayes hypothesis test; (iii) it estimates the fraction of normal contamination for each sample; (iv) it produces an unbiased null distribution of random background alterations by iterative aberration-exclusive permutations; and (v) it identifies significant consensus regions and the percentage of homozygous/hemizygous deletions across multiple samples. AISAIC also provides users with a parallel computing option to leverage ubiquitous multicore machines. AVAILABILITY AND IMPLEMENTATION AISAIC is available as a Java application, with a user's guide and source code, at https://code.google.com/p/aisaic/.
Affiliation(s)
- Bai Zhang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21231, USA, School of Computer Science and Technology, Xidian University, Xi'an 710126, China, Department of Oncology and Department of Gynecology/Obstetrics, Johns Hopkins University School of Medicine, Baltimore, MD 21231, USA, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20007, USA, Department of Oncology and Department of Physiology and Biophysics, Georgetown University, Washington, DC 20057, USA and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
|
27
|
Pounds S, Cheng C, Li S, Liu Z, Zhang J, Mullighan C. A genomic random interval model for statistical analysis of genomic lesion data. Bioinformatics 2013; 29:2088-95. [PMID: 23842812 PMCID: PMC3740633 DOI: 10.1093/bioinformatics/btt372] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 01/07/2013] [Revised: 05/31/2013] [Accepted: 06/24/2013] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Tumors exhibit numerous genomic lesions such as copy number variations, structural variations and sequence variations. It is difficult to determine whether a specific constellation of lesions observed across a cohort of multiple tumors provides statistically significant evidence that the lesions target a set of genes that may be located across different chromosomes but yet are all involved in a single specific biological process or function. RESULTS We introduce the genomic random interval (GRIN) statistical model and analysis method that evaluates the statistical significance of the abundance of genomic lesions that overlap a specific locus or a pre-defined set of biologically related loci. The GRIN model retains certain biologically important properties of genomic lesions that are ignored by other methods. In a simulation study and two example analyses of leukemia genomic lesion data, GRIN more effectively identified important loci as significant than did three methods based on a permutation-of-markers model. GRIN also identified biologically relevant pathways with a significant abundance of lesions in both examples. AVAILABILITY An R package will be freely available at CRAN and www.stjuderesearch.org/site/depts/biostats/software.
Affiliation(s)
- Stan Pounds
- Department of Biostatistics, Department of Computational Biology and Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN 38135, USA.
|
28
|
Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One 2012; 7:e52516. [PMID: 23285074 PMCID: PMC3527554 DOI: 10.1371/journal.pone.0052516] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/07/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022] Open
Abstract
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice, since very few efforts have focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in the cancer genome, it is imperative to conduct a comprehensive comparison of performance and limitations among existing methods. In this paper, six representative methods proposed in the last six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria, including type I error rate, detection power, the Receiver Operating Characteristic (ROC) curve and the area under the curve (AUC), and computational complexity, to evaluate the performance of the methods under multiple simulation scenarios. We also characterize their abilities in applications to two real datasets from lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. We believe it will help accelerate the development of novel and improved methods.
|