1. Moen MT, Johnston IG. HyperHMM: efficient inference of evolutionary and progressive dynamics on hypercubic transition graphs. Bioinformatics 2022; 39:6895098. [PMID: 36511587 PMCID: PMC9848056 DOI: 10.1093/bioinformatics/btac803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 11/11/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022]
Abstract
MOTIVATION The evolution of bacterial drug resistance and other features in biology, the progression of cancer and other diseases and a wide range of broader questions can often be viewed as the sequential stochastic acquisition of binary traits (e.g. genetic changes, symptoms or characters). Using potentially noisy or incomplete data to learn the sequences by which such traits are acquired is a problem of general interest. The problem is complicated for large numbers of traits, which may, individually or synergistically, influence the probability of further acquisitions both positively and negatively. Hypercubic inference approaches, based on hidden Markov models on a hypercubic transition network, address these complications, but previous Bayesian instances can consume substantial time for converged results, limiting their practical use. RESULTS Here, we introduce HyperHMM, an adapted Baum-Welch (expectation-maximization) algorithm for hypercubic inference with resampling to quantify uncertainty, and show that it allows orders-of-magnitude faster inference while making few practical sacrifices compared to previous hypercubic inference approaches. We show that HyperHMM allows any combination of traits to exert arbitrary positive or negative influence on the acquisition of other traits, relaxing a common limitation of only independent trait influences. We apply this approach to synthetic and biological datasets and discuss its more general application in learning evolutionary and progressive pathways. AVAILABILITY AND IMPLEMENTATION Code for inference and visualization, and data for example cases, is freely available at https://github.com/StochasticBiology/hypercube-hmm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
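To make the hypercubic transition graph concrete, here is a minimal Python sketch. It is not HyperHMM's code and does not show its Baum-Welch fitting or resampling. States are bitmasks over L binary traits, each edge acquires exactly one new trait, and transition probabilities are stored per source state. The helper names and the randomly drawn probabilities are hypothetical.

```python
import itertools
import random


def hypercube_edges(L):
    """Map each state (an int bitmask over L traits) to the states one acquisition away."""
    edges = {}
    for state in range(2 ** L):
        # An edge acquires exactly one trait not yet present: flip one 0-bit to 1.
        edges[state] = [state | (1 << i) for i in range(L) if not state & (1 << i)]
    return edges


def random_transition_probs(L, seed=0):
    """Draw hypothetical transition probabilities, normalized per source state.

    Each source state (a full combination of traits) has its own distribution,
    so any combination of traits can raise or lower the chance of the next acquisition.
    """
    rng = random.Random(seed)
    probs = {}
    for state, targets in hypercube_edges(L).items():
        weights = [rng.random() for _ in targets]
        total = sum(weights)
        # The all-ones state has no outgoing edges, so its dict stays empty.
        probs[state] = {t: w / total for t, w in zip(targets, weights)}
    return probs


def pathway_probability(order, probs):
    """Probability of acquiring traits in the given order, starting from no traits."""
    state, p = 0, 1.0
    for trait in order:
        nxt = state | (1 << trait)
        p *= probs[state][nxt]
        state = nxt
    return p


L = 3
probs = random_transition_probs(L)
# Summed over all L! complete orderings, pathway probabilities should total 1.
total = sum(pathway_probability(order, probs) for order in itertools.permutations(range(L)))
print(total)
```

Storing one distribution per source state lets any combination of traits influence the next acquisition. The cost is that the graph has L·2^(L-1) edges, which grows quickly with the number of traits.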
Affiliation(s)
- Marcus T Moen
- Department of Mathematics, University of Bergen, Bergen, Vestland, Norway
2. Dashti H, Dehzangi I, Bayati M, Breen J, Beheshti A, Lovell N, Rabiee HR, Alinejad-Rokny H. Integrative analysis of mutated genes and mutational processes reveals novel mutational biomarkers in colorectal cancer. BMC Bioinformatics 2022; 23:138. [PMID: 35439935 PMCID: PMC9017053 DOI: 10.1186/s12859-022-04652-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Accepted: 03/24/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. Recent studies have observed causative mutations in susceptibility genes related to colorectal cancer in 10 to 15% of patients. This highlights the importance of identifying mutations for early detection of this cancer and more effective treatment among high-risk individuals. Mutation is considered a key focus of cancer research. Many studies have performed cancer subtyping based on frequently mutated genes or on the proportions of mutational processes. However, to the best of our knowledge, these features have never been combined for this task. This highlights the potential for better and more inclusive subtype classification approaches that use a wider range of related features to enable biomarker discovery and thus inform drug development for CRC. RESULTS In this study, we develop a new pipeline based on a novel concept called 'gene-motif', which merges mutated gene information with the tri-nucleotide motif of mutated sites, for colorectal cancer subtype identification. We apply our pipeline to International Cancer Genome Consortium (ICGC) CRC samples and identify, for the first time, 3131 gene-motif combinations that are significantly mutated in 536 ICGC colorectal cancer samples. Using these features, we identify seven CRC subtypes with distinguishable phenotypes and biomarkers, including unique cancer-related signaling pathways, most of which currently have targeted treatment options available. Interestingly, we also identify several genes that are mutated in multiple subtypes but with unique sequence contexts. CONCLUSION Our results highlight the importance of considering both the mutation type and the mutated genes when identifying cancer subtypes and cancer biomarkers.
The new CRC subtypes presented in this study show distinct phenotypic properties that can be effectively used to develop new treatments. Knowing the genes and phenotypes associated with each subtype makes it possible to develop a personalized treatment plan that considers the specific phenotypes associated with a patient's genomic lesion.
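A minimal Python sketch of how a 'gene-motif' feature might be formed from single-base substitutions reported with their trinucleotide context. It assumes the common convention of writing the mutated base as a pyrimidine (C or T), reverse-complementing purine-centred mutations; whether the published pipeline does this is an assumption. The key format, function names and records are hypothetical, and the study's significance testing and subtype clustering are not shown.

```python
from collections import Counter

# Translation table for base complementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_motif(gene, context, alt):
    """Combine a gene name with the trinucleotide motif of a single-base substitution.

    context: the 3-base reference sequence centred on the mutated base, e.g. "TCG".
    alt: the substituted base. Purine-centred mutations are reverse-complemented so
    the mutated base is written as C or T (an assumed convention).
    """
    if context[1] in "AG":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{gene}:{context[0]}[{context[1]}>{alt}]{context[2]}"


def sample_features(records):
    """Count gene-motif features per sample from (sample, gene, context, alt) records."""
    features = {}
    for sample, gene, context, alt in records:
        features.setdefault(sample, Counter())[gene_motif(gene, context, alt)] += 1
    return features


# Hypothetical records; the second is the first mutation reported on the opposite strand.
records = [
    ("s1", "APC", "TCG", "T"),
    ("s1", "APC", "CGA", "A"),
    ("s2", "KRAS", "ACC", "A"),
]
features = sample_features(records)
print(features)
```

Because the same gene can carry mutations in different sequence contexts, each gene-motif key separates what a gene-level count would merge, which is the distinction the abstract draws for genes mutated in several subtypes.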
Affiliation(s)
- Hamed Dashti
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- Iman Dehzangi
- Center for Computational and Integrative Biology (CCIB), Rutgers University, Camden, NJ, 08102, USA
- Masroor Bayati
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- James Breen
- South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia; Robinson Research Institute, University of Adelaide, Adelaide, SA, 5006, Australia; Bioinformatics Hub, University of Adelaide, Adelaide, SA, 5006, Australia
- Amin Beheshti
- Department of Computing, Macquarie University, Sydney, NSW, 2109, Australia
- Nigel Lovell
- Tyree Institute of Health Engineering and The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- Hamid R Rabiee
- Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia; UNSW Data Science Hub, The University of New South Wales, Sydney, NSW, 2052, Australia; Health Data Analytics Program, AI-Enabled Processes (AIP) Research Centre, Macquarie University, Sydney, 2109, Australia
3. Xiao Y, Wang X, Zhang H, Ulintz PJ, Li H, Guan Y. FastClone is a probabilistic tool for deconvoluting tumor heterogeneity in bulk-sequencing samples. Nat Commun 2020; 11:4469. [PMID: 32901013 PMCID: PMC7478963 DOI: 10.1038/s41467-020-18169-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/24/2020] [Accepted: 08/06/2020] [Indexed: 02/06/2023]
Abstract
Dissecting tumor heterogeneity is key to understanding the complex mechanisms underlying drug resistance in cancers. A rich literature of pioneering studies on tumor heterogeneity analysis spurred a recent community-wide benchmark study comparing diverse modeling algorithms. Here we present FastClone, one of the most accurate algorithms in this benchmark. FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions. We characterize the behavior of FastClone in identifying subclones using stage III colon cancer primary tumor samples as well as simulated data. It achieves an approximately 100-fold speedup in computation for both simulated and patient data. The efficacy of FastClone will allow its application to large-scale and clinical data and facilitate personalized medicine in cancers.
Affiliation(s)
- Yao Xiao
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Xueqing Wang
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA; Microsoft Inc., Redmond, WA, USA
- Peter J Ulintz
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Michigan Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
4. DNA Repair Gene Expression Adjusted by the PCNA Metagene Predicts Survival in Multiple Cancers. Cancers (Basel) 2019; 11:cancers11040501. [PMID: 30965671 PMCID: PMC6520950 DOI: 10.3390/cancers11040501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/03/2019] [Revised: 03/16/2019] [Accepted: 03/30/2019] [Indexed: 02/06/2023]
Abstract
Removal of the proliferation component of gene expression by proliferating cell nuclear antigen (PCNA) adjustment via statistical methods has been addressed in numerous survival prediction studies for breast cancer and all cancers in the Cancer Genome Atlas (TCGA). These studies indicate that removing proliferation from gene expression by PCNA adjustment removes the statistical significance for predicting overall survival (OS) when gene selection is performed on a genome-wide basis. Since cancers become addicted to DNA repair as a result of forced cellular replication, increased oxidation, and repair deficiencies from oncogenic loss or genetic polymorphisms, we hypothesized that PCNA adjustment of DNA repair gene expression does not remove statistical significance for OS prediction. The rationale and importance of this translational hypothesis are that new lists of repair genes predictive of OS can be identified to establish new targets for inhibition therapy. A candidate gene approach was employed using TCGA RNA-Seq data for 121 DNA repair genes in 8 molecular pathways to predict OS for 18 cancers. Statistical randomization test results indicate that after PCNA adjustment, OS could be predicted significantly by sets of DNA repair genes for 61% (11/18) of the cancers. These findings suggest that removal of the proliferation signal in expression by PCNA adjustment does not remove statistical significance for predicting OS. In conclusion, previous studies on PCNA adjustment and survival were likely biased because genes identified through a genome-wide approach are strongly co-regulated by proliferation.
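The abstract does not spell out the adjustment. One common way to remove a proliferation component, assumed here rather than taken from the study, is to regress each gene on a PCNA metagene score and keep the residuals. In this Python/NumPy sketch the metagene is the median expression of the genes most correlated with PCNA; the `top_frac` cutoff, the function names, treating column 0 as PCNA, and the synthetic data are all hypothetical.

```python
import numpy as np


def pcna_metagene(expr, pcna_col, top_frac=0.01):
    """Hypothetical PCNA metagene: median expression of the genes most correlated with PCNA.

    expr: array of shape (n_samples, n_genes); pcna_col: column index of PCNA.
    """
    corr = np.array([np.corrcoef(expr[:, pcna_col], expr[:, j])[0, 1]
                     for j in range(expr.shape[1])])
    k = max(1, int(round(top_frac * expr.shape[1])))
    top = np.argsort(corr)[::-1][:k]  # indices of the k most positively correlated genes
    return np.median(expr[:, top], axis=1)


def pcna_adjust(expr, metagene):
    """Return residuals after regressing each gene on an intercept plus the metagene (OLS)."""
    X = np.column_stack([np.ones_like(metagene), metagene])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta


# Synthetic data: every gene carries a shared proliferation signal plus noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
proliferation = rng.normal(size=n_samples)
loadings = rng.uniform(0.5, 2.0, size=n_genes)
expr = np.outer(proliferation, loadings) + rng.normal(scale=0.5, size=(n_samples, n_genes))

metagene = pcna_metagene(expr, pcna_col=0, top_frac=0.1)  # column 0 stands in for PCNA
adjusted = pcna_adjust(expr, metagene)
```

By construction, each adjusted gene is uncorrelated with the metagene, so any survival model or randomization test fitted on `adjusted` can no longer draw on the linear proliferation signal captured by that score.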