1
|
Rupp K, Lösch A, Hu YL, Nie C, Schill R, Klever M, Pfahler S, Grasedyck L, Wettig T, Beerenwinkel N, Spang R. Modeling metastatic progression from cross-sectional cancer genomics data. Bioinformatics 2024; 40:i140-i150. [PMID: 38940126 PMCID: PMC11245855 DOI: 10.1093/bioinformatics/btae250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION Metastasis formation is a hallmark of cancer lethality. Yet, metastases are generally unobservable during their early stages of dissemination and spread to distant organs. Genomic datasets of matched primary tumors and metastases may offer insights into the underpinnings and the dynamics of metastasis formation. RESULTS We present metMHN, a cancer progression model designed to deduce the joint progression of primary tumors and metastases using cross-sectional cancer genomics data. The model elucidates the statistical dependencies among genomic events, the formation of metastasis, and the clinical emergence of both primary tumors and their metastatic counterparts. metMHN enables the chronological reconstruction of mutational sequences and facilitates estimation of the timing of metastatic seeding. In a study of nearly 5000 lung adenocarcinomas, metMHN pinpointed TP53 and EGFR as mediators of metastasis formation. Furthermore, the study revealed that post-seeding adaptation is predominantly influenced by frequent copy number alterations. AVAILABILITY AND IMPLEMENTATION All datasets and code are available on GitHub at https://github.com/cbg-ethz/metMHN.
Affiliation(s)
- Kevin Rupp
- Faculty of Informatics and Data Science—Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Andreas Lösch
- Faculty of Informatics and Data Science—Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
| | - Yanren Linda Hu
- Faculty of Informatics and Data Science—Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
| | - Chenxi Nie
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
| | - Rudolf Schill
- Faculty of Informatics and Data Science—Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Maren Klever
- Institute for Geometry and Applied Mathematics, RWTH Aachen, Aachen 52062, Germany
| | - Simon Pfahler
- Faculty of Physics, University of Regensburg, Regensburg 93053, Germany
| | - Lars Grasedyck
- Institute for Geometry and Applied Mathematics, RWTH Aachen, Aachen 52062, Germany
| | - Tilo Wettig
- Faculty of Physics, University of Regensburg, Regensburg 93053, Germany
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Rainer Spang
- Faculty of Informatics and Data Science—Statistical Bioinformatics Group, University of Regensburg, Regensburg 93053, Germany
|
2
|
Rossi N, Gigante N, Vitacolonna N, Piazza C. Inferring Markov Chains to Describe Convergent Tumor Evolution With CIMICE. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:106-119. [PMID: 38015671 DOI: 10.1109/tcbb.2023.3337258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
The field of tumor phylogenetics focuses on studying the differences within cancer cell populations. The scientific community has made many efforts to build cancer progression models in order to understand the heterogeneity of such diseases. These models are highly dependent on the kind of data used for their construction; therefore, as experimental technologies evolve, it is of major importance to exploit their peculiarities. In this work we describe a cancer progression model based on Single Cell DNA Sequencing data. When constructing the model, we focus on tailoring the formalism to the specificity of the data. We operate by defining a minimal set of assumptions needed to reconstruct a flexible DAG-structured model, capable of identifying progression beyond the limitation of the infinite site assumption. Our proposal is conservative in the sense that we aim neither to discard nor to infer knowledge that is not represented in the data. We provide simulations and analytical results to show the features of our model, test it on real data, and show how it can be integrated with other approaches to cope with input noise. Moreover, our framework can be exploited to produce simulated data that follow our theoretical assumptions. Finally, we provide an open source R implementation of our approach, called CIMICE, that is publicly available on Bioconductor.
|
3
|
Fontana D, Crespiatico I, Crippa V, Malighetti F, Villa M, Angaroni F, De Sano L, Aroldi A, Antoniotti M, Caravagna G, Piazza R, Graudenzi A, Mologni L, Ramazzotti D. Evolutionary signatures of human cancers revealed via genomic analysis of over 35,000 patients. Nat Commun 2023; 14:5982. [PMID: 37749078 PMCID: PMC10519956 DOI: 10.1038/s41467-023-41670-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 09/13/2023] [Indexed: 09/27/2023] Open
Abstract
Recurring sequences of genomic alterations occurring across patients can highlight repeated evolutionary processes with significant implications for predicting cancer progression. Leveraging the ever-increasing availability of cancer omics data, here we unveil cancer's evolutionary signatures tied to distinct disease outcomes, representing "favored trajectories" of acquisition of driver mutations detected in patients with similar prognosis. We present a framework named ASCETIC (Agony-baSed Cancer EvoluTion InferenCe) to extract such signatures from sequencing experiments generated by different technologies such as bulk and single-cell sequencing data. We apply ASCETIC to (i) single-cell data from 146 myeloid malignancy patients and bulk sequencing from 366 acute myeloid leukemia patients, (ii) multi-region sequencing from 100 early-stage lung cancer patients, (iii) exome/genome data from 10,000+ Pan-Cancer Atlas samples, and (iv) targeted sequencing from 25,000+ MSK-MET metastatic patients, revealing subtype-specific single-nucleotide variant signatures associated with distinct prognostic clusters. Validations on several datasets underscore the robustness and generalizability of the extracted signatures.
Affiliation(s)
- Diletta Fontana
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Ilaria Crespiatico
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Valentina Crippa
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Federica Malighetti
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Matteo Villa
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Fabrizio Angaroni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Center of Computational Biology, Human Technopole, Milano, Italy
| | - Luca De Sano
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
| | - Andrea Aroldi
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Hematology and Clinical Research Unit, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
| | - Marco Antoniotti
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre-B4, Milan, Italy
| | - Giulio Caravagna
- Department of Mathematics and Geosciences, University of Trieste, Trieste, Italy
| | - Rocco Piazza
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Alex Graudenzi
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre-B4, Milan, Italy.
- Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy.
| | - Luca Mologni
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Daniele Ramazzotti
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy.
|
4
|
Petkovic M, Yalçin M, Heese O, Relógio A. Differential expression of the circadian clock network correlates with tumour progression in gliomas. BMC Med Genomics 2023; 16:154. [PMID: 37400829 DOI: 10.1186/s12920-023-01585-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 06/19/2023] [Indexed: 07/05/2023] Open
Abstract
BACKGROUND Gliomas are tumours arising mostly from astrocytic or oligodendrocytic precursor cells. These tumours are classified according to the updated WHO classification from 2021 into 4 grades depending on molecular and histopathological criteria. Despite novel multimodal therapeutic approaches, the vast majority of gliomas (WHO grade III and IV) are not curable. The circadian clock is an important regulator of numerous cellular processes and its dysregulation has been found during the progression of many cancers, including gliomas. RESULTS In this study, we explore expression patterns of clock-controlled genes in low-grade glioma (LGG) and glioblastoma multiforme (GBM) and show that a set of 45 clock-controlled genes can be used to distinguish GBM from normal tissue. Subsequent analysis identified 17 clock-controlled genes with a significant association with survival. The results point to a loss of correlation strength within elements of the circadian clock network in GBM compared to LGG. We further explored the progression patterns of mutations in LGG and GBM, and showed that tumour suppressor APC is lost late both in LGG and GBM. Moreover, HIF1A, involved in cellular response to hypoxia, exhibits subclonal losses in LGG, and TERT, involved in the formation of telomerase, is lost late in the GBM progression. By examining multi-sample LGG data, we find that the clock-controlled driver genes APC, HIF1A, TERT and TP53 experience frequent subclonal gains and losses. CONCLUSIONS Our results show a higher level of dysregulation at the gene expression level in GBM compared to LGG, and indicate an association between the differentially expressed clock-regulated genes and patient survival in both LGG and GBM. By reconstructing the patterns of progression in LGG and GBM, our data reveal the relatively late gains and losses of clock-regulated glioma drivers. Our analysis emphasizes the role of clock-regulated genes in glioma development and progression. Yet, further research is needed to assess their value in the development of new treatments.
Affiliation(s)
- Marina Petkovic
- Institute for Theoretical Biology (ITB), Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 10117, Berlin, Germany
| | - Müge Yalçin
- Institute for Theoretical Biology (ITB), Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 10117, Berlin, Germany
- Molecular Cancer Research Center (MKFZ), Medical Department of Hematology, Oncology, and Tumour Immunology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 10117, Berlin, Germany
- Institute for Systems Medicine, Faculty of Human Medicine, MSH Medical School Hamburg, 20457, Hamburg, Germany
| | - Oliver Heese
- Department of Neurosurgery and Spinal Surgery, HELIOS Medical Center Schwerin, University Campus of MSH Medical School Hamburg, 20457, Hamburg, Germany
| | - Angela Relógio
- Institute for Theoretical Biology (ITB), Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 10117, Berlin, Germany.
- Molecular Cancer Research Center (MKFZ), Medical Department of Hematology, Oncology, and Tumour Immunology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 10117, Berlin, Germany.
- Institute for Systems Medicine, Faculty of Human Medicine, MSH Medical School Hamburg, 20457, Hamburg, Germany.
|
5
|
Proietto M, Crippa M, Damiani C, Pasquale V, Sacco E, Vanoni M, Gilardi M. Tumor heterogeneity: preclinical models, emerging technologies, and future applications. Front Oncol 2023; 13:1164535. [PMID: 37188201 PMCID: PMC10175698 DOI: 10.3389/fonc.2023.1164535] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/13/2023] [Accepted: 04/11/2023] [Indexed: 05/17/2023] Open
Abstract
Heterogeneity describes the differences among cancer cells within and between tumors. It refers to variations among cancer cells in morphology, transcriptional profiles, metabolism, and metastatic potential. More recently, the field has included the characterization of the tumor immune microenvironment and the depiction of the dynamics underlying the cellular interactions promoting the tumor ecosystem evolution. Heterogeneity has been found in most tumors, representing one of the most challenging behaviors in cancer ecosystems. As one of the critical factors impairing the long-term efficacy of solid tumor therapy, heterogeneity leads to tumor resistance, more aggressive metastasizing, and recurrence. We review the role of the main models and the emerging single-cell and spatial genomic technologies in our understanding of tumor heterogeneity, its contribution to lethal cancer outcomes, and the physiological challenges to consider in designing cancer therapies. We highlight how tumor cells dynamically evolve because of the interactions within the tumor immune microenvironment and how to leverage this to unleash immune recognition through immunotherapy. A multidisciplinary approach grounded in novel bioinformatic and computational tools will allow reaching the integrated, multilayered knowledge of tumor heterogeneity required to implement personalized, more efficient therapies urgently required for cancer patients.
Affiliation(s)
- Marco Proietto
- Next Generation Sequencing Core, The Salk Institute for Biological Studies, La Jolla, CA, United States
- Gene Expression Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, United States
- NOMIS Center for Immunobiology and Microbial Pathogenesis, The Salk Institute for Biological Studies, La Jolla, CA, United States
| | - Martina Crippa
- Vita-Salute San Raffaele University, Milan, Italy
- Experimental Imaging Center, Istituti di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale San Raffaele, Milan, Italy
| | - Chiara Damiani
- Infrastructure Systems Biology Europe /Centre of Systems Biology (ISBE/SYSBIO) Centre of Systems Biology, Milan, Italy
- Department of Biotechnology and Biosciences, School of Sciences, University of Milano-Bicocca, Milan, Italy
| | - Valentina Pasquale
- Infrastructure Systems Biology Europe /Centre of Systems Biology (ISBE/SYSBIO) Centre of Systems Biology, Milan, Italy
- Department of Biotechnology and Biosciences, School of Sciences, University of Milano-Bicocca, Milan, Italy
| | - Elena Sacco
- Infrastructure Systems Biology Europe /Centre of Systems Biology (ISBE/SYSBIO) Centre of Systems Biology, Milan, Italy
- Department of Biotechnology and Biosciences, School of Sciences, University of Milano-Bicocca, Milan, Italy
| | - Marco Vanoni
- Infrastructure Systems Biology Europe /Centre of Systems Biology (ISBE/SYSBIO) Centre of Systems Biology, Milan, Italy
- Department of Biotechnology and Biosciences, School of Sciences, University of Milano-Bicocca, Milan, Italy
- *Correspondence: Marco Vanoni; Mara Gilardi
| | - Mara Gilardi
- NOMIS Center for Immunobiology and Microbial Pathogenesis, The Salk Institute for Biological Studies, La Jolla, CA, United States
- Salk Cancer Center, The Salk Institute for Biological Studies, La Jolla, CA, United States
- *Correspondence: Marco Vanoni; Mara Gilardi
|
6
|
Chen J. Timed hazard networks: Incorporating temporal difference for oncogenetic analysis. PLoS One 2023; 18:e0283004. [PMID: 36928529 PMCID: PMC10019724 DOI: 10.1371/journal.pone.0283004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 03/01/2023] [Indexed: 03/18/2023] Open
Abstract
Oncogenetic graphical models are crucial for understanding cancer progression by analyzing the accumulation of genetic events. These models are used to identify statistical dependencies and temporal order of genetic events, which helps design targeted therapies. However, existing algorithms do not account for temporal differences between samples in oncogenetic analysis. This paper introduces Timed Hazard Networks (TimedHN), a new statistical model that uses temporal differences to improve accuracy and reliability. TimedHN models the accumulation process as a continuous-time Markov chain and includes an efficient gradient computation algorithm for optimization. Our simulation experiments demonstrate that TimedHN outperforms current state-of-the-art graph reconstruction methods. We also compare TimedHN with existing methods on a luminal breast cancer dataset, highlighting its potential utility. The Matlab implementation and data are available at https://github.com/puar-playground/TimedHN.
Affiliation(s)
- Jian Chen
- Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, United States of America
|
7
|
Moen MT, Johnston IG. HyperHMM: efficient inference of evolutionary and progressive dynamics on hypercubic transition graphs. Bioinformatics 2022; 39:6895098. [PMID: 36511587 PMCID: PMC9848056 DOI: 10.1093/bioinformatics/btac803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 11/11/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION The evolution of bacterial drug resistance and other features in biology, the progression of cancer and other diseases and a wide range of broader questions can often be viewed as the sequential stochastic acquisition of binary traits (e.g. genetic changes, symptoms or characters). Using potentially noisy or incomplete data to learn the sequences by which such traits are acquired is a problem of general interest. The problem is complicated for large numbers of traits, which may, individually or synergistically, influence the probability of further acquisitions both positively and negatively. Hypercubic inference approaches, based on hidden Markov models on a hypercubic transition network, address these complications, but previous Bayesian instances can consume substantial time for converged results, limiting their practical use. RESULTS Here, we introduce HyperHMM, an adapted Baum-Welch (expectation-maximization) algorithm for hypercubic inference with resampling to quantify uncertainty, and show that it allows orders-of-magnitude faster inference while making few practical sacrifices compared to previous hypercubic inference approaches. We show that HyperHMM allows any combination of traits to exert arbitrary positive or negative influence on the acquisition of other traits, relaxing a common limitation of only independent trait influences. We apply this approach to synthetic and biological datasets and discuss its more general application in learning evolutionary and progressive pathways. AVAILABILITY AND IMPLEMENTATION Code for inference and visualization, and data for example cases, is freely available at https://github.com/StochasticBiology/hypercube-hmm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
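As an illustration of the hypercubic transition network mentioned in this abstract, the following is a minimal sketch, not the HyperHMM implementation. It assumes each state is a bitmask of acquired binary traits and that a transition adds exactly one trait; the function names `hypercube_successors` and `hypercube_edges` are hypothetical.

```python
from __future__ import annotations


def hypercube_successors(state: int, n_traits: int) -> list[int]:
    """Return the states reachable from `state` by acquiring one more trait.

    A state is an integer bitmask: bit i is set if trait i has been acquired.
    """
    return [state | (1 << i) for i in range(n_traits) if not state & (1 << i)]


def hypercube_edges(n_traits: int) -> list[tuple[int, int]]:
    """Enumerate every directed edge of the hypercubic transition graph."""
    return [
        (source, target)
        for source in range(2 ** n_traits)
        for target in hypercube_successors(source, n_traits)
    ]


if __name__ == "__main__":
    # With 3 traits there are 2**3 = 8 states and 3 * 2**2 = 12 edges.
    for source, target in hypercube_edges(3):
        print(f"{source:03b} -> {target:03b}")
```

The number of edges grows as n * 2^(n-1), which is one way to see why inference over many traits becomes expensive.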
Affiliation(s)
- Marcus T Moen
- Department of Mathematics, University of Bergen, Bergen, Vestland, Norway
|
8
|
ToMExO: A probabilistic tree-structured model for cancer progression. PLoS Comput Biol 2022; 18:e1010732. [DOI: 10.1371/journal.pcbi.1010732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 12/15/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022] Open
Abstract
Identifying the interrelations among cancer driver genes and the patterns in which the driver genes get mutated is critical for understanding cancer. In this paper, we study cross-sectional data from cohorts of tumors to identify the cancer-type (or subtype) specific process in which the cancer driver genes accumulate critical mutations. We model this mutation accumulation process using a tree, where each node includes a driver gene or a set of driver genes. A mutation in each node enables its children to have a chance of mutating. This model simultaneously explains the mutual exclusivity patterns observed in mutations in specific cancer genes (by its nodes) and the temporal order of events (by its edges). We introduce a computationally efficient dynamic programming procedure for calculating the likelihood of our noisy datasets and use it to build our Markov Chain Monte Carlo (MCMC) inference algorithm, ToMExO. Together with a set of engineered MCMC moves, our fast likelihood calculations enable us to work with datasets with hundreds of genes and thousands of tumors, which cannot be dealt with using available cancer progression analysis methods. We demonstrate our method’s performance on several synthetic datasets covering various scenarios for cancer progression dynamics. Then, a comparison against two state-of-the-art methods on a moderate-size biological dataset shows the merits of our algorithm in identifying significant and valid patterns. Finally, we present our analyses of several large biological datasets, including colorectal cancer, glioblastoma, and pancreatic cancer. In all the analyses, we validate the results using a set of method-independent metrics testing the causality and significance of the relations identified by ToMExO or competing methods.
|
9
|
Stepwise evolutionary genomics of early-stage lung adenocarcinoma manifesting as pure, heterogeneous and part-solid ground-glass nodules. Br J Cancer 2022; 127:747-756. [PMID: 35618790 DOI: 10.1038/s41416-022-01821-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/02/2021] [Revised: 03/25/2022] [Accepted: 04/04/2022] [Indexed: 11/09/2022] Open
Abstract
BACKGROUND This study was designed to unravel the genomic landscape and evolution of early-stage subsolid lung adenocarcinomas (SSN-LUADs) manifesting as pure ground-glass nodules (pGGNs), heterogeneous ground-glass nodules (HGGNs) and part-solid nodules (PSNs). METHODS Samples subjected to either broad-panel next-generation sequencing (NGS) or whole-exome sequencing (WES) were included. Clinicopathologic and genomic features were compared among pGGN, HGGN and PSN, while tumour evolutionary trajectories and mutational signatures were evaluated in the entire cohort. RESULTS In total, 247 SSN-LUAD samples subjected to broad-panel NGS and 125 to WES were identified. Compared with PSNs, HGGNs had significantly lower tumour mutation count (P < 0.001), genomic alteration count (P < 0.001), and intra-tumour heterogeneity (P = 0.005). Statistically significant upward trends were observed in alterations involving driver mutations and oncogenic pathways from pGGNs to HGGNs to PSNs. EGFR mutation was shown to be a key early event in the progression of SSN-LUADs, followed by two evolutionary trajectories involving either RBM10 or TP53 mutation in the cancer-evolution models. CONCLUSIONS This study provided evidence for unravelling the previously unknown genomic underpinnings associated with SSN-LUAD evolution from pGGN to HGGN to PSN, proving that HGGN was an intermediate SSN form between pGGN and PSN genetically.
|
10
|
Angelopoulos N, Chatzipli A, Nangalia J, Maura F, Campbell PJ. Bayesian networks elucidate complex genomic landscapes in cancer. Commun Biol 2022; 5:306. [PMID: 35379892 PMCID: PMC8980036 DOI: 10.1038/s42003-022-03243-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 03/09/2022] [Indexed: 11/27/2022] Open
Abstract
Bayesian networks (BNs) are disciplined, explainable Artificial Intelligence models that can describe structured joint probability spaces. In the context of understanding complex relations between a number of variables in biological settings, they can be constructed from observed data and can provide a guiding, graphical tool in exploring such relations. Here we propose BNs for elucidating the relations between driver events in large cancer genomic datasets. We present a methodology that is specifically tailored to biologists and clinicians as they are the main producers of such datasets. We achieve this by using an optimal BN learning algorithm based on well established likelihood functions and by utilising just two tuning parameters, both of which are easy to set and have intuitive readings. To enhance value to clinicians, we introduce (a) the use of heatmaps for families in each network, and (b) visualising pairwise co-occurrence statistics on the network. For binary data, an optional step of fitting logic gates can be employed. We show how our methodology enhances pairwise testing and how biologists and clinicians can use BNs for discussing the main relations among driver events in large genomic cohorts. We demonstrate the utility of our methodology by applying it to 5 cancer datasets revealing complex genomic landscapes. Our networks identify central patterns in all datasets including a central 4-way mutual exclusivity between HDR, t(4,14), t(11,14) and t(14,16) in myeloma, and a 3-way mutual exclusivity of three major players: CALR, JAK2 and MPL, in myeloproliferative neoplasms. These analyses demonstrate that our methodology can play a central role in the study of large genomic cancer datasets. Bayesian network inference on several blood and solid cancer genomic datasets provides more accessible ways to explore driver events in cancer.
Affiliation(s)
- Nicos Angelopoulos
- The Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK; Systems Immunity Research Institute, Medical School, Cardiff University, Cardiff, CF14 4XN, UK
| | - Aikaterini Chatzipli
- The Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Jyoti Nangalia
- The Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Francesco Maura
- Myeloma Program, Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, USA
| | - Peter J Campbell
- The Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
|
11
|
Li H, Sun Z, Li Y, Qi Q, Huang H, Wang X, Zhou J, Liu K, Yin P, Wang Z, Li X, Yang F. Disparate Genomic Characteristics of Patients with Early-Stage Lung Adenocarcinoma Manifesting as Radiological Subsolid or Solid Lesions. Lung Cancer 2022; 166:178-188. [DOI: 10.1016/j.lungcan.2022.02.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Revised: 01/21/2022] [Accepted: 02/22/2022] [Indexed: 11/27/2022]
|
12
|
Angaroni F, Chen K, Damiani C, Caravagna G, Graudenzi A, Ramazzotti D. PMCE: efficient inference of expressive models of cancer evolution with high prognostic power. Bioinformatics 2022; 38:754-762. [PMID: 34647978 DOI: 10.1093/bioinformatics/btab717] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Revised: 10/04/2021] [Accepted: 10/12/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Driver (epi)genomic alterations underlie the positive selection of cancer subpopulations, which promotes drug resistance and relapse. Even though substantial heterogeneity is witnessed in most cancer types, mutation accumulation patterns can be regularly found and can be exploited to reconstruct predictive models of cancer evolution. Yet, available methods cannot infer logical formulas connecting events to represent alternative evolutionary routes or convergent evolution. RESULTS We introduce PMCE, an expressive framework that leverages mutational profiles from cross-sectional sequencing data to infer probabilistic graphical models of cancer evolution including arbitrary logical formulas, and which outperforms the state-of-the-art in terms of accuracy and robustness to noise, on simulations. The application of PMCE to 7866 samples from the TCGA database allows us to identify a highly significant correlation between the predicted evolutionary paths and the overall survival in 7 tumor types, proving that our approach can effectively stratify cancer patients in reliable risk groups. AVAILABILITY AND IMPLEMENTATION PMCE is freely available at https://github.com/BIMIB-DISCo/PMCE, in addition to the code to replicate all the analyses presented in the manuscript. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fabrizio Angaroni
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan 20125, Italy
| | - Kevin Chen
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Chiara Damiani
- Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy; Sysbio Centre for Systems Biology, Milan 20100, Italy
| | - Giulio Caravagna
- Department of Mathematics and Geosciences, University of Trieste, Trieste 34128, Italy
| | - Alex Graudenzi
- Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan 20054, Italy; Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan 20100, Italy
| | - Daniele Ramazzotti
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA.,Department of Pathology, Stanford University, Stanford, CA 94305, USA.,Department of Medicine and Surgery, University of Milan-Bicocca, Monza 20900, Italy
| |
|
13
|
Diaz-Colunga J, Diaz-Uriarte R. Conditional prediction of consecutive tumor evolution using cancer progression models: What genotype comes next? PLoS Comput Biol 2021; 17:e1009055. [PMID: 34932572 PMCID: PMC8730404 DOI: 10.1371/journal.pcbi.1009055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/12/2021] [Revised: 01/05/2022] [Accepted: 11/25/2021] [Indexed: 12/13/2022] Open
Abstract
Accurate prediction of tumor progression is key for adaptive therapy and precision medicine. Cancer progression models (CPMs) can be used to infer dependencies in mutation accumulation from cross-sectional data and provide predictions of tumor progression paths. However, their performance when predicting complete evolutionary trajectories is limited by violations of assumptions and the size of available data sets. Instead of predicting full tumor progression paths, here we focus on short-term predictions, more relevant for diagnostic and therapeutic purposes. We examine whether five distinct CPMs can be used to answer the question "Given that a genotype with n mutations has been observed, what genotype with n + 1 mutations is next in the path of tumor progression?" or, in short, "What genotype comes next?". Using simulated data, we find that under specific combinations of genotype and fitness landscape characteristics CPMs can provide predictions of short-term evolution that closely match the true probabilities, and that some genotype characteristics can be much more relevant than global features. Application of these methods to 25 cancer data sets shows that their use is hampered by a lack of information needed to make principled decisions about method choice. Fruitful use of these methods for short-term predictions requires adapting each method's use to local genotype characteristics and obtaining reliable indicators of performance; it will also be necessary to clarify the interpretation of the method's results when key assumptions do not hold.
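As an illustration of the "What genotype comes next?" question, the following minimal sketch assumes a model that supplies transition rates out of the observed genotype, as a continuous-time Markov chain would; under that assumption the probability of each next genotype is its rate divided by the total outgoing rate. The function name, the genotype labels, and the rate values are hypothetical and are not taken from the article or from any CPM software.

```python
def next_genotype_probabilities(rates):
    """Normalize outgoing transition rates into next-genotype probabilities.

    `rates` maps each candidate genotype (with one additional mutation)
    to its transition rate from the currently observed genotype.
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("no outgoing transition with positive rate")
    return {genotype: rate / total for genotype, rate in rates.items()}


# Hypothetical rates out of the observed genotype {A} (illustrative values only).
rates_from_a = {
    frozenset("AB"): 6.0,
    frozenset("AC"): 3.0,
    frozenset("AD"): 1.0,
}
probs = next_genotype_probabilities(rates_from_a)
most_likely = max(probs, key=probs.get)  # genotype with the highest probability
```

Models that output path or transition probabilities directly would not need this normalization step.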
Affiliation(s)
- Juan Diaz-Colunga
- Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid, Spain
- Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid, Spain
- Department of Ecology & Evolutionary Biology and Microbial Sciences Institute, Yale University, New Haven, Connecticut, United States of America
| | - Ramon Diaz-Uriarte
- Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid, Spain
- Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (UAM-CSIC), Madrid, Spain
| |
|
14
|
Panja S, Rahem S, Chu CJ, Mitrofanova A. Big Data to Knowledge: Application of Machine Learning to Predictive Modeling of Therapeutic Response in Cancer. Curr Genomics 2021; 22:244-266. [PMID: 35273457 PMCID: PMC8822229 DOI: 10.2174/1389202921999201224110101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/02/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 11/22/2022] Open
Abstract
Background In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches in light of their application to therapeutic response modeling in cancer. Conclusion We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.
Affiliation(s)
| | | | | | - Antonina Mitrofanova
- Address correspondence to this author at the Department of Health Informatics, Rutgers School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07107, USA
| |
|
15
|
Reconstruction of evolving gene variants and fitness from short sequencing reads. Nat Chem Biol 2021; 17:1188-1198. [PMID: 34635842 PMCID: PMC8551035 DOI: 10.1038/s41589-021-00876-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/25/2020] [Accepted: 08/09/2021] [Indexed: 12/23/2022]
Abstract
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.
|
16
|
Comparing mutational pathways to lopinavir resistance in HIV-1 subtypes B versus C. PLoS Comput Biol 2021; 17:e1008363. [PMID: 34491984 PMCID: PMC8448360 DOI: 10.1371/journal.pcbi.1008363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/18/2020] [Revised: 09/17/2021] [Accepted: 08/09/2021] [Indexed: 11/19/2022] Open
Abstract
Although combination antiretroviral therapies seem to be effective at controlling HIV-1 infections regardless of the viral subtype, there is increasing evidence for subtype-specific drug resistance mutations. The order and rates at which resistance mutations accumulate in different subtypes also remain poorly understood. Most of this knowledge is derived from studies of subtype B genotypes, despite not being the most abundant subtype worldwide. Here, we present a methodology for the comparison of mutational networks in different HIV-1 subtypes, based on Hidden Conjunctive Bayesian Networks (H-CBN), a probabilistic model for inferring mutational networks from cross-sectional genotype data. We introduce a Monte Carlo sampling scheme for learning H-CBN models for a larger number of resistance mutations and develop a statistical test to assess differences in the inferred mutational networks between two groups. We apply this method to infer the temporal progression of mutations conferring resistance to the protease inhibitor lopinavir in a large cross-sectional cohort of HIV-1 subtype C genotypes from South Africa, as well as to a data set of subtype B genotypes obtained from the Stanford HIV Drug Resistance Database and the Swiss HIV Cohort Study. We find strong support for different initial mutational events in the protease, namely at residue 46 in subtype B and at residue 82 in subtype C. The inferred mutational networks for subtype B versus C are significantly different sharing only five constraints on the order of accumulating mutations with mutation at residue 54 as the parental event. The results also suggest that mutations can accumulate along various alternative paths within subtypes, as opposed to a unique total temporal ordering. Beyond HIV drug resistance, the statistical methodology is applicable more generally for the comparison of inferred mutational networks between any two groups. 
There is a disparity in the distribution of infections by HIV-1 subtype in the world. Subtype B is predominant in America, Australia and western and central Europe, and most therapeutic strategies are based on research and clinical studies on this subtype. However, non-B subtypes represent the majority of global HIV-1 infections; e.g., subtype C alone accounts for nearly half of all HIV-1 infections. We present a statistical framework enabling the comparison of patterns of accumulating mutations in different HIV-1 subtypes. Specifically, we compare the temporal ordering of lopinavir resistance mutations in HIV-1 subtypes B versus C. To this end, we combine the Hidden Conjunctive Bayesian Network (H-CBN) model with an approximate inference scheme enabling comparisons of larger networks. We show that the development of resistance to lopinavir differs significantly between subtypes B and C, such that findings based on subtype B sequences cannot always be applied to subtype C. The described methodology is suitable for comparing different subgroups in the context of other evolutionary processes.
|
17
|
|
18
|
Li L, Shao M, He X, Ren S, Tian T. Risk of lung cancer due to external environmental factor and epidemiological data analysis. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:6079-6094. [PMID: 34517524 DOI: 10.3934/mbe.2021304] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 04/30/2023]
Abstract
Lung cancer has the fastest-growing incidence and mortality of any cancer worldwide and poses an extremely serious threat to human life and health. Evidence indicates that external environmental factors, such as smoking and radiation exposure, are the key drivers of lung cancer. It is therefore urgent to explain, experimentally and theoretically, the mechanism by which external environmental factors raise lung cancer risk. However, how external environmental factors affect lung cancer risk remains an open issue. In this paper, we summarize the main mathematical models of cancer that involve gene mutations, and review the application of these models to analyze the mechanism of lung cancer and the risk of lung cancer due to external environmental exposure. In addition, we apply the model described and epidemiological data to analyze the influence of external environmental factors on lung cancer risk. The result indicates that radiation can significantly increase the mutation rate of cells, in particular the rate of mutation in stability genes, which leads to genomic instability. These studies not only offer insights into the relationship between external environmental factors and human lung cancer risk, but also provide theoretical guidance for the prevention and control of lung cancer.
Affiliation(s)
- Lingling Li
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
| | - Mengyao Shao
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
| | - Xingshi He
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
| | - Shanjing Ren
- School of Mathematics and Big Data, GuiZhou Education University, Guiyang 550018, China
| | - Tianhai Tian
- School of Mathematical Science, Monash University, Melbourne Vic 3800, Australia
| |
|
19
|
Nicol PB, Coombes KR, Deaver C, Chkrebtii O, Paul S, Toland AE, Asiaee A. Oncogenetic network estimation with disjunctive Bayesian networks. COMPUTATIONAL AND SYSTEMS ONCOLOGY 2021. [DOI: 10.1002/cso2.1027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Affiliation(s)
| | - Kevin R. Coombes
- Department of Biomedical Informatics Ohio State University Columbus Ohio
| | - Courtney Deaver
- Natural Sciences Division Pepperdine University Malibu California
| | | | - Subhadeep Paul
- Department of Statistics Ohio State University Columbus Ohio
| | - Amanda E. Toland
- Department of Cancer Biology and Genetics and Department of Internal Medicine Division of Human Genetics, Comprehensive Cancer Center Ohio State University Columbus Ohio
| | - Amir Asiaee
- Mathematical Biosciences Institute Ohio State University Columbus Ohio
| |
|
20
|
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics-cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
- Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain.
| | - José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
| | - Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
| | - Sebastian E Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
| | | | - Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
| | - Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
| | - Santiago F Elena
- Instituto de Biología Integrativa de Sistemas, I²SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
| | | | - Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
| | - Bhavin S Khatri
- The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
| | - Joachim Krug
- Institute for Biological Physics, University of Cologne, Köln, Germany
| | - Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
| | - Nora S Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
| | - Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | - Marcel Weiß
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
| |
|
21
|
Hayashi Y, Fujita K, Banno E, Eich ML, Netto GJ, Nonomura N. Telomerase reverse transcriptase promoter mutation in tumorigenesis of bladder cancer: Evolutionary trajectory by algorithmic inference from cross-sectional data. Int J Urol 2021; 28:774-776. [PMID: 33858033 DOI: 10.1111/iju.14574] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023]
Affiliation(s)
- Yujiro Hayashi
- Department of Urology, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
| | - Kazutoshi Fujita
- Department of Urology, Osaka University Graduate School of Medicine, Suita, Osaka, Japan; Department of Urology, Kindai University Faculty of Medicine, Osakasayama, Osaka, Japan
| | - Eri Banno
- Department of Urology, Kindai University Faculty of Medicine, Osakasayama, Osaka, Japan
| | - Marie-Lisa Eich
- Department of Pathology, University Hospital Cologne, Cologne, Germany
| | - George J Netto
- Department of Pathology, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - Norio Nonomura
- Department of Urology, Osaka University Graduate School of Medicine, Suita, Osaka, Japan
| |
|
22
|
Ramazzotti D, Angaroni F, Maspero D, Gambacorti-Passerini C, Antoniotti M, Graudenzi A, Piazza R. VERSO: A comprehensive framework for the inference of robust phylogenies and the quantification of intra-host genomic diversity of viral samples. PATTERNS (NEW YORK, N.Y.) 2021; 2:100212. [PMID: 33728416 PMCID: PMC7953447 DOI: 10.1016/j.patter.2021.100212] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/12/2020] [Revised: 11/30/2020] [Accepted: 01/22/2021] [Indexed: 12/22/2022]
Abstract
We introduce VERSO, a two-step framework for the characterization of viral evolution from sequencing data of viral genomes, which is an improvement on phylogenomic approaches for consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, also in conditions of sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity and highlights variants transiting from minor to clonal state and homoplastic variants, some of which fall on the spike gene. Available at: https://github.com/BIMIB-DISCo/VERSO.
Affiliation(s)
- Daniele Ramazzotti
- Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
| | - Fabrizio Angaroni
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Davide Maspero
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
| | | | - Marco Antoniotti
- Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
| | - Alex Graudenzi
- Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
| | - Rocco Piazza
- Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
| |
|
23
|
Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time. Stat Appl Genet Mol Biol 2020. [DOI: 10.1515/sagmb-2020-0075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires an understanding of these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, overviewing mathematical models and inferential methods adopted in the field of cancer evolution.
|
24
|
Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics 2020; 36:241-249. [PMID: 31250881 PMCID: PMC6956791 DOI: 10.1093/bioinformatics/btz513] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/12/2018] [Revised: 03/29/2019] [Accepted: 06/25/2019] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. RESULTS Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. AVAILABILITY AND IMPLEMENTATION Implementation and data are available at https://github.com/RudiSchill/MHN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
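The rate parametrization described in this abstract (a spontaneous rate per event, modified by multiplicative effects from events that are already present) can be illustrated with a minimal sketch. It assumes a matrix `theta` whose diagonal entry `theta[i][i]` is the baseline rate of event i and whose off-diagonal entry `theta[i][j]` is the multiplicative effect of event j on event i. The function name and all numbers are hypothetical and are not taken from the authors' implementation.

```python
from math import prod


def event_rate(theta, state, i):
    """Rate at which event i occurs, given the binary state of all events.

    theta[i][i] is the baseline (spontaneous) rate of event i;
    theta[i][j] for j != i is the multiplicative effect that an
    already-present event j exerts on the rate of event i.
    """
    if state[i] == 1:
        return 0.0  # an event that is already present cannot occur again
    effects = (
        theta[i][j] for j, present in enumerate(state) if present and j != i
    )
    return theta[i][i] * prod(effects)


# Hypothetical three-event example (illustrative values only).
theta = [
    [0.5, 1.0, 1.0],
    [2.0, 0.2, 1.0],  # event 0 doubles the rate of event 1 (promotion)
    [0.5, 1.0, 0.4],  # event 0 halves the rate of event 2 (inhibition)
]
rate_without = event_rate(theta, [0, 0, 0], 1)  # baseline rate of event 1
rate_with = event_rate(theta, [1, 0, 0], 1)     # rate of event 1 once event 0 is present
```

Effects greater than 1 promote the target event and effects below 1 inhibit it, which is how a model of this form can express both kinds of interaction without requiring an acyclic graph.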
Affiliation(s)
- Rudolf Schill
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
| | - Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
| | - Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg 93040, Germany
| | - Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, Regensburg 93040, Germany
| |
|
25
|
Angaroni F, Graudenzi A, Rossignolo M, Maspero D, Calarco T, Piazza R, Montangero S, Antoniotti M. An Optimal Control Framework for the Automated Design of Personalized Cancer Treatments. Front Bioeng Biotechnol 2020; 8:523. [PMID: 32548108 PMCID: PMC7270334 DOI: 10.3389/fbioe.2020.00523] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/06/2020] [Accepted: 05/01/2020] [Indexed: 12/17/2022] Open
Abstract
One of the key challenges in current cancer research is the development of computational strategies to support clinicians in the identification of successful personalized treatments. Control theory might be an effective approach to this end, as proven by the long-established application to therapy design and testing. In this respect, we here introduce the Control Theory for Therapy Design (CT4TD) framework, which employs optimal control theory on patient-specific pharmacokinetics (PK) and pharmacodynamics (PD) models, to deliver optimized therapeutic strategies. The definition of personalized PK/PD models makes it possible to explicitly consider the physiological heterogeneity of individuals and to adapt the therapy accordingly, as opposed to standard clinical practices. CT4TD can be used in two distinct scenarios. At the time of the diagnosis, CT4TD makes it possible to set optimized personalized administration strategies, aimed at reaching selected target drug concentrations, while minimizing the costs in terms of toxicity and adverse effects. Moreover, if longitudinal data on patients under treatment are available, our approach makes it possible to adjust the ongoing therapy, by relying on simplified models of cancer population dynamics, with the goal of minimizing or controlling the tumor burden. CT4TD is highly scalable, as it employs the efficient dCRAB/RedCRAB optimization algorithm, and the results are robust, as proven by extensive tests on synthetic data. Furthermore, the theoretical framework is general, and it might be applied to any therapy for which a PK/PD model can be estimated, and for any kind of administration and cost. As a proof of principle, we present the application of CT4TD to Imatinib administration in chronic myeloid leukemia, in which we adopt a simplified model of cancer population dynamics. In particular, we show that the optimized therapeutic strategies are diversified among patients, and display improvements with respect to the current standard regime.
Affiliation(s)
- Fabrizio Angaroni
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
| | - Alex Graudenzi
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
| | - Marco Rossignolo
- Center for Integrated Quantum Science and Technologies, Institute for Quantum Optics, Universität Ulm, Ulm, Germany
- Istituto Nazionale di Fisica Nucleare (INFN), Padova, Italy
| | - Davide Maspero
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
| | - Tommaso Calarco
- Forschungszentrum Jülich, Institute of Quantum Control (PGI-8), Jülich, Germany
| | - Rocco Piazza
- Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Hematology and Clinical Research Unit, San Gerardo Hospital, Monza, Italy
| | - Simone Montangero
- Istituto Nazionale di Fisica Nucleare (INFN), Padova, Italy
- Department of Physics and Astronomy “G. Galilei”, University of Padova, Padova, Italy
| | - Marco Antoniotti
- Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Bicocca Bioinformatics Biostatistics and Bioimaging Centre - B4, Milan, Italy
| |
|
26
|
Qian J, Zhao S, Zou Y, Rahman SMJ, Senosain MF, Stricker T, Chen H, Powell CA, Borczuk AC, Massion PP. Genomic Underpinnings of Tumor Behavior in In Situ and Early Lung Adenocarcinoma. Am J Respir Crit Care Med 2020; 201:697-706. [PMID: 31747302 PMCID: PMC7068818 DOI: 10.1164/rccm.201902-0294oc] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 02/07/2019] [Accepted: 11/19/2019] [Indexed: 01/15/2023] Open
Abstract
Rationale: We have a limited understanding of the molecular underpinnings of early adenocarcinoma (ADC) progression. We hypothesized that the behavior of early ADC can be predicted based on genomic determinants. Objectives: To identify genomic alterations associated with resected indolent and aggressive early lung ADCs. Methods: DNA was extracted from 21 ADCs in situ (AISs), 27 minimally invasive ADCs (MIAs), and 54 fully invasive ADCs. This DNA was subjected to deep next-generation sequencing and tested against a custom panel of 347 cancer genes. Measurements and Main Results: Sequencing data was analyzed for associations among tumor mutation burden, frequency of mutations or copy number alterations, mutation signatures, intratumor heterogeneity, pathway alterations, histology, and overall survival. We found that deleterious mutation burden was significantly greater in invasive ADC, whereas more copy number loss was observed in AIS and MIA. Intratumor heterogeneity establishes early, as in AIS. Twenty-one significantly mutated genes were shared among the groups. Mutation signature profiling did not vary significantly, although the APOBEC signature was associated with ADC and poor survival. Subclonal KRAS mutations and a gene signature consisting of PIK3CG, ATM, EPPK1, EP300, or KMT2C mutations were also associated with poor survival. Mutations of KRAS, TP53, and NF1 were found to increase in frequency from AIS and MIA to ADC. A cancer progression model revealed selective early and late drivers. Conclusions: Our results reveal several genetic driver events, clonality, and mutational signatures associated with poor outcome in early lung ADC, with potential future implications for the detection and management of ADC.
Affiliation(s)
- Jun Qian
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Early Cancer Detection and Prevention Initiative, Vanderbilt Ingram Cancer Center
- Center for Precision Medicine, Department of Biomedical Informatics
| | | | - Yong Zou
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Early Cancer Detection and Prevention Initiative, Vanderbilt Ingram Cancer Center
| | - S. M. Jamshedur Rahman
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Early Cancer Detection and Prevention Initiative, Vanderbilt Ingram Cancer Center
| | - Maria-Fernanda Senosain
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Early Cancer Detection and Prevention Initiative, Vanderbilt Ingram Cancer Center
| | - Thomas Stricker
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Heidi Chen
- Center for Precision Medicine, Department of Biomedical Informatics
| | | | - Alain C. Borczuk
- Department of Pathology, Weill Cornell Medicine, New York, New York
| | - Pierre P. Massion
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Early Cancer Detection and Prevention Initiative, Vanderbilt Ingram Cancer Center
|
27
|
HyperTraPS: Inferring Probabilistic Patterns of Trait Acquisition in Evolutionary and Disease Progression Pathways. Cell Syst 2020; 10:39-51.e10. [PMID: 31786211 DOI: 10.1016/j.cels.2019.10.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/03/2018] [Revised: 08/23/2019] [Accepted: 10/26/2019] [Indexed: 01/15/2023]
Abstract
The explosion of data throughout the biomedical sciences provides unprecedented opportunities to learn about the dynamics of evolution and disease progression, but harnessing these large and diverse datasets remains challenging. Here, we describe a highly generalizable statistical platform to infer the dynamic pathways by which many, potentially interacting, traits are acquired or lost over time. We use HyperTraPS (hypercubic transition path sampling) to efficiently learn progression pathways from cross-sectional, longitudinal, or phylogenetically linked data, readily distinguishing multiple competing pathways, and identifying the most parsimonious mechanisms underlying given observations. This Bayesian approach allows inclusion of prior knowledge, quantifies uncertainty in pathway structure, and allows predictions, such as which symptom a patient will acquire next. We provide visualization tools for intuitive assessment of multiple, variable pathways. We apply the method to ovarian cancer progression and the evolution of multidrug resistance in tuberculosis, demonstrating its power to reveal previously undetected dynamic pathways.
|
28
|
Diaz-Uriarte R, Vasallo C. Every which way? On predicting tumor evolution using cancer progression models. PLoS Comput Biol 2019; 15:e1007246. [PMID: 31374072 PMCID: PMC6693785 DOI: 10.1371/journal.pcbi.1007246] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/28/2018] [Revised: 08/14/2019] [Accepted: 07/05/2019] [Indexed: 11/18/2022] Open
Abstract
Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.
Affiliation(s)
- Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
- Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
| | - Claudia Vasallo
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain
- Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
|
29
|
Abstract
MOTIVATION How predictable is the evolution of cancer? This fundamental question is of immense relevance for the diagnosis, prognosis and treatment of cancer. Evolutionary biologists have approached the question of predictability based on the underlying fitness landscape. However, empirical fitness landscapes of tumor cells are impossible to determine in vivo. Thus, in order to quantify the predictability of cancer evolution, alternative approaches are required that circumvent the need for fitness landscapes. RESULTS We developed a computational method based on conjunctive Bayesian networks (CBNs) to quantify the predictability of cancer evolution directly from mutational data, without the need for measuring or estimating fitness. Using simulated data derived from >200 different fitness landscapes, we show that our CBN-based notion of evolutionary predictability strongly correlates with the classical notion of predictability based on fitness landscapes under the strong selection weak mutation assumption. The statistical framework enables robust and scalable quantification of evolutionary predictability. We applied our approach to driver mutation data from the TCGA and the MSK-IMPACT clinical cohorts to systematically compare the predictability of 15 different cancer types. We found that cancer evolution is remarkably predictable as only a small fraction of evolutionary trajectories are feasible during cancer progression. AVAILABILITY AND IMPLEMENTATION https://github.com/cbg-ethz/predictability_of_cancer_evolution. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayed-Rzgar Hosseini
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain
| | - Florian Markowetz
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
30
|
Khakabimamaghani S, Malikic S, Tang J, Ding D, Morin R, Chindelevitch L, Ester M. Collaborative intra-tumor heterogeneity detection. Bioinformatics 2019; 35:i379-i388. [PMID: 31510674 PMCID: PMC6612880 DOI: 10.1093/bioinformatics/btz355] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION Despite the remarkable advances in sequencing and computational techniques, noise in the data and complexity of the underlying biological mechanisms render deconvolution of the phylogenetic relationships between cancer mutations difficult. Besides that, the majority of the existing datasets consist of bulk sequencing data of single tumor sample of an individual. Accurate inference of the phylogenetic order of mutations is particularly challenging in these cases and the existing methods are faced with several theoretical limitations. To overcome these limitations, new methods are required for integrating and harnessing the full potential of the existing data. RESULTS We introduce a method called Hintra for intra-tumor heterogeneity detection. Hintra integrates sequencing data for a cohort of tumors and infers tumor phylogeny for each individual based on the evolutionary information shared between different tumors. Through an iterative process, Hintra learns the repeating evolutionary patterns and uses this information for resolving the phylogenetic ambiguities of individual tumors. The results of synthetic experiments show an improved performance compared to two state-of-the-art methods. The experimental results with a recent Breast Cancer dataset are consistent with the existing knowledge and provide potentially interesting findings. AVAILABILITY AND IMPLEMENTATION The source code for Hintra is available at https://github.com/sahandk/HINTRA.
Affiliation(s)
| | - Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, BC
| | - Jeffrey Tang
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC
| | - Dujian Ding
- School of Computing Science, Simon Fraser University, Burnaby, BC
| | - Ryan Morin
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC
| | | | - Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC
- Vancouver Prostate Centre, Vancouver, BC, Canada
|
31
|
Ramazzotti D, Graudenzi A, De Sano L, Antoniotti M, Caravagna G. Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data. BMC Bioinformatics 2019; 20:210. [PMID: 31023236 PMCID: PMC6485126 DOI: 10.1186/s12859-019-2795-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/09/2018] [Accepted: 04/08/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND A large number of algorithms is being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, rarely the same method can support both data types. RESULTS We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. CONCLUSIONS We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.
Affiliation(s)
| | - Alex Graudenzi
- Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126 Italy
- Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Viale F.lli Cervi 93, Segrate, Milan, 20090 Italy
| | - Luca De Sano
- Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126 Italy
| | - Marco Antoniotti
- Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Viale Sarca 336, Milan, 20126 Italy
- Milan Center for Neuroscience, Università degli Studi di Milano-Bicocca, San Gerardo Hospital, Via Pergolesi 33, Monza, 20052 Italy
| | - Giulio Caravagna
- Centre for Evolution and Cancer, The Institute of Cancer Research, 15 Cotswold Road, London, SM2 5NG UK
|
32
|
Zhang W, Wang SL. An Integrated Framework for Identifying Mutated Driver Pathway and Cancer Progression. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:455-464. [PMID: 29990286 DOI: 10.1109/tcbb.2017.2788016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
Next-generation sequencing (NGS) technologies provide large amounts of somatic mutation data from a large number of patients. The identification of mutated driver pathways and cancer progression from these data is a challenging task because of interpatient heterogeneity. In addition, cancer progression at the pathway level has been proved to be more reasonable than at the gene level. In this paper, we introduce an integrated framework to identify mutated driver pathways and cancer progression (iMDPCP) at the pathway level from somatic mutation data. First, we use the uncertainty coefficient to quantify mutual exclusivity on gene driver pathways and develop a computational framework to identify mutated driver pathways based on the adaptive discrete differential evolution algorithm. Then, we construct a cancer progression model for driver pathways based on a Bayesian network. Finally, we evaluate the performance of iMDPCP on real cancer somatic mutation datasets. The experimental results indicate that iMDPCP is more accurate than state-of-the-art methods according to the enrichment of KEGG pathways, and it also provides new insights on identifying cancer progression at the pathway level.
|
33
|
Gerhauser C, Favero F, Risch T, Simon R, Feuerbach L, Assenov Y, Heckmann D, Sidiropoulos N, Waszak SM, Hübschmann D, Urbanucci A, Girma EG, Kuryshev V, Klimczak LJ, Saini N, Stütz AM, Weichenhan D, Böttcher LM, Toth R, Hendriksen JD, Koop C, Lutsik P, Matzk S, Warnatz HJ, Amstislavskiy V, Feuerstein C, Raeder B, Bogatyrova O, Schmitz EM, Hube-Magg C, Kluth M, Huland H, Graefen M, Lawerenz C, Henry GH, Yamaguchi TN, Malewska A, Meiners J, Schilling D, Reisinger E, Eils R, Schlesner M, Strand DW, Bristow RG, Boutros PC, von Kalle C, Gordenin D, Sültmann H, Brors B, Sauter G, Plass C, Yaspo ML, Korbel JO, Schlomm T, Weischenfeldt J. Molecular Evolution of Early-Onset Prostate Cancer Identifies Molecular Risk Markers and Clinical Trajectories. Cancer Cell 2018; 34:996-1011.e8. [PMID: 30537516 PMCID: PMC7444093 DOI: 10.1016/j.ccell.2018.10.016] [Citation(s) in RCA: 162] [Impact Index Per Article: 27.0] [Received: 05/26/2018] [Revised: 08/17/2018] [Accepted: 10/29/2018] [Indexed: 12/28/2022]
Abstract
Identifying the earliest somatic changes in prostate cancer can give important insights into tumor evolution and aids in stratifying high- from low-risk disease. We integrated whole genome, transcriptome and methylome analysis of early-onset prostate cancers (diagnosis ≤55 years). Characterization across 292 prostate cancer genomes revealed age-related genomic alterations and a clock-like enzymatic-driven mutational process contributing to the earliest mutations in prostate cancer patients. Our integrative analysis identified four molecular subgroups, including a particularly aggressive subgroup with recurrent duplications associated with increased expression of ESRP1, which we validate in 12,000 tissue microarray tumors. Finally, we combined the patterns of molecular co-occurrence and risk-based subgroup information to deconvolve the molecular and clinical trajectories of prostate cancer from single patient samples.
Affiliation(s)
- Clarissa Gerhauser
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Francesco Favero
- Finsen Laboratory, Rigshospitalet, DK-2200, Copenhagen, Denmark; Biotech Research & Innovation Centre (BRIC), University of Copenhagen, DK-2200, Copenhagen, Denmark
| | - Thomas Risch
- Max Planck Institute for Molecular Genetics, Otto Warburg Laboratory Gene Regulation and Systems Biology of Cancer, Ihnestrasse 63-73, 14195 Berlin, Germany
| | - Ronald Simon
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Lars Feuerbach
- Division Applied Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Yassen Assenov
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Doreen Heckmann
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
| | - Nikos Sidiropoulos
- Finsen Laboratory, Rigshospitalet, DK-2200, Copenhagen, Denmark; Biotech Research & Innovation Centre (BRIC), University of Copenhagen, DK-2200, Copenhagen, Denmark
| | - Sebastian M Waszak
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69120 Heidelberg, Germany
| | - Daniel Hübschmann
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Department for Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology and Bioquant, University of Heidelberg, Heidelberg 69120, Germany; Department of Pediatric Immunology, Hematology and Oncology, University Hospital, Heidelberg 69120, Germany
| | - Alfonso Urbanucci
- Centre for Molecular Medicine Norway, Nordic European Molecular Biology Laboratory Partnership, Forskningsparken, University of Oslo, 0316 Oslo, Norway; Institute for Cancer Genetics and Informatics, Oslo University Hospital, 0316 Oslo, Norway; Department of Core Facilities, Institute for Cancer Research, Oslo University Hospital, 0316 Oslo, Norway
| | - Etsehiwot G Girma
- Finsen Laboratory, Rigshospitalet, DK-2200, Copenhagen, Denmark; Biotech Research & Innovation Centre (BRIC), University of Copenhagen, DK-2200, Copenhagen, Denmark
| | - Vladimir Kuryshev
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
| | - Leszek J Klimczak
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, Durham, 27709 NC, USA
| | - Natalie Saini
- Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, Durham, 27709 NC, USA
| | - Adrian M Stütz
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69120 Heidelberg, Germany
| | - Dieter Weichenhan
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Lisa-Marie Böttcher
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Reka Toth
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Josephine D Hendriksen
- Finsen Laboratory, Rigshospitalet, DK-2200, Copenhagen, Denmark; Biotech Research & Innovation Centre (BRIC), University of Copenhagen, DK-2200, Copenhagen, Denmark
| | - Christina Koop
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Pavlo Lutsik
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Sören Matzk
- Max Planck Institute for Molecular Genetics, Otto Warburg Laboratory Gene Regulation and Systems Biology of Cancer, Ihnestrasse 63-73, 14195 Berlin, Germany
| | - Hans-Jörg Warnatz
- Max Planck Institute for Molecular Genetics, Otto Warburg Laboratory Gene Regulation and Systems Biology of Cancer, Ihnestrasse 63-73, 14195 Berlin, Germany
| | - Vyacheslav Amstislavskiy
- Max Planck Institute for Molecular Genetics, Otto Warburg Laboratory Gene Regulation and Systems Biology of Cancer, Ihnestrasse 63-73, 14195 Berlin, Germany
| | - Clarissa Feuerstein
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Faculty of Biosciences, Heidelberg University, 69120 Heidelberg, Germany
| | - Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69120 Heidelberg, Germany
| | - Olga Bogatyrova
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | | | - Claudia Hube-Magg
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Martina Kluth
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Hartwig Huland
- Martini-Clinic Prostate Cancer Center at the University Medical Center Hamburg-Eppendorf, Martinistrasse 52, D-20246 Hamburg, Germany
| | - Markus Graefen
- Martini-Clinic Prostate Cancer Center at the University Medical Center Hamburg-Eppendorf, Martinistrasse 52, D-20246 Hamburg, Germany
| | - Chris Lawerenz
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Gervaise H Henry
- Department of Urology, UT Southwestern Medical Center, Dallas, TX 75390-9110, USA
| | - Takafumi N Yamaguchi
- Informatics & Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Canada
| | - Alicia Malewska
- Department of Urology, UT Southwestern Medical Center, Dallas, TX 75390-9110, USA
| | - Jan Meiners
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Daniela Schilling
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; NCT Trial Center, National Center for Tumor Diseases and German Cancer Research Center, 69120 Heidelberg, Germany
| | - Eva Reisinger
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Roland Eils
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Department for Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology and Bioquant, University of Heidelberg, Heidelberg 69120, Germany
| | - Matthias Schlesner
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Bioinformatics and Omics Data Analytics (B240), German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Douglas W Strand
- Department of Urology, UT Southwestern Medical Center, Dallas, TX 75390-9110, USA
| | - Robert G Bristow
- Manchester Cancer Research Centre, University of Manchester, 555 Wilmslow Road, Manchester, UK
| | - Paul C Boutros
- Ontario Institute for Cancer Research, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada
| | - Christof von Kalle
- German Cancer Consortium (DKTK), 69120 Heidelberg, Germany; Division of Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Dmitry Gordenin
- Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, Durham, 27709 NC, USA
| | - Holger Sültmann
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
| | - Benedikt Brors
- Division Applied Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany; German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
| | - Guido Sauter
- Department of Pathology, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
| | - Christoph Plass
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
| | - Marie-Laure Yaspo
- Max Planck Institute for Molecular Genetics, Otto Warburg Laboratory Gene Regulation and Systems Biology of Cancer, Ihnestrasse 63-73, 14195 Berlin, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69120 Heidelberg, Germany.
| | - Thorsten Schlomm
- Martini-Clinic Prostate Cancer Center at the University Medical Center Hamburg-Eppendorf, Martinistrasse 52, D-20246 Hamburg, Germany; Charité Universitätsmedizin Berlin, Charitéplatz 1, D-10117 Berlin, Germany.
| | - Joachim Weischenfeldt
- Finsen Laboratory, Rigshospitalet, DK-2200, Copenhagen, Denmark; Biotech Research & Innovation Centre (BRIC), University of Copenhagen, DK-2200, Copenhagen, Denmark; European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69120 Heidelberg, Germany; Charité Universitätsmedizin Berlin, Charitéplatz 1, D-10117 Berlin, Germany.
|
34
|
Clinical impact of MYC abnormalities in plasma cell myeloma. Cancer Genet 2018; 228-229:115-126. [DOI: 10.1016/j.cancergen.2018.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/21/2018] [Revised: 10/16/2018] [Accepted: 10/22/2018] [Indexed: 11/23/2022]
|
35
|
Abstract
Large-scale genomic data highlight the complexity and diversity of the molecular changes that drive cancer progression. Statistical analysis of cancer data from different tissues can guide drug repositioning as well as the design of targeted treatments. Here, we develop an improved Bayesian network model for tumour mutational profiles and apply it to 8198 patient samples across 22 cancer types from TCGA. For each cancer type, we identify the interactions between mutated genes, capturing signatures beyond mere mutational frequencies. When comparing mutation networks, we find genes which interact both within and across cancer types. To detach cancer classification from the tissue type we perform de novo clustering of the pancancer mutational profiles based on the Bayesian network models. We find 22 novel clusters which significantly improve survival prediction beyond clinical information. The models highlight key gene interactions for each cluster potentially allowing genomic stratification for clinical trials and identifying drug targets. Tumour heterogeneity hinders translation of large-scale genomic data into the clinic. Here the authors develop a method for the stratification of cancer patients based on the molecular gene status, including genetic interactions, rather than clinico-histological data, and apply it to TCGA data for over 8000 cases across 22 cancer types.
|
36
|
Diaz-Uriarte R. Cancer progression models and fitness landscapes: a many-to-many relationship. Bioinformatics 2018; 34:836-844. [PMID: 29048486 PMCID: PMC6031050 DOI: 10.1093/bioinformatics/btx663] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/04/2017] [Accepted: 10/17/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to identify these constraints, and return Directed Acyclic Graphs (DAGs) of restrictions where arrows indicate dependencies or constraints. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes-e.g. those with reciprocal sign epistasis-cannot be represented by CPMs. Results Using simulated data under 500 fitness landscapes, I show that CPMs' performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime and fitness landscape features, in ways that depend on CPM method. Using three cancer datasets, I show that these problems strongly affect the analysis of empirical data: fitness landscapes that are widely different from each other produce data similar to the empirically observed ones and lead to DAGs that infer very different restrictions. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs. Availability and implementation Code available from Supplementary Material. Contact ramon.diaz@iib.uam.es. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid 28029, Spain
|
37
|
Graudenzi A, Maspero D, Di Filippo M, Gnugnoli M, Isella C, Mauri G, Medico E, Antoniotti M, Damiani C. Integration of transcriptomic data and metabolic networks in cancer samples reveals highly significant prognostic power. J Biomed Inform 2018; 87:37-49. [PMID: 30244122 DOI: 10.1016/j.jbi.2018.09.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/09/2018] [Revised: 09/07/2018] [Accepted: 09/14/2018] [Indexed: 12/20/2022]
Abstract
Effective stratification of cancer patients on the basis of their molecular make-up is a key open challenge. Given the altered and heterogeneous nature of cancer metabolism, we here propose to use the overall expression of central carbon metabolism as a biomarker to characterize groups of patients with important characteristics, such as response to ad-hoc therapeutic strategies and survival expectancy. To this end, we introduce the data integration framework named Metabolic Reaction Enrichment Analysis (MaREA), which strives to characterize the metabolic deregulations that distinguish cancer phenotypes by projecting RNA-seq data onto metabolic networks, without requiring metabolic measurements. MaREA computes a score for each network reaction, based on the expression of the set of genes encoding the associated enzyme(s). The scores are first used as features for cluster analysis and then to rank and visualize in an organized fashion the metabolic deregulations that distinguish cancer subtypes. We applied our method to recent lung and breast cancer RNA-seq datasets from The Cancer Genome Atlas and were able to identify subgroups of patients with significant differences in survival expectancy. We show how the prognostic power of MaREA improves when an extracted and further curated core model focusing on central carbon metabolism is used rather than the genome-wide reference network. The visualization of the metabolic differences between the groups with best and worst prognosis allowed us to identify and analyze key metabolic properties related to cancer aggressiveness. Some of these properties are shared across different cancer (sub)types, e.g., the up-regulation of nucleic acid and amino acid synthesis, whereas others appear to be tumor-specific, such as the up- or down-regulation of the phosphoenolpyruvate carboxykinase reaction, which displays different patterns in distinct tumor (sub)types. These results might soon be employed to deliver highly automated diagnostic and prognostic strategies for cancer patients.
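The reaction-scoring step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each reaction carries a gene-protein-reaction (GPR) rule, that subunits joined by AND contribute the minimum of their expression values, and that isoenzymes joined by OR contribute the sum. These combination rules, the nested-tuple rule encoding, the `reaction_score` helper, and the gene names and values are all assumptions made for illustration, not the tool's actual interface.

```python
from typing import Dict, Union

# Hypothetical encoding of a gene-protein-reaction (GPR) rule:
#   "GENE"                    -> a single gene
#   ("and", rule, rule, ...)  -> enzyme complex, all subunits required
#   ("or", rule, rule, ...)   -> isoenzymes, any one suffices
Rule = Union[str, tuple]


def reaction_score(rule: Rule, expr: Dict[str, float]) -> float:
    """Score one reaction from the expression of its associated genes."""
    if isinstance(rule, str):
        # Genes missing from the expression table count as not expressed.
        return expr.get(rule, 0.0)
    op, *parts = rule
    scores = [reaction_score(part, expr) for part in parts]
    if op == "and":
        # Assumption: a complex is limited by its least expressed subunit.
        return min(scores)
    if op == "or":
        # Assumption: alternative isoenzymes add up.
        return sum(scores)
    raise ValueError(f"unknown operator: {op!r}")


# Made-up expression values for three placeholder genes.
expr = {"G1": 10.0, "G2": 4.0, "G3": 2.5}
rule = ("or", ("and", "G1", "G2"), "G3")
score = reaction_score(rule, expr)  # min(10.0, 4.0) + 2.5
print(score)
```

Scores computed this way for every reaction and every sample would form the feature matrix that the abstract describes as input to cluster analysis.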
Affiliation(s)
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Davide Maspero
  - Department of Biotechnology and Biosciences, University Milano-Bicocca, Milan, Italy
- Marzia Di Filippo
  - Department of Biotechnology and Biosciences, University Milano-Bicocca, Milan, Italy; SYSBIO Centre of Systems Biology, University Milano-Bicocca, Milan, Italy
- Marco Gnugnoli
  - Department of Biotechnology and Biosciences, University Milano-Bicocca, Milan, Italy; SYSBIO Centre of Systems Biology, University Milano-Bicocca, Milan, Italy
- Claudio Isella
  - University of Torino, Department of Oncology, Candiolo, Torino, Italy; Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Torino, Italy
- Giancarlo Mauri
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy; SYSBIO Centre of Systems Biology, University Milano-Bicocca, Milan, Italy
- Enzo Medico
  - University of Torino, Department of Oncology, Candiolo, Torino, Italy; Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Torino, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy; Milan Center for Neuroscience, University of Milan-Bicocca, Monza, Italy
- Chiara Damiani
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy; SYSBIO Centre of Systems Biology, University Milano-Bicocca, Milan, Italy

38
Caravagna G, Giarratano Y, Ramazzotti D, Tomlinson I, Graham TA, Sanguinetti G, Sottoriva A. Detecting repeated cancer evolution from multi-region tumor sequencing data. Nat Methods 2018; 15:707-714. [PMID: 30171232 PMCID: PMC6380470 DOI: 10.1038/s41592-018-0108-x] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 07/31/2017] [Accepted: 07/23/2018] [Indexed: 12/13/2022]
Abstract
Recurrent successions of genomic changes, both within and between patients, reflect repeated evolutionary processes that are valuable for the anticipation of cancer progression. Multi-region sequencing allows the temporal order of some genomic changes in a tumor to be inferred, but the robust identification of repeated evolution across patients remains a challenge. We developed a machine-learning method based on transfer learning that allowed us to overcome the stochastic effects of cancer evolution and noise in data and identified hidden evolutionary patterns in cancer cohorts. When applied to multi-region sequencing datasets from lung, breast, renal, and colorectal cancer (768 samples from 178 patients), our method detected repeated evolutionary trajectories in subgroups of patients, which were reproduced in single-sample cohorts (n = 2,935). Our method provides a means of classifying patients on the basis of how their tumor evolved, with implications for the anticipation of disease progression.
Affiliation(s)
- Giulio Caravagna
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
  - School of Informatics, University of Edinburgh, Edinburgh, UK
- Ylenia Giarratano
  - School of Informatics, University of Edinburgh, Edinburgh, UK
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
- Ian Tomlinson
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Trevor A Graham
  - Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Andrea Sottoriva
  - Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK

39
Abstract
Cancer arises through the accumulation of somatic mutations over time. An understanding of the sequence of events during this process should allow both earlier diagnosis and better prediction of cancer progression. However, the pathways of tumor evolution have not yet been comprehensively characterized. With the advent of whole genome sequencing, it is now possible to infer the evolutionary history of single tumors from the snapshot of their genome taken at diagnosis, giving new insights into the biology of tumorigenesis.
MESH Headings
- BRCA1 Protein/genetics
- BRCA1 Protein/metabolism
- Breast Neoplasms/genetics
- Breast Neoplasms/metabolism
- Breast Neoplasms/pathology
- Carcinogenesis/genetics
- Carcinogenesis/metabolism
- Carcinogenesis/pathology
- Clonal Evolution
- Female
- Gene Expression Regulation, Neoplastic
- Genome, Human
- Humans
- Janus Kinase 2/genetics
- Janus Kinase 2/metabolism
- Leukemia, Lymphocytic, Chronic, B-Cell/genetics
- Leukemia, Lymphocytic, Chronic, B-Cell/metabolism
- Leukemia, Lymphocytic, Chronic, B-Cell/pathology
- Male
- Mutation
- Neoplasm Proteins/genetics
- Neoplasm Proteins/metabolism
- STAT3 Transcription Factor/genetics
- STAT3 Transcription Factor/metabolism
- Time Factors
- Whole Genome Sequencing
Affiliation(s)
- Clemency Jolly
  - The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Peter Van Loo
  - The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
  - Department of Human Genetics, University of Leuven, B-3000, Leuven, Belgium

40
Ramazzotti D, Graudenzi A, Caravagna G, Antoniotti M. Modeling Cumulative Biological Phenomena with Suppes-Bayes Causal Networks. Evol Bioinform Online 2018; 14:1176934318785167. [PMID: 30013303 PMCID: PMC6043942 DOI: 10.1177/1176934318785167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/11/2018] [Accepted: 05/27/2018] [Indexed: 12/18/2022] Open
Abstract
Several diseases related to cell proliferation are characterized by the accumulation of somatic DNA changes, with respect to wild-type conditions. Cancer and HIV are 2 common examples of such diseases, where the mutational load in the cancerous/viral population increases over time. In these cases, selective pressures are often observed along with competition, co-operation, and parasitism among distinct cellular clones. Recently, we presented a mathematical framework to model these phenomena, based on a combination of Bayesian inference and Suppes’ theory of probabilistic causation, depicted in graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). The SBCNs are generative probabilistic graphical models that recapitulate the potential ordering of accumulation of such DNA changes during the progression of the disease. Such models can be inferred from data by exploiting likelihood-based model selection strategies with regularization. In this article, we discuss the theoretical foundations of our approach and investigate in depth the influence on the model selection task of (1) the poset based on Suppes’ theory and (2) different regularization strategies. Furthermore, we provide an example application of our framework to HIV genetic data, highlighting the valuable insights provided by the inferred SBCN.
Affiliation(s)
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy

41
Hainke K, Szugat S, Fried R, Rahnenführer J. Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV. BMC Bioinformatics 2017; 18:358. [PMID: 28764644 PMCID: PMC5539896 DOI: 10.1186/s12859-017-1762-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/16/2016] [Accepted: 07/14/2017] [Indexed: 12/12/2022] Open
Abstract
Background: Disease progression models are important for understanding the critical steps during the development of diseases. The models are embedded in a statistical framework to deal with random variations due to biology and the sampling process when observing only a finite population. Conditional probabilities are used to describe dependencies between events that characterise the critical steps in the disease process. Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. Oncogenetic trees are a popular model class that is easy to understand yet flexible. These have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often far larger than the maximal number of events that can be used for reliably estimating the progression models. Still, there are only a few approaches to variable selection, which have not yet been investigated in detail. Results: We fill this gap and propose ten variable selection methods specifically for oncogenetic trees, some of them completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. It turns out that the preselection of events by clique identification algorithms performs best. Here, events are selected if they belong to the largest or the maximum weight subgraph in which all pairs of vertices are connected. Conclusions: The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1762-1) contains supplementary material, which is available to authorized users.
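The clique-based preselection described in the abstract can be sketched as follows. The abstract does not say how the event graph is built, so the edge list here is simply given as input, and the event names, the edges, and the `largest_clique` helper are hypothetical. The exhaustive search is a stand-in chosen for brevity, not the algorithm used in the paper, and is only practical for small event sets.

```python
from itertools import combinations


def largest_clique(nodes, edges):
    """Return one largest set of nodes in which every pair is connected."""
    adjacent = {frozenset(edge) for edge in edges}
    # Try candidate sets from largest to smallest and stop at the first clique.
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(nodes, size):
            if all(frozenset(pair) in adjacent
                   for pair in combinations(candidate, 2)):
                return set(candidate)
    return set()


# Made-up events and pairwise links (for example, pairs judged to be related).
events = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
selected = largest_clique(events, edges)
print(sorted(selected))
```

The selected events would then be the only ones passed on to tree estimation; a maximum weight variant would compare the summed node or edge weights of the candidate cliques instead of their size.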
Affiliation(s)
- Katrin Hainke
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Sebastian Szugat
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Roland Fried
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany
- Jörg Rahnenführer
  - Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany

42
Abstract
Rapid advances in high-throughput sequencing and a growing realization of the importance of evolutionary theory to cancer genomics have led to a proliferation of phylogenetic studies of tumour progression. These studies have yielded not only new insights but also a plethora of experimental approaches, sometimes reaching conflicting or poorly supported conclusions. Here, we consider this body of work in light of the key computational principles underpinning phylogenetic inference, with the goal of providing practical guidance on the design and analysis of scientifically rigorous tumour phylogeny studies. We survey the range of methods and tools available to the researcher, their key applications, and the various unsolved problems, closing with a perspective on the prospects and broader implications of this field.
Affiliation(s)
- Russell Schwartz
  - Department of Biological Sciences and Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Alejandro A Schäffer
  - Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20892, USA

43
Peterson LE, Kovyrshina T. Progression inference for somatic mutations in cancer. Heliyon 2017; 3:e00277. [PMID: 28492066 PMCID: PMC5415494 DOI: 10.1016/j.heliyon.2017.e00277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/04/2016] [Revised: 03/08/2017] [Accepted: 03/23/2017] [Indexed: 01/05/2023] Open
Abstract
Computational methods were employed to infer the progression of genomic alterations in commonly occurring cancers. Using cross-sectional TCGA data, we computed evolutionary trajectories involving selectivity relationships among pairs of gene-specific genomic alterations such as somatic mutations, deletions, amplifications, downregulation, and upregulation among the top 20 driver genes associated with each cancer. Results indicate that the majority of hierarchies involved TP53, PIK3CA, ERBB2, APC, KRAS, EGFR, IDH1, VHL, etc. Research into the order and accumulation of genomic alterations among cancer driver genes will continue to grow as the costs of next-generation sequencing subside and personalized/precision medicine incorporates whole-genome scans into the diagnosis and treatment of cancer.
Affiliation(s)
- Leif E. Peterson
  - Center for Biostatistics, Houston Methodist Research Institute, Houston, TX 77030, USA
  - Dept. of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
  - Dept. of Biostatistics, School of Public Health, University of Texas – Health Science Center, Houston, TX 77030, USA
  - Dept. of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
  - Dept. of Neuroscience and Experimental Therapeutics, Texas A&M University Health Science Center, College Station, TX 77843, USA
- Tatiana Kovyrshina
  - Center for Biostatistics, Houston Methodist Research Institute, Houston, TX 77030, USA
  - Dept. of Mathematics and Statistics, University of Houston – Downtown, Houston, TX 77002, USA

44
Exposing the probabilistic causal structure of discrimination. International Journal of Data Science and Analytics 2017. [DOI: 10.1007/s41060-016-0040-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 10/20/2022]

45
Gertz EM, Chowdhury SA, Lee WJ, Wangsa D, Heselmeyer-Haddad K, Ried T, Schwartz R, Schäffer AA. FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe. PLoS One 2016; 11:e0158569. [PMID: 27362268 PMCID: PMC4928784 DOI: 10.1371/journal.pone.0158569] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/29/2016] [Accepted: 06/19/2016] [Indexed: 01/03/2023] Open
Abstract
Advances in fluorescence in situ hybridization (FISH) make it feasible to detect multiple copy-number changes in hundreds of cells of solid tumors. Studies using FISH, sequencing, and other technologies have revealed substantial intra-tumor heterogeneity. The evolution of subclones in tumors may be modeled by phylogenies. Tumors often harbor aneuploid or polyploid cell populations. Using a FISH probe to estimate changes in ploidy can guide the creation of trees that model changes in ploidy and individual gene copy-number variations. We present FISHtrees 3.0, which implements a ploidy-based tree building method based on mixed integer linear programming (MILP). The ploidy-based modeling in FISHtrees includes a new formulation of the problem of merging trees for changes of a single gene into trees modeling changes in multiple genes and the ploidy. When multiple samples are collected from each patient, varying over time or tumor regions, it is useful to evaluate similarities in tumor progression among the samples. Therefore, we further implemented in FISHtrees 3.0 a new method to build consensus graphs for multiple samples. We validate FISHtrees 3.0 on simulated data, on FISH data from paired cases of cervical primary and metastatic tumors, and on paired breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Tests on simulated data show improved accuracy of the ploidy-based approach relative to prior ploidyless methods. Tests on real data further demonstrate novel insights these methods offer into tumor progression processes. Trees for DCIS samples are significantly less complex than trees for paired IDC samples. Consensus graphs show substantial divergence among most paired samples from both sets. Low consensus between DCIS and IDC trees may help explain the difficulty in finding biomarkers that predict which DCIS cases are at greatest risk of progressing to IDC. The FISHtrees software is available at ftp://ftp.ncbi.nih.gov/pub/FISHtrees.
MESH Headings
- Biomarkers, Tumor/genetics
- Breast Neoplasms/genetics
- Breast Neoplasms/pathology
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/pathology
- Carcinoma, Intraductal, Noninfiltrating/genetics
- Carcinoma, Intraductal, Noninfiltrating/pathology
- Databases, Genetic
- Female
- Humans
- In Situ Hybridization, Fluorescence/methods
- Ploidies
- Uterine Cervical Neoplasms/genetics
- Uterine Cervical Neoplasms/pathology
Affiliation(s)
- E. Michael Gertz
  - Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Salim Akhter Chowdhury
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
  - Carnegie Mellon/University of Pittsburgh Joint Ph.D. Program in Computational Biology, Pittsburgh, PA, United States of America
- Woei-Jyh Lee
  - Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Darawalee Wangsa
  - Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Kerstin Heselmeyer-Haddad
  - Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Thomas Ried
  - Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, National Cancer Institute, U.S. National Institutes of Health, Bethesda, MD, United States of America
- Russell Schwartz
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Alejandro A. Schäffer
  - Computational Biology Branch, National Center for Biotechnology Information, U.S. National Institutes of Health, Bethesda, MD, United States of America

46
Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc Natl Acad Sci U S A 2016; 113:E4025-34. [PMID: 27357673 DOI: 10.1073/pnas.1520213113] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022] Open
Abstract
The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.

47
De Sano L, Caravagna G, Ramazzotti D, Graudenzi A, Mauri G, Mishra B, Antoniotti M. TRONCO: an R package for the inference of cancer progression models from heterogeneous genomic data. Bioinformatics 2016; 32:1911-3. [PMID: 26861821 PMCID: PMC6280783 DOI: 10.1093/bioinformatics/btw035] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/16/2015] [Revised: 01/18/2016] [Accepted: 01/18/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: We introduce TRanslational ONCOlogy (TRONCO), an open-source R package that implements state-of-the-art algorithms for the inference of cancer progression models from (epi)genomic mutational profiles. TRONCO can be used to extract population-level models describing the trends of accumulation of alterations in a cohort of cross-sectional samples, e.g. retrieved from publicly available databases, and individual-level models that reveal the clonal evolutionary history in single cancer patients, when multiple samples, e.g. multiple biopsies or single-cell sequencing data, are available. The resulting models can provide key hints for uncovering the evolutionary trajectories of cancer, especially for precision medicine or personalized therapy. AVAILABILITY AND IMPLEMENTATION: TRONCO is released under the GPL license, is hosted at http://bimib.disco.unimib.it/ (Software section) and is also archived at bioconductor.org. CONTACT: tronco@disco.unimib.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luca De Sano
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca
- Giulio Caravagna
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, School of Informatics, University of Edinburgh, Edinburgh, UK
- Daniele Ramazzotti
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Milan, Italy
- Giancarlo Mauri
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca
- Bud Mishra
  - Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan Center for Neuroscience, University of Milan-Bicocca, Milan, Italy

48
Rubinacci S, Graudenzi A, Caravagna G, Mauri G, Osborne J, Pitt-Francis J, Antoniotti M. CoGNaC: A Chaste Plugin for the Multiscale Simulation of Gene Regulatory Networks Driving the Spatial Dynamics of Tissues and Cancer. Cancer Inform 2015; 14:53-65. [PMID: 26380549 PMCID: PMC4559197 DOI: 10.4137/cin.s19965] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/15/2015] [Revised: 06/18/2015] [Accepted: 06/21/2015] [Indexed: 01/01/2023] Open
Abstract
We introduce a Chaste plugin for the generation and the simulation of Gene Regulatory Networks (GRNs) in multiscale models of multicellular systems. Chaste is a widely used and versatile computational framework for the multiscale modeling and simulation of multicellular biological systems. The plugin, named CoGNaC (Chaste and Gene Networks for Cancer), allows the linking of the regulatory dynamics to key properties of the cell cycle and of the differentiation process in populations of cells, which can subsequently be modeled using different spatial modeling scenarios. The approach of CoGNaC focuses on the emergent dynamical behavior of gene networks, in terms of gene activation patterns characterizing the different cellular phenotypes of real cells and, especially, on the overall robustness to perturbations and biological noise. The integration of this approach within Chaste’s modular simulation framework provides a powerful tool to model multicellular systems, possibly allowing for the formulation of novel hypotheses on gene regulation, cell differentiation, and, in particular, cancer emergence and development. In order to demonstrate the usefulness of CoGNaC over a range of modeling paradigms, two example applications are presented. The first of these concerns the characterization of the gene activation patterns of human T-helper cells. The second example is a multiscale simulation of a simplified intestinal crypt, in which, given certain conditions, tumor cells can emerge and colonize the tissue.
Affiliation(s)
- Simone Rubinacci
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Giulio Caravagna
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Giancarlo Mauri
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- James Osborne
  - School of Mathematics and Statistics, University of Melbourne, Australia
- Joe Pitt-Francis
  - Department of Computer Science, University of Oxford, Oxford, UK
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy