1
Melo D, Pallares LF, Ayroles JF. Reassessing the modularity of gene co-expression networks using the Stochastic Block Model. PLoS Comput Biol 2024; 20:e1012300. [PMID: 39074140; PMCID: PMC11309492; DOI: 10.1371/journal.pcbi.1012300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 08/08/2024] [Accepted: 07/07/2024] [Indexed: 07/31/2024]
Abstract
Finding communities in gene co-expression networks is a common first step toward extracting biological insight from these complex datasets. Most community detection algorithms expect genes to be organized into assortative modules, that is, groups of genes that are more associated with each other than with genes in other groups. While it is reasonable to expect that these modules exist, using methods that assume they exist a priori is risky, as it guarantees that alternative organizations of gene interactions will be ignored. Here, we ask: can we find meaningful communities without imposing a modular organization on gene co-expression networks, and how modular are these communities? For this, we use a recently developed community detection method, the weighted degree corrected stochastic block model (SBM), that does not assume that assortative modules exist. Instead, the SBM attempts to efficiently use all information contained in the co-expression network to separate the genes into hierarchically organized blocks of genes. Using RNAseq gene expression data measured in two tissues derived from an outbred population of Drosophila melanogaster, we show that (a) the SBM is able to find ten times as many groups as competing methods, that (b) several of those gene groups are not modular, and that (c) the functional enrichment for non-modular groups is as strong as for modular communities. These results show that the transcriptome is structured in more complex ways than traditionally thought and that we should revisit the long-standing assumption that modularity is the main driver of the structuring of gene co-expression networks.
Affiliation(s)
- Diogo Melo
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Luisa F. Pallares
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
  - Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Julien F. Ayroles
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
2
Giansanti V, Giannese F, Botrugno OA, Gandolfi G, Balestrieri C, Antoniotti M, Tonon G, Cittaro D. Scalable integration of multiomic single-cell data using generative adversarial networks. Bioinformatics 2024; 40:btae300. [PMID: 38696763; DOI: 10.1093/bioinformatics/btae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 03/22/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION: Single-cell profiling has become a common practice to investigate the complexity of tissues, organs, and organisms. Recent technological advances are expanding our capabilities to profile various molecular layers beyond the transcriptome such as, but not limited to, the genome, the epigenome, and the proteome. Depending on the experimental procedure, these data can be obtained from separate assays or from the very same cells. Yet, integration of more than two assays is currently not supported by the majority of the computational frameworks available. RESULTS: Here we propose a Multi-Omic data integration framework based on Wasserstein Generative Adversarial Networks suitable for the analysis of paired or unpaired data with a high number of modalities (>2). At the core of our strategy is a single network trained on all modalities together, limiting the computational burden when many molecular layers are evaluated. AVAILABILITY AND IMPLEMENTATION: Source code of our framework is available at https://github.com/vgiansanti/MOWGAN.
Affiliation(s)
- Valentina Giansanti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Francesca Giannese
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Oronza A Botrugno
  - Functional Genomics of Cancer Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Università Vita-Salute San Raffaele, Milan, 20132, Italy
- Giorgia Gandolfi
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Chiara Balestrieri
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Experimental Hematology Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
  - Bicocca Bioinformatics Biostatistics and Bioimaging Centre-B4, Università degli Studi di Milano-Bicocca, Milan, 20125, Italy
  - Istituto di Bioimmagini e Fisiologia Molecolare, Consiglio Nazionale delle Ricerche (CNR), Milan, 20090, Italy
- Giovanni Tonon
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Functional Genomics of Cancer Unit, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
  - Università Vita-Salute San Raffaele, Milan, 20132, Italy
- Davide Cittaro
  - Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, 20132, Italy
3
Melo D, Pallares LF, Ayroles JF. Reassessing the modularity of gene co-expression networks using the Stochastic Block Model. bioRxiv 2024:2023.05.31.542906. [PMID: 37398186; PMCID: PMC10312592; DOI: 10.1101/2023.05.31.542906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Finding communities in gene co-expression networks is a common first step toward extracting biological insight from these complex datasets. Most community detection algorithms expect genes to be organized into assortative modules, that is, groups of genes that are more associated with each other than with genes in other groups. While it is reasonable to expect that these modules exist, using methods that assume they exist a priori is risky, as it guarantees that alternative organizations of gene interactions will be ignored. Here, we ask: can we find meaningful communities without imposing a modular organization on gene co-expression networks, and how modular are these communities? For this, we use a recently developed community detection method, the weighted degree corrected stochastic block model (SBM), that does not assume that assortative modules exist. Instead, the SBM attempts to efficiently use all information contained in the co-expression network to separate the genes into hierarchically organized blocks of genes. Using RNA-seq gene expression data measured in two tissues derived from an outbred population of Drosophila melanogaster, we show that (a) the SBM is able to find ten times as many groups as competing methods, that (b) several of those gene groups are not modular, and that (c) the functional enrichment for non-modular groups is as strong as for modular communities. These results show that the transcriptome is structured in more complex ways than traditionally thought and that we should revisit the long-standing assumption that modularity is the main driver of the structuring of gene co-expression networks.
Affiliation(s)
- Diogo Melo
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, USA
- Luisa F Pallares
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, USA
  - Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Julien F Ayroles
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, USA
4
Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers (Basel) 2024; 16:1350. [PMID: 38611028; PMCID: PMC11011054; DOI: 10.3390/cancers16071350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 03/25/2024] [Accepted: 03/28/2024] [Indexed: 04/14/2024]
Abstract
Topic modeling is a popular technique in machine learning and natural language processing, where a corpus of text documents is classified into themes or topics using word frequency analysis. This approach has proven successful in various biological data analysis applications, such as predicting cancer subtypes with high accuracy and identifying genes, enhancers, and stable cell types simultaneously from sparse single-cell epigenomics data. The advantage of using a topic model is that it not only serves as a clustering algorithm, but it can also explain clustering results by providing word probability distributions over topics. Our study proposes a novel topic modeling approach for clustering single cells and detecting topics (gene signatures) in single-cell datasets that measure multiple omics simultaneously. We applied this approach to examine the transcriptional heterogeneity of luminal and triple-negative breast cancer cells using patient-derived xenograft models with acquired resistance to chemotherapy and targeted therapy. Through this approach, we identified protein-coding genes and long non-coding RNAs (lncRNAs) that group thousands of cells into biologically similar clusters, accurately distinguishing drug-sensitive and -resistant breast cancer types. In comparison to standard state-of-the-art clustering analyses, our approach offers an optimal partitioning of genes into topics and cells into clusters simultaneously, producing easily interpretable clustering outcomes. Additionally, we demonstrate that an integrative clustering approach, which combines the information from mRNAs and lncRNAs treated as disjoint omics layers, enhances the accuracy of cell classification.
Affiliation(s)
- Gabriele Malagoli
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Filippo Valle
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Emmanuel Barillot
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
- Michele Caselle
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Loredana Martignetti
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
5
Perelli L, Zhang L, Mangiameli S, Russell AJC, Giannese F, Peng F, Carbone F, Le C, Khan H, Citron F, Soeung M, Lam TNA, Lundgren S, Zhu C, Catania D, Feng N, Gurreri E, Sgambato A, Tortora G, Draetta GF, Tonon G, Futreal A, Giuliani V, Carugo A, Viale A, Heffernan TP, Wang L, Cittaro D, Chen F, Genovese G. Evolutionary fingerprints of EMT in pancreatic cancers. bioRxiv 2023:2023.09.18.558231. [PMID: 37786705; PMCID: PMC10541589; DOI: 10.1101/2023.09.18.558231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2023]
Abstract
Mesenchymal plasticity has been extensively described in advanced and metastatic epithelial cancers; however, its functional role in malignant progression, metastatic dissemination and therapy response is controversial. More importantly, the role of epithelial-mesenchymal transition (EMT) and cell plasticity in tumor heterogeneity, clonal selection and clonal evolution is poorly understood. Functionally, our work clarifies the contribution of EMT to malignant progression and metastasis in pancreatic cancer. We leveraged ad hoc somatic mosaic genome engineering, lineage tracing and ablation technologies and dynamic genetic reporters to trace and ablate tumor-specific lineages along the phenotypic spectrum of epithelial to mesenchymal plasticity. The experimental evidence clarifies the essential contribution of mesenchymal lineages to pancreatic cancer evolution and metastatic dissemination. Spatial genomic analysis combined with single-cell transcriptomic and epigenomic profiling of epithelial and mesenchymal lineages reveals that EMT promotes the emergence of chromosomal instability (CIN). Specifically, tumor lineages with mesenchymal features display highly conserved patterns of genomic evolution, including complex structural genomic rearrangements and chromothriptic events. Genetic ablation of mesenchymal lineages robustly abolished these mutational processes and evolutionary patterns, as confirmed by cross-species analysis of pancreatic and other human epithelial cancers. Mechanistically, we discovered that malignant cells with mesenchymal features display increased chromatin accessibility, particularly in the pericentromeric and centromeric regions, which in turn results in delayed mitosis and catastrophic cell division.
Therefore, EMT favors the emergence of high-fitness tumor cells, strongly supporting the concept of cell-state, lineage-restricted patterns of evolution, where cancer cell sub-clonal speciation is propagated to progenies only through restricted functional compartments. Restraining those evolutionary routes through genetic ablation of clones capable of mesenchymal plasticity and extinction of the derived lineages completely abrogates the malignant potential of one of the most aggressive forms of human cancer.
6
Cittaro D, Lazarević D, Tonon G, Giannese F. Analyzing genomic and epigenetic profiles in single cells by hybrid transposase (scGET-seq). STAR Protoc 2023; 4:102176. [PMID: 37000619; PMCID: PMC10090441; DOI: 10.1016/j.xpro.2023.102176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 12/29/2022] [Accepted: 02/23/2023] [Indexed: 03/30/2023]
Abstract
scGET-seq simultaneously profiles euchromatin and heterochromatin. scGET-seq exploits the concurrent action of transposase Tn5 and its hybrid form TnH, which targets H3K9me3 domains. Here we present a step-by-step protocol to profile single cells by scGET-seq using a 10× Chromium Controller. We describe steps for transposome preparation and validation. We detail nuclei preparation and transposition, followed by encapsulation, library preparation, sequencing, and data analysis. For complete details on the use and execution of this protocol, please refer to Tedesco et al. (2022) and de Pretis and Cittaro (2022).
Affiliation(s)
- Davide Cittaro
  - Center for Omics Sciences, IRCCS San Raffaele Hospital, Milano, Italy
- Dejan Lazarević
  - Center for Omics Sciences, IRCCS San Raffaele Hospital, Milano, Italy
  - Università Vita-Salute San Raffaele, Milano, Italy
- Giovanni Tonon
  - Center for Omics Sciences, IRCCS San Raffaele Hospital, Milano, Italy
  - Università Vita-Salute San Raffaele, Milano, Italy
7
Multiomics Topic Modeling for Breast Cancer Classification. Cancers (Basel) 2022; 14:1150. [PMID: 35267458; PMCID: PMC8909787; DOI: 10.3390/cancers14051150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/11/2022] [Accepted: 02/18/2022] [Indexed: 12/04/2022]
Abstract
The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial to identify the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of 'omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or "topics" that the algorithm extracts correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability.