1
Ruiz-Arenas C, Marín-Goñi I, Wang L, Ochoa I, Pérez-Jurado L, Hernaez M. NetActivity enhances transcriptional signals by combining gene expression into robust gene set activity scores through interpretable autoencoders. Nucleic Acids Res 2024; 52:e44. [PMID: 38597610 PMCID: PMC11109970 DOI: 10.1093/nar/gkae197] [Received: 09/20/2023] [Revised: 01/23/2024] [Accepted: 03/12/2024] [Indexed: 04/11/2024]
Abstract
Grouping gene expression into gene set activity scores (GSAS) provides better biological insights than studying individual genes. However, existing gene set projection methods cannot return representative, robust, and interpretable GSAS. We developed NetActivity, a machine learning framework that generates GSAS based on a sparsely-connected autoencoder, where each neuron in the inner layer represents a gene set. We proposed a three-tier training procedure that yielded representative, robust, and interpretable GSAS. The NetActivity model was trained with 1518 GO biological process terms and KEGG pathways and all GTEx samples. NetActivity generates GSAS that are robust to the initialization parameters and representative of the original transcriptome, and it assigns higher importance to more biologically relevant genes. Moreover, NetActivity returns GSAS with a more consistent definition and higher interpretability than GSVA and hipathia, state-of-the-art gene set projection methods. Finally, NetActivity enables combining bulk RNA-seq and microarray datasets in a meta-analysis of prostate cancer progression, highlighting gene sets related to cell division, which is key for disease progression. When applied to metastatic prostate cancer, gene sets associated with cancer progression were also altered due to drug resistance, while a classical enrichment analysis identified gene sets irrelevant to the phenotype. NetActivity is publicly available in Bioconductor and GitHub.
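As a rough illustration of the sparsely connected encoder idea described in this abstract, the sketch below computes one score per gene set by letting each inner-layer neuron see only the genes annotated to its set. It is not NetActivity's implementation: the function names, the toy gene sets, and the plain weighted sum (no bias, activation, decoder, or training) are assumptions made for illustration.

```python
def build_mask(genes, gene_sets):
    """Return (set_names, mask) where mask[i][j] is 1 if genes[i] is in gene set j.

    `gene_sets` is a hypothetical dict mapping a set name to a set of gene names.
    """
    set_names = list(gene_sets)
    mask = [[1 if gene in gene_sets[name] else 0 for name in set_names]
            for gene in genes]
    return set_names, mask


def gene_set_scores(expression, weights, mask):
    """Masked linear encoder: one score per gene set.

    expression: list of expression values, one per gene.
    weights:    weights[i][j] is the weight from gene i to gene set neuron j.
    mask:       connectivity from build_mask; masked-out weights are ignored,
                so a gene outside a set never contributes to that set's score.
    """
    n_sets = len(mask[0])
    return [
        sum(expression[i] * weights[i][j] * mask[i][j]
            for i in range(len(expression)))
        for j in range(n_sets)
    ]
```

With genes `A`, `B`, `C`, gene sets `{"s1": {"A", "B"}, "s2": {"C"}}`, and unit weights, the score of `s1` depends only on `A` and `B`. Changing the weight between `C` and `s1` has no effect, which is what makes each inner neuron readable as a gene set.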
Affiliation(s)
- Carlos Ruiz-Arenas
  - Computational Biology Program, CIMA University of Navarra, idiSNA, Pamplona 31008, Spain
  - Department MELIS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Irene Marín-Goñi
  - Computational Biology Program, CIMA University of Navarra, idiSNA, Pamplona 31008, Spain
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Idoia Ochoa
  - Department of Electrical and Electronics Engineering, Tecnun, University of Navarra, Donostia, Spain
  - Institute for Data Science and Artificial Intelligence (DATAI), University of Navarra, Pamplona 31008, Spain
- Luis A Pérez-Jurado
  - Department MELIS, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
  - Genetics Service, Hospital del Mar & Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Mikel Hernaez
  - Computational Biology Program, CIMA University of Navarra, idiSNA, Pamplona 31008, Spain
  - Institute for Data Science and Artificial Intelligence (DATAI), University of Navarra, Pamplona 31008, Spain
2
Li Y, Wu M, Ma S, Wu M. ZINBMM: a general mixture model for simultaneous clustering and gene selection using single-cell transcriptomic data. Genome Biol 2023; 24:208. [PMID: 37697330 PMCID: PMC10496184 DOI: 10.1186/s13059-023-03046-0] [Received: 07/11/2022] [Accepted: 08/22/2023] [Indexed: 09/13/2023]
Abstract
Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, there are few methods tailored to investigating cluster-specific genes contributing to cell heterogeneity, which can promote biological understanding of cell heterogeneity. In this study, we propose a zero-inflated negative binomial mixture model (ZINBMM) that simultaneously achieves effective scRNA-seq data clustering and gene selection. ZINBMM conducts a systemic analysis on raw counts, accommodating both batch effects and dropout events. Simulations and the analysis of five scRNA-seq datasets demonstrate the practical applicability of ZINBMM.
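For readers unfamiliar with the distribution named in this abstract, the sketch below evaluates a zero-inflated negative binomial probability mass function for a single count. It uses a common mean and dispersion parameterization; the function names and the parameters `mu`, `theta`, and `pi` are assumptions for illustration, and the mixture over clusters, the batch-effect terms, and the gene selection of ZINBMM are not shown.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial.

    mu > 0 is the mean and theta > 0 the dispersion (variance = mu + mu**2 / theta).
    """
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_pmf(k, mu, theta, pi):
    """Probability of count k under a zero-inflated negative binomial.

    pi in [0, 1] is the probability of an excess zero (a dropout event);
    with probability 1 - pi the count comes from the negative binomial.
    """
    nb = math.exp(nb_log_pmf(k, mu, theta))
    if k == 0:
        # A zero can arise from dropout or from the count distribution itself.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb
```

Setting `pi` to 0 recovers the ordinary negative binomial, while a larger `pi` moves probability mass onto zero, which is how dropout events are typically accommodated when modeling raw counts.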
Affiliation(s)
- Yang Li
  - Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
  - RSS and China-Re Life Joint Lab on Public Health and Risk Management, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Mingcong Wu
  - Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
  - Statistical Consulting Center, Renmin University of China, Beijing, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, New Haven, USA
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
3
Li J, Li L, You P, Wei Y, Xu B. Towards artificial intelligence to multi-omics characterization of tumor heterogeneity in esophageal cancer. Semin Cancer Biol 2023; 91:35-49. [PMID: 36868394 DOI: 10.1016/j.semcancer.2023.02.009] [Received: 01/07/2023] [Revised: 02/21/2023] [Accepted: 02/28/2023] [Indexed: 03/05/2023]
Abstract
Esophageal cancer is a unique and complex malignancy with substantial tumor heterogeneity: at the cellular level, tumors are composed of tumor and stromal cellular components; at the genetic level, they comprise genetically distinct tumor clones; at the phenotypic level, cells in distinct microenvironmental niches acquire diverse phenotypic features. This heterogeneity affects almost every process of esophageal cancer progression, from onset to metastasis and recurrence. Intertumoral and intratumoral heterogeneity are major obstacles in the treatment of esophageal cancer, but also offer the potential to manipulate the heterogeneity itself as a new therapeutic strategy. The high-dimensional, multi-faceted characterization of genomics, epigenomics, transcriptomics, proteomics, metabonomics, etc. of esophageal cancer has opened novel horizons for dissecting tumor heterogeneity. Artificial intelligence, especially machine learning and deep learning algorithms, is able to make decisive interpretations of data from multi-omics layers. To date, artificial intelligence has emerged as a promising computational tool for analyzing and dissecting patient-specific multi-omics data in esophageal cancer. This review provides a comprehensive overview of tumor heterogeneity from a multi-omics perspective. In particular, we discuss the novel techniques of single-cell sequencing and spatial transcriptomics, which have revolutionized our understanding of the cell compositions of esophageal cancer and allowed us to determine novel cell types. We focus on the latest advances in artificial intelligence in integrating multi-omics data of esophageal cancer. Artificial intelligence-based computational tools for multi-omics data integration play a key role in tumor heterogeneity assessment, which will potentially boost the development of precision oncology in esophageal cancer.
Affiliation(s)
- Junyu Li
  - Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China; Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Lin Li
  - Department of Thoracic Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Peimeng You
  - Nanchang University, Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Yiping Wei
  - Department of Thoracic Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
- Bin Xu
  - Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
4
Single-Cell RNAseq Complexity Reduction. Methods Mol Biol 2022; 2584:217-230. [PMID: 36495452 DOI: 10.1007/978-1-0716-2756-3_10] [Indexed: 12/13/2022]
Abstract
An important step in single-cell RNAseq data analysis is the preparation of the single cell transcription data for cell sub-population partitioning. In this chapter, we describe how to perform complexity reduction for 3' end single-cell RNAseq transcriptomics data.
5
Abstract
rCASC is a modular workflow providing an integrated environment for single-cell RNA-seq (scRNA-Seq) data analysis, exploiting Docker containers to achieve functional and computational reproducibility. It was initially developed as an R package also usable through a Java GUI. However, the Java frontend cannot be employed when running rCASC on a remote server, a typical setup due to the significant computational resources commonly needed to analyze scRNA-Seq data. To allow the use of rCASC through a graphical user interface on the client side and to harness the many advantages provided by the Galaxy platform, we have made rCASC available as a Galaxy set of tools, also providing a dedicated public instance of Galaxy named "Galaxy-rCASC." To integrate rCASC into Galaxy, all its functions, originally implemented as a set of Docker containers to maximize reproducibility, have been extensively reworked to become independent from the R package functions that launch them in the original implementation. Furthermore, suitable Galaxy wrappers have been developed for most functions of rCASC. We provide a detailed reference document on the use of Galaxy-rCASC, with insights and explanations on the platform functionalities, parameters, and output, while guiding the reader through the typical rCASC analysis workflow of a scRNA-Seq dataset.
6
Danielski K. Guidance on Processing the 10x Genomics Single Cell Gene Expression Assay. Methods Mol Biol 2022; 2584:1-28. [PMID: 36495443 DOI: 10.1007/978-1-0716-2756-3_1] [Indexed: 12/13/2022]
Abstract
The demand for technologies that allow the study of gene expression at single cell resolution continues to increase. One such assay was launched in 2016 by the US-based company 10x Genomics Inc. Utilizing the power of the single cell on a large scale (Zheng et al. Nat Commun 8:14049, 2017), capturing thousands of cells at once, has shaped life sciences ever since and allowed researchers to discover new insights within their respective fields of study such as oncology, neurobiology, and immunology (among others). Obtaining high data quality is the key to being able to make these meaningful discoveries, which in turn is directly linked to the quality of the initial cell (or nuclei) suspension that is used to load the 10x Genomics Chromium Single Cell Gene Expression assay. A successful workflow relies on a cell suspension which is fully dissociated, extremely clean, and of high viability. While the workflow itself has been detailed elsewhere (De Simone et al. Methods Mol Biol 1979:87-110, 2019), in this chapter we will focus on the importance of the quality of the initial cell suspension, as well as common mistakes that can occur while running a Single Cell Gene Expression assay. The descriptions of these tips and tricks refer to the current version of the 10x Genomics User Guide (Chromium Single Cell 3' Reagent Kits User Guide (v3.1 Chemistry Dual Index). https://support.10xgenomics.com/single-cell-gene-expression/index/doc/user-guide-chromium-single-cell-3-reagent-kits-user-guide-v31-chemistry-dual-index), which can be downloaded from the Support section on the 10x Genomics website (10x Genomics website. https://www.10xgenomics.com). These documents and user guides are continuously improved and updated; hence, it is important to regularly check the company's website for the most recent version.
7
Alessandri L, Calogero RA. Functional-Feature-Based Data Reduction Using Sparsely Connected Autoencoders. Methods Mol Biol 2022; 2584:231-240. [PMID: 36495453 DOI: 10.1007/978-1-0716-2756-3_11] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows for the creation of large collections of individual cell transcriptomes. Unsupervised clustering is an essential element for the analysis of these data, and it represents the initial step for the identification of different cell types to investigate the cell subpopulation structure of a biological sample. However, it is possible that the clustering aggregation features do not perfectly match the underlying biology, since scRNA-seq data are characterized by high noise. In this chapter, we describe a functional feature-driven data reduction approach, which could provide a better link between cell clusters and their underlying cell biology.
Affiliation(s)
- Luca Alessandri
  - Molecular Biotechnology Center, University of Torino, Turin, Italy
8
Olivero M, Calogero RA. Single-Cell RNAseq Data QC and Preprocessing. Methods Mol Biol 2022; 2584:205-215. [PMID: 36495451 DOI: 10.1007/978-1-0716-2756-3_9] [Indexed: 12/13/2022]
Abstract
The first step in single-cell RNAseq data analysis is the evaluation of the overall quality of the cell transcriptome and the preparation of the single-cell transcription data for clustering. In this chapter, we describe one of the possible approaches to perform single-cell data preprocessing for 3' end single-cell RNAseq transcriptomics data.
Affiliation(s)
- Martina Olivero
  - Department of Oncology, University of Torino, Torino, Italy
  - Candiolo Cancer Institute-FPO, IRCCS, Candiolo, TO, Italy
9
Beccuti M, Calogero RA. Single-Cell RNAseq Clustering. Methods Mol Biol 2022; 2584:241-250. [PMID: 36495454 DOI: 10.1007/978-1-0716-2756-3_12] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows the creation of large collections of individual cell transcriptomes. Unsupervised clustering is an essential element for the analysis of these data, and it represents the initial step for the identification of different cell types to investigate the cell subpopulation organization of a sample. In this chapter, we describe how to approach the clustering of single-cell RNAseq transcriptomics data using various clustering tools, and we provide some information on the limitations affecting the clustering procedure.
Affiliation(s)
- Marco Beccuti
  - Department of Computer Science, University of Torino, Turin, Italy
10
Antico F, Gai M, Arigoni M. Tissue RNA Integrity in Visium Spatial Protocol (Fresh Frozen Samples). Methods Mol Biol 2022; 2584:191-203. [PMID: 36495450 DOI: 10.1007/978-1-0716-2756-3_8] [Indexed: 12/13/2022]
Abstract
The transcriptome of a tissue can be acquired both by single-cell RNAseq (scRNA-seq) and by spatial transcriptomics (ST). The dissociation step, which is mandatory in scRNA-seq methods, might lead to the loss of fragile cells and of spatial information, thus limiting the acquisition of the tissue cellular organization. Spatial transcriptomics methods mitigate the above-mentioned issues and provide single-cell transcript detection over an intact fresh frozen tissue section. The Visium platform, commercialized by 10x Genomics, provides whole-transcriptome spatial transcriptomics and does not require dedicated instruments other than those available in any pathology laboratory. In spatial transcriptomics, proper tissue handling is mandatory to preserve the morphological quality of the tissue sections and the integrity of mRNA transcripts. Proper tissue handling is critical for downstream library preparation and sequencing performance. In this chapter, we describe the most critical steps of the Visium protocol on fresh frozen tissues, and we provide indications on how to interpret the data obtained from the quality control analysis recommended during the workflow.
Affiliation(s)
- Federica Antico
  - Molecular Biotechnology Center, University of Torino, Torino, Italy
- Marta Gai
  - Molecular Biotechnology Center, University of Torino, Torino, Italy
- Maddalena Arigoni
  - Molecular Biotechnology Center, University of Torino, Torino, Italy
11
Abstract
The idea behind novel single-cell RNA sequencing (scRNA-seq) pipelines is to isolate single cells through microfluidic approaches and generate sequencing libraries in which the transcripts are tagged to track their cell of origin. Modern scRNA-seq platforms are capable of analyzing up to many thousands of cells in each run. Then, combined with massive high-throughput sequencing producing billions of reads, scRNA-seq allows the assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. In this chapter, we describe how cell subpopulation discovery algorithms, integrated into rCASC, could be efficiently executed on cloud-HPC infrastructure. To achieve this task, we focus on the StreamFlow framework, which provides container-native runtime support for scientific workflows in cloud/HPC environments.
12
Identifying Gene Markers Associated with Cell Subpopulations. Methods Mol Biol 2022; 2584:251-268. [PMID: 36495455 DOI: 10.1007/978-1-0716-2756-3_13] [Indexed: 12/13/2022]
Abstract
An important point in the analysis of a single-cell RNA experiment is the identification of the key elements, i.e., genes, characterizing each cell subpopulation cluster. In this chapter, we describe the use of a sparsely connected autoencoder as a tool to convert single-cell clusters into pseudo-RNAseq experiments to be used as input for differential expression analysis, and the use of COMET as a tool to depict cluster-specific gene markers.
13
Kang M, Oh JH. Editorial of Special Issue "Deep Learning and Machine Learning in Bioinformatics". Int J Mol Sci 2022; 23:ijms23126610. [PMID: 35743052 PMCID: PMC9224509 DOI: 10.3390/ijms23126610] [Received: 06/07/2022] [Accepted: 06/10/2022] [Indexed: 02/04/2023]
Abstract
In recent years, deep learning has emerged as a highly active research field, achieving great success in various machine learning areas, including image processing, speech recognition, and natural language processing, and now rapidly becoming a dominant tool in biomedicine [...].
Affiliation(s)
- Mingon Kang
  - Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
14
Abondio P, De Intinis C, da Silva Gonçalves Vianez Júnior JL, Pace L. Single Cell Multiomic Approaches to Disentangle T Cell Heterogeneity. Immunol Lett 2022; 246:37-51. [DOI: 10.1016/j.imlet.2022.04.008] [Received: 12/22/2021] [Revised: 04/16/2022] [Accepted: 04/26/2022] [Indexed: 11/29/2022]