1
Paas-Oliveros E, Hernández-Lemus E, de Anda-Jáuregui G. Computational single cell oncology: state of the art. Front Genet 2023; 14:1256991. [PMID: 38028624 PMCID: PMC10663273 DOI: 10.3389/fgene.2023.1256991] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Single cell computational analysis has emerged as a powerful tool in the field of oncology, enabling researchers to decipher the complex cellular heterogeneity that characterizes cancer. By leveraging computational algorithms and bioinformatics approaches, this methodology provides insights into the underlying genetic, epigenetic and transcriptomic variations among individual cancer cells. In this paper, we present a comprehensive overview of single cell computational analysis in oncology, discussing the key computational techniques employed for data processing, analysis, and interpretation. We explore the challenges associated with single cell data, including data quality control, normalization, dimensionality reduction, clustering, and trajectory inference. Furthermore, we highlight the applications of single cell computational analysis, including the identification of novel cell states, the characterization of tumor subtypes, the discovery of biomarkers, and the prediction of therapy response. Finally, we address the future directions and potential advancements in the field, including the development of machine learning and deep learning approaches for single cell analysis. Overall, this paper aims to provide a roadmap for researchers interested in leveraging computational methods to unlock the full potential of single cell analysis in understanding cancer biology with the goal of advancing precision oncology. For this purpose, we also include a notebook that instructs on how to apply the recommended tools in the Preprocessing and Quality Control section.
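The quality control and normalization steps named in this abstract can be illustrated with a minimal sketch. It is not the notebook the authors provide: the function names, the thresholds (`min_genes`, `min_total`) and the `target_sum` value are illustrative assumptions, and the toy count matrix is made up for the example.

```python
import math

def quality_filter(counts, min_genes=2, min_total=5):
    """Keep cells (rows) that express enough genes and have enough total counts.

    The thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for cell in counts:
        n_genes = sum(1 for c in cell if c > 0)
        if n_genes >= min_genes and sum(cell) >= min_total:
            kept.append(cell)
    return kept

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    out = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all-zero instead of dividing by zero.
            out.append([0.0] * len(cell))
            continue
        out.append([math.log1p(c * target_sum / total) for c in cell])
    return out

# Toy count matrix: rows are cells, columns are genes.
raw = [
    [10, 0, 5, 3],
    [0, 0, 1, 0],   # one detected gene, one count: removed by the filter
    [4, 4, 4, 4],
]
filtered = quality_filter(raw)
normalized = normalize_log1p(filtered)
```

Dimensionality reduction, clustering and trajectory inference would then operate on `normalized`; they are left out here to keep the sketch short.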
Affiliation(s)
- Ernesto Paas-Oliveros
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Guillermo de Anda-Jáuregui
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Investigadores por Mexico, Conahcyt, Mexico City, Mexico
2
Ng GYL, Tan SC, Ong CS. On the use of QDE-SVM for gene feature selection and cell type classification from scRNA-seq data. PLoS One 2023; 18:e0292961. [PMID: 37856458 PMCID: PMC10586655 DOI: 10.1371/journal.pone.0292961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 10/03/2023] [Indexed: 10/21/2023] Open
Abstract
Cell type identification is one of the fundamental tasks in single-cell RNA sequencing (scRNA-seq) studies. It is a key step that facilitates downstream interpretations such as differential expression and trajectory inference. scRNA-seq data contains technical variations that could affect the interpretation of the cell types. Therefore, gene selection, also known as feature selection in data science, plays an important role in selecting informative genes for scRNA-seq cell type identification. Generally speaking, feature selection methods are categorized into filter-, wrapper-, and embedded-based approaches. In the existing literature, methods from the filter- and embedded-based approaches are widely applied in scRNA-seq gene selection tasks. The wrapper-based approach, which gives promising results in other fields, has yet to be extensively utilized for selecting gene features from scRNA-seq data; in addition, most of the existing wrapper methods used in this field are clustering-based rather than classification-based. With a large amount of annotated data available today, this study applied a classification-based approach as an alternative to the clustering-based wrapper method. In our work, a quantum-inspired differential evolution (QDE) wrapped with a classification method was introduced to select a subset of genes from twelve well-known scRNA-seq transcriptomic datasets to identify cell types. In particular, the QDE was combined with different machine-learning (ML) classifiers, namely logistic regression, decision tree, support vector machine (SVM) with linear and radial basis function kernels, as well as extreme learning machine. The linear SVM wrapped with QDE, namely QDE-SVM, was chosen by referring to the feature selection results from the experiment. QDE-SVM showed superior cell type classification performance compared with QDE wrapped around the other ML classifiers as well as recent wrapper methods (i.e., FSCAM, SSD-LAHC, MA-HS, and BSF). QDE-SVM achieved an average accuracy of 0.9559, while the other wrapper methods achieved average accuracies in the range of 0.8292 to 0.8872.
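To make the idea of a classification-based wrapper concrete, the following minimal sketch scores candidate gene subsets by the accuracy of a classifier trained on them. It is not QDE-SVM: exhaustive search over small subsets stands in for the quantum-inspired differential evolution, a nearest-centroid classifier stands in for the linear SVM, and the function names and toy data are hypothetical.

```python
from itertools import combinations

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy using only the given gene (column) indices."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[g] for g in genes] for k, row in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        x = [X[i][g] for g in genes]
        if nearest_centroid_predict(train_X, train_y, x) == y[i]:
            correct += 1
    return correct / len(X)

def wrapper_select(X, y, subset_size):
    """Wrapper step: return the gene subset with the best classifier accuracy.

    Exhaustive search is an assumption made for this toy example; it replaces
    the evolutionary search used in the paper and does not scale to real data.
    """
    candidates = combinations(range(len(X[0])), subset_size)
    best = max(candidates, key=lambda genes: loo_accuracy(X, y, genes))
    return best, loo_accuracy(X, y, best)

# Toy data: 6 cells x 3 genes; only gene 0 separates the two cell types.
X = [
    [5.0, 1.0, 3.0],
    [6.0, 3.0, 1.0],
    [5.5, 2.0, 2.0],
    [0.5, 1.0, 2.0],
    [1.0, 3.0, 3.0],
    [0.0, 2.0, 1.0],
]
y = ["A", "A", "A", "B", "B", "B"]
best_genes, best_acc = wrapper_select(X, y, subset_size=1)
```

The defining property of a wrapper method is visible in `wrapper_select`: the classifier itself is the scoring function for each candidate gene subset, in contrast to filter methods, which rank genes without training a model.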
Affiliation(s)
- Grace Yee Lin Ng
  - Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
- Shing Chiang Tan
  - Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
- Chia Sui Ong
  - Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
3
Lubatti G, Stock M, Iturbide A, Ruiz Tejada Segura ML, Riepl M, Tyser RCV, Danese A, Colomé-Tatché M, Theis FJ, Srinivas S, Torres-Padilla ME, Scialdone A. CIARA: a cluster-independent algorithm for identifying markers of rare cell types from single-cell sequencing data. Development 2023; 150:dev201264. [PMID: 37294170 DOI: 10.1242/dev.201264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 04/25/2023] [Indexed: 05/18/2023]
Abstract
A powerful feature of single-cell genomics is the possibility of identifying cell types from their molecular profiles. In particular, identifying novel rare cell types and their marker genes is a key potential of single-cell RNA sequencing. Standard clustering approaches perform well in identifying relatively abundant cell types, but tend to miss rarer cell types. Here, we have developed CIARA (Cluster Independent Algorithm for the identification of markers of RAre cell types), a cluster-independent computational tool designed to select genes that are likely to be markers of rare cell types. Genes selected by CIARA are subsequently integrated with common clustering algorithms to single out groups of rare cell types. CIARA outperforms existing methods for rare cell type detection, and we use it to find previously uncharacterized rare populations of cells in a human gastrula and among mouse embryonic stem cells treated with retinoic acid. Moreover, CIARA can be applied more generally to any type of single-cell omic data, thus allowing the identification of rare cells across multiple data modalities. We provide implementations of CIARA in user-friendly packages available in R and Python.
Affiliation(s)
- Gabriele Lubatti
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
  - Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Marco Stock
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
  - Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, D-85354 Freising, Germany
- Ane Iturbide
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Mayra L Ruiz Tejada Segura
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
  - Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Melina Riepl
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
  - Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Richard C V Tyser
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
- Anna Danese
  - Biomedical Center Munich (BMC), Physiological Genomics, Faculty of Medicine, Ludwig Maximilians University, D-82152 Munich, Germany
- Maria Colomé-Tatché
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Biomedical Center (BMC), Physiological Chemistry, Faculty of Medicine, Ludwig Maximilians University, D-82152 Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Department of Mathematics, Technical University of Munich, D-85748 Munich, Germany
- Shankar Srinivas
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
- Maria-Elena Torres-Padilla
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
  - Faculty of Biology, Ludwig-Maximilians University, D-82152 Munich, Germany
- Antonio Scialdone
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
  - Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
  - Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
4
Davalos OA, Heydari AA, Fertig EJ, Sindi SS, Hoyer KK. Boosting Single-Cell RNA Sequencing Analysis with Simple Neural Attention. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.29.542760. [PMID: 37398136 PMCID: PMC10312486 DOI: 10.1101/2023.05.29.542760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
A limitation of current deep learning (DL) approaches for single-cell RNA sequencing (scRNAseq) analysis is the lack of interpretability. Moreover, existing pipelines are designed and trained for specific tasks used disjointly for different stages of analysis. We present scANNA, a novel interpretable DL model for scRNAseq studies that leverages neural attention to learn gene associations. After training, the learned gene importance (interpretability) is used to perform downstream analyses (e.g., global marker selection and cell-type classification) without retraining. ScANNA's performance is comparable to or better than state-of-the-art methods designed and trained for specific standard scRNAseq analyses even though scANNA was not trained for these tasks explicitly. ScANNA enables researchers to discover meaningful results without extensive prior knowledge or training separate task-specific models, saving time and enhancing scRNAseq analyses.
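The general mechanism of reading gene importance from attention weights can be sketched as follows. This is not scANNA's architecture: in a trained model the per-gene scores would come from a learned layer, whereas here they are hard-coded placeholders, and the function names and gene names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(expression, scores):
    """Turn per-gene scores into attention weights and reweight the expression."""
    weights = softmax(scores)
    gated = [w * x for w, x in zip(weights, expression)]
    return weights, gated

def top_genes(gene_names, per_cell_weights, k):
    """Rank genes by mean attention weight across cells and keep the top k."""
    n_cells = len(per_cell_weights)
    mean_w = [sum(col) / n_cells for col in zip(*per_cell_weights)]
    ranked = sorted(zip(gene_names, mean_w), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]

genes = ["geneA", "geneB", "geneC"]
expression = [[3.0, 1.0, 0.5], [2.5, 0.8, 0.7]]   # two toy cells
scores = [[2.0, 0.1, -1.0], [1.5, 0.3, -0.5]]     # placeholder attention scores

all_weights = [attend(x, s)[0] for x, s in zip(expression, scores)]
markers = top_genes(genes, all_weights, k=2)
```

Because the weights are computed once per cell and then aggregated, the same ranking can be reused for downstream tasks such as marker selection without retraining, which is the property the abstract emphasizes.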
Affiliation(s)
- Oscar A. Davalos
  - Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
- A. Ali Heydari
  - Department of Applied Mathematics, University of California, Merced, CA, USA
  - Health Sciences Research Institute, University of California, Merced, CA, USA
- Elana J. Fertig
  - Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Suzanne S. Sindi
  - Department of Applied Mathematics, University of California, Merced, CA, USA
  - Health Sciences Research Institute, University of California, Merced, CA, USA
- Katrina K. Hoyer
  - Health Sciences Research Institute, University of California, Merced, CA, USA
  - Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
5
Deng T, Chen S, Zhang Y, Xu Y, Feng D, Wu H, Sun X. A cofunctional grouping-based approach for non-redundant feature gene selection in unannotated single-cell RNA-seq analysis. Brief Bioinform 2023; 24:bbad042. [PMID: 36754847 PMCID: PMC10025445 DOI: 10.1093/bib/bbad042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/05/2022] [Accepted: 01/18/2023] [Indexed: 02/10/2023] Open
Abstract
Feature gene selection has a significant impact on the performance of cell clustering in single-cell RNA sequencing (scRNA-seq) analysis. A well-rounded feature selection (FS) method should consider the relevance, redundancy and complementarity of the features. Yet most existing FS methods focus on gene relevance to the cell types but neglect redundancy and complementarity, which undermines the cell clustering performance. We develop a novel computational method, GeneClust, to select feature genes for scRNA-seq cell clustering. GeneClust groups genes based on their expression profiles, then selects genes with the aim of maximizing relevance, minimizing redundancy and preserving complementarity. It can work as a plug-in tool for FS with any existing cell clustering method. Extensive benchmark results demonstrate that GeneClust significantly improves the clustering performance. Moreover, GeneClust can group cofunctional genes in biological processes and pathways into clusters, thus providing a means of investigating gene interactions and identifying potential genes relevant to biological characteristics of the dataset. GeneClust is freely available at https://github.com/ToryDeng/scGeneClust.
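The group-then-select idea can be illustrated with a small sketch. It is not GeneClust's actual procedure: the greedy correlation-based grouping, the 0.9 threshold, and the use of variance as a proxy for relevance are all assumptions made for the example, and the function names and gene names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def variance(a):
    """Population variance, used here as a simple stand-in for gene relevance."""
    m = sum(a) / len(a)
    return sum((x - m) ** 2 for x in a) / len(a)

def group_genes(profiles, threshold=0.9):
    """Greedily place each gene in the first group whose seed gene it correlates with."""
    groups = []
    for name, prof in profiles.items():
        for group in groups:
            if pearson(prof, profiles[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

def select_representatives(profiles, threshold=0.9):
    """Keep one gene per group (lowest redundancy), choosing the highest-variance gene."""
    return [
        max(group, key=lambda name: variance(profiles[name]))
        for group in group_genes(profiles, threshold)
    ]

# Toy expression profiles across four cells; g1 and g2 are perfectly correlated.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 1, 3, 2],
}
selected = select_representatives(profiles)
```

Selecting one representative per group removes the redundancy between `g1` and `g2` while keeping `g3`, whose profile carries different information.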
Affiliation(s)
- Tao Deng
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, China
- Siyu Chen
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Hubei, China
- Ying Zhang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Hubei, China
- Yuanbin Xu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Hubei, China
- Da Feng
  - School of Pharmacy, Tongji Medical College, Huazhong University of Sciences and Technology, Hubei, China
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, GA, USA
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Xiaobo Sun
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Hubei, China
6
Ascensión AM, Araúzo-Bravo MJ, Izeta A. The need to reassess single-cell RNA sequencing datasets: the importance of biological sample processing. F1000Res 2021; 10:767. [PMID: 35399227 PMCID: PMC8984215 DOI: 10.12688/f1000research.54864.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 03/04/2022] [Indexed: 08/27/2023] Open
Abstract
Background: The advent of single-cell RNA sequencing (scRNAseq) and additional single-cell omics technologies has provided scientists with unprecedented tools to explore biology at cellular resolution. However, reaching an appropriate number of good quality reads per cell and reasonable numbers of cells within each of the populations of interest is key to inferring relevant conclusions about the underlying biology of the dataset. For these reasons, scRNAseq studies are constantly increasing the number of cells analysed and the granularity of the resultant transcriptomics analyses. Methods: We aimed to identify previously described fibroblast subpopulations in healthy adult human skin by using the largest dataset published to date (528,253 sequenced cells) and an unsupervised population-matching algorithm. Results: Our reanalysis of this landmark resource demonstrates that a substantial proportion of cell transcriptomic signatures may be biased by cellular stress and response to hypoxic conditions. Conclusions: We postulate that careful design of experimental conditions is needed to avoid long processing times of biological samples. Additionally, computation of large datasets might undermine the extent of the analysis, possibly due to long processing times.
Affiliation(s)
- Alex M. Ascensión
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain
  - Tissue Engineering Group, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain
- Marcos J. Araúzo-Bravo
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain
  - Computational Biomedicine Data Analysis Platform, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
  - CIBER of Frailty and Healthy Aging (CIBERfes), Madrid, Spain
  - Computational Biology and Bioinformatics Group, Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Ander Izeta
  - Tissue Engineering Group, Biodonostia Health Research Institute, San Sebastian, Gipuzkoa, 20014, Spain
  - Department of Biomedical Engineering and Science, Tecnun-University of Navarra, School of Engineering, San Sebastian, Gipuzkoa, 20009, Spain