1. Sun Y, Pan Z, Wang Z, Wang H, Wei L, Cui F, Zou Q, Zhang Z. Single-cell transcriptome analysis reveals immune microenvironment changes and insights into the transition from DCIS to IDC with associated prognostic genes. J Transl Med 2024; 22:894. [PMID: 39363164 PMCID: PMC11448450 DOI: 10.1186/s12967-024-05706-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 09/25/2024] [Indexed: 10/05/2024]
Abstract
BACKGROUND Ductal carcinoma in situ (DCIS) of the breast is an early stage of breast cancer, and preventing its progression to invasive ductal carcinoma (IDC) is crucial for the early detection and treatment of breast cancer. Although single-cell transcriptome analysis technology has been widely used in breast cancer research, the biological mechanisms underlying the transition from DCIS to IDC remain poorly understood. RESULTS We identified eight cell types through cell annotation, finding significant differences in T cell proportions between DCIS and IDC. Using this as a basis, we performed pseudotime analysis on T cell subpopulations, revealing that differentially expressed genes primarily regulate immune cell migration and modulation. By intersecting WGCNA results of T cells highly correlated with the subtypes and the differentially expressed genes, we identified six key genes: FGFBP2, GNLY, KLRD1, TYROBP, PRF1, and NKG7. Excluding PRF1, the other five genes were significantly associated with overall survival in breast cancer, highlighting their potential as prognostic biomarkers. CONCLUSIONS We identified immune cells that may play a role in the progression from DCIS to IDC and uncovered five key genes that can serve as prognostic markers for breast cancer. These findings provide insights into the mechanisms underlying the transition from DCIS to IDC, offering valuable perspectives for future research. Additionally, our results contribute to a better understanding of the biological processes involved in breast cancer progression.
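The gene-selection step described in this abstract (intersecting WGCNA module genes with differentially expressed genes) amounts to a set intersection. The following is a minimal sketch under stated assumptions: the function name and the placeholder genes `GENE_A`, `GENE_B`, and `GENE_C` are hypothetical, and only the six key gene symbols come from the abstract. It is not the authors' pipeline.

```python
# Minimal sketch: intersect two gene lists to find shared candidate genes.
# GENE_A, GENE_B and GENE_C are hypothetical placeholders, not real results.

def intersect_gene_sets(*gene_sets):
    """Return the sorted list of genes present in every input collection."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)

# Genes from a WGCNA module correlated with the subtype (illustrative only).
wgcna_module_genes = {
    "FGFBP2", "GNLY", "KLRD1", "TYROBP", "PRF1", "NKG7", "GENE_A", "GENE_B",
}
# Differentially expressed genes from pseudotime analysis (illustrative only).
differentially_expressed = {
    "FGFBP2", "GNLY", "KLRD1", "TYROBP", "PRF1", "NKG7", "GENE_C",
}

key_genes = intersect_gene_sets(wgcna_module_genes, differentially_expressed)
print(key_genes)
```

Sorting the result keeps the output deterministic, which makes the candidate list easier to compare across runs.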
MESH Headings
- Humans
- Single-Cell Analysis
- Female
- Tumor Microenvironment/genetics
- Tumor Microenvironment/immunology
- Gene Expression Profiling
- Prognosis
- Carcinoma, Intraductal, Noninfiltrating/genetics
- Carcinoma, Intraductal, Noninfiltrating/immunology
- Carcinoma, Intraductal, Noninfiltrating/pathology
- Breast Neoplasms/genetics
- Breast Neoplasms/immunology
- Breast Neoplasms/pathology
- Gene Expression Regulation, Neoplastic
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/pathology
- Carcinoma, Ductal, Breast/immunology
- Disease Progression
- Transcriptome/genetics
- Single-Cell Gene Expression Analysis
Affiliation(s)
- Yidi Sun
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Zhuoyu Pan
  - International Business School, Hainan University, Haikou, 570228, China
- Ziyi Wang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Haofei Wang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Leyi Wei
  - Centre for Artificial Intelligence driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China
  - School of Informatics, Xiamen University, Xiamen, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
2. Chen M, Zou Q, Qi R, Ding Y. PseU-KeMRF: A Novel Method for Identifying RNA Pseudouridine Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1423-1435. [PMID: 38625768 DOI: 10.1109/tcbb.2024.3389094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2024]
Abstract
Pseudouridine is an abundant RNA modification that is found in many different species and is crucial for a variety of biological functions. Accurately identifying pseudouridine sites within RNA sequences is vital for the subsequent study of the biological mechanisms of pseudouridine. However, traditional experimental methods face certain challenges, so fast and convenient computational methods are needed to accurately identify pseudouridine sites from RNA sequence information. To address this, we introduce a novel pseudouridine site prediction model called PseU-KeMRF, which can identify pseudouridine sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Through comprehensive analysis, we selected four RNA coding schemes: binary feature, position-specific trinucleotide propensity based on single strand (PSTNPss), nucleotide chemical property (NCP), and pseudo k-tuple composition (PseKNC). The support vector machine-recursive feature elimination (SVM-RFE) method was then used for feature selection, and the feature subset was optimized. Finally, the best feature subsets were input into the kernel based on multinomial random forests (KeMRF) classifier for cross-validation and independent testing. Compared with the traditional random forest, KeMRF, as a new classification method, not only improves the node-splitting process of decision tree construction based on the multinomial distribution but also incorporates an easy-to-interpret kernel method for prediction, which improves classification performance. Our results indicate superior predictive performance of PseU-KeMRF over other existing models, suggesting that PseU-KeMRF is a highly competitive predictive model that can successfully identify pseudouridine sites in RNA sequences.
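Two of the coding schemes named in this abstract, the binary feature and the nucleotide chemical property (NCP) encoding, can be illustrated with a short sketch. The column order of the one-hot table and the NCP triples below follow a commonly used convention (ring structure, functional group, hydrogen bonding) and are an assumption here; the abstract does not state the exact encoding used by PseU-KeMRF, and the function name `encode` is hypothetical.

```python
# Minimal sketch of two RNA sequence encodings (assumed conventions).

# One-hot ("binary") encoding; the A, C, G, U column order is an assumption.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# NCP triples: (purine ring, amino group, weak hydrogen bonding), assumed.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}

def encode(sequence, table):
    """Concatenate the per-nucleotide vectors of `table` along the sequence."""
    sequence = sequence.upper().replace("T", "U")  # accept DNA-style input
    features = []
    for base in sequence:
        if base not in table:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(table[base])
    return features

print(encode("AU", NCP))     # 3 values per nucleotide
print(encode("GC", BINARY))  # 4 values per nucleotide
```

Feature vectors produced this way have a fixed length for fixed-length sequence windows, which is what a downstream feature-selection step such as SVM-RFE would operate on.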
3. Yan C, Zhu Y, Chen M, Yang K, Cui F, Zou Q, Zhang Z. Integration tools for scRNA-seq data and spatial transcriptomics sequencing data. Brief Funct Genomics 2024; 23:295-302. [PMID: 38267084 DOI: 10.1093/bfgp/elae002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/26/2023] [Accepted: 01/03/2024] [Indexed: 01/26/2024]
Abstract
Numerous methods have been developed to integrate spatial transcriptomics sequencing data with single-cell RNA sequencing (scRNA-seq) data. Continuous development and improvement of these methods offer multiple options for integrating and analyzing scRNA-seq and spatial transcriptomics data based on diverse research inquiries. However, each method has its own advantages, limitations and scope of application. Researchers need to select the most suitable method for their research purposes based on the actual situation. This review article presents a compilation of 19 integration methods sourced from a wide range of available approaches, serving as a comprehensive reference for researchers to select the suitable integration method for their specific research inquiries. By understanding the principles of these methods, we can identify their similarities and differences, comprehend their applicability and potential complementarity, and lay the foundation for future method development and understanding. This review article presents 19 methods that aim to integrate scRNA-seq data and spatial transcriptomics data. The methods are classified into two main groups and described accordingly. The article also emphasizes the incorporation of High Variance Genes in annotating various technologies, aiming to obtain biologically relevant information aligned with the intended purpose.
Affiliation(s)
- Chaorui Yan
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Yanxu Zhu
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Miao Chen
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Kainan Yang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
4. Derisoud E, Jiang H, Zhao A, Chavatte-Palmer P, Deng Q. Revealing the molecular landscape of human placenta: a systematic review and meta-analysis of single-cell RNA sequencing studies. Hum Reprod Update 2024; 30:410-441. [PMID: 38478759 PMCID: PMC11215163 DOI: 10.1093/humupd/dmae006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/18/2023] [Revised: 02/12/2024] [Indexed: 07/02/2024]
Abstract
BACKGROUND With increasing significance of developmental programming effects associated with placental dysfunction, more investigations are devoted to improving the characterization and understanding of placental signatures in health and disease. The placenta is a transitory but dynamic organ adapting to the shifting demands of fetal development and available resources of the maternal supply throughout pregnancy. Trophoblasts (cytotrophoblasts, syncytiotrophoblasts, and extravillous trophoblasts) are placental-specific cell types responsible for the main placental exchanges and adaptations. Transcriptomic studies with single-cell resolution have led to advances in understanding the placenta's role in health and disease. These studies, however, often show discrepancies in characterization of the different placental cell types. OBJECTIVE AND RATIONALE We aim to review the knowledge regarding placental structure and function gained from the use of single-cell RNA sequencing (scRNAseq), followed by comparing cell-type-specific genes, highlighting their similarities and differences. Moreover, we intend to identify consensus marker genes for the various trophoblast cell types across studies. Finally, we will discuss the contributions and potential applications of scRNAseq in studying pregnancy-related diseases. SEARCH METHODS We conducted a comprehensive systematic literature review to identify different cell types and their functions at the human maternal-fetal interface, focusing on all original scRNAseq studies on placentas published before March 2023 and published reviews (total of 28 studies identified) using PubMed search. Our approach involved curating cell types and subtypes that had previously been defined using scRNAseq and comparing the genes used as markers or identified as potential new markers. 
Next, we reanalyzed expression matrices from the six available scRNAseq raw datasets with cell annotations (four from first trimester and two at term), using Wilcoxon rank-sum tests to compare gene expression among studies and annotate trophoblast cell markers in both first trimester and term placentas. Furthermore, we integrated scRNAseq raw data available from 18 healthy first trimester and nine term placentas, and performed clustering and differential gene expression analysis. We further compared markers obtained with the analysis of annotated and raw datasets with the literature to obtain a common signature gene list for major placental cell types. OUTCOMES Variations in the sampling site, gestational age, fetal sex, and subsequent sequencing and analysis methods were observed between the studies. Although their proportions varied, the three trophoblast types were consistently identified across all scRNAseq studies, unlike other non-trophoblast cell types. Notably, no marker genes were shared by all studies for any of the investigated cell types. Moreover, most of the newly defined markers in one study were not observed in other studies. These discrepancies were confirmed by our analysis on trophoblast cell types, where hundreds of potential marker genes were identified in each study but with little overlap across studies. From 35 461 and 23 378 cells of high quality in the first trimester and term placentas, respectively, we obtained major placental cell types, including perivascular cells that previously had not been identified in the first trimester. Importantly, our meta-analysis provides marker genes for major placental cell types based on our extensive curation. WIDER IMPLICATIONS This review and meta-analysis emphasizes the need for establishing a consensus for annotating placental cell types from scRNAseq data. 
The marker genes identified here can be deployed for defining human placental cell types, thereby facilitating and improving the reproducibility of trophoblast cell annotation.
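The abstract above mentions Wilcoxon rank-sum tests for comparing gene expression between groups of cells. The sketch below shows the core computation using only the standard library: midranks, the U statistic, and a two-sided p-value from the normal approximation. It deliberately omits tie and continuity corrections, and the expression values are hypothetical; it illustrates the test itself and is not the authors' analysis code.

```python
import math

def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (U, z, two-sided p) for sample x versus sample y.

    Uses the normal approximation without tie or continuity correction,
    so p-values are rough for very small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p

# Hypothetical expression values of one gene in two groups of cells.
group_a = [0.0, 0.4, 1.1, 0.2, 0.0]
group_b = [2.3, 1.9, 3.0, 2.7, 1.5]
print(rank_sum_test(group_a, group_b))
```

For real single-cell data, a tested library routine with tie handling and multiple-testing correction across genes would be the more appropriate choice.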
Affiliation(s)
- Emilie Derisoud
  - Department of Physiology and Pharmacology, Karolinska Institutet, Solna, Stockholm, Sweden
- Hong Jiang
  - Department of Physiology and Pharmacology, Karolinska Institutet, Solna, Stockholm, Sweden
- Allan Zhao
  - Department of Physiology and Pharmacology, Karolinska Institutet, Solna, Stockholm, Sweden
- Pascale Chavatte-Palmer
  - INRAE, BREED, Université Paris-Saclay, UVSQ, Jouy-en-Josas, France
  - Ecole Nationale Vétérinaire d’Alfort, BREED, Maisons-Alfort, France
- Qiaolin Deng
  - Department of Physiology and Pharmacology, Karolinska Institutet, Solna, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska University Hospital, Solna, Stockholm, Sweden
5. Sun Y, Kong L, Huang J, Deng H, Bian X, Li X, Cui F, Dou L, Cao C, Zou Q, Zhang Z. A comprehensive survey of dimensionality reduction and clustering methods for single-cell and spatial transcriptomics data. Brief Funct Genomics 2024:elae023. [PMID: 38860675 DOI: 10.1093/bfgp/elae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024]
Abstract
In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.
Affiliation(s)
- Yidi Sun
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lingling Kong
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Jiayi Huang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Hongyan Deng
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xinling Bian
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xingfeng Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, United States
- Chen Cao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 210029, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
6. Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Advances in single-cell technologies now allow epigenomic and transcriptomic profiles to be measured concomitantly at the single-cell level, providing opportunities to explore the underlying biological mechanisms. Although significant effort has been devoted to this problem, integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling, and interpretability of the data. To address these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for integrating single-cell epigenomic profiles (DNA methylation or scATAC-seq) with transcriptomic profiles (scRNA-seq), in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes the heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of the data are obtained under the guidance of cell clustering. Extensive experimental results on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms state-of-the-art algorithms on various measures.
7. Duan H, Zhang Y, Qiu H, Fu X, Liu C, Zang X, Xu A, Wu Z, Li X, Zhang Q, Zhang Z, Cui F. Machine learning-based prediction model for distant metastasis of breast cancer. Comput Biol Med 2024; 169:107943. [PMID: 38211382 DOI: 10.1016/j.compbiomed.2024.107943] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/11/2023] [Revised: 12/10/2023] [Accepted: 01/01/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND Breast cancer is the most prevalent malignancy in women. Advanced breast cancer can develop distant metastases, posing a severe threat to patients' lives. Because the clinical warning signs of distant metastasis appear only in the late stage of the disease, better methods of predicting metastasis are needed. METHODS First, we screened target genes for breast cancer distant metastasis by performing differential analysis and weighted gene co-expression network analysis (WGCNA) on the selected datasets, and performed analyses such as GO enrichment analysis on these target genes. Second, we screened the breast cancer distant metastasis target genes by LASSO regression analysis and performed correlation analysis and other analyses on the resulting biomarkers. Finally, we constructed several breast cancer distant metastasis prediction models based on the Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost) models, and selected the optimal model from them. RESULTS Several 21-gene breast cancer distant metastasis prediction models were constructed, of which the random forest-based model performed best. This model accurately predicted the emergence of distant metastases from breast cancer, with an accuracy of 93.6%, an F1-score of 88.9%, and an AUC of 91.3% on the validation set. CONCLUSION Our findings have the potential to be translated into a point-of-care prognostic analysis to reduce breast cancer mortality.
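The accuracy and F1-score reported in this abstract are both derived from the confusion matrix of a binary classifier. The following sketch shows how the two metrics relate, using hypothetical labels (1 = distant metastasis, 0 = no metastasis); the function names are assumptions, and the numbers it prints have no connection to the results of the paper.

```python
# Minimal sketch: accuracy and F1-score from true and predicted binary labels.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for the given positive class label."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(y_true, y_pred):
        if prediction == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1); F1 is 0.0 when precision + recall is zero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)

# Hypothetical validation labels and predictions.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy_and_f1(y_true, y_pred))
```

Because F1 ignores true negatives, it can differ noticeably from accuracy on imbalanced data, which is one reason for reporting both.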
Affiliation(s)
- Hao Duan
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Yu Zhang
  - Beidahuang Industry Group General Hospital, Harbin, 150001, China
- Haoye Qiu
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Xiuhao Fu
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Chunling Liu
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Xiaofeng Zang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Anqi Xu
  - The First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
- Ziyue Wu
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Xingfeng Li
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Qingchen Zhang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou, 570228, China
8. Jiang J, Pei H, Li J, Li M, Zou Q, Lv Z. FEOpti-ACVP: identification of novel anti-coronavirus peptide sequences based on feature engineering and optimization. Brief Bioinform 2024; 25:bbae037. [PMID: 38366802 PMCID: PMC10939380 DOI: 10.1093/bib/bbae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/27/2023] [Accepted: 01/17/2024] [Indexed: 02/18/2024]
Abstract
Anti-coronavirus peptides (ACVPs) represent a relatively novel approach to inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors have shown promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time-consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. In the next step, several machine learning approaches were tested to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods used for ACVP prediction and has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/.
Affiliation(s)
- Jici Jiang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
9. Ding Y, Zhou H, Zou Q, Yuan L. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel. Methods 2023; 219:73-81. [PMID: 37783242 DOI: 10.1016/j.ymeth.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023]
Abstract
Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. Monitoring drug side effects is an important support for post-marketing drug safety supervision and an important basis for revising drug instructions; its purpose is to detect and control drug safety risks in a timely manner. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning-based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to predict drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
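To give a feel for why a correntropy-based loss is used in place of squared error, the sketch below implements one common form of the correntropy-induced loss, 1 - exp(-e^2 / (2 sigma^2)), and compares it with squared error on hypothetical residuals. The exact objective of CLMF-NTK is not given in the abstract, so the formula, the function names, and the kernel width `sigma` are assumptions for illustration only.

```python
import math

def correntropy_loss(residual, sigma=1.0):
    """Correntropy-induced loss for one residual (assumed common form).

    The loss is close to e^2 / (2 * sigma^2) for small residuals and
    saturates at 1 for large ones, which limits the influence of outliers.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 1.0 - math.exp(-(residual ** 2) / (2.0 * sigma ** 2))

def mean_correntropy_loss(y_true, y_pred, sigma=1.0):
    """Average the correntropy-induced loss over paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(correntropy_loss(t - p, sigma) for t, p in zip(y_true, y_pred))
    return total / len(y_true)

# Hypothetical residuals; the last one plays the role of an outlier.
for residual in (0.1, -0.2, 5.0):
    squared = residual ** 2
    bounded = correntropy_loss(residual, sigma=1.0)
    print(f"residual={residual:+.1f} squared={squared:.3f} correntropy={bounded:.3f}")
```

The outlier contributes 25.0 to a squared-error objective but less than 1.0 to the bounded loss, which is the robustness property that motivates this family of losses.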
Affiliation(s)
- Yijie Ding
  - Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Hongmei Zhou
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Lei Yuan
  - Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100# Minjiang Main Road, Quzhou 324000, China
10. Shi Q, Chen X, Zhang Z. Decoding Human Biology and Disease Using Single-cell Omics Technologies. Genomics, Proteomics & Bioinformatics 2023; 21:926-949. [PMID: 37739168 PMCID: PMC10928380 DOI: 10.1016/j.gpb.2023.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 05/22/2023] [Accepted: 06/08/2023] [Indexed: 09/24/2023]
Abstract
Over the past decade, advances in single-cell omics (SCO) technologies have enabled the investigation of cellular heterogeneity at an unprecedented resolution and scale, opening a new avenue for understanding human biology and disease. In this review, we summarize the development of sequencing-based SCO technologies and computational methods, and focus on the considerable insights that SCO sequencing studies have provided into normal and diseased states, with a particular emphasis on cancer research. We also discuss the technological improvements of SCO and its possible contribution to fundamental research on human biology, as well as its great potential in clinical diagnosis and personalized therapy of human disease.
Affiliation(s)
- Qiang Shi
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Xueyan Chen
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
- Zemin Zhang
  - Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Changping Laboratory, Beijing 102206, China
11. Yu W, Wang C, Shang Z, Tian J. Unveiling novel insights in prostate cancer through single-cell RNA sequencing. Front Oncol 2023; 13:1224913. [PMID: 37746302 PMCID: PMC10514910 DOI: 10.3389/fonc.2023.1224913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 08/15/2023] [Indexed: 09/26/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a cutting-edge technology that provides insights at the individual cell level. In contrast to traditional bulk RNA-seq, which captures gene expression at an average level and may overlook important details, scRNA-seq examines each individual cell as a fundamental unit and is particularly well-suited for identifying rare cell populations. Analogous to a microscope that distinguishes various cell types within a tissue sample, scRNA-seq unravels the heterogeneity and diversity within a single cell species, offering great potential as a leading sequencing method in the future. In the context of prostate cancer (PCa), a disease characterized by significant heterogeneity and multiple stages of progression, scRNA-seq emerges as a powerful tool for uncovering its intricate secrets.
Affiliation(s)
- Zhiqun Shang
  - Tianjin Institute of Urology, Second Hospital of Tianjin Medical University, Tianjin, China
- Jing Tian
  - Tianjin Institute of Urology, Second Hospital of Tianjin Medical University, Tianjin, China
12. Fan R, Ding Y, Zou Q, Yuan L. Multi-view local hyperplane nearest neighbor model based on independence criterion for identifying vesicular transport proteins. Int J Biol Macromol 2023; 247:125774. [PMID: 37437677 DOI: 10.1016/j.ijbiomac.2023.125774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 06/30/2023] [Accepted: 07/07/2023] [Indexed: 07/14/2023]
Abstract
Vesicular transport proteins participate in various biological processes and play a significant role in the movement of substances within cells. These proteins are associated with numerous human diseases, making their identification particularly important. In this study, we developed a novel strategy for accurately identifying vesicular transport proteins. The strategy is built on a novel multi-view classifier called the graph-regularized k-local hyperplane distance nearest neighbor model (HSIC-GHKNN), which combines the Hilbert-Schmidt independence criterion (HSIC)-based multi-view learning method with a local hyperplane distance nearest-neighbor classifier. We first extracted protein evolution information using two feature extraction methods, pseudo-position-specific scoring matrix (PsePSSM) and AATP, and addressed dataset imbalance using the Edited Nearest Neighbors (ENN) algorithm. Subsequently, we employed a local hyperplane distance nearest-neighbor classifier for the identification in each view and added an HSIC term to maintain independence between views. We then assessed the performance of our identification strategy and analyzed the PsePSSM and AATP feature sets to determine the factors influencing the classification results. The experimental results demonstrate that the accuracy and Matthews correlation coefficient of our strategy on the independent test set are 85.8% and 0.548, respectively. Our approach outperformed existing methods in most evaluation metrics. In addition, the proposed multi-view classification model can easily be applied to similar identification tasks.
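The Matthews correlation coefficient (MCC) reported in this abstract is computed from the four confusion-matrix counts and is often preferred over accuracy for imbalanced datasets such as this one. A minimal sketch of the standard formula follows; the function name and the example counts are hypothetical and unrelated to the results of the paper.

```python
import math

def matthews_corrcoef_from_counts(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for
    the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts for an imbalanced test set (few positives).
tp, fp, tn, fn = 30, 20, 140, 10
accuracy = (tp + tn) / (tp + fp + tn + fn)
mcc = matthews_corrcoef_from_counts(tp, fp, tn, fn)
print(f"accuracy={accuracy:.3f} mcc={mcc:.3f}")
```

MCC ranges from -1 to 1 and only approaches 1 when both classes are predicted well, so a high accuracy paired with a moderate MCC is typical when the negative class dominates.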
Affiliation(s)
- Rui Fan
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China.
| | - Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, Quzhou, Zhejiang 324000, China.
|
13
|
Qian Y, Shang T, Guo F, Wang C, Cui Z, Ding Y, Wu H. Identification of DNA-binding protein based multiple kernel model. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:13149-13170. [PMID: 37501482 DOI: 10.3934/mbe.2023586] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
DNA-binding proteins (DBPs) play a critical role in the development of drugs for treating genetic diseases and in DNA biology research. It is therefore essential to predict DNA-binding proteins more accurately and efficiently. In this paper, a Laplacian Local Kernel Alignment-based Restricted Kernel Machine (LapLKA-RKM) is proposed to predict DBPs. In detail, we first extract features from the protein sequence using six methods. Second, the Radial Basis Function (RBF) kernel function is utilized to construct pre-defined kernel metrics. Then, these metrics are combined linearly using weights calculated by LapLKA. Finally, the fused kernel is input to the RKM for training and prediction. Independent tests and leave-one-out cross-validation were used to validate the performance of our method on a small dataset and two large datasets. Importantly, we built an online platform to host our model, which is now freely accessible via http://8.130.69.121:8082/.
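The kernel fusion step described in this abstract (one RBF kernel per feature set, combined linearly by weights) can be sketched as follows. This is a simplified illustration only: the weights here are uniform placeholders, whereas the paper computes them with LapLKA, which is not reproduced. The toy feature values, the `gamma` settings, and the function names `rbf_gram` and `fuse_kernels` are assumptions.

```python
import math

def rbf_gram(samples, gamma):
    """Gram matrix with entries exp(-gamma * squared Euclidean distance)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in samples] for x in samples]

def fuse_kernels(grams, weights):
    """Weighted linear combination of equally sized Gram matrices."""
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]

# Two hypothetical feature views of the same three proteins.
view_1 = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
view_2 = [[0.2], [0.4], [0.9]]
grams = [rbf_gram(view_1, gamma=0.5), rbf_gram(view_2, gamma=2.0)]
weights = [0.5, 0.5]  # placeholder weights; the paper derives them with LapLKA
fused = fuse_kernels(grams, weights)
```

Because every RBF Gram matrix has ones on its diagonal, convex weights keep the fused diagonal at one, and the fused matrix stays symmetric, so it can be passed to any kernel-based classifier.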
Affiliation(s)
- Yuqing Qian
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Tingting Shang
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Chunliang Wang
- The Second Affiliated Hospital of Soochow University, Suzhou, China
| | - Zhiming Cui
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Hongjie Wu
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
|
14
|
Jiao L, Ren Y, Wang L, Gao C, Wang S, Song T. MulCNN: An efficient and accurate deep learning method based on gene embedding for cell type identification in single-cell RNA-seq data. Front Genet 2023; 14:1179859. [PMID: 37082202 PMCID: PMC10110861 DOI: 10.3389/fgene.2023.1179859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023] Open
Abstract
Advancements in single-cell sequencing research have revolutionized our understanding of cellular heterogeneity and functional diversity through the analysis of single-cell transcriptomes and genomes. A crucial step in single-cell RNA sequencing (scRNA-seq) analysis is identifying cell types. However, scRNA-seq data are often high dimensional and sparse, and manual cell type identification can be time-consuming, subjective, and lack reproducibility. Consequently, analyzing scRNA-seq data remains a computational challenge. With the increasing availability of well-annotated scRNA-seq datasets, advanced methods are emerging to aid in cell type identification by leveraging this information. Deep learning neural networks have great potential for analyzing single-cell data. This paper proposes MulCNN, a multi-level convolutional neural network that uses a unique cell type-specific gene expression feature extraction method. This method extracts critical features through multi-scale convolution while filtering noise. Extensive testing using datasets from various species and comparisons with popular classification methods show that MulCNN has outstanding performance and offers a new and scalable direction for scRNA-seq analysis.
Affiliation(s)
- Linfang Jiao
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
| | - Yongqi Ren
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
| | - Lulu Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
| | - Changnan Gao
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
| | - Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
| | - Tao Song
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
- Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Madrid, Spain
- *Correspondence: Tao Song,
|
15
|
Zhang J, Liu X, Huang Z, Wu C, Zhang F, Han A, Stalin A, Lu S, Guo S, Huang J, Liu P, Shi R, Zhai Y, Chen M, Zhou W, Bai M, Wu J. T cell-related prognostic risk model and tumor immune environment modulation in lung adenocarcinoma based on single-cell and bulk RNA sequencing. Comput Biol Med 2023; 152:106460. [PMID: 36565482 DOI: 10.1016/j.compbiomed.2022.106460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 12/06/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND T cells are present in all stages of tumor formation and play an important role in the tumor microenvironment. We aimed to explore the expression profile of T cell marker genes, construct a prognostic risk model based on these genes in lung adenocarcinoma (LUAD), and investigate the link between this risk model and the immunotherapy response. METHODS We obtained single-cell sequencing data of LUAD from the literature and selected 6 tissue biopsy samples, comprising 32,108 cells from patients with non-small cell lung cancer, to identify T cell marker genes in LUAD. Combined with the TCGA database, a prognostic risk model based on T cell marker genes was constructed, and data from the GEO database were used for validation. We also investigated the association between this risk model and immunotherapy response. RESULTS Based on the scRNA-seq data, 1839 T cell marker genes were identified, after which a prognostic risk model consisting of 9 gene signatures was constructed in combination with the TCGA dataset. This risk model divided patients into high-risk and low-risk groups based on overall survival. The multivariate analysis demonstrated that the risk model was an independent prognostic factor. Analysis of immune profiles showed that high-risk groups presented distinctive immune cell infiltration and immunosuppressive states. Risk scores of the model were closely correlated with linoleic acid metabolism, the intestinal immune network for IgA production, and drug metabolism cytochrome P450. CONCLUSION Our study proposed a novel prognostic risk model based on T cell marker genes for LUAD patients. The prognostic risk model may accurately predict the survival and treatment outcomes of LUAD patients, and the high-risk population it identifies presents distinct immune cell infiltration and an immunosuppressive state.
Affiliation(s)
- Jingyuan Zhang
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Xinkui Liu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Zhihong Huang
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Chao Wu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Fanqin Zhang
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Aiqing Han
- School of Management, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Antony Stalin
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Shan Lu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Siyu Guo
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Jiaqi Huang
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Pengyun Liu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Rui Shi
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Yiyan Zhai
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Meilin Chen
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Wei Zhou
- Pharmacy Department, China-Japan Friendship Hospital, Beijing, 100029, China.
| | - Meirong Bai
- Key Laboratory of Mongolian Medicine Research and Development Engineering, Ministry of Education, Tongliao, 028000, China.
| | - Jiarui Wu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, 100029, China.
|
16
|
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. RESEARCH (WASHINGTON, D.C.) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification method and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
17
|
Zeng L, Yang K, Zhang T, Zhu X, Hao W, Chen H, Ge J. Research progress of single-cell transcriptome sequencing in autoimmune diseases and autoinflammatory disease: A review. J Autoimmun 2022; 133:102919. [PMID: 36242821 DOI: 10.1016/j.jaut.2022.102919] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/13/2022] [Revised: 09/16/2022] [Accepted: 09/19/2022] [Indexed: 12/07/2022]
Abstract
Autoimmunity refers to the phenomenon that the body's immune system produces antibodies or sensitized lymphocytes to its own tissues to cause an immune response. Immune disorders caused by autoimmunity can mediate autoimmune diseases. Autoimmune diseases have complicated pathogenesis due to the many types of cells involved, and the mechanism is still unclear. The emergence of single-cell research technology can solve the problem that ordinary transcriptome technology cannot be accurate to cell type. It provides unbiased results through independent analysis of cells in tissues and provides more mRNA information for identifying cell subpopulations, which provides a novel approach to study disruption of immune tolerance and disturbance of pro-inflammatory pathways on a cellular basis. It may fundamentally change the understanding of molecular pathways in the pathogenesis of autoimmune diseases and develop targeted drugs. Single-cell transcriptome sequencing (scRNA-seq) has been widely applied in autoimmune diseases, which provides a powerful tool for demonstrating the cellular heterogeneity of tissues involved in various immune inflammations, identifying pathogenic cell populations, and revealing the mechanism of disease occurrence and development. This review describes the principles of scRNA-seq, introduces common sequencing platforms and practical procedures, and focuses on the progress of scRNA-seq in 41 autoimmune diseases, which include 9 systemic autoimmune diseases and autoinflammatory diseases (rheumatoid arthritis, systemic lupus erythematosus, etc.) and 32 organ-specific autoimmune diseases (5 Skin diseases, 3 Nervous system diseases, 4 Eye diseases, 2 Respiratory system diseases, 2 Circulatory system diseases, 6 Liver, Gallbladder and Pancreas diseases, 2 Gastrointestinal system diseases, 3 Muscle, Bones and joint diseases, 3 Urinary system diseases, 2 Reproductive system diseases). 
This review also discusses prospective molecular targets in autoimmune diseases, drawing on multi-molecular-level and multi-dimensional analysis combined with single-cell multi-omics sequencing technologies (such as scRNA-seq, single-cell ATAC-seq, and single-cell immune repertoire sequencing), which provides a reference for further exploring the pathogenesis of, and screening markers for, autoimmune diseases and autoinflammatory diseases in the future.
Affiliation(s)
- Liuting Zeng
- Department of Rheumatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, National Clinical Research Center for Dermatologic and Immunologic Diseases, State Key Laboratory of Complex Severe and Rare Diseases, Beijing, China.
| | - Kailin Yang
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China.
| | - Tianqing Zhang
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China
| | - Xiaofei Zhu
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China.
| | - Wensa Hao
- Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Hua Chen
- Department of Rheumatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, National Clinical Research Center for Dermatologic and Immunologic Diseases, State Key Laboratory of Complex Severe and Rare Diseases, Beijing, China.
| | - Jinwen Ge
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China; Hunan Academy of Chinese Medicine, Changsha, China.
|
18
|
Zhao S, Zhang L, Liu X. AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction. FRONTIERS OF COMPUTER SCIENCE 2022; 17:173902. [PMID: 36320820 PMCID: PMC9607720 DOI: 10.1007/s11704-022-2011-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/07/2022] [Accepted: 04/15/2022] [Indexed: 06/16/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology has become an effective tool for high-throughput transcriptomic study. It circumvents the averaging artifacts of bulk RNA-seq technology, yielding new perspectives on the cellular diversity of superficially homogeneous populations. Although various sequencing techniques have reduced the amplification bias caused by the low amount of starting material and improved capture efficiency, technical noise and biological variation are inevitably introduced into the experimental process, resulting in frequent dropout events, which greatly hinder downstream analysis. Considering the bimodal expression pattern and the right-skewed characteristic present in normalized scRNA-seq data, we propose a customized autoencoder based on a two-part-generalized-gamma distribution (AE-TPGG) for scRNA-seq data analysis, which takes the mixed discrete-continuous random variables of scRNA-seq data into account using a two-part model and utilizes the generalized gamma (GG) distribution to fit the positive and right-skewed continuous data. The adopted autoencoder enables AE-TPGG to capture the inherent relationships between genes. In addition to achieving a low-dimensional representation, the AE-TPGG model also provides a denoised imputation according to the statistical characteristics of gene expression. Results on real datasets demonstrate that our proposed model is competitive with current imputation methods and improves a diverse set of typical scRNA-seq data analyses. ELECTRONIC SUPPLEMENTARY MATERIAL Supplementary material is available in the online version of this article at 10.1007/s11704-022-2011-y.
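To make the two-part idea in this abstract concrete, the sketch below evaluates the log-likelihood of a mixed discrete-continuous variable: a point mass at zero plus a generalized gamma density for positive values. It uses the Stacy parameterization, f(x; a, d, p) = (p / a^d) x^(d-1) exp(-(x/a)^p) / Gamma(d/p), which is an assumption and may differ from the parameterization in the paper. The function names and toy values are hypothetical, and no autoencoder is included.

```python
import math

def gg_log_pdf(x, a, d, p):
    """Log density of the generalized gamma distribution (Stacy form), for x > 0."""
    return (math.log(p) - d * math.log(a) + (d - 1) * math.log(x)
            - (x / a) ** p - math.lgamma(d / p))

def two_part_log_likelihood(values, zero_prob, a, d, p):
    """Zeros contribute log(zero_prob); positives contribute log(1 - zero_prob) + GG log density."""
    total = 0.0
    for x in values:
        if x == 0:
            total += math.log(zero_prob)
        else:
            total += math.log(1.0 - zero_prob) + gg_log_pdf(x, a, d, p)
    return total

# Hypothetical normalized expression values for one gene across five cells.
expression = [0.0, 0.0, 1.2, 0.7, 2.5]
log_lik = two_part_log_likelihood(expression, zero_prob=0.4, a=1.0, d=2.0, p=1.5)
```

With d = p = 1 the generalized gamma reduces to an exponential distribution with scale a, which gives a quick sanity check on the density.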
Affiliation(s)
- Shuchang Zhao
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106 China
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023 China
| | - Li Zhang
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106 China
- College of Computer Science and Technology, Nanjing Forestry University, Nanjing, 210037 China
| | - Xuejun Liu
- MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106 China
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023 China
|
19
|
Wang R, Peng G, Tam PPL, Jing N. Integration of computational analysis and spatial transcriptomics in single-cell study. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00084-5. [PMID: 35901961 PMCID: PMC10372908 DOI: 10.1016/j.gpb.2022.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/04/2020] [Revised: 06/08/2022] [Accepted: 06/19/2022] [Indexed: 04/08/2023]
Abstract
Recent advances of single-cell transcriptomics technologies and allied computational methodologies have revolutionized molecular cell biology. Meanwhile, pioneering explorations in spatial transcriptomics have opened avenues to address fundamental biological questions in health and diseases. Here, we review the technical attributes of single-cell RNA sequencing and spatial transcriptomics, and the core concepts of computational data analysis. We further highlight the challenges in the application of data integration methodologies and the interpretation of the biological context of the findings.
Affiliation(s)
- Ran Wang
- State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
| | - Guangdun Peng
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
| | - Patrick P L Tam
- Embryology Research Unit, Children's Medical Research Institute, University of Sydney, Sydney, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2145, Australia
| | - Naihe Jing
- State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China; Guangzhou Laboratory, Guangzhou 510005, China; CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China.
|
20
|
Liu G, Li M, Wang H, Lin S, Xu J, Li R, Tang M, Li C. D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data. Front Genet 2022; 13:912711. [PMID: 35846121 PMCID: PMC9284269 DOI: 10.3389/fgene.2022.912711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 04/25/2022] [Indexed: 12/02/2022] Open
Abstract
Single-cell sequencing data sets have always been challenging to cluster because of their high dimensionality and numerous noise points. The traditional K-means algorithm is not suitable for this type of data. Therefore, this study proposes a Dissimilarity-Density-Dynamic Radius K-means clustering algorithm. The algorithm adds a dynamic radius parameter to the calculation and flexibly adjusts the active radius according to the data characteristics, which can eliminate the influence of noise points and optimize the clustering results. At the same time, the algorithm calculates weights from the dissimilarity density of the data set, the average contrast of candidate clusters, and the dissimilarity of candidate clusters. It thereby obtains a set of high-quality initial center points, which removes the randomness of the K-means algorithm in selecting center points. Finally, compared with similar algorithms, this algorithm shows a better clustering effect on single-cell data. Each clustering index is higher than those of other single-cell clustering algorithms, overcoming the shortcomings of the traditional K-means algorithm.
Affiliation(s)
- Guoyun Liu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Manzhi Li
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
- *Correspondence: Manzhi Li,
| | - Hongtao Wang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Shijun Lin
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Junlin Xu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Ruixi Li
- Geneis Beijing Co., Ltd., Beijing, China
| | - Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, China
| | - Chun Li
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
|
21
|
Zhang Z, Cui F, Su W, Dou L, Xu A, Cao C, Zou Q. webSCST: an interactive web application for single-cell RNA-sequencing data and spatial transcriptomic data integration. Bioinformatics 2022; 38:3488-3489. [PMID: 35604082 DOI: 10.1093/bioinformatics/btac350] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/27/2022] [Revised: 04/26/2022] [Accepted: 05/18/2022] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Integrative analysis of single-cell RNA-sequencing (scRNA-seq) data with spatial data for the same species and organ would provide each cell sample with a predictive spatial location, which would facilitate biological study. However, publicly available spatial sequencing datasets for specific species and organs are rare and are often displayed in different formats. In this study, we introduce a new web-based scRNA-seq analysis tool, webSCST, that integrates well-organized spatial transcriptome sequencing datasets categorized by species and organs, provides a user-friendly interface for raw single-cell processing with popular integration methods, and allows users to submit their raw scRNA-seq data once to obtain predicted spatial locations for each cell type. AVAILABILITY AND IMPLEMENTATION webSCST, implemented in Shiny with all major browsers supported, is available at http://www.webscst.com. webSCST is also freely available as an R package at https://github.com/swsoyee/webSCST.
Affiliation(s)
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou, 570228, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Wei Su
- Yahoo Japan Corporation, Tokyo, 102-8282, Japan
| | - Lijun Dou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Anqi Xu
- The First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250014, China
| | - Chen Cao
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
|
22
|
Jia Q, Chu H, Jin Z, Long H, Zhu B. High-throughput single-cell sequencing in cancer research. Signal Transduct Target Ther 2022; 7:145. [PMID: 35504878 PMCID: PMC9065032 DOI: 10.1038/s41392-022-00990-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 01/12/2022] [Revised: 03/23/2022] [Accepted: 04/08/2022] [Indexed: 12/22/2022] Open
Abstract
With advances in sequencing and instrument technology, bioinformatics analysis is being applied to batches of massive cells at single-cell resolution. High-throughput single-cell sequencing can be utilized for multi-omics characterization of tumor cells, stromal cells or infiltrated immune cells to evaluate tumor progression, responses to environmental perturbations, heterogeneous composition of the tumor microenvironment, and complex intercellular interactions between these factors. Particularly, single-cell sequencing of T cell receptors, alone or in combination with single-cell RNA sequencing, is useful in the fields of tumor immunology and immunotherapy. Clinical insights obtained from single-cell analysis are critically important for exploring the biomarkers of disease progression or antitumor treatment, as well as for guiding precise clinical decision-making for patients with malignant tumors. In this review, we summarize the clinical applications of single-cell sequencing in the fields of tumor cell evolution, tumor immunology, and tumor immunotherapy. Additionally, we analyze the tumor cell response to antitumor treatment, heterogeneity of the tumor microenvironment, and response or resistance to immune checkpoint immunotherapy. The limitations of single-cell analysis in cancer research are also discussed.
Affiliation(s)
- Qingzhu Jia
- Institute of Cancer, Xinqiao Hospital, Army Medical University, Chongqing, 400037, China; Chongqing Key Laboratory of Immunotherapy, Chongqing, 400037, China
| | - Han Chu
- Institute of Cancer, Xinqiao Hospital, Army Medical University, Chongqing, 400037, China; Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resources and Eco-Environment, College of Life Sciences, Sichuan University, Chengdu, 610064, China
| | - Zheng Jin
- Research Institute, GloriousMed Clinical Laboratory Co., Ltd, Shanghai, 201318, China
| | - Haixia Long
- Institute of Cancer, Xinqiao Hospital, Army Medical University, Chongqing, 400037, China; Chongqing Key Laboratory of Immunotherapy, Chongqing, 400037, China
| | - Bo Zhu
- Institute of Cancer, Xinqiao Hospital, Army Medical University, Chongqing, 400037, China; Chongqing Key Laboratory of Immunotherapy, Chongqing, 400037, China
|
23
|
Gan S, Deng H, Qiu Y, Alshahrani M, Liu S. DSAE-Impute: Learning Discriminative Stacked Autoencoders for Imputing Single-cell RNA-seq Data. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220330151024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aims:
In this research, we aim to propose an accurate deep learning method for imputing the missing values in scRNA-seq data. DSAE-Impute employs stacked autoencoders to capture gene expression characteristics in the original data with missing values and, during training, incorporates a discriminative correlation matrix between cells to capture global expression features, so as to accurately predict the missing values.
Background:
Because a single cell contains only a limited amount of mRNA, scRNA-seq data always contain many missing values, which makes it impossible to accurately quantify gene expression in single cells. This dropout phenomenon prevents the detection of genes that are truly expressed in some cells, which greatly affects downstream analysis of scRNA-seq data, such as cell clustering and inference of cell developmental trajectories.
Method:
We propose a novel deep learning model, named DSAE-Impute, that uses discriminative stacked autoencoders to impute the missing values in scRNA-seq data. DSAE-Impute embeds discriminative cell similarity to refine the feature representation of the stacked autoencoders, and learns the expression patterns of scRNA-seq data through layer-by-layer training to achieve accurate imputation.
Result:
We systematically evaluated the performance of DSAE-Impute on simulated and real datasets. The experimental results demonstrate that DSAE-Impute significantly improves downstream analysis, and that its imputation results are more accurate than those of other state-of-the-art imputation methods.
Conclusion:
Extensive experiments show that compared with other state-of-the-art methods, the imputation results of DSAE-Impute on simulated and real datasets are more accurate and helpful for downstream analysis.
Affiliation(s)
- Shengfeng Gan
- College of Computer, Hubei University of Education, Wuhan, China
| | - Huan Deng
- College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Yang Qiu
- College of Informatics, Huazhong Agricultural University, Wuhan, China
| | | | - Shichao Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, China
| |
|
24
|
Lall S, Ray S, Bandyopadhyay S. A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data. PLoS Comput Biol 2022; 18:e1009600. [PMID: 35271564 PMCID: PMC8979455 DOI: 10.1371/journal.pcbi.1009600] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/30/2021] [Revised: 04/04/2022] [Accepted: 01/27/2022] [Indexed: 11/18/2022] Open
Abstract
Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Various issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as the small amount of starting RNA, the limited number of sequenced reads per cell, cell-to-cell variability due to the cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data are susceptible to technical noise, which affects the quality of the genes (or features) selected or extracted prior to clustering. Here we introduce sc-CGconv (copula based graph convolution network for single clustering), a stepwise, robust, unsupervised feature extraction and clustering approach that formulates and aggregates cell-cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv builds a cell-cell graph using Ccor, and this graph is learned by a graph-based artificial intelligence model, the graph convolution network. The learned representation (a low-dimensional embedding) is then used for cell clustering. sc-CGconv has the following advantages: (a) it works with substantially smaller sample sizes to identify homogeneous clusters; (b) it can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering; (c) it preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through the copula correlation measure; and (d) it provides a topology-preserving embedding of cells in low-dimensional space.
One of the important aspects of single-cell downstream analysis is to classify cells into subpopulations. This immediately leads to the clustering of cells into homogeneous groups, which faces many issues due to (i) the small amount of starting RNA, (ii) cell-to-cell variability, (iii) technical noise introduced by the single-cell sequencing technology, and (iv) the unavailability of discriminating selected/extracted genes (features) in the preprocessing step of downstream analysis. We propose sc-CGconv, a stepwise feature extraction and clustering framework that leverages the advantages of copulas and graph convolution networks in the single-cell analysis domain. sc-CGconv outperforms state-of-the-art feature selection/extraction methods in the preprocessing steps, performs well on data with small sample sizes, preserves the cell-to-cell variability within the extracted features, and provides a topology-preserving embedding of cells in low-dimensional space. sc-CGconv therefore successfully addresses the above-mentioned key challenges.
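The pipeline described in this abstract (cell-cell similarity, then a cell-cell graph, then graph convolution to obtain an embedding) can be illustrated with a minimal NumPy sketch. This is not the sc-CGconv implementation: the abstract does not specify how Ccor is computed, so Spearman rank correlation is used here as a stand-in, the k-nearest-neighbour graph construction is an assumption, and the function names `rank_correlation`, `knn_graph`, and `gcn_layer` are hypothetical. The propagation rule is the standard symmetric-normalized graph convolution step.

```python
import numpy as np

def rank_correlation(expr):
    """Spearman correlation between cells (rows of a cells x genes matrix).

    Stand-in for the copula correlation (Ccor) named in the abstract.
    """
    # Double argsort turns values into ranks; ties are broken arbitrarily.
    ranks = expr.argsort(axis=1).argsort(axis=1).astype(float)
    return np.corrcoef(ranks)

def knn_graph(similarity, k):
    """Symmetric k-nearest-neighbour adjacency from a cell-cell similarity matrix."""
    n = similarity.shape[0]
    sim = similarity.copy()
    np.fill_diagonal(sim, -np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either cell chose the other

def gcn_layer(adj, features, weights):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weights, 0.0)
```

In a full method the weight matrix would be trained and the resulting embedding passed to a clustering algorithm; here `weights` is simply supplied by the caller, for example `gcn_layer(knn_graph(rank_correlation(expr), k=10), expr, weights)` with `weights` of shape genes x embedding dimensions.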
Affiliation(s)
- Snehalika Lall
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
| | - Sumanta Ray
- Department of Computer Science and Engineering, Aliah University, Kolkata, India
- Health Analytics Network, Pittsburgh, Pennsylvania, United States of America
| | | |
|
25
|
Leote AC, Wu X, Beyer A. Regulatory network-based imputation of dropouts in single-cell RNA sequencing data. PLoS Comput Biol 2022; 18:e1009849. [PMID: 35176023 PMCID: PMC8890719 DOI: 10.1371/journal.pcbi.1009849] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2021] [Revised: 03/02/2022] [Accepted: 01/18/2022] [Indexed: 01/07/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Further, it is unknown if all genes equally benefit from imputation or which imputation method works best for a given gene. Here, we show that a transcriptional regulatory network learned from external, independent gene expression data improves dropout imputation. Using a variety of human scRNA-seq datasets we demonstrate that our network-based approach outperforms published state-of-the-art methods. The network-based approach performs particularly well for lowly expressed genes, including cell-type-specific transcriptional regulators. Further, the cell-to-cell variation of 11.3% to 48.8% of the genes could not be adequately imputed by any of the methods that we tested. In those cases gene expression levels were best predicted by the mean expression across all cells, i.e. assuming no measurable expression variation between cells. These findings suggest that different imputation methods are optimal for different genes. We thus implemented an R-package called ADImpute (available via Bioconductor https://bioconductor.org/packages/release/bioc/html/ADImpute.html) that automatically determines the best imputation method for each gene in a dataset. Our work represents a paradigm shift by demonstrating that there is no single best imputation method. Instead, we propose that imputation should maximally exploit external information and be adapted to gene-specific features, such as expression level and expression variation across cells.
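The idea of choosing an imputation method separately for each gene can be sketched as follows. This is an illustration under stated assumptions, not ADImpute's code or API: a fraction of the observed entries is hidden, each candidate method predicts the hidden values, and each gene is assigned the method with the lowest held-out squared error. All function names are hypothetical, the "network" predictor is a plain linear map, and its `weights` matrix is assumed to have been learned from external data.

```python
import numpy as np

def hold_out(expr, frac, rng):
    """Hide a random fraction of the observed (nonzero) entries of a cells x genes matrix."""
    observed = np.argwhere(expr > 0)
    n_hide = int(frac * len(observed))
    picked = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    mask = np.zeros(expr.shape, dtype=bool)
    mask[picked[:, 0], picked[:, 1]] = True
    return np.where(mask, 0.0, expr), mask

def predict_gene_mean(masked, mask):
    """Baseline: each gene's mean over the cells that were not hidden."""
    visible = ~mask
    means = (masked * visible).sum(axis=0) / np.maximum(visible.sum(axis=0), 1)
    return np.tile(means, (masked.shape[0], 1))

def predict_from_network(masked, weights):
    """Linear network-style prediction; weights[i, j] is the assumed effect of gene i on gene j."""
    return masked @ weights

def choose_method_per_gene(expr, mask, predictions):
    """Return, for each gene, the name of the method with the lowest held-out squared error."""
    names = list(predictions)
    n_hidden = np.maximum(mask.sum(axis=0), 1)
    errors = np.stack([
        (((predictions[name] - expr) ** 2) * mask).sum(axis=0) / n_hidden
        for name in names
    ])
    return [names[i] for i in errors.argmin(axis=0)]
```

A typical call would hide about 10% of the entries with `hold_out(expr, 0.1, np.random.default_rng(0))`, collect the predictions in a dictionary such as `{"mean": ..., "network": ...}`, and pass it to `choose_method_per_gene`. Genes for which the mean wins correspond to the case described in the abstract, where no method captures the cell-to-cell variation.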
Affiliation(s)
- Ana Carolina Leote
- Cluster of Excellence Cellular Stress Responses in Aging-associated Diseases (CECAD), Cologne, Germany
- University of Cologne, Faculty of Medicine and Cologne University Hospital, Cologne, Germany
| | - Xiaohui Wu
- Cluster of Excellence Cellular Stress Responses in Aging-associated Diseases (CECAD), Cologne, Germany
- Department of Automation, Xiamen University, Xiamen, China
- Pasteurien College, Soochow University, Suzhou, China
| | - Andreas Beyer
- Cluster of Excellence Cellular Stress Responses in Aging-associated Diseases (CECAD), Cologne, Germany
- University of Cologne, Faculty of Medicine and Cologne University Hospital, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), Cologne, Germany
- Cologne School for Computational Biology & Center for Data Science and Simulation, University of Cologne, Cologne, Germany
| |
|
26
|
Cui F, Zhang Z, Cao C, Zou Q, Chen D, Su X. Protein-DNA/RNA interactions: Machine intelligence tools and approaches in the era of artificial intelligence and big data. Proteomics 2022; 22:e2100197. [PMID: 35112474 DOI: 10.1002/pmic.202100197] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/30/2021] [Revised: 01/02/2022] [Accepted: 01/17/2022] [Indexed: 11/09/2022]
Abstract
With the development of artificial intelligence technologies and the availability of large amounts of biological data, computational methods for proteomics have progressed from traditional machine learning to deep learning. This review focuses on computational approaches and tools for predicting protein-DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development of these computational methods and summarize their advantages and shortcomings. We further compile applications in tasks related to protein-DNA/RNA interactions and point out possible future application trends. Moreover, the strategies used by different types of computational methods to represent biological sequences numerically are also summarized and discussed.
Affiliation(s)
- Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, 324000, China
| | - Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
| |
|
27
|
Cell Heterogeneity Analysis in Single-Cell RNA-seq Data Using Mixture Exponential Graph and Markov Random Field Model. BioMed Research International 2021; 2021:9919080. [PMID: 34095314 PMCID: PMC8164540 DOI: 10.1155/2021/9919080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2021] [Accepted: 04/30/2021] [Indexed: 11/18/2022]
Abstract
Advanced single-cell profiling technologies promote the exploration of cell heterogeneity, and clustering of single-cell RNA sequencing (scRNA-seq) data enables the discovery of coexpressed genes and network relationships between genes. In particular, single-cell profiling of circulating tumor cells (CTCs) can provide unique insights into tumor heterogeneity, including in triple-negative breast cancer (TNBC), while scRNA-seq leads to a better understanding of subclonal architecture and biological function. Despite numerous reports suggesting a direct correlation between CTCs and poor clinical outcomes, few studies have provided a thorough characterization of CTC heterogeneity. In addition, TNBC is a disease with both intertumor and intratumor heterogeneity, and it comprises various biologically distinct subgroups whose relationships with immune functions are not yet clearly established. In this article, we introduce a new scheme for characterizing single-cell heterogeneity and apply it to CTC and TNBC single-cell RNA-seq data. First, we use an existing mixture exponential family graph model to partition the cell-cell network; then, with the Markov random field model, we obtain more flexible network rewiring. Finally, we identify cell heterogeneity and network relationships according to the highly coexpressed gene modules found in different cell subsets. Our results demonstrate that this scheme provides a reasonable and effective way to model different cell clusters and different biologically enriched gene clusters. Thus, using the coexpressed genes specific to each cell cluster, we can infer differences in tumor composition and diversity.
|
28
|
Zhang Z, Cui F, Lin C, Zhao L, Wang C, Zou Q. Critical downstream analysis steps for single-cell RNA sequencing data. Brief Bioinform 2021; 22:6210064. [PMID: 33822873 DOI: 10.1093/bib/bbab105] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/05/2021] [Revised: 02/20/2021] [Accepted: 03/09/2021] [Indexed: 12/13/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled us to study biological questions at the single-cell level. Currently, many analysis tools are available to better utilize these relatively noisy data. In this review, we summarize the most widely used methods for critical downstream analysis steps (i.e. clustering, trajectory inference, cell-type annotation and integrating datasets). The advantages and limitations are comprehensively discussed, and we provide suggestions for choosing proper methods in different situations. We hope this paper will be useful for scRNA-seq data analysts and bioinformatics tool developers.
Affiliation(s)
- Zilong Zhang
- University of Electronic Science and Technology of China
| | | | | | | | | | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
| |
|