1
|
Xia Y, Liu Y, Li T, He S, Chang H, Wang Y, Zhang Y, Ge W. Assessing parameter efficient methods for pre-trained language model in annotating scRNA-seq data. Methods 2024; 228:12-21. [PMID: 38759908 DOI: 10.1016/j.ymeth.2024.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Revised: 04/28/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024] Open
Abstract
Annotating cell types of single-cell RNA sequencing (scRNA-seq) data is crucial for studying cellular heterogeneity in the tumor microenvironment. Recently, large-scale pre-trained language models (PLMs) have achieved significant progress in cell-type annotation of scRNA-seq data. This approach effectively addresses previous methods' shortcomings in performance and generalization. However, fine-tuning PLMs for different downstream tasks demands considerable computational resources, rendering it impractical. Hence, a new research branch introduces parameter-efficient fine-tuning (PEFT). This involves optimizing a few parameters while leaving the majority unchanged, leading to substantial reductions in computational expenses. Here, we utilize scBERT, a large-scale pre-trained model, to explore the capabilities of three PEFT methods in scRNA-seq cell type annotation. Extensive benchmark studies across several datasets demonstrate the superior applicability of PEFT methods. Furthermore, downstream analysis using models obtained through PEFT showcases their utility in novel cell type discovery and model interpretability for potential marker genes. Our findings underscore the considerable potential of PEFT in PLM-based cell type annotation, presenting novel perspectives for the analysis of scRNA-seq data.
Collapse
Affiliation(s)
- Yucheng Xia
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, 610209, China
| | - Yuhang Liu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Tianhao Li
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Sihan He
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Hong Chang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yaqing Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Wenyi Ge
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China.
| |
Collapse
|
2
|
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has instilled a significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes' as the reasoning behind predictions is often unknown and not transparent to the user. This has stimulated an increasing body of research for addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of the recent interpretable deep learning models applied to various single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Collapse
Affiliation(s)
- Manoj M Wagle
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Siqu Long
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Carissa Chen
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Chunlei Liu
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Pengyi Yang
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| |
Collapse
|
3
|
Zhang W, Yu R, Xu Z, Li J, Gao W, Jiang M, Dai Q. scCompressSA: dual-channel self-attention based deep autoencoder model for single-cell clustering by compressing gene-gene interactions. BMC Genomics 2024; 25:423. [PMID: 38684946 PMCID: PMC11059774 DOI: 10.1186/s12864-024-10286-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND Single-cell clustering has played an important role in exploring the molecular mechanisms about cell differentiation and human diseases. Due to highly-stochastic transcriptomics data, accurate detection of cell types is still challenged, especially for RNA-sequencing data from human beings. In this case, deep neural networks have been increasingly employed to mine cell type specific patterns and have outperformed statistic approaches in cell clustering. RESULTS Using cross-correlation to capture gene-gene interactions, this study proposes the scCompressSA method to integrate topological patterns from scRNA-seq data, with support of self-attention (SA) based coefficient compression (CC) block. This SA-based CC block is able to extract and employ static gene-gene interactions from scRNA-seq data. This proposed scCompressSA method has enhanced clustering accuracy in multiple benchmark scRNA-seq datasets by integrating topological and temporal features. CONCLUSION Static gene-gene interactions have been extracted as temporal features to boost clustering performance in single-cell clustering For the scCompressSA method, dual-channel SA based CC block is able to integrate topological features and has exhibited extraordinary detection accuracy compared with previous clustering approaches that only employ temporal patterns.
Collapse
Affiliation(s)
- Wei Zhang
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Ruochen Yu
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Zeqi Xu
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Junnan Li
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Wenhao Gao
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Mingfeng Jiang
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China.
| | - Qi Dai
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China.
| |
Collapse
|
4
|
Wang H, Lin K, Zhang Q, Shi J, Song X, Wu J, Zhao C, He K. HyperTMO: a trusted multi-omics integration framework based on hypergraph convolutional network for patient classification. Bioinformatics 2024; 40:btae159. [PMID: 38530977 PMCID: PMC11212491 DOI: 10.1093/bioinformatics/btae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Revised: 02/02/2024] [Accepted: 03/24/2024] [Indexed: 03/28/2024] Open
Abstract
MOTIVATION The rapid development of high-throughput biomedical technologies can provide researchers with detailed multi-omics data. The multi-omics integrated analysis approach based on machine learning contributes a more comprehensive perspective to human disease research. However, there are still significant challenges in representing single-omics data and integrating multi-omics information. RESULTS This article presents HyperTMO, a Trusted Multi-Omics integration framework based on Hypergraph convolutional network for patient classification. HyperTMO constructs hypergraph structures to represent the association between samples in single-omics data, then evidence extraction is performed by hypergraph convolutional network, and multi-omics information is integrated at an evidence level. Last, we experimentally demonstrate that HyperTMO outperforms other state-of-the-art methods in breast cancer subtype classification and Alzheimer's disease classification tasks using multi-omics data from TCGA (BRCA) and ROSMAP datasets. Importantly, HyperTMO is the first attempt to integrate hypergraph structure, evidence theory, and multi-omics integration for patient classification. Its accurate and robust properties bring great potential for applications in clinical diagnosis. AVAILABILITY AND IMPLEMENTATION HyperTMO and datasets are publicly available at https://github.com/ippousyuga/HyperTMO.
Collapse
Affiliation(s)
- Haohua Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Kai Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Jinlong Shi
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Xinyu Song
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Jue Wu
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Chenghui Zhao
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Kunlun He
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| |
Collapse
|
5
|
Huynh T, Cang Z. Topological and geometric analysis of cell states in single-cell transcriptomic data. Brief Bioinform 2024; 25:bbae176. [PMID: 38632952 PMCID: PMC11024518 DOI: 10.1093/bib/bbae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2023] [Revised: 01/29/2024] [Accepted: 03/24/2024] [Indexed: 04/19/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.
Collapse
Affiliation(s)
- Tram Huynh
- Department of Mathematics and Center for Research in Scientific Computation, North Carolina State University, NC 27695, USA
| | - Zixuan Cang
- Department of Mathematics and Center for Research in Scientific Computation, North Carolina State University, NC 27695, USA
| |
Collapse
|
6
|
Ding J, Liu R, Wen H, Tang W, Li Z, Venegas J, Su R, Molho D, Jin W, Wang Y, Lu Q, Li L, Zuo W, Chang Y, Xie Y, Tang J. DANCE: a deep learning library and benchmark platform for single-cell analysis. Genome Biol 2024; 25:72. [PMID: 38504331 PMCID: PMC10949782 DOI: 10.1186/s13059-024-03211-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024] Open
Abstract
DANCE is the first standard, generic, and extensible benchmark platform for accessing and evaluating computational methods across the spectrum of benchmark datasets for numerous single-cell analysis tasks. Currently, DANCE supports 3 modules and 8 popular tasks with 32 state-of-art methods on 21 benchmark datasets. People can easily reproduce the results of supported algorithms across major benchmark datasets via minimal efforts, such as using only one command line. In addition, DANCE provides an ecosystem of deep learning architectures and tools for researchers to facilitate their own model development. DANCE is an open-source Python package that welcomes all kinds of contributions.
Collapse
Affiliation(s)
- Jiayuan Ding
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA.
| | - Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
| | - Hongzhi Wen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
| | - Wenzhuo Tang
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
| | - Zhaoheng Li
- Department of Biostatistics, University of Washington, Seattle, USA
| | - Julian Venegas
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
| | - Runze Su
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
| | - Dylan Molho
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
| | - Wei Jin
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
| | - Yixin Wang
- Department of Bioengineering, Stanford University, Palo Alto, USA
| | - Qiaolin Lu
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Lingxiao Li
- Department of Computer Science, Boston University, Boston, USA
| | - Wangyang Zuo
- Department of Computer Science, Zhejiang University of Technology, Zhejiang, China
| | - Yi Chang
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yuying Xie
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA.
- Department of Statistics and Probability, Michigan State University, East Lansing, USA.
| | - Jiliang Tang
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA.
| |
Collapse
|
7
|
Shen X, Li X. Deep-learning methods for unveiling large-scale single-cell transcriptomes. Cancer Biol Med 2024; 20:j.issn.2095-3941.2023.0436. [PMID: 38318925 PMCID: PMC10845931 DOI: 10.20892/j.issn.2095-3941.2023.0436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2023] [Accepted: 12/20/2023] [Indexed: 02/07/2024] Open
Affiliation(s)
- Xilin Shen
- Tianjin Cancer Institute, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin Medical University, Tianjin 300060, China
| | - Xiangchun Li
- Tianjin Cancer Institute, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin Medical University, Tianjin 300060, China
| |
Collapse
|
8
|
Yang J, Wang W, Zhang X. scSemiGCN: boosting cell-type annotation from noise-resistant graph neural networks with extremely limited supervision. Bioinformatics 2024; 40:btae091. [PMID: 38366925 PMCID: PMC10904148 DOI: 10.1093/bioinformatics/btae091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Revised: 01/14/2024] [Accepted: 02/14/2024] [Indexed: 02/19/2024] Open
Abstract
MOTIVATION Cell-type annotation is fundamental in revealing cell heterogeneity for single-cell data analysis. Although a host of works have been developed, the low signal-to-noise-ratio single-cell RNA-sequencing data that suffers from batch effects and dropout still poses obstacles in discovering grouped patterns for cell types by unsupervised learning and its alternative-semi-supervised learning that utilizes a few labeled cells as guidance for cell-type annotation. RESULTS We propose a robust cell-type annotation method scSemiGCN based on graph convolutional networks. Built upon a denoised network structure that characterizes reliable cell-to-cell connections, scSemiGCN generates pseudo labels for unannotated cells. Then supervised contrastive learning follows to refine the noisy single-cell data. Finally, message passing with the refined features over the denoised network structure is conducted for semi-supervised cell-type annotation. Comparison over several datasets with six methods under extremely limited supervision validates the effectiveness and efficiency of scSemiGCN for cell-type annotation. AVAILABILITY AND IMPLEMENTATION Implementation of scSemiGCN is available at https://github.com/Jane9898/scSemiGCN.
Collapse
Affiliation(s)
- Jue Yang
- School of Mathematics, Sun Yat-sen University, Guangzhou 510000, China
| | - Weiwen Wang
- Department of Mathematics, School of Information Science and Technology, Jinan University, Guangzhou 510000, China
| | - Xiwen Zhang
- Department of Bioinformatics, College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510000, China
| |
Collapse
|
9
|
Wei Z, Chenjun W, Feiyang X, Mingfeng J, Yixuan Z, Qi L, Zhuoxing S, Qi D. scHybridBERT: integrating gene regulation and cell graph for spatiotemporal dynamics in single-cell clustering. Brief Bioinform 2024; 25:bbae018. [PMID: 38517692 PMCID: PMC10959234 DOI: 10.1093/bib/bbae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 12/19/2023] [Accepted: 01/09/2024] [Indexed: 03/24/2024] Open
Abstract
Graph learning models have received increasing attention in the computational analysis of single-cell RNA sequencing (scRNA-seq) data. Compared with conventional deep neural networks, graph neural networks and language models have exhibited superior performance by extracting graph-structured data from raw gene count matrices. Established deep neural network-based clustering approaches generally focus on temporal expression patterns while ignoring inherent interactions at gene-level as well as cell-level, which could be regarded as spatial dynamics in single-cell data. Both gene-gene and cell-cell interactions are able to boost the performance of cell type detection, under the framework of multi-view modeling. In this study, spatiotemporal embedding and cell graphs are extracted to capture spatial dynamics at the molecular level. In order to enhance the accuracy of cell type detection, this study proposes the scHybridBERT architecture to conduct multi-view modeling of scRNA-seq data using extracted spatiotemporal patterns. In this scHybridBERT method, graph learning models are employed to deal with cell graphs and the Performer model employs spatiotemporal embeddings. Experimental outcomes about benchmark scRNA-seq datasets indicate that the proposed scHybridBERT method is able to enhance the accuracy of single-cell clustering tasks by integrating spatiotemporal embeddings and cell graphs.
Collapse
Affiliation(s)
- Zhang Wei
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Wu Chenjun
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Xing Feiyang
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, 200092, Shanghai, China
| | | | - Zhang Yixuan
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Liu Qi
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, 200092, Shanghai, China
| | - Shi Zhuoxing
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, 510060, Guangzhou, China
| | - Dai Qi
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| |
Collapse
|
10
|
Xiong YX, Zhang XF. scDOT: enhancing single-cell RNA-Seq data annotation and uncovering novel cell types through multi-reference integration. Brief Bioinform 2024; 25:bbae072. [PMID: 38436563 PMCID: PMC10939303 DOI: 10.1093/bib/bbae072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 01/12/2024] [Accepted: 02/07/2024] [Indexed: 03/05/2024] Open
Abstract
The proliferation of single-cell RNA-seq data has greatly enhanced our ability to comprehend the intricate nature of diverse tissues. However, accurately annotating cell types in such data, especially when handling multiple reference datasets and identifying novel cell types, remains a significant challenge. To address these issues, we introduce Single Cell annotation based on Distance metric learning and Optimal Transport (scDOT), an innovative cell-type annotation method adept at integrating multiple reference datasets and uncovering previously unseen cell types. scDOT introduces two key innovations. First, by incorporating distance metric learning and optimal transport, it presents a novel optimization framework. This framework effectively learns the predictive power of each reference dataset for new query data and simultaneously establishes a probabilistic mapping between cells in the query data and reference-defined cell types. Secondly, scDOT develops an interpretable scoring system based on the acquired probabilistic mapping, enabling the precise identification of previously unseen cell types within the data. To rigorously assess scDOT's capabilities, we systematically evaluate its performance using two diverse collections of benchmark datasets encompassing various tissues, sequencing technologies and diverse cell types. Our experimental results consistently affirm the superior performance of scDOT in cell-type annotation and the identification of previously unseen cell types. These advancements provide researchers with a potent tool for precise cell-type annotation, ultimately enriching our understanding of complex biological tissues.
Collapse
Affiliation(s)
- Yi-Xuan Xiong
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan 430079, China
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan 430079, China
| |
Collapse
|
11
|
Zhang Y, Sun H, Zhang W, Fu T, Huang S, Mou M, Zhang J, Gao J, Ge Y, Yang Q, Zhu F. CellSTAR: a comprehensive resource for single-cell transcriptomic annotation. Nucleic Acids Res 2024; 52:D859-D870. [PMID: 37855686 PMCID: PMC10767908 DOI: 10.1093/nar/gkad874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 09/12/2023] [Accepted: 09/27/2023] [Indexed: 10/20/2023] Open
Abstract
Large-scale studies of single-cell sequencing and biological experiments have successfully revealed expression patterns that distinguish different cell types in tissues, emphasizing the importance of studying cellular heterogeneity and accurately annotating cell types. Analysis of gene expression profiles in these experiments provides two essential types of data for cell type annotation: annotated references and canonical markers. In this study, the first comprehensive database of single-cell transcriptomic annotation resource (CellSTAR) was thus developed. It is unique in (a) offering the comprehensive expertly annotated reference data for annotating hundreds of cell types for the first time and (b) enabling the collective consideration of reference data and marker genes by incorporating tens of thousands of markers. Given its unique features, CellSTAR is expected to attract broad research interests from the technological innovations in single-cell transcriptomics, the studies of cellular heterogeneity & dynamics, and so on. It is now publicly accessible without any login requirement at: https://idrblab.org/cellstar.
Collapse
Affiliation(s)
- Ying Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Huaicheng Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Wei Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Tingting Fu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Shijie Huang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jinsong Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jianqing Gao
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yichao Ge
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
Collapse
|
12
|
Li C, Ye G, Jiang Y, Wang Z, Yu H, Yang M. Artificial Intelligence in battling infectious diseases: A transformative role. J Med Virol 2024; 96:e29355. [PMID: 38179882 DOI: 10.1002/jmv.29355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Revised: 12/01/2023] [Accepted: 12/17/2023] [Indexed: 01/06/2024]
Abstract
It is widely acknowledged that infectious diseases have wrought immense havoc on human society, being regarded as adversaries from which humanity cannot elude. In recent years, the advancement of Artificial Intelligence (AI) technology has ushered in a revolutionary era in the realm of infectious disease prevention and control. This evolution encompasses early warning of outbreaks, contact tracing, infection diagnosis, drug discovery, and the facilitation of drug design, alongside other facets of epidemic management. This article presents an overview of the utilization of AI systems in the field of infectious diseases, with a specific focus on their role during the COVID-19 pandemic. The article also highlights the contemporary challenges that AI confronts within this domain and posits strategies for their mitigation. There exists an imperative to further harness the potential applications of AI across multiple domains to augment its capacity in effectively addressing future disease outbreaks.
Collapse
Affiliation(s)
- Chunhui Li
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| | - Guoguo Ye
- Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, The Third People's Hospital of Shenzhen, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
| | - Yinghan Jiang
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| | - Zhiming Wang
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| | - Haiyang Yu
- Hangzhou Yalla Information Technology Service Co., Ltd., Hangzhou, People's Republic of China
| | - Minghui Yang
- School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
| |
Collapse
|
13
|
Yin Q, Chen L. CellTICS: an explainable neural network for cell-type identification and interpretation based on single-cell RNA-seq data. Brief Bioinform 2023; 25:bbad449. [PMID: 38061196 PMCID: PMC10703497 DOI: 10.1093/bib/bbad449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 10/30/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
Identifying cell types is crucial for understanding the functional units of an organism. Machine learning has shown promising performance in identifying cell types, but many existing methods lack biological significance due to poor interpretability. However, it is of the utmost importance to understand what makes cells share the same function and form a specific cell type, motivating us to propose a biologically interpretable method. CellTICS prioritizes marker genes with cell-type-specific expression, using a hierarchy of biological pathways for neural network construction, and applying a multi-predictive-layer strategy to predict cell and sub-cell types. CellTICS usually outperforms existing methods in prediction accuracy. Moreover, CellTICS can reveal pathways that define a cell type or a cell type under specific physiological conditions, such as disease or aging. The nonlinear nature of neural networks enables us to identify many novel pathways. Interestingly, some of the pathways identified by CellTICS exhibit differential expression "variability" rather than differential expression across cell types, indicating that expression stochasticity within a pathway could be an important feature characteristic of a cell type. Overall, CellTICS provides a biologically interpretable method for identifying and characterizing cell types, shedding light on the underlying pathways that define cellular heterogeneity and its role in organismal function. CellTICS is available at https://github.com/qyyin0516/CellTICS.
Collapse
Affiliation(s)
- Qingyang Yin
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
| | - Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
| |
Collapse
|
14
|
HELLER GERWIN, FUEREDER THORSTEN, GRANDITS ALEXANDERMICHAEL, WIESER ROTRAUD. New perspectives on biology, disease progression, and therapy response of head and neck cancer gained from single cell RNA sequencing and spatial transcriptomics. Oncol Res 2023; 32:1-17. [PMID: 38188682 PMCID: PMC10767240 DOI: 10.32604/or.2023.044774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 10/12/2023] [Indexed: 01/09/2024] Open
Abstract
Head and neck squamous cell carcinoma (HNSCC) is one of the most frequent cancers worldwide. The main risk factors are consumption of tobacco products and alcohol, as well as infection with human papilloma virus. Approved therapeutic options comprise surgery, radiation, chemotherapy, targeted therapy through epidermal growth factor receptor inhibition, and immunotherapy, but outcome has remained unsatisfactory due to recurrence rates of ~50% and the frequent occurrence of second primaries. The availability of the human genome sequence at the beginning of the millennium heralded the omics era, in which rapid technological progress has advanced our knowledge of the molecular biology of malignant diseases, including HNSCC, at an unprecedented pace. Initially, microarray-based methods, followed by approaches based on next-generation sequencing, were applied to study the genetics, epigenetics, and gene expression patterns of bulk tumors. More recently, the advent of single-cell RNA sequencing (scRNAseq) and spatial transcriptomics methods has facilitated the investigation of the heterogeneity between and within different cell populations in the tumor microenvironment (e.g., cancer cells, fibroblasts, immune cells, endothelial cells), led to the discovery of novel cell types, and advanced the discovery of cell-cell communication within tumors. This review provides an overview of scRNAseq, spatial transcriptomics, and the associated bioinformatics methods, and summarizes how their application has promoted our understanding of the emergence, composition, progression, and therapy responsiveness of, and intercellular signaling within, HNSCC.
Collapse
Affiliation(s)
- GERWIN HELLER
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
| | - THORSTEN FUEREDER
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
| | | | - ROTRAUD WIESER
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, 1090, Austria
| |
Collapse
|
15
|
Li S, Guo H, Zhang S, Li Y, Li M. Attention-based deep clustering method for scRNA-seq cell type identification. PLoS Comput Biol 2023; 19:e1011641. [PMID: 37948464 PMCID: PMC10703402 DOI: 10.1371/journal.pcbi.1011641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 12/07/2023] [Accepted: 10/30/2023] [Indexed: 11/12/2023] Open
Abstract
Single-cell sequencing (scRNA-seq) technology provides higher resolution of cellular differences than bulk RNA sequencing and reveals the heterogeneity in biological research. The analysis of scRNA-seq datasets is premised on the subpopulation assignment. When an appropriate reference is not available, such as specific marker genes and single-cell reference atlas, unsupervised clustering approaches become the predominant option. However, the inherent sparsity and high-dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. Therefore, a various deep learning-based methods have been proposed to address these challenges. As each method improves partially, a comprehensive method needs to be proposed. In this article, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). Two different scRNA-seq clustering strategies are combined through an attention mechanism, that include zero-inflated negative binomial (ZINB)-based methods dealing with the impact of dropout events and graph autoencoder (GAE)-based methods relying on information from neighbors to guide the dimension reduction. Based on an iterative fusion between denoising and topological embeddings, AttentionAE-sc can easily acquire clustering-friendly cell representations that similar cells are closer in the hidden embedding. Compared with several state-of-art baseline methods, AttentionAE-sc demonstrated excellent clustering performance on 16 real scRNA-seq datasets without the need to specify the number of groups. Additionally, AttentionAE-sc learned improved cell representations and exhibited enhanced stability and robustness. Furthermore, AttentionAE-sc achieved remarkable identification in a breast cancer single-cell atlas dataset and provided valuable insights into the heterogeneity among different cell subtypes.
Collapse
Affiliation(s)
- Shenghao Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
| | - Hui Guo
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
| | - Simai Zhang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Sichuan, China
| | - Yizhou Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Sichuan, China
- School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
| |
Collapse
|
16
|
Herrera I, Fernandes JAL, Shir-Mohammadi K, Levesque J, Mattar P. Lamin A upregulation reorganizes the genome during rod photoreceptor degeneration. Cell Death Dis 2023; 14:701. [PMID: 37880237 PMCID: PMC10600220 DOI: 10.1038/s41419-023-06224-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Revised: 10/10/2023] [Accepted: 10/16/2023] [Indexed: 10/27/2023]
Abstract
Neurodegenerative diseases are accompanied by dynamic changes in gene expression, including the upregulation of hallmark stress-responsive genes. While the transcriptional pathways that impart adaptive and maladaptive gene expression signatures have been the focus of intense study, the role of higher order nuclear organization in this process is less clear. Here, we examine the role of the nuclear lamina in genome organization during the degeneration of rod photoreceptors. Two proteins had previously been shown to be necessary and sufficient to tether heterochromatin at the nuclear envelope. The lamin B receptor (Lbr) is expressed during development, but downregulates upon rod differentiation. A second tether is the intermediate filament lamin A (LA), which is not normally expressed in murine rods. Here, we show that in the rd1 model of retinitis pigmentosa, LA ectopically upregulates in rod photoreceptors at the onset of degeneration. LA upregulation correlated with increased heterochromatin tethering at the nuclear periphery in rd1 rods, suggesting that LA reorganizes the nucleus. To determine how heterochromatin tethering affects the genome, we used in vivo electroporation to misexpress LA or Lbr in mature rods in the absence of degeneration, resulting in the restoration of conventional nuclear architecture. Using scRNA-seq, we show that reorganizing the nucleus via LA/Lbr misexpression has relatively minor effects on rod gene expression. Next, using ATAC-seq, we show that LA and Lbr both lead to marked increases in genome accessibility. Novel ATAC-seq peaks tended to be associated with stress-responsive genes. Together, our data reveal that heterochromatin tethers have a global effect on genome accessibility, and suggest that heterochromatin tethering primes the photoreceptor genome to respond to stress.
Collapse
Affiliation(s)
- Ivana Herrera
- Ottawa Hospital Research Institute (OHRI), Ottawa, ON, K1H 8L6, Canada
- Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
| | - José Alex Lourenço Fernandes
- Ottawa Hospital Research Institute (OHRI), Ottawa, ON, K1H 8L6, Canada
- Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
| | - Khatereh Shir-Mohammadi
- Ottawa Hospital Research Institute (OHRI), Ottawa, ON, K1H 8L6, Canada
- Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
| | - Jasmine Levesque
- Ottawa Hospital Research Institute (OHRI), Ottawa, ON, K1H 8L6, Canada
- Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
| | - Pierre Mattar
- Ottawa Hospital Research Institute (OHRI), Ottawa, ON, K1H 8L6, Canada.
- Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON, K1H 8M5, Canada.
| |
Collapse
|
17
|
Lazaros K, Vlamos P, Vrahatis AG. Methods for cell-type annotation on scRNA-seq data: A recent overview. J Bioinform Comput Biol 2023; 21:2340002. [PMID: 37743364 DOI: 10.1142/s0219720023400024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
The evolution of single-cell technology is ongoing, continually generating massive amounts of data that reveal many mysteries surrounding intricate diseases. However, their drawbacks continue to constrain us. Among these, annotating cell types in single-cell gene expressions pose a substantial challenge, despite the myriad of tools at our disposal. The rapid growth in data, resources, and tools has consequently brought about significant alterations in this area over the years. In our study, we spotlight all note-worthy cell type annotation techniques developed over the past four years. We provide an overview of the latest trends in this field, showcasing the most advanced methods in taxonomy. Our research underscores the demand for additional tools that incorporate a biological context and also predicts that the rising trend of graph neural network approaches will likely lead this research field in the coming years.
Collapse
Affiliation(s)
- Konstantinos Lazaros
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
| | - Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
| | - Aristidis G Vrahatis
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
| |
Collapse
|
18
|
Ren P, Shi X, Yu Z, Dong X, Ding X, Wang J, Sun L, Yan Y, Hu J, Zhang P, Chen Q, Zhang J, Li T, Wang C. Single-cell assignment using multiple-adversarial domain adaptation network with large-scale references. CELL REPORTS METHODS 2023; 3:100577. [PMID: 37751689 PMCID: PMC10545911 DOI: 10.1016/j.crmeth.2023.100577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/28/2022] [Revised: 06/11/2023] [Accepted: 08/09/2023] [Indexed: 09/28/2023]
Abstract
The rapid accumulation of single-cell RNA-seq data has provided rich resources to characterize various human cell populations. However, achieving accurate cell-type annotation using public references presents challenges due to inconsistent annotations, batch effects, and rare cell types. Here, we introduce SELINA (single-cell identity navigator), an integrative and automatic cell-type annotation framework based on a pre-curated reference atlas spanning various tissues. SELINA employs a multiple-adversarial domain adaptation network to remove batch effects within the reference dataset. Additionally, it enhances the annotation of less frequent cell types by synthetic minority oversampling and fits query data with the reference data using an autoencoder. SELINA culminates in the creation of a comprehensive and uniform reference atlas, encompassing 1.7 million cells covering 230 distinct human cell types. We substantiate its robustness and superiority across a multitude of human tissues. Notably, SELINA could accurately annotate cells within diverse disease contexts. SELINA provides a complete solution for human single-cell RNA-seq data annotation with both python and R packages.
Collapse
Affiliation(s)
- Pengfei Ren
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China
| | - Xiaoying Shi
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Zhiguang Yu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Guangxi 530004, China
| | - Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Xuanxin Ding
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Jin Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Liangdong Sun
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
| | - Yilv Yan
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
| | - Junjie Hu
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
| | - Peng Zhang
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
| | - Qianming Chen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing 211166, China
| | - Jing Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Science and Technology, Tongji University, Shanghai, China.
| | - Taiwen Li
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing 211166, China.
| | - Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
| |
Collapse
|
19
|
Fiannaca A, La Rosa M, La Paglia L, Gaglio S, Urso A. GOWDL: gene ontology-driven wide and deep learning model for cell typing of scRNA-seq data. Brief Bioinform 2023; 24:bbad332. [PMID: 37756593 PMCID: PMC10530315 DOI: 10.1093/bib/bbad332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Revised: 08/17/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. That data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where we got about 92% versus 97% of the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.
Collapse
Affiliation(s)
- Antonino Fiannaca
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| | - Massimo La Rosa
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| | - Laura La Paglia
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| | - Salvatore Gaglio
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Dipartimento di Ingegneria, Università degli studi di Palermo, Viale Delle Scienze, ed. 6, 90128, Palermo, Italy
| | - Alfonso Urso
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| |
Collapse
|
20
|
Cava C, D'Antona S, Maselli F, Castiglioni I, Porro D. From genetic correlations of Alzheimer's disease to classification with artificial neural network models. Funct Integr Genomics 2023; 23:293. [PMID: 37682415 PMCID: PMC10491691 DOI: 10.1007/s10142-023-01228-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 08/30/2023] [Accepted: 09/03/2023] [Indexed: 09/09/2023]
Abstract
Sporadic Alzheimer's disease (AD) is a complex neurological disorder characterized by many risk loci with potential associations with different traits and diseases. AD, characterized by a progressive loss of neuronal functions, manifests with different symptoms such as decline in memory, movement, coordination, and speech. The mechanisms underlying the onset of AD are not always fully understood, but involve a multiplicity of factors. Early diagnosis of AD plays a central role as it can offer the possibility of early treatment, which can slow disease progression. Currently, the methods of diagnosis are cognitive testing, neuroimaging, or cerebrospinal fluid analysis that can be time-consuming, expensive, invasive, and not always accurate. In the present study, we performed a genetic correlation analysis using genome-wide association statistics from a large study of AD and UK Biobank, to examine the association of AD with other human traits and disorders. In addition, since hippocampus, a part of cerebral cortex could play a central role in several traits that are associated with AD; we analyzed the gene expression profiles of hippocampus of AD patients applying 4 different artificial neural network models. We found 65 traits correlated with AD grouped into 9 clusters: medical conditions, fluid intelligence, education, anthropometric measures, employment status, activity, diet, lifestyle, and sexuality. The comparison of different 4 neural network models along with feature selection methods on 5 Alzheimer's gene expression datasets showed that the simple basic neural network model obtains a better performance (66% of accuracy) than other more complex methods with dropout and weight regularization of the network.
Collapse
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F. Cervi 93, Segrate-Milan, 20090, Milan, Italy.
- Department of Science, Technology and Society, University School for Advanced Studies IUSS Pavia, Palazzo del Broletto, Piazza Della Vittoria 15, 27100, Pavia, Italy.
| | - Salvatore D'Antona
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F. Cervi 93, Segrate-Milan, 20090, Milan, Italy
| | - Francesca Maselli
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F. Cervi 93, Segrate-Milan, 20090, Milan, Italy
| | - Isabella Castiglioni
- Department of Physics "Giuseppe Occhialini", University of Milan-Bicocca Piazza Dell'Ateneo Nuovo, 20126, Milan, Italy
| | - Danilo Porro
- Institute of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Via F. Cervi 93, Segrate-Milan, 20090, Milan, Italy
- NBFC, National Biodiversity Future Center, 90133, Palermo, Italy
| |
Collapse
|
21
|
Lyu P, Zhai Y, Li T, Qian J. CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server. Bioinformatics 2023; 39:btad521. [PMID: 37610325 PMCID: PMC10477937 DOI: 10.1093/bioinformatics/btad521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 07/17/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023] Open
Abstract
MOTIVATION Single-cell sequencing technology has become a routine in studying many biological problems. A core step of analyzing single-cell data is the assignment of cell clusters to specific cell types. Reference-based methods are proposed for predicting cell types for single-cell clusters. However, the scalability and lack of preprocessed reference datasets prevent them from being practical and easy to use. RESULTS Here, we introduce a reference-based cell annotation web server, CellAnn, which is super-fast and easy to use. CellAnn contains a comprehensive reference database with 204 human and 191 mouse single-cell datasets. These reference datasets cover 32 organs. Furthermore, we developed a cluster-to-cluster alignment method to transfer cell labels from the reference to the query datasets, which is superior to the existing methods with higher accuracy and higher scalability. Finally, CellAnn is an online tool that integrates all the procedures in cell annotation, including reference searching, transferring cell labels, visualizing results, and harmonizing cell annotation labels. Through the user-friendly interface, users can identify the best annotation by cross-validating with multiple reference datasets. We believe that CellAnn can greatly facilitate single-cell sequencing data analysis. AVAILABILITY AND IMPLEMENTATION The web server is available at www.cellann.io, and the source code is available at https://github.com/Pinlyu3/CellAnn_shinyapp.
Collapse
Affiliation(s)
- Pin Lyu
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
| | - Yijie Zhai
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
| | - Taibo Li
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21218, United States
| | - Jiang Qian
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
| |
Collapse
|
22
|
Madadi Y, Monavarfeshani A, Chen H, Stamer WD, Williams RW, Yousefi S. Artificial Intelligence Models for Cell Type and Subtype Identification Based on Single-Cell RNA Sequencing Data in Vision Science. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2837-2852. [PMID: 37294649 PMCID: PMC10631573 DOI: 10.1109/tcbb.2023.3284795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) provides a high throughput, quantitative and unbiased framework for scientists in many research fields to identify and characterize cell types within heterogeneous cell populations from various tissues. However, scRNA-seq based identification of discrete cell-types is still labor intensive and depends on prior molecular knowledge. Artificial intelligence has provided faster, more accurate, and user-friendly approaches for cell-type identification. In this review, we discuss recent advances in cell-type identification methods using artificial intelligence techniques based on single-cell and single-nucleus RNA sequencing data in vision science. The main purpose of this review paper is to assist vision scientists not only to select suitable datasets for their problems, but also to be aware of the appropriate computational tools to perform their analysis. Developing novel methods for scRNA-seq data analysis remains to be addressed in future studies.
Collapse
|
23
|
Peng L, Tan J, Xiong W, Zhang L, Wang Z, Yuan R, Li Z, Chen X. Deciphering ligand-receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput Biol Med 2023; 163:107137. [PMID: 37364528 DOI: 10.1016/j.compbiomed.2023.107137] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Revised: 05/18/2023] [Accepted: 06/04/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND Cell-cell communication in a tumor microenvironment is vital to tumorigenesis, tumor progression and therapy. Intercellular communication inference helps understand molecular mechanisms of tumor growth, progression and metastasis. METHODS Focusing on ligand-receptor co-expressions, in this study, we developed an ensemble deep learning framework, CellComNet, to decipher ligand-receptor-mediated cell-cell communication from single-cell transcriptomic data. First, credible LRIs are captured by integrating data arrangement, feature extraction, dimension reduction, and LRI classification based on an ensemble of heterogeneous Newton boosting machine and deep neural network. Next, known and identified LRIs are screened based on single-cell RNA sequencing (scRNA-seq) data in certain tissues. Finally, cell-cell communication is inferred by incorporating scRNA-seq data, the screened LRIs, a joint scoring strategy that combines expression thresholding and expression product of ligands and receptors. RESULTS The proposed CellComNet framework was compared with four competing protein-protein interaction prediction models (PIPR, XGBoost, DNNXGB, and OR-RCNN) and obtained the best AUCs and AUPRs on four LRI datasets, elucidating the optimal LRI classification ability. CellComNet was further applied to analyze intercellular communication in human melanoma and head and neck squamous cell carcinoma (HNSCC) tissues. The results demonstrate that cancer-associated fibroblasts highly communicate with melanoma cells and endothelial cells strong communicate with HNSCC cells. CONCLUSIONS The proposed CellComNet framework efficiently identified credible LRIs and significantly improved cell-cell communication inference performance. We anticipate that CellComNet can contribute to anticancer drug design and tumor-targeted therapy.
Collapse
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China; College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Jingwei Tan
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Wei Xiong
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
| | - Zhao Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Ruya Yuan
- School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
| | - Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China.
| | - Xing Chen
- School of Science, Jiangnan University, Wuxi, 214122, Jiangsu, China.
| |
Collapse
|
24
|
Halawani R, Buchert M, Chen YPP. Deep learning exploration of single-cell and spatially resolved cancer transcriptomics to unravel tumour heterogeneity. Comput Biol Med 2023; 164:107274. [PMID: 37506451 DOI: 10.1016/j.compbiomed.2023.107274] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2023] [Revised: 07/03/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023]
Abstract
Tumour heterogeneity is one of the critical confounding aspects in decoding tumour growth. Malignant cells display variations in their gene transcription profiles and mutation spectra even when originating from a single progenitor cell. Single-cell and spatial transcriptomics sequencing have recently emerged as key technologies for unravelling tumour heterogeneity. Single-cell sequencing promotes individual cell-type identification through transcriptome-wide gene expression measurements of each cell. Spatial transcriptomics facilitates identification of cell-cell interactions and the structural organization of heterogeneous cells within a tumour tissue through associating spatial RNA abundance of cells at distinct spots in the tissue section. However, extracting features and analyzing single-cell and spatial transcriptomics data poses challenges. Single-cell transcriptome data is extremely noisy and its sparse nature and dropouts can lead to misinterpretation of gene expression and the misclassification of cell types. Deep learning predictive power can overcome data challenges, provide high-resolution analysis and enhance precision oncology applications that involve early cancer prognosis, diagnosis, patient survival estimation and anti-cancer therapy planning. In this paper, we provide a background to and review of the recent progress of deep learning frameworks to investigate tumour heterogeneity using both single-cell and spatial transcriptomics data types.
Collapse
Affiliation(s)
- Raid Halawani
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
| | - Michael Buchert
- School of Cancer Medicine, La Trobe University, Melbourne, Victoria, Australia; Olivia Newton-John Cancer Research Institute, Melbourne, Victoria, Australia
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia.
| |
Collapse
|
25
|
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Collapse
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA;
| | - Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; (W.C.); (H.J.)
| | - Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; (W.C.); (H.J.)
| | - Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA;
| |
Collapse
|
26
|
Theodoris CV, Xiao L, Chopra A, Chaffin MD, Al Sayed ZR, Hill MC, Mantineo H, Brydon EM, Zeng Z, Liu XS, Ellinor PT. Transfer learning enables predictions in network biology. Nature 2023; 618:616-624. [PMID: 37258680 PMCID: PMC10949956 DOI: 10.1038/s41586-023-06139-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 65.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Accepted: 04/27/2023] [Indexed: 06/02/2023]
Abstract
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding1,2 and computer vision3 by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Collapse
Affiliation(s)
- Christina V Theodoris
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA.
- Harvard Medical School Genetics Training Program, Boston, USA.
| | - Ling Xiao
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Anant Chopra
- Precision Cardiology Laboratory, Bayer US LLC, Cambridge, MA, USA
| | - Mark D Chaffin
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Zeina R Al Sayed
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matthew C Hill
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Helene Mantineo
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | | | - Zexian Zeng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - X Shirley Liu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Patrick T Ellinor
- Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
| |
Collapse
|
27
|
Liu Y, Wei G, Li C, Shen LC, Gasser RB, Song J, Chen D, Yu DJ. TripletCell: a deep metric learning framework for accurate annotation of cell types at the single-cell level. Brief Bioinform 2023; 24:bbad132. [PMID: 37080771 PMCID: PMC10199768 DOI: 10.1093/bib/bbad132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2022] [Revised: 02/02/2023] [Accepted: 03/14/2023] [Indexed: 04/22/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has significantly accelerated the experimental characterization of distinct cell lineages and types in complex tissues and organisms. Cell-type annotation is of great importance in most of the scRNA-seq analysis pipelines. However, manual cell-type annotation heavily relies on the quality of scRNA-seq data and marker genes, and therefore can be laborious and time-consuming. Furthermore, the heterogeneity of scRNA-seq datasets poses another challenge for accurate cell-type annotation, such as the batch effect induced by different scRNA-seq protocols and samples. To overcome these limitations, here we propose a novel pipeline, termed TripletCell, for cross-species, cross-protocol and cross-sample cell-type annotation. We developed a cell embedding and dimension-reduction module for the feature extraction (FE) in TripletCell, namely TripletCell-FE, to leverage the deep metric learning-based algorithm for the relationships between the reference gene expression matrix and the query cells. Our experimental studies on 21 datasets (covering nine scRNA-seq protocols, two species and three tissues) demonstrate that TripletCell outperformed state-of-the-art approaches for cell-type annotation. More importantly, regardless of protocols or species, TripletCell can deliver outstanding and robust performance in annotating different types of cells. TripletCell is freely available at https://github.com/liuyan3056/TripletCell. We believe that TripletCell is a reliable computational tool for accurately annotating various cell types using scRNA-seq data and will be instrumental in assisting the generation of novel biological hypotheses in cell biology.
Collapse
Affiliation(s)
- Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Guo Wei
- School of Life Sciences, Nanjing University, Nanjing 210023, China
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
| | - Dijun Chen
- School of Life Sciences, Nanjing University, Nanjing 210023, China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| |
Collapse
|
28
|
Xu J, Zhang A, Liu F, Chen L, Zhang X. CIForm as a Transformer-based model for cell-type annotation of large-scale single-cell RNA-seq data. Brief Bioinform 2023:7169137. [PMID: 37200157 DOI: 10.1093/bib/bbad195] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Revised: 04/03/2023] [Accepted: 04/30/2023] [Indexed: 05/20/2023] Open
Abstract
Single-cell omics technologies have made it possible to analyze the individual cells within a biological sample, providing a more detailed understanding of biological systems. Accurately determining the cell type of each cell is a crucial goal in single-cell RNA-seq (scRNA-seq) analysis. Apart from overcoming the batch effects arising from various factors, single-cell annotation methods also face the challenge of effectively processing large-scale datasets. With the availability of an increase in the scRNA-seq datasets, integrating multiple datasets and addressing batch effects originating from diverse sources are also challenges in cell-type annotation. In this work, to overcome the challenges, we developed a supervised method called CIForm based on the Transformer for cell-type annotation of large-scale scRNA-seq data. To assess the effectiveness and robustness of CIForm, we have compared it with some leading tools on benchmark datasets. Through the systematic comparisons under various cell-type annotation scenarios, we exhibit that the effectiveness of CIForm is particularly pronounced in cell-type annotation. The source code and data are available at https://github.com/zhanglab-wbgcas/CIForm.
Collapse
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
| | - Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
| | - Liang Chen
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
| |
Collapse
|
29
|
Jiao L, Wang G, Dai H, Li X, Wang S, Song T. scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings. Biomolecules 2023; 13:biom13040611. [PMID: 37189359 DOI: 10.3390/biom13040611] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 03/05/2023] [Accepted: 03/10/2023] [Indexed: 03/31/2023] Open
Abstract
Single-cell transcriptomics is rapidly advancing our understanding of the composition of complex tissues and biological cells, and single-cell RNA sequencing (scRNA-seq) holds great potential for identifying and characterizing the cell composition of complex tissues. Cell type identification by analyzing scRNA-seq data is mostly limited by time-consuming and irreproducible manual annotation. As scRNA-seq technology scales to thousands of cells per experiment, the exponential increase in the number of cell samples makes manual annotation more difficult. On the other hand, the sparsity of gene transcriptome data remains a major challenge. This paper applied the idea of the transformer to single-cell classification tasks based on scRNA-seq data. We propose scTransSort, a cell-type annotation method pretrained with single-cell transcriptomics data. The scTransSort incorporates a method of representing genes as gene expression embedding blocks to reduce the sparsity of data used for cell type identification and reduce the computational complexity. The feature of scTransSort is that its implementation of intelligent information extraction for unordered data, automatically extracting valid features of cell types without the need for manually labeled features and additional references. In experiments on cells from 35 human and 26 mouse tissues, scTransSort successfully elucidated its high accuracy and high performance for cell type identification, and demonstrated its own high robustness and generalization ability.
Collapse
|
30
|
Heydari AA, Sindi SS. Deep learning in spatial transcriptomics: Learning from the next next-generation sequencing. BIOPHYSICS REVIEWS 2023; 4:011306. [PMID: 38505815 PMCID: PMC10903438 DOI: 10.1063/5.0091135] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Accepted: 12/19/2022] [Indexed: 03/21/2024]
Abstract
Spatial transcriptomics (ST) technologies are rapidly becoming the extension of single-cell RNA sequencing (scRNAseq), holding the potential of profiling gene expression at a single-cell resolution while maintaining cellular compositions within a tissue. Having both expression profiles and tissue organization enables researchers to better understand cellular interactions and heterogeneity, providing insight into complex biological processes that would not be possible with traditional sequencing technologies. Data generated by ST technologies are inherently noisy, high-dimensional, sparse, and multi-modal (including histological images, count matrices, etc.), thus requiring specialized computational tools for accurate and robust analysis. However, many ST studies currently utilize traditional scRNAseq tools, which are inadequate for analyzing complex ST datasets. On the other hand, many of the existing ST-specific methods are built upon traditional statistical or machine learning frameworks, which have shown to be sub-optimal in many applications due to the scale, multi-modality, and limitations of spatially resolved data (such as spatial resolution, sensitivity, and gene coverage). Given these intricacies, researchers have developed deep learning (DL)-based models to alleviate ST-specific challenges. These methods include new state-of-the-art models in alignment, spatial reconstruction, and spatial clustering, among others. However, DL models for ST analysis are nascent and remain largely underexplored. In this review, we provide an overview of existing state-of-the-art tools for analyzing spatially resolved transcriptomics while delving deeper into the DL-based approaches. We discuss the new frontiers and the open questions in this field and highlight domains in which we anticipate transformational DL applications.
Collapse
|
31
|
Dong X, Chowdhury S, Victor U, Li X, Qian L. Semi-Supervised Deep Learning for Cell Type Identification From Single-Cell Transcriptomic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1492-1505. [PMID: 35536811 DOI: 10.1109/tcbb.2022.3173587] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis. Deep neural networks have been employed to identify cell types from scRNAseq data with high performance. However, it requires a large mount of individual cells with accurate and unbiased annotated types to train the identification models. Unfortunately, labeling the scRNAseq data is cumbersome and time-consuming as it involves manual inspection of marker genes. To overcome this challenge, we propose a semi-supervised learning model "SemiRNet" to use unlabeled scRNAseq cells and a limited amount of labeled scRNAseq cells to implement cell identification. The proposed model is based on recurrent convolutional neural networks (RCNN), which includes a shared network, a supervised network and an unsupervised network. The proposed model is evaluated on two large scale single-cell transcriptomic datasets. It is observed that the proposed model is able to achieve encouraging performance by learning on the very limited amount of labeled scRNAseq cells together with a large number of unlabeled scRNAseq cells.
Collapse
|
32
|
Singh A, Saint-Antoine M. Probing transient memory of cellular states using single-cell lineages. Front Microbiol 2023; 13:1050516. [PMID: 36824587 PMCID: PMC9942930 DOI: 10.3389/fmicb.2022.1050516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Accepted: 12/22/2022] [Indexed: 02/10/2023] Open
Abstract
The inherent stochasticity in the gene product levels can drive single cells within an isoclonal population to different phenotypic states. The dynamic nature of this intercellular variation, where individual cells can transition between different states over time, makes it a particularly hard phenomenon to characterize. We reviewed recent progress in leveraging the classical Luria-Delbrück experiment to infer the transient heritability of the cellular states. Similar to the original experiment, individual cells were first grown into cell colonies, and then, the fraction of cells residing in different states was assayed for each colony. We discuss modeling approaches for capturing dynamic state transitions in a growing cell population and highlight formulas that identify the kinetics of state switching from the extent of colony-to-colony fluctuations. The utility of this method in identifying multi-generational memory of the both expression and phenotypic states is illustrated across diverse biological systems from cancer drug resistance, reactivation of human viruses, and cellular immune responses. In summary, this fluctuation-based methodology provides a powerful approach for elucidating cell-state transitions from a single time point measurement, which is particularly relevant in situations where measurements lead to cell death (as in single-cell RNA-seq or drug treatment) or cause an irreversible change in cell physiology.
Collapse
Affiliation(s)
- Abhyudai Singh
- Departments of Electrical and Computer Engineering, Biomedical Engineering, Mathematical Sciences University of Delaware, Newark, DE, United States
| | | |
Collapse
|
33
|
Suzuki T. Overview of single-cell RNA sequencing analysis and its application to spermatogenesis research. Reprod Med Biol 2023; 22:e12502. [PMID: 36726594 PMCID: PMC9884325 DOI: 10.1002/rmb2.12502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 12/18/2022] [Accepted: 01/10/2023] [Indexed: 01/30/2023] Open
Abstract
Background Single-cell transcriptomics allows parallel analysis of multiple cell types in tissues. Because testes comprise somatic cells and germ cells at various stages of spermatogenesis, single-cell RNA sequencing is a powerful tool for investigating the complex process of spermatogenesis. However, single-cell RNA sequencing analysis needs extensive knowledge of experimental technologies and bioinformatics, making it difficult for many, particularly experimental biologists and clinicians, to use it. Methods Aiming to make single-cell RNA sequencing analysis familiar, this review article presents an overview of experimental and computational methods for single-cell RNA sequencing analysis with a history of transcriptomics. In addition, combining the PubMed search and manual curation, this review also provides a summary of recent novel insights into human and mouse spermatogenesis obtained using single-cell RNA sequencing analyses. Main Findings Single-cell RNA sequencing identified mesenchymal cells and type II innate lymphoid cells as novel testicular cell types in the adult mouse testes, as well as detailed subtypes of germ cells. This review outlines recent discoveries into germ cell development and subtypes, somatic cell development, and cell-cell interactions. Conclusion The findings on spermatogenesis obtained using single-cell RNA sequencing may contribute to a deeper understanding of spermatogenesis and provide new directions for male fertility therapy.
Collapse
Affiliation(s)
- Takahiro Suzuki
- RIKEN Center for Integrated Medical Science (IMS)Yokohama CityKanagawaJapan
- Graduate School of Medical Life ScienceYokohama City UniversityYokohama CityKanagawaJapan
| |
Collapse
|
34
|
Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding. Nat Commun 2022; 13:7640. [PMID: 36496406 PMCID: PMC9741613 DOI: 10.1038/s41467-022-35288-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022] Open
Abstract
Spatially resolved transcriptomics provides the opportunity to investigate the gene expression profiles and the spatial context of cells in naive state, but at low transcript detection sensitivity or with limited gene throughput. Comprehensive annotating of cell types in spatially resolved transcriptomics to understand biological processes at the single cell level remains challenging. Here we propose Spatial-ID, a supervision-based cell typing method, that combines the existing knowledge of reference single-cell RNA-seq data and the spatial information of spatially resolved transcriptomics data. We present a series of benchmarking analyses on publicly available spatially resolved transcriptomics datasets, that demonstrate the superiority of Spatial-ID compared with state-of-the-art methods. Besides, we apply Spatial-ID on a self-collected mouse brain hemisphere dataset measured by Stereo-seq, that shows the scalability of Spatial-ID to three-dimensional large field tissues with subcellular spatial resolution.
Collapse
|
35
|
Liu Q, Liang Y, Wang D, Li J. LFSC: A linear fast semi-supervised clustering algorithm that integrates reference-bulk and single-cell transcriptomes. Front Genet 2022; 13:1068075. [PMID: 36531230 PMCID: PMC9754124 DOI: 10.3389/fgene.2022.1068075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2022] [Accepted: 11/09/2022] [Indexed: 08/29/2023] Open
Abstract
The identification of cell types in complex tissues is an important step in research into cellular heterogeneity in disease. We present a linear fast semi-supervised clustering (LFSC) algorithm that utilizes reference samples generated from bulk RNA sequencing data to identify cell types from single-cell transcriptomes. An anchor graph is constructed to depict the relationship between reference samples and cells. By applying a connectivity constraint to the learned graph, LFSC enables the preservation of the underlying cluster structure. Moreover, the overall complexity of LFSC is linear to the size of the data, which greatly improves effectiveness and efficiency. By applying LFSC to real single-cell RNA sequencing datasets, we discovered that it has superior performance over existing baseline methods in clustering accuracy and robustness. An application using infiltrating T cells in liver cancer demonstrates that LFSC can successfully find new cell types, discover differently expressed genes, and explore new cancer-associated biomarkers.
Collapse
Affiliation(s)
- Qiaoming Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yingjian Liang
- Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Dong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
Collapse
|
36
|
Yang L, Yang Y, Huang L, Cui X, Liu Y. From single- to multi-omics: future research trends in medicinal plants. Brief Bioinform 2022; 24:6840072. [PMID: 36416120 PMCID: PMC9851310 DOI: 10.1093/bib/bbac485] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/25/2022] Open
Abstract
Medicinal plants are the main source of natural metabolites with specialised pharmacological activities and have been widely examined by plant researchers. Numerous omics studies of medicinal plants have been performed to identify molecular markers of species and functional genes controlling key biological traits, as well as to understand biosynthetic pathways of bioactive metabolites and the regulatory mechanisms of environmental responses. Omics technologies have been widely applied to medicinal plants, including as taxonomics, transcriptomics, metabolomics, proteomics, genomics, pangenomics, epigenomics and mutagenomics. However, because of the complex biological regulation network, single omics usually fail to explain the specific biological phenomena. In recent years, reports of integrated multi-omics studies of medicinal plants have increased. Until now, there have few assessments of recent developments and upcoming trends in omics studies of medicinal plants. We highlight recent developments in omics research of medicinal plants, summarise the typical bioinformatics resources available for analysing omics datasets, and discuss related future directions and challenges. This information facilitates further studies of medicinal plants, refinement of current approaches and leads to new ideas.
Collapse
Affiliation(s)
- Lifang Yang
- Kunming University of Science and Technology, China
| | - Ye Yang
- Kunming University of Science and Technology, China
| | - Luqi Huang
- the academician of the Chinese Academy of Engineering, studies the development of traditional Chinese medicine, Chinese Academy of Chinese Medical Sciences, China
| | - Xiuming Cui
- Corresponding authors. X. M. Cui, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail: ; Y. Liu, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail:
| | - Yuan Liu
- Corresponding authors. X. M. Cui, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail: ; Y. Liu, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail:
| |
Collapse
|
37
|
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022] Open
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been proved to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which imply there is still room for further improving their performances. Meanwhile, only a few of them are accessible at present, which limits their practical applications. The construction of a new tool for LMI prediction is thus imperative for the better understanding of their relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized deep learning algorithm of graph neural network were utilized in this study. ncRNAInter was robust and showed better performance of 26.7% higher Matthews correlation coefficient than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
Collapse
Affiliation(s)
- Hanyu Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Honglin Li
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China.,Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
Collapse
|
38
|
La Rosa M, Fiannaca A, La Paglia L, Urso A. A Graph Neural Network Approach for the Analysis of siRNA-Target Biological Networks. Int J Mol Sci 2022; 23:ijms232214211. [PMID: 36430688 PMCID: PMC9696923 DOI: 10.3390/ijms232214211] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2022] [Revised: 11/10/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022] Open
Abstract
Many biological systems are characterised by biological entities, as well as their relationships. These interaction networks can be modelled as graphs, with nodes representing bio-entities, such as molecules, and edges representing relations among them, such as interactions. Due to the current availability of a huge amount of biological data, it is very important to consider in silico analysis methods based on, for example, machine learning, that could take advantage of the inner graph structure of the data in order to improve the quality of the results. In this scenario, graph neural networks (GNNs) are recent computational approaches that directly deal with graph-structured data. In this paper, we present a GNN network for the analysis of siRNA-mRNA interaction networks. siRNAs, in fact, are small RNA molecules that are able to bind to target genes and silence them. These events make siRNAs key molecules as RNA interference agents in many biological interaction networks related to severe diseases such as cancer. In particular, our GNN approach allows for the prediction of the siRNA efficacy, which measures the siRNA's ability to bind and silence a gene target. Tested on benchmark datasets, our proposed method overcomes other machine learning algorithms, including the state-of-the-art predictor based on the convolutional neural network, reaching a Pearson correlation coefficient of approximately 73.6%. Finally, we proposed a case study where the efficacy of a set of siRNAs is predicted for a gene of interest. To the best of our knowledge, GNNs were used for the first time in this scenario.
Collapse
|
39
|
De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution. Nat Commun 2022; 13:6498. [PMID: 36310179 PMCID: PMC9618574 DOI: 10.1038/s41467-022-34271-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Accepted: 10/19/2022] [Indexed: 12/25/2022] Open
Abstract
Uncovering the tissue molecular architecture at single-cell resolution could help better understand organisms' biological and pathological processes. However, bulk RNA-seq can only measure gene expression in cell mixtures, without revealing the transcriptional heterogeneity and spatial patterns of single cells. Herein, we introduce Bulk2Space ( https://github.com/ZJUFanLab/bulk2space ), a deep learning framework-based spatial deconvolution algorithm that can simultaneously disclose the spatial and cellular heterogeneity of bulk RNA-seq data using existing single-cell and spatial transcriptomics references. The use of bulk transcriptomics to validate Bulk2Space unveils, in particular, the spatial variance of immune cells in different tumor regions, the molecular and spatial heterogeneity of tissues during inflammation-induced tumorigenesis, and spatial patterns of novel genes in different cell types. Moreover, Bulk2Space is utilized to perform spatial deconvolution analysis on bulk transcriptome data from two different mouse brain regions derived from our in-house developed sequencing approach termed Spatial-seq. We have not only reconstructed the hierarchical structure of the mouse isocortex but also further annotated cell types that were not identified by original methods in the mouse hypothalamus.
Collapse
|
40
|
Chen Y, Zhang S. Automatic Cell Type Annotation Using Marker Genes for Single-Cell RNA Sequencing Data. Biomolecules 2022; 12:biom12101539. [PMID: 36291748 PMCID: PMC9599378 DOI: 10.3390/biom12101539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2022] [Revised: 10/01/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022] Open
Abstract
Recent advancement in single-cell RNA sequencing (scRNA-seq) technology is gaining more and more attention. Cell type annotation plays an essential role in scRNA-seq data analysis. Several computational methods have been proposed for automatic annotation. Traditional cell type annotation is to first cluster the cells using unsupervised learning methods based on the gene expression profiles, then to label the clusters using the aggregated cluster-level expression profiles and the marker genes’ information. Such procedure relies heavily on the clustering results. As the purity of clusters cannot be guaranteed, false detection of cluster features may lead to wrong annotations. In this paper, we improve this procedure and propose an Automatic Cell type Annotation Method (ACAM). ACAM delineates a clear framework to conduct automatic cell annotation through representative cluster identification, representative cluster annotation using marker genes, and the remaining cells’ classification. Experiments on seven real datasets show the better performance of ACAM compared to six well-known cell type annotation methods.
Collapse
Affiliation(s)
- Yu Chen
- School of Mathematical Sciences, Fudan University, Shanghai 200433, China
| | - Shuqin Zhang
- School of Mathematical Sciences, Fudan University, Shanghai 200433, China
- Key Laboratory of Mathematics for Nonlinear Science (Ministry of Education), Fudan University, Shanghai 200433, China
- Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai 200433, China
- Correspondence:
| |
Collapse
|
41
|
Song T, Dai H, Wang S, Wang G, Zhang X, Zhang Y, Jiao L. TransCluster: A Cell-Type Identification Method for single-cell RNA-Seq data using deep learning based on transformer. Front Genet 2022; 13:1038919. [PMID: 36303549 PMCID: PMC9592860 DOI: 10.3389/fgene.2022.1038919] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2022] [Accepted: 09/23/2022] [Indexed: 11/25/2022] Open
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have accelerated the development of techniques to classify thousands of cells through transcriptome profiling. As more and more scRNA-seq data become available, supervised cell type classification methods using externally well-annotated source data become more popular than unsupervised clustering algorithms. However, accurate cellular annotation of single cell transcription data remains a significant challenge. Here, we propose a hybrid network structure called TransCluster, which uses linear discriminant analysis and a modified Transformer to enhance feature learning. It is a cell-type identification tool for single-cell transcriptomic maps. It shows high accuracy and robustness in many cell data sets of different human tissues. It is superior to other known methods in external test data set. To our knowledge, TransCluster is the first attempt to use Transformer for annotating cell types of scRNA-seq, which greatly improves the accuracy of cell-type identification.
Collapse
Affiliation(s)
- Tao Song
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Department of Artificial Intelligence, Faculty of Computer Science, Campus de Montegancedo, Polytechnical University of Madrid, Boadilla Del Monte, Madrid, Spain
- *Correspondence: Tao Song, ; Shuang Wang,
| | - Huanhuan Dai
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- *Correspondence: Tao Song, ; Shuang Wang,
| | - Gan Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Xudong Zhang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Ying Zhang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Linfang Jiao
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| |
Collapse
|
42
|
CaSee: A lightning transfer-learning model directly used to discriminate cancer/normal cells from scRNA-seq. Oncogene 2022; 41:4866-4876. [PMID: 36192479 DOI: 10.1038/s41388-022-02478-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Revised: 09/13/2022] [Accepted: 09/20/2022] [Indexed: 11/08/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) is one of the most efficient technologies for human tumor research. However, data analysis is still faced with technical challenges, especially the difficulty in efficiently and accurately discriminating cancer/normal cells in the scRNA-seq expression matrix. If we can address these challenges, we can have a deeper understanding of the intratumoral and intertumoral heterogeneity. In this study, we developed a cancer/normal cell discrimination pipeline called pan-Cancer Seeker (CaSee) devoted to scRNA-seq expression matrix, which is based on the traditional high-quality pan-cancer bulk sequencing data using transfer learning. CaSee is the first tool directly used to discriminate cancer/normal cells in the scRNA-seq expression matrix, with much wider application fields and higher efficiency than copy number variation (CNV) method which requires corresponding reference cells. CaSee is user-friendly and can adapt to a variety of data sources, including but not limited to scRNA tissue sequencing data, scRNA cell line sequencing data, scRNA xenograft cell sequencing data and scRNA circulating tumor cell sequencing data. It is compatible with mainstream sequencing technology platforms, 10× Genomics Chromium, Smart-seq2, and Microwell-seq. Here, CaSee pipeline exhibited excellent performance in the multicenter data evaluation of 11 retrospective cohorts and one independent dataset, with an average discrimination accuracy of 96.69%. In general, the development of a deep-learning based, pan-cancer cell discrimination model, CaSee, to distinguish cancer cells from normal cells will be compelling to researchers working in the genomics, cancer, and single-cell fields.
Collapse
|
43
|
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep learning algorithms for scRNA-seq data analysis.
Collapse
Affiliation(s)
- Matthew Brendel
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA; Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA.
| | - Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Hao Zhang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Olivier Elemento
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA.
| |
Collapse
|
44
|
Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nat Commun 2022; 13:4429. [PMID: 35908020 PMCID: PMC9338929 DOI: 10.1038/s41467-022-32111-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2022] [Accepted: 07/18/2022] [Indexed: 12/19/2022] Open
Abstract
Spatially resolved transcriptomics provides genetic information in space toward elucidation of the spatial architecture in intact organs and the spatially resolved cell-cell communications mediating tissue homeostasis, development, and disease. To facilitate inference of spatially resolved cell-cell communications, we here present SpaTalk, which relies on a graph network and knowledge graph to model and score the ligand-receptor-target signaling network between spatially proximal cells by dissecting cell-type composition through a non-negative linear model and spatial mapping between single-cell transcriptomic and spatially resolved transcriptomic data. The benchmarked performance of SpaTalk on public single-cell spatial transcriptomic datasets is superior to that of existing inference methods. Then we apply SpaTalk to STARmap, Slide-seq, and 10X Visium data, revealing the in-depth communicative mechanisms underlying normal and disease tissues with spatial structure. SpaTalk can uncover spatially resolved cell-cell communications for single-cell and spot-based spatially resolved transcriptomic data universally, providing valuable insights into spatial inter-cellular tissue dynamics. Cell-cell communication is a vital feature involving numerous biological processes. Here, the authors develop SpaTalk, a cell-cell communication inference method using knowledge graph for spatially resolved transcriptomic data, providing valuable insights into spatial intercellular tissue dynamics.
Collapse
|
45
|
Wu X, Zhou Y. GE-Impute: graph embedding-based imputation for single-cell RNA-seq data. Brief Bioinform 2022; 23:6651303. [PMID: 35901457 DOI: 10.1093/bib/bbac313] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Revised: 06/27/2022] [Accepted: 07/11/2022] [Indexed: 11/12/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has been widely used to depict gene expression profiles at the single-cell resolution. However, its relatively high dropout rate often results in artificial zero expressions of genes and therefore compromised reliability of results. To overcome such unwanted sparsity of scRNA-seq data, several imputation algorithms have been developed to recover the single-cell expression profiles. Here, we propose a novel approach, GE-Impute, to impute the dropout zeros in scRNA-seq data with graph embedding-based neural network model. GE-Impute learns the neural graph representation for each cell and reconstructs the cell-cell similarity network accordingly, which enables better imputation of dropout zeros based on the more accurately allocated neighbors in the similarity network. Gene expression correlation analysis between true expression data and simulated dropout data suggests significantly better performance of GE-Impute on recovering dropout zeros for both droplet- and plated-based scRNA-seq data. GE-Impute also outperforms other imputation methods in identifying differentially expressed genes and improving the unsupervised clustering on datasets from various scRNA-seq techniques. Moreover, GE-Impute enhances the identification of marker genes, facilitating the cell type assignment of clusters. In trajectory analysis, GE-Impute improves time-course scRNA-seq data analysis and reconstructing differentiation trajectory. The above results together demonstrate that GE-Impute could be a useful method to recover the single-cell expression profiles, thus enabling better biological interpretation of scRNA-seq data. GE-Impute is implemented in Python and is freely available at https://github.com/wxbCaterpillar/GE-Impute.
Collapse
Affiliation(s)
- Xiaobin Wu
- Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China.,MOE Key Laboratory of Molecular Cardiovascular Sciences, Peking University, Beijing, China
| | - Yuan Zhou
- Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China.,MOE Key Laboratory of Molecular Cardiovascular Sciences, Peking University, Beijing, China
| |
Collapse
|
46
|
Li Z, Feng H. A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data. Sci Rep 2022; 12:910. [PMID: 35042860 PMCID: PMC8766435 DOI: 10.1038/s41598-021-04473-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2021] [Accepted: 12/21/2021] [Indexed: 02/01/2023] Open
Abstract
The fast-advancing single cell RNA sequencing (scRNA-seq) technology enables researchers to study the transcriptome of heterogeneous tissues at a single cell level. The initial important step of analyzing scRNA-seq data is usually to accurately annotate cells. The traditional approach of annotating cell types based on unsupervised clustering and marker genes is time-consuming and laborious. Taking advantage of the numerous existing scRNA-seq databases, many supervised label assignment methods have been developed. One feature that many label assignment methods shares is to label cells with low confidence as "unassigned." These unassigned cells can be the result of assignment difficulties due to highly similar cell types or caused by the presence of unknown cell types. However, when unknown cell types are not expected, existing methods still label a considerable number of cells as unassigned, which is not desirable. In this work, we develop a neural network-based cell annotation method called NeuCA (Neural network-based Cell Annotation) for scRNA-seq data obtained from well-studied tissues. NeuCA can utilize the hierarchical structure information of the cell types to improve the annotation accuracy, which is especially helpful when data contain closely correlated cell types. We show that NeuCA can achieve more accurate cell annotation results compared with existing methods. Additionally, the applications on eight real datasets show that NeuCA has stable performance for intra- and inter-study annotation, as well as cross-condition annotation. NeuCA is freely available as an R/Bioconductor package at https://bioconductor.org/packages/NeuCA .
Collapse
Affiliation(s)
- Ziyi Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Hao Feng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA.
| |
Collapse
|