1
|
Xu H, Ye Y, Duan R, Gao Y, Hu Y, Gao L. Beaconet: A Reference-Free Method for Integrating Multiple Batches of Single-Cell Transcriptomic Data in Original Molecular Space. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024:e2306770. [PMID: 38711214 DOI: 10.1002/advs.202306770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/17/2023] [Revised: 04/02/2024] [Indexed: 05/08/2024]
Abstract
Integrating multiple single-cell datasets is essential for the comprehensive understanding of cell heterogeneity. Batch effect is the undesired systematic variations among technologies or experimental laboratories that distort biological signals and hinder the integration of single-cell datasets. However, existing methods typically rely on a selected dataset as a reference, leading to inconsistent integration performance using different references, or embed cells into uninterpretable low-dimensional feature space. To overcome these limitations, a reference-free method, Beaconet, for integrating multiple single-cell transcriptomic datasets in original molecular space by aligning the global distribution of each batch using an adversarial correction network is presented. Through extensive comparisons with 13 state-of-the-art methods, it is demonstrated that Beaconet can effectively remove batch effect while preserving biological variations and is superior to existing unsupervised methods using all possible references in overall performance. Furthermore, Beaconet performs integration in the original molecular feature space, enabling the characterization of cell types and downstream differential expression analysis directly using integrated data with gene-expression features. Additionally, when applying to large-scale atlas data integration, Beaconet shows notable advantages in both time- and space-efficiencies. In summary, Beaconet serves as an effective and efficient batch effect removal tool that can facilitate the integration of single-cell datasets in a reference-free and molecular feature-preserved mode.
Collapse
Affiliation(s)
- Han Xu
- School of Computer Science and Technology, Xidian University, Xi'an, 710126, China
| | - Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, 710126, China
| | - Ran Duan
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 102616, China
| | - Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, V1V 1V7, Canada
| | - Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi'an, 710126, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, 710126, China
| |
Collapse
|
2
|
Cui X, Chen X, Li Z, Gao Z, Chen S, Jiang R. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. NATURE COMPUTATIONAL SCIENCE 2024; 4:346-359. [PMID: 38730185 DOI: 10.1038/s43588-024-00625-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/25/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024]
Abstract
Single-cell epigenomic data has been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models-especially variational autoencoders-have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
Collapse
Affiliation(s)
- Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China.
| |
Collapse
|
3
|
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024] Open
Abstract
Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Collapse
Affiliation(s)
- Yichuan Cao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Xiamiao Zhao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Qun Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
| | - Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| |
Collapse
|
4
|
Zeng Y, Luo M, Shangguan N, Shi P, Feng J, Xu J, Chen K, Lu Y, Yu W, Yang Y. Deciphering cell types by integrating scATAC-seq data with genome sequences. NATURE COMPUTATIONAL SCIENCE 2024; 4:285-298. [PMID: 38600256 DOI: 10.1038/s43588-024-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 03/18/2024] [Indexed: 04/12/2024]
Abstract
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
Collapse
Affiliation(s)
- Yuansong Zeng
- School of Big Data and Software Engineering, Chongqing University, Chongqing, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Mai Luo
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Ningyuan Shangguan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Peiyu Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Junxi Feng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Weijiang Yu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou, China.
| |
Collapse
|
5
|
Li S, Li Y, Sun Y, Li Y, Chen X, Tang S, Chen S. EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data. Bioinformatics 2024; 40:btae191. [PMID: 38588573 PMCID: PMC11037479 DOI: 10.1093/bioinformatics/btae191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 03/02/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024] Open
Abstract
SUMMARY Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.
Collapse
Affiliation(s)
- Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Yuxi Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Yu Sun
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yaru Li
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaoyang Chen
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| |
Collapse
|
6
|
Zhang W, Cui Y, Liu B, Loza M, Park SJ, Nakai K. HyGAnno: hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data. Brief Bioinform 2024; 25:bbae152. [PMID: 38581422 PMCID: PMC10998639 DOI: 10.1093/bib/bbae152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 02/19/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.
Collapse
Affiliation(s)
- Weihang Zhang
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Yang Cui
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Bowen Liu
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Martin Loza
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Sung-Joon Park
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Kenta Nakai
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| |
Collapse
|
7
|
Gong M, Yu Y, Wang Z, Zhang J, Wang X, Fu C, Zhang Y, Wang X. scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis. Comput Biol Med 2024; 171:108230. [PMID: 38442554 DOI: 10.1016/j.compbiomed.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding intercellular heterogeneity regulation. Despite the progress in computational methods for analyzing this data, there is still a lack of a comprehensive analytical framework and a user-friendly online analysis tool. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto), to overcome the challenge. Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of DNA sequence's grammar by being pre-trained on unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task of scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. To our knowledge, this work is expected to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
Collapse
Affiliation(s)
- Meiqin Gong
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China
| | - Yun Yu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Zixuan Wang
- College of Electronics and information Engineering, SiChuan University, Chengdu, 610065, China
| | - Junming Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiongyi Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Cheng Fu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiaodong Wang
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
| |
Collapse
|
8
|
Wang X, Chai Z, Li S, Liu Y, Li C, Jiang Y, Liu Q. CTISL: a dynamic stacking multi-class classification approach for identifying cell types from single-cell RNA-seq data. Bioinformatics 2024; 40:btae063. [PMID: 38317054 PMCID: PMC10873586 DOI: 10.1093/bioinformatics/btae063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 02/07/2024] Open
Abstract
MOTIVATION Effective identification of cell types is of critical importance in single-cell RNA-sequencing (scRNA-seq) data analysis. To date, many supervised machine learning-based predictors have been implemented to identify cell types from scRNA-seq datasets. Despite the technical advances of these state-of-the-art tools, most existing predictors were single classifiers, of which the performances can still be significantly improved. It is therefore highly desirable to employ the ensemble learning strategy to develop more accurate computational models for robust and comprehensive identification of cell types on scRNA-seq datasets. RESULTS We propose a two-layer stacking model, termed CTISL (Cell Type Identification by Stacking ensemble Learning), which integrates multiple classifiers to identify cell types. In the first layer, given a reference scRNA-seq dataset with known cell types, CTISL dynamically combines multiple cell-type-specific classifiers (i.e. support-vector machine and logistic regression) as the base learners to deliver the outcomes for the input of a meta-classifier in the second layer. We conducted a total of 24 benchmarking experiments on 17 human and mouse scRNA-seq datasets to evaluate and compare the prediction performance of CTISL and other state-of-the-art predictors. The experiment results demonstrate that CTISL achieves superior or competitive performance compared to these state-of-the-art approaches. We anticipate that CTISL can serve as a useful and reliable tool for cost-effective identification of cell types from scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION The webserver and source code are freely available at http://bigdata.biocie.cn/CTISLweb/home and https://zenodo.org/records/10568906, respectively.
Collapse
Affiliation(s)
- Xiao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Ziyi Chai
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Shaohua Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Chen Li
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yu Jiang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Shaanxi Engineering Research Center of Agricultural Information Intelligent Perception and Analysis, Northwest A&F University, Yangling 712100, China
| |
Collapse
|
9
|
Lu C, Wei Y, Abbas M, Agula H, Wang E, Meng Z, Zhang R. Application of Single-Cell Assay for Transposase-Accessible Chromatin with High Throughput Sequencing in Plant Science: Advances, Technical Challenges, and Prospects. Int J Mol Sci 2024; 25:1479. [PMID: 38338756 PMCID: PMC10855595 DOI: 10.3390/ijms25031479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] Open
Abstract
The Single-cell Assay for Transposase-Accessible Chromatin with high throughput sequencing (scATAC-seq) has gained increasing popularity in recent years, allowing for chromatin accessibility to be deciphered and gene regulatory networks (GRNs) to be inferred at single-cell resolution. This cutting-edge technology now enables the genome-wide profiling of chromatin accessibility at the cellular level and the capturing of cell-type-specific cis-regulatory elements (CREs) that are masked by cellular heterogeneity in bulk assays. Additionally, it can also facilitate the identification of rare and new cell types based on differences in chromatin accessibility and the charting of cellular developmental trajectories within lineage-related cell clusters. Due to technical challenges and limitations, the data generated from scATAC-seq exhibit unique features, often characterized by high sparsity and noise, even within the same cell type. To address these challenges, various bioinformatic tools have been developed. Furthermore, the application of scATAC-seq in plant science is still in its infancy, with most research focusing on root tissues and model plant species. In this review, we provide an overview of recent progress in scATAC-seq and its application across various fields. We first conduct scATAC-seq in plant science. Next, we highlight the current challenges of scATAC-seq in plant science and major strategies for cell type annotation. Finally, we outline several future directions to exploit scATAC-seq technologies to address critical challenges in plant science, ranging from plant ENCODE(The Encyclopedia of DNA Elements) project construction to GRN inference, to deepen our understanding of the roles of CREs in plant biology.
Collapse
Affiliation(s)
- Chao Lu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
- Key Laboratory of Herbage & Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Yunxiao Wei
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
| | - Mubashir Abbas
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
| | - Hasi Agula
- Key Laboratory of Herbage & Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Edwin Wang
- Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Zhigang Meng
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
| | - Rui Zhang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
| |
Collapse
|
10
|
Yuan M, Wan H, Wang Z, Guo Q, Deng M. SPANN: annotating single-cell resolution spatial transcriptome data with scRNA-seq data. Brief Bioinform 2024; 25:bbad533. [PMID: 38279647 PMCID: PMC10818138 DOI: 10.1093/bib/bbad533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 11/13/2023] [Accepted: 12/19/2023] [Indexed: 01/28/2024] Open
Abstract
MOTIVATION The rapid development of spatial transcriptome technologies has enabled researchers to acquire single-cell-level spatial data at an affordable price. However, computational analysis tools, such as annotation tools, tailored for these data are still lacking. Recently, many computational frameworks have emerged to integrate single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics datasets. While some frameworks can utilize well-annotated scRNA-seq data to annotate spatial expression patterns, they overlook critical aspects. First, existing tools do not explicitly consider cell type mapping when aligning the two modalities. Second, current frameworks lack the capability to detect novel cells, which remains a key interest for biologists. RESULTS To address these problems, we propose an annotation method for spatial transcriptome data called SPANN. The main tasks of SPANN are to transfer cell-type labels from well-annotated scRNA-seq data to newly generated single-cell resolution spatial transcriptome data and discover novel cells from spatial data. The major innovations of SPANN come from two aspects: SPANN automatically detects novel cells from unseen cell types while maintaining high annotation accuracy over known cell types. SPANN finds a mapping between spatial transcriptome samples and RNA data prototypes and thus conducts cell-type-level alignment. Comprehensive experiments using datasets from various spatial platforms demonstrate SPANN's capabilities in annotating known cell types and discovering novel cell states within complex tissue contexts. AVAILABILITY The source code of SPANN can be accessed at https://github.com/ddb-qiwang/SPANN-torch. CONTACT dengmh@math.pku.edu.cn.
Collapse
Affiliation(s)
- Musu Yuan
- Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
| | - Hui Wan
- School of Mathematical Sciences, Peking University, Yiheyuan Road, 100871, Beijing, China
| | - Zihao Wang
- Biomedical Interdisciplinary Research Center, Peking University, Yiheyuan Road, 100871, Beijing, China
| | - Qirui Guo
- Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
| | - Minghua Deng
- Center for Quantitative Biology, Peking University, Yiheyuan Road, 100871, Beijing, China
- School of Mathematical Sciences, Peking University, Yiheyuan Road, 100871, Beijing, China
- Center for Statistical Science, Peking University, Yiheyuan Road, 100871, Beijing, China
- Biomedical Interdisciplinary Research Center, Peking University, Yiheyuan Road, 100871, Beijing, China
| |
Collapse
|
11
|
Qian FC, Zhou LW, Zhu YB, Li YY, Yu ZM, Feng CC, Fang QL, Zhao Y, Cai FH, Wang QY, Tang HF, Li CQ. scATAC-Ref: a reference of scATAC-seq with known cell labels in multiple species. Nucleic Acids Res 2024; 52:D285-D292. [PMID: 37897340 PMCID: PMC10767920 DOI: 10.1093/nar/gkad924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 09/14/2023] [Accepted: 10/16/2023] [Indexed: 10/30/2023] Open
Abstract
Chromatin accessibility profiles at single cell resolution can reveal cell type-specific regulatory programs, help dissect highly specialized cell functions and trace cell origin and evolution. Accurate cell type assignment is critical for effectively gaining biological and pathological insights, but is difficult in scATAC-seq. Hence, by extensively reviewing the literature, we designed scATAC-Ref (https://bio.liclab.net/scATAC-Ref/), a manually curated scATAC-seq database aimed at providing a comprehensive, high-quality source of chromatin accessibility profiles with known cell labels across broad cell types. Currently, scATAC-Ref comprises 1 694 372 cells with known cell labels, across various biological conditions, >400 cell/tissue types and five species. We used uniform system environment and software parameters to perform comprehensive downstream analysis on these chromatin accessibility profiles with known labels, including gene activity score, TF enrichment score, differential chromatin accessibility regions, pathway/GO term enrichment analysis and co-accessibility interactions. The scATAC-Ref also provided a user-friendly interface to query, browse and visualize cell types of interest, thereby providing a valuable resource for exploring epigenetic regulation in different tissues and cell types.
Collapse
Affiliation(s)
- Feng-Cui Qian
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Li-Wei Zhou
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Yan-Bing Zhu
- Beijing Clinical Research Institute, Beijing Friendship Hospital, Capital Medical University, Beijing, China
| | - Yan-Yu Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163319, China
| | - Zheng-Min Yu
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Chen-Chen Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163319, China
| | - Qiao-Li Fang
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Yu Zhao
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Fu-Hong Cai
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Qiu-Yu Wang
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Hui-Fang Tang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Chun-Quan Li
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| |
Collapse
|
12
|
Shi X, Yang Y, Ma X, Zhou Y, Guo Z, Wang C, Liu J. Probabilistic cell/domain-type assignment of spatial transcriptomics data with SpatialAnno. Nucleic Acids Res 2023; 51:e115. [PMID: 37941153 PMCID: PMC10711557 DOI: 10.1093/nar/gkad1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 10/03/2023] [Accepted: 10/20/2023] [Indexed: 11/10/2023] Open
Abstract
In the analysis of both single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data, classifying cells/spots into cell/domain types is an essential analytic step for many secondary analyses. Most of the existing annotation methods have been developed for scRNA-seq datasets without any consideration of spatial information. Here, we present SpatialAnno, an efficient and accurate annotation method for spatial transcriptomics datasets, with the capability to effectively leverage a large number of non-marker genes as well as 'qualitative' information about marker genes without using a reference dataset. Uniquely, SpatialAnno estimates low-dimensional embeddings for a large number of non-marker genes via a factor model while promoting spatial smoothness among neighboring spots via a Potts model. Using both simulated and four real spatial transcriptomics datasets from the 10x Visium, ST, Slide-seqV1/2, and seqFISH platforms, we showcase the method's improved spatial annotation accuracy, including its robustness to the inclusion of marker genes for irrelevant cell/domain types and to various degrees of marker gene misspecification. SpatialAnno is computationally scalable and applicable to SRT datasets from different platforms. Furthermore, the estimated embeddings for cellular biological effects facilitate many downstream analyses.
Collapse
Affiliation(s)
- Xingjie Shi
- KLATASDS-MOE, Academy of Statistics and Interdisciplinary Sciences, School of Statistics, East China Normal University, Shanghai 200062, China
| | - Yi Yang
- The Key Laboratory of Developmental Genes and Human Disease, School of Life Science and Technology, Southeast University, Nanjing 210018, China
| | - Xiaohui Ma
- College of Life Sciences, Nanjing University, Nanjing 210033, China
| | - Yong Zhou
- KLATASDS-MOE, Academy of Statistics and Interdisciplinary Sciences, School of Statistics, East China Normal University, Shanghai 200062, China
| | - Zhenxing Guo
- School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen 518172, China
| | - Chaolong Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430070, China
| | - Jin Liu
- School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen 518172, China
| |
Collapse
|
13
|
Li K, Chen X, Song S, Hou L, Chen S, Jiang R. Cofea: correlation-based feature selection for single-cell chromatin accessibility data. Brief Bioinform 2023; 25:bbad458. [PMID: 38113078 PMCID: PMC10782922 DOI: 10.1093/bib/bbad458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Revised: 11/19/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open
Abstract
Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to identification of cell type-specific peaks and candidate enhancers, as well as pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological process.
Collapse
Affiliation(s)
- Keyi Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
| | - Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| |
Collapse
|
14
|
Zhang W, Jiang R, Chen S, Wang Y. scIBD: a self-supervised iterative-optimizing model for boosting the detection of heterotypic doublets in single-cell chromatin accessibility data. Genome Biol 2023; 24:225. [PMID: 37814314 PMCID: PMC10561408 DOI: 10.1186/s13059-023-03072-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2023] [Accepted: 09/22/2023] [Indexed: 10/11/2023] Open
Abstract
Application of the widely used droplet-based microfluidic technologies in single-cell sequencing often yields doublets, introducing bias to downstream analyses. Especially, doublet-detection methods for single-cell chromatin accessibility sequencing (scCAS) data have multiple assay-specific challenges. Therefore, we propose scIBD, a self-supervised iterative-optimizing model for boosting heterotypic doublet detection in scCAS data. scIBD introduces an adaptive strategy to simulate high-confident heterotypic doublets and self-supervise for doublet-detection in an iteratively optimizing manner. Comprehensive benchmarking on various simulated and real datasets demonstrates the outperformance and robustness of scIBD. Moreover, the downstream biological analyses suggest the efficacy of doublet-removal by scIBD.
Collapse
Affiliation(s)
- Wenhao Zhang
- Department of Automation, Xiamen University, Xiamen, 361000, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361000, Fujian, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| | - Ying Wang
- Department of Automation, Xiamen University, Xiamen, 361000, Fujian, China.
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361000, Fujian, China.
- Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen, 361005, Fujian, China.
| |
Collapse
|
15
|
Li Z, Chen X, Zhang X, Jiang R, Chen S. Latent feature extraction with a prior-based self-attention framework for spatial transcriptomics. Genome Res 2023; 33:1757-1773. [PMID: 37903634 PMCID: PMC10691543 DOI: 10.1101/gr.277891.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Accepted: 09/19/2023] [Indexed: 11/01/2023]
Abstract
Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increase the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST, which effectively integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and enables scalable application via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference and pseudotime analysis. Also, we highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Compared with existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, which expands the applicability of ST technology.
Collapse
Affiliation(s)
- Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| |
Collapse
|
16
|
Tian L, Xie Y, Xie Z, Tian J, Tian W. AtacAnnoR: a reference-based annotation tool for single cell ATAC-seq data. Brief Bioinform 2023; 24:bbad268. [PMID: 37497729 DOI: 10.1093/bib/bbad268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Revised: 06/14/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023] Open
Abstract
Here, we present AtacAnnoR, a two-round annotation method for scATAC-seq data using well-annotated scRNA-seq data as reference. We evaluate AtacAnnoR's performance against six competing methods on 11 benchmark datasets. Our results show that AtacAnnoR achieves the highest mean accuracy and the highest mean balanced accuracy and performs particularly well when unpaired scRNA-seq data are used as the reference. Furthermore, AtacAnnoR implements a 'Combine and Discard' strategy to further improve annotation accuracy when annotations of multiple references are available. AtacAnnoR has been implemented in an R package and can be directly integrated into currently popular scATAC-seq analysis pipelines.
Collapse
Affiliation(s)
- Lejin Tian
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Yunxiao Xie
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Zhaobin Xie
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
| | | | - Weidong Tian
- State Key Laboratory of Genetic Engineering, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Children's Hospital of Fudan University, Shanghai, China
- Children's Hospital of Shandong University, Jinan, China
| |
Collapse
|
17
|
Logotheti S, Papadaki E, Zolota V, Logothetis C, Vrahatis AG, Soundararajan R, Tzelepi V. Lineage Plasticity and Stemness Phenotypes in Prostate Cancer: Harnessing the Power of Integrated "Omics" Approaches to Explore Measurable Metrics. Cancers (Basel) 2023; 15:4357. [PMID: 37686633 PMCID: PMC10486655 DOI: 10.3390/cancers15174357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 08/21/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023] Open
Abstract
Prostate cancer (PCa), the most frequent and second most lethal cancer type in men in developed countries, is a highly heterogeneous disease. PCa heterogeneity, therapy resistance, stemness, and lethal progression have been attributed to lineage plasticity, which refers to the ability of neoplastic cells to undergo phenotypic changes under microenvironmental pressures by switching between developmental cell states. What remains to be elucidated is how to identify measurements of lineage plasticity, how to implement them to inform preclinical and clinical research, and, further, how to classify patients and inform therapeutic strategies in the clinic. Recent research has highlighted the crucial role of next-generation sequencing technologies in identifying potential biomarkers associated with lineage plasticity. Here, we review the genomic, transcriptomic, and epigenetic events that have been described in PCa and highlight those with significance for lineage plasticity. We further focus on their relevance in PCa research and their benefits in PCa patient classification. Finally, we explore ways in which bioinformatic analyses can be used to determine lineage plasticity based on large omics analyses and algorithms that can shed light on upstream and downstream events. Most importantly, an integrated multiomics approach may soon allow for the identification of a lineage plasticity signature, which would revolutionize the molecular classification of PCa patients.
Collapse
Affiliation(s)
- Souzana Logotheti
- Department of Pathology, University of Patras, 26504 Patras, Greece; (S.L.); (E.P.); (V.Z.)
| | - Eugenia Papadaki
- Department of Pathology, University of Patras, 26504 Patras, Greece; (S.L.); (E.P.); (V.Z.)
- Department of Informatics, Ionian University, 49100 Corfu, Greece;
| | - Vasiliki Zolota
- Department of Pathology, University of Patras, 26504 Patras, Greece; (S.L.); (E.P.); (V.Z.)
| | - Christopher Logothetis
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA;
| | | | - Rama Soundararajan
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Vasiliki Tzelepi
- Department of Pathology, University of Patras, 26504 Patras, Greece; (S.L.); (E.P.); (V.Z.)
| |
Collapse
|
18
|
Li C, Chen X, Chen S, Jiang R, Zhang X. simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data. Bioinformatics 2023; 39:btad453. [PMID: 37494428 PMCID: PMC10394124 DOI: 10.1093/bioinformatics/btad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Revised: 06/25/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open
Abstract
MOTIVATION Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate to provide credible ground-truth labels for method evaluation. RESULTS We present simCAS, an embedding-based simulator, for generating high-fidelity scCAS data from both cell- and peak-wise embeddings. We demonstrate simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels more than cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate simCAS will be a reliable and flexible simulator for evaluating the ongoing computational methods applied on scCAS data. AVAILABILITY AND IMPLEMENTATION simCAS is freely available at https://github.com/Chen-Li-17/simCAS.
Collapse
Affiliation(s)
- Chen Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaoyang Chen
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Rui Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, School of Life Sciences and School of Medicine, Tsinghua University, Beijing 100084, China
| |
Collapse
|
19
|
Zhang F, Jiao H, Wang Y, Yang C, Li L, Wang Z, Tong R, Zhou J, Shen J, Li L. InferLoop: leveraging single-cell chromatin accessibility for the signal of chromatin loop. Brief Bioinform 2023; 24:7150740. [PMID: 37139553 DOI: 10.1093/bib/bbad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2022] [Revised: 03/21/2023] [Accepted: 04/10/2023] [Indexed: 05/05/2023] Open
Abstract
Deciphering cell-type-specific 3D structures of chromatin is challenging. Here, we present InferLoop, a novel method for inferring the strength of chromatin interaction using single-cell chromatin accessibility data. The workflow of InferLoop is, first, to conduct signal enhancement by grouping nearby cells into bins, and then, for each bin, leverage accessibility signals for loop signals using a newly constructed metric that is similar to the perturbation of the Pearson correlation coefficient. In this study, we have described three application scenarios of InferLoop, including the inference of cell-type-specific loop signals, the prediction of gene expression levels and the interpretation of intergenic loci. The effectiveness and superiority of InferLoop over other methods in those three scenarios are rigorously validated by using the single-cell 3D genome structure data of human brain cortex and human blood, the single-cell multi-omics data of human blood and mouse brain cortex, and the intergenic loci in the GWAS Catalog database as well as the GTEx database, respectively. In addition, InferLoop can be applied to predict loop signals of individual spots using the spatial chromatin accessibility data of mouse embryo. InferLoop is available at https://github.com/jumphone/inferloop.
Collapse
Affiliation(s)
- Feng Zhang
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Huiyuan Jiao
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Yihao Wang
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200025, China
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai 201109, China
| | - Chen Yang
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Linying Li
- Department of Central Laboratory, Shanghai Children's Hospital, School of medicine, Shanghai Jiao Tong University, Shanghai 200062, China
| | - Zhiming Wang
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Ran Tong
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| | - Junmei Zhou
- Department of Central Laboratory, Shanghai Children's Hospital, School of medicine, Shanghai Jiao Tong University, Shanghai 200062, China
| | - Jianfeng Shen
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200025, China
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai 201109, China
| | - Lingjie Li
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
| |
Collapse
|
20
|
Ma W, Lu J, Wu H. Cellcano: supervised cell type identification for single cell ATAC-seq data. Nat Commun 2023; 14:1864. [PMID: 37012226 PMCID: PMC10070275 DOI: 10.1038/s41467-023-37439-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Accepted: 03/15/2023] [Indexed: 04/05/2023] Open
Abstract
Computational cell type identification is a fundamental step in single-cell omics data analysis. Supervised celltyping methods have gained increasing popularity in single-cell RNA-seq data because of the superior performance and the availability of high-quality reference datasets. Recent technological advances in profiling chromatin accessibility at single-cell resolution (scATAC-seq) have brought new insights to the understanding of epigenetic heterogeneity. With continuous accumulation of scATAC-seq datasets, supervised celltyping method specifically designed for scATAC-seq is in urgent need. Here we develop Cellcano, a computational method based on a two-round supervised learning algorithm to identify cell types from scATAC-seq data. The method alleviates the distributional shift between reference and target data and improves the prediction performance. After systematically benchmarking Cellcano on 50 well-designed celltyping tasks from various datasets, we show that Cellcano is accurate, robust, and computationally efficient. Cellcano is well-documented and freely available at https://marvinquiet.github.io/Cellcano/ .
Collapse
Affiliation(s)
- Wenjing Ma
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA
| | - Jiaying Lu
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA
| | - Hao Wu
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, P. R. China.
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA, 30322, USA.
| |
Collapse
|
21
|
Ding K, Sun S, Luo Y, Long C, Zhai J, Zhai Y, Wang G. PlantCADB: A Comprehensive Plant Chromatin Accessibility Database. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:311-323. [PMID: 36328151 PMCID: PMC10626055 DOI: 10.1016/j.gpb.2022.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/08/2022] [Revised: 09/25/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022]
Abstract
Chromatin accessibility landscapes are essential for detecting regulatory elements, illustrating the corresponding regulatory networks, and, ultimately, understanding the molecular basis underlying key biological processes. With the advancement of sequencing technologies, a large volume of chromatin accessibility data has been accumulated and integrated for humans and other mammals. These data have greatly advanced the study of disease pathogenesis, cancer survival prognosis, and tissue development. To advance the understanding of molecular mechanisms regulating plant key traits and biological processes, we developed a comprehensive plant chromatin accessibility database (PlantCADB) from 649 samples of 37 species. These samples are abiotic stress-related (such as heat, cold, drought, and salt; 159 samples), development-related (232 samples), and/or tissue-specific (376 samples). Overall, 18,339,426 accessible chromatin regions (ACRs) were compiled. These ACRs were annotated with genomic information, associated genes, transcription factor footprint, motif, and single-nucleotide polymorphisms (SNPs). Additionally, PlantCADB provides various tools to visualize ACRs and corresponding annotations. It thus forms an integrated, annotated, and analyzed plant-related chromatin accessibility resource, which can aid in better understanding genetic regulatory networks underlying development, important traits, stress adaptations, and evolution.PlantCADB is freely available at https://bioinfor.nefu.edu.cn/PlantCADB/.
Collapse
Affiliation(s)
- Ke Ding
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Shanwen Sun
- College of Life Science, Northeast Forestry University, Harbin 150040, China
| | - Yang Luo
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Chaoyue Long
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Jingwen Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Guohua Wang
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
| |
Collapse
|
22
|
Dong X, Chowdhury S, Victor U, Li X, Qian L. Semi-Supervised Deep Learning for Cell Type Identification From Single-Cell Transcriptomic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1492-1505. [PMID: 35536811 DOI: 10.1109/tcbb.2022.3173587] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis. Deep neural networks have been employed to identify cell types from scRNAseq data with high performance. However, it requires a large mount of individual cells with accurate and unbiased annotated types to train the identification models. Unfortunately, labeling the scRNAseq data is cumbersome and time-consuming as it involves manual inspection of marker genes. To overcome this challenge, we propose a semi-supervised learning model "SemiRNet" to use unlabeled scRNAseq cells and a limited amount of labeled scRNAseq cells to implement cell identification. The proposed model is based on recurrent convolutional neural networks (RCNN), which includes a shared network, a supervised network and an unsupervised network. The proposed model is evaluated on two large scale single-cell transcriptomic datasets. It is observed that the proposed model is able to achieve encouraging performance by learning on the very limited amount of labeled scRNAseq cells together with a large number of unlabeled scRNAseq cells.
Collapse
|
23
|
Gan M, Ma Y. Mapping user interest into hyper-spherical space: A novel POI recommendation method. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
24
|
Zhang Z, Chen S, Lin Z. RefTM: reference-guided topic modeling of single-cell chromatin accessibility data. Brief Bioinform 2023; 24:6895319. [PMID: 36513377 DOI: 10.1093/bib/bbac540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2022] [Revised: 10/27/2022] [Accepted: 11/09/2022] [Indexed: 12/15/2022] Open
Abstract
Single-cell analysis is a valuable approach for dissecting the cellular heterogeneity, and single-cell chromatin accessibility sequencing (scCAS) can profile the epigenetic landscapes for thousands of individual cells. It is challenging to analyze scCAS data, because of its high dimensionality and a higher degree of sparsity compared with scRNA-seq data. Topic modeling in single-cell data analysis can lead to robust identification of the cell types and it can provide insight into the regulatory mechanisms. Reference-guided approach may facilitate the analysis of scCAS data by utilizing the information in existing datasets. We present RefTM (Reference-guided Topic Modeling of single-cell chromatin accessibility data), which not only utilizes the information in existing bulk chromatin accessibility and annotated scCAS data, but also takes advantage of topic models for single-cell data analysis. RefTM simultaneously models: (1) the shared biological variation among reference data and the target scCAS data; (2) the unique biological variation in scCAS data; (3) other variations from known covariates in scCAS data.
Collapse
Affiliation(s)
- Zheng Zhang
- Department of Statistics in the Chinese University of Hong Kong
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC in Nankai university
| | - Zhixiang Lin
- Department of Statistics in the Chinese University of Hong Kong
| |
Collapse
|
25
|
Cao K, Gong Q, Hong Y, Wan L. A unified computational framework for single-cell data integration with optimal transport. Nat Commun 2022; 13:7419. [PMID: 36456571 PMCID: PMC9715710 DOI: 10.1038/s41467-022-35094-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Accepted: 11/18/2022] [Indexed: 12/05/2022] Open
Abstract
Single-cell data integration can provide a comprehensive molecular view of cells. However, how to integrate heterogeneous single-cell multi-omics as well as spatially resolved transcriptomic data remains a major challenge. Here we introduce uniPort, a unified single-cell data integration framework that combines a coupled variational autoencoder (coupled-VAE) and minibatch unbalanced optimal transport (Minibatch-UOT). It leverages both highly variable common and dataset-specific genes for integration to handle the heterogeneity across datasets, and it is scalable to large-scale datasets. uniPort jointly embeds heterogeneous single-cell multi-omics datasets into a shared latent space. It can further construct a reference atlas for gene imputation across datasets. Meanwhile, uniPort provides a flexible label transfer framework to deconvolute heterogeneous spatial transcriptomic data using an optimal transport plan, instead of embedding latent space. We demonstrate the capability of uniPort by applying it to integrate a variety of datasets, including single-cell transcriptomics, chromatin accessibility, and spatially resolved transcriptomic data.
Collapse
Affiliation(s)
- Kai Cao
- grid.484479.2LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China ,grid.410726.60000 0004 1797 8419School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Qiyu Gong
- grid.16821.3c0000 0004 0368 8293Shanghai Institute of Immunology, Faculty of Basic Medicine, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yiguang Hong
- grid.24516.340000000123704535Department of Control Science and Engineering, Tongji University, Shanghai, China
| | - Lin Wan
- grid.484479.2LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China ,grid.410726.60000 0004 1797 8419School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
26
|
Jafari E, Johnson T, Wang Y, Liu Y, Huang K, Wang Y. AIscEA: unsupervised integration of single-cell gene expression and chromatin accessibility via their biological consistency. Bioinformatics 2022; 38:5236-5244. [PMID: 36250795 PMCID: PMC9710555 DOI: 10.1093/bioinformatics/btac683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Revised: 10/07/2022] [Accepted: 10/14/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION The integrative analysis of single-cell gene expression and chromatin accessibility measurements is essential for revealing gene regulation, but it is one of the key challenges in computational biology. Gene expression and chromatin accessibility are measurements from different modalities, and no common features can be directly used to guide integration. Current state-of-the-art methods lack practical solutions for finding heterogeneous clusters. However, previous methods might not generate reliable results when cluster heterogeneity exists. More importantly, current methods lack an effective way to select hyper-parameters under an unsupervised setting. Therefore, applying computational methods to integrate single-cell gene expression and chromatin accessibility measurements remains difficult. RESULTS We introduce AIscEA-Alignment-based Integration of single-cell gene Expression and chromatin Accessibility-a computational method that integrates single-cell gene expression and chromatin accessibility measurements using their biological consistency. AIscEA first defines a ranked similarity score to quantify the biological consistency between cell clusters across measurements. AIscEA then uses the ranked similarity score and a novel permutation test to identify cluster alignment across measurements. AIscEA further utilizes graph alignment for the aligned cell clusters to align the cells across measurements. We compared AIscEA with the competing methods on several benchmark datasets and demonstrated that AIscEA is highly robust to the choice of hyper-parameters and can better handle the cluster heterogeneity problem. Furthermore, AIscEA significantly outperforms the state-of-the-art methods when integrating real-world SNARE-seq and scMultiome-seq datasets in terms of integration accuracy. AVAILABILITY AND IMPLEMENTATION AIscEA is available at https://figshare.com/articles/software/AIscEA_zip/21291135 on FigShare as well as {https://github.com/elhaam/AIscEA} onGitHub. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Elham Jafari
- Computer Science Department, Indiana University, Bloomington, IN 47408, USA
| | - Travis Johnson
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yue Wang
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yunlong Liu
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Kun Huang
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yijie Wang
- Computer Science Department, Indiana University, Bloomington, IN 47408, USA
| |
Collapse
|
27
|
Yang L, Zhang P, Wang Y, Hu G, Guo W, Gu X, Pu L. Plant synthetic epigenomic engineering for crop improvement. SCIENCE CHINA. LIFE SCIENCES 2022; 65:2191-2204. [PMID: 35851940 DOI: 10.1007/s11427-021-2131-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/13/2022] [Accepted: 05/17/2022] [Indexed: 06/15/2023]
Abstract
Efforts have been directed to redesign crops with increased yield, stress adaptability, and nutritional value through synthetic biology-the application of engineering principles to biology. A recent expansion in our understanding of how epigenetic mechanisms regulate plant development and stress responses has unveiled a new set of resources that can be harnessed to develop improved crops, thus heralding the promise of "synthetic epigenetics." In this review, we summarize the latest advances in epigenetic regulation and highlight how innovative sequencing techniques, epigenetic editing, and deep learning-driven predictive tools can rapidly extend these insights. We also proposed the future directions of synthetic epigenetics for the development of engineered smart crops that can actively monitor and respond to internal and external cues throughout their life cycles.
Collapse
Affiliation(s)
- Liwen Yang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Pingxian Zhang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yifan Wang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Guihua Hu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Weijun Guo
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Xiaofeng Gu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
| | - Li Pu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
| |
Collapse
|
28
|
Gu Y, Zheng S, Yin Q, Jiang R, Li J. REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction. Comput Biol Med 2022; 150:106127. [PMID: 36182762 DOI: 10.1016/j.compbiomed.2022.106127] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2022] [Revised: 07/27/2022] [Accepted: 09/18/2022] [Indexed: 11/03/2022]
Abstract
Computational drug repositioning is an effective way to find new indications for existing drugs, thus can accelerate drug development and reduce experimental costs. Recently, various deep learning-based repurposing methods have been established to identify the potential drug-disease associations (DDA). However, effective utilization of the relations of biological entities to capture the biological interactions to enhance the drug-disease association prediction is still challenging. To resolve the above problem, we proposed a heterogeneous graph neural network called REDDA (Relations-Enhanced Drug-Disease Association prediction). Assembled with three attention mechanisms, REDDA can sequentially learn drug/disease representations by a general heterogeneous graph convolutional network-based node embedding block, a topological subnet embedding block, a graph attention block, and a layer attention block. Performance comparisons on our proposed benchmark dataset show that REDDA outperforms 8 advanced drug-disease association prediction methods, achieving relative improvements of 0.76% on the area under the receiver operating characteristic curve (AUC) score and 13.92% on the precision-recall curve (AUPR) score compared to the suboptimal method. On the other benchmark dataset, REDDA also obtains relative improvements of 2.48% on the AUC score and 4.93% on the AUPR score. Specifically, case studies also indicate that REDDA can give valid predictions for the discovery of -new indications for drugs and new therapies for diseases. The overall results provide an inspiring potential for REDDA in the in silico drug development. The proposed benchmark dataset and source code are available in https://github.com/gu-yaowen/REDDA.
Collapse
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
| | - Si Zheng
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
| | - Qijin Yin
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Jiao Li
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China.
| |
Collapse
|
29
|
Wan H, Chen L, Deng M. scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:939-958. [PMID: 36608843 PMCID: PMC10025768 DOI: 10.1016/j.gpb.2022.12.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/21/2022] [Revised: 11/30/2022] [Accepted: 12/11/2022] [Indexed: 01/05/2023]
Abstract
Current cell-type annotation tools for single-cell RNA sequencing (scRNA-seq) data mainly utilize well-annotated source data to help identify cell types in target data. However, on account of privacy preservation, their requirements for raw source data may not always be satisfied. In this case, achieving feature alignment between source and target data explicitly is impossible. Additionally, these methods are barely able to discover the presence of novel cell types. A subjective threshold is often selected by users to detect novel cells. We propose a universal annotation framework for scRNA-seq data called scEMAIL, which automatically detects novel cell types without accessing source data during adaptation. For new cell-type identification, a novel cell-type perception module is designed with three steps. First, an expert ensemble system measures uncertainty of each cell from three complementary aspects. Second, based on this measurement, bimodality tests are applied to detect the presence of new cell types. Third, once assured of their presence, an adaptive threshold via manifold mixup partitions target cells into "known" and "unknown" groups. Model adaptation is then conducted to alleviate the batch effect. We gather multi-order neighborhood messages globally and impose local affinity regularizations on "known" cells. These constraints mitigate wrong classifications of the source model via reliable self-supervised information of neighbors. scEMAIL is accurate and robust under various scenarios in both simulation and real data. It is also flexible to be applied to challenging single-cell ATAC-seq data without loss of superiority. The source code of scEMAIL can be accessed at https://github.com/aster-ww/scEMAIL and https://ngdc.cncb.ac.cn/biocode/tools/BT007335/releases/v1.0.
Collapse
Affiliation(s)
- Hui Wan
- School of Mathematical Sciences, Peking University, Beijing 100871, China
| | - Liang Chen
- Huawei Technologies Co., Ltd., Beijing 100080, China.
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing 100871, China; Center for Statistical Science, Peking University, Beijing 100871, China; Center for Quantitative Biology, Peking University, Beijing 100871, China.
| |
Collapse
|
30
|
Qiao Y, Zhao L, Luo C, Luo Y, Wu Y, Li S, Bu D, Zhao Y. Multi-modality artificial intelligence in digital pathology. Brief Bioinform 2022; 23:6702380. [PMID: 36124675 PMCID: PMC9677480 DOI: 10.1093/bib/bbac367] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2022] [Revised: 07/27/2022] [Accepted: 08/05/2022] [Indexed: 12/14/2022] Open
Abstract
In common medical procedures, the time-consuming and expensive nature of obtaining test results plagues doctors and patients. Digital pathology research allows using computational technologies to manage data, presenting an opportunity to improve the efficiency of diagnosis and treatment. Artificial intelligence (AI) has a great advantage in the data analytics phase. Extensive research has shown that AI algorithms can produce more up-to-date and standardized conclusions for whole slide images. In conjunction with the development of high-throughput sequencing technologies, algorithms can integrate and analyze data from multiple modalities to explore the correspondence between morphological features and gene expression. This review investigates using the most popular image data, hematoxylin-eosin stained tissue slide images, to find a strategic solution for the imbalance of healthcare resources. The article focuses on the role that the development of deep learning technology has in assisting doctors' work and discusses the opportunities and challenges of AI.
Collapse
Affiliation(s)
- Yixuan Qiao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lianhe Zhao
- Corresponding authors: Yi Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences; Shandong First Medical University & Shandong Academy of Medical Sciences. Tel.: +86 10 6260 0822; Fax: +86 10 6260 1356; E-mail: ; Lianhe Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences. Tel.: +86 18513983324; E-mail:
| | - Chunlong Luo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yufan Luo
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Wu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Shengtong Li
- Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Dechao Bu
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Yi Zhao
- Corresponding authors: Yi Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences; Shandong First Medical University & Shandong Academy of Medical Sciences. Tel.: +86 10 6260 0822; Fax: +86 10 6260 1356; E-mail: ; Lianhe Zhao, Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences. Tel.: +86 18513983324; E-mail:
| |
Collapse
|
31
|
Liu Q, Hua K, Zhang X, Wong WH, Jiang R. DeepCAGE: Incorporating Transcription Factors in Genome-wide Prediction of Chromatin Accessibility. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:496-507. [PMID: 35293310 PMCID: PMC9801045 DOI: 10.1016/j.gpb.2021.08.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/31/2021] [Revised: 05/31/2021] [Accepted: 09/27/2021] [Indexed: 01/26/2023]
Abstract
Although computational approaches have been complementing high-throughput biological experiments for the identification of functional regions in the human genome, it remains a great challenge to systematically decipher interactions between transcription factors (TFs) and regulatory elements to achieve interpretable annotations of chromatin accessibility across diverse cellular contexts. To solve this problem, we propose DeepCAGE, a deep learning framework that integrates sequence information and binding statuses of TFs, for the accurate prediction of chromatin accessible regions at a genome-wide scale in a variety of cell types. DeepCAGE takes advantage of a densely connected deep convolutional neural network architecture to automatically learn sequence signatures of known chromatin accessible regions and then incorporates such features with expression levels and binding activities of human core TFs to predict novel chromatin accessible regions. In a series of systematic comparisons with existing methods, DeepCAGE exhibits superior performance in not only the classification but also the regression of chromatin accessibility signals. In a detailed analysis of TF activities, DeepCAGE successfully extracts novel binding motifs and measures the contribution of a TF to the regulation with respect to a specific locus in a certain cell type. When applied to whole-genome sequencing data analysis, our method successfully prioritizes putative deleterious variants underlying a human complex trait and thus provides insights into the understanding of disease-associated genetic variants. DeepCAGE can be downloaded from https://github.com/kimmo1019/DeepCAGE.
Collapse
Affiliation(s)
- Qiao Liu
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China,Department of Statistics, Stanford University, Stanford, CA 94305, USA
| | - Kui Hua
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA,Corresponding authors.
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China,Corresponding authors.
| |
Collapse
|
32
|
Yang M, Lin C, Wang Y, Chen K, Han Y, Zhang H, Li W. Cytokine storm promoting T cell exhaustion in severe COVID-19 revealed by single cell sequencing data analysis. PRECISION CLINICAL MEDICINE 2022; 5:pbac014. [PMID: 35694714 PMCID: PMC9172646 DOI: 10.1093/pcmedi/pbac014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2022] [Revised: 04/28/2022] [Accepted: 05/11/2022] [Indexed: 12/15/2022] Open
Abstract
Background Evidence has suggested that cytokine storms may be associated with T cell exhaustion (TEX) in COVID-19. However, the interaction mechanism between cytokine storms and TEX remains unclear. Methods With the aim of dissecting the molecular relationship of cytokine storms and TEX through single-cell RNA sequencing data analysis, we identified 14 cell types from bronchoalveolar lavage fluid of COVID-19 patients and healthy people. We observed a novel subset of severely exhausted CD8 T cells (Exh T_CD8) that co-expressed multiple inhibitory receptors, and two macrophage subclasses that were the main source of cytokine storms in bronchoalveolar. Results Correlation analysis between cytokine storm level and TEX level suggested that cytokine storms likely promoted TEX in severe COVID-19. Cell–cell communication analysis indicated that cytokines (e.g. CXCL10, CXCL11, CXCL2, CCL2, and CCL3) released by macrophages acted as ligands and significantly interacted with inhibitory receptors (e.g. CXCR3, DPP4, CCR1, CCR2, and CCR5) expressed by Exh T_CD8. These interactions formed the cytokine–receptor axes, which were also verified to be significantly correlated with cytokine storms and TEX in lung squamous cell carcinoma. Conclusions Cytokine storms may promote TEX through cytokine-receptor axes and be associated with poor prognosis in COVID-19. Blocking cytokine-receptor axes may reverse TEX. Our finding provides novel insights into TEX in COVID-19 and new clues for cytokine-targeted immunotherapy development.
Collapse
Affiliation(s)
- Minglei Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Chenghao Lin
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Yanni Wang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Kang Chen
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Yutong Han
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Key Laboratory of Tropical Disease Control of Ministry of Education, Sun Yat-Sen University, Guangzhou 510080, China
| |
Collapse
|
33
|
Xu S, Skarica M, Hwang A, Dai Y, Lee C, Girgenti MJ, Zhang J. Translator: A Transfer Learning Approach to Facilitate Single-Cell ATAC-Seq Data Analysis fr om Reference Dataset. J Comput Biol 2022; 29:619-633. [PMID: 35584295 PMCID: PMC9464368 DOI: 10.1089/cmb.2021.0596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Recent advances in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) have allowed simultaneous epigenetic profiling over thousands of individual cells to dissect the cellular heterogeneity and elucidate regulatory mechanisms at the finest possible resolution. However, scATAC-seq is challenging to model computationally due to the ultra-high dimensionality, low signal-to-noise ratio, complex feature interactions, and high vulnerability to various confounding factors. In this study, we present Translator, an efficient transfer learning approach to capture generalizable chromatin interactions from high-quality (HQ) reference scATAC-seq data to obtain robust cell representations in low-to-moderate quality target scATAC-seq data. We applied Translator on various simulated and real scATAC-seq datasets and demonstrated that Translator could learn more biologically meaningful cell representations than other methods by incorporating information learned from the reference data, thus facilitating various downstream analyses such as clustering and motif enrichment measurements. Moreover, Translator's block-wise deep learning framework can handle nonlinear relationships with restricted connections using fewer parameters to boost computational efficiency through Graphics Processing Unit (GPU) parallelism. Finally, we have implemented Translator as a free software package available for the community to leverage large-scale, HQ reference data to study target scATAC-seq data.
Collapse
Affiliation(s)
- Siwei Xu
- Department of Computer Science, University of California, Irvine, California, USA
| | - Mario Skarica
- Department of Neuroscience, School of Medicine, Yale University, New Haven, Connecticut, USA
| | - Ahyeon Hwang
- Mathematical, Computational, and Systems Biology, University of California, Irvine, California, USA
| | - Yi Dai
- Department of Computer Science, University of California, Irvine, California, USA
| | - Cheyu Lee
- Department of Computer Science, University of California, Irvine, California, USA
| | - Matthew J Girgenti
- Department of Psychiatry, School of Medicine, Yale University, New Haven, Connecticut, USA.,Clinical Neurosciences Division, National Center for PTSD U.S. Department of Veterans Affairs, Washington, DC, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, California, USA
| |
Collapse
|