1
|
Wang C, Acosta D, McNutt M, Bian J, Ma A, Fu H, Ma Q. A single-cell and spatial RNA-seq database for Alzheimer's disease (ssREAD). Nat Commun 2024; 15:4710. [PMID: 38844475 PMCID: PMC11156951 DOI: 10.1038/s41467-024-49133-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 05/21/2024] [Indexed: 06/09/2024] Open
Abstract
Alzheimer's Disease (AD) pathology has been increasingly explored through single-cell and single-nucleus RNA-sequencing (scRNA-seq & snRNA-seq) and spatial transcriptomics (ST). However, the surge in data demands a comprehensive, user-friendly repository. Addressing this, we introduce a single-cell and spatial RNA-seq database for Alzheimer's disease (ssREAD). It offers a broader spectrum of AD-related datasets, an optimized analytical pipeline, and improved usability. The database encompasses 1,053 samples (277 integrated datasets) from 67 AD-related scRNA-seq & snRNA-seq studies, totaling 7,332,202 cells. Additionally, it archives 381 ST datasets from 18 human and mouse brain studies. Each dataset is annotated with details such as species, gender, brain region, disease/control status, age, and AD Braak stages. ssREAD also provides an analysis suite for cell clustering, identification of differentially expressed and spatially variable genes, cell-type-specific marker genes and regulons, and spot deconvolution for integrative analysis. ssREAD is freely available at https://bmblx.bmi.osumc.edu/ssread/ .
Collapse
Affiliation(s)
- Cankun Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Diana Acosta
- Department of Neuroscience, The Ohio State University, Columbus, OH, 43210, USA
| | - Megan McNutt
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32606, USA
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Hongjun Fu
- Department of Neuroscience, The Ohio State University, Columbus, OH, 43210, USA.
- Chronic Brain Injury Program, The Ohio State University, Columbus, OH, 43210, USA.
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
| |
Collapse
|
2
|
Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. FRONTIERS IN PLANT SCIENCE 2024; 15:1421503. [PMID: 38903438 PMCID: PMC11188431 DOI: 10.3389/fpls.2024.1421503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has been proven to be crucial for the construction of high-confidence regulatory networks. With the refinement of these methodologies, they will significantly enhance crop breeding efforts and contribute to global food security.
Collapse
Affiliation(s)
| | | | - Zeyang Ma
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
| |
Collapse
|
3
|
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has instilled a significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes' as the reasoning behind predictions is often unknown and not transparent to the user. This has stimulated an increasing body of research for addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of the recent interpretable deep learning models applied to various single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Collapse
Affiliation(s)
- Manoj M Wagle
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Siqu Long
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Carissa Chen
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Chunlei Liu
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Pengyi Yang
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| |
Collapse
|
4
|
Armingol E, Baghdassarian HM, Lewis NE. The diversification of methods for studying cell-cell interactions and communication. Nat Rev Genet 2024; 25:381-400. [PMID: 38238518 PMCID: PMC11139546 DOI: 10.1038/s41576-023-00685-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/01/2023] [Indexed: 05/20/2024]
Abstract
No cell lives in a vacuum, and the molecular interactions between cells define most phenotypes. Transcriptomics provides rich information to infer cell-cell interactions and communication, thus accelerating the discovery of the roles of cells within their communities. Such research relies heavily on algorithms that infer which cells are interacting and the ligands and receptors involved. Specific pressures on different research niches are driving the evolution of next-generation computational tools, enabling new conceptual opportunities and technological advances. More sophisticated algorithms now account for the heterogeneity and spatial organization of cells, multiple ligand types and intracellular signalling events, and enable the use of larger and more complex datasets, including single-cell and spatial transcriptomics. Similarly, new high-throughput experimental methods are increasing the number and resolution of interactions that can be analysed simultaneously. Here, we explore recent progress in cell-cell interaction research and highlight the diversification of the next generation of tools, which have yielded a rich ecosystem of tools for different applications and are enabling invaluable discoveries.
Collapse
Affiliation(s)
- Erick Armingol
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA.
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA.
| | - Hratch M Baghdassarian
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA
| | - Nathan E Lewis
- Department of Paediatrics, University of California, San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
| |
Collapse
|
5
|
Valous NA, Popp F, Zörnig I, Jäger D, Charoentong P. Graph machine learning for integrated multi-omics analysis. Br J Cancer 2024:10.1038/s41416-024-02706-7. [PMID: 38729996 DOI: 10.1038/s41416-024-02706-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 04/25/2024] [Accepted: 04/26/2024] [Indexed: 05/12/2024] Open
Abstract
Multi-omics experiments at bulk or single-cell resolution facilitate the discovery of hypothesis-generating biomarkers for predicting response to therapy, as well as aid in uncovering mechanistic insights into cellular and microenvironmental processes. Many methods for data integration have been developed for the identification of key elements that explain or predict disease risk or other biological outcomes. The heterogeneous graph representation of multi-omics data provides an advantage for discerning patterns suitable for predictive/exploratory analysis, thus permitting the modeling of complex relationships. Graph-based approaches-including graph neural networks-potentially offer a reliable methodological toolset that can provide a tangible alternative to scientists and clinicians that seek ideas and implementation strategies in the integrated analysis of their omics sets for biomedical research. Graph-based workflows continue to push the limits of the technological envelope, and this perspective provides a focused literature review of research articles in which graph machine learning is utilized for integrated multi-omics data analyses, with several examples that demonstrate the effectiveness of graph-based approaches.
Collapse
Affiliation(s)
- Nektarios A Valous
- Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany.
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.
| | - Ferdinand Popp
- Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
| | - Inka Zörnig
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
| | - Dirk Jäger
- Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
| | - Pornpimol Charoentong
- Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
| |
Collapse
|
6
|
Trimbour R, Deutschmann IM, Cantini L. Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS. Bioinformatics 2024; 40:btae143. [PMID: 38460192 PMCID: PMC11065476 DOI: 10.1093/bioinformatics/btae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 12/20/2023] [Accepted: 03/07/2024] [Indexed: 03/11/2024] Open
Abstract
MOTIVATION The molecular identity of a cell results from a complex interplay between heterogeneous molecular layers. Recent advances in single-cell sequencing technologies have opened the possibility to measure such molecular layers of regulation. RESULTS Here, we present HuMMuS, a new method for inferring regulatory mechanisms from single-cell multi-omics data. Differently from the state-of-the-art, HuMMuS captures cooperation between biological macromolecules and can easily include additional layers of molecular regulation. We benchmarked HuMMuS with respect to the state-of-the-art on both paired and unpaired multi-omics datasets. Our results proved the improvements provided by HuMMuS in terms of transcription factor (TF) targets, TF binding motifs and regulatory regions prediction. Finally, once applied to snmC-seq, scATAC-seq and scRNA-seq data from mouse brain cortex, HuMMuS enabled to accurately cluster scRNA profiles and to identify potential driver TFs. AVAILABILITY AND IMPLEMENTATION HuMMuS is available at https://github.com/cantinilab/HuMMuS.
Collapse
Affiliation(s)
- Remi Trimbour
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
| | - Ina Maria Deutschmann
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
| | - Laura Cantini
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
| |
Collapse
|
7
|
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2307835. [PMID: 38483032 PMCID: PMC11109621 DOI: 10.1002/advs.202307835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill up the existing blank of integrating single-cell proteomics with other omics data. Through systematic benchmarking, it is demonstrated that scmFormer excels in integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving shared information across batchs and distinct biological information. scmFormer achieves 54.5% higher average F1 score compared to the second method in transferring cell-type labels from single-cell transcriptomics to proteomics data. Using COVID-19 datasets, it is presented that scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, it is also proved that scmFormer performs better than existing methods on generating the unmeasured modality and is well-suited for spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
Collapse
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty AgricultureWuhan Botanical GardenChinese Academy of SciencesWuhan430074China
- University of Chinese Academy of SciencesBeijing100049China
| | - De‐Shuang Huang
- Eastern Institute for Advanced StudyEastern Institute of TechnologyNingbo315200China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty AgricultureWuhan Botanical GardenChinese Academy of SciencesWuhan430074China
- Center of Economic BotanyCore Botanical GardensChinese Academy of SciencesWuhan430074China
| |
Collapse
|
8
|
Zhang W, Yu R, Xu Z, Li J, Gao W, Jiang M, Dai Q. scCompressSA: dual-channel self-attention based deep autoencoder model for single-cell clustering by compressing gene-gene interactions. BMC Genomics 2024; 25:423. [PMID: 38684946 PMCID: PMC11059774 DOI: 10.1186/s12864-024-10286-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND Single-cell clustering has played an important role in exploring the molecular mechanisms about cell differentiation and human diseases. Due to highly-stochastic transcriptomics data, accurate detection of cell types is still challenged, especially for RNA-sequencing data from human beings. In this case, deep neural networks have been increasingly employed to mine cell type specific patterns and have outperformed statistic approaches in cell clustering. RESULTS Using cross-correlation to capture gene-gene interactions, this study proposes the scCompressSA method to integrate topological patterns from scRNA-seq data, with support of self-attention (SA) based coefficient compression (CC) block. This SA-based CC block is able to extract and employ static gene-gene interactions from scRNA-seq data. This proposed scCompressSA method has enhanced clustering accuracy in multiple benchmark scRNA-seq datasets by integrating topological and temporal features. CONCLUSION Static gene-gene interactions have been extracted as temporal features to boost clustering performance in single-cell clustering For the scCompressSA method, dual-channel SA based CC block is able to integrate topological features and has exhibited extraordinary detection accuracy compared with previous clustering approaches that only employ temporal patterns.
Collapse
Affiliation(s)
- Wei Zhang
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Ruochen Yu
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Zeqi Xu
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Junnan Li
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Wenhao Gao
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
| | - Mingfeng Jiang
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China.
| | - Qi Dai
- Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China.
| |
Collapse
|
9
|
Wang C, Acosta D, McNutt M, Bian J, Ma A, Fu H, Ma Q. A Single-cell and Spatial RNA-seq Database for Alzheimer's Disease (ssREAD). BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.08.556944. [PMID: 37745592 PMCID: PMC10515769 DOI: 10.1101/2023.09.08.556944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Alzheimer's Disease (AD) pathology has been increasingly explored through single-cell and single-nucleus RNA-sequencing (scRNA-seq & snRNA-seq) and spatial transcriptomics (ST). However, the surge in data demands a comprehensive, user-friendly repository. Addressing this, we introduce a single-cell and spatial RNA-seq database for Alzheimer's disease (ssREAD). It offers a broader spectrum of AD-related datasets, an optimized analytical pipeline, and improved usability. The database encompasses 1,053 samples (277 integrated datasets) from 67 AD-related scRNA-seq & snRNA-seq studies, totaling 7,332,202 cells. Additionally, it archives 381 ST datasets from 18 human and mouse brain studies. Each dataset is annotated with details such as species, gender, brain region, disease/control status, age, and AD Braak stages. ssREAD also provides an analysis suite for cell clustering, identification of differentially expressed and spatially variable genes, cell-type-specific marker genes and regulons, and spot deconvolution for integrative analysis. ssREAD is freely available at https://bmblx.bmi.osumc.edu/ssread/.
Collapse
Affiliation(s)
- Cankun Wang
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| | - Diana Acosta
- Department of Neuroscience, The Ohio State University, OH 43210, USA
| | - Megan McNutt
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, University of Florida, FL 32606, USA
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| | - Hongjun Fu
- Department of Neuroscience, The Ohio State University, OH 43210, USA
- Chronic Brain Injury Program, The Ohio State University, OH 43210, USA
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| |
Collapse
|
10
|
Zeng Y, Luo M, Shangguan N, Shi P, Feng J, Xu J, Chen K, Lu Y, Yu W, Yang Y. Deciphering cell types by integrating scATAC-seq data with genome sequences. NATURE COMPUTATIONAL SCIENCE 2024; 4:285-298. [PMID: 38600256 DOI: 10.1038/s43588-024-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 03/18/2024] [Indexed: 04/12/2024]
Abstract
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to high dimensionality and extreme sparsity within the data. Existing cell annotation methods mostly focus on the cell peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights/biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
Collapse
Affiliation(s)
- Yuansong Zeng
- School of Big Data and Software Engineering, Chongqing University, Chongqing, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Mai Luo
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Ningyuan Shangguan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Peiyu Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Junxi Feng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Weijiang Yu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou, China.
| |
Collapse
|
11
|
Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen YPP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform 2024; 25:bbae185. [PMID: 38678587 PMCID: PMC11056029 DOI: 10.1093/bib/bbae185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2024] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024] Open
Abstract
Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
Collapse
Affiliation(s)
- Wei Lan
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
| | - Haibo Liao
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
| | - Qingfeng Chen
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
| | - Lingzhi Zhu
- School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China
| | - Yi Pan
- School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia
| |
Collapse
|
12
|
Li Y, Wang Y, Wang C, Ma A, Ma Q, Liu B. A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data. PATTERNS (NEW YORK, N.Y.) 2024; 5:100927. [PMID: 38487805 PMCID: PMC10935504 DOI: 10.1016/j.patter.2024.100927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 01/10/2024] [Indexed: 03/17/2024]
Abstract
In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.
Collapse
Affiliation(s)
- Yang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Yizhong Wang
- School of Mathematics, Shandong University, Jinan, Shandong 250100, China
| | - Cankun Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong 250100, China
| |
Collapse
|
13
|
Wang C, Ma A, Li Y, McNutt ME, Zhang S, Zhu J, Hoyd R, Wheeler CE, Robinson LA, Chan CH, Zakharia Y, Dodd RD, Ulrich CM, Hardikar S, Churchman ML, Tarhini AA, Singer EA, Ikeguchi AP, McCarter MD, Denko N, Tinoco G, Husain M, Jin N, Osman AE, Eljilany I, Tan AC, Coleman SS, Denko L, Riedlinger G, Schneider BP, Spakowicz D, Ma Q. A Bioinformatics Tool for Identifying Intratumoral Microbes from the ORIEN Dataset. CANCER RESEARCH COMMUNICATIONS 2024; 4:293-302. [PMID: 38259095 PMCID: PMC10840455 DOI: 10.1158/2767-9764.crc-23-0213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Revised: 09/26/2023] [Accepted: 01/04/2024] [Indexed: 01/24/2024]
Abstract
Evidence supports significant interactions among microbes, immune cells, and tumor cells in at least 10%-20% of human cancers, emphasizing the importance of further investigating these complex relationships. However, the implications and significance of tumor-related microbes remain largely unknown. Studies have demonstrated the critical roles of host microbes in cancer prevention and treatment responses. Understanding interactions between host microbes and cancer can drive cancer diagnosis and microbial therapeutics (bugs as drugs). Computational identification of cancer-specific microbes and their associations is still challenging due to the high dimensionality and high sparsity of intratumoral microbiome data, which requires large datasets containing sufficient event observations to identify relationships, and the interactions within microbial communities, the heterogeneity in microbial composition, and other confounding effects that can lead to spurious associations. To solve these issues, we present a bioinformatics tool, microbial graph attention (MEGA), to identify the microbes most strongly associated with 12 cancer types. We demonstrate its utility on a dataset from a consortium of nine cancer centers in the Oncology Research Information Exchange Network. This package has three unique features: species-sample relations are represented in a heterogeneous graph and learned by a graph attention network; it incorporates metabolic and phylogenetic information to reflect intricate relationships within microbial communities; and it provides multiple functionalities for association interpretations and visualizations. We analyzed 2,704 tumor RNA sequencing samples and MEGA interpreted the tissue-resident microbial signatures of each of 12 cancer types. MEGA can effectively identify cancer-associated microbial signatures and refine their interactions with tumors. SIGNIFICANCE Studying the tumor microbiome in high-throughput sequencing data is challenging because of the extremely sparse data matrices, heterogeneity, and high likelihood of contamination. We present a new deep learning tool, MEGA, to refine the organisms that interact with tumors.
Collapse
Affiliation(s)
- Cankun Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Yingjie Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio
| | - Megan E. McNutt
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio
| | - Shiqi Zhang
- Department of Human Sciences, College of Education and Human Ecology, The Ohio State University, Columbus, Ohio
| | - Jiangjiang Zhu
- Department of Human Sciences, College of Education and Human Ecology, The Ohio State University, Columbus, Ohio
| | - Rebecca Hoyd
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Caroline E. Wheeler
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Lary A. Robinson
- Department of Thoracic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Carlos H.F. Chan
- University of Iowa, Holden Comprehensive Cancer Center, Iowa City, Iowa
| | - Yousef Zakharia
- Division of Oncology, Hematology and Blood & Marrow Transplantation, University of Iowa, Holden Comprehensive Cancer Center, Iowa City, Iowa
| | - Rebecca D. Dodd
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa
| | - Cornelia M. Ulrich
- Department of Population Health Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
| | - Sheetal Hardikar
- Department of Population Health Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
| | | | - Ahmad A. Tarhini
- Departments of Cutaneous Oncology and Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Eric A. Singer
- Department of Urologic Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Alexandra P. Ikeguchi
- Department of Hematology/Oncology, Stephenson Cancer Center of University of Oklahoma, Oklahoma City, Oklahoma
| | - Martin D. McCarter
- Department of Surgery, University of Colorado School of Medicine, Aurora, Colorado
| | - Nicholas Denko
- Department of Radiation Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Gabriel Tinoco
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Marium Husain
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Ning Jin
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Afaf E.G. Osman
- Department of Internal Medicine, University of Utah, Salt Lake City, Utah
| | - Islam Eljilany
- Clinical Science Lab – Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida
| | - Aik Choon Tan
- Departments of Oncological Science and Biomedical Informatics, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
| | - Samuel S. Coleman
- Departments of Oncological Science and Biomedical Informatics, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
| | - Louis Denko
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Gregory Riedlinger
- Department of Precision Medicine, Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey
| | - Bryan P. Schneider
- Indiana University Simon Comprehensive Cancer Center, Indianapolis, Indiana
| | - Daniel Spakowicz
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio
| | | |
Collapse
|
14
|
Ye F, Wang J, Li J, Mei Y, Guo G. Mapping Cell Atlases at the Single-Cell Level. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2305449. [PMID: 38145338 PMCID: PMC10885669 DOI: 10.1002/advs.202305449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Revised: 12/01/2023] [Indexed: 12/26/2023]
Abstract
Recent advancements in single-cell technologies have led to rapid developments in the construction of cell atlases. These atlases have the potential to provide detailed information about every cell type in different organisms, enabling the characterization of cellular diversity at the single-cell level. Global efforts in developing comprehensive cell atlases have profound implications for both basic research and clinical applications. This review provides a broad overview of the cellular diversity and dynamics across various biological systems. In addition, the incorporation of machine learning techniques into cell atlas analyses opens up exciting prospects for the field of integrative biology.
Collapse
Affiliation(s)
- Fang Ye
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative MedicineZhejiang University School of MedicineHangzhouZhejiang310000China
- Liangzhu LaboratoryZhejiang UniversityHangzhouZhejiang311121China
| | - Jingjing Wang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative MedicineZhejiang University School of MedicineHangzhouZhejiang310000China
- Liangzhu LaboratoryZhejiang UniversityHangzhouZhejiang311121China
| | - Jiaqi Li
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative MedicineZhejiang University School of MedicineHangzhouZhejiang310000China
| | - Yuqing Mei
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative MedicineZhejiang University School of MedicineHangzhouZhejiang310000China
| | - Guoji Guo
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative MedicineZhejiang University School of MedicineHangzhouZhejiang310000China
- Liangzhu LaboratoryZhejiang UniversityHangzhouZhejiang311121China
- Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative MedicineDr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative MedicineHangzhouZhejiang310058China
- Institute of HematologyZhejiang UniversityHangzhouZhejiang310000China
| |
Collapse
|
15
|
Liu J, Yang M, Yu Y, Xu H, Li K, Zhou X. Large language models in bioinformatics: applications and perspectives. ARXIV 2024:arXiv:2401.04155v1. [PMID: 38259343 PMCID: PMC10802675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]
Abstract
Large language models (LLMs) are a class of artificial intelligence models based on deep learning, which have great performance in various tasks, especially in natural language processing (NLP). Large language models typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we will present a summary of the prominent large language models used in natural language processing, such as BERT and GPT, and focus on exploring the applications of large language models at different omics levels in bioinformatics, mainly including applications of large language models in genomics, transcriptomics, proteomics, drug discovery and single cell analysis. Finally, this review summarizes the potential and prospects of large language models in solving bioinformatic problems.
Collapse
Affiliation(s)
- Jiajia Liu
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
| | - Mengyuan Yang
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan 450001, China
| | - Yankai Yu
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
| | - Haixia Xu
- The Center of Gerontology and Geriatrics, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Kang Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Xiaobo Zhou
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| |
Collapse
|
16
|
Wang X, Duan M, Li J, Ma A, Xin G, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. Nat Commun 2024; 15:338. [PMID: 38184630 PMCID: PMC10771517 DOI: 10.1038/s41467-023-44570-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Accepted: 12/14/2023] [Indexed: 01/08/2024] Open
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduce MarsGT: Multi-omics Analysis for Rare population inference using a Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperforms existing tools in identifying rare cells across 550 simulated and four real human datasets. In mouse retina data, it reveals unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detects an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identifies a rare MAIT-like population impacted by a high IFN-I response and reveals the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Collapse
Affiliation(s)
- Xiaoying Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Gang Xin
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA.
| |
Collapse
|
17
|
Huang X, Song C, Zhang G, Li Y, Zhao Y, Zhang Q, Zhang Y, Fan S, Zhao J, Xie L, Li C. scGRN: a comprehensive single-cell gene regulatory network platform of human and mouse. Nucleic Acids Res 2024; 52:D293-D303. [PMID: 37889053 PMCID: PMC10767939 DOI: 10.1093/nar/gkad885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 09/19/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023] Open
Abstract
Gene regulatory networks (GRNs) are interpretable graph models encompassing the regulatory interactions between transcription factors (TFs) and their downstream target genes. Making sense of the topology and dynamics of GRNs is fundamental to interpreting the mechanisms of disease etiology and translating corresponding findings into novel therapies. Recent advances in single-cell multi-omics techniques have prompted the computational inference of GRNs from single-cell transcriptomic and epigenomic data at an unprecedented resolution. Here, we present scGRN (https://bio.liclab.net/scGRN/), a comprehensive single-cell multi-omics gene regulatory network platform of human and mouse. The current version of scGRN catalogs 237 051 cell type-specific GRNs (62 999 692 TF-target gene pairs), covering 160 tissues/cell lines and 1324 single-cell samples. scGRN is the first resource documenting large-scale cell type-specific GRN information of diverse human and mouse conditions inferred from single-cell multi-omics data. We have implemented multiple online tools for effective GRN analysis, including differential TF-target network analysis, TF enrichment analysis, and pathway downstream analysis. We also provided details about TF binding to promoters, super-enhancers and typical enhancers of target genes in GRNs. Taken together, scGRN is an integrative and useful platform for searching, browsing, analyzing, visualizing and downloading GRNs of interest, enabling insight into the differences in regulatory mechanisms across diverse conditions.
Collapse
Affiliation(s)
- Xuemei Huang
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Chao Song
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Guorui Zhang
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Ye Li
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Yu Zhao
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Qinyi Zhang
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Yuexin Zhang
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Shifan Fan
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Jun Zhao
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Liyuan Xie
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Chunquan Li
- The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases & College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| |
Collapse
|
18
|
Jordan DM, Vy HMT, Do R. A deep learning transformer model predicts high rates of undiagnosed rare disease in large electronic health systems. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.21.23300393. [PMID: 38196638 PMCID: PMC10775679 DOI: 10.1101/2023.12.21.23300393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2024]
Abstract
It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty finding diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, it is likely that large electronic health record (EHR) systems include high numbers of participants suffering from undiagnosed rare disease. While this has been shown in detail for specific diseases, these studies are expensive and time consuming and have only been feasible to perform for a handful of the thousands of known rare diseases. The bulk of these undiagnosed cases are effectively hidden, with no straightforward way to differentiate them from healthy controls. The ability to access them at scale would enormously expand our capacity to study and develop drugs for rare diseases, adding to tools aimed at increasing availability of study cohorts for rare disease. In this study, we train a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 participants in the UK Biobank and validated on an independent cohort from 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis, with the number of participants predicted to be at risk ranging from 85 to 22,000 for different diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests. They are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease, as well as demonstrating the enormous potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.
Collapse
Affiliation(s)
- Daniel M. Jordan
- Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ha My T. Vy
- Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ron Do
- Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
Collapse
|
19
|
Liu Z, Bao Y, Wang W, Pan L, Wang H, Lin GN. Emden: A novel method integrating graph and transformer representations for predicting the effect of mutations on clinical drug response. Comput Biol Med 2023; 167:107678. [PMID: 37976823 DOI: 10.1016/j.compbiomed.2023.107678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2023] [Revised: 10/22/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Precision medicine based on personalized genomics provides promising strategies to enhance the efficacy of molecular-targeted therapies. However, the clinical effectiveness of drugs has been severely limited due to genetic variations that lead to drug resistance. Predicting the impact of missense mutations on clinical drug response is an essential way to reduce the cost of clinical trials and understand genetic diseases. Here, we present Emden, a novel method integrating graph and transformer representations that predicts the effect of missense mutations on drug response through binary classification with interpretability. Emden utilized protein sequences-based features and drug structures as inputs for rapid prediction, employing competitive representation learning and demonstrating strong generalization capabilities and robustness. Our study showed promising potential for clinical drug guidance and deep insight into computer-assisted precision medicine. Emden is freely available as a web server at https://www.psymukb.net/Emden.
Collapse
Affiliation(s)
- Zhe Liu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Yihang Bao
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Weidi Wang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Liangwei Pan
- Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China.
| | - Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China.
| |
Collapse
|
20
|
Wang Y, Li Y, Wang C, Lio CWJ, Ma Q, Liu B. CEMIG: prediction of the cis-regulatory motif using the de Bruijn graph from ATAC-seq. Brief Bioinform 2023; 25:bbad505. [PMID: 38189539 PMCID: PMC10772951 DOI: 10.1093/bib/bbad505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Revised: 11/21/2023] [Accepted: 12/03/2023] [Indexed: 01/09/2024] Open
Abstract
Sequence motif discovery algorithms enhance the identification of novel deoxyribonucleic acid sequences with pivotal biological significance, especially transcription factor (TF)-binding motifs. The advent of assay for transposase-accessible chromatin using sequencing (ATAC-seq) has broadened the toolkit for motif characterization. Nonetheless, prevailing computational approaches have focused on delineating TF-binding footprints, with motif discovery receiving less attention. Herein, we present Cis rEgulatory Motif Influence using de Bruijn Graph (CEMIG), an algorithm leveraging de Bruijn and Hamming distance graph paradigms to predict and map motif sites. Assessment on 129 ATAC-seq datasets from the Cistrome Data Browser demonstrates CEMIG's exceptional performance, surpassing three established methodologies on four evaluative metrics. CEMIG accurately identifies both cell-type-specific and common TF motifs within GM12878 and K562 cell lines, demonstrating its comparative genomic capabilities in the identification of evolutionary conservation and cell-type specificity. In-depth transcriptional and functional genomic studies have validated the functional relevance of CEMIG-identified motifs across various cell types. CEMIG is available at https://github.com/OSU-BMBL/CEMIG, developed in C++ to ensure cross-platform compatibility with Linux, macOS and Windows operating systems.
Collapse
Affiliation(s)
- Yizhong Wang
- School of Mathematics, Shandong University, Jinan, 250100, China
| | - Yang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - Cankun Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
| | - Chan-Wang Jerry Lio
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, 250100, China
| |
Collapse
|
21
|
Zhang P, Zhang D, Zhou W, Wang L, Wang B, Zhang T, Li S. Network pharmacology: towards the artificial intelligence-based precision traditional Chinese medicine. Brief Bioinform 2023; 25:bbad518. [PMID: 38197310 PMCID: PMC10777171 DOI: 10.1093/bib/bbad518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2023] [Revised: 11/03/2023] [Accepted: 11/30/2023] [Indexed: 01/11/2024] Open
Abstract
Network pharmacology (NP) provides a new methodological perspective for understanding traditional medicine from a holistic perspective, giving rise to frontiers such as traditional Chinese medicine network pharmacology (TCM-NP). With the development of artificial intelligence (AI) technology, it is key for NP to develop network-based AI methods to reveal the treatment mechanism of complex diseases from massive omics data. In this review, focusing on the TCM-NP, we summarize involved AI methods into three categories: network relationship mining, network target positioning and network target navigating, and present the typical application of TCM-NP in uncovering biological basis and clinical value of Cold/Hot syndromes. Collectively, our review provides researchers with an innovative overview of the methodological progress of NP and its application in TCM from the AI perspective.
Collapse
Affiliation(s)
- Peng Zhang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Dingfan Zhang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Wuai Zhou
- China Mobile Information System Integration Co., Ltd, Beijing 100032, China
| | - Lan Wang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Boyang Wang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Tingyu Zhang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shao Li
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics/Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| |
Collapse
|
22
|
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Collapse
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty, MannHeim Heidelberg University, Mannheim, Germany
| | - Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
| | - Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | | | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
| |
Collapse
|
23
|
Kim D, Tran A, Kim HJ, Lin Y, Yang JYH, Yang P. Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data. NPJ Syst Biol Appl 2023; 9:51. [PMID: 37857632 PMCID: PMC10587078 DOI: 10.1038/s41540-023-00312-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023] Open
Abstract
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Collapse
Affiliation(s)
- Daniel Kim
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Andy Tran
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Hani Jieun Kim
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
| | - Yingxin Lin
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| | - Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Camperdown, NSW, Australia.
- Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, Australia.
| |
Collapse
|
24
|
Baysoy A, Bai Z, Satija R, Fan R. The technological landscape and applications of single-cell multi-omics. Nat Rev Mol Cell Biol 2023; 24:695-713. [PMID: 37280296 PMCID: PMC10242609 DOI: 10.1038/s41580-023-00615-w] [Citation(s) in RCA: 92] [Impact Index Per Article: 92.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/28/2023] [Indexed: 06/08/2023]
Abstract
Single-cell multi-omics technologies and methods characterize cell states and activities by simultaneously integrating various single-modality omics methods that profile the transcriptome, genome, epigenome, epitranscriptome, proteome, metabolome and other (emerging) omics. Collectively, these methods are revolutionizing molecular cell biology research. In this comprehensive Review, we discuss established multi-omics technologies as well as cutting-edge and state-of-the-art methods in the field. We discuss how multi-omics technologies have been adapted and improved over the past decade using a framework characterized by optimization of throughput and resolution, modality integration, uniqueness and accuracy, and we also discuss multi-omics limitations. We highlight the impact that single-cell multi-omics technologies have had in cell lineage tracing, tissue-specific and cell-specific atlas production, tumour immunology and cancer genetics, and in mapping of cellular spatial information in fundamental and translational research. Finally, we discuss bioinformatics tools that have been developed to link different omics modalities and elucidate functionality through the use of better mathematical modelling and computational methods.
Collapse
Affiliation(s)
- Alev Baysoy
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
| | - Zhiliang Bai
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
| | - Rahul Satija
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Rong Fan
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA.
- Yale Stem Cell Center and Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA.
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA.
| |
Collapse
|
25
|
Wang J, Chen Y, Zou Q. Inferring gene regulatory network from single-cell transcriptomes with graph autoencoder model. PLoS Genet 2023; 19:e1010942. [PMID: 37703293 PMCID: PMC10519590 DOI: 10.1371/journal.pgen.1010942] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Revised: 09/25/2023] [Accepted: 08/29/2023] [Indexed: 09/15/2023] Open
Abstract
The gene regulatory structure of cells involves not only the regulatory relationship between two genes, but also the cooperative associations of multiple genes. However, most gene regulatory network inference methods for single cell only focus on and infer the regulatory relationships of pairs of genes, ignoring the global regulatory structure which is crucial to identify the regulations in the complex biological systems. Here, we proposed a graph-based Deep learning model for Regulatory networks Inference among Genes (DeepRIG) from single-cell RNA-seq data. To learn the global regulatory structure, DeepRIG builds a prior regulatory graph by transforming the gene expression of data into the co-expression mode. Then it utilizes a graph autoencoder model to embed the global regulatory information contained in the graph into gene latent embeddings and to reconstruct the gene regulatory network. Extensive benchmarking results demonstrate that DeepRIG can accurately reconstruct the gene regulatory networks and outperform existing methods on multiple simulated networks and real-cell regulatory networks. Additionally, we applied DeepRIG to the samples of human peripheral blood mononuclear cells and triple-negative breast cancer, and presented that DeepRIG can provide accurate cell-type-specific gene regulatory networks inference and identify novel regulators of progression and inhibition.
Collapse
Affiliation(s)
- Jiacheng Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Yaojia Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| |
Collapse
|
26
|
Wang X, Duan M, Li J, Ma A, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.15.553454. [PMID: 37645917 PMCID: PMC10462017 DOI: 10.1101/2023.08.15.553454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduced MarsGT: Multi-omics Analysis for Rare population inference using Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperformed existing tools in identifying rare cells across 400 simulated and four real human datasets. In mouse retina data, it revealed unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detected an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identified a rare MAIT-like population impacted by a high IFN-I response and revealed the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Collapse
Affiliation(s)
- Xiaoying Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| |
Collapse
|
27
|
Wang C, Ma A, McNutt ME, Hoyd R, Wheeler CE, Robinson LA, Chan CH, Zakharia Y, Dodd RD, Ulrich CM, Hardikar S, Churchman ML, Tarhini AA, Singer EA, Ikeguchi AP, McCarter MD, Denko N, Tinoco G, Husain M, Jin N, Osman AE, Eljilany I, Tan AC, Coleman SS, Denko L, Riedlinger G, Schneider BP, Spakowicz D, Ma Q. A bioinformatics tool for identifying intratumoral microbes from the ORIEN dataset. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.24.541982. [PMID: 37292990 PMCID: PMC10245834 DOI: 10.1101/2023.05.24.541982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Evidence supports significant interactions among microbes, immune cells, and tumor cells in at least 10-20% of human cancers, emphasizing the importance of further investigating these complex relationships. However, the implications and significance of tumor-related microbes remain largely unknown. Studies have demonstrated the critical roles of host microbes in cancer prevention and treatment responses. Understanding interactions between host microbes and cancer can drive cancer diagnosis and microbial therapeutics (bugs as drugs). Computational identification of cancer-specific microbes and their associations is still challenging due to the high dimensionality and high sparsity of intratumoral microbiome data, which requires large datasets containing sufficient event observations to identify relationships, and the interactions within microbial communities, the heterogeneity in microbial composition, and other confounding effects that can lead to spurious associations. To solve these issues, we present a bioinformatics tool, MEGA, to identify the microbes most strongly associated with 12 cancer types. We demonstrate its utility on a dataset from a consortium of 9 cancer centers in the Oncology Research Information Exchange Network (ORIEN). This package has 3 unique features: species-sample relations are represented in a heterogeneous graph and learned by a graph attention network; it incorporates metabolic and phylogenetic information to reflect intricate relationships within microbial communities; and it provides multiple functionalities for association interpretations and visualizations. We analyzed 2704 tumor RNA-seq samples and MEGA interpreted the tissue-resident microbial signatures of each of 12 cancer types. MEGA can effectively identify cancer-associated microbial signatures and refine their interactions with tumors.
Collapse
Affiliation(s)
- Cankun Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus; OH, USA
| | - Megan E. McNutt
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
| | - Rebecca Hoyd
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Caroline E. Wheeler
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Lary A. Robinson
- Department of Thoracic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Carlos H.F. Chan
- University of Iowa, Holden Comprehensive Cancer Center, Iowa City, IA, USA
| | - Yousef Zakharia
- Division of Oncology, Hematology and Blood & Marrow Transplantation, University of Iowa, Holden Comprehensive Cancer Center, Iowa City, IA, USA
| | - Rebecca D. Dodd
- Department of Internal Medicine, University of Iowa, Iowa City, IA, USA
| | - Cornelia M. Ulrich
- Department of Population Health Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Sheetal Hardikar
- Department of Population Health Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | | | - Ahmad A. Tarhini
- Departments of Cutaneous Oncology and Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Eric A. Singer
- Department of Urologic Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Alexandra P. Ikeguchi
- Department of Hematology/Oncology, Stephenson Cancer Center of University of Oklahoma, Oklahoma City, OK, USA
| | - Martin D. McCarter
- Department of Surgery, University of Colorado School of Medicine, Aurora, CO, USA
| | - Nicholas Denko
- Department of Radiation Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Gabriel Tinoco
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Marium Husain
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Ning Jin
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Afaf E.G. Osman
- Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
| | - Islam Eljilany
- Clinical Science Lab -- Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Aik Choon Tan
- Departments of Oncological Science and Biomedical Informatics, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Samuel S. Coleman
- Departments of Oncological Science and Biomedical Informatics, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | - Louis Denko
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus; OH, USA
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Gregory Riedlinger
- Department of Precision Medicine, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
| | - Bryan P. Schneider
- Indiana University Simon Comprehensive Cancer Center, Indianapolis, IN, USA
| | - Daniel Spakowicz
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus; OH, USA
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH, USA
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus; OH, USA
| |
Collapse
|