1
Unger Avila P, Padvitski T, Leote AC, Chen H, Saez-Rodriguez J, Kann M, Beyer A. Gene regulatory networks in disease and ageing. Nat Rev Nephrol 2024:10.1038/s41581-024-00849-7. [PMID: 38867109 DOI: 10.1038/s41581-024-00849-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/14/2024]
Abstract
The precise control of gene expression is required for the maintenance of cellular homeostasis and proper cellular function, and the declining control of gene expression with age is considered a major contributor to age-associated changes in cellular physiology and disease. The coordination of gene expression can be represented through models of the molecular interactions that govern gene expression levels, so-called gene regulatory networks. Gene regulatory networks can represent interactions that occur through signal transduction, those that involve regulatory transcription factors, or statistical models of gene-gene relationships based on the premise that certain sets of genes tend to be coexpressed across a range of conditions and cell types. Advances in experimental and computational technologies have enabled the inference of these networks on an unprecedented scale and at unprecedented precision. Here, we delineate different types of gene regulatory networks and their cell-biological interpretation. We describe methods for inferring such networks from large-scale, multi-omics datasets and present applications that have aided our understanding of cellular ageing and disease mechanisms.
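The co-expression view of a gene regulatory network mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the review: the gene names, the toy values, the use of Pearson correlation, and the 0.8 threshold are all assumptions chosen for illustration, and real inference methods are considerably more involved.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A gene with constant expression has no defined correlation.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_edges(expression, threshold=0.8):
    """Return gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Toy data (hypothetical): four genes measured across five samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly related to the others
}

edges = coexpression_edges(toy, threshold=0.8)
for g1, g2, r in edges:
    print(g1, g2, r)
```

In this toy example geneA, geneB and geneC end up connected to one another, while geneD stays isolated. An edge here only records a statistical association; it does not by itself indicate which gene regulates which.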
Affiliation(s)
- Paula Unger Avila
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Tsimafei Padvitski
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Ana Carolina Leote
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- He Chen
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Julio Saez-Rodriguez
  - Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Martin Kann
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Andreas Beyer
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany
2
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024] [Open Access]
Abstract
BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the data transformation methods they employ.
RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering in low-dimensional spaces, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space also significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models.
CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
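To make the notion of a data transformation concrete, here is a minimal sketch of two commonly used ones: library-size normalization followed by either log(1 + x) or a square root. The abstract does not list the 16 transformations it compares, so the choice of these two, the function names, and the `target_sum` of 10,000 are assumptions made purely for illustration.

```python
from math import log1p, sqrt


def normalize_counts(cell, target_sum=1e4):
    """Scale one cell's raw counts so that they sum to target_sum."""
    total = sum(cell)
    if total == 0:
        return [0.0] * len(cell)
    return [c * target_sum / total for c in cell]


def log1p_transform(cell):
    """Library-size normalization followed by log(1 + x)."""
    return [log1p(v) for v in normalize_counts(cell)]


def sqrt_transform(cell):
    """Library-size normalization followed by a square root."""
    return [sqrt(v) for v in normalize_counts(cell)]


# Two toy cells with the same composition but a tenfold depth difference.
shallow = [1, 0, 3, 6]
deep = [10, 0, 30, 60]

log_shallow = log1p_transform(shallow)
log_deep = log1p_transform(deep)
print(log_shallow)
print(log_deep)
```

After normalization the two toy cells map to the same transformed profile, which is the kind of depth-related technical difference a transformation is meant to remove. Which transformation works best is, according to the abstract, dataset dependent.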
Affiliation(s)
- Youngjun Park
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
3
Grones C, Eekhout T, Shi D, Neumann M, Berg LS, Ke Y, Shahan R, Cox KL, Gomez-Cano F, Nelissen H, Lohmann JU, Giacomello S, Martin OC, Cole B, Wang JW, Kaufmann K, Raissig MT, Palfalvi G, Greb T, Libault M, De Rybel B. Best practices for the execution, analysis, and data storage of plant single-cell/nucleus transcriptomics. THE PLANT CELL 2024; 36:812-828. [PMID: 38231860 PMCID: PMC10980355 DOI: 10.1093/plcell/koae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 01/19/2024]
Abstract
Single-cell and single-nucleus RNA-sequencing technologies capture the expression of plant genes at an unprecedented resolution. Therefore, these technologies are gaining traction in plant molecular and developmental biology for elucidating the transcriptional changes across cell types in a specific tissue or organ, upon treatments, in response to biotic and abiotic stresses, or between genotypes. Despite the rapidly accelerating use of these technologies, collective and standardized experimental and analytical procedures to support the acquisition of high-quality data sets are still missing. In this commentary, we discuss common challenges associated with the use of single-cell transcriptomics in plants and propose general guidelines to improve reproducibility, quality, comparability, and interpretation and to make the data readily available to the community in this fast-developing field of research.
Affiliation(s)
- Carolin Grones
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- Thomas Eekhout
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
  - VIB Single Cell Core Facility, Ghent 9052, Belgium
- Dongbo Shi
  - Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
  - Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany
- Manuel Neumann
  - Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
- Lea S Berg
  - Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
- Yuji Ke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- Rachel Shahan
  - Department of Biology, Duke University, Durham, NC 27708, USA
  - Howard Hughes Medical Institute, Duke University, Durham, NC 27708, USA
- Kevin L Cox
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
- Fabio Gomez-Cano
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Hilde Nelissen
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
- Jan U Lohmann
  - Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
- Stefania Giacomello
  - SciLifeLab, Department of Gene Technology, KTH Royal Institute of Technology, 17165 Solna, Sweden
- Olivier C Martin
  - Universities of Paris-Saclay, Paris-Cité and Evry, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay, Gif-sur-Yvette 91192, France
- Benjamin Cole
  - DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jia-Wei Wang
  - National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences (CEMPS), Institute of Plant Physiology and Ecology (SIPPE), Chinese Academy of Sciences (CAS), Shanghai 200032, China
- Kerstin Kaufmann
  - Institute of Biology, Humboldt-Universität zu Berlin, 10115 Berlin, Germany
- Michael T Raissig
  - Institute of Plant Sciences, University of Bern, 3012 Bern, Switzerland
- Gergo Palfalvi
  - Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
- Thomas Greb
  - Centre for Organismal Studies, Heidelberg University, 69120 Heidelberg, Germany
- Marc Libault
  - Division of Plant Science and Technology, Interdisciplinary Plant Group, College of Agriculture, Food, and Natural Resources, University of Missouri-Columbia, Columbia, MO 65201, USA
- Bert De Rybel
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB Centre for Plant Systems Biology, Ghent 9052, Belgium
4
Song T, Broadbent C, Kuang R. GNTD: reconstructing spatial transcriptomes with graph-guided neural tensor decomposition informed by spatial and functional relations. Nat Commun 2023; 14:8276. [PMID: 38092776 PMCID: PMC10719260 DOI: 10.1038/s41467-023-44017-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023] [Open Access]
Abstract
Spatially-resolved RNA profiling has now been widely used to understand cells' structural organizations and functional roles in tissues, yet it is challenging to reconstruct the whole spatial transcriptomes due to various inherent technical limitations in tissue section preparation and RNA capture and fixation in the application of the spatial RNA profiling technologies. Here, we introduce a graph-guided neural tensor decomposition (GNTD) model for reconstructing whole spatial transcriptomes in tissues. GNTD employs a hierarchical tensor structure and formulation to explicitly model the high-order spatial gene expression data with a hierarchical nonlinear decomposition in a three-layer neural network, enhanced by spatial relations among the capture spots and gene functional relations for accurate reconstruction from highly sparse spatial profiling data. Extensive experiments on 22 Visium spatial transcriptomics datasets and 3 high-resolution Stereo-seq datasets as well as simulation data demonstrate that GNTD consistently improves the imputation accuracy in cross-validations driven by nonlinear tensor decomposition and incorporation of spatial and functional information, and confirm that the imputed spatial transcriptomes provide a more complete gene expression landscape for downstream analyses of cell/spot clustering for tissue segmentation, and spatial gene expression clustering and visualizations.
Affiliation(s)
- Tianci Song
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
- Charles Broadbent
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
- Rui Kuang
  - Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55414, USA
5
Abstract
Missing values are a notable challenge when analyzing mass spectrometry-based proteomics data. While the field is still actively debating the best practices, the challenge increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to deal with missing values is to perform imputation. Imputation has several drawbacks for which alternatives exist, but currently, imputation is still a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight 5 main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether it is through imputation or data modeling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values, and for proper encoding of missing values.
Affiliation(s)
- Christophe Vanderaa
  - Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, 1200 Brussels, Belgium
- Laurent Gatto
  - Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, 1200 Brussels, Belgium
6
姜 超, 胡 龙, 徐 春, 葛 芹, 赵 祥. [Imputation method for dropout in single-cell transcriptome data]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2023; 40:778-783. [PMID: 37666769 PMCID: PMC10477391 DOI: 10.7507/1001-5515.202301009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 07/27/2023] [Indexed: 09/06/2023]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) can resolve the expression characteristics of cells in tissues with single-cell precision, enabling researchers to quantify cellular heterogeneity within populations at higher resolution and revealing potentially heterogeneous cell populations and the dynamics of complex tissues. However, the large number of technical zeros in scRNA-seq data affects downstream analyses such as cell clustering, differential gene expression, cell annotation, and pseudotime inference, hindering the discovery of meaningful biological signals. The main approach to this problem is to exploit the potential correlations among cells and genes and to impute the technical zeros from the observed data. On this basis, this paper reviews the basic methods for imputing technical zeros in scRNA-seq data and discusses the advantages and disadvantages of existing methods. Finally, it provides recommendations and perspectives on the use and development of these methods.
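The idea of imputing zeros from similar cells can be sketched as follows. This is a generic nearest-neighbour illustration, not a method described in the review: the function names, the Euclidean distance, the choice of `k = 2`, and the toy matrix are all assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_impute_zeros(cells, k=2):
    """Replace zeros in each cell with the mean over its k most similar cells."""
    imputed = []
    for i, cell in enumerate(cells):
        others = [c for j, c in enumerate(cells) if j != i]
        neighbours = sorted(others, key=lambda c: squared_distance(cell, c))[:k]
        new_cell = []
        for g, value in enumerate(cell):
            if value == 0 and neighbours:
                # Borrow information from the original (unimputed) neighbours.
                value = sum(n[g] for n in neighbours) / len(neighbours)
            new_cell.append(float(value))
        imputed.append(new_cell)
    return imputed


# Rows are cells, columns are genes (toy, hypothetical expression values).
cells = [
    [5.0, 0.0, 1.0],  # the zero for the second gene looks like a dropout
    [5.0, 4.0, 1.0],
    [4.0, 4.0, 1.0],
    [0.0, 0.0, 9.0],  # a distinct cell type whose zeros may be genuine
]

result = knn_impute_zeros(cells, k=2)
for row in result:
    print(row)
```

The first cell's zero is filled with 4.0 from its two close neighbours, which is the intended behaviour. The last cell's zeros are overwritten as well, even though they may reflect genuine absence of expression; this naive sketch cannot tell technical from biological zeros, which is one of the drawbacks such reviews weigh against the benefits.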
Affiliation(s)
- 超 姜
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
  - Singleron Biotech Co., Ltd, Nanjing 210018, P. R. China
- 龙飞 胡
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- 春祥 徐
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- 芹玉 葛
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- 祥伟 赵
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
7
Pandey D, Onkara PP. Improved downstream functional analysis of single-cell RNA-sequence data using DGAN. Sci Rep 2023; 13:1618. [PMID: 36709340 PMCID: PMC9884242 DOI: 10.1038/s41598-023-28952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 01/27/2023] [Indexed: 01/29/2023] [Open Access]
Abstract
The dramatic increase in the number of single-cell RNA-sequence (scRNA-seq) investigations reflects the capabilities of next-generation sequencing technologies, which facilitate the accurate measurement of tens of thousands of RNA expression levels at cellular resolution. Nevertheless, missing values arising from RNA amplification persist and remain a significant computational challenge, as these data omissions introduce further noise into the cellular data and ultimately impede downstream functional analysis of scRNA-seq data. Consequently, it is imperative to develop robust and efficient scRNA-seq data imputation methods for improved downstream functional analysis outcomes. To address this, we have designed an imputation framework, the deep generative autoencoder network (DGAN). In essence, DGAN is an evolved variational autoencoder designed to robustly impute data dropouts in scRNA-seq data represented as a sparse gene expression matrix. DGAN accounts for the count distribution as well as data sparsity using a Gaussian model, in which dependencies between cells are exploited to detect and exclude outlier cells during imputation. When tested on five publicly available scRNA-seq datasets, DGAN outperformed every baseline method it was compared with in downstream functional analysis, including cell data visualization, clustering, classification and differential expression analysis. DGAN is implemented in Python and is accessible at https://github.com/dikshap11/DGAN.
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, India
- Perumal P Onkara
  - Department of Biotechnology, National Institute of Technology, Warangal, India