1
O'Connor LM, O'Connor BA, Lim SB, Zeng J, Lo CH. Integrative multi-omics and systems bioinformatics in translational neuroscience: A data mining perspective. J Pharm Anal 2023; 13:836-850. [PMID: 37719197 PMCID: PMC10499660 DOI: 10.1016/j.jpha.2023.06.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 11/29/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 09/19/2023]
Abstract
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern-day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from individual types of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining, which provide convoluted biological insights and offer a proof-of-concept proposition towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high-dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and to move towards personalized medicine.
Affiliation(s)
- Lance M. O'Connor
- College of Biological Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- Blake A. O'Connor
- School of Pharmacy, University of Wisconsin, Madison, WI, 53705, USA
- Su Bin Lim
- Department of Biochemistry and Molecular Biology, Ajou University School of Medicine, Suwon, 16499, South Korea
- Jialiu Zeng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- Chih Hung Lo
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
2
Dai C, Jiang Y, Yin C, Su R, Zeng X, Zou Q, Nakai K, Wei L. scIMC: a platform for benchmarking comparison and visualization analysis of scRNA-seq data imputation methods. Nucleic Acids Res 2022; 50:4877-4899. [PMID: 35524568 PMCID: PMC9122610 DOI: 10.1093/nar/gkac317] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/22/2021] [Revised: 04/08/2022] [Accepted: 04/20/2022] [Indexed: 12/13/2022]
Abstract
With the advent of single-cell RNA sequencing (scRNA-seq), one major challenge is the so-called 'dropout' events that distort gene expression and markedly influence downstream analysis of the single-cell transcriptome. To address this issue, much effort has been made and several scRNA-seq imputation methods have been developed, falling into two categories: model-based and deep learning-based. However, a comprehensive and systematic comparison of existing methods is still lacking. In this work, we use six simulated and two real scRNA-seq datasets to comprehensively evaluate and compare a total of 12 available imputation methods from the following four aspects: (i) gene expression recovery, (ii) cell clustering, (iii) gene differential expression, and (iv) cellular trajectory reconstruction. We demonstrate that deep learning-based approaches generally exhibit better overall performance than model-based approaches under major benchmarking comparison, indicating the power of deep learning for imputation. Importantly, we built scIMC (single-cell Imputation Methods Comparison platform), the first online platform that integrates all available state-of-the-art imputation methods for benchmarking comparison and visualization analysis, which is expected to be a convenient and useful tool for interested researchers. It is now freely accessible via https://server.wei-group.net/scIMC/.
Affiliation(s)
- Chichi Dai
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yi Jiang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Chenglin Yin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
- Kenta Nakai
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
3
OUP accepted manuscript. Brief Funct Genomics 2022; 21:159-176. [DOI: 10.1093/bfgp/elac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Revised: 01/20/2022] [Accepted: 01/25/2022] [Indexed: 11/14/2022]
4
Li H, Xiao X, Wu X, Ye L, Ji G. scLINE: A multi-network integration framework based on network embedding for representation of single-cell RNA-seq data. J Biomed Inform 2021; 122:103899. [PMID: 34481921 DOI: 10.1016/j.jbi.2021.103899] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/26/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 01/18/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is fast becoming a powerful technology that revolutionizes biomedical studies related to development, immunology and cancer by providing genome-scale transcriptional profiles at unprecedented throughput and resolution. However, due to the low capture rate and frequent drop-out events in the sequencing process, scRNA-seq data suffer from extremely high sparsity and variability, which complicates data analysis. Here we propose a novel method called scLINE for learning low-dimensional representations of scRNA-seq data. scLINE is based on a network embedding model that jointly considers multiple gene-gene interaction networks, facilitating the incorporation of prior biological knowledge for signal extraction. We comprehensively evaluated scLINE on eight single-cell datasets. Results show that scLINE achieved comparable or higher performance than competing methods, including PCA, t-SNE and Isomap, in terms of internal validation metrics and clustering accuracy. The low-dimensional representations learned by scLINE are effective for downstream single-cell analysis, such as visualization, clustering and cell typing. We have implemented scLINE as an easy-to-use R package, which can be incorporated into other existing scRNA-seq analysis pipelines or tools for data preprocessing.
Affiliation(s)
- Huoyou Li
- School of Mathematics and Information Engineering, Longyan University, China
- Xuesong Xiao
- Department of Automation, Xiamen University, China
- Xiaohui Wu
- Department of Automation, Xiamen University, China
- Lishan Ye
- Xiamen Health and Medical Big Data Center, Xiamen, Fujian, China
- Guoli Ji
- Department of Automation, Xiamen University, China
5
Patruno L, Maspero D, Craighero F, Angaroni F, Antoniotti M, Graudenzi A. A review of computational strategies for denoising and imputation of single-cell transcriptomic data. Brief Bioinform 2021; 22:bbaa222. [PMID: 33003202 DOI: 10.1093/bib/bbaa222] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/26/2020] [Revised: 08/07/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]
Abstract
MOTIVATION: Advances in single-cell sequencing methods have paved the way for the characterization of cellular states at unprecedented resolution, revolutionizing the investigation of complex biological systems. Yet, single-cell sequencing experiments are hindered by several technical issues, which cause output data to be noisy, impacting the reliability of downstream analyses. Therefore, a growing number of data science methods have been proposed to recover lost or corrupted information from single-cell sequencing data. To date, however, no quantitative benchmarks have been proposed to evaluate such methods. RESULTS: We present a comprehensive analysis of the state-of-the-art computational approaches for denoising and imputation of single-cell transcriptomic data, comparing their performance in different experimental scenarios. In detail, we compared 19 denoising and imputation methods, on both simulated and real-world datasets, with respect to several performance metrics related to imputation of dropout events, recovery of true expression profiles, characterization of cell similarity, identification of differentially expressed genes and computation time. The effectiveness and scalability of all methods were assessed with regard to distinct sequencing protocols, sample size and different levels of biological variability and technical noise. As a result, we identify a subset of versatile approaches exhibiting solid performance on most tests and show that certain algorithmic families prove effective on specific tasks but inefficient on others. Finally, most methods appear to benefit from the introduction of appropriate assumptions on noise distribution of biological processes.
Affiliation(s)
- Lucrezia Patruno
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Davide Maspero
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Francesco Craighero
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Fabrizio Angaroni
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Marco Antoniotti
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Alex Graudenzi
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
6
Di Nanni N, Bersanelli M, Milanesi L, Mosca E. Network Diffusion Promotes the Integrative Analysis of Multiple Omics. Front Genet 2020; 11:106. [PMID: 32180795 PMCID: PMC7057719 DOI: 10.3389/fgene.2020.00106] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/31/2019] [Accepted: 01/29/2020] [Indexed: 02/01/2023]
Abstract
The development of integrative methods is one of the main challenges in bioinformatics. Network-based methods for the analysis of multiple gene-centered datasets take into account known and/or inferred relations between genes. In recent decades, the mathematical machinery of network diffusion (also referred to as network propagation) has been exploited in several network-based pipelines, thanks to its ability to amplify associations between genes that lie in network proximity. Indeed, network diffusion provides a quantitative estimate of network proximity between genes associated with one or more different data types, from simple binary vectors to real vectors. Therefore, this powerful data transformation method has also been increasingly used in integrative analyses of multiple collections of biological scores and/or one or more interaction networks. We present an overview of the state of the art of bioinformatics pipelines that use network diffusion processes for the integrative analysis of omics data. We discuss the fundamental ways in which network diffusion is exploited, open issues and potential developments in the field. Current trends suggest that network diffusion is a tool of broad utility in omics data analysis. It is reasonable to think that it will continue to be used and further refined as new data types arise (e.g. single cell datasets) and as the identification of system-level patterns becomes increasingly important in omics data analysis.
Affiliation(s)
- Noemi Di Nanni
- Institute of Biomedical Technologies, National Research Council, Milan, Italy.,Department of Industrial and Information Engineering, University of Pavia, Pavia, Italy
- Matteo Bersanelli
- Department of Physics and Astronomy, University of Bologna, Bologna, Italy; National Institute of Nuclear Physics (INFN), Bologna, Italy
- Luciano Milanesi
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
- Ettore Mosca
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
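
Illustrative note on entry 6: the abstract describes network diffusion as amplifying associations between genes that lie in network proximity. The sketch below shows one common formulation of that idea, random walk with restart, on a toy network. It is not taken from the cited review; the function name `network_diffusion`, the parameter `restart`, and the four-gene chain are all assumptions made purely for illustration.

```python
def network_diffusion(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected gene network (illustrative only).

    adjacency   -- square list-of-lists, adjacency[i][j] > 0 if genes i and j interact
    seed_scores -- initial per-gene scores (e.g. 1 for a disease-associated gene)
    restart     -- probability of jumping back to the seed distribution at each step
    """
    n = len(adjacency)
    total = sum(seed_scores)
    if total <= 0:
        raise ValueError("seed_scores must contain at least one positive value")
    p0 = [s / total for s in seed_scores]        # normalised seed distribution
    degree = [sum(row) for row in adjacency]     # weighted degree of each gene
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # Score flowing into gene i: each neighbour j shares its score
            # across its own edges in proportion to edge weight.
            inflow = sum(adjacency[j][i] * p[j] / degree[j]
                         for j in range(n) if degree[j] > 0)
            nxt.append((1 - restart) * inflow + restart * p0[i])
        converged = max(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy network: a chain of four genes A - B - C - D, seeded at A.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = network_diffusion(adjacency, seed_scores=[1, 0, 0, 0])
```

In this toy example the diffused score decreases with distance from the seed gene, which is the "network proximity" effect the abstract refers to. Real pipelines typically operate on sparse matrices and may use other kernels, so this should be read as a conceptual sketch under the stated assumptions.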