1
|
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Collapse
Affiliation(s)
- Pritam Kundu
- School School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Satyajit Beura
- Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
| | - Suman Mondal
- P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Amit Kumar Das
- Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
| | - Amit Ghosh
- School School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
| |
Collapse
|
2
|
Paas-Oliveros E, Hernández-Lemus E, de Anda-Jáuregui G. Computational single cell oncology: state of the art. Front Genet 2023; 14:1256991. [PMID: 38028624 PMCID: PMC10663273 DOI: 10.3389/fgene.2023.1256991] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Single cell computational analysis has emerged as a powerful tool in the field of oncology, enabling researchers to decipher the complex cellular heterogeneity that characterizes cancer. By leveraging computational algorithms and bioinformatics approaches, this methodology provides insights into the underlying genetic, epigenetic and transcriptomic variations among individual cancer cells. In this paper, we present a comprehensive overview of single cell computational analysis in oncology, discussing the key computational techniques employed for data processing, analysis, and interpretation. We explore the challenges associated with single cell data, including data quality control, normalization, dimensionality reduction, clustering, and trajectory inference. Furthermore, we highlight the applications of single cell computational analysis, including the identification of novel cell states, the characterization of tumor subtypes, the discovery of biomarkers, and the prediction of therapy response. Finally, we address the future directions and potential advancements in the field, including the development of machine learning and deep learning approaches for single cell analysis. Overall, this paper aims to provide a roadmap for researchers interested in leveraging computational methods to unlock the full potential of single cell analysis in understanding cancer biology with the goal of advancing precision oncology. For this purpose, we also include a notebook that instructs on how to apply the recommended tools in the Preprocessing and Quality Control section.
Collapse
Affiliation(s)
- Ernesto Paas-Oliveros
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Investigadores por Mexico, Conahcyt, Mexico City, Mexico
| |
Collapse
|
3
|
Dindhoria K, Monga I, Thind AS. Computational approaches and challenges for identification and annotation of non-coding RNAs using RNA-Seq. Funct Integr Genomics 2022; 22:1105-1112. [DOI: 10.1007/s10142-022-00915-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/22/2022]
|
4
|
Zhao X, Lan Y, Chen D. Exploring long non-coding RNA networks from single cell omics data. Comput Struct Biotechnol J 2022; 20:4381-4389. [PMID: 36051880 PMCID: PMC9403499 DOI: 10.1016/j.csbj.2022.08.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 11/03/2022] Open
|
5
|
Zhang Y, He Y, Chen Q, Yang Y, Gong M. Fusion prior gene network for high reliable single-cell gene regulatory network inference. Comput Biol Med 2022; 143:105279. [PMID: 35134605 DOI: 10.1016/j.compbiomed.2022.105279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Revised: 01/25/2022] [Accepted: 01/29/2022] [Indexed: 11/03/2022]
Abstract
Single-Cell RNA sequencing technology provides an opportunity to discover gene regulatory networks(GRN) that control cell differentiation and drive cell type transformation. However, it is faced with the challenge of high loss and high noise of sequencing data and contains many pseudo-connections. To solve these problems, we propose a framework called Fusion prior gene network for Gene Regulatory Network inference Accuracy Enhancement(FGRNAE) to infer a high reliable gene regulatory network. Specifically, based on the Single-Cell RNA-sequencing Network Propagation and network Fusion(scNPF) preprocessing framework, we employ the Random Walk with Restart on the prior gene network to interpolate the missing data. Furthermore, we infer the network using the Random Forest algorithm with the results achieved above. In addition, we apply data from the Co-Function Network to build a meta-gene network and select the regulatory connection with the Markov Random Field. Extensive experiments based on datasets from BEELINE validate the effectiveness of our framework for improving the accuracy of inference.
Collapse
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Yuchen He
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Qingyuan Chen
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yihan Yang
- International College, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
| | - Meiqin Gong
- West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
| |
Collapse
|
6
|
Li K, Yan C, Li C, Chen L, Zhao J, Zhang Z, Bao S, Sun J, Zhou M. Computational elucidation of spatial gene expression variation from spatially resolved transcriptomics data. MOLECULAR THERAPY - NUCLEIC ACIDS 2022; 27:404-411. [PMID: 35036053 PMCID: PMC8728308 DOI: 10.1016/j.omtn.2021.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Recent advances in spatially resolved transcriptomics (SRT) have revolutionized biological and medical research and enabled unprecedented insight into the functional organization and cell communication of tissues and organs in situ. Identifying and elucidating gene spatial expression variation (SE analysis) is fundamental to elucidate the SRT landscape. There is an urgent need for public repositories and computational techniques of SRT data in SE analysis alongside technological breakthroughs and large-scale data generation. Increasing efforts to use in silico techniques in SE analysis have been made. However, these attempts are widely scattered among a large number of studies that are not easily accessible or comprehensible by both medical and life scientists. This study provides a survey and a summary of public resources on SE analysis in SRT studies. An updated systematic overview of state-of-the-art computational approaches and tools currently available in SE analysis are presented herein, emphasizing recent advances. Finally, the present study explores the future perspectives and challenges of in silico techniques in SE analysis. This study guides medical and life scientists to look for dedicated resources and more competent tools for characterizing spatial patterns of gene expression.
Collapse
Affiliation(s)
- Ke Li
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Congcong Yan
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Chenghao Li
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Lu Chen
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Jingting Zhao
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Zicheng Zhang
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Siqi Bao
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
| | - Jie Sun
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Corresponding author Jie Sun, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China.
| | - Meng Zhou
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Corresponding author Meng Zhou, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China.
| |
Collapse
|
7
|
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022] Open
Abstract
Abstract
Inferring gene regulatory networks (GRNs) based on gene expression profiles is able to provide an insight into a number of cellular phenotypes from the genomic level and reveal the essential laws underlying various life phenomena. Different from the bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics, transformation of cell types, etc. Inferring GRNs based on such data offers unprecedented advantages for making a profound study of cell phenotypes, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, DGRNS, which encodes the raw data and fuses recurrent neural network and convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing gated recurrent unit network for exploring time-dependent information and CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the dataset of mouse hematopoietic stem cells illustrates that DGRNS outperforms state-of-the-art methods. The networks inferred by DGRNS are about 16% higher than the area under the receiver operating characteristic curve of other unsupervised methods and 10% higher than the area under the precision recall curve of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS. By comparing the predictions with standard network, we discover a series of novel interactions which are proved to be true in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.
Collapse
|
8
|
Yang B. Gene Regulatory Network Identification based on Forest Graph-embedded Deep Feedforward Network. 2021 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTERNET OF THINGS 2021. [DOI: 10.1145/3493287.3493297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
|
9
|
Choudhary K, Meng EC, Diaz-Mejia JJ, Bader GD, Pico AR, Morris JH. scNetViz: from single cells to networks using Cytoscape. F1000Res 2021; 10:ISCB Comm J-448. [PMID: 34912541 PMCID: PMC8593621 DOI: 10.12688/f1000research.52460.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 04/21/2021] [Indexed: 01/14/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has revolutionized molecular biology and medicine by enabling high-throughput studies of cellular heterogeneity in diverse tissues. Applying network biology approaches to scRNA-seq data can provide useful insights into genes driving heterogeneous cell-type compositions of tissues. Here, we present scNetViz- a Cytoscape app to aid biological interpretation of cell clusters in scRNA-seq data using network analysis. scNetViz calculates the differential expression of each gene across clusters and then creates a cluster-specific gene functional interaction network between the significantly differentially expressed genes for further analysis, such as pathway enrichment analysis. To automate a complete data analysis workflow, scNetViz integrates parts of the Scanpy software, which is a popular Python package for scRNA-seq data analysis, with Cytoscape apps such as stringApp, cyPlot, and enhancedGraphics. We describe our implementation of methods for accessing data from public single cell atlas projects, differential expression analysis, visualization, and automation. scNetViz enables users to analyze data from public atlases or their own experiments, which we illustrate with two use cases. Analysis can be performed via the Cytoscape GUI or CyREST programming interface using R (RCy3) or Python (py4cytoscape).
Collapse
Affiliation(s)
- Krishna Choudhary
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, 94158, USA
| | - Elaine C. Meng
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, 94143, USA
| | - J. Javier Diaz-Mejia
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, 94143, USA
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, M5G 2M9, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
- Phenomic AI, Toronto, Ontario, M5G 0B7, Canada
| | - Gary D. Bader
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, M5G 2M9, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, M5G 1A8, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, M5T 3A1, Canada
| | - Alexander R. Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, 94158, USA
| | - John H. Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, 94143, USA
| |
Collapse
|
10
|
Turki T, Taguchi YH. Discriminating the single-cell gene regulatory networks of human pancreatic islets: A novel deep learning application. Comput Biol Med 2021; 132:104257. [PMID: 33740535 DOI: 10.1016/j.compbiomed.2021.104257] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2020] [Revised: 02/01/2021] [Accepted: 02/03/2021] [Indexed: 12/24/2022]
Abstract
Analysis of single-cell pancreatic data can play an important role in understanding various metabolic diseases and health conditions. Due to the sparsity and noise present in such single-cell gene expression data, inference of single-cell gene regulatory networks remains a challenge. Since recent studies have reported the reliable inference of single-cell gene regulatory networks (SCGRNs), the current study focused on discriminating the SCGRNs of T2D patients from those of healthy controls. By accurately distinguishing SCGRNs of healthy pancreas from those of T2D pancreas, it would be possible to annotate, organize, visualize, and identify common patterns of SCGRNs in metabolic diseases. Such annotated SCGRNs could play an important role in accelerating the process of building large data repositories. This study aimed to contribute to the development of a novel deep learning (DL) application. First, we generated a dataset consisting of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures, including VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169, trained each of them on the dataset, and checked their prediction based on a test set. Of note, we evaluated the DL architectures on a single NVIDIA GeForce RTX 2080Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of VGG19 DL model in the automatic classification of SCGRNs, derived from the single-cell pancreatic data.
Collapse
Affiliation(s)
- Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia.
| | - Y-H Taguchi
- Department of Physics, Chuo University, Tokyo, 112-8551, Japan.
| |
Collapse
|
11
|
Nayak R, Hasija Y. A hitchhiker's guide to single-cell transcriptomics and data analysis pipelines. Genomics 2021; 113:606-619. [PMID: 33485955 DOI: 10.1016/j.ygeno.2021.01.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2020] [Revised: 12/30/2020] [Accepted: 01/18/2021] [Indexed: 12/20/2022]
Abstract
Single-cell transcriptomics (SCT) is a tour de force in the era of big omics data that has led to the accumulation of massive cellular transcription data at an astounding resolution of single cells. It provides valuable insights into cells previously unachieved by bulk cell analysis and is proving crucial in uncovering cellular heterogeneity, identifying rare cell populations, distinct cell-lineage trajectories, and mechanisms involved in complex cellular processes. SCT data is highly complex and necessitates advanced statistical and computational methods for analysis. This review provides a comprehensive overview of the steps in a typical SCT workflow, starting from experimental protocol to data analysis, deliberating various pipelines used. We discuss recent trends, challenges, machine learning methods for data analysis, and future prospects. We conclude by listing the multitude of scRNA-seq data applications and how it shall revolutionize our understanding of cellular biology and diseases.
Collapse
Affiliation(s)
- Richa Nayak
- Department of Biotechnology, Delhi Technological University, Delhi 110042, India
| | - Yasha Hasija
- Department of Biotechnology, Delhi Technological University, Delhi 110042, India.
| |
Collapse
|