1
Huang Y, Huang S, Zhang XF, Ou-Yang L, Liu C. NJGCG: A node-based joint Gaussian copula graphical model for gene networks inference across multiple states. Comput Struct Biotechnol J 2024; 23:3199-3210. [PMID: 39263209 PMCID: PMC11388165 DOI: 10.1016/j.csbj.2024.08.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 08/05/2024] [Accepted: 08/11/2024] [Indexed: 09/13/2024]
Abstract
Inferring the interactions between genes is essential for understanding the mechanisms underlying biological processes. Gene networks change as the environment and cellular state change. The accumulation of gene expression data from multiple states makes it possible to estimate gene networks in various states with computational methods. However, most existing gene network inference methods estimate a gene network from a single state, ignoring the similarities between networks in different but related states. Moreover, in addition to individual edges, similarities and differences between networks may also be driven by hub genes. Yet existing network inference methods rarely consider hub genes, which reduces the accuracy of network estimation. In this paper, we propose a novel node-based joint Gaussian copula graphical (NJGCG) model to jointly infer multiple gene networks from gene expression data containing heterogeneous samples. Our model can handle various types of gene expression data with missing values. Furthermore, a tree-structured group lasso penalty is designed to identify the common and specific hub genes in different gene networks. Simulation studies show that our proposed method outperforms the compared methods in all cases. We also apply NJGCG to infer the gene networks for different stages of differentiation in mouse embryonic stem cells and for different subtypes of breast cancer, and we explore how the gene networks change across differentiation stages or breast cancer subtypes. The common and specific hub genes in the estimated gene networks are closely related to stem cell differentiation processes and to heterogeneity within breast cancers.
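NJGCG is described as a Gaussian copula graphical model that tolerates missing values. One common way to apply a Gaussian copula to expression data is to replace each gene's observed values with normal scores computed from their ranks, and then to estimate a sparse precision matrix from the transformed data (in NJGCG, with the tree-structured group lasso penalty). The Python sketch below shows only that marginal transform, assuming that missing entries are simply skipped when ranking; the function name `normal_scores` is hypothetical and the code is not taken from the NJGCG implementation.

```python
import math
from statistics import NormalDist


def normal_scores(values):
    """Rank-based Gaussian-copula scores for one gene; missing entries stay None.

    Observed values are ranked (ties share their average rank), ranks are
    scaled by n + 1 so they lie strictly inside (0, 1), and each scaled rank
    is mapped through the standard normal quantile function.
    """
    observed = sorted(
        (v, i) for i, v in enumerate(values)
        if v is not None and not math.isnan(v)
    )
    n = len(observed)
    scores = [None] * len(values)
    std_normal = NormalDist()
    start = 0
    while start < n:
        # Find the run of tied values that begins at position `start`.
        end = start
        while end + 1 < n and observed[end + 1][0] == observed[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie run
        z = std_normal.inv_cdf(avg_rank / (n + 1))
        for k in range(start, end + 1):
            scores[observed[k][1]] = z
        start = end + 1
    return scores


# One gene measured in four samples, with one missing value.
print(normal_scores([3.0, None, 1.0, 2.0]))
```

The n + 1 denominator keeps the quantiles finite for the smallest and largest observations. Other copula estimators, such as Winsorized empirical CDFs or rank correlations like Kendall's tau, are equally plausible choices.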
Affiliation(s)
- Yun Huang
- Department of Geriatrics, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Clinical Research Center for Geriatric Hypertension Disease of Fujian province, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Sen Huang
- Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Chen Liu
- Department of Oncology, Molecular Oncology Research Institute, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Department of Oncology, National Regional Medical Center, Binhai Campus of The First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
- Fujian Key Laboratory of Precision Medicine for Cancer, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
2
Niloofar P, Aghdam R, Eslahchi C. GAEM: Genetic Algorithm based Expectation-Maximization for inferring Gene Regulatory Networks from incomplete data. Comput Biol Med 2024; 183:109238. [PMID: 39426072 DOI: 10.1016/j.compbiomed.2024.109238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 09/02/2024] [Accepted: 09/30/2024] [Indexed: 10/21/2024]
Abstract
In bioinformatics, inferring the structure of a Gene Regulatory Network (GRN) from incomplete gene expression data is a difficult task. One popular method for inferring the structure of GRNs is the Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). Although PCA-CMI excels at extracting GRN skeletons, it struggles with missing values in datasets. As a result, applying PCA-CMI to infer GRNs requires a preprocessing step for data imputation. In this paper, we present the GAEM algorithm, which uses an iterative approach combining a Genetic Algorithm and Expectation-Maximization to infer the structure of a GRN from incomplete gene expression datasets. GAEM learns the GRN structure from the incomplete dataset by iteratively updating the imputed values based on the learnt GRN until the convergence criteria are met. We evaluate the performance of this algorithm under various missingness mechanisms (ignorable and nonignorable) and percentages of missing values (5%, 15%, and 40%). The traditional approach to handling missing values in gene expression datasets estimates them first and then constructs the GRN. In contrast, our methodology updates both the missing values and the GRN iteratively until convergence. Results on the DREAM3 dataset suggest that GAEM is a more reliable method overall; especially for smaller networks, GAEM outperforms methods that first impute the incomplete dataset and then learn the GRN structure from the imputed data. We have implemented the GAEM algorithm in the GAEM R package, which is available at the following GitHub repository: https://github.com/parniSDU/GAEM.
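The abstract contrasts impute-then-learn pipelines with GAEM's loop, in which imputed values and the learnt model are updated alternately until convergence. The Python sketch below illustrates only that alternating structure. As a loudly stated simplification, the network-learning step (PCA-CMI with a genetic algorithm in GAEM) is replaced by a single least-squares regression between two genes, and the function name `iterative_impute` is hypothetical.

```python
def iterative_impute(x, y, tol=1e-9, max_iter=1000):
    """Alternate between fitting y ~ a + b * x and re-imputing missing y values.

    x: fully observed values of one gene; y: values of another gene with
    missing entries encoded as None. Returns the completed y and the fit (a, b).
    """
    n = len(x)
    missing = [i for i, v in enumerate(y) if v is None]
    observed = [v for v in y if v is not None]
    # Initial imputation: the mean of the observed y values.
    start = sum(observed) / len(observed)
    y_hat = [start if v is None else v for v in y]
    a = b = 0.0
    for _ in range(max_iter):
        # Model-fitting step: ordinary least squares on the completed data.
        mx = sum(x) / n
        my = sum(y_hat) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y_hat))
        b = sxy / sxx
        a = my - b * mx
        # Imputation step: replace each missing entry with the model prediction.
        largest_change = 0.0
        for i in missing:
            new_value = a + b * x[i]
            largest_change = max(largest_change, abs(new_value - y_hat[i]))
            y_hat[i] = new_value
        if largest_change < tol:
            break
    return y_hat, (a, b)


completed, (a, b) = iterative_impute([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, None, 3.0])
print(completed, a, b)
```

With a single regression, the loop's fixed point is simply the fit on the fully observed pairs, because imputed points lie exactly on the fitted line and contribute zero residual. The alternating structure matters more when the fitted model is a network whose structure itself changes with the imputed values, as in GAEM.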
Affiliation(s)
- Parisa Niloofar
- Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense, 5230, Denmark.
- Rosa Aghdam
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, WI, Madison, USA; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran
- Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran
3
Yang B, Li J, Li X, Liu S. Gene regulatory network inference based on novel ensemble method. Brief Funct Genomics 2024:elae036. [PMID: 39324652 DOI: 10.1093/bfgp/elae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 08/09/2024] [Accepted: 09/06/2024] [Indexed: 09/27/2024]
Abstract
Gene regulatory networks (GRNs) contribute to understanding gene function, the development of cancer, and the impact of key genes on diseases. To improve the accuracy of GRN identification, this study proposes an ensemble method based on 13 basic classification methods and a flexible neural tree (FNT). The primary classification methods include ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, histogram-based gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and the k-nearest neighbor algorithm, which serve as the input variable set of the FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble method outperforms 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI), and four single-cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) in terms of the area under the receiver operating characteristic curve, the area under the precision-recall curve, and the F1 score.
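The abstract does not define the FNT itself. In the flexible neural tree literature, an internal node typically computes a weighted sum of its children and passes it through the Gaussian-shaped activation exp(-((net - a)/b)^2), while leaves read input variables. Here the inputs are assumed to be the 13 base classifiers' scores for one candidate regulator-target pair. The Python sketch evaluates a hand-written tree. The tree structure and the parameters w, a, and b, which the paper searches with its hybrid evolutionary algorithm and particle swarm optimization, are fixed illustrative values, and the class names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    index: int  # position of the base-classifier score used as input


@dataclass
class FlexNode:
    weights: List[float]       # one weight per child
    children: List["Node"]     # Leaf or FlexNode children
    a: float                   # activation centre
    b: float                   # activation width (must be non-zero)


Node = Union[Leaf, FlexNode]


def evaluate(node: Node, scores: List[float]) -> float:
    """Evaluate a flexible neural tree on one gene pair's base-classifier scores."""
    if isinstance(node, Leaf):
        return scores[node.index]
    net = sum(w * evaluate(child, scores)
              for w, child in zip(node.weights, node.children))
    # Gaussian-shaped flexible activation f(a, b, net) = exp(-((net - a) / b)^2).
    return math.exp(-((net - node.a) / node.b) ** 2)


# Hypothetical tree combining the scores of base classifiers 0, 1 and 2.
tree = FlexNode(weights=[0.6, 0.4],
                children=[Leaf(0),
                          FlexNode(weights=[0.5, 0.5], children=[Leaf(1), Leaf(2)],
                                   a=1.0, b=1.0)],
                a=1.0, b=0.5)
print(evaluate(tree, [0.9, 0.8, 0.7]))
```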
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
- Jing Li
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
- Xiang Li
- Information Department, Qingdao Eighth People's Hospital, No. 84 Fengshan Road, Qingdao 266121, China
- Sanrong Liu
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
4
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques for inference from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information-theoretic measures, and neural networks, among others. The diversity of implementations and their respective customizations have led to many tools and to multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are difficult to detect a priori. This specialization introduces significant uncertainty when choosing the most appropriate technique for a particular dataset. The proposal presented here, named MO-GENECI, builds on the basic idea of the earlier GENECI and optimizes the consensus among different inference techniques through a carefully refined multi-objective evolutionary algorithm guided by several objective functions linked to the biological context at hand. METHODS MO-GENECI was tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and of multiple sizes. Its performance was compared with that of individual techniques using the key metrics for gene regulatory network inference (AUROC and AUPR). Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic characteristics of the input data. The best solution consistently emerged as the winner in all statistical tests, and in many cases the median-precision solution showed no statistically significant difference from the winner. CONCLUSIONS MO-GENECI has not only achieved more accurate results than individual techniques but has also overcome the uncertainty associated with the initial choice of technique, thanks to its flexibility and adaptability. It has been shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public GitHub repository under the MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. To facilitate installation and use, the software has also been packaged as a Python package available on PyPI: https://pypi.org/project/geneci/.
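MO-GENECI is described as optimizing a consensus among inference techniques with a multi-objective evolutionary algorithm and returning a Pareto front approximation. The Python sketch below illustrates two ingredients under stated assumptions. The first is a consensus network formed as a weighted average of per-technique edge confidences, where the weight vector plays the role of a candidate solution (the averaging rule is an assumption). The second is the non-dominated filtering that defines a Pareto front. The objective functions and the evolutionary search are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def consensus(weights: Sequence[float],
              technique_scores: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of edge confidences; all score lists share one edge order."""
    total = sum(weights)
    n_edges = len(technique_scores[0])
    return [sum(w * scores[e] for w, scores in zip(weights, technique_scores)) / total
            for e in range(n_edges)]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if it is no worse on every objective and better on one (maximisation)."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the non-dominated objective vectors, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Two techniques scoring three candidate edges, combined with weights 1 and 3.
print(consensus([1.0, 3.0], [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]))
# Objective values (two quality scores to maximise) for four candidate weight vectors.
print(pareto_front([(0.8, 0.6), (0.7, 0.7), (0.6, 0.5), (0.9, 0.1)]))
```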
Affiliation(s)
- Adrián Segura-Ortiz
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain.
- José García-Nieto
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
5
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, only one method developed specifically for this setting is known, and it does not fully use the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix and to solve these dynamics as an optimization problem that determines the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods on synthetic and experimental data. Our results demonstrate that WENDY performs well across different data sets.
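The abstract does not give WENDY's equations. As a standard point of reference only: if an expression vector x(t) followed linear stochastic dynamics with regulatory matrix A, noise matrix B, and a standard Wiener process W(t), its covariance K(t) would obey the Lyapunov differential equation below. Covariances estimated separately at two time points, K_0 and K_1 (a Δt apart), would then constrain A even without the joint distribution across time points. The least-squares objective on the last line, with BB^T assumed known, is a hypothetical illustration of "solving the dynamics as an optimization problem", not WENDY's actual model or objective.

```latex
\begin{aligned}
\mathrm{d}x(t) &= A\,x(t)\,\mathrm{d}t + B\,\mathrm{d}W(t),\\
\frac{\mathrm{d}K(t)}{\mathrm{d}t} &= A\,K(t) + K(t)\,A^{\top} + B B^{\top},\\
K(t+\Delta t) &\approx K(t) + \Delta t\,\bigl(A\,K(t) + K(t)\,A^{\top} + B B^{\top}\bigr),\\
\hat{A} &= \arg\min_{A}\,\bigl\lVert K_1 - K_0 - \Delta t\,\bigl(A K_0 + K_0 A^{\top} + B B^{\top}\bigr)\bigr\rVert_F^2 .
\end{aligned}
```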
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
6
Yu Y, Hou L, Liu X, Wu S, Li H, Xue F. A novel constraint-based structure learning algorithm using marginal causal prior knowledge. Sci Rep 2024; 14:19279. [PMID: 39164273 PMCID: PMC11335901 DOI: 10.1038/s41598-024-68379-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 07/23/2024] [Indexed: 08/22/2024]
Abstract
Incorporating prior knowledge is important for improving the performance of causal discovery. We consider the incorporation of marginal causal relations, which correspond to the presence or absence of directed paths in a causal model. We propose the Marginal Prior Causal Knowledge PC (MPPC) algorithm to incorporate marginal causal relations into a constraint-based structure learning algorithm. We provide theorems on the conditional independence properties that follow from combining observational data with marginal causal relations. We compare the MPPC algorithm with other structure learning methods in both simulation studies and real-world networks. The results indicate that, compared with other constraint-based structure learning methods, the MPPC algorithm can incorporate marginal causal relations and is both more effective and more efficient.
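The abstract defines a marginal causal relation as the presence or absence of a directed path, so one concrete reading of such prior knowledge is a reachability constraint on a candidate DAG. The Python sketch below, with hypothetical helper names, only checks whether a given graph satisfies such statements. MPPC itself exploits these relations inside the PC algorithm's conditional independence reasoning, which is not reproduced here.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def has_directed_path(edges: Iterable[Edge], source: Hashable, target: Hashable) -> bool:
    """Breadth-first search for a directed path of length >= 1 from source to target."""
    children: Dict[Hashable, List[Hashable]] = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def violated_priors(edges: List[Edge], prior: Dict[Edge, bool]) -> List[Edge]:
    """Return the (i, j) pairs whose required presence/absence of a path i ~> j fails."""
    return [pair for pair, required in prior.items()
            if has_directed_path(edges, *pair) != required]


dag = [("A", "B"), ("B", "C")]
print(violated_priors(dag, {("A", "C"): True, ("C", "A"): False, ("B", "A"): True}))
```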
Affiliation(s)
- Yifan Yu
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
- Lei Hou
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
- Xinhui Liu
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
- Sijia Wu
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
- Hongkai Li
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China.
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000.
- Fuzhong Xue
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China.
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000.
7
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023]
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and in plants it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. The method removes the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation of EIEPCF on simulation studies, on the gold-standard networks provided by the DREAM3 Challenge, and on the real gene networks of Escherichia coli demonstrates that it achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used EIEPCF to reconstruct the cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
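The abstract states that EIEPCF removes the influence of confounding factors by measuring the similarity between the residuals of regulators and targets. A minimal sketch of that idea is shown below. It assumes a single confounder, simple linear regression, and Pearson correlation of the residuals, which together amount to a first-order partial correlation. EIEPCF's actual regression model and similarity measure are not given in the abstract, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def residuals(y: Sequence[float], z: Sequence[float]) -> List[float]:
    """Residuals of the least-squares regression y ~ alpha + beta * z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    beta = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    return [yi - my - beta * (zi - mz) for yi, zi in zip(y, z)]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def residual_similarity(regulator, target, confounder) -> float:
    """Correlation left between regulator and target after removing the confounder."""
    return pearson(residuals(regulator, confounder), residuals(target, confounder))


confounder = [1.0, 2.0, 3.0, 4.0, 5.0]
regulator = [2.0, 1.0, 3.0, 3.0, 6.0]   # confounder plus pattern [1, -1, 0, -1, 1]
target = [0.0, 4.0, 3.0, 2.0, 6.0]      # confounder plus pattern [-1, 2, 0, -2, 1]
print(pearson(regulator, target), residual_similarity(regulator, target, confounder))
```

In this toy data both genes track the confounder, so their raw correlation is about 0.6, while the correlation of their residuals is zero.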
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
8
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. Biophys J 2024; 123:221-234. [PMID: 38102827 PMCID: PMC10808046 DOI: 10.1016/j.bpj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/18/2023] [Accepted: 12/12/2023] [Indexed: 12/17/2023]
Abstract
Quantitative understanding of cellular processes, such as the cell cycle and differentiation, is impeded by various forms of complexity, ranging from myriad molecular players and their multilevel regulatory interactions, to cellular evolution through multiple intermediate stages, a lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present a modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding rational strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally determined from experiments, augmented with dynamical network computations involving endpoint objective functions, mutual information, change-point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method based on the strategies described above. The cybernetic-inspired method distills the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model predicts future cell cycles consistent with experimental measurements. We posit that this innovative framework has the promise to extend to the dynamics of other biological processes, with the potential to provide novel mechanistic insights.
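Maximal clique centrality (MCC) is one of the network computations the abstract lists. Using the definition introduced with the cytoHubba Cytoscape plugin, MCC(v) = Σ over C in S(v) of (|C| - 1)!, where S(v) is the set of maximal cliques containing v. The Python sketch below enumerates maximal cliques with the Bron-Kerbosch algorithm (with pivoting) and sums the factorial terms. Whether the paper computes MCC in exactly this way is an assumption, and the function names are hypothetical.

```python
from math import factorial
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def maximal_cliques(adj: Graph) -> List[Set[Hashable]]:
    """Bron-Kerbosch with pivoting; adj must be symmetric (undirected graph)."""
    cliques: List[Set[Hashable]] = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex adjacent to the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc(adj: Graph) -> Dict[Hashable, int]:
    """Maximal clique centrality: sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:
            continue  # an isolated vertex keeps score 0, i.e. its degree
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C"}, "E": set()}
print(mcc(adj))
```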
Affiliation(s)
- Rubesh Raja
- The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Sana Khanum
- The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Lina Aboulmouna
- Department of Bioengineering, University of California San Diego, La Jolla, California
- Mano R Maurya
- Department of Bioengineering, University of California San Diego, La Jolla, California
- Shakti Gupta
- Department of Bioengineering, University of California San Diego, La Jolla, California
- Shankar Subramaniam
- Department of Bioengineering, University of California San Diego, La Jolla, California; Departments of Computer Science and Engineering, Cellular and Molecular Medicine, San Diego Supercomputer Center, and the Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, California.
- Doraiswami Ramkrishna
- The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana.
9
Kim D, Heo Y, Kim M, Suminda GGD, Manzoor U, Min Y, Kim M, Yang J, Park Y, Zhao Y, Ghosh M, Son YO. Inhibitory effects of Acanthopanax sessiliflorus Harms extract on the etiology of rheumatoid arthritis in a collagen-induced arthritis mouse model. Arthritis Res Ther 2024; 26:11. [PMID: 38167214 PMCID: PMC10763440 DOI: 10.1186/s13075-023-03241-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND The biological function of Acanthopanax sessiliflorus Harms (ASH) has been investigated in various diseases; however, the effects of ASH on arthritis have not been investigated so far. This study investigates the effects of ASH on rheumatoid arthritis (RA). METHODS Supercritical carbon dioxide (CO2) was used to prepare the ASH extract, and its primary components, pimaric and kaurenoic acids, were identified using gas chromatography-mass spectrometry (GC-MS). Collagen-induced arthritis (CIA) was used as the RA model, and primary cultures of articular chondrocytes were used to examine the inhibitory effects of ASH extract on arthritis in three synovial joints: ankle, sole, and knee. RESULTS Pimaric and kaurenoic acids attenuated the pro-inflammatory cytokine-mediated increase in catabolic factors and restored the pro-inflammatory cytokine-mediated decrease in related anabolic factors in vitro; however, they did not affect pro-inflammatory cytokine (IL-1β, TNF-α, and IL-6)-mediated cytotoxicity. ASH effectively inhibited cartilage degradation in the knee, ankle, and toe in the CIA model and decreased pannus development in the knee. Immunohistochemistry demonstrated that ASH mostly inhibited the IL-6-mediated matrix metalloproteinase. Gene Ontology and pathway studies bridge major gaps in the literature and provide insights into the pathophysiology and in-depth mechanisms of RA-like joint degeneration. CONCLUSIONS To the best of our knowledge, this is the first extensive study of the efficacy of ASH extract in inhibiting the pathogenesis of RA. However, additional animal models and clinical studies are required to validate this hypothesis.
Affiliation(s)
- Dahye Kim
- Division of Animal Genetics and Bioinformatics, National Institute of Animal Science, RDA, Wanju, Republic of Korea
- Yunji Heo
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Mangeun Kim
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Godagama Gamaarachchige Dinesh Suminda
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Umar Manzoor
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Laboratory of Immune and Inflammatory Disease, College of Pharmacy, Jeju Research Institute of Pharmaceutical Sciences, Jeju National University, Jeju, 63243, Republic of Korea
- Yunhui Min
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Minhye Kim
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Jiwon Yang
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Youngjun Park
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Laboratory of Immune and Inflammatory Disease, College of Pharmacy, Jeju Research Institute of Pharmaceutical Sciences, Jeju National University, Jeju, 63243, Republic of Korea
- Yaping Zhao
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
- Mrinmoy Ghosh
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea.
- Department of Biotechnology, School of Bio, Chemical and Processing Engineering (SBCE), Kalasalingam Academy of Research and Education, Krishnankoil, Srivilliputhur, 626126, India.
- Young-Ok Son
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea.
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea.
- Practical Translational Research Center, Jeju National University, Jeju, 63243, Republic of Korea.
10
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly active topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used for GRN reconstruction. However, the probabilistic characteristics of biological networks and the presence of data noise make GRN reconstruction challenging and lead to many false positive and false negative edges. ScoreLasso is a hybrid DBN score function that combines a DBN with linear regression and performs well. Its performance is, however, limited by its first-order assumption and by ignoring the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model, and a Pearson correlation model is proposed. On this basis, a hybrid higher-order DBN score function for GRN reconstruction, named BIC-LP, is proposed. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. It can therefore capture more information from the dataset and reduce information loss compared with many existing Bayesian-family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, thereby achieving better GRN reconstruction performance.
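BIC-LP is described as the classical BIC score plus Lasso- and Pearson-based terms. The abstract gives neither the form nor the weighting of the added terms, so the Python sketch below computes only the classical BIC local score of a discrete node given its parents. In a higher-order DBN, the parent columns would be expression states of candidate regulators at earlier time lags. Function and variable names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def bic_local_score(child: Sequence[Hashable],
                    parents: List[Sequence[Hashable]]) -> float:
    """Classical BIC local score of a discrete node given discrete parent columns.

    BIC = sum_{j,k} N_jk * log(N_jk / N_j) - (log N / 2) * q * (r - 1),
    with q parent configurations and r child states (both cardinalities are
    taken from the observed data here).
    """
    n = len(child)
    configs = list(zip(*parents)) if parents else [()] * n
    joint_counts = Counter(zip(configs, child))      # N_jk
    parent_counts = Counter(configs)                 # N_j
    log_lik = sum(count * math.log(count / parent_counts[cfg])
                  for (cfg, _state), count in joint_counts.items())
    r = len(set(child))
    q = 1
    for column in parents:
        q *= len(set(column))
    return log_lik - 0.5 * math.log(n) * q * (r - 1)


# In a higher-order DBN the parent columns would be lagged expression states
# of candidate regulators; here they are simply given as aligned lists.
target_t = [0, 0, 1, 1]
regulator_lag1 = [0, 0, 1, 1]
print(bic_local_score(target_t, [regulator_lag1]), bic_local_score(target_t, []))
```

In this toy example the lagged regulator fully determines the target, so adding it as a parent raises the score (about -1.39 versus -3.47) despite the larger parameter penalty.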
11
Shi W, Zhong B, Dong J, Hu X, Li L. Super enhancer-driven core transcriptional regulatory circuitry crosstalk with cancer plasticity and patient mortality in triple-negative breast cancer. Front Genet 2023; 14:1258862. [PMID: 37900187 PMCID: PMC10602724 DOI: 10.3389/fgene.2023.1258862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/02/2023] [Indexed: 10/31/2023]
Abstract
Triple-negative breast cancer (TNBC) is a clinically aggressive subtype of breast cancer. Core transcriptional regulatory circuitry (CRC) consists of autoregulated transcription factors (TFs) and their enhancers, which dominate gene expression programs and control cell fate. However, there is limited knowledge of CRC in TNBC. Herein, we systematically characterized the activated super-enhancers (SEs) and interrogated 14 CRCs in breast cancer. We found that CRCs could be broadly involved in DNA conformation change, metabolic processes, and signaling responses affecting gene expression reprogramming. Furthermore, these CRC TFs are capable of coordinating with partner TFs to bridge enhancer-promoter loops. Notably, the CRC TF and partner pairs show remarkable specificity for molecular subtypes of breast cancer, especially TNBC. USF1, SOX4, and MYBL2 were identified as TNBC-specific CRC TFs. We further demonstrated that USF1 is a TNBC immunophenotype-related TF. Our finding that the rewiring of enhancer-driven CRCs is related to cancer immunity and mortality will facilitate the development of epigenetic anti-cancer treatment strategies.
Affiliation(s)
- Wensheng Shi
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Department of Urology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Bowen Zhong
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Department of Urology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Jiaming Dong
- Department of Radiation, Cangzhou Central Hospital, Changsha, China
- Xiheng Hu
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Department of Urology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Lingfang Li
- Department of Cardiovascular Medicine, Xiangya Hospital, Central South University, Changsha, China
12
Barbagallo C, Stella M, Ferrara C, Caponnetto A, Battaglia R, Barbagallo D, Di Pietro C, Ragusa M. RNA-RNA competitive interactions: a molecular civil war ruling cell physiology and diseases. EXPLORATION OF MEDICINE 2023:504-540. [DOI: 10.37349/emed.2023.00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 06/02/2023] [Indexed: 09/02/2023]
Abstract
The idea that proteins are the main determining factors in the functioning of cells and organisms, and their dysfunctions are the first cause of pathologies, has been predominant in biology and biomedicine until recently. This protein-centered view was too simplistic and failed to explain the physiological and pathological complexity of the cell. About 80% of the human genome is dynamically and pervasively transcribed, mostly as non-protein-coding RNAs (ncRNAs), which competitively interact with each other and with coding RNAs generating a complex RNA network regulating RNA processing, stability, and translation and, accordingly, fine-tuning the gene expression of the cells. Qualitative and quantitative dysregulations of RNA-RNA interaction networks are strongly involved in the onset and progression of many pathologies, including cancers and degenerative diseases. This review will summarize the RNA species involved in the competitive endogenous RNA network, their mechanisms of action, and involvement in pathological phenotypes. Moreover, it will give an overview of the most advanced experimental and computational methods to dissect and rebuild RNA networks.
Affiliation(s)
- Cristina Barbagallo
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Michele Stella
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Angela Caponnetto
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Rosalia Battaglia
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Davide Barbagallo
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Cinzia Di Pietro
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
- Marco Ragusa
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
13
Li L, Sun L, Chen G, Wong CW, Ching WK, Liu ZP. LogBTF: gene regulatory network inference using Boolean threshold network model from single-cell gene expression data. Bioinformatics 2023; 39:btad256. [PMID: 37079737 PMCID: PMC10172039 DOI: 10.1093/bioinformatics/btad256] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Revised: 02/25/2023] [Accepted: 04/13/2023] [Indexed: 04/22/2023]
Abstract
MOTIVATION From a systems perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods focus mainly on network topology, and only a few consider how to explicitly describe the update logic rules of regulation in GRNs in order to obtain their dynamics. Moreover, some inference methods fail to deal with the over-fitting problem caused by noise in time-series data. RESULTS In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and a Boolean threshold function. First, the continuous gene expression values are converted into Boolean values, and an elastic net regression model is adopted to fit the binarized time-series data. Then, the estimated regression coefficients are used to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome multicollinearity and over-fitting, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and then setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is implemented within the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean-valued dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that LogBTF can infer GRNs from time-series data more accurately than several alternative GRN inference methods. AVAILABILITY AND IMPLEMENTATION The source data and code are available at https://github.com/zpliulab/LogBTF.
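The core update rule of a Boolean threshold network can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LogBTF code (which is in the repository above): binarization uses each gene's mean as the cut-off, the coefficients are made-up stand-ins for fitted elastic-net logistic regression coefficients, and the perturbation design matrix and cross-validation steps are omitted. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def binarize(series: List[float]) -> List[int]:
    """Map one gene's expression values to 0/1, using the gene's mean as threshold (assumed rule)."""
    mean = sum(series) / len(series)
    return [1 if v > mean else 0 for v in series]


def prune(weights: Matrix, tol: float) -> Matrix:
    """Set coefficients whose magnitude is below `tol` to zero, removing those edges."""
    return [[w if abs(w) >= tol else 0.0 for w in row] for row in weights]


def threshold_update(state: List[int], weights: Matrix, bias: List[float]) -> List[int]:
    """One synchronous step of a Boolean threshold network.

    Gene i switches on when bias[i] + sum_j weights[i][j] * state[j] > 0, which is the
    same decision as a logistic model predicting P(on) > 0.5.
    """
    return [
        1 if b + sum(w * s for w, s in zip(row, state)) > 0 else 0
        for row, b in zip(weights, bias)
    ]


# Hypothetical two-gene loop: A activates B, B represses A (coefficients are made up).
weights = prune([[0.01, -2.0], [2.0, 0.0]], tol=0.05)
bias = [1.0, -1.0]
state = [1, 0]
trajectory = [state]
for _ in range(4):
    state = threshold_update(state, weights, bias)
    trajectory.append(state)
# trajectory: [1, 0] -> [1, 1] -> [0, 1] -> [0, 0] -> [1, 0]
```

Thresholding the linear predictor at zero is what links the logistic regression fit to a Boolean threshold function: the sigmoid exceeds 0.5 exactly when its argument is positive.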
Affiliation(s)
- Lingyu Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Liangjie Sun
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Chi-Wing Wong
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Wai-Ki Ching
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
14
Xu J, Zhang A, Liu F, Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics 2023; 39:btad165. [PMID: 37004161 PMCID: PMC10085635 DOI: 10.1093/bioinformatics/btad165] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/25/2022] [Revised: 02/28/2023] [Accepted: 03/25/2023] [Indexed: 04/03/2023]
Abstract
MOTIVATION Single-cell RNA-sequencing (scRNA-seq) technologies provide an opportunity to infer cell-specific gene regulatory networks (GRNs), which is an important challenge in systems biology. Although numerous methods have been developed for inferring GRNs from scRNA-seq data, dealing with cellular heterogeneity remains a challenge. RESULTS To address this challenge, we developed an interpretable transformer-based method named STGRNS for inferring GRNs from scRNA-seq data. In this algorithm, a gene expression motif technique is proposed to convert gene pairs into contiguous sub-vectors, which can be used as input to the transformer encoder. By avoiding missing phase-specific regulations in a network, the gene expression motif can improve the accuracy of GRN inference for different types of scRNA-seq data. To assess the performance of STGRNS, we carried out comparative experiments against popular methods on extensive benchmark datasets, including 21 static and 27 time-series scRNA-seq datasets. All results show that STGRNS outperforms the compared methods. In addition, STGRNS proved to be more interpretable than "black box" deep learning methods, which are well known for being difficult to explain. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/zhanglab-wbgcas/STGRNS.
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
15
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.21.533676. [PMID: 36993235 PMCID: PMC10055344 DOI: 10.1101/2023.03.21.533676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present an elegant modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding entirely novel strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally-determined from experiments, augmented with dynamical network computations involving end point objective functions, mutual information, change point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method (CIM), utilizing the strategies described above. The CIM is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. 
We posit that this state-of-the-art framework has the promise to extend to the dynamics of other biological processes, with the potential to provide novel mechanistic insights. STATEMENT OF SIGNIFICANCE Cellular processes like the cell cycle are highly complex, involving multiple players interacting at multiple levels, and explicit modeling of such systems is challenging. The availability of longitudinal RNA measurements provides an opportunity to "reverse-engineer" novel regulatory models. We develop a novel framework, inspired by the goal-oriented cybernetic model, to implicitly model transcriptional regulation by constraining the system using inferred temporal goals. A preliminary causal network based on information theory is used as a starting point, and our framework is used to distill the network into temporally based networks containing essential molecular players. The strength of this approach is its ability to dynamically model the RNA temporal measurements. The approach developed paves the way for inferring regulatory processes in many complex cellular processes.
16
Wang Y, Liu C, Qiao X, Han X, Liu ZP. PKI: A bioinformatics method of quantifying the importance of nodes in gene regulatory network via a pseudo knockout index. BIOCHIMICA ET BIOPHYSICA ACTA. GENE REGULATORY MECHANISMS 2023; 1866:194911. [PMID: 36804477 DOI: 10.1016/j.bbagrm.2023.194911] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/12/2022] [Revised: 01/09/2023] [Accepted: 01/30/2023] [Indexed: 02/18/2023]
Abstract
BACKGROUND A gene regulatory network (GRN) is a model that characterizes the complex relationships between genes and thereby provides an informatics environment for measuring the importance of nodes. Evaluating important nodes in a GRN can effectively reveal their functional roles as key players in particular biological processes, such as master regulators and driver genes. Currently, such evaluation is mainly based on network topological parameters and focuses only on evaluating each node individually. However, genes and their products function by interacting with each other. Notably, the effects of gene combinations in a GRN are not simply additive, so discovering key combinations is important for revealing gene sets with important functions. Recently, with the development of single-cell RNA-sequencing (scRNA-seq) technology, we can quantify the gene expression profiles of individual cells, which makes it possible to identify crucial nodes in gene regulation under specific conditions, e.g., stem cell differentiation. RESULTS In this paper, we propose a bioinformatics method, called Pseudo Knockout Importance (PKI), to quantify the importance of nodes and node sets in a specific GRN structure using time-course scRNA-seq data. First, we construct ordinary differential equations (ODEs) to approximate gene regulation during cell differentiation. Then we design gene pseudo knockout experiments and define PKI score evaluation criteria based on the coefficient of determination. The importance of a node can be described as the influence on the ODE system of removing the corresponding variables. For key gene combinations, PKI is formulated as a combinatorial optimization problem that quantifies in silico gene knockout effects. CONCLUSIONS Here, we focus our analyses on the specific GRN of embryonic stem cells with time-series gene expression profiles.
To verify the effectiveness and advantages of the PKI method, we compare its node importance rankings with those of twelve other centrality-based methods, such as degree and Latora closeness. For key node combinations, we compare the results with a method based on the minimum dominating set. Moreover, well-known transcription factor combinations for inducing pluripotent stem cells are also employed to verify the vital gene combinations identified by PKI. These results demonstrate the reliability and superiority of the proposed method.
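A pseudo knockout can be illustrated with a linear ODE model: predict each gene's time derivative with and without one regulator and compare the coefficient of determination (R²). The sketch below assumes a linear right-hand side dx_i/dt = Σ_j A_ij x_j, treats knockout as dropping gene k's column, and scores k by the drop in mean R² over the remaining genes. These choices, the toy matrices, and the names `predict` and `knockout_score` are hypothetical; PKI's actual ODE construction and score definition are given in the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination of `predicted` against `observed`."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def predict(A: Matrix, X: Matrix, removed: int = -1) -> Matrix:
    """Linear ODE right-hand side dx_i/dt = sum_j A[i][j] * x_j at every time point.

    Column `removed` is skipped, i.e. that gene is 'pseudo knocked out'.
    """
    return [
        [sum(a * xj for j, (a, xj) in enumerate(zip(row, x)) if j != removed) for row in A]
        for x in X
    ]


def knockout_score(A: Matrix, X: Matrix, dXdt: Matrix, gene: int) -> float:
    """Drop in mean R^2 over the remaining genes when `gene` is removed (hypothetical score)."""
    targets = [i for i in range(len(A)) if i != gene]

    def mean_r2(pred: Matrix) -> float:
        return sum(
            r_squared([d[i] for d in dXdt], [p[i] for p in pred]) for i in targets
        ) / len(targets)

    return mean_r2(predict(A, X)) - mean_r2(predict(A, X, removed=gene))


# Toy chain: gene 0 -> gene 1 -> gene 2, each with self-decay (made-up values).
A = [[-1.0, 0.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.5, -1.0]]
X = [[1.0, 0.2, 0.1], [0.8, 0.5, 0.2], [0.6, 0.7, 0.4], [0.4, 0.8, 0.6]]
dXdt = predict(A, X)  # noise-free "observed" derivatives, for illustration only
scores = {g: knockout_score(A, X, dXdt, g) for g in range(3)}
# Gene 2 regulates no other gene, so its score is 0; genes 0 and 1 score above 0.
```

Scoring a gene set instead of a single gene would mean skipping several columns at once, which is where the combinatorial search mentioned in the abstract comes in.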
Affiliation(s)
- Yijuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Chao Liu
- Department of Orthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
- Xu Qiao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Xianhua Han
- Faculty of Science, Yamaguchi University, Yamaguchi 753-8511, Japan
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
17
Jiang X, Liu K, Peng H, Fang J, Zhang A, Han Y, Zhang X. Comparative network analysis reveals the dynamics of organic acid diversity during fruit ripening in peach (Prunus persica L. Batsch). BMC PLANT BIOLOGY 2023; 23:16. [PMID: 36617558 PMCID: PMC9827700 DOI: 10.1186/s12870-023-04037-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/12/2022] [Accepted: 01/02/2023] [Indexed: 06/17/2023]
Abstract
BACKGROUND Organic acids are important components that determine the fruit flavor of peach (Prunus persica L. Batsch). However, the dynamics of organic acid diversity during fruit ripening and the key genes that modulate organic acid metabolism remain largely unknown in this fruit tree, whose yield ranks sixth in the world. RESULTS In this study, we used 3D transcriptome data containing three dimensions of information, namely time, phenotype and gene expression, from 5 different peach varieties to construct gene co-expression networks throughout fruit ripening. With the inferred networks, a time-ordered comparative network analysis was performed to select a high-acid-specific gene co-expression network and then clarify the regulatory factors controlling organic acid accumulation. As a result, network modules related to organic acid synthesis and metabolism under high-acid versus low-acid comparison conditions were identified for further research. In addition, we obtained 20 candidate genes as regulatory factors related to organic acid metabolism in peach. CONCLUSIONS The study provides new insights into the dynamics of organic acid accumulation during fruit ripening, complements the results of classical co-expression network analysis and establishes a foundation for key gene discovery from time-series multiple-species transcriptome data.
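The abstract does not state how co-expression is measured, so as a generic, hedged sketch, a gene co-expression network can be built by computing pairwise Pearson correlations between expression profiles and keeping pairs whose absolute correlation reaches a cut-off. The 0.8 threshold, gene names, and data below are hypothetical; the study's own network construction and time-ordered comparison are not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr: Dict[str, List[float]], threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Keep gene pairs whose absolute Pearson correlation reaches `threshold`."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Made-up profiles across four ripening time points.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expr)
# Only geneA-geneB is kept (r close to 1); geneC correlates weakly with both.
```

Building one such network per variety or condition and then comparing edge sets is one simple way to look for condition-specific modules.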
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jing Fang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Yuepeng Han
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
18
Amy Lyu MJ, Tang Q, Wang Y, Essemine J, Chen F, Ni X, Chen G, Zhu XG. Evolution of gene regulatory network of C4 photosynthesis in the genus Flaveria reveals the evolutionary status of C3-C4 intermediate species. PLANT COMMUNICATIONS 2023; 4:100426. [PMID: 35986514 PMCID: PMC9860191 DOI: 10.1016/j.xplc.2022.100426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 06/16/2022] [Accepted: 08/11/2022] [Indexed: 06/15/2023]
Abstract
C4 photosynthesis evolved from ancestral C3 photosynthesis by recruiting pre-existing genes to fulfill new functions. The enzymes and transporters required for the C4 metabolic pathway have been intensively studied and well documented; however, the transcription factors (TFs) that regulate these C4 metabolic genes are not yet well understood. In particular, how the TF regulatory network of C4 metabolic genes was rewired during the evolutionary process is unclear. Here, we constructed gene regulatory networks (GRNs) for four closely related species from the genus Flaveria, which represent four different evolutionary stages of C4 photosynthesis: C3 (F. robusta), type I C3-C4 (F. sonorensis), type II C3-C4 (F. ramosissima), and C4 (F. trinervia). Our results show that more than half of the co-regulatory relationships between TFs and core C4 metabolic genes are species specific. The counterparts of the C4 genes in C3 species were already co-regulated with photosynthesis-related genes, whereas the required TFs for C4 photosynthesis were recruited later. The TFs involved in C4 photosynthesis were widely recruited in the type I C3-C4 species; nevertheless, type II C3-C4 species showed a GRN divergent from that of C4 species. In line with these findings, a ¹³CO₂ pulse-labeling experiment showed that the CO₂ initially fixed into C4 acid was not directly released to the Calvin-Benson-Bassham cycle in the type II C3-C4 species. Therefore, our study uncovered dynamic changes in C4 genes and TF co-regulation during the evolutionary process; furthermore, we showed that the metabolic pathway of the type II C3-C4 species F. ramosissima represents an alternative evolutionary solution to the ammonia imbalance in C3-C4 intermediate species.
Affiliation(s)
- Ming-Ju Amy Lyu
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Qiming Tang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China; University of Chinese Academy of Sciences
- Yanjie Wang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China; University of Chinese Academy of Sciences
- Jemaa Essemine
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Faming Chen
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Xiaoxiang Ni
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China; University of Chinese Academy of Sciences
- Genyun Chen
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Xin-Guang Zhu
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China.
19
Fan Z, Kernan KF, Sriram A, Benos PV, Canna SW, Carcillo JA, Kim S, Park HJ. Deep neural networks with knockoff features identify nonlinear causal relations and estimate effect sizes in complex biological systems. Gigascience 2022; 12:giad044. [PMID: 37395630 PMCID: PMC10316696 DOI: 10.1093/gigascience/giad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 01/31/2023] [Accepted: 05/29/2023] [Indexed: 07/04/2023]
Abstract
BACKGROUND Learning the causal structure helps identify risk factors, disease mechanisms, and candidate therapeutics for complex diseases. However, although complex biological systems are characterized by nonlinear associations, existing bioinformatic methods of causal inference cannot identify the nonlinear relationships and estimate their effect size. RESULTS To overcome these limitations, we developed the first computational method that explicitly learns nonlinear causal relations and estimates the effect size using a deep neural network approach coupled with the knockoff framework, named causal directed acyclic graphs using deep learning variable selection (DAG-deepVASE). Using simulation data of diverse scenarios and identifying known and novel causal relations in molecular and clinical data of various diseases, we demonstrated that DAG-deepVASE consistently outperforms existing methods in identifying true and known causal relations. In the analyses, we also illustrate how identifying nonlinear causal relations and estimating their effect size help understand the complex disease pathobiology, which is not possible using other methods. CONCLUSIONS With these advantages, the application of DAG-deepVASE can help identify driver genes and therapeutic agents in biomedical studies and clinical trials.
Affiliation(s)
- Zhenjiang Fan
- Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Kate F Kernan
- Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Aditya Sriram
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, FL 32610, USA
- Scott W Canna
- Pediatric Rheumatology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Joseph A Carcillo
- Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Soyeon Kim
- Division of Pediatric Pulmonary Medicine, Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15224, USA
- Hyun Jung Park
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
20
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022]
Abstract
Accurate biomarkers and effective therapeutic targets are lacking in current cancer treatment. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integrating multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, we introduce multi-omics data harmonization methods and summarize common approaches to molecular network inference. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in computational efficiency, scalability, and accuracy when constructing genome-scale multi-omics networks in bulk tumors and single cells. Based on the constructed multi-modal regulatory networks, graph-theoretic network centrality metrics can be used to prioritize candidate biomarkers and therapeutic targets. Our approach of integrating multi-omics profiles from a patient cohort with large-scale patient EMRs, such as the SEER-Medicare cancer registry, combined with extensive external validation, can identify potential biomarkers applicable to large patient populations. These methodologies form a conceptually innovative framework for analyzing the information available from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
Affiliation(s)
- Qing Ye
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Nancy Lan Guo
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
21
Jia Z, Zhang X. Accurate determination of causalities in gene regulatory networks by dissecting downstream target genes. Front Genet 2022; 13:923339. [PMID: 36568360 PMCID: PMC9768335 DOI: 10.3389/fgene.2022.923339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 11/08/2022] [Indexed: 12/12/2022]
Abstract
Accurately determining causal relationships between genes is a challenge when inferring gene regulatory networks (GRNs) from gene expression profiles. Although many methods have been developed to reconstruct GRNs, most of them fall short in determining causality or regulatory direction. In this work, we present a novel method, DDTG, that improves the accuracy of causality determination in GRN inference by dissecting downstream target genes. In the proposed method, the topology and hierarchy of a GRN are determined by mutual information and conditional mutual information, and its regulatory directions are determined by Taylor formula-based regression. In addition, indirect interactions are removed by exploiting the sparseness of the network topology, which improves the accuracy of network inference. The method is validated on benchmark GRNs from the DREAM3 and DREAM4 challenges. The results demonstrate the superior performance of DDTG in determining causality in GRNs compared with several popular GRN inference methods. This work provides a useful tool for inferring causal gene regulatory networks.
Affiliation(s)
- Zhigang Jia
- School of Mathematics and Statistics, Xinyang Normal University, Xinyang, China
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
- *Correspondence: Xiujun Zhang
22
The Analysis of Relevant Gene Networks Based on Driver Genes in Breast Cancer. Diagnostics (Basel) 2022; 12:diagnostics12112882. [PMID: 36428940 PMCID: PMC9689550 DOI: 10.3390/diagnostics12112882] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND The occurrence and development of breast cancer are strongly correlated with a person's genetics. It is therefore important to analyze the genetic factors of breast cancer to support the future development of targeted therapies at the genetic level. METHODS In this study, we analyze the protein-protein interaction network relevant to breast cancer in three steps: selecting breast cancer-relevant genes with a mutual information method, reconstructing the protein-protein interaction network from the STRING database, and identifying vital genes by node centrality analysis. RESULTS A total of 230 breast cancer-relevant genes were selected to reconstruct the protein-protein interaction network, and vital genes were identified by node centrality analyses. Centrality analyses using the top 10 and top 20 values of each metric found 19 and 39 statistically vital genes, respectively. To establish the biological significance of these vital genes, we carried out survival analysis and DNA methylation analysis, and examined their prognostic value in other cancer tissues and their RNA expression levels in breast cancer. All of these results confirmed the validity of the selected genes. CONCLUSIONS These genes could provide a valuable reference for the clinical treatment of breast cancer patients.
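The centrality step can be illustrated with a short sketch. It is not the study's pipeline: the edge list is a hypothetical toy network rather than STRING data, only two metrics (degree and closeness) are computed, and taking the union of the top-k genes across metrics is an assumption about how the per-metric lists were combined.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {g: len(nbrs) / (n - 1) for g, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances), using BFS."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def top_k_union(metrics, k):
    """Union of the k highest-scoring genes under each metric (ties broken by name)."""
    selected = set()
    for scores in metrics:
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        selected.update(ranked[:k])
    return selected


# Hypothetical toy network; gene names are placeholders.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5"), ("G5", "G6")]
graph = build_graph(edges)
deg = degree_centrality(graph)
clo = closeness_centrality(graph)
print(sorted(top_k_union([deg, clo], k=2)))
```

On this toy network both metrics rank the hub G1 and the bridging gene G4 highest, so the union of the two top-2 lists is {G1, G4}.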
23
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:6808612. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022]
Abstract
MOTIVATION How to construct gene regulatory networks has long been a focus of biological research. Mutual information can measure nonlinear relationships and has been widely used to construct gene regulatory networks. However, it cannot capture indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of the inferred networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed with a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain an initial gene regulatory network. To capture complex interactions between genes and eliminate redundant edges, the initial network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was updated iteratively to reduce the overestimation of direct regulatory strength by mutual information. RESULTS The experimental results show that MEOMI performed better than several other gene network construction methods on the DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E. coli SOS pathway network, E. coli SOS DNA repair network and E. coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and dataset are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
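To make the mutual information step concrete, the sketch below estimates MI between two expression profiles with a single plug-in estimator on equal-width bins. It does not reproduce MEOMI: the mixed entropy estimator, the context-related likelihood step and CMI2 are omitted, and the bin count is an arbitrary assumption.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile: everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Plug-in Shannon entropy (nats)."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y, bins=3):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) on equal-width discretized profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    return entropy(dx) + entropy(dy) - entropy(list(zip(dx, dy)))


gene_a = [-2, -1, 0, 1, 2]
gene_b = [v * v for v in gene_a]  # nonlinear target; Pearson correlation is 0
print(mutual_information(gene_a, gene_b, bins=3))
```

Because the second profile is the square of the first, their Pearson correlation is zero, yet the estimated MI is positive: this is the kind of nonlinear dependence the abstract refers to.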
Affiliation(s)
- Jimeng Lei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zongheng Cai
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinyi He
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wanting Zheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
24
Kelly J, Berzuini C, Keavney B, Tomaszewski M, Guo H. A review of causal discovery methods for molecular network analysis. Mol Genet Genomic Med 2022; 10:e2055. [PMID: 36087049 PMCID: PMC9544222 DOI: 10.1002/mgg3.2055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 07/12/2022] [Accepted: 08/18/2022] [Indexed: 11/08/2022]
Abstract
BACKGROUND With the increasing availability and size of multi-omics datasets, investigating the causal relationships between molecular phenotypes has become an important aspect of exploring the underlying biology and genetics. A growing number of methodologies have been developed and applied to molecular networks to investigate these causal interactions. METHODS We introduce and review the available methods for building large-scale causal molecular networks that have been developed and applied in the past decade. RESULTS In this review, we identify and summarize the existing methods for inferring causality in large-scale molecular networks, and discuss important factors that will need to be considered in future research in this area. CONCLUSION Existing methods for inferring causal molecular networks have their own strengths and limitations, so there is no single best approach; the choice is instead left to the discretion of the researcher. This review also discusses some current limitations to the biological interpretation of these networks, and important factors to consider in future studies of molecular networks.
Affiliation(s)
- Jack Kelly
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Carlo Berzuini
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Bernard Keavney
- Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Division of Cardiology and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
- Maciej Tomaszewski
- Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Manchester Heart Centre and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
- Hui Guo
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
25
Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:bbac219. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022]
Abstract
Dynamic Bayesian networks (DBNs) can be used to discover gene regulatory networks (GRNs) from time-series gene expression data. Here, we propose a strategy for learning DBNs from gene expression data with a Bayesian approach that scales to large networks and targets models with high predictive accuracy. Our framework can learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models under different structural and parametric assumptions and select the optimal model by cross-validated predictive accuracy. We show in simulation studies that our approach is better at preventing overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time-series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second, we compared normal and tumor cells of colorectal tissue. The DBN-based classifier achieved higher classification accuracy on both datasets than previously reported. For the colorectal cancer dataset, our analysis suggested that the GRNs of cancer and normal tissues differ substantially, with the differences most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in the gene networks of cancer and normal cells may be used to discover targeted therapies.
Affiliation(s)
- Polina Suter
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
26
Passemiers A, Moreau Y, Raimondi D. Fast and accurate inference of gene regulatory networks through robust precision matrix estimation. Bioinformatics 2022; 38:2802-2809. [PMID: 35561176 PMCID: PMC9113237 DOI: 10.1093/bioinformatics/btac178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 03/14/2022] [Accepted: 03/22/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Transcriptional regulation mechanisms allow cells to adapt and respond to external stimuli by altering gene expression. The possible transcriptional states of a cell are determined by the underlying gene regulatory network (GRN), and reliably inferring such a network would be invaluable for understanding biological processes and disease progression. RESULTS In this article, we present PORTIA, a novel GRN inference method based on robust precision matrix estimation, and show that it compares favorably with state-of-the-art methods while being orders of magnitude faster. We extensively validated PORTIA using the DREAM and MERLIN+P datasets as benchmarks. In addition, we propose a novel scoring metric that builds on graph-theoretical concepts. AVAILABILITY AND IMPLEMENTATION The code and instructions for data acquisition and full reproduction of our results are available at https://github.com/AntoinePassemiers/PORTIA-Manuscript. PORTIA is available on PyPI as a Python package (portia-grn). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
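The principle behind precision-matrix-based GRN inference can be sketched as follows: invert the sample covariance Σ to obtain the precision matrix Θ, then score each gene pair by its partial correlation ρ_ij = -Θ_ij / sqrt(Θ_ii Θ_jj). This is a plain, non-robust illustration in dependency-free Python; it omits the robust estimation that PORTIA adds and is not the portia-grn API. The three-gene chain X -> Y -> Z in the example is a hypothetical dataset built from orthogonal zero-mean vectors so that X and Z are marginally correlated but conditionally uncorrelated given Y.

```python
def covariance(samples):
    """Sample covariance (n - 1 denominator); rows are samples, columns are genes."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular; more samples or shrinkage needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def partial_correlations(samples):
    """rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), Theta = inverse covariance."""
    theta = invert(covariance(samples))
    p = len(theta)
    return [[1.0 if i == j else -theta[i][j] / (theta[i][i] * theta[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]


# Hypothetical chain X -> Y -> Z (columns: X, Y, Z).
samples = [[1, 2, 3], [1, 0, -1], [-1, 0, -1], [-1, -2, -1]]
pc = partial_correlations(samples)
print([[round(v, 3) for v in row] for row in pc])
```

The X-Z entry comes out numerically zero even though X and Z have a marginal correlation of about 0.58, which is how conditioning on the remaining genes removes an indirect link.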
27
Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. Entropy 2022; 24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
Abstract
One of the key challenges in systems biology and the molecular sciences is inferring regulatory relationships between genes and proteins from high-throughput omics datasets. Although a wide range of methods have been designed to reverse-engineer regulatory networks, recent studies show that the inferred network may depend on the order of variables in the dataset. In this work, we develop a new algorithm, the statistical path-consistency algorithm (SPCA), to remove this dependence on variable order. The method generates a number of different variable orders using random samples and then infers a network with the path-consistency algorithm for each variable order. We propose measures that determine each edge weight from the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The method is rigorously assessed on six benchmark networks from the DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by two recent inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.
28
Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022; 23:165. [PMID: 35524190 PMCID: PMC9074326 DOI: 10.1186/s12859-022-04696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 04/25/2022] [Indexed: 11/29/2022]
Abstract
Background Current gene regulatory network (GRN) inference methods are notorious for the large number of indirect interactions hidden in their predictions. Separating indirect interactions from direct ones remains an important challenge in GRN reconstruction. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs. Results To assess the performance of RSNET, we ran experiments on several gold-standard networks using a simulation study, a DREAM challenge dataset and an Escherichia coli network. The results show that RSNET performed better than the compared methods in sensitivity and accuracy. As a case study, we used RSNET to construct a functional GRN for apple fruit ripening from gene expression data. Conclusions In the proposed method, redundant interactions, including weak and indirect connections, are silenced adaptively by recursive optimization, and highly dependent nodes are constrained in the model to retain the real interactions. This study provides a useful tool for inferring clean networks. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04696-w.
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
29
Zhang H, Chen J, Tian T. Bayesian Inference of Stochastic Dynamic Models Using Early-Rejection Methods Based on Sequential Stochastic Simulations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1484-1494. [PMID: 33216717 DOI: 10.1109/tcbb.2020.3039490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Stochastic modelling is an important approach for investigating the functions of noise in a wide range of biological systems. However, parameter inference for stochastic models remains challenging, partly because of the long computing time required for stochastic simulations. To address this issue, we propose a novel early-rejection method based on sequential stochastic simulations. We first show that a large number of stochastic simulations is required to obtain reliable inference results. Instead of generating all of these simulations for each parameter sample at once, we propose generating them in a number of stages. The simulation process proceeds to the next stage only if the accuracy of the simulations at the current stage satisfies a given error criterion. We propose a formula to determine the error criterion and use a stochastic differential equation model to examine the effects of different criteria. Three biochemical network models are used to evaluate the efficiency and accuracy of the proposed method. Numerical results suggest that the proposed early-rejection method substantially improves the efficiency of inference for stochastic models.
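The staged idea can be sketched inside a simple approximate Bayesian computation (ABC) rejection loop. Everything concrete here is hypothetical: the toy noisy-decay model, the running mean as summary statistic, and the stage sizes and per-stage tolerances. The paper's own formula for the error criterion is not reproduced.

```python
import random


def simulate_decay(rate, rng, x0=100.0, steps=10, noise=2.0):
    """Hypothetical toy stochastic model: exponential decay plus Gaussian noise."""
    x = x0
    for _ in range(steps):
        x = x - rate * x + rng.gauss(0.0, noise)
    return x


def early_rejection_abc(observed, candidates, stages, tolerances, seed=0):
    """Accept a parameter only if the running mean of its simulations stays within
    the stage tolerance after every stage; stop simulating as soon as it fails."""
    rng = random.Random(seed)
    accepted, total_sims = [], 0
    for rate in candidates:
        outputs = []
        for batch, tol in zip(stages, tolerances):
            outputs.extend(simulate_decay(rate, rng) for _ in range(batch))
            total_sims += batch
            if abs(sum(outputs) / len(outputs) - observed) > tol:
                break  # early rejection: the remaining stages are never simulated
        else:
            accepted.append(rate)  # passed every stage
    return accepted, total_sims


observed = 100.0 * 0.8 ** 10                 # noise-free mean final value for rate 0.2
candidates = [i / 20 for i in range(1, 11)]  # rates 0.05 ... 0.5
accepted, cost = early_rejection_abc(observed, candidates,
                                     stages=[5, 20, 75], tolerances=[5.0, 2.0, 1.0])
print(accepted, cost)
```

Candidates whose simulated mean is far from the observation fail after the first batch of 5 simulations, so the total cost stays well below the 1,000 simulations a non-staged sampler would spend on these 10 candidates.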
30
Hernández-Gómez C, Hernández-Lemus E, Espinal-Enríquez J. The Role of Copy Number Variants in Gene Co-Expression Patterns for Luminal B Breast Tumors. Front Genet 2022; 13:806607. [PMID: 35432489 PMCID: PMC9010943 DOI: 10.3389/fgene.2022.806607] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Accepted: 03/03/2022] [Indexed: 12/20/2022]
Abstract
Gene co-expression networks have become a common approach for integrating the vast amounts of information from gene expression studies in cancer cohorts. The reprogramming of gene regulatory control, and of the molecular pathways that depend on it, is central to characterizing the disease and to unveiling its consequences for cancer prognosis and therapeutics. However, a multitude of factors have been associated with anomalous control of gene expression in cancer. In the particular case of co-expression patterns, we have previously documented a loss of long-distance co-expression in several cancer types, including breast cancer. Of the many factors that may contribute to this phenomenon, copy number variants (CNVs) have often been discussed. However, no systematic assessment of the role CNVs may play in shaping gene co-expression patterns in breast cancer has been performed to date, so we set out to perform one. In this study, we use probabilistic modeling techniques to evaluate to what extent CNVs affect long- and short-range co-expression in Luminal B breast tumors. We analyzed the co-expression patterns on chromosome 8, since it is known to be affected by amplifications and deletions during cancer development. We found that the CNV pattern on chromosome 8 does not significantly alter the co-expression patterns of the Luminal B network, which means that the co-expression program in this cancer phenotype is not determined by CNV structure. Additionally, we found that regions 8q24.3 and 8p21.3 are highly dense in interactions. The most connected genes in this network belong to those cytobands and are associated with several manifestations of cancer in different tissues.
Interestingly, among the most connected genes we found MAF1 and POLR3D, which may constitute an axis of regulation of gene transcription, in particular for non-coding RNA species. We believe that by advancing our knowledge of the molecular mechanisms behind gene regulation in cancer, we will be better equipped not only to understand tumor biology but also to broaden the scope of diagnostic, prognostic and therapeutic interventions, ultimately benefiting oncology patients.
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- *Correspondence: Jesús Espinal-Enríquez; Enrique Hernández-Lemus
- Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
31
Degree of Freedom of Gene Expression in Saccharomyces cerevisiae. Microbiol Spectr 2022; 10:e0083821. [PMID: 35230153 PMCID: PMC9045123 DOI: 10.1128/spectrum.00838-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
The complexity of genome-wide gene expression has not yet been adequately addressed owing to a lack of comprehensive statistical analyses. In the present study, we introduce the degree of freedom (DOF) as a summary statistic for evaluating gene expression complexity. Because the DOF can be interpreted through a state-space representation, it is highly useful for understanding gene activities. Using over 11,000 gene expression data sets, we show that the DOF of gene expression in Saccharomyces cerevisiae is not greater than 450. We further demonstrate that the various degrees of freedom of gene expression can be interpreted by different sequence motifs within promoter regions and by Gene Ontology (GO) terms. The well-known TATA box is the most significant of the identified motifs, while the GO term "ribosome genesis" is an associated biological process. On the basis of transcriptional freedom, our findings suggest that the regulation of gene expression can be modeled using only a few state variables. IMPORTANCE Yeast works like a well-organized factory. Each of its components works in its own way while affecting the activities of others. The order of all these activities is largely governed by the regulation of gene expression. In recent decades, biologists have identified many regulatory mechanisms for yeast genes. However, it is not known how closely regulation links the genes together so that all components of the cell work as a whole. In other words, biologists would like to know how many independent control factors are needed to operate an artificial "cell" that works the same as a real one. In this work, we suggest that only 450 control factors are sufficient to represent the regulation of all 5800 yeast genes.
32
Feng H, Zheng R, Wang J, Wu FX, Li M. NIMCE: A Gene Regulatory Network Inference Approach Based on Multi Time Delays Causal Entropy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1042-1049. [PMID: 33035155 DOI: 10.1109/tcbb.2020.3029846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Gene regulatory networks (GRNs) are involved in various biological processes, such as the cell cycle, differentiation and apoptosis. The large amount of existing expression data, especially time-series expression data, provides an opportunity to infer GRNs computationally. These data can reveal the dynamics of gene expression and imply regulatory relationships among genes. However, identifying indirect regulatory links is still a major challenge, as most studies treat time points as independent observations and ignore the influence of time delays. In this study, we propose NIMCE, a GRN inference method based on information-theoretic measures. NIMCE uses transfer entropy to measure the regulatory link between each pair of genes and then applies causation entropy to filter out indirect relationships. In addition, NIMCE uses multiple time delays to identify indirect regulatory relationships from candidate genes. Experiments on simulated and colorectal cancer data show that NIMCE outperforms other competing methods. All data and code used in this study are publicly available at https://github.com/CSUBioGroup/NIMCE.
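Transfer entropy from a candidate regulator X to a target Y at delay d can be written as the conditional mutual information TE(X -> Y; d) = I(Y_t; X_{t-d} | Y_{t-1}). The sketch below estimates it with plug-in entropies on already-discretized, equal-length series, a history length of one, and a scan over several delays that keeps the largest value. These choices are assumptions for illustration, and the causation-entropy filtering that NIMCE applies afterwards is not included.

```python
import math
from collections import Counter


def entropy(symbols):
    """Plug-in Shannon entropy (nats) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())


def transfer_entropy(x, y, delay=1):
    """TE(X -> Y; delay) = I(Y_t ; X_{t-delay} | Y_{t-1}); x and y have equal length."""
    start = max(delay, 1)
    y_now = y[start:]                          # Y_t
    y_past = y[start - 1:len(y) - 1]           # Y_{t-1}
    x_lag = x[start - delay:len(x) - delay]    # X_{t-delay}
    return (entropy(list(zip(y_now, y_past)))
            + entropy(list(zip(x_lag, y_past)))
            - entropy(list(zip(y_now, y_past, x_lag)))
            - entropy(y_past))


def best_delay(x, y, max_delay=3):
    """Return (delay, TE) for the delay in 1..max_delay with the largest TE."""
    return max(((d, transfer_entropy(x, y, d)) for d in range(1, max_delay + 1)),
               key=lambda pair: pair[1])


x = [0, 0, 0, 1, 0, 1, 1, 1] * 8  # hypothetical binarized regulator profile
y = x[-2:] + x[:-2]               # target copies x two time steps later
print(best_delay(x, y))
```

Here Y copies X with a lag of two steps, so the scan should select delay 2, while delay 3 contributes nothing because X_{t-3} equals Y_{t-1}, which is already conditioned on.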
33
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022]
Abstract
Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into many cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data capture cell-to-cell variance and diverse biological information, such as tissue characteristics and transitions between cell types. Inferring GRNs from such data offers unprecedented advantages for studying cell phenotypes in depth, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for identifying regulation. We develop DGRNS, a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive comparative analysis on the mouse hematopoietic stem cell dataset illustrates that DGRNS outperforms state-of-the-art methods. The area under the receiver operating characteristic curve of the networks inferred by DGRNS is about 16% higher than that of other unsupervised methods, and their area under the precision-recall curve is 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and generalization of DGRNS.
By comparing the predictions with a standard network, we discover a series of novel interactions that have been shown to hold in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency that have not yet been experimentally confirmed and merit further research.
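DGRNS is described as cutting the expression data into sliding windows while keeping the regulator/target order. The sketch below shows one plausible way to do that for a single gene pair, assuming the cells are already ordered by pseudotime; the function name `pair_windows`, the window width and the stride are illustrative assumptions, not taken from the DGRNS code.

```python
import numpy as np


def pair_windows(expr_regulator, expr_target, window=4, stride=2):
    """Cut two pseudotime-ordered expression vectors into aligned windows.

    Returns an array of shape (n_windows, 2, window). Row 0 always holds the
    putative regulator and row 1 the putative target, so the ordered pairs
    (a, b) and (b, a) give different inputs and the direction is preserved.
    """
    a = np.asarray(expr_regulator, dtype=float)
    b = np.asarray(expr_target, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D vectors of equal length")
    if len(a) < window:
        raise ValueError("series is shorter than the window")
    starts = range(0, len(a) - window + 1, stride)
    return np.stack([np.stack([a[s:s + window], b[s:s + window]]) for s in starts])


# Example: 10 cells ordered along pseudotime -> windows starting at 0, 2, 4, 6
a = np.log1p(np.arange(10.0))
b = np.log1p(np.arange(10.0) * 2)
print(pair_windows(a, b).shape)  # (4, 2, 4)
```

Each window would then be one input sample for the recurrent and convolutional layers; swapping the arguments yields the reverse-direction sample.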
34
Wang Y, Liu ZP. Identifying biomarkers for breast cancer by gene regulatory network rewiring. BMC Bioinformatics 2022; 22:308. [PMID: 35045805 PMCID: PMC8772043 DOI: 10.1186/s12859-021-04225-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 12/09/2022]
Abstract
Background Mining gene regulatory networks (GRNs) is an important avenue for elucidating cancer mechanisms. Mutations in the cancer genome perturb the GRN and rewire an orchestrated network. Hence, exploring gene regulatory network rewiring is important for discovering potential biomarkers and indicators that discriminate cancer phenotypes. Results Here, we propose a new bioinformatics method that identifies biomarkers based on network rewiring between states. It first reconstructs GRNs in different phenotypic conditions from gene expression data with an a priori background network. We employ an algorithm based on the path consistency algorithm and conditional mutual information to delete false-positive regulatory interactions between independent genes or gene pairs that are not closely related. A differential gene regulatory network (D-GRN) is then constructed from the rewired parts of the two phenotype-specific GRNs. Community detection is then applied to the D-GRN to detect functional modules. Finally, we apply a logistic regression classifier with recursive feature elimination to select biomarker genes in each module individually. The extracted feature genes form a biomarker gene set with an impressive ability to distinguish normal samples from controls. We verify the identified biomarkers in external independent validation datasets. As a proof-of-concept study, we apply the framework to identify diagnostic biomarkers of breast cancer. The identified biomarkers obtain a maximum AUC of 0.985 in the internal sample classification experiments and a maximum AUC of 0.989 in the external validations. Conclusion Network rewiring reveals significant differences between phenotypes, which point to dysfunctional mechanisms in cancer. With the development of sequencing technology, more gene expression data of higher quality are becoming available.
Condition-specific gene regulatory networks that are close to the real regulation in different states will be established. Revealing network rewiring will greatly benefit the discovery of biomarkers or signatures for phenotypes. D-GRN is a general method that meets this demand for deciphering high-throughput data for biomarker discovery. It can also easily be extended to identify biomarkers of complex diseases other than breast cancer.
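The pruning and rewiring steps can be sketched as follows under simplifying assumptions that are ours, not the authors': expression is treated as jointly Gaussian so that conditional mutual information (CMI) has a closed form, pruning starts from a complete graph rather than a prior background network, conditioning sets contain at most one common neighbor, and the 0.03 threshold is arbitrary. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def gaussian_cmi(C, x, y, z=()):
    """I(X;Y|Z) in nats for jointly Gaussian variables, from covariance C."""
    z = list(z)

    def logdet(idx):
        if not idx:
            return 0.0
        return np.linalg.slogdet(C[np.ix_(idx, idx)])[1]

    return 0.5 * (logdet([x] + z) + logdet([y] + z) - logdet(z) - logdet([x, y] + z))


def pc_cmi_prune(data, genes, threshold=0.03, max_order=1):
    """Path-consistency style pruning: drop edge (i, j) when the MI (order 0)
    or the CMI given any `order` common neighbours is below threshold."""
    C = np.cov(np.asarray(data, dtype=float), rowvar=False)
    n = len(genes)
    adj = {i: set(range(n)) - {i} for i in range(n)}  # start from a complete graph
    for order in range(max_order + 1):
        for i, j in combinations(range(n), 2):
            if j not in adj[i]:
                continue
            common = sorted(adj[i] & adj[j])
            for z in combinations(common, order):
                if gaussian_cmi(C, i, j, z) < threshold:
                    adj[i].discard(j)
                    adj[j].discard(i)
                    break
    return {(genes[i], genes[j]) for i in range(n) for j in adj[i] if i < j}


def rewired_edges(edges_a, edges_b):
    """Edges present in exactly one of two undirected phenotype-specific networks."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return {tuple(sorted(e)) for e in a ^ b}
```

For a chain X → Y → Z, the indirect X–Z edge is removed at order 1 because CMI(X;Z|Y) is close to zero; `rewired_edges` then yields the edge set of the differential network, on which community detection would run.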
Affiliation(s)
- Yijuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
35
Disentangling direct from indirect relationships in association networks. Proc Natl Acad Sci U S A 2022; 119:2109995119. [PMID: 34992138 PMCID: PMC8764688 DOI: 10.1073/pnas.2109995119] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Accepted: 11/30/2021] [Indexed: 11/18/2022]
Abstract
Networks are vital tools for understanding and modeling interactions in complex systems in science and engineering, and direct and indirect interactions are pervasive in all types of networks. However, quantitatively disentangling direct and indirect relationships in networks remains a formidable task. Here, we present a framework, called iDIRECT (Inference of Direct and Indirect Relationships with Effective Copula-based Transitivity), for quantitatively inferring direct dependencies in association networks. Using copula-based transitivity, iDIRECT eliminates/ameliorates several challenging mathematical problems, including ill-conditioning, self-looping, and interaction strength overflow. With simulation data as benchmark examples, iDIRECT showed high prediction accuracies. Application of iDIRECT to reconstruct gene regulatory networks in Escherichia coli also revealed considerably higher prediction power than the best-performing approaches in the DREAM5 (Dialogue on Reverse Engineering Assessment and Methods project, #5) Network Inference Challenge. In addition, applying iDIRECT to highly diverse grassland soil microbial communities in response to climate warming showed that the iDIRECT-processed networks were significantly different from the original networks, with considerably fewer nodes, links, and connectivity, but higher relative modularity. Further analysis revealed that the iDIRECT-processed network was more complex under warming than the control and more robust to both random and target species removal (P < 0.001). As a general approach, iDIRECT has great advantages for network inference, and it should be widely applicable to infer direct relationships in association networks across diverse disciplines in science and engineering.
36
Abdulkadhar S, Natarajan J. A Text Mining Protocol for Mining Biological Pathways and Regulatory Networks from Biomedical Literature. Methods Mol Biol 2022; 2496:141-157. [PMID: 35713863 DOI: 10.1007/978-1-0716-2305-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
A biological pathway or regulatory network is a collection of molecular regulators that can trigger changes in cellular processes, leading to the assembly of new molecules through a series of actions among the molecules. There are three important types of pathways in systems biology studies, namely signaling pathways, metabolic pathways, and genetic pathways (or gene regulatory networks). Recently, constructing biological pathways from the scientific literature has received much attention, because the literature contains a rich set of linguistic features from which biological associations between genes and proteins can be extracted. These associations can be combined to construct biological networks. Here, we present a brief overview of various biological pathways, biomedical text resources/corpora for network construction, and state-of-the-art methods for network construction, followed by our hybrid text mining protocol for extracting pathways and regulatory networks from biomedical literature.
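The abstract does not detail the hybrid protocol itself. As a point of reference, the simplest literature-mining baseline links two genes that co-occur in a sentence containing an interaction verb. The sketch below implements that generic baseline, not the authors' protocol; the gene dictionary, trigger list and regex sentence splitter are illustrative assumptions, and real pipelines add named-entity recognition and relation classification on top.

```python
import re
from itertools import combinations


def cooccurrence_edges(text, genes, triggers):
    """Count gene pairs that co-occur in a sentence containing a trigger word.

    genes: gene symbols, matched case-sensitively as whole tokens.
    triggers: interaction verbs, matched case-insensitively.
    Returns {(gene_a, gene_b): number_of_supporting_sentences}.
    """
    trigger_set = {t.lower() for t in triggers}
    edges = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        if not trigger_set & {t.lower() for t in tokens}:
            continue  # no interaction verb: skip the sentence
        found = sorted(g for g in genes if g in tokens)
        for pair in combinations(found, 2):
            edges[pair] = edges.get(pair, 0) + 1
    return edges


text = "TP53 activates MDM2. MDM2 inhibits TP53. BRCA1 and TP53 were measured."
print(cooccurrence_edges(text, {"TP53", "MDM2", "BRCA1"}, {"activates", "inhibits"}))
# {('MDM2', 'TP53'): 2}
```

The sentence count can serve as an edge weight when the extracted associations are assembled into a network.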
Affiliation(s)
- Sabenabanu Abdulkadhar
- Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamilnadu, India
- Jeyakumar Natarajan
- Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamilnadu, India
37
Han J, Perera S, Wunderlich Z, Periwal V. Mechanistic gene networks inferred from single-cell data with an outlier-insensitive method. Math Biosci 2021; 342:108722. [PMID: 34688607 PMCID: PMC8722367 DOI: 10.1016/j.mbs.2021.108722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Revised: 08/25/2021] [Accepted: 08/25/2021] [Indexed: 11/28/2022]
Abstract
With advances in single-cell techniques, measuring gene dynamics at cellular resolution has become practicable. However, the increased complexity of the data has made it computationally more challenging to unravel the underlying biological mechanisms. Thus, it is critical to develop novel computational methods capable of dealing with such complexity and of providing predictive deductions from such data. Many methods have been developed to address these challenges, each with its own advantages and limitations. We present an iterative regression algorithm for inferring a mechanistic gene network from single-cell data, especially suited to overcoming problems posed by measurement outliers. Using this regression, we infer a developmental model for the gene dynamics in the Drosophila melanogaster blastoderm embryo. Our results show that the predictive power of the inferred model is higher than that of models inferred with least-squares and ridge regression. As a baseline for how well a mechanistic model should be expected to perform, we find that model predictions of the gene dynamics are more accurate than predictions made with neural networks of varying architecture and complexity. This holds true even in the limit of small sample sizes. We compare predictions for various gene knockouts with published experimental results and find substantial qualitative agreement. We also make predictions for gene dynamics under various gene network perturbations, which would be impossible with non-mechanistic models.
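The abstract does not spell out the authors' iterative regression. A standard outlier-insensitive stand-in is iteratively reweighted least squares (IRLS) with Huber weights, sketched below; this is a swapped-in generic technique, not the method of Han et al., and the tuning constant 1.345 is simply the conventional Huber choice.

```python
import numpy as np


def huber_irls(x, y, delta=1.345, n_iter=100, tol=1e-10):
    """Robust linear regression y ~ b0 + x b by iteratively reweighted least
    squares with Huber weights; returns [intercept, coefficients...]."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745  # robust (MAD) scale
        if scale == 0:
            scale = 1.0
        u = np.abs(r) / scale
        w = delta / np.maximum(u, delta)  # weight 1 inside delta, delta/u outside
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 0.1, 50)
y[[5, 20, 40]] += 50  # three gross measurement outliers
print(huber_irls(x, y))  # close to [2, 3]; plain least squares is pulled upward
```

Down-weighting large residuals bounds each outlier's influence, which is the property the abstract emphasizes for single-cell measurements.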
Affiliation(s)
- Jungmin Han
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America
- Sudheesha Perera
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America
- Zeba Wunderlich
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92617, United States of America
- Vipul Periwal
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America
38
Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, He S, Hou T. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun 2021; 12:6775. [PMID: 34811351 PMCID: PMC8635420 DOI: 10.1038/s41467-021-27137-3] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 06/29/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023]
Abstract
Prediction of drug-target interactions (DTI) plays a vital role in many areas of drug development, such as virtual screening, drug repurposing and identification of potential drug side effects. Although extensive efforts have been invested in perfecting DTI prediction, existing methods still suffer from the high sparsity of DTI datasets and the cold start problem. Here, we develop KGE_NFM, a unified framework for DTI prediction that combines a knowledge graph (KG) and a recommendation system. This framework first learns a low-dimensional representation for the various entities in the KG, and then integrates the multimodal information via a neural factorization machine (NFM). KGE_NFM is evaluated under three realistic scenarios and achieves accurate and robust predictions on four benchmark datasets, especially in the cold-start scenario for proteins. Our results indicate that KGE_NFM provides valuable insight into integrating KG and recommendation-system-based techniques into a unified framework for novel DTI discovery.
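The core of a neural factorization machine is its Bi-Interaction pooling layer, which sums the element-wise products of all pairs of feature embeddings. It can be computed in linear time with the identity sum over i < j of v_i * v_j = 0.5 * [(sum_i v_i)^2 - sum_i v_i^2], applied element-wise. The sketch below checks that identity against a naive double loop; it assumes the embeddings are already scaled by their feature values and does not reproduce how KGE_NFM feeds knowledge-graph embeddings into the NFM.

```python
import numpy as np


def bi_interaction_pooling(embeddings):
    """Sum over all pairs i < j of v_i * v_j (element-wise), in O(n * k)."""
    V = np.asarray(embeddings, dtype=float)  # shape (n_features, k)
    return 0.5 * (V.sum(axis=0) ** 2 - (V ** 2).sum(axis=0))


def bi_interaction_naive(embeddings):
    """Reference O(n^2 * k) version, used only to check the identity."""
    V = np.asarray(embeddings, dtype=float)
    out = np.zeros(V.shape[1])
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            out += V[i] * V[j]
    return out


# e.g. one drug embedding and one protein embedding
print(bi_interaction_pooling([[1.0, 2.0], [3.0, 4.0]]))  # [3. 8.]
```

The pooled k-dimensional vector is what the NFM's hidden layers consume, so adding more entity features grows the cost linearly rather than quadratically.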
Affiliation(s)
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chang-Yu Hsieh
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
- Ziyi Yang
- Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
- Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China
39
Chen J, Cheong C, Lan L, Zhou X, Liu J, Lyu A, Cheung WK, Zhang L. DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-seq data. Brief Bioinform 2021; 22:bbab325. [PMID: 34424948 PMCID: PMC8499812 DOI: 10.1093/bib/bbab325] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/28/2021] [Revised: 07/12/2021] [Accepted: 07/26/2021] [Indexed: 01/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) makes it possible to capture gene activity at single-cell resolution, thus allowing reconstruction of cell-type-specific gene regulatory networks (GRNs). The available algorithms for reconstructing GRNs are commonly designed for bulk RNA-seq data, and few of them can analyze scRNA-seq data while dealing with dropout events and cellular heterogeneity. In this paper, we represent the joint gene expression distribution of a gene pair as an image and propose a novel supervised deep neural network called DeepDRIM, which uses the image of the target TF-gene pair and the images of its potential neighbors to reconstruct GRNs from scRNA-seq data. Because it considers the neighborhood context of a TF-gene pair, DeepDRIM can effectively eliminate false positives caused by transitive gene-gene interactions. We compared DeepDRIM with nine GRN reconstruction algorithms designed for either bulk or single-cell RNA-seq data, and it achieved clearly better performance on the scRNA-seq data collected from eight cell lines. Simulated data show that DeepDRIM is robust to the dropout rate, the cell number and the size of the training data. We further applied DeepDRIM to the scRNA-seq gene expression of B cells from the bronchoalveolar lavage fluid of patients with mild and severe coronavirus disease 2019. Focusing on cell-type-specific GRN alteration, we observed that targets of TFs that were differentially expressed between the two conditions were enriched in lysosome, apoptosis, response to decreased oxygen level and microtubule, which have been shown to be associated with coronavirus infection.
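DeepDRIM's central representation is an image of a gene pair's joint expression distribution across cells. A minimal way to build such an image is a normalized 2-D histogram, sketched below; the log1p transform and the 32 × 32 resolution are our assumptions, and DeepDRIM additionally uses images of neighboring gene pairs, which this sketch omits.

```python
import numpy as np


def pair_image(expr_x, expr_y, bins=32):
    """Joint expression distribution of a gene pair across cells as a
    bins x bins image: a 2-D histogram of log1p expression, normalized to sum to 1."""
    x = np.log1p(np.asarray(expr_x, dtype=float))
    y = np.log1p(np.asarray(expr_y, dtype=float))
    hist, _, _ = np.histogram2d(x, y, bins=bins)
    total = hist.sum()
    return hist / total if total > 0 else hist


rng = np.random.default_rng(0)
img = pair_image(rng.poisson(3, 500), rng.poisson(5, 500))
print(img.shape)  # (32, 32)
```

Cells with dropouts (zero counts) accumulate in the first row and column of the image, so a convolutional network can learn to treat that region differently from the bulk of the distribution.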
Affiliation(s)
- Jiaxing Chen
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- ChinWang Cheong
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Liang Lan
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Xin Zhou
- Department of Biomedical Engineering, Vanderbilt University, Vanderbilt Place Nashville, 37235, TN, USA
- Jiming Liu
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- William K Cheung
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
40
Shang J, Wang J, Sun Y, Li F, Liu JX, Zhang H. Multiscale part mutual information for quantifying nonlinear direct associations in networks. Bioinformatics 2021; 37:2920-2929. [PMID: 33730153 DOI: 10.1093/bioinformatics/btab182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Revised: 02/15/2021] [Accepted: 03/15/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION For network-assisted analysis, which has become a popular data mining approach, network construction is a crucial task. Network construction relies on accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. RESULTS In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed to quantify nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI theoretically and derived five important properties. Second, an experiment in a three-node network was carried out to numerically evaluate its quantification ability in two cases of strong associations. Third, the MPMI was tested and compared with the PMI, NPA and conditional mutual information on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. The results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. AVAILABILITY AND IMPLEMENTATION The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
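MPMI builds on part mutual information (PMI). As we understand the standard definition, PMI(X;Y|Z) = Σ p(x,y,z) log[p(x,y|z) / (p*(x|z) p*(y|z))], where p*(x|z) = Σ_y p(x|z,y) p(y) and p*(y|z) = Σ_x p(y|z,x) p(x); conditional mutual information instead uses p(x|z) p(y|z) in the denominator. The sketch below evaluates PMI for a small discrete joint table. The multiscale extension and the NPA component of MPMI are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def _safe_div(num, den):
    """Element-wise num / den, returning 0 where den == 0."""
    out = np.zeros(np.broadcast(num, den).shape)
    return np.divide(num, den, out=out, where=den > 0)


def part_mutual_information(P):
    """PMI(X;Y|Z) in nats for a discrete joint probability table P[x, y, z]."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()
    p_x, p_y, p_z = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
    p_xz, p_yz = P.sum(axis=1), P.sum(axis=0)  # shapes (x, z) and (y, z)
    # p*(x|z) = sum_y p(x|y,z) p(y)
    pstar_x = np.einsum("xyz,y->xz", _safe_div(P, p_yz[None, :, :]), p_y)
    # p*(y|z) = sum_x p(y|x,z) p(x)
    pstar_y = np.einsum("xyz,x->yz", _safe_div(P, p_xz[:, None, :]), p_x)
    ref = pstar_x[:, None, :] * pstar_y[None, :, :] * p_z[None, None, :]
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / ref[mask])))


# Fully independent X, Y, Z give PMI = 0
px, py, pz = np.array([0.3, 0.7]), np.array([0.5, 0.5]), np.array([0.2, 0.8])
print(part_mutual_information(np.einsum("i,j,k->ijk", px, py, pz)))  # ~0.0
```

Because p*(x|z) p*(y|z) p(z) is itself a probability distribution, PMI is a Kullback-Leibler divergence and therefore never negative.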
Affiliation(s)
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jing Wang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yan Sun
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Feng Li
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Honghai Zhang
- College of Life Science, Qufu Normal University, Qufu 273165, China
41
MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes. BIOLOGY 2021; 10:biology10090921. [PMID: 34571798 PMCID: PMC8469369 DOI: 10.3390/biology10090921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022]
Abstract
Simple Summary The interactions between SNPs, known as epistasis, can strongly influence the phenotype. Their detection is still a challenge, made even more difficult by background associations that can mask true epistatic interactions. To address the limitations of existing methods, we present in this study MIDESP, a novel method for detecting epistatic SNP pairs. It is the first mutual information-based method that can be applied to both qualitative and quantitative phenotypes and that explicitly accounts for background associations in the dataset. Abstract The interactions between SNPs result in a complex interplay with the phenotype, known as epistasis. Knowledge of epistasis is crucial for understanding the genetic causes of complex traits. However, owing to the enormous number of SNP pairs and their complex relationship to the phenotype, identifying epistasis remains a challenging problem. Many approaches for detecting epistasis use mutual information (MI) as an association measure. However, these methods have mainly been restricted to case–control phenotypes and are therefore of limited applicability to quantitative traits. To overcome this limitation of MI-based methods, we present MIDESP, a novel MI-based algorithm that detects epistasis between SNPs for qualitative as well as quantitative phenotypes. Moreover, by incorporating a dataset-dependent correction technique, we account for the effect of background associations in a genotypic dataset, separating true epistatic interaction signals from false-positive interactions that result from single SNP×phenotype associations. To demonstrate the effectiveness of MIDESP, we apply it to two real datasets with qualitative and quantitative phenotypes, respectively.
Our results suggest that, by eliminating the background associations, MIDESP can identify important genes that play essential roles in bovine tuberculosis or the egg weight of chickens.
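To make the MI-based view of epistasis concrete, a common score is the information a SNP pair carries about the phenotype jointly, beyond the two single-SNP contributions: I(A,B;Y) - I(A;Y) - I(B;Y). The sketch below computes it with plug-in estimates for discrete genotypes and a discrete phenotype. It is a generic illustration, not MIDESP: MIDESP's handling of quantitative phenotypes and its background-association correction are not reproduced.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def interaction_gain(snp_a, snp_b, phenotype):
    """Information about the phenotype carried by the SNP pair jointly,
    beyond what each SNP carries on its own."""
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Pure epistasis: the phenotype is the XOR of two SNPs
a = [0, 0, 1, 1] * 25
b = [0, 1, 0, 1] * 25
y = [i ^ j for i, j in zip(a, b)]
print(interaction_gain(a, b, y))  # 1.0
```

In the XOR example neither SNP is associated with the phenotype on its own, which is exactly the kind of signal single-SNP scans miss.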
42
Qi X, Lin Y, Chen J, Shen B. Decoding competing endogenous RNA networks for cancer biomarker discovery. Brief Bioinform 2021; 21:441-457. [PMID: 30715152 DOI: 10.1093/bib/bbz006] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 11/05/2018] [Revised: 12/13/2018] [Accepted: 12/25/2018] [Indexed: 02/05/2023]
Abstract
Crosstalk between competing endogenous RNAs (ceRNAs) is mediated by shared microRNAs (miRNAs) and plays important roles both in normal physiology and in tumorigenesis; it is therefore attractive for systems-level decoding of gene regulation. Because ceRNA networks link the function of miRNAs with that of transcripts sharing the same miRNA response elements (MREs), e.g. pseudogenes, competing mRNAs, long non-coding RNAs and circular RNAs, perturbation of crucial interactions in ceRNA networks may contribute to carcinogenesis by disturbing the balance of the cellular regulatory system. Therefore, discovering biomarkers that indicate cancer initiation, development and/or therapeutic response by reconstructing and analyzing ceRNA networks is of clinical significance. In this review, the regulatory function of ceRNAs in cancer and the crucial determinants of ceRNA crosstalk are first discussed to gain a global understanding of ceRNA-mediated carcinogenesis. Then, computational approaches for ceRNA network reconstruction and experimental approaches for ceRNA validation are described from a systems biology perspective. We focus on strategies for biomarker identification based on analyzing ceRNA networks and highlight the translational applications of ceRNA biomarkers in cancer management. This article sheds light on the significance of miRNA-mediated ceRNA interactions and provides important clues for discovering ceRNA network-based biomarkers in cancer biology, thereby accelerating the pace of precision medicine and healthcare for cancer patients.
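A commonly used first step in computational ceRNA network reconstruction asks whether two transcripts share more miRNAs than expected by chance, using a one-sided hypergeometric test. The sketch below computes that p-value exactly with integer binomial coefficients; the total miRNA count and the target sets are inputs the user would supply from a miRNA-target database, and the function name is hypothetical.

```python
from math import comb


def shared_mirna_pvalue(n_total_mirnas, mirnas_a, mirnas_b):
    """One-sided hypergeometric p-value P(overlap >= observed) for two RNAs
    whose predicted miRNA sets are drawn from n_total_mirnas miRNAs."""
    a, b = set(mirnas_a), set(mirnas_b)
    observed = len(a & b)
    K, n, N = len(a), len(b), n_total_mirnas
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(observed, min(K, n) + 1))
    return tail / comb(N, n)


# Two transcripts sharing all 3 of their miRNAs out of 10 in total
print(shared_mirna_pvalue(10, {"miR-1", "miR-2", "miR-3"}, {"miR-1", "miR-2", "miR-3"}))
# 1/120, about 0.0083
```

In practice the pairs that pass this test (after multiple-testing correction) are usually further filtered by expression correlation before they are treated as ceRNA edges.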
Affiliation(s)
- Xin Qi
- Center for Systems Biology, Soochow University, Suzhou, China
- Yuxin Lin
- Center for Systems Biology, Soochow University, Suzhou, China
- Jiajia Chen
- School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, Suzhou, China
- Bairong Shen
- Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, China
43
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022]
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers that most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to that of annotated genes. These data reveal the unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resulting clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
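Assuming the "Markov Chain Clustering" above refers to the Markov cluster (MCL) algorithm, its loop is short: add self-loops, make the matrix column-stochastic, then alternate expansion (matrix powers, i.e. longer random walks) and inflation (element-wise powers followed by renormalization) until the matrix stops changing. The sketch below is a minimal version; how the co-expression matrix is thresholded into non-negative similarities, and the expansion and inflation values, are illustrative assumptions.

```python
import numpy as np


def markov_cluster(similarity, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of a non-negative symmetric similarity matrix.
    Returns a list of clusters (frozensets of node indices)."""
    S = np.asarray(similarity, dtype=float)
    M = S + np.eye(len(S))                       # self-loops stabilize the walk
    M = M / M.sum(axis=0, keepdims=True)         # column-stochastic
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)  # expansion: longer random walks
        M = M ** inflation                        # inflation: favor strong flows
        M = M / M.sum(axis=0, keepdims=True)
        if np.allclose(M, prev, atol=tol):
            break
    clusters = []
    for row in M:                                # non-empty rows are attractors
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two disconnected triangles form two clusters
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
print(markov_cluster(A))  # [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
```

Higher inflation values produce smaller, tighter clusters, which is the main knob when partitioning a large co-expression matrix.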
Affiliation(s)
- Jing Li
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Urminder Singh
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Zebulun Arendsee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Eve Syrkin Wurtele
- Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
- Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
44
Wang N, Lefaudeux D, Mazumder A, Li JJ, Hoffmann A. Identifying the combinatorial control of signal-dependent transcription factors. PLoS Comput Biol 2021; 17:e1009095. [PMID: 34166361 PMCID: PMC8263068 DOI: 10.1371/journal.pcbi.1009095] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/06/2020] [Revised: 07/07/2021] [Accepted: 05/18/2021] [Indexed: 12/13/2022]
Abstract
The effectiveness of immune responses depends on the precision of stimulus-responsive gene expression programs. Cells specify which genes to express by activating stimulus-specific combinations of stimulus-induced transcription factors (TFs). Their activities are decoded by a gene regulatory strategy (GRS) associated with each response gene. Here, we examined whether the GRSs of target genes can be inferred from stimulus-response (input-output) datasets, which remains an unresolved model-identifiability challenge. We developed a mechanistic modeling framework and computational workflow to determine the identifiability of all possible combinations of synergistic (AND) or non-synergistic (OR) GRSs involving three transcription factors. Considering different sets of perturbations for stimulus-response studies, we found that two-thirds of GRSs are easily distinguishable but that substantially more quantitative data are required to distinguish the remaining third. To enhance the accuracy of inference from time-course experimental data, we developed an advanced error model that avoids overestimating errors by distinguishing between value error and temporal error. Incorporating this error model into a Bayesian framework, we show that GRS models can be identified for individual genes by considering multiple datasets. Our analysis rationalizes the allocation of experimental resources by identifying the most informative TF stimulation conditions. Applying this computational workflow to experimental data of immune response genes in macrophages and considering compensation among transcription factors, we found that a much greater fraction of genes is combinatorially controlled than previously reported. Specifically, we revealed that a group of known NFκB target genes may also be regulated by IRF3, which is supported by chromatin immunoprecipitation analysis.
Our study provides a computational workflow for designing and interpreting stimulus-response gene expression studies to identify the underlying gene regulatory strategies and to further mechanistic understanding.
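The AND/OR distinction can be made concrete with a deliberately simplified model: each TF's promoter occupancy is a Hill function of its activity, an AND (synergistic) gate multiplies occupancies, and an OR (non-synergistic) gate combines them as 1 - Π(1 - f). These functional forms, the Hill parameters and the on/off activity levels are our assumptions, not the authors' mechanistic model; the sketch only shows how two strategies give different predictions across stimulation conditions.

```python
from itertools import product


def hill(activity, k=1.0, n=2.0):
    """Fractional promoter occupancy by one TF (Hill function)."""
    return activity ** n / (activity ** n + k ** n)


def AND(*fs):
    """Synergistic gate: all TFs are needed (product of occupancies)."""
    out = 1.0
    for f in fs:
        out *= f
    return out


def OR(*fs):
    """Non-synergistic gate: any single TF suffices."""
    out = 1.0
    for f in fs:
        out *= 1.0 - f
    return 1.0 - out


def response_table(strategy, levels=(0.0, 10.0)):
    """Predicted response for every on/off combination of three TF activities."""
    return {acts: strategy(*(hill(a) for a in acts)) for acts in product(levels, repeat=3)}


def all_and(f1, f2, f3):
    return AND(f1, f2, f3)


def tf1_and_any(f1, f2, f3):
    return AND(f1, OR(f2, f3))


# Stimulating only TF1 and TF2 separates the two strategies
print(response_table(all_and)[(10.0, 10.0, 0.0)], response_table(tf1_and_any)[(10.0, 10.0, 0.0)])
```

Conditions where two candidate strategies predict clearly different outputs are the informative ones; strategies that agree on every available condition are the hard-to-distinguish cases the abstract describes.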
Affiliation(s)
- Ning Wang
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Diane Lefaudeux
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
- Anup Mazumder
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
- Jingyi Jessica Li
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Statistics, University of California, Los Angeles, California, United States of America
- Alexander Hoffmann
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
45
Integrated Inference of Asymmetric Protein Interaction Networks Using Dynamic Model and Individual Patient Proteomics Data. Symmetry (Basel) 2021. [DOI: 10.3390/sym13061097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023]
Abstract
Recent advances in experimental biology have produced large amounts of molecular activity data. In particular, individual patient data provide non-time-series information on molecular activities in disease conditions. The challenge is how to design effective algorithms that infer regulatory networks from individual patient datasets and, in doing so, address the issue of network symmetry. This work aims to develop an efficient pipeline to reverse-engineer regulatory networks from individual patient proteomic data. The first step uses the SCOUT algorithm to infer the pseudo-time trajectory of individual patients. Then the path-consistency method with part mutual information is used to construct a static network that contains the potential protein interactions. Because this static network is undirected and therefore symmetric, a dynamic model of ordinary differential equations is used to remove false interactions and derive asymmetric networks. In this work, a dataset from triple-negative breast cancer patients is used to develop a protein-protein interaction network with 15 proteins.
46
Jeong D, Lim S, Lee S, Oh M, Cho C, Seong H, Jung W, Kim S. Construction of Condition-Specific Gene Regulatory Network Using Kernel Canonical Correlation Analysis. Front Genet 2021; 12:652623. [PMID: 34093651 PMCID: PMC8172963 DOI: 10.3389/fgene.2021.652623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/12/2021] [Accepted: 03/26/2021] [Indexed: 01/01/2023]
Abstract
Gene expression profiles, or transcriptomes, can represent cellular states; thus, understanding gene regulation mechanisms can help explain how cells respond to external stress. Interaction between a transcription factor (TF) and its target gene (TG) is one of the representative regulatory mechanisms in cells. In this paper, we present a novel computational method to construct condition-specific transcriptional networks from transcriptome data. Regulatory interactions between TFs and TGs are very complex, specifically multiple-to-multiple relations. Experimental data from TF chromatin immunoprecipitation sequencing are useful but produce one-to-multiple relations between a TF and its TGs. On the other hand, gene co-expression networks can be useful for constructing condition-specific transcriptional networks, but they contain many false-positive relations. We propose a novel method to construct a condition-specific and combinatorial transcriptional network, applying kernel canonical correlation analysis (kernel CCA) to identify multiple-to-multiple TF-TG relations in a given biological condition. Kernel CCA is a well-established statistical method for computing the correlation between one group of features and another. We therefore employed kernel CCA to embed TFs and TGs into a new space in which the correlation between TFs and TGs is reflected. To demonstrate the usefulness of our network construction method, we used a human blood transcriptome data set to investigate the response to a high-fat diet and an Arabidopsis data set to investigate the response to cold/heat stress. Our method detected not only important regulatory interactions reported in previous studies but also novel TF-TG relations in which a module of TFs regulates a module of TGs under specific stress.
Affiliation(s)
- Dabin Jeong
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sangsoo Lim
  - Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Sangseon Lee
  - BK21 FOUR Intelligence Computing, Seoul National University, Seoul, South Korea
- Minsik Oh
  - Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Changyun Cho
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Hyeju Seong
  - Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
  - Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
  - Bioinformatics Institute, Seoul National University, Seoul, South Korea
  - Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul, South Korea
47
Mahmoodi SH, Aghdam R, Eslahchi C. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021; 11:7605. [PMID: 33828122 PMCID: PMC8027014 DOI: 10.1038/s41598-021-87074-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2020] [Accepted: 03/24/2021] [Indexed: 10/31/2022]
Abstract
In recent years, because experimental methods are difficult and inefficient, numerous computational methods have been introduced for inferring the structure of gene regulatory networks (GRNs). The Path Consistency (PC) algorithm is one of the popular methods for inferring GRN structure. However, this group of methods still has limitations, leaving room for improvement. First, PC-based algorithms are sensitive to the ordering of nodes, i.e. different node orders result in different network structures. Second, the networks inferred by these methods depend heavily on the threshold used for independence testing. Third, selecting the set of conditional genes optimally remains a challenge, and this choice affects the performance and computational complexity of PC-based algorithms. We introduce a novel algorithm, the Order Independent PC-based algorithm using Quantile value (OIPCQ), which improves the accuracy of GRN learning and resolves the order-dependency issue. Quantile-based thresholds are used for the different orders of conditional mutual information (CMI) tests. For conditional gene selection, we consider paths between genes of length equal to or greater than 2, whereas other well-known PC-based methods consider only paths of length 2. We applied OIPCQ to various networks from the DREAM3 and DREAM4 in silico challenges. As real-world case studies, we used OIPCQ to reconstruct the SOS DNA repair network of Escherichia coli and a GRN for acute myeloid leukemia based on RNA sequencing data from The Cancer Genome Atlas. The results show that OIPCQ produces the same network structure for all permutations of the genes. Compared with other well-known PC-based methods, it also improves the resulting GRN by accurately quantifying causal regulation strength.
According to the GRN constructed by OIPCQ for acute myeloid leukemia, the two previously reported regulators BCLAF1 and NRSF are significantly important. However, the highest-degree nodes in this GRN are ZBTB7A and PU1, which play significant roles in cancer, especially in leukemia. OIPCQ is freely accessible at https://github.com/haammim/OIPCQ-and-OIPCQ2.
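The quantile-threshold and order-independence ideas in the OIPCQ abstract can be illustrated with a small Python sketch. It is not the authors' OIPCQ code; that code is in the repository linked above.

The sketch makes several simplifying assumptions:

- The data are Gaussian, so CMI reduces to a function of the (partial) correlation.
- The quantile is a simple nearest-rank quantile.
- Only order-0 and order-1 tests are run.
- Conditioning genes are drawn only from direct neighbours, i.e. paths of length 2. This is the conventional rule, not OIPCQ's extension to longer paths.
- Order independence comes from deciding all removals at one level against a frozen copy of the graph.

All function names and the quantile levels `q0` and `q1` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, i, j, k=None):
    """Correlation of genes i and j, optionally controlling for a single gene k."""
    r_ij = pearson(data[i], data[j])
    if k is None:
        return r_ij
    r_ik = pearson(data[i], data[k])
    r_jk = pearson(data[j], data[k])
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def gaussian_cmi(rho):
    """(Conditional) mutual information of jointly Gaussian variables: -0.5 * ln(1 - rho^2)."""
    rho2 = min(rho * rho, 1 - 1e-12)  # guard against log(0) for perfectly correlated genes
    return -0.5 * math.log(1 - rho2)


def quantile(values, q):
    """Nearest-rank q-quantile of a collection of numbers."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))
    return s[idx]


def order_independent_prune(data, q0=0.5, q1=0.3):
    """Hypothetical order-0 and order-1 pruning with quantile thresholds.

    data maps gene name -> list of expression values. Every removal at a level is
    decided against a frozen graph, so the order in which genes are listed is irrelevant.
    """
    genes = list(data)
    edges = {frozenset(p) for p in combinations(genes, 2)}

    # Order 0: unconditional MI for every pair; threshold = q0-quantile of all scores.
    cmi0 = {e: gaussian_cmi(partial_corr(data, *sorted(e))) for e in edges}
    t0 = quantile(cmi0.values(), q0)
    frozen = {e for e in edges if cmi0[e] > t0}

    # Order 1: condition on each gene adjacent to i or j in the frozen graph
    # (paths of length 2 only) and keep the weakest (minimum) CMI per edge.
    cmi1 = {}
    for e in frozen:
        i, j = sorted(e)
        neighbours = {g for g in genes if g not in e
                      and (frozenset((i, g)) in frozen or frozenset((j, g)) in frozen)}
        if neighbours:
            cmi1[e] = min(gaussian_cmi(partial_corr(data, i, j, k)) for k in neighbours)
    if not cmi1:
        return frozen
    t1 = quantile(cmi1.values(), q1)
    return {e for e in frozen if e not in cmi1 or cmi1[e] > t1}
```

On a simulated chain A -> B -> C plus an unrelated gene D, the order-0 step keeps A-B, B-C and the indirect A-C edge. The order-1 step then removes A-C, because it is explained away by conditioning on B. Because thresholds are computed over all scores at a level before anything is deleted, listing the genes in a different order yields the same edge set.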
Affiliation(s)
- Sayyed Hadi Mahmoodi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Rosa Aghdam
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
48
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN inference methods have been developed, most have focused on validation with specific data sets. However, because of the complexity and diversity of biological networks, it is difficult to establish directed topological networks that suit both time-series and non-time-series datasets. RESULTS Here, we propose a novel method, GNIPLR (Gene networks inference based on projection and lagged regression), to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projects gene data twice, using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation, to produce a linear and monotonic pseudo-time series. It then determines the direction of regulation with lagged regression analyses. The proposed algorithm was validated on simulated and real biological data. We also applied GNIPLR to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
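The final GNIPLR step uses lagged regression to orient an edge once samples have a pseudo-time order. A minimal Python sketch of that idea compares how well gene x at step t - lag explains gene y at step t with the reverse direction, using one-predictor least squares.

The sketch is hedged in several ways:

- It assumes the pseudo-time values are already available.
- It does not implement the LSP/LP projections.
- It is not the GNIPLR implementation, which is in the repository above.
- The function names `order_by_pseudotime`, `r_squared` and `lagged_direction` are hypothetical.

```python
def order_by_pseudotime(values, pseudotime):
    """Reorder one gene's expression values by increasing pseudo-time."""
    order = sorted(range(len(values)), key=lambda s: pseudotime[s])
    return [values[s] for s in order]


def r_squared(x, y):
    """R^2 of the one-predictor least-squares fit y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains (or is explained by) nothing
    return sxy * sxy / (sxx * syy)


def lagged_direction(x, y, lag=1):
    """Compare x(t - lag) -> y(t) against y(t - lag) -> x(t) along an ordered series."""
    forward = r_squared(x[:-lag], y[lag:])   # earlier x explains later y
    backward = r_squared(y[:-lag], x[lag:])  # earlier y explains later x
    return ("x->y", forward) if forward >= backward else ("y->x", backward)
```

In use, both genes' values would first be passed through `order_by_pseudotime` with the same pseudo-time vector, and `lagged_direction` would then be called on the reordered series. A real method would also need a significance test before accepting either direction.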
Affiliation(s)
- Yuelei Zhang
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
49
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022]
Abstract
The gene regulatory network (GRN) is an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and it plays an important role in various organisms and systems. Reconstructing GRNs can help us understand the molecular mechanisms of organisms and reveal the essential rules of a large number of biological processes and reactions. To deal with uncertainty in processing, various outstanding network reconstruction algorithms rely on specific assumptions that affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or experimental data set, we organize our analysis around model-based, information-based and machine learning-based methods. Clearly, different types of computational tools can be developed to distinguish GRNs. Furthermore, we discuss several classical, representative and recent methods in each category, analyzing their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated and real networks under different scaling conditions. Using standardized performance metrics and common benchmarks, we quantitatively evaluate two things: the stability of various methods, and the sensitivity of the same algorithm when applied to networks of different scales. The aim of this study is to identify the most appropriate method for a specific GRN, helping biologists and medical scientists discover potential drug targets and identify cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenying He
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - University of South Carolina, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
50
Zhang XF, Ou-Yang L, Yan T, Hu XT, Yan H. A Joint Graphical Model for Inferring Gene Networks Across Multiple Subpopulations and Data Types. IEEE Trans Cybern 2021; 51:1043-1055. [PMID: 31794418 DOI: 10.1109/tcyb.2019.2952711] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/10/2023]
Abstract
Reconstructing gene networks from gene expression data is a long-standing challenge. In most applications, the observations can be divided into several distinct but related subpopulations, and gene expression measurements can be collected from multiple data types. Most existing methods are designed to estimate a single gene network from a single dataset. These methods may be suboptimal because they do not exploit the similarities and differences among subpopulations and data types. In this article, we propose a joint graphical model to estimate multiple gene networks simultaneously. Our model decomposes each subpopulation-specific gene network into the sum of common and unique components. It also imposes a group lasso penalty on the gene networks corresponding to different data types. Gene network variations across subpopulations can be learned automatically through the network decompositions, and similarities and differences among data types can be captured by the group lasso penalty. Simulation studies demonstrate that our method outperforms state-of-the-art methods. We also apply our method to The Cancer Genome Atlas breast cancer datasets to reconstruct subtype-specific gene networks. Hub nodes in the estimated subnetworks unique to individual cancer subtypes rediscover well-known genes associated with breast cancer subtypes and provide interesting predictions.
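The abstract does not describe the optimiser, but group lasso penalties are commonly handled through their proximal operator. For a group v, the operator prox(v) = max(0, 1 - λ/‖v‖₂)·v shrinks the whole group together and sets it exactly to zero when its Euclidean norm falls below λ.

The Python sketch below applies this operator to each off-diagonal entry (i, j), grouped across the K data-type networks, so an edge is kept or dropped jointly for all data types. Assumptions and limits of the sketch:

- Networks are plain nested lists.
- Diagonal entries are left unpenalised.
- The function names are hypothetical.
- It illustrates the penalty only, not the authors' full estimation procedure.

```python
import math


def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2: shrink the group v toward zero as one block."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)  # the whole group is removed together
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


def prox_edges_across_types(thetas, lam):
    """Group-soft-threshold each off-diagonal entry (i, j) across K square matrices.

    thetas is a list of K p x p matrices (lists of lists), one per data type.
    A new list is returned; the input matrices are not modified.
    """
    K, p = len(thetas), len(thetas[0])
    out = [[row[:] for row in theta] for theta in thetas]
    for i in range(p):
        for j in range(p):
            if i == j:
                continue  # diagonal (self) entries are left unpenalised in this sketch
            shrunk = group_soft_threshold([thetas[k][i][j] for k in range(K)], lam)
            for k in range(K):
                out[k][i][j] = shrunk[k]
    return out
```

Because (i, j) and (j, i) receive the same treatment, symmetric input matrices stay symmetric. A weak edge that appears in only one data type is pulled toward zero together with its near-zero counterparts in the other data types.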