1
Wang YR, Du PF. WCSGNet: a graph neural network approach using weighted cell-specific networks for cell-type annotation in scRNA-seq. Front Genet 2025; 16:1553352. [PMID: 40034748 PMCID: PMC11872911 DOI: 10.3389/fgene.2025.1553352] [Received: 12/30/2024] [Accepted: 01/27/2025] [Indexed: 03/05/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for understanding cellular heterogeneity, providing unprecedented resolution in molecular regulation analysis. Existing supervised learning approaches for cell type annotation primarily utilize gene expression profiles from scRNA-seq data. Although some methods incorporate gene interaction network information, they fail to use cell-specific gene association networks. This limitation overlooks the unique gene interaction patterns within individual cells, potentially compromising the accuracy of cell type classification. We introduce WCSGNet, a graph neural network-based algorithm for automatic cell-type annotation that leverages Weighted Cell-Specific Networks (WCSNs). These networks are constructed based on highly variable genes and inherently capture both gene expression patterns and gene association network structure features. Extensive experimental validation demonstrates that WCSGNet consistently achieves superior cell type classification performance, ranking among the top-performing methods while maintaining robust stability across diverse datasets. Notably, WCSGNet exhibits a distinct advantage in handling imbalanced datasets, outperforming existing methods in these challenging scenarios. All datasets and code for reproducing this work are deposited in a GitHub repository (https://github.com/Yi-ellen/WCSGNet).
Affiliation(s)
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, China

2
Chatrabgoun O, Daneshkhah A, Torkaman P, Johnston M, Sohrabi Safa N, Kashif Bashir A. Covariate-adjusted construction of gene regulatory networks using a combination of generalized linear model and penalized maximum likelihood. PLoS One 2025; 20:e0309556. [PMID: 39879184 PMCID: PMC11778759 DOI: 10.1371/journal.pone.0309556] [Received: 12/24/2023] [Accepted: 08/03/2024] [Indexed: 01/31/2025]
Abstract
Many machine learning techniques construct gene regulatory networks (GRNs) from a precision matrix, which encodes conditional independence among genes and ultimately yields a sparse GRN. This construction can be improved by using auxiliary information, such as gene expression profiles of related species or gene markers. To this end, we first fit a generalized linear model (GLM) and then apply penalized maximum likelihood, using the Glasso technique, to the residuals of a multi-level multivariate GLM in which the gene expression of one species is the multi-level response variable and the gene expression of related species forms the multivariate covariates. Because gene data typically contain far more variables than available samples, a bootstrap version of the multi-response multivariate GLM is used. To find the most appropriate related species, cross-validation is used to compute the minimum squared error of the fitted GLM under different regularization. Penalized maximum likelihood under a lasso or elastic net penalty is then applied to the residuals of the fitted GLM to find the sparse precision matrix. Finally, we show that the presented algorithm, which combines a fitted GLM with penalized maximum likelihood on the model residuals, is extremely fast and can exploit sparsity in the constructed GRNs. We also demonstrate the flexibility and superior validity of the proposed method by comparing it with other methods.
Affiliation(s)
- Omid Chatrabgoun
  - School of Computing, Electronics and Mathematics, Coventry University, Coventry, United Kingdom
  - Department of Statistics, Faculty of Mathematical Sciences and Statistics, Malayer University, Malayer, Iran
- Alireza Daneshkhah
  - Faculty of Mathematics and Data Science, Emirates Aviation University, Dubai, UAE
- Parisa Torkaman
  - Department of Statistics, Faculty of Mathematical Sciences and Statistics, Malayer University, Malayer, Iran
- Mark Johnston
  - School of Computing, Electronics and Mathematics, Coventry University, Coventry, United Kingdom
- Nader Sohrabi Safa
  - Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
- Ali Kashif Bashir
  - Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom

3
Pan C, Chen Y. Informeasure: an R/bioconductor package for quantifying nonlinear dependence between variables in biological networks from an information theory perspective. BMC Bioinformatics 2024; 25:382. [PMID: 39695935 DOI: 10.1186/s12859-024-05996-z] [Received: 08/09/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND: Using information measures to infer biological regulatory networks can capture nonlinear relationships between variables. However, it is computationally challenging, and there is a lack of convenient tools. RESULTS: We introduce Informeasure, an R package designed to quantify nonlinear dependencies in biological regulatory networks from an information theory perspective. This package compiles a comprehensive set of information measurements, including mutual information, conditional mutual information, interaction information, partial information decomposition, and part mutual information. Mutual information is used for bivariate network inference, while the other four estimators are dedicated to trivariate network analysis. CONCLUSIONS: Informeasure is a turnkey solution, allowing users to utilize these information measures immediately upon installation. Informeasure is available as an R/Bioconductor package at https://bioconductor.org/packages/Informeasure.
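To make the two simplest measures named in this abstract concrete, here is a minimal Python sketch of plug-in estimators for mutual information and conditional mutual information on discrete samples. It is not the Informeasure API (which is an R package); the function names and the assumption that expression values have already been discretized into bins are illustrative only.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    # I(X;Y) = sum over (a, b) of p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z): the average of I(X;Y) within each level of Z."""
    n = len(z)
    total = 0.0
    for level, count in Counter(z).items():
        idx = [i for i, v in enumerate(z) if v == level]
        total += (count / n) * mutual_information(
            [x[i] for i in idx], [y[i] for i in idx]
        )
    return total


# Hypothetical discretized expression levels for three genes across four samples.
gene_x = [0, 0, 1, 1]
gene_y = [0, 0, 1, 1]
gene_z = [0, 0, 1, 1]
mi = mutual_information(gene_x, gene_y)
cmi = conditional_mutual_information(gene_x, gene_y, gene_z)
```

In this toy case the bivariate measure reports a full bit of dependence between `gene_x` and `gene_y`, while conditioning on `gene_z` removes it, which is the kind of distinction trivariate measures are meant to expose. Plug-in estimates are biased for small samples, so real analyses typically use corrected estimators.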
Affiliation(s)
- Chu Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
- Yanlin Chen
  - School of Software, Henan University of Engineering, Zhengzhou, Henan, China

4
Yang B, Li J, Li X, Liu S. Gene regulatory network inference based on novel ensemble method. Brief Funct Genomics 2024; 23:866-878. [PMID: 39324652 DOI: 10.1093/bfgp/elae036] [Received: 03/11/2024] [Revised: 08/09/2024] [Accepted: 09/06/2024] [Indexed: 09/27/2024]
Abstract
Gene regulatory networks (GRNs) contribute toward understanding the function of genes and the development of cancer or the impact of key genes on diseases. Hence, this study proposes an ensemble method based on 13 basic classification methods and a flexible neural tree (FNT) to improve GRN identification accuracy. The basic classification methods are ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, hist gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor algorithm, which together form the input variable set of the FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble method outperforms 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI) and four single cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) based on the area under the receiver operating characteristic curve, area under the precision-recall curve, and F1 metrics.
Affiliation(s)
- Bin Yang
  - School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
- Jing Li
  - School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
- Xiang Li
  - Information Department, Qingdao Eighth People's Hospital, No. 84 Fengshan Road, Qingdao 266121, China
- Sanrong Liu
  - School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China

5
Huang Y, Huang S, Zhang XF, Ou-Yang L, Liu C. NJGCG: A node-based joint Gaussian copula graphical model for gene networks inference across multiple states. Comput Struct Biotechnol J 2024; 23:3199-3210. [PMID: 39263209 PMCID: PMC11388165 DOI: 10.1016/j.csbj.2024.08.010] [Received: 04/14/2024] [Revised: 08/05/2024] [Accepted: 08/11/2024] [Indexed: 09/13/2024]
Abstract
Inferring the interactions between genes is essential for understanding the mechanisms underlying biological processes. Gene networks will change along with the change of environment and state. The accumulation of gene expression data from multiple states makes it possible to estimate the gene networks in various states based on computational methods. However, most existing gene network inference methods focus on estimating a gene network from a single state, ignoring the similarities between networks in different but related states. Moreover, in addition to individual edges, similarities and differences between different networks may also be driven by hub genes. But existing network inference methods rarely consider hub genes, which affects the accuracy of network estimation. In this paper, we propose a novel node-based joint Gaussian copula graphical (NJGCG) model to infer multiple gene networks from gene expression data containing heterogeneous samples jointly. Our model can handle various gene expression data with missing values. Furthermore, a tree-structured group lasso penalty is designed to identify the common and specific hub genes in different gene networks. Simulation studies show that our proposed method outperforms other compared methods in all cases. We also apply NJGCG to infer the gene networks for different stages of differentiation in mouse embryonic stem cells and different subtypes of breast cancer, and explore changes in gene networks across different stages of differentiation or different subtypes of breast cancer. The common and specific hub genes in the estimated gene networks are closely related to stem cell differentiation processes and heterogeneity within breast cancers.
Affiliation(s)
- Yun Huang
  - Department of Geriatrics, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
  - Clinical Research Center for Geriatric Hypertension Disease of Fujian province, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Sen Huang
  - Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Chen Liu
  - Department of Oncology, Molecular Oncology Research Institute, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
  - Department of Oncology, National Regional Medical Center, Binhai Campus of The First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
  - Fujian Key Laboratory of Precision Medicine for Cancer, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China

6
Niloofar P, Aghdam R, Eslahchi C. GAEM: Genetic Algorithm based Expectation-Maximization for inferring Gene Regulatory Networks from incomplete data. Comput Biol Med 2024; 183:109238. [PMID: 39426072 DOI: 10.1016/j.compbiomed.2024.109238] [Received: 05/16/2024] [Revised: 09/02/2024] [Accepted: 09/30/2024] [Indexed: 10/21/2024]
Abstract
In bioinformatics, inferring the structure of a Gene Regulatory Network (GRN) from incomplete gene expression data is a difficult task. One popular method for inferring the structure of GRNs is the Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). Although PCA-CMI excels at extracting GRN skeletons, it struggles with missing values in datasets. As a result, applying PCA-CMI to infer GRNs requires a preprocessing step for data imputation. In this paper, we present the GAEM algorithm, which uses an iterative approach based on a combination of Genetic Algorithm and Expectation-Maximization to infer the structure of a GRN from incomplete gene expression datasets. GAEM learns the GRN structure from the incomplete dataset via an algorithm that iteratively updates the imputed values based on the learnt GRN until the convergence criteria are met. We evaluate the performance of this algorithm under various missingness mechanisms (ignorable and nonignorable) and percentages (5%, 15%, and 40%). The traditional approach to handling missing values in gene expression datasets involves estimating them first and then constructing the GRN. In contrast, our methodology updates both the missing values and the GRN iteratively until convergence. Results from the DREAM3 dataset suggest that the GAEM algorithm is a more reliable method overall. Especially for smaller network sizes, GAEM outperforms methods in which the incomplete dataset is imputed first and the GRN structure is then learned from the imputed data. We have implemented the GAEM algorithm within the GAEM R package, which is accessible at the following GitHub repository: https://github.com/parniSDU/GAEM.
Affiliation(s)
- Parisa Niloofar
  - Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense, 5230, Denmark
- Rosa Aghdam
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, WI, Madison, USA
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Iran

7
Li C, Huang X, Luo X, Lin X. Gene regulatory network inference based on modified adaptive lasso. J Bioinform Comput Biol 2024; 22:2450026. [PMID: 39831426 DOI: 10.1142/s0219720024500264] [Indexed: 01/22/2025]
Abstract
Gene regulatory networks (GRNs) reveal the regulatory interactions among genes and provide a visual tool to explain biological processes. However, identifying direct relations among genes from gene expression data when the data are high-dimensional and the samples are few is a critical challenge. In this paper, we propose a new GRN inference method based on a modified adaptive least absolute shrinkage and selection operator (MALasso). MALasso expands the number of samples based on the distance correlation and defines a new weighting scheme for adaptive lasso to remove false positive edges of the networks in the iterative process. Simulated data and gene expression data from the DREAM challenge were used to validate the performance of MALasso. Comparisons among MALasso, adaptive lasso and six other state-of-the-art methods show that MALasso outperformed the competing methods in AUROC and AUPRC in most cases and was better able to distinguish direct edges from indirect ones. Hence, by modifying the adaptive weighting scheme of adaptive lasso, MALasso can detect linear and nonlinear relations, remove false positive edges and identify direct relations among genes more accurately.
Affiliation(s)
- Chao Li
  - College of Information Engineering, Dalian Key Laboratory of Smart Fisheries, Dalian Ocean University, Dalian 116023, Liaoning Province, P. R. China
  - School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, Liaoning Province, P. R. China
- Xiaoran Huang
  - School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, Liaoning Province, P. R. China
- Xiao Luo
  - School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, Liaoning Province, P. R. China
- Xiaohui Lin
  - School of Computer Science & Technology, Dalian University of Technology, Dalian 116024, Liaoning Province, P. R. China

8
Sun C, Liu ZP. Discovering explainable biomarkers for breast cancer anti-PD1 response via network Shapley value analysis. Computer Methods and Programs in Biomedicine 2024; 257:108481. [PMID: 39488042 DOI: 10.1016/j.cmpb.2024.108481] [Received: 08/14/2024] [Revised: 10/20/2024] [Accepted: 10/24/2024] [Indexed: 11/04/2024]
Abstract
BACKGROUND AND OBJECTIVE: Immunotherapy holds promise in enhancing pathological complete response rates in breast cancer, albeit confined to a select cohort of patients. Consequently, pinpointing factors predictive of treatment responsiveness is of paramount importance. Gene expression and regulation, inherently operating within intricate networks, constitute fundamental molecular machinery for cellular processes and often serve as robust biomarkers. Nevertheless, contemporary feature selection approaches grapple with two key challenges: opacity in modeling and scarcity in accounting for gene-gene interactions. METHODS: To address these limitations, we devise a novel feature selection methodology grounded in cooperative game theory that integrates with machine learning models. This approach identifies interconnected gene regulatory network biomarker modules with an a priori genetic linkage architecture. Specifically, we use Shapley values on the network to quantify feature importance, while constraining their integration based on network expansion principles and nodal adjacency, thereby improving interpretability in feature selection. We apply our methods to a publicly available single-cell RNA sequencing dataset of breast cancer immunotherapy responses, using the identified feature gene set as biomarkers. Functional enrichment analysis with independent validations further illustrates their effective predictive performance. RESULTS: We demonstrate the effectiveness of the proposed method on data with network structure. It unveiled a cohesive biomarker module encompassing 27 genes for immunotherapy response. Notably, this module precisely predicts anti-PD1 therapeutic outcomes in breast cancer patients, with a classification accuracy of 0.905 and an AUC value of 0.971, underscoring its capacity to illuminate gene functionalities. CONCLUSION: The proposed method is effective for identifying network module biomarkers, and the detected anti-PD1 response biomarkers can enrich our understanding of the underlying physiological mechanisms of immunotherapy, with promising applications in precision medicine.
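As background for the cooperative game theory mentioned in this abstract, the following Python sketch computes exact Shapley values for a tiny game by averaging each player's marginal contribution over all orderings. It illustrates the generic definition only, not the authors' network-constrained procedure; the gene names and the `toy_value` function are hypothetical, and exact enumeration is feasible only for a handful of players.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical toy game: genes A and B are informative only together,
# while gene C adds a fixed amount on its own.
def toy_value(coalition):
    score = 0.0
    if {"A", "B"} <= coalition:
        score += 0.6
    if "C" in coalition:
        score += 0.2
    return score


phi = shapley_values(["A", "B", "C"], toy_value)
```

Here A and B split their joint contribution equally (about 0.3 each), C receives about 0.2, and the values sum to the score of the full gene set, which is the efficiency property that makes Shapley values attractive for attributing predictive performance to features.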
Affiliation(s)
- Chenxi Sun
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China

9
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024]
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. Many computational methods have been proposed to build these networks from single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
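Two performance metrics that recur throughout this listing are the area under the ROC curve and a summary of the precision-recall curve. The Python sketch below shows one common way to compute them for a ranked edge list against a gold standard; the function names, the toy scores and the choice of average precision as the precision-recall summary are assumptions for illustration, not definitions taken from the article. Both functions assume at least one true and one false edge, and `average_precision` does not treat tied scores specially.

```python
def auroc(scores, labels):
    """AUROC as the probability that a true edge outranks a false edge (ties count 0.5)."""
    pos = [s for s, is_true in zip(scores, labels) if is_true]
    neg = [s for s, is_true in zip(scores, labels) if not is_true]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision: mean of the precision values at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical confidence scores for five candidate edges and their gold-standard labels.
edge_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
gold_labels = [1, 0, 1, 0, 0]
roc = auroc(edge_scores, gold_labels)
ap = average_precision(edge_scores, gold_labels)
```

Because true edges are usually a small fraction of all candidate edges in a GRN, the precision-recall summary is often reported alongside AUROC rather than instead of it.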
Affiliation(s)
- Karamveer
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA

10
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA
- Peng Zheng
  - Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA
  - Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA
  - Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
  - Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
  - Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA

11
Emadi M, Boroujeni FZ, Pirgazi J. Improved Fuzzy Cognitive Maps for Gene Regulatory Networks Inference Based on Time Series Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1816-1829. [PMID: 38963747 DOI: 10.1109/tcbb.2024.3423383] [Indexed: 07/06/2024]
Abstract
Microarray data provide a wealth of information about gene expression levels. Because of the volume of such data, their analysis requires adequate computational methods for identifying and analyzing gene regulation networks. However, researchers in this field face numerous challenges, such as the large number of genes to consider, the limited number of samples, and the noisy nature of the data. In this paper, a hybrid method based on fuzzy cognitive maps and compressed sensing is used to identify interactions between genes. For this purpose, in inferring the gene regulation network, Ensemble Kalman filtered compressed sensing is used to learn the fuzzy cognitive map. Using the Ensemble Kalman filter and compressed sensing makes the fuzzy cognitive map robust against noise. The proposed algorithm is evaluated using several metrics and compared with several well-known methods such as LASSOFCM, KFRegular, and CMI2NI. The experimental results show that the proposed method outperforms methods proposed in recent years in terms of SSmean, Data Error and accuracy.

12
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE: Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring GRNs from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques through a carefully refined multi-objective evolutionary algorithm guided by various objective functions linked to the biological context at hand. METHODS: MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation compared the performance of MO-GENECI with that of individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS: MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS: MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice due to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
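The idea of a consensus among inference techniques can be illustrated with a short Python sketch that combines per-technique edge confidences using a weight per technique. This is a generic weighted average assumed here for illustration; it does not use the `geneci` package, and the technique names, edge scores and weights are hypothetical. In an evolutionary setting such as the one described, the weights would be what the optimizer searches over.

```python
def weighted_consensus(edge_scores, weights):
    """Combine per-technique edge confidences into one weighted score per edge."""
    total = sum(weights.values())
    consensus = {}
    for technique, scores in edge_scores.items():
        w = weights[technique] / total  # normalise so the weights sum to 1
        for edge, s in scores.items():
            # An edge that a technique does not report contributes 0 for that technique.
            consensus[edge] = consensus.get(edge, 0.0) + w * s
    return consensus


# Hypothetical confidences from two techniques for directed edges (regulator, target).
edge_scores = {
    "tech_a": {("g1", "g2"): 0.9, ("g2", "g3"): 0.2},
    "tech_b": {("g1", "g2"): 0.5, ("g1", "g3"): 0.4},
}
weights = {"tech_a": 3.0, "tech_b": 1.0}
consensus = weighted_consensus(edge_scores, weights)
```

With these weights, the edge reported by both techniques ends up with the highest consensus confidence (about 0.8), while edges reported by only one technique are scaled down by that technique's share of the total weight.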
Affiliation(s)
- Adrián Segura-Ortiz
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
- José García-Nieto
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
  - Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
  - Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
  - Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain

13
Yu Y, Hou L, Liu X, Wu S, Li H, Xue F. A novel constraint-based structure learning algorithm using marginal causal prior knowledge. Sci Rep 2024; 14:19279. [PMID: 39164273 PMCID: PMC11335901 DOI: 10.1038/s41598-024-68379-7] [Received: 03/21/2024] [Accepted: 07/23/2024] [Indexed: 08/22/2024]
Abstract
Causal discovery with prior knowledge is important for improving performance. We consider the incorporation of marginal causal relations, which correspond to the presence or absence of directed paths in a causal model. We propose the Marginal Prior Causal Knowledge PC (MPPC) algorithm to incorporate marginal causal relations into a constraint-based structure learning algorithm. We provide theorems on conditional independence properties obtained by combining observational data and marginal causal relations. We compare the MPPC algorithm with other structure learning methods in both simulation studies and real-world networks. The results indicate that, compared with other constraint-based structure learning methods, the MPPC algorithm can incorporate marginal causal relations and is more effective and more efficient.
Affiliation(s)
- Yifan Yu
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
| | - Lei Hou
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
| | - Xinhui Liu
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
| | - Sijia Wu
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000
| | - Hongkai Li
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China.
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000.
| | - Fuzhong Xue
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhua West Road, Jinan, Shandong Province, 250000, People's Republic of China.
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People's Republic of China, 250000.
| |
|
14
|
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge, and the real gene networks of Escherichia coli demonstrates that it achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used the EIEPCF method to reconstruct the cold-resistance-specific GRN from gene expression data on cold resistance in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
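The idea of comparing residuals after removing a confounder can be sketched in a few lines of Python. This is an illustration only, not the EIEPCF implementation: it assumes a single confounder, a linear fit, and Pearson correlation as the similarity measure, none of which is specified in the abstract. All function names and the simulated data are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


def residuals(v, z):
    """Residuals of v after a least-squares line fit on a single confounder z."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - mv - slope * (a - mz) for a, b in zip(z, v)]


def residual_similarity(x, y, z):
    """Similarity of regulator x and target y once the confounder z is regressed out."""
    return pearson(residuals(x, z), residuals(y, z))


# Simulated case (assumption): x and y are both driven by z and have no direct link.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(500)]
x = [c + 0.1 * random.gauss(0, 1) for c in z]
y = [c + 0.1 * random.gauss(0, 1) for c in z]
raw = pearson(x, y)
adjusted = residual_similarity(x, y, z)
```

Here `raw` is close to 1 because both genes follow the confounder, while `adjusted` is close to 0, so the apparent x-y edge would be treated as an indirect effect.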
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
| | - Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
| |
|
15
|
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. Biophys J 2024; 123:221-234. [PMID: 38102827 PMCID: PMC10808046 DOI: 10.1016/j.bpj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/18/2023] [Accepted: 12/12/2023] [Indexed: 12/17/2023] Open
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present a modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding rational strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally determined from experiments, augmented with dynamical network computations involving endpoint objective functions, mutual information, change-point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method, based on the strategies described above. The cybernetic-inspired method is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. 
We posit that this innovative framework has the promise to extend to the dynamics of other biological processes, with a potential to provide novel mechanistic insights.
Affiliation(s)
- Rubesh Raja
- The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
| | - Sana Khanum
- The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
| | - Lina Aboulmouna
- Department of Bioengineering, University of California San Diego, La Jolla, California
| | - Mano R Maurya
- Department of Bioengineering, University of California San Diego, La Jolla, California
| | - Shakti Gupta
- Department of Bioengineering, University of California San Diego, La Jolla, California
| | - Shankar Subramaniam
- Department of Bioengineering, University of California San Diego, La Jolla, California; Departments of Computer Science and Engineering, Cellular and Molecular Medicine, San Diego Supercomputer Center, and the Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, California.
| | - Doraiswami Ramkrishna
- The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana.
| |
|
16
|
Kim D, Heo Y, Kim M, Suminda GGD, Manzoor U, Min Y, Kim M, Yang J, Park Y, Zhao Y, Ghosh M, Son YO. Inhibitory effects of Acanthopanax sessiliflorus Harms extract on the etiology of rheumatoid arthritis in a collagen-induced arthritis mouse model. Arthritis Res Ther 2024; 26:11. [PMID: 38167214 PMCID: PMC10763440 DOI: 10.1186/s13075-023-03241-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND The biological function of Acanthopanax sessiliflorus Harms (ASH) has been investigated in various diseases; however, the effects of ASH on arthritis have not been investigated so far. This study investigates the effects of ASH on rheumatoid arthritis (RA). METHODS Supercritical carbon dioxide (CO2) was used for ASH extract preparation, and its primary components, pimaric and kaurenoic acids, were identified using gas chromatography-mass spectrometry (GC-MS). Collagen-induced arthritis (CIA) was used as the RA model, and primary cultures of articular chondrocytes were used to examine the inhibitory effects of ASH extract on arthritis in three synovial joints: ankle, sole, and knee. RESULTS Pimaric and kaurenoic acids attenuated the pro-inflammatory cytokine-mediated increase in catabolic factors and reversed the pro-inflammatory cytokine-mediated decrease in related anabolic factors in vitro; however, they did not affect pro-inflammatory cytokine (IL-1β, TNF-α, and IL-6)-mediated cytotoxicity. ASH effectively inhibited cartilage degradation in the knee, ankle, and toe in the CIA model and decreased pannus development in the knee. Immunohistochemistry demonstrated that ASH mostly inhibited the IL-6-mediated matrix metalloproteinase. Gene Ontology and pathway studies bridge major gaps in the literature and provide insights into the pathophysiology and in-depth mechanisms of RA-like joint degeneration. CONCLUSIONS To the best of our knowledge, this is the first study to conduct extensive research on the efficacy of ASH extract in inhibiting the pathogenesis of RA. However, additional animal models and clinical studies are required to validate this hypothesis.
Affiliation(s)
- Dahye Kim
- Division of Animal Genetics and Bioinformatics, National Institute of Animal Science, RDA, Wanju, Republic of Korea
| | - Yunji Heo
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
| | - Mangeun Kim
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
| | - Godagama Gamaarachchige Dinesh Suminda
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
| | - Umar Manzoor
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Laboratory of Immune and Inflammatory Disease, College of Pharmacy, Jeju Research Institute of Pharmaceutical Sciences, Jeju National University, Jeju, 63243, Republic of Korea
| | - Yunhui Min
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
| | - Minhye Kim
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
| | - Jiwon Yang
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
| | - Youngjun Park
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea
- Laboratory of Immune and Inflammatory Disease, College of Pharmacy, Jeju Research Institute of Pharmaceutical Sciences, Jeju National University, Jeju, 63243, Republic of Korea
| | - Yaping Zhao
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China
| | - Mrinmoy Ghosh
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea.
- Department of Biotechnology, School of Bio, Chemical and Processing Engineering (SBCE), Kalasalingam Academy of Research and Education, Krishnankoil, Srivilliputhur, 626126, India.
| | - Young-Ok Son
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea.
- Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju City, Jeju Special Self-Governing Province, 63243, Republic of Korea.
- Practical Translational Research Center, Jeju National University, Jeju, 63243, Republic of Korea.
| |
|
17
|
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly active topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used for GRN reconstruction. However, the probabilistic characteristics of biological networks and the presence of data noise pose great challenges to GRN reconstruction and often lead to many false positive and false negative edges. ScoreLasso is a hybrid DBN score function that combines DBN and linear regression and performs well. Its performance is, however, limited by its first-order assumption and its neglect of the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model, and a Pearson correlation model is proposed. Based on this model, a hybrid higher-order DBN score function for GRN reconstruction, namely BIC-LP, is proposed. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. It can therefore capture more information from the dataset and curb information loss, compared with many existing Bayesian-family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, thereby achieving better GRN reconstruction performance.
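The general shape of such a hybrid score can be sketched for a single lagged parent-child pair in Python. The abstract does not give the BIC-LP formula, so everything here is an assumption: the restriction to one parent, the additive combination, the weights `alpha` and `beta`, the Lasso penalty `lam`, and the function names are hypothetical choices for illustration.

```python
from math import log, sqrt


def standardize(values):
    """Shift to zero mean and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def hybrid_score(parent, child, lag=1, lam=0.1, alpha=1.0, beta=1.0):
    """Score the lagged edge parent(t - lag) -> child(t), with lag >= 1; higher is better."""
    x = standardize(parent[:-lag])
    y = standardize(child[lag:])
    n = len(x)
    r = sum(a * b for a, b in zip(x, y)) / n            # Pearson correlation
    rss = sum((b - r * a) ** 2 for a, b in zip(x, y))   # least-squares slope equals r on standardized data
    bic = -0.5 * n * log(rss / n) - 0.5 * log(n)        # Gaussian fit term (up to constants) minus one-parameter penalty
    sign = 1 if r > 0 else -1
    lasso_coef = sign * max(abs(r) - lam, 0.0)          # closed-form Lasso solution for one standardized predictor
    return bic + alpha * abs(lasso_coef) + beta * abs(r)


parent = [0, 1, 0, 2, 1, 3, 0, 2, 1, 0, 3, 1]
# Toy child series: follows the parent with a one-step delay plus a small alternating perturbation.
child = [0.5] + [p + 0.1 * (-1) ** t for t, p in enumerate(parent[:-1])]
score_lag1 = hybrid_score(parent, child, lag=1)
score_lag2 = hybrid_score(parent, child, lag=2)
```

Comparing `score_lag1` with `score_lag2` shows how a higher-order score can rank candidate edges at different time lags; in this toy series the true one-step delay receives the higher score.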
|
18
|
Shi W, Zhong B, Dong J, Hu X, Li L. Super enhancer-driven core transcriptional regulatory circuitry crosstalk with cancer plasticity and patient mortality in triple-negative breast cancer. Front Genet 2023; 14:1258862. [PMID: 37900187 PMCID: PMC10602724 DOI: 10.3389/fgene.2023.1258862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/02/2023] [Indexed: 10/31/2023] Open
Abstract
Triple-negative breast cancer (TNBC) is a clinically aggressive subtype of breast cancer. Core transcriptional regulatory circuitry (CRC) consists of autoregulated transcription factors (TFs) and their enhancers, which dominate gene expression programs and control cell fate. However, there is limited knowledge of CRC in TNBC. Herein, we systematically characterized the activated super-enhancers (SEs) and interrogated 14 CRCs in breast cancer. We found that CRCs could be broadly involved in DNA conformation change, metabolic processes, and signaling responses affecting gene expression reprogramming. Furthermore, these CRC TFs are capable of coordinating with partner TFs bridging the enhancer-promoter loops. Notably, the CRC TF and partner pairs show remarkable specificity for molecular subtypes of breast cancer, especially in TNBC. USF1, SOX4, and MYBL2 were identified as the TNBC-specific CRC TFs. We further demonstrated that USF1 was a TNBC immunophenotype-related TF. Our finding that the rewiring of enhancer-driven CRCs is related to cancer immunity and mortality will facilitate the development of epigenetic anti-cancer treatment strategies.
Affiliation(s)
- Wensheng Shi
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Department of Urology, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Bowen Zhong
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Department of Urology, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Jiaming Dong
- Department of Radiation, Cangzhou Central Hospital, Changsha, China
| | - Xiheng Hu
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Engineering Research Center of Personalized Diagnostic and Therapeutic Technology, Central South University, Changsha, Hunan, China
- Furong Laboratory, Changsha, Hunan, China
- Department of Urology, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Lingfang Li
- Department of Cardiovascular Medicine, Xiangya Hospital, Central South University, Changsha, China
| |
|
19
|
Barbagallo C, Stella M, Ferrara C, Caponnetto A, Battaglia R, Barbagallo D, Di Pietro C, Ragusa M. RNA-RNA competitive interactions: a molecular civil war ruling cell physiology and diseases. Explor Med 2023:504-540. [DOI: 10.37349/emed.2023.00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 06/02/2023] [Indexed: 09/02/2023] Open
Abstract
The idea that proteins are the main determining factors in the functioning of cells and organisms, and their dysfunctions are the first cause of pathologies, has been predominant in biology and biomedicine until recently. This protein-centered view was too simplistic and failed to explain the physiological and pathological complexity of the cell. About 80% of the human genome is dynamically and pervasively transcribed, mostly as non-protein-coding RNAs (ncRNAs), which competitively interact with each other and with coding RNAs generating a complex RNA network regulating RNA processing, stability, and translation and, accordingly, fine-tuning the gene expression of the cells. Qualitative and quantitative dysregulations of RNA-RNA interaction networks are strongly involved in the onset and progression of many pathologies, including cancers and degenerative diseases. This review will summarize the RNA species involved in the competitive endogenous RNA network, their mechanisms of action, and involvement in pathological phenotypes. Moreover, it will give an overview of the most advanced experimental and computational methods to dissect and rebuild RNA networks.
Affiliation(s)
- Cristina Barbagallo
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| | - Michele Stella
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| | - Angela Caponnetto
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| | - Rosalia Battaglia
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| | - Davide Barbagallo
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| | - Cinzia Di Pietro
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| | - Marco Ragusa
- Section of Biology and Genetics, Department of Biomedical and Biotechnological Sciences, University of Catania, 95123 Catania, Italy
| |
|
20
|
Li L, Sun L, Chen G, Wong CW, Ching WK, Liu ZP. LogBTF: gene regulatory network inference using Boolean threshold network model from single-cell gene expression data. Bioinformatics 2023; 39:btad256. [PMID: 37079737 PMCID: PMC10172039 DOI: 10.1093/bioinformatics/btad256] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/28/2022] [Revised: 02/25/2023] [Accepted: 04/13/2023] [Indexed: 04/22/2023] Open
Abstract
MOTIVATION From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods focus mainly on the network topology; only a few of them consider how to explicitly describe the logic update rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods also fail to deal with the over-fitting problem caused by the noise in time series data. RESULTS In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and the Boolean threshold function. First, the continuous gene expression values are converted into Boolean values and an elastic net regression model is adopted to fit the binarized time series data. Then, the estimated regression coefficients are applied to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and thereafter setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is incorporated into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than some alternative methods for GRN inference. AVAILABILITY AND IMPLEMENTATION The source data and code are available at https://github.com/zpliulab/LogBTF.
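The Boolean threshold dynamics described in this abstract can be sketched as follows in Python. This is not the LogBTF code: in the method the weights come from regularized logistic regression, whereas here they are hard-coded hypothetical values, and the mean-based binarization rule, the pruning cutoff `eps`, and all function names are assumptions made for illustration.

```python
def binarize(series):
    """Map a continuous expression series to 0/1 around its mean (assumed rule)."""
    mean = sum(series) / len(series)
    return [1 if v > mean else 0 for v in series]


def prune(weights, eps=1e-2):
    """Set sufficiently small coefficients to zero to sparsify the network."""
    return [[0.0 if abs(w) < eps else w for w in row] for row in weights]


def step(state, weights, bias):
    """One synchronous update: gene i switches on if its weighted input plus bias exceeds 0."""
    return [
        1 if sum(w * s for w, s in zip(row, state)) + b > 0 else 0
        for row, b in zip(weights, bias)
    ]


# Hypothetical 3-gene network: gene 2 activates gene 0, gene 0 activates gene 1,
# gene 1 represses gene 2; the 0.001 entry stands in for a negligible fitted coefficient.
weights = prune([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, -1.0, 0.001]])
bias = [-0.5, -0.5, 0.5]

states = binarize([1.0, 3.0, 2.5, 0.5])
first = step([1, 0, 0], weights, bias)
second = step(first, weights, bias)
```

Row i of `weights` lists the regulators of gene i, so reading off the nonzero entries after pruning gives the inferred edges, while repeated calls to `step` give the network dynamics.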
Affiliation(s)
- Lingyu Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
| | - Liangjie Sun
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
| | - Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
| | - Chi-Wing Wong
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
| | - Wai-Ki Ching
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
| |
|
21
|
Xu J, Zhang A, Liu F, Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics 2023; 39:btad165. [PMID: 37004161 PMCID: PMC10085635 DOI: 10.1093/bioinformatics/btad165] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/25/2022] [Revised: 02/28/2023] [Accepted: 03/25/2023] [Indexed: 04/03/2023] Open
Abstract
MOTIVATION Single-cell RNA-sequencing (scRNA-seq) technologies provide an opportunity to infer cell-specific gene regulatory networks (GRNs), which is an important challenge in systems biology. Although numerous methods have been developed for inferring GRNs from scRNA-seq data, it is still a challenge to deal with cellular heterogeneity. RESULTS To address this challenge, we developed an interpretable transformer-based method named STGRNS for inferring GRNs from scRNA-seq data. In this algorithm, a gene expression motif technique is proposed to convert gene pairs into contiguous sub-vectors, which can be used as input for the transformer encoder. By avoiding missing phase-specific regulations in a network, the gene expression motif technique can improve the accuracy of GRN inference for different types of scRNA-seq data. To assess the performance of STGRNS, we performed comparative experiments with several popular methods on extensive benchmark datasets, including 21 static and 27 time-series scRNA-seq datasets. All the results show that STGRNS is superior to the other methods compared. In addition, STGRNS also proved to be more interpretable than "black box" deep learning methods, which are well known for the difficulty of explaining their predictions clearly. AVAILABILITY AND IMPLEMENTATION The source code and data are available at https://github.com/zhanglab-wbgcas/STGRNS.
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
| | - Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074 China
| |
|
22
|
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. bioRxiv 2023:2023.03.21.533676. [PMID: 36993235 PMCID: PMC10055344 DOI: 10.1101/2023.03.21.533676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present an elegant modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding entirely novel strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally-determined from experiments, augmented with dynamical network computations involving end point objective functions, mutual information, change point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method (CIM), utilizing the strategies described above. The CIM is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. 
We posit that this state-of-the-art framework has the promise to extend to the dynamics of other biological processes, with a potential to provide novel mechanistic insights. STATEMENT OF SIGNIFICANCE Cellular processes like the cell cycle are overly complex, involving multiple players interacting at multiple levels, and explicit modeling of such systems is challenging. The availability of longitudinal RNA measurements provides an opportunity to "reverse-engineer" novel regulatory models. We develop a novel framework, inspired by the goal-oriented cybernetic model, to implicitly model transcriptional regulation by constraining the system using inferred temporal goals. A preliminary causal network based on information theory is used as a starting point, and our framework is used to distill the network to temporally based networks containing essential molecular players. The strength of this approach is its ability to dynamically model the RNA temporal measurements. The approach developed paves the way for inferring regulatory processes in many complex cellular processes.
|
23
|
Wang Y, Liu C, Qiao X, Han X, Liu ZP. PKI: A bioinformatics method of quantifying the importance of nodes in gene regulatory network via a pseudo knockout index. BIOCHIMICA ET BIOPHYSICA ACTA. GENE REGULATORY MECHANISMS 2023; 1866:194911. [PMID: 36804477 DOI: 10.1016/j.bbagrm.2023.194911] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/12/2022] [Revised: 01/09/2023] [Accepted: 01/30/2023] [Indexed: 02/18/2023]
Abstract
BACKGROUND A gene regulatory network (GRN) is a model that characterizes the complex relationships between genes and thereby provides an informatics environment to measure the importance of nodes. The evaluation of important nodes in a GRN can point to their functional implications, serving as key players in particular biological processes, such as master regulators and driver genes. Currently, it is mainly based on network topological parameters and focuses only on evaluating a single node individually. However, genes and their products play their functions by interacting with each other. It is worth noting that the effects of gene combinations in a GRN are not simply additive. The discovery of key combinations is of significance in revealing gene sets with important functions. Recently, with the development of single-cell RNA-sequencing (scRNA-seq) technology, we can quantify gene expression profiles of individual cells, which provides the potential to identify crucial nodes in gene regulation under specific conditions, e.g., stem cell differentiation. RESULTS In this paper, we propose a bioinformatics method, called Pseudo Knockout Importance (PKI), to quantify the importance of nodes and node sets in a specific GRN structure using time-course scRNA-seq data. First, we construct ordinary differential equations to approximate the gene regulation during cell differentiation. Then we design gene pseudo knockout experiments and define PKI score evaluation criteria based on the coefficient of determination. The importance of nodes can be described as the influence on the ODE system of removing variables. For key gene combinations, PKI is derived as a combinatorial optimization problem of quantifying the in silico gene knockout effects. CONCLUSIONS Here, we focus our analyses on the specific GRN of embryonic stem cells with time series gene expression profiles. 
To verify the effectiveness and advantage of the PKI method, we compare its node importance rankings with twelve other centrality-based methods, such as degree and Latora closeness. For key node combinations, we compare the results with the method based on the minimum dominant set. Moreover, the well-known combinations of transcription factors in induced pluripotent stem cells are also employed to verify the vital gene combinations identified by PKI. These results demonstrate the reliability and superiority of the proposed method.
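The pseudo knockout idea in the abstract above (score a gene by how much the coefficient of determination drops when its variable is removed from the ODE system) can be sketched in a few lines. This is only an illustration under stated assumptions, not the published PKI implementation: the linear rate model, the pre-fitted weights, the gene names "A" and "B", and the helper names `r_squared`, `predict` and `knockout_scores` are all hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def predict(weights, regulators, knocked_out=None):
    """Assumed linear rate model dx/dt = sum_g w_g * x_g(t).

    A knocked-out gene is simply left out of the sum.
    """
    n_points = len(next(iter(regulators.values())))
    return [
        sum(w * regulators[g][t] for g, w in weights.items() if g != knocked_out)
        for t in range(n_points)
    ]


def knockout_scores(weights, regulators, observed_rate):
    """Score each regulator by the drop in R^2 caused by its removal."""
    full = r_squared(observed_rate, predict(weights, regulators))
    return {
        g: full - r_squared(observed_rate, predict(weights, regulators, g))
        for g in weights
    }


# Toy data: hypothetical regulators "A" and "B" of one target gene.
regulators = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 0.0, 1.0, 0.0]}
weights = {"A": 2.0, "B": 0.1}           # assumed to be fitted beforehand
observed = predict(weights, regulators)  # noise-free target rate
print(knockout_scores(weights, regulators, observed))
```

In this toy setting, removing the strongly weighted regulator degrades the fit far more than removing the weak one, which is the ranking behaviour the abstract describes for single nodes. Gene combinations would require scoring subsets rather than single genes.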
Affiliation(s)
- Yijuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Chao Liu
- Department of Orthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
| | - Xu Qiao
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Xianhua Han
- Faculty of Science, Yamaguchi University, Yamaguchi 753-8511, Japan
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
|
24
|
Jiang X, Liu K, Peng H, Fang J, Zhang A, Han Y, Zhang X. Comparative network analysis reveals the dynamics of organic acid diversity during fruit ripening in peach (Prunus persica L. Batsch). BMC PLANT BIOLOGY 2023; 23:16. [PMID: 36617558 PMCID: PMC9827700 DOI: 10.1186/s12870-023-04037-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Accepted: 01/02/2023] [Indexed: 06/17/2023]
Abstract
BACKGROUND Organic acids are important components that determine the fruit flavor of peach (Prunus persica L. Batsch). However, the dynamics of organic acid diversity during fruit ripening and the key genes that modulate organic acid metabolism remain largely unknown in this fruit tree, whose yield ranks sixth in the world. RESULTS In this study, we used 3D transcriptome data containing three dimensions of information, namely time, phenotype and gene expression, from 5 different varieties of peach to construct gene co-expression networks throughout fruit ripening of peach. With the network inferred, a time-ordered comparative network analysis was performed to select the high-acid-specific gene co-expression network and then clarify the regulatory factors controlling organic acid accumulation. As a result, network modules related to organic acid synthesis and metabolism under high-acid and low-acid comparison conditions were identified for our following research. In addition, we obtained 20 candidate genes as regulatory factors related to organic acid metabolism in peach. CONCLUSIONS The study provides new insights into the dynamics of organic acid accumulation during fruit ripening, complements the results of classical co-expression network analysis and establishes a foundation for key gene discovery from time-series, multi-species transcriptome data.
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jing Fang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
| | - Yuepeng Han
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China.
|
25
|
Amy Lyu MJ, Tang Q, Wang Y, Essemine J, Chen F, Ni X, Chen G, Zhu XG. Evolution of gene regulatory network of C4 photosynthesis in the genus Flaveria reveals the evolutionary status of C3-C4 intermediate species. PLANT COMMUNICATIONS 2023; 4:100426. [PMID: 35986514 PMCID: PMC9860191 DOI: 10.1016/j.xplc.2022.100426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Revised: 06/16/2022] [Accepted: 08/11/2022] [Indexed: 06/15/2023]
Abstract
C4 photosynthesis evolved from ancestral C3 photosynthesis by recruiting pre-existing genes to fulfill new functions. The enzymes and transporters required for the C4 metabolic pathway have been intensively studied and well documented; however, the transcription factors (TFs) that regulate these C4 metabolic genes are not yet well understood. In particular, how the TF regulatory network of C4 metabolic genes was rewired during the evolutionary process is unclear. Here, we constructed gene regulatory networks (GRNs) for four closely evolutionarily related species from the genus Flaveria, which represent four different evolutionary stages of C4 photosynthesis: C3 (F. robusta), type I C3-C4 (F. sonorensis), type II C3-C4 (F. ramosissima), and C4 (F. trinervia). Our results show that more than half of the co-regulatory relationships between TFs and core C4 metabolic genes are species specific. The counterparts of the C4 genes in C3 species were already co-regulated with photosynthesis-related genes, whereas the required TFs for C4 photosynthesis were recruited later. The TFs involved in C4 photosynthesis were widely recruited in the type I C3-C4 species; nevertheless, type II C3-C4 species showed a divergent GRN from C4 species. In line with these findings, a 13CO2 pulse-labeling experiment showed that the CO2 initially fixed into C4 acid was not directly released to the Calvin-Benson-Bassham cycle in the type II C3-C4 species. Therefore, our study uncovered dynamic changes in C4 genes and TF co-regulation during the evolutionary process; furthermore, we showed that the metabolic pathway of the type II C3-C4 species F. ramosissima represents an alternative evolutionary solution to the ammonia imbalance in C3-C4 intermediate species.
Affiliation(s)
- Ming-Ju Amy Lyu
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Qiming Tang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China; University of Chinese Academy of Sciences
| | - Yanjie Wang
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China; University of Chinese Academy of Sciences
| | - Jemaa Essemine
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Faming Chen
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Xiaoxiang Ni
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China; University of Chinese Academy of Sciences
| | - Genyun Chen
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
| | - Xin-Guang Zhu
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China.
|
26
|
Fan Z, Kernan KF, Sriram A, Benos PV, Canna SW, Carcillo JA, Kim S, Park HJ. Deep neural networks with knockoff features identify nonlinear causal relations and estimate effect sizes in complex biological systems. Gigascience 2022; 12:giad044. [PMID: 37395630 PMCID: PMC10316696 DOI: 10.1093/gigascience/giad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 01/31/2023] [Accepted: 05/29/2023] [Indexed: 07/04/2023] Open
Abstract
BACKGROUND Learning the causal structure helps identify risk factors, disease mechanisms, and candidate therapeutics for complex diseases. However, although complex biological systems are characterized by nonlinear associations, existing bioinformatic methods of causal inference cannot identify the nonlinear relationships and estimate their effect size. RESULTS To overcome these limitations, we developed the first computational method that explicitly learns nonlinear causal relations and estimates the effect size using a deep neural network approach coupled with the knockoff framework, named causal directed acyclic graphs using deep learning variable selection (DAG-deepVASE). Using simulation data of diverse scenarios and identifying known and novel causal relations in molecular and clinical data of various diseases, we demonstrated that DAG-deepVASE consistently outperforms existing methods in identifying true and known causal relations. In the analyses, we also illustrate how identifying nonlinear causal relations and estimating their effect size help understand the complex disease pathobiology, which is not possible using other methods. CONCLUSIONS With these advantages, the application of DAG-deepVASE can help identify driver genes and therapeutic agents in biomedical studies and clinical trials.
Affiliation(s)
- Zhenjiang Fan
- Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Kate F Kernan
- Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Aditya Sriram
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Panayiotis V Benos
- Department of Epidemiology, University of Florida, Gainesville, FL 32610, USA
| | - Scott W Canna
- Pediatric Rheumatology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Joseph A Carcillo
- Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
| | - Soyeon Kim
- Division of Pediatric Pulmonary Medicine, Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
- Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Hyun Jung Park
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
27
|
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022] Open
Abstract
There are insufficient accurate biomarkers and effective therapeutic targets in current cancer treatment. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integration of multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods were introduced, and common approaches to molecular network inference were summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in constructing genome-scale multi-omics networks in bulk tumors and single cells in terms of computational efficiency, scalability, and accuracy. Based on the constructed multi-modal regulatory networks, graph theory network centrality metrics can be used in the prioritization of candidates for discovering biomarkers and therapeutic targets. Our approach to integrating multi-omics profiles in a patient cohort with large-scale patient EMRs such as the SEER-Medicare cancer registry combined with extensive external validation can identify potential biomarkers applicable in large patient populations. These methodologies form a conceptually innovative framework to analyze various available information from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
Affiliation(s)
- Qing Ye
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
| | - Nancy Lan Guo
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
|
28
|
Jia Z, Zhang X. Accurate determination of causalities in gene regulatory networks by dissecting downstream target genes. Front Genet 2022; 13:923339. [PMID: 36568360 PMCID: PMC9768335 DOI: 10.3389/fgene.2022.923339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Accepted: 11/08/2022] [Indexed: 12/12/2022] Open
Abstract
Accurate determination of causalities between genes is a challenge in the inference of gene regulatory networks (GRNs) from the gene expression profile. Although many methods have been developed for the reconstruction of GRNs, most of them are insufficient in determining causalities or regulatory directions. In this work, we present a novel method, namely, DDTG, to improve the accuracy of causality determination in GRN inference by dissecting downstream target genes. In the proposed method, the topology and hierarchy of GRNs are determined by mutual information and conditional mutual information, and the regulatory directions of GRNs are determined by Taylor formula-based regression. In addition, indirect interactions are removed with the sparseness of the network topology to improve the accuracy of network inference. The method is validated on the benchmark GRNs from DREAM3 and DREAM4 challenges. The results demonstrate the superior performance of the DDTG method on causality determination of GRNs compared to some popular GRN inference methods. This work provides a useful tool to infer the causal gene regulatory network.
Affiliation(s)
- Zhigang Jia
- School of Mathematics and Statistics, Xinyang Normal University, Xinyang, China, Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China, Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China, *Correspondence: Xiujun Zhang,
|
29
|
The Analysis of Relevant Gene Networks Based on Driver Genes in Breast Cancer. Diagnostics (Basel) 2022; 12:diagnostics12112882. [PMID: 36428940 PMCID: PMC9689550 DOI: 10.3390/diagnostics12112882] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND The occurrence and development of breast cancer has a strong correlation with a person's genetics. Therefore, it is important to analyze the genetic factors of breast cancer for the future development of potential targeted therapies at the genetic level. METHODS In this study, we complete an analysis of the relevant protein-protein interaction network relating to breast cancer. This includes three steps: selection of breast cancer-relevant genes using a mutual information method, reconstruction of the protein-protein interaction network based on the STRING database, and identification of vital genes by node centrality analysis. RESULTS The 230 breast cancer-relevant genes chosen in gene selection were used to reconstruct the protein-protein interaction network, and some vital genes were identified by node centrality analyses. Node centrality analyses conducted with the top 10 and top 20 values of each metric found 19 and 39 statistically vital genes, respectively. In order to prove the biological significance of these vital genes, we carried out survival analysis and DNA methylation analysis, and examined the prognosis in other cancer tissues and the RNA expression level in breast cancer. The results all supported the validity of the selected genes. CONCLUSIONS These genes could provide a valuable reference in clinical treatment among breast cancer patients.
|
30
|
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:6808612. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION The question of how to construct gene regulatory networks has long been a focus of biological research. Mutual information can be used to measure nonlinear relationships, and it has been widely used in the construction of gene regulatory networks. However, this method cannot measure indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of inferring gene regulatory networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed using a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain the initial gene regulatory network. To obtain the complex interaction between genes and eliminate redundant edges in the network, the initial gene regulatory network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was iteratively updated to reduce the impact of mutual information on the overestimation of the direct regulatory intensity. RESULTS The experimental results show that the MEOMI method performed better than several other kinds of gene network construction methods on DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E.coli SOS pathway network, E.coli SOS DNA repair network and E.coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and dataset are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
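The pairwise mutual information that the approach above starts from can be illustrated with a plain histogram (plug-in) estimator. This is a minimal sketch, not the MEOMI estimator, which according to the abstract combines two entropy estimators and then applies context-related likelihood and CMI2 refinement. The equal-width binning and the function names `discretize` and `mutual_information` are assumptions made for this example.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in estimate of I(X;Y) in nats from two expression profiles."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), count in pxy.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


# Toy profiles: gene_b follows gene_a, gene_c is unrelated to gene_a.
gene_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
gene_b = [v * 2.0 for v in gene_a]
gene_c = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(mutual_information(gene_a, gene_b), mutual_information(gene_a, gene_c))
```

A high score marks a candidate edge between two genes. It says nothing about whether the dependence is direct, which is the gap the conditional measures in the abstract are meant to close.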
Affiliation(s)
- Jimeng Lei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China, Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zongheng Cai
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China, Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyi He
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Wanting Zheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
31
|
Kelly J, Berzuini C, Keavney B, Tomaszewski M, Guo H. A review of causal discovery methods for molecular network analysis. Mol Genet Genomic Med 2022; 10:e2055. [PMID: 36087049 PMCID: PMC9544222 DOI: 10.1002/mgg3.2055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Revised: 07/12/2022] [Accepted: 08/18/2022] [Indexed: 11/08/2022] Open
Abstract
BACKGROUND With the increasing availability and size of multi-omics datasets, investigating the causal relationships between molecular phenotypes has become an important aspect of exploring underlying biology and genetics. An increasing number of methodologies have been developed and applied to molecular networks to investigate these causal interactions. METHODS We have introduced and reviewed the available methods for building large-scale causal molecular networks that have been developed and applied in the past decade. RESULTS In this review we have identified and summarized the existing methods for inferring causality in large-scale causal molecular networks, and discussed important factors that will need to be considered in future research in this area. CONCLUSION Existing methods for inferring causal molecular networks have their own strengths and limitations, so there is no one best approach, and the choice is instead down to the discretion of the researcher. This review also discusses some of the current limitations to biological interpretation of these networks, and important factors to consider for future studies on molecular networks.
Affiliation(s)
- Jack Kelly
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
| | - Carlo Berzuini
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
| | - Bernard Keavney
- Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Division of Cardiology and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
| | - Maciej Tomaszewski
- Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Manchester Heart Centre and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
| | - Hui Guo
- Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
|
32
|
Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:bbac219. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022] Open
Abstract
Dynamic Bayesian networks (DBNs) can be used for the discovery of gene regulatory networks (GRNs) from time series gene expression data. Here, we suggest a strategy for learning DBNs from gene expression data by employing a Bayesian approach that is scalable to large networks and is targeted at learning models with high predictive accuracy. Our framework can be used to learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models based on different structural and parametric assumptions and select the optimal model based on the cross-validated predictive accuracy. We show in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal to tumor cells of colorectal tissue. The classification accuracy reached by the DBN-based classifier for both datasets was higher than reported previously. For the colorectal cancer dataset, our analysis suggested that GRNs for cancer and normal tissues have a lot of differences, which are most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
Affiliation(s)
- Polina Suter
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
|
33
|
Passemiers A, Moreau Y, Raimondi D. Fast and accurate inference of gene regulatory networks through robust precision matrix estimation. Bioinformatics 2022; 38:2802-2809. [PMID: 35561176 PMCID: PMC9113237 DOI: 10.1093/bioinformatics/btac178] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2021] [Revised: 03/14/2022] [Accepted: 03/22/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Transcriptional regulation mechanisms allow cells to adapt and respond to external stimuli by altering gene expression. The possible cell transcriptional states are determined by the underlying gene regulatory network (GRN), and reliably inferring such a network would be invaluable for understanding biological processes and disease progression. RESULTS In this article, we present a novel method for the inference of GRNs, called PORTIA, which is based on robust precision matrix estimation, and we show that it compares favorably with state-of-the-art methods while being orders of magnitude faster. We extensively validated PORTIA using the DREAM and MERLIN+P datasets as benchmarks. In addition, we propose a novel scoring metric that builds on graph-theoretical concepts. AVAILABILITY AND IMPLEMENTATION The code and instructions for data acquisition and full reproduction of our results are available at https://github.com/AntoinePassemiers/PORTIA-Manuscript. PORTIA is available on PyPI as a Python package (portia-grn). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
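Why a precision matrix is useful for network inference can be shown with the textbook relation between the inverse covariance matrix and partial correlations. The sketch below is not PORTIA's estimator and does not use the `portia-grn` package, whose API is not described in the abstract. The diagonal shrinkage step and the function names `invert` and `partial_correlations` are assumptions chosen for illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov, shrinkage=0.0):
    """Partial correlation of each gene pair given all other genes.

    Off-diagonal covariances are shrunk toward zero first (an assumed,
    simple regularizer), then rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    n = len(cov)
    shrunk = [
        [cov[i][j] if i == j else (1.0 - shrinkage) * cov[i][j] for j in range(n)]
        for i in range(n)
    ]
    prec = invert(shrunk)
    return [
        [
            1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Covariance of a regulatory chain g0 -> g1 -> g2: g0 and g2 are correlated
# (0.25) only through g1, so their partial correlation vanishes.
chain_cov = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
for row in partial_correlations(chain_cov):
    print([round(v, 3) for v in row])
```

The chain example shows the point of working with the precision matrix: marginal correlation links the two end genes, while the partial correlation removes the indirect edge.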
|
34
|
Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. ENTROPY 2022; 24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
Abstract
One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer the regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of the dependence on variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistent algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed on six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.
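The order-resampling step described in this abstract can be sketched as follows. This is an illustrative outline only, not the SPCA implementation: `consensus_edge_weights` and `correlation_threshold_network` are hypothetical names, the base method is a plain correlation threshold standing in for a path-consistency algorithm, and simple averaging is assumed as the edge-weight measure.

```python
import numpy as np

def consensus_edge_weights(data, infer_network, n_orders=20, seed=0):
    """Average the networks inferred under random variable orders.

    data: array of shape (n_samples, n_vars).
    infer_network: callable returning an (n_vars, n_vars) adjacency
    matrix for the columns in the order it receives them.
    """
    rng = np.random.default_rng(seed)
    n_vars = data.shape[1]
    weights = np.zeros((n_vars, n_vars))
    for _ in range(n_orders):
        order = rng.permutation(n_vars)
        adj = np.asarray(infer_network(data[:, order]), dtype=float)
        # adj[i, j] describes variables order[i] and order[j];
        # map the result back to the original indexing.
        weights[np.ix_(order, order)] += adj
    return weights / n_orders

def correlation_threshold_network(data, threshold=0.5):
    """Stand-in base method (assumption): threshold absolute correlation."""
    corr = np.corrcoef(data, rowvar=False)
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj
```

With an order-independent base method such as the stand-in above, the consensus reduces to the single-run network; the averaging only matters when the base method's output changes with the column order.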
|
35
|
Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022; 23:165. [PMID: 35524190 PMCID: PMC9074326 DOI: 10.1186/s12859-022-04696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 04/25/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Current gene regulatory network (GRN) inference methods are notorious for the large number of indirect interactions hidden in their predictions. Separating indirect interactions from direct ones remains an important challenge in the reconstruction of GRNs. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs. RESULTS To assess the performance of RSNET, we ran experiments on several gold-standard networks using a simulation study, the DREAM challenge dataset and the Escherichia coli network. The results show that RSNET performed better than the compared methods in sensitivity and accuracy. As a case study, we used RSNET to construct a functional GRN for apple fruit ripening from gene expression data. CONCLUSIONS In the proposed method, redundant interactions, including weak and indirect connections, are silenced adaptively by recursive optimization, and highly dependent nodes are constrained in the model to keep the real interactions. This study provides a useful tool for inferring clean networks. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s12859-022-04696-w.
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China; Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
| |
|
36
|
Zhang H, Chen J, Tian T. Bayesian Inference of Stochastic Dynamic Models Using Early-Rejection Methods Based on Sequential Stochastic Simulations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1484-1494. [PMID: 33216717 DOI: 10.1109/tcbb.2020.3039490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Stochastic modelling is an important method to investigate the functions of noise in a wide range of biological systems. However, the parameter inference for stochastic models is still a challenging problem partially due to the large computing time required for stochastic simulations. To address this issue, we propose a novel early-rejection method by using sequential stochastic simulations. We first show that a large number of stochastic simulations are required to obtain reliable inference results. Instead of generating a large number of simulations for each parameter sample, we propose to generate these simulations in a number of stages. The simulation process will go to the next stage only if the accuracy of simulations at the current stage satisfies a given error criterion. We propose a formula to determine the error criterion and use a stochastic differential equation model to examine the effects of different criteria. Three biochemical network models are used to evaluate the efficiency and accuracy of the proposed method. Numerical results suggest the proposed early-rejection method achieves substantial improvement in the efficiency for the inference of stochastic models.
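The staged simulation idea in this abstract can be sketched roughly as below. This is a hedged illustration, not the authors' method: `staged_accept`, the use of the sample mean as the summary statistic, and the fixed per-stage tolerances are all assumptions, and the abstract's formula for the error criterion is not reproduced here.

```python
import numpy as np

def staged_accept(theta, simulate, observed, stage_sizes, tolerances, rng):
    """Run simulations in stages and stop early when a stage fails.

    simulate: callable (theta, rng) -> one scalar simulation output.
    stage_sizes: number of new simulations added at each stage.
    tolerances: maximum allowed |mean(simulations) - observed| per stage.
    Returns (accepted, number_of_simulations_used).
    """
    sims = []
    for size, tol in zip(stage_sizes, tolerances):
        sims.extend(simulate(theta, rng) for _ in range(size))
        error = abs(np.mean(sims) - observed)
        if error > tol:
            # Early rejection: skip the remaining, more expensive stages.
            return False, len(sims)
    return True, len(sims)
```

A poor parameter sample is then discarded after the first small batch, while a plausible one proceeds through all stages, so most of the simulation budget is spent on promising samples.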
|
37
|
Hernández-Gómez C, Hernández-Lemus E, Espinal-Enríquez J. The Role of Copy Number Variants in Gene Co-Expression Patterns for Luminal B Breast Tumors. Front Genet 2022; 13:806607. [PMID: 35432489 PMCID: PMC9010943 DOI: 10.3389/fgene.2022.806607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Accepted: 03/03/2022] [Indexed: 12/20/2022] Open
Abstract
Gene co-expression networks have become a usual approach to integrate the vast amounts of information coming from gene expression studies in cancer cohorts. The reprogramming of the gene regulatory control and the molecular pathways depending on such control are central to the characterization of the disease, aiming to unveil the consequences for cancer prognosis and therapeutics. There is, however, a multitude of factors which have been associated with anomalous control of gene expression in cancer. In the particular case of co-expression patterns, we have previously documented a phenomenon of loss of long distance co-expression in several cancer types, including breast cancer. Of the many potential factors that may contribute to this phenomenology, copy number variants (CNVs) have been often discussed. However, no systematic assessment of the role that CNVs may play in shaping gene co-expression patterns in breast cancer has been performed to date. For this reason we have decided to develop such analysis. In this study, we focus on using probabilistic modeling techniques to evaluate to what extent CNVs affect the phenomenon of long/short range co-expression in Luminal B breast tumors. We analyzed the co-expression patterns in chromosome 8, since it is known to be affected by amplifications/deletions during cancer development. We found that the CNVs pattern in chromosome 8 of Luminal B network does not alter the co-expression patterns significantly, which means that the co-expression program in this cancer phenotype is not determined by CNV structure. Additionally, we found that region 8q24.3 is highly dense in interactions, as well as region p21.3. The most connected genes in this network belong to those cytobands and are associated with several manifestations of cancer in different tissues. 
Interestingly, among the most connected genes, we found MAF1 and POLR3D, which may constitute an axis of regulation of gene transcription, in particular for non-coding RNA species. We believe that by advancing on our knowledge of the molecular mechanisms behind gene regulation in cancer, we will be better equipped, not only to understand tumor biology, but also to broaden the scope of diagnostic, prognostic and therapeutic interventions to ultimately benefit oncologic patients.
Affiliation(s)
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- *Correspondence: Jesús Espinal-Enríquez; Enrique Hernández-Lemus
| | - Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- *Correspondence: Jesús Espinal-Enríquez; Enrique Hernández-Lemus
| |
|
38
|
Degree of Freedom of Gene Expression in Saccharomyces cerevisiae. Microbiol Spectr 2022; 10:e0083821. [PMID: 35230153 PMCID: PMC9045123 DOI: 10.1128/spectrum.00838-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The complexity of genome-wide gene expression has not yet been adequately addressed due to a lack of comprehensive statistical analyses. In the present study, we introduce degree of freedom (DOF) as a summary statistic for evaluating gene expression complexity. Because DOF can be interpreted by a state-space representation, application of the DOF is highly useful for understanding gene activities. We used over 11,000 gene expression data sets to reveal that the DOF of gene expression in Saccharomyces cerevisiae is not greater than 450. We further demonstrated that various degrees of freedom of gene expression can be interpreted by different sequence motifs within promoter regions and Gene Ontology (GO) terms. The well-known TATA box is the most significant one among the identified motifs, while the GO term "ribosome genesis" is an associated biological process. On the basis of transcriptional freedom, our findings suggest that the regulation of gene expression can be modeled using only a few state variables. IMPORTANCE Yeast works like a well-organized factory. Each of its components works in its own way, while affecting the activities of others. The order of all activities is largely governed by the regulation of gene expression. In recent decades, biologists have recognized many regulations for yeast genes. However, it is not known how closely the regulation links each gene together to make all components of the cell work as a whole. In other words, biologists are very interested in how many independent control factors are needed to operate an artificial "cell" that works the same as a real one. In this work, we suggested that only 450 control factors were sufficient to represent the regulation of all 5800 yeast genes.
|
39
|
Feng H, Zheng R, Wang J, Wu FX, Li M. NIMCE: A Gene Regulatory Network Inference Approach Based on Multi Time Delays Causal Entropy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1042-1049. [PMID: 33035155 DOI: 10.1109/tcbb.2020.3029846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Gene regulatory networks (GRNs) are involved in various biological processes, such as cell cycle, differentiation and apoptosis. The large amount of existing expression data, especially time-series expression data, provides a chance to infer GRNs by computational methods. These data can reveal the dynamics of gene expression and imply the regulatory relationships among genes. However, identifying the indirect regulatory links is still a big challenge, as most studies treat time points as independent observations while ignoring the influences of time delays. In this study, we propose a GRN inference method based on an information-theory measure, called NIMCE. NIMCE incorporates the transfer entropy to measure the regulatory links between each pair of genes, then applies the causation entropy to filter indirect relationships. In addition, NIMCE applies multiple time delays to identify indirect regulatory relationships from candidate genes. Experiments on simulated and colorectal cancer data show that NIMCE outperforms other competing methods. All data and codes used in this study are publicly available at https://github.com/CSUBioGroup/NIMCE.
|
40
|
Zhao M, He W, Tang J, Zou Q, Guo F. A hybrid deep learning framework for gene regulatory network inference from single-cell transcriptomic data. Brief Bioinform 2022; 23:6513730. [DOI: 10.1093/bib/bbab568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Revised: 12/09/2021] [Accepted: 12/11/2021] [Indexed: 12/21/2022] Open
Abstract
Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into a number of cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics, transformation of cell types, etc. Inferring GRNs from such data offers unprecedented advantages for studying cell phenotypes in depth, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, DGRNS, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the dataset of mouse hematopoietic stem cells illustrates that DGRNS outperforms state-of-the-art methods. For the networks inferred by DGRNS, the area under the receiver operating characteristic curve is about 16% higher than that of other unsupervised methods, and the area under the precision-recall curve is about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS. 
By comparing the predictions with the standard network, we discover a series of novel interactions which have been confirmed in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.
|
41
|
Wang Y, Liu ZP. Identifying biomarkers for breast cancer by gene regulatory network rewiring. BMC Bioinformatics 2022; 22:308. [PMID: 35045805 PMCID: PMC8772043 DOI: 10.1186/s12859-021-04225-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 12/09/2022] Open
Abstract
BACKGROUND Mining gene regulatory networks (GRNs) is an important avenue for addressing cancer mechanisms. Mutations in the cancer genome perturb the GRN and cause a rewiring in an orchestrated network. Hence, the exploration of gene regulatory network rewiring is significant for discovering potential biomarkers and indicators for discriminating cancer phenotypes. RESULTS Here, we propose a new bioinformatics method of identifying biomarkers based on network rewiring in different states. It first reconstructs GRNs in different phenotypic conditions from gene expression data with an a priori background network. We employ an algorithm based on the path consistency algorithm and conditional mutual information to delete false-positive regulatory interactions between independent nodes/genes or gene pairs that are not closely related. A differential gene regulatory network (D-GRN) is then constructed from the rewiring parts in the two phenotype-specific GRNs. A community detection technique is then applied to the D-GRN to detect functional modules. Finally, we apply a logistic regression classifier with recursive feature elimination to select biomarker genes in each module individually. The extracted feature genes result in a gene set of biomarkers with impressive ability to distinguish normal samples from controls. We verify the identified biomarkers in external independent validation datasets. For a proof-of-concept study, we apply the framework to identify diagnostic biomarkers of breast cancer. The identified biomarkers obtain a maximum AUC of 0.985 in the internal sample classification experiments, and a maximum AUC of 0.989 in the external validations. CONCLUSION In conclusion, network rewiring reveals significant differences between different phenotypes, indicating cancer dysfunctional mechanisms. With the development of sequencing technology, gene expression data of increasing amount and quality are becoming available. 
Condition-specific gene regulatory networks that are close to the real regulations in different states will be established. Revealing the network rewiring will greatly benefit the discovery of biomarkers or signatures for phenotypes. D-GRN is a general method to meet this demand of deciphering high-throughput data for biomarker discovery. It can also be easily extended to identify biomarkers of other complex diseases beyond breast cancer.
Affiliation(s)
- Yijuan Wang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China.
| |
|
42
|
Disentangling direct from indirect relationships in association networks. Proc Natl Acad Sci U S A 2022; 119:2109995119. [PMID: 34992138 PMCID: PMC8764688 DOI: 10.1073/pnas.2109995119] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/30/2021] [Indexed: 11/18/2022] Open
Abstract
Networks are vital tools for understanding and modeling interactions in complex systems in science and engineering, and direct and indirect interactions are pervasive in all types of networks. However, quantitatively disentangling direct and indirect relationships in networks remains a formidable task. Here, we present a framework, called iDIRECT (Inference of Direct and Indirect Relationships with Effective Copula-based Transitivity), for quantitatively inferring direct dependencies in association networks. Using copula-based transitivity, iDIRECT eliminates/ameliorates several challenging mathematical problems, including ill-conditioning, self-looping, and interaction strength overflow. With simulation data as benchmark examples, iDIRECT showed high prediction accuracies. Application of iDIRECT to reconstruct gene regulatory networks in Escherichia coli also revealed considerably higher prediction power than the best-performing approaches in the DREAM5 (Dialogue on Reverse Engineering Assessment and Methods project, #5) Network Inference Challenge. In addition, applying iDIRECT to highly diverse grassland soil microbial communities in response to climate warming showed that the iDIRECT-processed networks were significantly different from the original networks, with considerably fewer nodes, links, and connectivity, but higher relative modularity. Further analysis revealed that the iDIRECT-processed network was more complex under warming than the control and more robust to both random and target species removal (P < 0.001). As a general approach, iDIRECT has great advantages for network inference, and it should be widely applicable to infer direct relationships in association networks across diverse disciplines in science and engineering.
|
43
|
Abdulkadhar S, Natarajan J. A Text Mining Protocol for Mining Biological Pathways and Regulatory Networks from Biomedical Literature. Methods Mol Biol 2022; 2496:141-157. [PMID: 35713863 DOI: 10.1007/978-1-0716-2305-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
A biological pathway or regulatory network is a collection of molecular regulators which can activate changes in cellular processes, leading to an assembly of new molecules through a series of actions among the molecules. There are three important pathways in systems biology studies, namely signaling pathways, metabolic pathways, and genetic pathways (or gene regulatory networks). Recently, biological pathway construction from scientific literature has received much attention, as the scientific literature contains a rich set of linguistic features from which to extract biological associations between genes and proteins. These associations can be united to construct biological networks. Here, we present a brief overview of various biological pathways, biomedical text resources/corpora for network construction and state-of-the-art existing methods for network construction, followed by our hybrid text mining protocol for extracting pathways and regulatory networks from biomedical literature.
Affiliation(s)
- Sabenabanu Abdulkadhar
- Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamilnadu, India
| | - Jeyakumar Natarajan
- Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamilnadu, India.
| |
|
44
|
Han J, Perera S, Wunderlich Z, Periwal V. Mechanistic gene networks inferred from single-cell data with an outlier-insensitive method. Math Biosci 2021; 342:108722. [PMID: 34688607 PMCID: PMC8722367 DOI: 10.1016/j.mbs.2021.108722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 08/25/2021] [Accepted: 08/25/2021] [Indexed: 11/28/2022]
Abstract
With advances in single-cell techniques, measuring gene dynamics at cellular resolution has become practicable. In contrast, the increased complexity of data has made it more challenging computationally to unravel underlying biological mechanisms. Thus, it is critical to develop novel computational methods capable of dealing with such complexity and of providing predictive deductions from such data. Many methods have been developed to address such challenges, each with its own advantages and limitations. We present an iterative regression algorithm for inferring a mechanistic gene network from single-cell data, especially suited to overcoming problems posed by measurement outliers. Using this regression, we infer a developmental model for the gene dynamics in Drosophila melanogaster blastoderm embryo. Our results show that the predictive power of the inferred model is higher than that of other models inferred with least squares and ridge regressions. As a baseline for how well a mechanistic model should be expected to perform, we find that model predictions of the gene dynamics are more accurate than predictions made with neural networks of varying architectures and complexity. This holds true even in the limit of small sample sizes. We compare predictions for various gene knockouts with published experimental results, finding substantial qualitative agreement. We also make predictions for gene dynamics under various gene network perturbations, impossible in non-mechanistic models.
Affiliation(s)
- Jungmin Han
- Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America.
| | - Sudheesha Perera
- Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America.
| | - Zeba Wunderlich
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92617, United States of America.
| | - Vipul Periwal
- Laboratory of Biological Modeling, National Institutes of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20814, United States of America.
| |
|
45
|
Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, He S, Hou T. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun 2021; 12:6775. [PMID: 34811351 PMCID: PMC8635420 DOI: 10.1038/s41467-021-27137-3] [Citation(s) in RCA: 91] [Impact Index Per Article: 22.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023] Open
Abstract
Prediction of drug-target interactions (DTI) plays a vital role in drug development in various areas, such as virtual screening, drug repurposing and identification of potential drug side effects. Although extensive efforts have been invested in perfecting DTI prediction, existing methods still suffer from the high sparsity of DTI datasets and the cold start problem. Here, we develop KGE_NFM, a unified framework for DTI prediction that combines a knowledge graph (KG) and a recommendation system. This framework first learns a low-dimensional representation for various entities in the KG, and then integrates the multimodal information via a neural factorization machine (NFM). KGE_NFM is evaluated under three realistic scenarios, and achieves accurate and robust predictions on four benchmark datasets, especially in the scenario of the cold start for proteins. Our results indicate that KGE_NFM provides valuable insight into integrating KG and recommendation system-based techniques into a unified framework for novel DTI discovery.
Affiliation(s)
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China; College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Shenzhen, 518057 Guangdong China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Shenzhen, 518057 Guangdong China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China.
| | - Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| |
|
46
|
Chen J, Cheong C, Lan L, Zhou X, Liu J, Lyu A, Cheung WK, Zhang L. DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-seq data. Brief Bioinform 2021; 22:bbab325. [PMID: 34424948 PMCID: PMC8499812 DOI: 10.1093/bib/bbab325] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Revised: 07/12/2021] [Accepted: 07/26/2021] [Indexed: 01/11/2023] Open
Abstract
Single-cell RNA sequencing has made it possible to capture gene activity at single-cell resolution, thus allowing reconstruction of cell-type-specific gene regulatory networks (GRNs). The available algorithms for reconstructing GRNs are commonly designed for bulk RNA-seq data, and few of them can analyze scRNA-seq data while handling dropout events and cellular heterogeneity. In this paper, we represent the joint gene expression distribution of a gene pair as an image and propose a novel supervised deep neural network called DeepDRIM, which uses the image of the target TF-gene pair and those of its potential neighbors to reconstruct GRNs from scRNA-seq data. By considering the neighborhood context of each TF-gene pair, DeepDRIM can effectively eliminate the false positives caused by transitive gene-gene interactions. We compared DeepDRIM with nine GRN reconstruction algorithms designed for either bulk or single-cell RNA-seq data. It achieves clearly better performance on the scRNA-seq data collected from eight cell lines. The simulated data show that DeepDRIM is robust to the dropout rate, the cell number and the size of the training data. We further applied DeepDRIM to the scRNA-seq gene expression of B cells from the bronchoalveolar lavage fluid of patients with mild and severe coronavirus disease 2019. We focused on the cell-type-specific GRN alteration and observed that targets of TFs that were differentially expressed between the two statuses were enriched in lysosome, apoptosis, response to decreased oxygen level and microtubule, which have been shown to be associated with coronavirus infection.
Affiliation(s)
- Jiaxing Chen: Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- ChinWang Cheong: Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Liang Lan: Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Xin Zhou: Department of Biomedical Engineering, Vanderbilt University, Vanderbilt Place Nashville, 37235, TN, USA
- Jiming Liu: Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Aiping Lyu: School of Chinese Medicine, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- William K Cheung: Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Lu Zhang: Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
|
47
|
Shang J, Wang J, Sun Y, Li F, Liu JX, Zhang H. Multiscale part mutual information for quantifying nonlinear direct associations in networks. Bioinformatics 2021; 37:2920-2929. [PMID: 33730153 DOI: 10.1093/bioinformatics/btab182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Revised: 02/15/2021] [Accepted: 03/15/2021] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions.
RESULTS: In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived five of its important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments with the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations.
AVAILABILITY AND IMPLEMENTATION: The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
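To make the notion of a direct association concrete, the following sketch computes conditional mutual information (CMI), one of the comparison measures named in this abstract, on a simulated three-node chain X -> Z -> Y. It is not an implementation of MPMI, PMI or NPA. The plug-in estimator, the binary variables, the 10% flip rate and all function names are assumptions chosen for illustration.

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    """Plug-in Shannon entropy (nats) of the joint distribution of discrete columns."""
    counts = np.array(list(Counter(zip(*cols)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# Simulated chain X -> Z -> Y: X and Y are associated only through Z.
rng = np.random.default_rng(0)
n = 20000
x = rng.integers(0, 2, n)
z = x ^ (rng.random(n) < 0.1).astype(np.int64)  # Z copies X with 10% flips
y = z ^ (rng.random(n) < 0.1).astype(np.int64)  # Y copies Z with 10% flips

mi_xy = mutual_information(x, y)
cmi_xy_given_z = conditional_mutual_information(x, y, z)
print(f"I(X;Y)   = {mi_xy:.4f}")
print(f"I(X;Y|Z) = {cmi_xy_given_z:.4f}")
```

In this construction X and Y are conditionally independent given Z, so the marginal mutual information is clearly positive while the conditional value is close to zero. A network built from the conditional measure would therefore omit the indirect X-Y edge.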
Affiliation(s)
- Junliang Shang: School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jing Wang: School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yan Sun: School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Feng Li: School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu: School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Honghai Zhang: College of Life Science, Qufu Normal University, Qufu 273165, China
|
48
|
MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes. BIOLOGY 2021; 10:biology10090921. [PMID: 34571798 PMCID: PMC8469369 DOI: 10.3390/biology10090921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022]
Abstract
Simple Summary: The interactions between SNPs, which are known as epistasis, can strongly influence the phenotype. Their detection is still a challenge, which is made even more difficult by the existence of background associations that can hide correct epistatic interactions. To address the limitations of existing methods, we present in this study our novel method MIDESP for the detection of epistatic SNP pairs. It is the first mutual information-based method that can be applied to both qualitative and quantitative phenotypes and that explicitly accounts for background associations in the dataset.
Abstract: The interactions between SNPs result in a complex interplay with the phenotype, known as epistasis. The knowledge of epistasis is a crucial part of understanding genetic causes of complex traits. However, due to the enormous number of SNP pairs and their complex relationship to the phenotype, identification remains a challenging problem. Many approaches for the detection of epistasis have been developed using mutual information (MI) as an association measure. However, these methods have mainly been restricted to case–control phenotypes and are therefore of limited applicability for quantitative traits. To overcome this limitation of MI-based methods, here, we present an MI-based novel algorithm, MIDESP, to detect epistasis between SNPs for qualitative as well as quantitative phenotypes. Moreover, by incorporating a dataset-dependent correction technique, we deal with the effect of background associations in a genotypic dataset to separate correct epistatic interaction signals from those of false positive interactions resulting from the effect of single SNP×phenotype associations. To demonstrate the effectiveness of MIDESP, we apply it to two real datasets with qualitative and quantitative phenotypes, respectively. Our results suggest that by eliminating the background associations, MIDESP can identify important genes, which play essential roles in bovine tuberculosis or the egg weight of chickens.
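The use of mutual information as an association measure for SNP pairs can be sketched on simulated data in which the phenotype depends on two SNPs jointly but on neither alone. This is only an illustration for a qualitative (case-control) phenotype. MIDESP's background-association correction and its handling of quantitative phenotypes are not reproduced, and the function name, genotype frequencies and XOR-style phenotype model are assumptions.

```python
import numpy as np

def mutual_information(a, b):
    """Plug-in mutual information (nats) between two discrete arrays."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)       # contingency table of counts
    joint /= joint.sum()                      # joint probabilities
    pa = joint.sum(axis=1, keepdims=True)     # marginal of a, shape (k, 1)
    pb = joint.sum(axis=0, keepdims=True)     # marginal of b, shape (1, m)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

# Simulated genotypes coded 0/1/2; half of the samples carry a minor allele.
rng = np.random.default_rng(1)
n = 5000
snp1 = rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2])
snp2 = rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2])

# Hypothetical purely epistatic model: case if exactly one SNP is a carrier.
phenotype = ((snp1 > 0) ^ (snp2 > 0)).astype(int)

pair = snp1 * 3 + snp2                        # encode the joint genotype (9 states)
mi_snp1 = mutual_information(snp1, phenotype)
mi_snp2 = mutual_information(snp2, phenotype)
mi_pair = mutual_information(pair, phenotype)
print(mi_snp1, mi_snp2, mi_pair)
```

By construction each SNP alone is independent of the phenotype while the joint genotype determines it, so the pairwise score is large and the single-SNP scores are near zero. Real datasets also contain single-SNP effects, which is the situation the abstract's correction technique is meant to address.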
|
49
|
Qi X, Lin Y, Chen J, Shen B. Decoding competing endogenous RNA networks for cancer biomarker discovery. Brief Bioinform 2021; 21:441-457. [PMID: 30715152 DOI: 10.1093/bib/bbz006] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 11/05/2018] [Revised: 12/13/2018] [Accepted: 12/25/2018] [Indexed: 02/05/2023] Open
Abstract
Crosstalk between competing endogenous RNAs (ceRNAs) is mediated by shared microRNAs (miRNAs) and plays important roles both in normal physiology and tumorigenesis; thus, it is attractive for systems-level decoding of gene regulation. As ceRNA networks link the function of miRNAs with that of transcripts sharing the same miRNA response elements (MREs), e.g. pseudogenes, competing mRNAs, long non-coding RNAs, and circular RNAs, the perturbation of crucial interactions in ceRNA networks may contribute to carcinogenesis by affecting the balance of the cellular regulatory system. Therefore, discovering biomarkers that indicate cancer initiation, development, and/or therapeutic responses by reconstructing and analyzing ceRNA networks is of clinical significance. In this review, the regulatory function of ceRNAs in cancer and crucial determinants of ceRNA crosstalk are first discussed to gain a global understanding of ceRNA-mediated carcinogenesis. Then, computational approaches for ceRNA network reconstruction and experimental approaches for ceRNA validation are described from a systems biology perspective. We focus on strategies for biomarker identification based on analyzing ceRNA networks and highlight the translational applications of ceRNA biomarkers for cancer management. This article will shed light on the significance of miRNA-mediated ceRNA interactions and provide important clues for discovering ceRNA network-based biomarkers in cancer biology, thereby accelerating the pace of precision medicine and healthcare for cancer patients.
Affiliation(s)
- Xin Qi: Center for Systems Biology, Soochow University, Suzhou, China
- Yuxin Lin: Center for Systems Biology, Soochow University, Suzhou, China
- Jiajia Chen: School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, Suzhou, China
- Bairong Shen: Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, China
|
50
|
Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet 2021; 12:722981. [PMID: 34484307 PMCID: PMC8415361 DOI: 10.3389/fgene.2021.722981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open
Abstract
The "dark transcriptome" can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins ("orphan-ORFs"); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
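The step of partitioning a co-expression matrix can be sketched with the standard Markov clustering scheme of alternating expansion and inflation, applied to a thresholded Pearson correlation matrix. The function name `markov_cluster`, the inflation value of 2.0, the 0.7 correlation threshold, the binary adjacency and the simulated six-gene dataset are assumptions for illustration. The abstract does not state the authors' parameters or software.

```python
import numpy as np

def markov_cluster(adj, inflation=2.0, n_iter=50):
    """Minimal Markov clustering on a non-negative adjacency matrix."""
    m = np.asarray(adj, dtype=float) + np.eye(len(adj))   # add self-loops
    m /= m.sum(axis=0, keepdims=True)                     # column-stochastic
    for _ in range(n_iter):
        m = m @ m                                         # expansion
        m = m ** inflation                                # inflation
        m /= m.sum(axis=0, keepdims=True)
    clusters = set()
    for row in m:
        # Rows that keep non-negligible mass list the members of one cluster.
        members = tuple(int(i) for i in np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return sorted(clusters)

# Simulated expression: 6 genes x 200 samples, two co-expressed modules.
rng = np.random.default_rng(2)
factor_a = rng.normal(size=200)
factor_b = rng.normal(size=200)
expr = np.vstack(
    [factor_a + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [factor_b + 0.1 * rng.normal(size=200) for _ in range(3)]
)

corr = np.corrcoef(expr)                       # gene-by-gene Pearson correlation
adj = (np.abs(corr) > 0.7).astype(float)       # assumed co-expression threshold
np.fill_diagonal(adj, 0.0)

clusters = markov_cluster(adj)
print(clusters)
```

On real data the choice of threshold and inflation controls cluster granularity, so both would need to be tuned rather than taken from this sketch.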
Affiliation(s)
- Jing Li: Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States; Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States; Center for Metabolic Biology, Iowa State University, Ames, IA, United States
- Urminder Singh: Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States; Center for Metabolic Biology, Iowa State University, Ames, IA, United States; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Zebulun Arendsee: Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States; Center for Metabolic Biology, Iowa State University, Ames, IA, United States; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Eve Syrkin Wurtele: Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States; Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States; Center for Metabolic Biology, Iowa State University, Ames, IA, United States; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
|