1
Roy S, Sheikh SZ, Furey TS. CoVar: A generalizable machine learning approach to identify the coordinated regulators driving variational gene expression. PLoS Comput Biol 2024; 20:e1012016. [PMID: 38630807 PMCID: PMC11057768 DOI: 10.1371/journal.pcbi.1012016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 04/29/2024] [Accepted: 03/22/2024] [Indexed: 04/19/2024]
Abstract
Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites in order to identify biological pathways that influence disease pathogenesis. Machine learning (ML)-based inference models have demonstrated the ability to capture latent patterns in genomic data, and such models are emerging as an alternative to statistical models for identifying the causative factors that drive complex diseases. We present CoVar, an ML-based framework that builds upon the properties of existing inference models to find the central genes driving perturbed gene expression across biological states. Unlike differentially expressed genes (DEGs), which capture changes in individual gene expression across conditions, CoVar focuses on identifying variational genes, whose expression network interaction profiles change, providing insight into changes in regulatory dynamics such as those underlying disease pathogenesis. CoVar then finds core genes among the nearest neighbors of these variational genes; these core genes are central to the variational activity and influence the coordinated regulatory processes underlying the observed changes in gene expression. Through analysis of simulated data and of yeast expression data perturbed by deletion of the mitochondrial genome, we show that CoVar captures the intrinsic variationality and modularity of the expression data and identifies key driver genes not found by existing differential analysis methods.
Affiliation(s)
- Satyaki Roy
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Shehzad Z. Sheikh
- Departments of Medicine and Genetics, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Terrence S. Furey
- Departments of Genetics and Biology, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, United States of America
2
Figueiredo GVC, Fantin LH, Canteri MG, Ferreira da Rocha JC, Filho DDSJ. A Bayesian Probability Model Can Simulate the Knowledge of Soybean Rust Researchers to Optimize the Application of Fungicides. International Journal of Agricultural and Environmental Information Systems 2019. [DOI: 10.4018/ijaeis.2019100103] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
Asian rust is the main soybean disease in Brazil, causing yield losses of up to 80%. Fungicide application is the main form of control; however, because of farmers' concern about outbreaks, many unnecessary applications are performed. The present study aims to verify the usefulness of a probability model, built with Bayesian networks and knowledge engineering, for estimating the timing and number of fungicide sprays required to control Asian soybean rust. The model was developed through interviews with rust researchers and a literature review, and the Bayesian network was constructed with the GeNIe 2.0 software. Validation was performed by 42 farmers and 10 rust researchers using 28 test cases generated by the system. Agreement with the model was 47.5% for the farmers and 89.3% for the rust researchers; in general, the farmers overestimated the number of sprays. The results showed that the Bayesian network accurately represented the experts' knowledge and could help farmers avoid unnecessary applications.
3
Gogoshin G, Boerwinkle E, Rodin AS. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data. J Comput Biol 2016; 24:340-356. [PMID: 27681505 PMCID: PMC5372779 DOI: 10.1089/cmb.2016.0100] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including the computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing of data types, imputation and validation, and, in general, limited scalability in both the reconstruction and the visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating existing ones. Novel aspects of the algorithm center on increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets.
As such, software scalability and usability on less-than-exotic computer hardware are a priority, as is the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic, epigenetic, and transcriptome variables, metabolite levels, epidemiological variables, endpoints, phenotypes, and so on.
Affiliation(s)
- Grigoriy Gogoshin
- Diabetes and Metabolism Research Institute, City of Hope, Duarte, California
- Eric Boerwinkle
- Human Genetics Center, School of Public Health, University of Texas Health Science Center, Houston, Texas; Institute of Molecular Medicine, University of Texas Health Science Center, Houston, Texas
- Andrei S Rodin
- Diabetes and Metabolism Research Institute, City of Hope, Duarte, California
4
From within host dynamics to the epidemiology of infectious disease: Scientific overview and challenges. Math Biosci 2015; 270:143-55. [PMID: 26474512 DOI: 10.1016/j.mbs.2015.10.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 12/18/2022]
Abstract
Since their earliest days, humans have been struggling with infectious diseases. Caused by viruses, bacteria, protozoa, or even higher organisms like worms, these diseases depend critically on numerous intricate interactions between parasites and hosts, and while we have learned much about these interactions, many details are still obscure. It is evident that the combined host-parasite dynamics constitutes a complex system that involves components and processes at multiple scales of time, space, and biological organization. At one end of this hierarchy we know of individual molecules that play crucial roles for the survival of a parasite or for the response and survival of its host. At the other end, one realizes that the spread of infectious diseases by far exceeds specific locales and, due to today's easy travel of hosts carrying a multitude of organisms, can quickly reach global proportions. The community of mathematical modelers has been addressing specific aspects of infectious diseases for a long time. Most of these efforts have focused on one or two select scales of a multi-level disease and used quite different computational approaches. This restriction to a molecular, physiological, or epidemiological level was prudent, as it has produced solid pillars of a foundation from which it might eventually be possible to launch comprehensive, multi-scale modeling efforts that make full use of the recent advances in biology and, in particular, the various high-throughput methodologies accompanying the emerging -omics revolution. This special issue contains contributions from biologists and modelers, most of whom presented and discussed their work at the workshop From within Host Dynamics to the Epidemiology of Infectious Disease, which was held at the Mathematical Biosciences Institute at Ohio State University in April 2014. 
These contributions highlight some of the forays into a deeper understanding of the dynamics between parasites and their hosts, and the consequences of these dynamics for the spread and treatment of infectious diseases.
5
Yin W, Garimalla S, Moreno A, Galinski MR, Styczynski MP. A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems. BMC Systems Biology 2015; 9:49. [PMID: 26310492 PMCID: PMC4551520 DOI: 10.1186/s12918-015-0194-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/30/2015] [Accepted: 08/06/2015] [Indexed: 11/10/2022]
Abstract
Background: There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with the goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically impose significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist. Results: Here, we test the idea that by leveraging the accuracy and efficiency of classifiers, we can construct high-quality networks that capture important interactions between variables in datasets with few samples. We start from a previously developed tree-like Bayesian classifier and generalize its network learning approach to allow for arbitrary depth and complexity of tree-like networks. Using four diverse sample networks, we demonstrate that this approach performs consistently better at low sample sizes than the Sparse Candidate Algorithm, which serves as a representative comparison because it is known to generate Bayesian networks with high positive predictive value. We develop and demonstrate a resampling-based approach to identify a viable root for the learned tree-like network, which is important when the root of a network is not known a priori. We also develop and demonstrate an integrated resampling-based approach to reducing the variable space for network learning. Finally, we demonstrate the utility of this approach by analyzing a transcriptional dataset from a malaria challenge in a non-human primate model system, Macaca mulatta, suggesting the potential to capture indicators of the earliest stages of cellular differentiation during leukopoiesis.
Conclusions: We demonstrate that by starting from effective and efficient approaches for creating classifiers, we can identify interesting tree-like network structures with significant ability to capture the relationships in the training data. This approach represents a promising strategy for inferring networks with high positive predictive value under the constraint of small sample sizes, meeting a need that will only continue to grow as more high-throughput studies are applied to complex model systems.
Affiliation(s)
- Weiwei Yin
- Key Laboratory for Biomedical Engineering of Education Ministry, Department of Biomedical Engineering, Zhejiang University, Hangzhou, P. R. China; School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, 311 Ferst Drive NW, Atlanta, GA, 30332-0100, USA.
- Swetha Garimalla
- School of Biology, Georgia Institute of Technology, Atlanta, GA, USA.
- Alberto Moreno
- Division of Infectious Diseases, Emory Vaccine Center, Yerkes National Primate Research Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA.
- Mary R Galinski
- Division of Infectious Diseases, Emory Vaccine Center, Yerkes National Primate Research Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA.
- Mark P Styczynski
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, 311 Ferst Drive NW, Atlanta, GA, 30332-0100, USA.
6
Gutierrez JB, Galinski MR, Cantrell S, Voit EO. WITHDRAWN: From within host dynamics to the epidemiology of infectious disease: Scientific overview and challenges. Math Biosci 2015:S0025-5564(15)00085-1. [PMID: 25890102 DOI: 10.1016/j.mbs.2015.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Juan B Gutierrez
- Department of Mathematics, Institute of Bioinformatics, University of Georgia, Athens, GA 30602, United States.
- Mary R Galinski
- Emory University School of Medicine, Division of Infectious Diseases, Emory Vaccine Center, Yerkes National Primate Research Center, Emory University, 954 Gatewood Road, Atlanta, GA 30329, United States.
- Stephen Cantrell
- Department of Mathematics, University of Miami, Coral Gables, FL 33124, United States.
- Eberhard O Voit
- Department of Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive, Suite 4103, Atlanta, GA 30332-0535, United States.