1
Can H, Chanumolu SK, Nielsen BD, Alvarez S, Naldrett MJ, Ünlü G, Otu HH. Integration of Meta-Multi-Omics Data Using Probabilistic Graphs and External Knowledge. Cells 2023; 12:1998. [PMID: 37566077 PMCID: PMC10417344 DOI: 10.3390/cells12151998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/11/2023] [Accepted: 08/02/2023] [Indexed: 08/12/2023]
Abstract
Multi-omics has the promise to provide a detailed molecular picture of biological systems. Although obtaining multi-omics data is relatively easy, methods that analyze such data have been lagging. In this paper, we present an algorithm that uses probabilistic graph representations and external knowledge to perform optimal structure learning and deduce a multifarious interaction network for multi-omics data from a bacterial community. Kefir grain, a microbial community that ferments milk and creates kefir, represents a self-renewing, stable, natural microbial community. Kefir has been shown to have a wide range of health benefits. We obtained a controlled bacterial community using the two most abundant and well-studied species in kefir grains: Lentilactobacillus kefiri and Lactobacillus kefiranofaciens. We applied growth temperatures of 30 °C and 37 °C and obtained transcriptomic, metabolomic, and proteomic data for the same 20 samples (10 samples per temperature). We obtained a multi-omics interaction network, which generated insights that would not have been possible with single-omics analysis. We identified interactions among transcripts, proteins, and metabolites, suggesting active toxin/antitoxin systems. We also observed multifarious interactions that involved the shikimate pathway. These observations helped explain bacterial adaptation to different stress conditions, co-aggregation, and increased activation of L. kefiranofaciens at 37 °C.
Affiliation(s)
- Handan Can
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Sree K. Chanumolu
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Barbara D. Nielsen
  - Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
- Sophie Alvarez
  - Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Michael J. Naldrett
  - Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Gülhan Ünlü
  - Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
  - Department of Chemical and Biological Engineering, University of Idaho, Moscow, ID 83844, USA
  - School of Food Science, Washington State University, Pullman, WA 99164, USA
- Hasan H. Otu
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
2
Fang WQ, Wu YL, Hwang MJ. A Noise-Tolerating Gene Association Network Uncovering an Oncogenic Regulatory Motif in Lymphoma Transcriptomics. Life (Basel) 2023; 13:1331. [PMID: 37374114 DOI: 10.3390/life13061331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023]
Abstract
In cancer genomics research, gene expressions provide clues to gene regulations implicating patients' risk of survival. Gene expressions, however, fluctuate due to noises arising internally and externally, making their use to infer gene associations, hence regulation mechanisms, problematic. Here, we develop a new regression approach to model gene association networks while considering uncertain biological noises. In a series of simulation experiments accounting for varying levels of biological noises, the new method was shown to be robust and perform better than conventional regression methods, as judged by a number of statistical measures on unbiasedness, consistency and accuracy. Application to infer gene associations in germinal-center B cells led to the discovery of a three-by-two regulatory motif gene expression and a three-gene prognostic signature for diffuse large B-cell lymphoma.
Affiliation(s)
- Wei-Quan Fang
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
  - Division of New Drug, Center for Drug Evaluation, Taipei 115, Taiwan
- Yu-Le Wu
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Ming-Jing Hwang
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
3
Escorcia-Rodríguez JM, Gaytan-Nuñez E, Hernandez-Benitez EM, Zorro-Aranda A, Tello-Palencia MA, Freyre-González JA. Improving gene regulatory network inference and assessment: The importance of using network structure. Front Genet 2023; 14:1143382. [PMID: 36926589 PMCID: PMC10012345 DOI: 10.3389/fgene.2023.1143382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/03/2023]
Abstract
Gene regulatory networks are graph models representing cellular transcription events. Networks are far from complete due to time and resource consumption for experimental validation and curation of the interactions. Previous assessments have shown the modest performance of the available network inference methods based on gene expression data. Here, we study several caveats on the inference of regulatory networks and methods assessment through the quality of the input data and gold standard, and the assessment approach with a focus on the global structure of the network. We used synthetic and biological data for the predictions and experimentally-validated biological networks as the gold standard (ground truth). Standard performance metrics and graph structural properties suggest that methods inferring co-expression networks should no longer be assessed equally with those inferring regulatory interactions. While methods inferring regulatory interactions perform better in global regulatory network inference than co-expression-based methods, the latter is better suited to infer function-specific regulons and co-regulation networks. When merging expression data, the size increase should outweigh the noise inclusion and graph structure should be considered when integrating the inferences. We conclude with guidelines to take advantage of inference methods and their assessment based on the applications and available expression datasets.
Affiliation(s)
- Juan M Escorcia-Rodríguez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Estefani Gaytan-Nuñez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Ericka M Hernandez-Benitez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Andrea Zorro-Aranda
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Department of Chemical Engineering, Universidad de Antioquia, Medellín, Colombia
- Marco A Tello-Palencia
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Julio A Freyre-González
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
4
Identifying large scale interaction atlases using probabilistic graphs and external knowledge. J Clin Transl Sci 2022; 6:e27. [PMID: 35321220 PMCID: PMC8922291 DOI: 10.1017/cts.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 12/29/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022]
Abstract
Introduction: Reconstruction of gene interaction networks from experimental data provides a deep understanding of the underlying biological mechanisms. The noisy nature of the data and the large size of the network make this a very challenging task. Complex approaches handle the stochastic nature of the data but can only do this for small networks; simpler, linear models generate large networks but with less reliability. Methods: We propose a divide-and-conquer approach using probabilistic graph representations and external knowledge. We cluster the experimental data and learn an interaction network for each cluster, which are merged using the interaction network for the representative genes selected for each cluster. Results: We generated an interaction atlas for 337 human pathways yielding a network of 11,454 genes with 17,777 edges. Simulated gene expression data from this atlas formed the basis for reconstruction. Based on the area under the curve of the precision-recall curve, the proposed approach outperformed the baseline (random classifier) by ∼15-fold and conventional methods by ∼5–17-fold. The performance of the proposed workflow is significantly linked to the accuracy of the clustering step that tries to identify the modularity of the underlying biological mechanisms. Conclusions: We provide an interaction atlas generation workflow optimizing the algorithm/parameter selection. The proposed approach integrates external knowledge in the reconstruction of the interactome using probabilistic graphs. Network characterization and understanding long-range effects in interaction atlases provide means for comparative analysis with implications in biomarker discovery and therapeutic approaches. The proposed workflow is freely available at http://otulab.unl.edu/atlas.
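The fold improvements over a random classifier quoted in this abstract rest on a general property of precision-recall curves: a random ranking has an expected area equal to the fraction of true edges among all candidates. The Python sketch below illustrates that comparison on made-up toy labels and scores. It is not the paper's evaluation code; the function names are hypothetical, and average precision is assumed here as the estimate of the area under the precision-recall curve.

```python
import numpy as np


def average_precision(labels, scores):
    """Average precision: mean of precision@k taken at each true edge's rank.

    Assumption: used here as a common estimate of the area under the
    precision-recall curve; the paper may integrate the curve differently.
    """
    labels = np.asarray(labels, dtype=int)
    # Rank candidate edges from highest to lowest score.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = labels[order]
    hits = np.cumsum(ranked)                       # true edges found so far
    precision_at_k = hits / np.arange(1, len(ranked) + 1)
    return float(precision_at_k[ranked == 1].mean())


def random_baseline(labels):
    """Expected precision of a random ranking: the fraction of true edges."""
    return float(np.mean(labels))


# Toy example (hypothetical data): 8 candidate edges, 2 of them real.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

ap = average_precision(toy_labels, toy_scores)
baseline = random_baseline(toy_labels)
fold = ap / baseline  # fold improvement over a random classifier
```

Because true edges are rare in large networks, the baseline is small, which is why results are naturally reported as fold changes over it.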
5
Chen CK. Inference of genetic regulatory networks with regulatory hubs using vector autoregressions and automatic relevance determination with model selections. Stat Appl Genet Mol Biol 2021; 20:121-143. [PMID: 34963205 DOI: 10.1515/sagmb-2020-0054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/04/2020] [Accepted: 11/15/2021] [Indexed: 12/11/2022]
Abstract
The inference of genetic regulatory networks (GRNs) reveals how genes interact with each other. A few genes can regulate many genes as targets to control cell functions. We present new methods based on the order-1 vector autoregression (VAR1) for inferring GRNs from gene expression time series. The methods use the automatic relevance determination (ARD) to incorporate the regulatory hub structure into the estimation of VAR1 in a Bayesian framework. Several sparse approximation schemes are applied to the estimated regression weights or VAR1 model to generate the sparse weighted adjacency matrices representing the inferred GRNs. We apply the proposed and several widespread reference methods to infer GRNs with up to 100 genes using simulated, DREAM4 in silico and experimental E. coli gene expression time series. We show that the proposed methods are efficient on simulated hub GRNs and scale-free GRNs using short time series simulated by VAR1s and outperform reference methods on small-scale DREAM4 in silico GRNs and E. coli GRNs. They can utilize the known major regulatory hubs to improve the performance on larger DREAM4 in silico GRNs and E. coli GRNs. The impact of nonlinear time series data on the performance of proposed methods is discussed.
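To make the VAR1 idea concrete, the sketch below fits an order-1 vector autoregression, x(t+1) = A x(t) + noise, and thresholds the weights into a sparse weighted adjacency matrix. It is an illustration under stated assumptions: plain least squares replaces the paper's Bayesian ARD estimation, hard thresholding replaces its sparse approximation schemes, and the function names, simulated matrix, and threshold are all hypothetical.

```python
import numpy as np


def simulate_var1(A, n_steps, noise_sd=0.1, seed=0):
    """Simulate x(t+1) = A x(t) + Gaussian noise, starting from zero."""
    rng = np.random.default_rng(seed)
    n_genes = A.shape[0]
    X = np.zeros((n_steps, n_genes))
    for t in range(1, n_steps):
        X[t] = A @ X[t - 1] + rng.normal(0.0, noise_sd, size=n_genes)
    return X


def fit_var1(X):
    """Least-squares VAR1 fit (assumption: stands in for the ARD estimate).

    Solves X[:-1] @ B ~= X[1:], so A_hat = B.T and A_hat[i, j] is the
    estimated effect of gene j at time t on gene i at time t + 1.
    """
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return B.T


def sparse_adjacency(A_hat, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return np.where(np.abs(A_hat) >= threshold, A_hat, 0.0)


# Hypothetical 3-gene network: gene 0 drives gene 1, gene 1 represses gene 2.
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.6, 0.4, 0.0],
                   [0.0, -0.7, 0.5]])
series = simulate_var1(A_true, n_steps=2000)
network = sparse_adjacency(fit_var1(series), threshold=0.2)
```

The demo uses a long series so that unregularized least squares is reliable. With the short time series the abstract targets, a sparsity-inducing prior such as ARD is what keeps the estimate stable.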
Affiliation(s)
- Chi-Kan Chen
  - Department of Applied Mathematics, National Chung Hsing University, 145 Xingda Rd., South District, Taichung City, Taiwan, ROC
6
Liang X, Young WC, Hung LH, Raftery AE, Yeung KY. Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data. J Comput Biol 2019; 26:1113-1129. [PMID: 31009236 PMCID: PMC6786343 DOI: 10.1089/cmb.2019.0036] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/26/2022]
Abstract
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.
Affiliation(s)
- Xiao Liang
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia
- William Chad Young
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Ling-Hong Hung
  - School of Engineering and Technology, University of Washington, Tacoma, Washington
- Adrian E. Raftery
  - Department of Statistics, University of Washington, Seattle, Washington
- Ka Yee Yeung
  - School of Engineering and Technology, University of Washington, Tacoma, Washington
7
Wani N, Raza K. Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods. Comput Biol Chem 2019; 83:107120. [PMID: 31499298 DOI: 10.1016/j.compbiolchem.2019.107120] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/03/2018] [Revised: 02/22/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Data generation using high throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Datasets including gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-ChIP), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data are some of the commonly used biological datasets to study various cellular mechanisms of living organisms. Each of them provides a unique, complementary and partly independent view of the genome and hence embed essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offer a system level of biological insight in predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
Affiliation(s)
- Nisar Wani
  - Govt. Degree College Baramulla, J & K, India
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
8
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this study, to deal with the noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can use prior knowledge, were used to infer gene regulatory networks. Learning networks are methods that have a network structure and a learning process to obtain relationships. Correlation metrics are among the methods that have been used to measure the relationship between genes, but high correlation between genes does not necessarily mean that they have a causal effect on each other. Studies on common methods for inferring gene regulatory networks have yet to pay attention to their biological importance and, as such, predictions by these methods are less accurate in terms of biological significance. Hence, in the proposed method, highly correlated genes were grouped into one cluster using clustering, and edges between genes in the same cluster were prevented. Finally, after the Bayesian network modeling, a refining phase based on the knowledge gained from clustering was carried out to improve the regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The results of the evaluation indicate that the proposed method recognized regulatory relations well in the Bayesian modeling process, owing to its use of biological knowledge hidden in the data collection, and that it recognizes gene regulatory networks comparably to important methods in this field.
Affiliation(s)
- Jamshid Pirgazi
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
9
Temporal genetic association and temporal genetic causality methods for dissecting complex networks. Nat Commun 2018; 9:3980. [PMID: 30266904 PMCID: PMC6162292 DOI: 10.1038/s41467-018-06203-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/17/2017] [Accepted: 08/23/2018] [Indexed: 12/27/2022]
Abstract
A large amount of panomic data has been generated in populations for understanding causal relationships in complex biological systems. Both genetic and temporal models can be used to establish causal relationships among molecular, cellular, or phenotypical traits, but with limitations. To fully utilize high-dimension temporal and genetic data, we develop a multivariate polynomial temporal genetic association (MPTGA) approach for detecting temporal genetic loci (teQTLs) of quantitative traits monitored over time in a population and a temporal genetic causality test (TGCT) for inferring causal relationships between traits linked to the locus. We apply MPTGA and TGCT to simulated data sets and a yeast F2 population in response to rapamycin, and demonstrate increased power to detect teQTLs. We identify a teQTL hotspot locus interacting with rapamycin treatment, infer putative causal regulators of the teQTL hotspot, and experimentally validate RRD1 as the causal regulator for this teQTL hotspot.
10
Franks AM, Markowetz F, Airoldi EM. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
  - Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
  - Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
11
Janani S, Ramyachitra D, Ranjani Rani R. PCD-DPPI: Protein complex detection from dynamic PPI using shuffled frog-leaping algorithm. Gene Reports 2018. [DOI: 10.1016/j.genrep.2018.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
12
Chen CK. Inference of gene networks from gene expression time series using recurrent neural networks and sparse MAP estimation. J Bioinform Comput Biol 2018; 16:1850009. [PMID: 30051742 DOI: 10.1142/s0219720018500099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
BACKGROUND The inference of genetic regulatory networks (GRNs) provides insight into the cellular responses to signals. A class of recurrent neural networks (RNNs) capturing the dynamics of GRN has been used as a basis for inferring small-scale GRNs from gene expression time series. The Bayesian framework facilitates incorporating the hypothesis of GRN into the model estimation to improve the accuracy of GRN inference. RESULTS We present new methods for inferring small-scale GRNs based on RNNs. The weights of wires of RNN represent the strengths of gene-to-gene regulatory interactions. We use a class of automatic relevance determination (ARD) priors to enforce the sparsity in the maximum a posteriori (MAP) estimates of wire weights of RNN. A particle swarm optimization (PSO) is integrated as an optimization engine into the MAP estimation process. Likely networks of genes generated based on estimated wire weights are combined using the majority rule to determine a final estimated GRN. As an alternative, a class of $L_q$-norm ($q = 1$) priors is used for attaining the sparse MAP estimates of wire weights of RNN. We also infer the GRN using the maximum likelihood (ML) estimates of wire weights of RNN. The RNN-based GRN inference algorithms, ARD-RNN, $L_q$-RNN, and ML-RNN are tested on simulated and experimental E. coli and yeast time series containing 6-11 genes and 7-19 data points. Published GRN inference algorithms based on regressions and mutual information networks are performed on the benchmark datasets to compare performances.
CONCLUSION ARD and $L_q$-norm priors are used for the estimation of wire weights of RNN. Results of GRN inference experiments show that ARD-RNN, $L_q$-RNN have similar best accuracies on the simulated time series. The ARD-RNN is more accurate than $L_q$-RNN, ML-RNN, and mostly more accurate than the reference algorithms on the experimental time series. The effectiveness of ARD-RNN for inferring small-scale GRNs using gene expression time series of limited length is empirically verified.
Affiliation(s)
- Chi-Kan Chen
  - Department of Applied Mathematics, National Chung Hsing University, Taiwan
13
DynSig: Modelling Dynamic Signaling Alterations along Gene Pathways for Identifying Differential Pathways. Genes (Basel) 2018; 9:genes9070323. [PMID: 29954150 PMCID: PMC6071020 DOI: 10.3390/genes9070323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/12/2018] [Revised: 06/25/2018] [Accepted: 06/25/2018] [Indexed: 11/16/2022]
Abstract
Although a number of methods have been proposed for identifying differentially expressed pathways (DEPs), few efforts consider the dynamic components of pathway networks, i.e., gene links. We here propose a signaling dynamics detection method for identification of DEPs, DynSig, which detects the molecular signaling changes in cancerous cells along pathway topology. Specifically, DynSig relies on gene links, instead of gene nodes, in pathways, and models the dynamic behavior of pathways based on Markov chain model (MCM). By incorporating the dynamics of molecular signaling, DynSig allows for an in-depth characterization of pathway activity. To identify DEPs, a novel statistic of activity alteration of pathways was formulated as an overall signaling perturbation score between sample classes. Experimental results on both simulation and real-world datasets demonstrate the effectiveness and efficiency of the proposed method in identifying differential pathways.
14
Hung LH, Shi K, Wu M, Young WC, Raftery AE, Yeung KY. fastBMA: scalable network inference and transitive reduction. Gigascience 2018; 6:1-10. [PMID: 29020744 PMCID: PMC5632288 DOI: 10.1093/gigascience/gix078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/12/2017] [Accepted: 08/10/2017] [Indexed: 11/15/2022]
Abstract
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
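One way to picture transitive reduction as a shortest-path problem is sketched below in Python. It assumes each edge carries a posterior probability p, converts it to a cost of -log(p), and drops a direct edge when some indirect path is strictly cheaper, meaning the product of probabilities along that path exceeds the direct edge's probability. The cost definition, the removal rule, and all function names are assumptions made for illustration; this is not the fastBMA implementation.

```python
import heapq
import math


def shortest_path_cost(graph, source, target, skip_edge):
    """Dijkstra from source to target, ignoring one direct edge."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, {}).items():
            if (node, nxt) == skip_edge:
                continue
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf  # target unreachable without the skipped edge


def transitive_reduction(edge_probs):
    """Keep an edge unless an indirect path is strictly cheaper.

    edge_probs maps (regulator, target) -> probability in (0, 1].
    Every edge is tested against the original, unpruned graph.
    """
    graph = {}
    for (u, v), p in edge_probs.items():
        graph.setdefault(u, {})[v] = -math.log(p)  # high p -> low cost
    kept = {}
    for (u, v), p in edge_probs.items():
        indirect = shortest_path_cost(graph, u, v, skip_edge=(u, v))
        if indirect >= graph[u][v]:
            kept[(u, v)] = p
    return kept


# Hypothetical example: A->C is weaker than the chain A->B->C, so it is dropped.
edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}
reduced = transitive_reduction(edges)
```

Taking the negative logarithm turns products of probabilities into sums of non-negative costs, which is what allows a standard shortest-path routine such as Dijkstra's algorithm to be used.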
Affiliation(s)
- Ling-Hong Hung
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Kaiyuan Shi
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Migao Wu
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- William Chad Young
  - Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
- Adrian E. Raftery
  - Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
- Ka Yee Yeung
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
  - Correspondence address. Ka Yee Yeung, Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.; Tel: 253-692-4924; Fax: 253-692-5862; E-mail:
15
Mittal V, Hung LH, Keswani J, Kristiyanto D, Lee SB, Yeung KY. GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software. Gigascience 2018; 6:1-6. [PMID: 28327936 PMCID: PMC5530313 DOI: 10.1093/gigascience/giw013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/09/2016] [Accepted: 12/16/2016] [Indexed: 11/30/2022]
Abstract
Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line–based tool, and many bioinformatics pipelines consist of components that require a graphical user interface. Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. GUIdock-VNC uses the Virtual Network Computing protocol to render the graphics within most commonly used browsers. We also present a minimal image builder that can add our proposed graphical desktop sharing system to any Docker package, with the end result that any Docker package can be run using a graphical desktop within a browser. In addition, GUIdock-VNC uses the OAuth2 authentication protocol when deployed on the cloud. Conclusions: As a proof-of-concept, we demonstrated the utility of GUIdock-noVNC in gene network inference. We benchmarked our container implementation on various operating systems and showed that our solution creates minimal overhead.
|
16
|
Young WC, Raftery AE, Yeung KY. Model-Based Clustering With Data Correction For Removing Artifacts In Gene Expression Data. Ann Appl Stat 2017; 11:1998-2026. [PMID: 30740193 DOI: 10.1214/17-aoas1051] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/31/2022]
Abstract
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.
Affiliation(s)
- William Chad Young
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195
- Adrian E Raftery
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195
- Ka Yee Yeung
- Institute of Technology, University of Washington Tacoma, Campus Box 358426, 1900 Commerce Street, Tacoma, WA 98402
|
17
|
Shahdoust M, Pezeshk H, Mahjub H, Sadeghi M. F-MAP: A Bayesian approach to infer the gene regulatory network using external hints. PLoS One 2017; 12:e0184795. [PMID: 28938012 PMCID: PMC5609748 DOI: 10.1371/journal.pone.0184795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/22/2017] [Accepted: 08/31/2017] [Indexed: 01/07/2023] Open
Abstract
The common topological features of the gene regulatory networks of related species suggest that the network of one species can be reconstructed using additional information from the gene expression profiles of related species. We present an algorithm, named F-MAP, that reconstructs the gene regulatory network by applying knowledge about gene interactions from related species. Our algorithm sets up a Bayesian framework to estimate the precision matrix of one species' microarray gene expression dataset in order to infer the Gaussian graphical model of the network. The conjugate Wishart prior is used, and the information from related species is applied to estimate the hyperparameters of the prior distribution by means of factor analysis. Applying the proposed algorithm to six related species of Drosophila shows that the precision of the reconstructed networks is improved considerably compared to the precision of networks constructed by other Bayesian approaches.
Affiliation(s)
- Maryam Shahdoust
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Hossein Mahjub
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
|
18
|
Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks. Interdiscip Sci 2017; 10:823-835. [PMID: 28748400 DOI: 10.1007/s12539-017-0254-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2016] [Revised: 07/01/2017] [Accepted: 07/14/2017] [Indexed: 10/19/2022]
Abstract
Background: The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRNs. Algorithms combining the RNN and machine learning schemes were proposed to reconstruct small-scale GRNs using gene expression time series. Results: We present new GRN reconstruction methods with neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: the edge rank assignment step and the network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated weights of wires of RNN/RMLP (RERNN/RERMLP), and the latter constructs a network consisting of top-ranked edges under which the optimized RNN simulates the gene expression time series. Particle swarm optimization (PSO) is applied to optimize the parameters of RNNs and RMLPs in a two-step algorithm. The proposed RERNN-RNN and RERMLP-RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series are from the studies of yeast cell cycle regulated genes and E. coli DNA repair genes. Conclusion: The unstable estimation of RNN using experimental time series having limited data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate RNN and RMLP into a two-step structure learning procedure. Results show that the RERMLP, using the RMLP with a suitable number of latent nodes to reduce the parameter dimension, often results in more accurate edge ranks than the RERNN using the regularized RNN on short simulated time series. When the networks derived by the RERMLP-RNN using different numbers of latent nodes in step one are combined by a weighted majority voting rule to infer the GRN, the method performs consistently and outperforms published algorithms for GRN reconstruction on most benchmark time series. The framework of two-step algorithms can potentially be combined with different nonlinear differential equation models to reconstruct the GRN.
|
19
|
Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data. PLoS One 2017; 12:e0171240. [PMID: 28166542 PMCID: PMC5293552 DOI: 10.1371/journal.pone.0171240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/21/2016] [Accepted: 01/17/2017] [Indexed: 11/19/2022] Open
Abstract
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
|
20
|
Young WC, Raftery AE, Yeung KY. A posterior probability approach for gene regulatory network inference in genetic perturbation data. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2016; 13:1241-1251. [PMID: 27775378 DOI: 10.3934/mbe.2016041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]
Abstract
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.
Affiliation(s)
- William Chad Young
- University of Washington, Department of Statistics, Box 354322, Seattle, WA 98195-4322, United States.
|
21
|
Lakizadeh A, Jalili S. BiCAMWI: A Genetic-Based Biclustering Algorithm for Detecting Dynamic Protein Complexes. PLoS One 2016; 11:e0159923. [PMID: 27462706 PMCID: PMC4963120 DOI: 10.1371/journal.pone.0159923] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/26/2016] [Accepted: 07/11/2016] [Indexed: 01/08/2023] Open
Abstract
Considering the roles of protein complexes in many biological processes in the cell, detection of protein complexes from available protein-protein interaction (PPI) networks is a key challenge in the post-genome era. Despite the highly dynamic nature of cellular systems and of the interactions between proteins in a cell, most computational methods have focused on static networks, which cannot represent the inherent dynamicity of protein interactions. Recently, some researchers have tried to exploit the dynamicity of PPI networks by constructing a set of dynamic PPI subnetworks corresponding to each time point (column) in a gene expression dataset. However, many genes can participate in multiple biological processes, and cellular processes are not necessarily related to every sample; they might be relevant only for a subset of samples. It is therefore more informative to explore each subnetwork based on a subset of genes and conditions (i.e., biclusters) in a gene expression dataset. Here, we present a new method, called BiCAMWI, that exploits dynamicity in detecting protein complexes. The preprocessing phase of the proposed method is based on a novel genetic algorithm that extracts sets of genes that are co-regulated under some conditions from the input gene expression data. Each extracted gene set is called a bicluster. In the detection phase of the proposed method, dynamic PPI subnetworks are extracted from the input static PPI network based on the biclusters. Protein complexes are identified by applying a detection method to each dynamic PPI subnetwork and aggregating the results. Experimental results confirm that BiCAMWI effectively models the dynamicity inherent in static PPI networks and achieves significantly better results than state-of-the-art methods. We therefore suggest BiCAMWI as a more reliable method for protein complex detection.
Affiliation(s)
- Amir Lakizadeh
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
- Saeed Jalili
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
|
22
|
Hung LH, Kristiyanto D, Lee SB, Yeung KY. GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research. PLoS One 2016; 11:e0152686. [PMID: 27045593 PMCID: PMC4821530 DOI: 10.1371/journal.pone.0152686] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 09/22/2015] [Accepted: 03/17/2016] [Indexed: 12/03/2022] Open
Abstract
Reproducibility is vital in science. For complex computational methods, it is often necessary to recreate not just the code but also the software and hardware environment in order to reproduce results. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system. However, workflows that use Graphical User Interfaces (GUIs) remain difficult to replicate on different host systems as there is no high-level graphical software layer common to all platforms. GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics-based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open source project that provides a container with only the absolutely necessary software dependencies, and configures a common X Windows (X11) graphic interface on Linux, Macintosh and Windows platforms. As proof of concept, we present a Docker package that contains a Bioconductor application written in R and C++ called networkBMA for gene network inference. Our package also includes Cytoscape, a Java-based platform with a graphical user interface for visualizing and analyzing gene networks, and the CyNetworkBMA app, a Cytoscape app that allows the use of networkBMA via the user-friendly Cytoscape interface.
Affiliation(s)
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
- Daniel Kristiyanto
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
- Sung Bong Lee
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
- * E-mail:
|
23
|
Guo M, Wang H, Potter SS, Whitsett JA, Xu Y. SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis. PLoS Comput Biol 2015; 11:e1004575. [PMID: 26600239 PMCID: PMC4658017 DOI: 10.1371/journal.pcbi.1004575] [Citation(s) in RCA: 221] [Impact Index Per Article: 24.6] [Received: 04/14/2015] [Accepted: 09/30/2015] [Indexed: 01/15/2023] Open
Abstract
A major challenge in developmental biology is to understand the genetic and cellular processes/programs driving organ formation and differentiation of the diverse cell types that comprise the embryo. While recent studies using single cell transcriptome analysis illustrate the power to measure and understand cellular heterogeneity in complex biological systems, processing large amounts of RNA-seq data from heterogeneous cell populations creates the need for readily accessible tools for the analysis of single-cell RNA-seq (scRNA-seq) profiles. The present study presents a generally applicable analytic pipeline (SINCERA: a computational pipeline for SINgle CEll RNA-seq profiling Analysis) for processing scRNA-seq data from a whole organ or sorted cells. The pipeline supports the analysis for: 1) the distinction and identification of major cell types; 2) the identification of cell type specific gene signatures; and 3) the determination of driving forces of given cell types. We applied this pipeline to the RNA-seq analysis of single cells isolated from embryonic mouse lung at E16.5. Through the pipeline analysis, we distinguished major cell types of fetal mouse lung, including epithelial, endothelial, smooth muscle, pericyte, and fibroblast-like cell types, and identified cell type specific gene signatures, bioprocesses, and key regulators. SINCERA is implemented in R, licensed under the GNU General Public License v3, and freely available from CCHMC PBGE website, https://research.cchmc.org/pbge/sincera.html.
Affiliation(s)
- Minzhe Guo
- The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computing Systems, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Hui Wang
- The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- S. Steven Potter
- Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States of America
- Jeffrey A. Whitsett
- The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Yan Xu
- The Perinatal Institute, Section of Neonatology, Perinatal and Pulmonary Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States of America
- * E-mail:
|
24
|
Fronczuk M, Raftery AE, Yeung KY. CyNetworkBMA: a Cytoscape app for inferring gene regulatory networks. SOURCE CODE FOR BIOLOGY AND MEDICINE 2015; 10:11. [PMID: 26566394 PMCID: PMC4642660 DOI: 10.1186/s13029-015-0043-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/22/2014] [Accepted: 10/31/2015] [Indexed: 12/31/2022]
Abstract
Background: Inference of gene networks from expression data is an important problem in computational biology. Many algorithms have been proposed for solving the problem efficiently. However, many of the available implementations are programming libraries that require users to write code, which limits their accessibility. Results: We have developed a tool called CyNetworkBMA for inferring gene networks from expression data that integrates with Cytoscape. Our application offers a graphical user interface for networkBMA, an efficient implementation of Bayesian Model Averaging methods for network construction. The client-server architecture of CyNetworkBMA makes it possible to distribute or centralize computation depending on user needs. Conclusions: CyNetworkBMA is an easy-to-use tool that makes network inference accessible to non-programmers through seamless integration with Cytoscape. CyNetworkBMA is available on the Cytoscape App Store at http://apps.cytoscape.org/apps/cynetworkbma.
Affiliation(s)
- Maciej Fronczuk
- Institute of Technology, University of Washington, Tacoma, 98402 WA USA
- Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, 98195 WA USA
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, 98402 WA USA
|
25
|
Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2015; 119:163-80. [PMID: 25843807 DOI: 10.1016/j.cmpb.2015.02.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/22/2014] [Revised: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 05/06/2023]
Abstract
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of independent activation under the same experimental condition rather than of a shared regulatory regime. For this reason, traditional techniques have recently been improved by using prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the only input of the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function to be optimized in the scatter search scheme. The measure FracGO is based on biological enrichment, and SimNTO is based on the overlap among the GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm on two datasets and show that the algorithm performs better when biological knowledge is integrated. Moreover, an analysis and comparison of the two biological measures is presented, and it is concluded that the differences depend on both the data source and, when GO is used, how the annotation file has been built. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained show little overlap. The proposed methodology is a general-purpose algorithm that allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
Affiliation(s)
- Juan A Nepomuceno
- Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain.
- Alicia Troncoso
- Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
- Isabel A Nepomuceno-Chamorro
- Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Jesús S Aguilar-Ruiz
- Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
|
26
|
Yin F, Qin C, Gao J, Liu M, Luo X, Zhang W, Liu H, Liao X, Shen Y, Mao L, Zhang Z, Lin H, Lübberstedt T, Pan G. Genome-wide identification and analysis of drought-responsive genes and microRNAs in tobacco. Int J Mol Sci 2015; 16:5714-40. [PMID: 25775154 PMCID: PMC4394501 DOI: 10.3390/ijms16035714] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/07/2014] [Revised: 01/19/2015] [Accepted: 01/29/2015] [Indexed: 01/16/2023] Open
Abstract
Drought stress response is a complex trait regulated at transcriptional and post-transcriptional levels in tobacco. Since the 1990s, many studies have shown that miRNAs act in many ways to regulate target expression in plant growth, development and stress response. The recent draft genome sequence of Nicotiana benthamiana has provided a framework for Digital Gene Expression (DGE) and small RNA sequencing to understand patterns of transcription in the context of plant response to environmental stress. We sequenced and analyzed three DGE libraries from roots of normal and drought-stressed tobacco plants, and four small RNA populations from roots, stems and leaves of control or drought-treated tobacco plants, respectively. We identified 276 candidate drought responsive genes (DRGs) with sequence similarities to 64 known DRGs from other model plant crops; 82 were transcription factors (TFs), including the WRKY, NAC, ERF and bZIP families. Of these tobacco DRGs, 54 were differentially expressed, including 21 TFs belonging to 4 TF families: NAC (6), MYB (4), ERF (10), and bZIP (1). Additionally, we confirmed expression of 39 known miRNA families (122 members) and five conserved miRNA families, which showed differential regulation under drought stress. Targets of miRNAs were further surveyed based on a recently published study; ten of these targets were DRGs. An integrated gene regulatory network is proposed for the molecular mechanisms of tobacco root response to drought stress using differentially expressed DRGs, the changed expression profiles of miRNAs and their target transcripts. This network analysis serves as a reference for future studies on tobacco responses to stresses such as drought, cold and heavy metals.
Affiliation(s)
- Fuqiang Yin
- School of Agricultural Sciences, Xichang College, Xichang 615000, China.
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Cheng Qin
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Zunyi Academy of Agricultural Sciences, Zunyi 563102, China.
- Jian Gao
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Ming Liu
- School of Agricultural Sciences, Xichang College, Xichang 615000, China.
- Xirong Luo
- Zunyi Academy of Agricultural Sciences, Zunyi 563102, China.
- Wenyou Zhang
- School of Agricultural Sciences, Xichang College, Xichang 615000, China.
- Hongjun Liu
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Xinhui Liao
- Beijing Genomics Institute-Shenzhen, Shenzhen 518083, China.
- Yaou Shen
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Likai Mao
- Beijing Genomics Institute-Shenzhen, Shenzhen 518083, China.
- Zhiming Zhang
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Haijian Lin
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
- Guangtang Pan
- Maize Research Institute of Sichuan Agricultural University/Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, China.
|
27
|
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI JOURNAL 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach that predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods, focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. Following the advent of Next-Generation Sequencing of cDNAs derived from RNA samples (RNA-Seq), we discuss in detail its application to network inference. Furthermore, we present progress in large-scale or even full-genomic network inference as well as in small-scale condensed network inference, and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect on the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Sylvie Schulze
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Reinhard Guthke
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
|
28
|
Huang X, Zi Z. Inferring cellular regulatory networks with Bayesian model averaging for linear regression (BMALR). MOLECULAR BIOSYSTEMS 2015; 10:2023-30. [PMID: 24899235 DOI: 10.1039/c4mb00053f] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/29/2023]
Abstract
Bayesian network and linear regression methods have been widely applied to reconstruct cellular regulatory networks. In this work, we propose a Bayesian model averaging for linear regression (BMALR) method to infer molecular interactions in biological systems. This method uses a new closed form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. We have assessed the performance of BMALR by benchmarking on both in silico DREAM datasets and real experimental datasets. The results show that BMALR achieves both high prediction accuracy and high computational efficiency across different benchmarks. A pre-processing of the datasets with the log transformation can further improve the performance of BMALR, leading to a new top overall performance. In addition, BMALR can achieve robust high performance in community predictions when it is combined with other competing methods. The proposed method BMALR is competitive compared to the existing network inference methods. Therefore, BMALR will be useful to infer regulatory interactions in biological networks. A free open source software tool for the BMALR algorithm is available at https://sites.google.com/site/bmalr4netinfer/.
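The idea of averaging over linear regression models to score candidate regulators of a target gene can be sketched as follows. This is not the BMALR closed-form solution described in the abstract: as a stand-in, the sketch uses the standard BIC approximation to posterior model probabilities with a uniform prior over models. All function names, the `max_size` cap on regulators per model, and the toy data are assumptions made for illustration, and the solver assumes the candidate regulators are not collinear.

```python
import math
import random
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def bic(columns, y):
    """BIC of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]  # stored column-wise
    k = len(design)
    gram = [[sum(u * v for u, v in zip(design[i], design[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(u * v for u, v in zip(design[i], y)) for i in range(k)]
    beta = solve(gram, rhs)  # normal equations
    rss = sum((y[t] - sum(beta[i] * design[i][t] for i in range(k))) ** 2
              for t in range(n))
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)


def inclusion_probabilities(columns, y, max_size=2):
    """Approximate posterior probability that each regulator is in the model."""
    p = len(columns)
    models = [m for size in range(max_size + 1)
              for m in combinations(range(p), size)]
    bics = [bic([columns[j] for j in m], y) for m in models]
    best = min(bics)
    weights = [math.exp(-0.5 * (b - best)) for b in bics]  # proportional to exp(-BIC/2)
    total = sum(weights)
    probs = [0.0] * p
    for m, w in zip(models, weights):
        for j in m:
            probs[j] += w / total
    return probs


# Toy data (hypothetical): the target depends on regulator 0 only.
random.seed(0)
regs = [[random.gauss(0, 1) for _ in range(40)] for _ in range(3)]
target = [2 * a + 0.1 * random.gauss(0, 1) for a in regs[0]]
print(inclusion_probabilities(regs, target))
```

Each regulator's score is the summed weight of all models that contain it, which is how model averaging turns model uncertainty into an edge-level probability. Enumerating all subsets is feasible only for a handful of candidates, which is why methods in this family rely on closed-form results or search heuristics instead.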
Affiliation(s)
- Xun Huang
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104, Freiburg, Germany.
|
29
|
Fraley C, Percival D. Model-Averaged $\ell_1$ Regularization using Markov Chain Monte Carlo Model Composition. J STAT COMPUT SIM 2015; 85:1090-1101. [PMID: 25642001 PMCID: PMC4307951 DOI: 10.1080/00949655.2013.861839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining $\ell_1$ regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the $\ell_1$ regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics.
Affiliation(s)
- Chris Fraley
- Insilicos and University of Washington, Seattle, WA
|
30
|
Ou-Yang L, Dai DQ, Li XL, Wu M, Zhang XF, Yang P. Detecting temporal protein complexes from dynamic protein-protein interaction networks. BMC Bioinformatics 2014; 15:335. [PMID: 25282536 PMCID: PMC4288635 DOI: 10.1186/1471-2105-15-335] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2014] [Accepted: 09/23/2014] [Indexed: 12/13/2022] Open
Abstract
Background Proteins dynamically interact with each other to perform their biological functions. The dynamic operations of protein-protein interaction (PPI) networks are also reflected in the dynamic formations of protein complexes. Existing protein complex detection algorithms usually overlook the inherent temporal nature of protein interactions within PPI networks. Systematically analyzing the temporal protein complexes can not only improve the accuracy of protein complex detection, but also strengthen our biological knowledge on the dynamic protein assembly processes for cellular organization. Results In this study, we propose a novel computational method to predict temporal protein complexes. Particularly, we first construct a series of dynamic PPI networks by joint analysis of time-course gene expression data and protein interaction data. Then a Time Smooth Overlapping Complex Detection model (TS-OCD) is proposed to detect temporal protein complexes from these dynamic PPI networks. TS-OCD can naturally capture the smoothness of networks between consecutive time points and detect overlapping protein complexes at each time point. Finally, a nonnegative matrix factorization based algorithm is introduced to merge those very similar temporal complexes across different time points. Conclusions Extensive experimental results demonstrate that the proposed method is more effective in detecting temporal protein complexes than the state-of-the-art complex detection techniques.
Affiliation(s)
- Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China.
|
31
|
Santra T. A bayesian framework that integrates heterogeneous data for inferring gene regulatory networks. Front Bioeng Biotechnol 2014; 2:13. [PMID: 25152886 PMCID: PMC4126456 DOI: 10.3389/fbioe.2014.00013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Accepted: 04/28/2014] [Indexed: 11/29/2022] Open
Abstract
Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein-protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.
Affiliation(s)
- Tapesh Santra
- Systems Biology Ireland, University College Dublin, Dublin, Ireland
|
32
|
Bai Y, Ni M, Cooper B, Wei Y, Fury W. Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads. BMC Genomics 2014; 15:325. [PMID: 24884790 PMCID: PMC4035057 DOI: 10.1186/1471-2164-15-325] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2013] [Accepted: 04/04/2014] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Accurate HLA typing at amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantations, pathogenesis studies of autoimmune and infectious diseases, as well as the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data becomes increasingly attractive due to its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often employs a reduced read length and coverage that impose great challenges in resolving the highly homologous HLA alleles. Though several algorithms exist and have been applied to four-digit typing, some deliver low to moderate accuracies and others output ambiguous predictions. Moreover, few methods suit diverse read lengths and depths, and both RNA and DNA sequencing inputs. New algorithms are therefore needed to leverage the accuracy and flexibility of HLA typing at high resolution using genome-wide sequencing data. RESULTS We have developed a new algorithm named PHLAT to discover the most probable pair of HLA alleles at four-digit resolution or higher, via a unique integration of a candidate allele selection and a likelihood scoring. Over a comprehensive set of benchmarking data (a total of 768 HLA alleles) from both RNA and DNA sequencing and with a broad range of read lengths and coverage, PHLAT consistently achieves a high accuracy at four-digit (92%-95%) and two-digit resolutions (96%-99%), outcompeting most of the existing methods. It also supports targeted amplicon sequencing data from Illumina MiSeq. CONCLUSIONS PHLAT significantly leverages the accuracy and flexibility of high resolution HLA typing based on genome-wide sequencing data. It may benefit both basic and applied research in immunology and related fields as well as numerous clinical applications.
Affiliation(s)
- Yu Bai
- Regeneron Pharmaceuticals, Inc, Tarrytown, New York USA
- Min Ni
- Regeneron Pharmaceuticals, Inc, Tarrytown, New York USA
- Blerta Cooper
- Regeneron Pharmaceuticals, Inc, Tarrytown, New York USA
- Yi Wei
- Regeneron Pharmaceuticals, Inc, Tarrytown, New York USA
- Wen Fury
- Regeneron Pharmaceuticals, Inc, Tarrytown, New York USA
|
33
|
Young WC, Raftery AE, Yeung KY. Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC SYSTEMS BIOLOGY 2014; 8:47. [PMID: 24742092 PMCID: PMC4006459 DOI: 10.1186/1752-0509-8-47] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/27/2013] [Accepted: 04/04/2014] [Indexed: 11/22/2022]
Abstract
Background Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sources, and yield robust, accurate and compact gene-to-gene relationships. Results We developed and applied ScanBMA, a Bayesian inference method that incorporates external information to improve the accuracy of the inferred network. In particular, we developed a new strategy to efficiently search the model space, applied data transformations to reduce the effect of spurious relationships, and adopted the g-prior to guide the search for candidate regulators. Our method is highly computationally efficient, thus addressing the scalability issue with network inference. The method is implemented as the ScanBMA function in the networkBMA Bioconductor software package. Conclusions We compared ScanBMA to other popular methods using time series yeast data as well as time-series simulated data from the DREAM competition. We found that ScanBMA produced more compact networks with a greater proportion of true positives than the competing methods. Specifically, ScanBMA generally produced more favorable areas under the Receiver-Operating Characteristic and Precision-Recall curves than other regression-based methods and mutual-information based methods. In addition, ScanBMA is competitive with other network inference methods in terms of running time.
Affiliation(s)
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Box 357735, 98195-7735, Seattle WA, USA.
|
34
|
An algebra-based method for inferring gene regulatory networks. BMC SYSTEMS BIOLOGY 2014; 8:37. [PMID: 24669835 PMCID: PMC4022379 DOI: 10.1186/1752-0509-8-37] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/12/2013] [Accepted: 03/06/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. RESULTS This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. 
Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. CONCLUSIONS Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under LGPL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html.
|
35
|
Duan G, Walther D, Schulze WX. Reconstruction and analysis of nutrient-induced phosphorylation networks in Arabidopsis thaliana. FRONTIERS IN PLANT SCIENCE 2013; 4:540. [PMID: 24400017 PMCID: PMC3872036 DOI: 10.3389/fpls.2013.00540] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/24/2013] [Accepted: 12/12/2013] [Indexed: 05/23/2023]
Abstract
Elucidating the dynamics of molecular processes in living organisms in response to external perturbations is a central goal in modern systems biology. We investigated the dynamics of protein phosphorylation events in Arabidopsis thaliana exposed to changing nutrient conditions. Phosphopeptide expression levels were detected at five consecutive time points over a time interval of 30 min after nutrient resupply following prior starvation. The three tested inorganic, ionic nutrients NH$_4^+$, NO$_3^-$, and PO$_4^{3-}$ elicited similar phosphosignaling responses that were distinguishable from those invoked by the sugars mannitol and sucrose. When embedded in the protein-protein interaction network of Arabidopsis thaliana, phosphoproteins were found to exhibit a higher degree compared to average proteins. Based on the time-series data, we reconstructed a network of regulatory interactions mediated by phosphorylation. The performance of different network inference methods was evaluated by the observed likelihood of physical interactions within and across different subcellular compartments and based on gene ontology semantic similarity. The dynamic phosphorylation network was then reconstructed using a Pearson correlation method with added directionality based on partial variance differences. The topology of the inferred integrated network corresponds to an information dissemination architecture, in which the phosphorylation signal is passed on to an increasing number of phosphoproteins stratified into an initiation, processing, and effector layer. Specific phosphorylation peptide motifs associated with the distinct layers were identified, indicating the action of layer-specific kinases. Despite the limited temporal resolution, combined with information on subcellular location, the available time-series data proved useful for reconstructing the dynamics of the molecular signaling cascade in response to nutrient stress conditions in the plant Arabidopsis thaliana.
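The correlation step of such a reconstruction can be sketched as follows. This is a minimal illustration of thresholded Pearson correlation on time-series profiles only; the directionality step based on partial variance differences mentioned in the abstract is not reproduced. The function name `correlation_edges`, the threshold of 0.9, and the toy profiles are assumptions, not values from the paper.

```python
import numpy as np

def correlation_edges(profiles, names, threshold=0.9):
    """Return undirected edges whose absolute Pearson correlation meets the threshold.

    profiles: (n_variables, n_timepoints) array, one row per phosphopeptide.
    names:    labels for the rows of `profiles`.
    threshold: hypothetical cutoff on |r|; would need tuning on real data.
    """
    corr = np.corrcoef(np.asarray(profiles, dtype=float))
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                edges.append((names[i], names[j], float(corr[i, j])))
    return edges

# Toy profiles over five time points (made-up numbers for illustration).
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
print(correlation_edges(profiles, ["A", "B", "C"]))
```

With only five time points, as in the study, correlation estimates are noisy, so a high threshold or additional evidence such as subcellular location is a sensible safeguard.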
Affiliation(s)
- Guangyou Duan
- Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Waltraud X. Schulze
- Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Department of Plant Systems Biology, Universität Hohenheim, Stuttgart, Germany
|
36
|
Williams-DeVane CR, Reif DM, Hubal EC, Bushel PR, Hudgens EE, Gallagher JE, Edwards SW. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes. BMC SYSTEMS BIOLOGY 2013; 7:119. [PMID: 24188919 PMCID: PMC4228284 DOI: 10.1186/1752-0509-7-119] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/18/2012] [Accepted: 10/18/2013] [Indexed: 12/30/2022]
Abstract
Background Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis and none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease.
Affiliation(s)
- Clarlynda R Williams-DeVane
- National Health and Environmental Effects Research Laboratory - Integrated Systems Toxicology Division, U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC 27711, USA.
|
37
|
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 512] [Impact Index Per Article: 46.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
|