1
Lafit G, Tuerlinckx F, Myin-Germeys I, Ceulemans E. A Partial Correlation Screening Approach for Controlling the False Positive Rate in Sparse Gaussian Graphical Models. Sci Rep 2019; 9:17759. [PMID: 31780817 PMCID: PMC6882820 DOI: 10.1038/s41598-019-53795-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/11/2019] [Accepted: 11/05/2019] [Indexed: 12/28/2022] Open
Abstract
Gaussian Graphical Models (GGMs) are extensively used in many research areas, such as genomics, proteomics, neuroimaging, and psychology, to study the partial correlation structure of a set of variables. This structure is visualized by drawing an undirected network, in which the variables constitute the nodes and the partial correlations the edges. In many applications, it makes sense to impose sparsity (i.e., some of the partial correlations are forced to zero) because sparsity is theoretically meaningful and/or because it improves the predictive accuracy of the fitted model. However, as we will show by means of extensive simulations, state-of-the-art estimation approaches for imposing sparsity on GGMs, such as the Graphical lasso, ℓ1 regularized nodewise regression, and joint sparse regression, fall short because they often yield too many false positives (i.e., partial correlations that are not properly set to zero). In this paper we present a new estimation approach that allows better control of the false positive rate. Our approach consists of two steps: First, we estimate an undirected network using one of the three state-of-the-art estimation approaches. Second, we try to detect the false positives by flagging the partial correlations that are smaller in absolute value than a given threshold, which is determined through cross-validation; the flagged correlations are set to zero. Applying this new approach to the same simulated data shows that it indeed performs better. We also illustrate our approach by using it to estimate (1) a gene regulatory network for breast cancer data, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with PTSD.
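The two-step screening idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the first-step estimate is a plain inverse of the sample covariance (standing in for the Graphical lasso, nodewise regression, or joint sparse regression), the threshold `tau` is fixed by hand rather than chosen by cross-validation, and the function names and toy data are hypothetical.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision matrix into a partial correlation matrix."""
    d = np.sqrt(np.diag(precision))
    # Standard identity: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def threshold_partial_correlations(pcor, tau):
    """Set off-diagonal partial correlations with |value| < tau to zero."""
    out = pcor.copy()
    mask = np.abs(out) < tau
    np.fill_diagonal(mask, False)  # never touch the diagonal
    out[mask] = 0.0
    return out


# Toy data (assumption): five variables, with variable 1 depending on variable 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] += 0.8 * X[:, 0]

# Step 1 (stand-in estimator): invert the sample covariance, then symmetrize.
precision = np.linalg.inv(np.cov(X, rowvar=False))
precision = (precision + precision.T) / 2.0
pcor = partial_correlations(precision)

# Step 2: screen out small partial correlations with a hand-picked threshold.
screened = threshold_partial_correlations(pcor, tau=0.2)
print(np.round(screened, 2))
```

In the approach the abstract describes, the threshold would instead be selected by cross-validation, and the first step would use one of the three sparse estimators named above.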
Affiliation(s)
- Ginette Lafit
  - Research Group on Quantitative Psychology and Individual Differences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
  - Center for Contextual Psychiatry, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Francis Tuerlinckx
  - Research Group on Quantitative Psychology and Individual Differences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Inez Myin-Germeys
  - Center for Contextual Psychiatry, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Eva Ceulemans
  - Research Group on Quantitative Psychology and Individual Differences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
2
Jelisavcic V, Stojkovic I, Milutinovic V, Obradovic Z. Fast learning of scale-free networks based on Cholesky factorization. Int J Intell Syst 2018. [DOI: 10.1002/int.21984] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Affiliation(s)
- Vladisav Jelisavcic
  - School of Electrical Engineering; University of Belgrade; Belgrade Serbia
  - Mathematical Institute of the Serbian Academy of Sciences and Arts; Belgrade Serbia
- Ivan Stojkovic
  - School of Electrical Engineering; University of Belgrade; Belgrade Serbia
  - Center for Data Analytics & Biomedical Informatics; Temple University; Philadelphia USA
- Veljko Milutinovic
  - School of Electrical Engineering; University of Belgrade; Belgrade Serbia
- Zoran Obradovic
  - Center for Data Analytics & Biomedical Informatics; Temple University; Philadelphia USA
3
Pham T, Sheridan P, Shimodaira H. PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS One 2015; 10:e0137796. [PMID: 26378457 PMCID: PMC4574777 DOI: 10.1371/journal.pone.0137796] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 04/01/2015] [Accepted: 08/21/2015] [Indexed: 11/24/2022] Open
Abstract
Preferential attachment is a stochastic process that has been proposed to explain certain topological features characteristic of complex networks from diverse domains. The systematic investigation of preferential attachment is an important area of research in network science, not only for the theoretical matter of verifying whether this hypothesized process is operative in real-world networks, but also for the practical insights that follow from knowledge of its functional form. Here we describe a maximum likelihood based estimation method for the measurement of preferential attachment in temporal complex networks. We call the method PAFit, and implement it in an R package of the same name. PAFit constitutes an advance over previous methods primarily because we based it on a nonparametric statistical framework that enables attachment kernel estimation free of any assumptions about its functional form. We show that this results in PAFit outperforming the popular methods of Jeong and Newman in Monte Carlo simulations. What is more, we found that the application of PAFit to a publicly available Flickr social network dataset yielded clear evidence for a deviation of the attachment kernel from the popularly assumed log-linear form. Independent of our main work, we provide a correction to a consequential error in Newman’s original method which had evidently gone unnoticed since its publication over a decade ago.
Affiliation(s)
- Thong Pham
  - Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Osaka, Japan
- Paul Sheridan
  - The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Hidetoshi Shimodaira
  - Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Osaka, Japan
4
Raja Chowdhury A, Chetty M. Network decomposition based large-scale reverse engineering of gene regulatory network. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.02.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
5
Birlutiu A, d'Alché-Buc F, Heskes T. A Bayesian Framework for Combining Protein and Network Topology Information for Predicting Protein-Protein Interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:538-550. [PMID: 26357265 DOI: 10.1109/tcbb.2014.2359441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Computational methods for predicting protein-protein interactions are important tools that can complement high-throughput technologies and guide biologists in designing new laboratory experiments. The proteins and the interactions between them can be described by a network which is characterized by several topological properties. Information about proteins and interactions between them, in combination with knowledge about topological properties of the network, can be used for developing computational methods that can accurately predict unknown protein-protein interactions. This paper presents a supervised learning framework based on Bayesian inference for combining two types of information: i) network topology information, and ii) information related to proteins and the interactions between them. The motivation for our model is that by combining these two types of information one can achieve better accuracy in predicting protein-protein interactions than by using models constructed from these two types of information independently.
6
Hirose K, Ogura Y, Shimodaira H. Estimating scale-free networks via the exponentiation of minimax concave penalty. Journal of the Japanese Society of Computational Statistics 2015. [DOI: 10.5183/jjscs.1503001_215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Kei Hirose
  - Division of Mathematical Science, Graduate School of Engineering Science, Osaka University
- Hidetoshi Shimodaira
  - Division of Mathematical Science, Graduate School of Engineering Science, Osaka University
7
8
Chowdhury AR, Chetty M, Vinh NX. Evaluating influence of microRNA in reconstructing gene regulatory networks. Cogn Neurodyn 2013; 8:251-9. [PMID: 24808933 DOI: 10.1007/s11571-013-9265-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/06/2013] [Revised: 07/23/2013] [Accepted: 07/25/2013] [Indexed: 12/11/2022] Open
Abstract
A gene regulatory network (GRN) consists of interactions between transcription factors (TFs) and target genes (TGs). Recently, it has been observed that microRNAs (miRNAs) play a significant part in genetic interactions. However, current microarray technologies do not capture miRNA expression levels. To overcome this, we propose a new technique to reverse engineer a GRN from the available partial microarray data, which contains expression levels of TFs and TGs only. Using the S-System model, the approach is adapted to cope with the unavailability of information about the expression levels of miRNAs. The versatile Differential Evolutionary algorithm is used for optimization and parameter estimation. Experimental studies on four in silico networks, and on a real network of Saccharomyces cerevisiae called the IRMA network, show significant improvement compared to the traditional S-System approach.
Affiliation(s)
- Ahsan Raja Chowdhury
  - Gippsland School of Information Technology, Monash University, Victoria, Australia; National ICT Australia (NICTA), VRL, Melbourne, Australia
- Madhu Chetty
  - Gippsland School of Information Technology, Monash University, Victoria, Australia; National ICT Australia (NICTA), VRL, Melbourne, Australia
- Nguyen Xuan Vinh
  - Gippsland School of Information Technology, Monash University, Victoria, Australia
9
Chowdhury AR, Chetty M, Vinh NX. Incorporating time-delays in S-System model for reverse engineering genetic networks. BMC Bioinformatics 2013; 14:196. [PMID: 23777625 PMCID: PMC3839642 DOI: 10.1186/1471-2105-14-196] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 01/11/2013] [Accepted: 06/07/2013] [Indexed: 11/10/2022] Open
Abstract
Background: In any gene regulatory network (GRN), the complex interactions occurring amongst transcription factors and target genes can be either instantaneous or time-delayed. However, many existing modeling approaches currently applied for inferring GRNs are unable to represent both types of interaction simultaneously. As a result, each of these approaches fails to detect important interactions of the type it does not model. The S-System model, a differential equation based approach which has been increasingly applied for modeling GRNs, also suffers from this limitation. In fact, all existing S-System based modeling approaches have been designed to capture only instantaneous interactions, and are unable to infer time-delayed interactions. Results: In this paper, we propose a novel Time-Delayed S-System (TDSS) model which uses a set of delay differential equations to represent the system dynamics. The ability to incorporate time-delay parameters in the proposed S-System model enables simultaneous modeling of both instantaneous and time-delayed interactions. Furthermore, the delay parameters are not limited to just positive integer values (corresponding to time stamps in the data), but can also take fractional values. Moreover, we also propose a new criterion for model evaluation exploiting the sparse and scale-free nature of GRNs to effectively narrow down the search space, which not only reduces the computation time significantly but also improves model accuracy. The evaluation criterion systematically adapts the max-min in-degrees and also systematically balances the effect of network accuracy and complexity during optimization. Conclusion: The four well-known performance measures applied to the experimental studies on synthetic networks with various time-delayed regulations clearly demonstrate that the proposed method can capture both instantaneous and delayed interactions correctly with high precision. The experiments carried out on two well-known real-life networks, namely IRMA and the SOS DNA repair network in Escherichia coli, show a significant improvement compared with other state-of-the-art approaches for GRN modeling.
Affiliation(s)
- Ahsan Raja Chowdhury
  - Gippsland School of Information Technology, Monash University, Churchill, Victoria-3842, Australia
10
Phatak SS, Zhang S. A novel multi-modal drug repurposing approach for identification of potent ACK1 inhibitors. Pacific Symposium on Biocomputing 2013:29-40. [PMID: 23424109 PMCID: PMC3864554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Exploiting drug polypharmacology to identify novel modes of action for drug repurposing has gained significant attention in the current era of weak drug pipelines. A variety of unimodal computational approaches, ranging from serendipitous to systematic or rational, have been developed, but the complexity of the problem clearly calls for multi-modal approaches for better solutions. In this study, we propose an integrative computational framework based on classical structure-based drug design and chemical-genomic similarity methods, combined with molecular graph theories for this task. Briefly, a pharmacophore modeling method was employed to guide the selection of docked poses resulting from our high-throughput virtual screening. We then evaluated whether complementary results (hits missed by docking) can be obtained by using a novel chemo-genomic similarity approach based on chemical/sequence information. Finally, we developed a bipartite graph based on the extensive data curation of DrugBank, PDB, and UniProt. This drug-target bipartite graph was used to assess the similarity of different inhibitors based on their connections to other compounds and targets. The approaches were applied to the repurposing of existing drugs against ACK1, a novel cancer target significantly overexpressed in breast and prostate cancers during their progression. Upon screening of ∼1,447 marketed drugs, a final set of 10 hits was selected for experimental testing. Among them, four drugs were identified as potent ACK1 inhibitors. In particular, the inhibition of ACK1 by dasatinib was as strong as IC50 = 1 nM. We anticipate that our novel, integrative strategy can be easily extended to other biological targets with a more comprehensive coverage of known bio-chemical space for repurposing studies.
Affiliation(s)
- Sharangdhar S. Phatak
  - Integrated Molecular Discovery Laboratory (iMDL), The University of Texas M.D. Anderson Cancer Center, School of Biomedical Informatics, The Univ. Texas Health Science Center, 7000 Fannin St. Ste 600, Houston, Texas, 77030, USA
- Shuxing Zhang
  - Integrated Molecular Discovery Laboratory (iMDL), The University of Texas M.D. Anderson Cancer Center, 1901 East Road, Unit 1950, Houston, TX, 77030, USA
11
Xuan NV, Chetty M, Coppel R, Wangikar PP. Gene regulatory network modeling via global optimization of high-order dynamic Bayesian network. BMC Bioinformatics 2012; 13:131. [PMID: 22694481 PMCID: PMC3433362 DOI: 10.1186/1471-2105-13-131] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 10/16/2011] [Accepted: 06/13/2012] [Indexed: 11/11/2022] Open
Abstract
Background: The dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBNs employ either local search, such as hill-climbing, or a meta stochastic global optimization framework, such as a genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small sized networks. Results: To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For such DBN models that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved performance over other techniques. Conclusions: Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.
Affiliation(s)
- Nguyen Vinh Xuan
  - Gippsland School of Information Technology, Monash University, Melbourne, Australia
12
Almudevar A, LaCombe J. On the choice of prior density for the Bayesian analysis of pedigree structure. Theor Popul Biol 2011; 81:131-43. [PMID: 22200649 DOI: 10.1016/j.tpb.2011.12.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/03/2011] [Revised: 11/28/2011] [Accepted: 12/07/2011] [Indexed: 11/18/2022]
Abstract
This article is concerned with the choice of structural prior density for use in a fully Bayesian approach to pedigree inference. It is found that the choice of prior has considerable influence on the accuracy of the estimation. To guide this choice, a scale invariance property is introduced. Under a structural prior with this property, the marginal prior distribution of the local properties of a pedigree node (number of parents, offspring, etc.) does not depend on the number of nodes in the pedigree. Such priors are found to arise naturally by an application of the Minimum Description Length (MDL) principle, under which construction of a prior becomes equivalent to the problem of determining the length of a code required to encode a pedigree, using the principles of information theory. The approach is demonstrated using simulated and actual data, and is compared to two well-known applications, CERVUS and COLONY.
Affiliation(s)
- Anthony Almudevar
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States
13
Bender C, Heyde SV, Henjes F, Wiemann S, Korf U, Beissbarth T. Inferring signalling networks from longitudinal data using sampling based approaches in the R-package 'ddepn'. BMC Bioinformatics 2011; 12:291. [PMID: 21771315 PMCID: PMC3146886 DOI: 10.1186/1471-2105-12-291] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 12/06/2010] [Accepted: 07/19/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Network inference from high-throughput data has become an important means of analysing biological systems. For instance, in cancer research, the functional relationships of cancer related proteins, summarised into signalling networks, are of central interest for the identification of pathways that influence tumour development. Cancer cell lines can be used as model systems to study the cellular response to drug treatments in a time-resolved way. Based on this kind of data, modelling approaches for the signalling relationships are needed that allow hypotheses to be generated on potential interference points in the networks. RESULTS: We present the R-package 'ddepn' that implements our recent approach to network reconstruction from longitudinal data generated after external perturbation of network components. We extend our approach by two novel methods: a Markov Chain Monte Carlo method for sampling network structures with two edge types (activation and inhibition) and an extension of a prior model that penalises deviations from a given reference network while incorporating these two types of edges. Further, as an alternative prior, we include a model that learns signalling networks with the scale-free property. CONCLUSIONS: The package 'ddepn' is freely available on R-Forge and CRAN http://ddepn.r-forge.r-project.org, http://cran.r-project.org. It allows users to conveniently perform network inference from longitudinal high-throughput data using two different sampling based network structure search algorithms.
Affiliation(s)
- Christian Bender
  - German Cancer Research Center (DKFZ), Division of Molecular Genome Analysis, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany