1
|
Teng Z, Guo M, Liu X, Dai Q, Wang C, Xuan P. Measuring gene functional similarity based on group-wise comparison of GO terms. Bioinformatics 2013; 29:1424-32. [PMID: 23572412 DOI: 10.1093/bioinformatics/btt160] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
2
|
Leal Valentim F, Neven F, Boyen P, van Dijk ADJ. Interactome-wide prediction of protein-protein binding sites reveals effects of protein sequence variation in Arabidopsis thaliana. PLoS One 2012; 7:e47022. [PMID: 23077539 PMCID: PMC3471968 DOI: 10.1371/journal.pone.0047022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2012] [Accepted: 09/07/2012] [Indexed: 11/18/2022] Open
Abstract
The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resultant proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, we observed, by analysing pairs of paralogs, a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.
Collapse
Affiliation(s)
| | - Frank Neven
- Hasselt University and Transnational University of Limburg, Hasselt, Belgium
| | - Peter Boyen
- Hasselt University and Transnational University of Limburg, Hasselt, Belgium
| | - Aalt D. J. van Dijk
- Plant Research International, Bioscience, Wageningen, The Netherlands
- * E-mail:
| |
Collapse
|
3
|
van Dijk ADJ, van Mourik S, van Ham RCHJ. Mutational robustness of gene regulatory networks. PLoS One 2012; 7:e30591. [PMID: 22295094 PMCID: PMC3266278 DOI: 10.1371/journal.pone.0030591] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2011] [Accepted: 12/19/2011] [Indexed: 11/18/2022] Open
Abstract
Mutational robustness of gene regulatory networks refers to their ability to generate constant biological output upon mutations that change network structure. Such networks contain regulatory interactions (transcription factor – target gene interactions) but often also protein-protein interactions between transcription factors. Using computational modeling, we study factors that influence robustness and we infer several network properties governing it. These include the type of mutation, i.e. whether a regulatory interaction or a protein-protein interaction is mutated, and in the case of mutation of a regulatory interaction, the sign of the interaction (activating vs. repressive). In addition, we analyze the effect of combinations of mutations and we compare networks containing monomeric with those containing dimeric transcription factors. Our results are consistent with available data on biological networks, for example based on evolutionary conservation of network features. As a novel and remarkable property, we predict that networks are more robust against mutations in monomer than in dimer transcription factors, a prediction for which analysis of conservation of DNA binding residues in monomeric vs. dimeric transcription factors provides indirect evidence.
Collapse
Affiliation(s)
- Aalt D J van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Wageningen, The Netherlands.
| | | | | |
Collapse
|
4
|
Severing EI, van Dijk ADJ, Morabito G, Busscher-Lange J, Immink RGH, van Ham RCHJ. Predicting the impact of alternative splicing on plant MADS domain protein function. PLoS One 2012; 7:e30524. [PMID: 22295091 PMCID: PMC3266260 DOI: 10.1371/journal.pone.0030524] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2011] [Accepted: 12/18/2011] [Indexed: 11/18/2022] Open
Abstract
Several genome-wide studies demonstrated that alternative splicing (AS) significantly increases the transcriptome complexity in plants. However, the impact of AS on the functional diversity of proteins is difficult to assess using genome-wide approaches. The availability of detailed sequence annotations for specific genes and gene families allows for a more detailed assessment of the potential effect of AS on their function. One example is the plant MADS-domain transcription factor family, members of which interact to form protein complexes that function in transcription regulation. Here, we perform an in silico analysis of the potential impact of AS on the protein-protein interaction capabilities of MIKC-type MADS-domain proteins. We first confirmed the expression of transcript isoforms resulting from predicted AS events. Expressed transcript isoforms were considered functional if they were likely to be translated and if their corresponding AS events either had an effect on predicted dimerisation motifs or occurred in regions known to be involved in multimeric complex formation, or otherwise, if their effect was conserved in different species. Nine out of twelve MIKC MADS-box genes predicted to produce multiple protein isoforms harbored putative functional AS events according to those criteria. AS events with conserved effects were only found at the borders of or within the K-box domain. We illustrate how AS can contribute to the evolution of interaction networks through an example of selective inclusion of a recently evolved interaction motif in the MADS AFFECTING FLOWERING1-3 (MAF1-3) subclade. Furthermore, we demonstrate the potential effect of an AS event in SHORT VEGETATIVE PHASE (SVP), resulting in the deletion of a short sequence stretch including a predicted interaction motif, by overexpression of the fully spliced and the alternatively spliced SVP transcripts. For most of the AS events we were able to formulate hypotheses about the potential impact on the interaction capabilities of the encoded MIKC proteins.
Collapse
Affiliation(s)
- Edouard I. Severing
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Wageningen, The Netherlands
| | - Aalt D. J. van Dijk
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands
| | - Giuseppa Morabito
- Plant Developmental Systems, Plant Research International, Wageningen, The Netherlands
| | | | - Richard G. H. Immink
- Centre for BioSystems Genomics, Wageningen, The Netherlands
- Plant Developmental Systems, Plant Research International, Wageningen, The Netherlands
| | - Roeland C. H. J. van Ham
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Wageningen, The Netherlands
| |
Collapse
|
5
|
Simpson GL, Birks HJB. Statistical Learning in Palaeolimnology. TRACKING ENVIRONMENTAL CHANGE USING LAKE SEDIMENTS 2012. [DOI: 10.1007/978-94-007-2745-8_9] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
|
6
|
Boyen P, Van Dyck D, Neven F, van Ham RCHJ, van Dijk ADJ. SLIDER: a generic metaheuristic for the discovery of correlated motifs in protein-protein interaction networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1344-1357. [PMID: 21282865 DOI: 10.1109/tcbb.2011.17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Correlated motif mining (cmm) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for cmm thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we obtain that cmm is an np-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic slider which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that slider outperforms existing motif-driven cmm methods and scales to large protein-protein interaction networks. The slider-implementation and the data used in the experiments are available on http://bioinformatics.uhasselt.be.
Collapse
Affiliation(s)
- Peter Boyen
- Hasselt University, Agoralaan Gebouw D, B-3590 Diepenbeek, Belgium.
| | | | | | | | | |
Collapse
|
7
|
Schmeier S, Jankovic B, Bajic VB. Simplified method to predict mutual interactions of human transcription factors based on their primary structure. PLoS One 2011; 6:e21887. [PMID: 21750739 PMCID: PMC3130058 DOI: 10.1371/journal.pone.0021887] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2010] [Accepted: 06/14/2011] [Indexed: 11/18/2022] Open
Abstract
Background Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to better understanding the complex machinery involved in gene regulation. Methodology We propose here such a method for the prediction of TF interactions. The method uses only the primary sequence information of the interacting TFs, resulting in a much greater simplicity of the prediction algorithm. Through an advanced feature selection process, we determined a subset of 97 model features that constitute the optimized model in the subset we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved despite the selection for the negative data set of only those TF from the same type of proteins, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such selection poses significant challenges for developing models with high specificity, but at the same time better reflects real-world problems. Conclusions The performance of our predictor compares well to those of much more complex approaches for predicting TF and general protein-protein interactions, particularly when taking the reduced complexity of model utilisation into account.
Collapse
Affiliation(s)
- Sebastian Schmeier
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | - Boris Jankovic
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
| | - Vladimir B. Bajic
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- * E-mail:
| |
Collapse
|
8
|
Kourmpetis YA, van Dijk AD, van Ham RC, ter Braak CJ. Genome-wide computational function prediction of Arabidopsis proteins by integration of multiple data sources. PLANT PHYSIOLOGY 2011; 155:271-81. [PMID: 21098674 PMCID: PMC3075770 DOI: 10.1104/pp.110.162164] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/03/2023]
Abstract
Although Arabidopsis (Arabidopsis thaliana) is the best studied plant species, the biological role of one-third of its proteins is still unknown. We developed a probabilistic protein function prediction method that integrates information from sequences, protein-protein interactions, and gene expression. The method was applied to proteins from Arabidopsis. Evaluation of prediction performance showed that our method has improved performance compared with single source-based prediction approaches and two existing integration approaches. An innovative feature of our method is that it enables transfer of functional information between proteins that are not directly associated with each other. We provide novel function predictions for 5,807 proteins. Recent experimental studies confirmed several of the predictions. We highlight these in detail for proteins predicted to be involved in flowering and floral organ development.
Collapse
|
9
|
van Dijk ADJ, Morabito G, Fiers M, van Ham RCHJ, Angenent GC, Immink RGH. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction. PLoS Comput Biol 2010; 6:e1001017. [PMID: 21124869 PMCID: PMC2991254 DOI: 10.1371/journal.pcbi.1001017] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2010] [Accepted: 10/27/2010] [Indexed: 11/18/2022] Open
Abstract
Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various performed bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.
Collapse
Affiliation(s)
| | | | - Martijn Fiers
- Plant Research International, Bioscience, Wageningen, The Netherlands
| | | | - Gerco C. Angenent
- Plant Research International, Bioscience, Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands
| | - Richard G. H. Immink
- Plant Research International, Bioscience, Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands
- * E-mail:
| |
Collapse
|
10
|
van Dijk ADJ, van Ham RCHJ. Conserved and variable correlated mutations in the plant MADS protein network. BMC Genomics 2010; 11:607. [PMID: 20979667 PMCID: PMC3017862 DOI: 10.1186/1471-2164-11-607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2010] [Accepted: 10/28/2010] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. RESULTS Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessing their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is less in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation which validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. CONCLUSION Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
Collapse
Affiliation(s)
- Aalt DJ van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Roeland CHJ van Ham
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| |
Collapse
|
11
|
van Mourik S, van Dijk ADJ, de Gee M, Immink RGH, Kaufmann K, Angenent GC, van Ham RCHJ, Molenaar J. Continuous-time modeling of cell fate determination in Arabidopsis flowers. BMC SYSTEMS BIOLOGY 2010; 4:101. [PMID: 20649974 PMCID: PMC2922098 DOI: 10.1186/1752-0509-4-101] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/24/2010] [Accepted: 07/22/2010] [Indexed: 01/02/2023]
Abstract
Background The genetic control of floral organ specification is currently being investigated by various approaches, both experimentally and through modeling. Models and simulations have mostly involved boolean or related methods, and so far a quantitative, continuous-time approach has not been explored. Results We propose an ordinary differential equation (ODE) model that describes the gene expression dynamics of a gene regulatory network that controls floral organ formation in the model plant Arabidopsis thaliana. In this model, the dimerization of MADS-box transcription factors is incorporated explicitly. The unknown parameters are estimated from (known) experimental expression data. The model is validated by simulation studies of known mutant plants. Conclusions The proposed model gives realistic predictions with respect to independent mutation data. A simulation study is carried out to predict the effects of a new type of mutation that has so far not been made in Arabidopsis, but that could be used as a severe test of the validity of the model. According to our predictions, the role of dimers is surprisingly important. Moreover, the functional loss of any dimer leads to one or more phenotypic alterations.
Collapse
Affiliation(s)
- Simon van Mourik
- Biometris, Plant Sciences Group, Wageningen University and Research Center, Wageningen, The Netherlands.
| | | | | | | | | | | | | | | |
Collapse
|
12
|
Immink RG, Kaufmann K, Angenent GC. The ‘ABC’ of MADS domain protein behaviour and interactions. Semin Cell Dev Biol 2010; 21:87-93. [DOI: 10.1016/j.semcdb.2009.10.004] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2009] [Accepted: 10/23/2009] [Indexed: 02/05/2023]
|
13
|
Geurts P, Irrthum A, Wehenkel L. Supervised learning with decision tree-based methods in computational and systems biology. MOLECULAR BIOSYSTEMS 2009; 5:1593-605. [PMID: 20023720 DOI: 10.1039/b907946g] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology.
Collapse
Affiliation(s)
- Pierre Geurts
- Department of EE and CS & GIGA-Research, University of Liège, Belgium.
| | | | | |
Collapse
|