1. Buzzao D, Persson E, Guala D, Sonnhammer ELL. FunCoup 6: advancing functional association networks across species with directed links and improved user experience. Nucleic Acids Res 2025; 53:D658-D671. [PMID: 39530220 PMCID: PMC11701656 DOI: 10.1093/nar/gkae1021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Revised: 10/11/2024] [Accepted: 10/17/2024] [Indexed: 11/16/2024]
Abstract
FunCoup 6 (https://funcoup.org) represents a significant advancement in global functional association networks, aiming to provide researchers with a comprehensive view of the functional coupling interactome. This update introduces novel methodologies and integrated tools for improved network inference and analysis. Major new developments in FunCoup 6 include vastly expanded coverage of gene regulatory links, a new framework for bin-free Bayesian training and a new website. FunCoup 6 integrates a new tool for disease and drug target module identification using the TOPAS algorithm. To expand the utility of the resource for biomedical research, it incorporates pathway enrichment analysis using the ANUBIX and EASE algorithms. The unique comparative interactomics analysis in FunCoup provides insights into network conservation, now allowing users to align orthologs only or query each species network independently. Bin-free training was applied to 23 primary species, and in addition, networks were generated for all remaining 618 species in InParanoiDB 9. Accompanying these advancements, FunCoup 6 features a redesigned website, together with updated API functionalities, and represents a pivotal step forward in functional genomics research, offering unique capabilities for exploring the complex landscape of protein interactions.
Affiliation(s)
- Davide Buzzao
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Emma Persson
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
2. Idrees S, Paudel KR. Bioinformatics prediction and screening of viral mimicry candidates through integrating known and predicted DMI data. Arch Microbiol 2023; 206:30. [PMID: 38117335 DOI: 10.1007/s00203-023-03764-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/28/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023]
Abstract
Domain-motif interactions (DMIs) represent transient bonds formed when a Short Linear Motif (SLiM) engages a globular domain via a compact contact interface. Understanding the mechanics of DMIs is critical for maintaining diverse regulatory processes and deciphering how various viruses hijack host cellular machinery. However, identifying DMIs through traditional in vitro and in vivo experiments is challenging due to their degenerate nature and small contact areas. Predictions often carry a high rate of false positives, necessitating rigorous in-silico validation before embarking on experimental work. This study assessed the binding energy changes in predicted SLiM instances through an in-silico peptide exchange experiment, elucidating how they interact with known 3D DMI complexes. We identified a subset of potential mimicry candidates that exhibited effective binding affinities with native DMI structures, suggesting that they may be true mimicry candidates. The identified viral SLiMs can be potential targets in developing therapeutics, opening new opportunities for innovative treatments that can be finely tuned to address the complex molecular underpinnings of various diseases. To gain a comprehensive understanding of identified DMIs, it is imperative to conduct further validation through experimental approaches.
Affiliation(s)
- Sobia Idrees
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, Faculty of Science, School of Life Sciences, Sydney, NSW, Australia.
- Keshav Raj Paudel
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, Faculty of Science, School of Life Sciences, Sydney, NSW, Australia
3. Alborzi SZ, Ahmed Nacer A, Najjar H, Ritchie DW, Devignes MD. PPIDomainMiner: Inferring domain-domain interactions from multiple sources of protein-protein interactions. PLoS Comput Biol 2021; 17:e1008844. [PMID: 34370723 PMCID: PMC8376228 DOI: 10.1371/journal.pcbi.1008844] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/25/2021] [Revised: 08/19/2021] [Accepted: 07/12/2021] [Indexed: 12/26/2022]
Abstract
Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks but the produced datasets are heterogeneous and sometimes not accessible, while the PPI interactome data keeps growing. We describe a new computational approach called “PPIDM” (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described “CODAC” (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as “Gold-Standard” a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs in Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/. 
We revisit at a large scale the question of inferring DDIs from PPIs. Compared to previous studies, we take a unified approach across multiple sources of PPIs. This approach is a method for inferring new edges in a tripartite graph setting and can be compared to link prediction approaches in knowledge graphs. Aggregation of several sources is performed using an optimized weighted average of the individual scores calculated in each source. A huge dataset of over 84K DDIs is produced, which far exceeds the previous datasets. We show that a significant portion of the PPIDM dataset covers a large number of PPIs from curated (IMEx) or non-curated (STRING) databases. Such a reservoir of DDIs deserves further exploration and can be combined with high-throughput methods such as cross-linking mass spectrometry to identify plausible protein partners of proteins of interest.
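The aggregation step mentioned in the abstract above (a weighted average of per-source scores) can be illustrated with a minimal sketch. This is not the PPIDM implementation: the function name `aggregate_score`, the source names, the weights and the choice to treat a missing source as a score of 0 are all assumptions made for illustration. The second part only checks that the Gold, Silver and Bronze counts quoted in the abstract add up to the reported total.

```python
from typing import Mapping


def aggregate_score(source_scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted average of per-source scores for one candidate DDI.

    Assumption: a source that reports no score for this DDI contributes 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * source_scores.get(name, 0.0)
                       for name, w in weights.items())
    return weighted_sum / total_weight


# Hypothetical weights and scores; the real weights are optimized by the authors.
weights = {"source_a": 0.5, "source_b": 0.3, "source_c": 0.2}
scores = {"source_a": 0.8, "source_b": 0.4}  # source_c has no evidence
combined = aggregate_score(scores, weights)

# Category sizes as quoted in the abstract.
category_counts = {"Gold": 9_175, "Silver": 24_934, "Bronze": 50_443}
total_ddis = sum(category_counts.values())
```

With these toy inputs the combined score is the weight-normalized sum of 0.5 × 0.8 and 0.3 × 0.4, and the three category counts sum to the 84,552 DDIs reported.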
4. Guala D, Ogris C, Müller N, Sonnhammer ELL. Genome-wide functional association networks: background, data & state-of-the-art resources. Brief Bioinform 2019; 21:1224-1237. [PMID: 31281921 PMCID: PMC7373183 DOI: 10.1093/bib/bbz064] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/05/2019] [Revised: 04/29/2019] [Accepted: 05/04/2019] [Indexed: 02/06/2023]
Abstract
The vast amount of experimental data from recent advances in the field of high-throughput biology begs for integration into more complex data structures such as genome-wide functional association networks. Such networks have been used for elucidation of the interplay of intra-cellular molecules to make advances ranging from the basic science understanding of evolutionary processes to the more translational field of precision medicine. The allure of the field has resulted in rapid growth of the number of available network resources, each with unique attributes exploitable to answer different biological questions. Unfortunately, the high volume of network resources makes it impossible for the intended user to select an appropriate tool for their particular research question. The aim of this paper is to provide an overview of the underlying data and representative network resources as well as to mention methods of integration, allowing a customized approach to resource selection. Additionally, this report will provide a primer for researchers venturing into the field of network integration.
Affiliation(s)
- Dimitri Guala
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
- Christoph Ogris
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Nikola Müller
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Erik L L Sonnhammer
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
5. Holland DO, Shapiro BH, Xue P, Johnson ME. Protein-protein binding selectivity and network topology constrain global and local properties of interface binding networks. Sci Rep 2017; 7:5631. [PMID: 28717235 PMCID: PMC5514078 DOI: 10.1038/s41598-017-05686-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/26/2017] [Accepted: 06/01/2017] [Indexed: 01/30/2023]
Abstract
Protein-protein interactions networks (PPINs) are known to share a highly conserved structure across all organisms. What is poorly understood, however, is the structure of the child interface interaction networks (IINs), which map the binding sites proteins use for each interaction. In this study we analyze four independently constructed IINs from yeast and humans and find a conserved structure of these networks with a unique topology distinct from the parent PPIN. Using an IIN sampling algorithm and a fitness function trained on the manually curated PPINs, we show that IIN topology can be mostly explained as a balance between limits on interface diversity and a need for physico-chemical binding complementarity. This complementarity must be optimized both for functional interactions and against mis-interactions, and this selectivity is encoded in the IIN motifs. To test whether the parent PPIN shapes IINs, we compared optimal IINs in biological PPINs versus random PPINs. We found that the hubs in biological networks allow for selective binding with minimal interfaces, suggesting that binding specificity is an additional pressure for a scale-free-like PPIN. We confirm through phylogenetic analysis that hub interfaces are strongly conserved and rewiring of interactions between proteins involved in endocytosis preserves interface binding selectivity.
Affiliation(s)
- David O Holland
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Benjamin H Shapiro
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Pei Xue
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Margaret E Johnson
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
6. Prediction of human protein–protein interaction by a domain-based approach. J Theor Biol 2016; 396:144-53. [PMID: 26925814 DOI: 10.1016/j.jtbi.2016.02.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/07/2015] [Revised: 01/29/2016] [Accepted: 02/20/2016] [Indexed: 02/04/2023]
7. Wang J, Zuo Y, Liu L, Man Y, Tadesse MG, Ressom HW. Identification of functional modules by integration of multiple data sources using a Bayesian network classifier. Circ Cardiovasc Genet 2015; 7:206-17. [PMID: 24736851 DOI: 10.1161/circgenetics.113.000087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
BACKGROUND Prediction of functional modules is indispensable for detecting protein deregulation in human complex diseases such as cancer. Bayesian network is one of the most commonly used models to integrate heterogeneous data from multiple sources such as protein domain, interactome, functional annotation, genome-wide gene expression, and the literature. METHODS AND RESULTS In this article, we present a Bayesian network classifier that is customized to (1) increase the ability to integrate diverse information from different sources, (2) effectively predict protein-protein interactions, (3) infer aberrant networks with scale-free and small-world properties, and (4) group molecules into functional modules or pathways based on the primary function and biological features. Application of this model in discovering protein biomarkers of hepatocellular carcinoma leads to the identification of functional modules that provide insights into the mechanism of the development and progression of hepatocellular carcinoma. These functional modules include cell cycle deregulation, increased angiogenesis (eg, vascular endothelial growth factor, blood vessel morphogenesis), oxidative metabolic alterations, and aberrant activation of signaling pathways involved in cellular proliferation, survival, and differentiation. CONCLUSIONS The discoveries and conclusions derived from our customized Bayesian network classifier are consistent with previously published results. The proposed approach for determining Bayesian network structure facilitates the integration of heterogeneous data from multiple sources to elucidate the mechanisms of complex diseases.
Affiliation(s)
- Jinlian Wang
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC
8. Understanding Protein–Protein Interactions Using Local Structural Features. J Mol Biol 2013; 425:1210-24. [DOI: 10.1016/j.jmb.2013.01.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 08/01/2012] [Revised: 01/08/2013] [Accepted: 01/14/2013] [Indexed: 11/21/2022]
9. Wright PC, Jaffe S, Noirel J, Zou X. Opportunities for protein interaction network-guided cellular engineering. IUBMB Life 2012; 65:17-27. [DOI: 10.1002/iub.1114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/23/2012] [Revised: 10/14/2012] [Accepted: 10/15/2012] [Indexed: 01/23/2023]
10. Structural and functional analysis of multi-interface domains. PLoS One 2012; 7:e50821. [PMID: 23272073 PMCID: PMC3522720 DOI: 10.1371/journal.pone.0050821] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/15/2012] [Accepted: 10/29/2012] [Indexed: 02/03/2023]
Abstract
A multi-interface domain is a domain that can shape multiple and distinctive binding sites to contact with many other domains, forming a hub in domain-domain interaction networks. The functions played by the multiple interfaces are usually different, but there is no strict bijection between the functions and interfaces as some subsets of the interfaces play the same function. This work applies graph theory and algorithms to discover fingerprints for the multiple interfaces of a domain and to establish associations between the interfaces and functions, based on a huge set of multi-interface proteins from PDB. We found that about 40% of proteins have the multi-interface property, however the involved multi-interface domains account for only a tiny fraction (1.8%) of the total number of domains. The interfaces of these domains are distinguishable in terms of their fingerprints, indicating the functional specificity of the multiple interfaces in a domain. Furthermore, we observed that both cooperative and distinctive structural patterns, which will be useful for protein engineering, exist in the multiple interfaces of a domain.
11. Armean IM, Lilley KS, Trotter MWB. Popular computational methods to assess multiprotein complexes derived from label-free affinity purification and mass spectrometry (AP-MS) experiments. Mol Cell Proteomics 2012; 12:1-13. [PMID: 23071097 DOI: 10.1074/mcp.r112.019554] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022]
Abstract
Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made via mass spectrometry. Despite these advances, state-of-the-art experimental methods for the study of protein-protein interactions yield more candidate interactions than may be expected biologically owing to biases and limitations in the experimental methodology. In silico methods, which distinguish between true and false interactions, have been developed and applied successfully to reduce the number of false positive results yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: the statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a cocomplex versus probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway, sub cellular compartment); and (4) initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interaction, to be experimentally validated a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interactions evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include: CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup. 
References to related work are provided throughout in order to provide a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.
Affiliation(s)
- Irina M Armean
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, UK
12. Turenne N, Tiys E, Ivanisenko V, Yudin N, Ignatieva E, Valour D, Degrelle SA, Hue I. Finding biomarkers in non-model species: literature mining of transcription factors involved in bovine embryo development. BioData Min 2012; 5:12. [PMID: 22931563 PMCID: PMC3563503 DOI: 10.1186/1756-0381-5-12] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/10/2011] [Accepted: 08/15/2012] [Indexed: 12/16/2022]
Abstract
Background Since processes in well-known model organisms have specific features different from those in Bos taurus, the organism under study, a good way to describe gene regulation in ruminant embryos would be a species-specific consideration of closely related species to cattle, sheep and pig. However, as highlighted by a recent report, gene dictionaries in pig are smaller than in cattle, bringing a risk to reduce the gene resources to be mined (and so for sheep dictionaries). Bioinformatics approaches that allow an integration of available information on gene function in model organisms, taking into account their specificity, are thus needed. Besides these closely related and biologically relevant species, there is indeed much more knowledge of (i) trophoblast proliferation and differentiation or (ii) embryogenesis in human and mouse species, which provides opportunities for reconstructing proliferation and/or differentiation processes in other mammalian embryos, including ruminants. The necessary knowledge can be obtained partly from (i) stem cell or cancer research to supply useful information on molecular agents or molecular interactions at work in cell proliferation and (ii) mouse embryogenesis to supply useful information on embryo differentiation. However, the total number of publications for all these topics and species is great and their manual processing would be tedious and time consuming. This is why we used text mining for automated text analysis and automated knowledge extraction. To evaluate the quality of this “mining”, we took advantage of studies that reported gene expression profiles during the elongation of bovine embryos and defined a list of transcription factors (or TF, n = 64) that we used as biological “gold standard”. When successful, the “mining” approach would identify them all, as well as novel ones. 
Methods To gain knowledge on molecular-genetic regulations in a non model organism, we offer an approach based on literature-mining and score arrangement of data from model organisms. This approach was applied to identify novel transcription factors during bovine blastocyst elongation, a process that is not observed in rodents and primates. As a result, searching through human and mouse corpuses, we identified numerous bovine homologs, among which 11 to 14% of transcription factors including the gold standard TF as well as novel TF potentially important to gene regulation in ruminant embryo development. The scripts of the workflow are written in Perl and available on demand. They require data input coming from all various databases for any kind of biological issue once the data has been prepared according to keywords for the studied topic and species; we can provide data sample to illustrate the use and functionality of the workflow. Results To do so, we created a workflow that allowed the pipeline processing of literature data and biological data, extracted from Web of Science (WoS) or PubMed but also from Gene Expression Omnibus (GEO), Gene Ontology (GO), Uniprot, HomoloGene, TcoF-DB and TFe (TF encyclopedia). First, the human and mouse homologs of the bovine proteins were selected, filtered by text corpora and arranged by score functions. The score functions were based on the gene name frequencies in corpora. Then, transcription factors were identified using TcoF-DB and double-checked using TFe to characterise TF groups and families. Thus, among a search space of 18,670 bovine homologs, 489 were identified as transcription factors. Among them, 243 were absent from the high-throughput data available at the time of the study. They thus stand so far for putative TF acting during bovine embryo elongation, but might be retrieved from a recent RNA sequencing dataset (Mamo et al. , 2012). 
Beyond the 246 TF that appeared expressed in bovine elongating tissues, we restricted our interpretation to those occurring within a list of 50 top-ranked genes. Among the transcription factors identified therein, half belonged to the gold standard (ASCL2, c-FOS, ETS2, GATA3, HAND1) and half did not (ESR1, HES1, ID2, NANOG, PHB2, TP53, STAT3). Conclusions A workflow providing search for transcription factors acting in bovine elongation was developed. The model assumed that proteins sharing the same protein domains in closely related species had the same protein functionalities, even if they were differently regulated among species or involved in somewhat different pathways. Under this assumption, we merged the information on different mammalian species from different databases (literature and biology) and proposed 489 TF as potential participants of embryo proliferation and differentiation, with (i) a recall of 95% with regard to a biological gold standard defined in 2011 and (ii) an extension of more than 3 times the gold standard of TF detected so far in elongating tissues. The working capacity of the workflow was supported by the manual expertise of the biologists on the results. The workflow can serve as a new kind of bioinformatics tool to work on fused data sources and can thus be useful in studies of a wide range of biological processes.
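The abstract above describes score functions based on gene name frequencies in text corpora and reports recall against a gold standard. The sketch below shows one simple way such a frequency score and recall could be computed. It is not the authors' Perl workflow: the function names, the whole-token matching rule and the two corpus strings are hypothetical, and the strings are toy text rather than real abstracts.

```python
import re
from collections import Counter


def gene_frequencies(corpus, gene_names):
    """Count whole-token occurrences of each gene name across all texts."""
    counts = Counter()
    for text in corpus:
        for gene in gene_names:
            # Reject matches embedded in longer alphanumeric or hyphenated tokens.
            pattern = r"(?<![A-Za-z0-9-])" + re.escape(gene) + r"(?![A-Za-z0-9-])"
            counts[gene] += len(re.findall(pattern, text))
    return counts


def recall(predicted, gold_standard):
    """Fraction of gold-standard items that appear among the predictions."""
    gold = set(gold_standard)
    return len(gold & set(predicted)) / len(gold)


# Toy corpus (made-up strings used only to exercise the functions).
corpus = [
    "GATA3 and HAND1 were detected; GATA3 was detected twice.",
    "ETS2 was detected together with GATA3.",
]
genes = ["GATA3", "HAND1", "ETS2", "ASCL2"]

freq = gene_frequencies(corpus, genes)
# Rank genes by frequency, keeping only those mentioned at least once.
ranked = [gene for gene, n in freq.most_common() if n > 0]
toy_recall = recall(ranked, genes)
```

In this toy run three of the four listed genes are found, so the recall is 0.75. The 95% recall reported in the abstract refers to the authors' own gold standard of 64 transcription factors, not to this example.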
Affiliation(s)
- Nicolas Turenne
- INRA, SenS, UR1326, IFRIS, Champs-sur-Marne, F-77420, France
- Evgeniy Tiys
- Sector of Computational Proteomics, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Vladimir Ivanisenko
- Sector of Computational Proteomics, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Nikolay Yudin
- Laboratory of Animal Molecular Genetics, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Elena Ignatieva
- Laboratory of Evolutionary Bioinformatics and Theoretical, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Damien Valour
- INRA, UMR1198 Biologie du Développement et Reproduction, Jouy-en-Josas, F-78352, France; ENVA, Maisons Alfort, F-94704, France
- Séverine A Degrelle
- INRA, UMR1198 Biologie du Développement et Reproduction, Jouy-en-Josas, F-78352, France; ENVA, Maisons Alfort, F-94704, France
- Isabelle Hue
- INRA, UMR1198 Biologie du Développement et Reproduction, Jouy-en-Josas, F-78352, France; ENVA, Maisons Alfort, F-94704, France
13. Jang WH, Jung SH, Han DS. A computational model for predicting protein interactions based on multidomain collaboration. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1081-1090. [PMID: 22508910 DOI: 10.1109/tcbb.2012.55] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Recently, several domain-based computational models for predicting protein-protein interactions (PPIs) have been proposed. The conventional methods usually infer domain or domain combination (DC) interactions from already known interacting sets of proteins, and then predict PPIs using the information. However, the majority of these models often have limitations in providing detailed information on which domain pair (single domain interaction) or DC pair (multidomain interaction) will actually interact for the predicted protein interaction. Therefore, a more comprehensive and concrete computational model for the prediction of PPIs is needed. We developed a computational model to predict PPIs using the information of intraprotein domain cohesion and interprotein DC coupling interaction. A method of identifying the primary interacting DC pair was also incorporated into the model in order to infer actual participants in a predicted interaction. Our method made an apparent improvement in the PPI prediction accuracy, and the primary interacting DC pair identification was valid specifically in predicting multidomain protein interactions. In this paper, we demonstrate that 1) the intraprotein domain cohesion is meaningful in improving the accuracy of domain-based PPI prediction, 2) a prediction model incorporating the intradomain cohesion enables us to identify the primary interacting DC pair, and 3) a hybrid approach using the intra/interdomain interaction information can lead to a more accurate prediction.
Affiliation(s)
- Woo-Hyuk Jang
- Department of Information and Communications Engineering, Korea Advanced Institute of Science and Technology, Kaist, 335 Gwahak-ro, Yuseong-gu, Daejeon 305-701, Korea.
14. Kim Y, Min B, Yi GS. IDDI: integrated domain-domain interaction and protein interaction analysis system. Proteome Sci 2012; 10 Suppl 1:S9. [PMID: 22759586 PMCID: PMC3380739 DOI: 10.1186/1477-5956-10-s1-s9] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 11/11/2022]
Abstract
Background Deciphering protein-protein interactions (PPIs) at the domain level yields valuable information about the binding mechanisms and functional roles of interacting proteins. The 3D structures of protein complexes are a reliable source of domain-domain interactions (DDIs), but the number of proven structures is very limited. Several resources of computationally predicted DDIs have been generated, but they are scattered in various places and their predictions show erratic performance. A well-organized PPI and DDI analysis system that integrates these data with a fair scoring system is necessary. Method We integrated three structure-based DDI datasets and twenty computationally predicted DDI datasets and constructed an interaction analysis system, named IDDI, which enables users to browse protein and domain interactions and their relationships. To integrate heterogeneous DDI information, a novel scoring scheme is introduced to determine the reliability of each DDI by considering the prediction scores of each DDI, the confidence levels of each prediction method in the datasets, and the independence between predicted datasets. In addition, we connected this DDI information to comprehensive PPI information and developed a unified interface for interaction analysis that explores interaction networks at both the protein and domain level. Result IDDI provides 204,705 DDIs among a total of 7,351 Pfam domains in the current version. The total number of DDIs is eight times that of previous studies. Owing to this increase in data, 50.4% of PPIs could be correlated with DDIs, which is more than twice the proportion in previous resources. The newly designed scoring scheme also outperformed the previous system in accuracy. The user interface of the IDDI system supports interactive, interconnected investigation of the proteins and domains involved in interactions. 
A specific example is presented to show how efficiently the system retrieves comprehensive information about a target protein together with its PPI and DDI relationships. IDDI is freely available at http://pcode.kaist.ac.kr/iddi/.
Affiliation(s)
- Yul Kim
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
- Bumki Min
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
- Gwan-Su Yi
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
|
15
|
Lees J, Yeats C, Perkins J, Sillitoe I, Rentzsch R, Dessailly BH, Orengo C. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. Nucleic Acids Res 2011; 40:D465-71. [PMID: 22139938 PMCID: PMC3245158 DOI: 10.1093/nar/gkr1181] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Indexed: 12/31/2022]
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates many other protein family and function databases. These facilitate complex associations of molecular function, structure and evolution. Gene3D now includes a domain functional family (FunFam) level below the homologous superfamily level assignments. Additions have also been made to the interaction data. More significantly, to help with the visualization and interpretation of multi-genome scale data sets, we have developed a new, revamped website. Searching has been simplified with more sophisticated filtering of results, along with new tools based on Cytoscape Web for visualizing protein–protein interaction networks, differences in domain composition between genomes and the taxonomic distribution of individual superfamilies.
Affiliation(s)
- Jonathan Lees
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower St, London WC1E 6BT, UK.
|
16
|
Alexeyenko A, Schmitt T, Tjärnberg A, Guala D, Frings O, Sonnhammer ELL. Comparative interactomics with Funcoup 2.0. Nucleic Acids Res 2011; 40:D821-8. [PMID: 22110034 PMCID: PMC3245127 DOI: 10.1093/nar/gkr1062] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 12/11/2022]
Abstract
FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.
Affiliation(s)
- Andrey Alexeyenko
- School of Biotechnology, Royal Institute of Technology, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden
|
17
|
Lees JG, Heriche JK, Morilla I, Ranea JA, Orengo CA. Systematic computational prediction of protein interaction networks. Phys Biol 2011; 8:035008. [PMID: 21572181 DOI: 10.1088/1478-3975/8/3/035008] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023]
Abstract
Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in the field of high throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher quality predictions, and we compare the major publicly available resources for experimentalists to use.
Affiliation(s)
- J G Lees
- Research Department of Structural & Molecular Biology, University College London, London, UK.
|
18
|
Daemen A, Signoretto M, Gevaert O, Suykens JAK, De Moor B. Improved microarray-based decision support with graph encoded interactome data. PLoS One 2010; 5:e10225. [PMID: 20419106 PMCID: PMC2856685 DOI: 10.1371/journal.pone.0010225] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/11/2010] [Accepted: 03/28/2010] [Indexed: 12/31/2022]
Abstract
In the past, microarray studies have been criticized due to noise and the limited overlap between gene signatures. Prior biological knowledge should therefore be incorporated as side information in models based on gene expression data to improve the accuracy of diagnosis and prognosis in cancer. As prior knowledge, we investigated interaction and pathway information from the human interactome on different aspects of biological systems. By exploiting the properties of kernel methods, relations between genes with similar functions but active in alternative pathways could be incorporated in a support vector machine classifier based on spectral graph theory. Using 10 microarray data sets, we first reduced the number of data sources relevant for multiple cancer types and outcomes. Three sources on metabolic pathway information (KEGG), protein-protein interactions (OPHID) and miRNA-gene targeting (microRNA.org) outperformed the other sources with regard to the considered class of models. Both fixed and adaptive approaches were subsequently considered to combine the three corresponding classifiers. Averaging the predictions of these classifiers performed best and was significantly better than the model based on microarray data only. These results were confirmed on 6 validation microarray sets, with a significantly improved performance in 4 of them. Integrating interactome data thus improves classification of cancer outcome for the investigated microarray technologies and cancer types. Moreover, this strategy can be incorporated in any kernel method or non-linear version of a non-kernel method.
Affiliation(s)
- Anneleen Daemen
- Department of Electrical Engineering ESAT/SCD, Katholieke Universiteit Leuven, Leuven, Belgium.
|