1. OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]

2. Alborzi SZ, Ahmed Nacer A, Najjar H, Ritchie DW, Devignes MD. PPIDomainMiner: Inferring domain-domain interactions from multiple sources of protein-protein interactions. PLoS Comput Biol 2021; 17:e1008844. [PMID: 34370723 PMCID: PMC8376228 DOI: 10.1371/journal.pcbi.1008844] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/25/2021] [Revised: 08/19/2021] [Accepted: 07/12/2021] [Indexed: 12/26/2022]
Abstract
Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks but the produced datasets are heterogeneous and sometimes not accessible, while the PPI interactome data keeps growing. We describe a new computational approach called “PPIDM” (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described “CODAC” (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as “Gold-Standard” a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs in Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/. 
We revisit at a large scale the question of inferring DDIs from PPIs. Compared to previous studies, we take a unified approach across multiple sources of PPIs. This approach is a method for inferring new edges in a tripartite graph setting and can be compared to link prediction approaches in knowledge graphs. Aggregation of several sources is performed using an optimized weighted average of the individual scores calculated in each source. A huge dataset of over 84K DDIs is produced, which far exceeds the previous datasets. We show that a significant portion of the PPIDM dataset covers a large number of PPIs from curated (IMEx) or non-curated (STRING) databases. Such a reservoir of DDIs deserves further exploration and can be combined with high-throughput methods such as cross-linking mass spectrometry to identify plausible protein partners of proteins of interest.
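The aggregation step mentioned in this summary can be illustrated with a minimal sketch. It assumes per-source scores in [0, 1], treats a source that does not report a candidate DDI as contributing 0, and uses hand-picked weights; the source names, the weights, and the function name `aggregate_scores` are hypothetical and are not taken from PPIDM, whose weights are described as optimized. The last lines only re-add the category counts quoted in the abstract.

```python
def aggregate_scores(scores, weights):
    """Weighted average of per-source scores for one candidate DDI.

    scores  -- dict: source name -> score in [0, 1] (hypothetical values)
    weights -- dict: source name -> non-negative weight (hypothetical values)
    Assumption: a source missing from `scores` contributes a score of 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * scores.get(src, 0.0) for src, w in weights.items())
    return weighted / total_weight


# Hypothetical example: three PPI sources, one of which has no evidence.
weights = {"source_a": 2.0, "source_b": 1.0, "source_c": 1.0}
scores = {"source_a": 0.9, "source_b": 0.5}
combined = aggregate_scores(scores, weights)  # (2*0.9 + 1*0.5 + 1*0) / 4

# Category counts quoted in the abstract (Gold, Silver, Bronze).
gold, silver, bronze = 9175, 24934, 50443
total = gold + silver + bronze
```

Normalizing by the total weight keeps the combined score on the same scale as the individual scores, so a single threshold can be applied regardless of how many sources are used.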

3. Zhang W, Coba MP, Sun F. Inference of domain-disease associations from domain-protein, protein-disease and disease-disease relationships. BMC Syst Biol 2016; 10 Suppl 1:4. [PMID: 26818594 PMCID: PMC4895779 DOI: 10.1186/s12918-015-0247-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
Background Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, protein domains might also be associated and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identification of domains associated with human diseases would greatly improve our understanding of the mechanisms of complex human diseases and further improve the prevention, diagnosis and treatment of these diseases. Methods Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches are developed to predict domain-disease associations. Results We demonstrate the effectiveness of all five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the involved parameters. We also study the effects of disease modularization in inferring novel domain-disease associations. Through validation, the AUC (Area Under the receiver operating characteristic Curve) scores for Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes. Conclusions The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view of the disease mechanisms.
Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0247-y) contains supplementary material, which is available to authorized users.
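The AUC values reported above can be read as the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association. A minimal sketch of that rank-based computation follows; the function name `auc` and the example scores and labels are illustrative assumptions, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores -- predicted scores, higher means "more likely associated"
    labels -- 1 for a true association, 0 otherwise
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five candidate domain-disease pairs.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
example_labels = [1, 0, 1, 0, 0]
example_auc = auc(example_scores, example_labels)  # 5 of 6 pairs ranked correctly
```

The quadratic loop is fine for small validation sets; for large ones a sort-based rank sum gives the same value more efficiently.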
Affiliation(s)
- Wangshu Zhang
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, USA.
- Marcelo P Coba
  - Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
  - Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Fengzhu Sun
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, USA.
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China.

4. Segura J, Sorzano COS, Cuenca-Alba J, Aloy P, Carazo JM. Using neighborhood cohesiveness to infer interactions between protein domains. Bioinformatics 2015; 31:2545-52. [DOI: 10.1093/bioinformatics/btv188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/01/2014] [Accepted: 03/28/2015] [Indexed: 01/18/2023]

5. Jeong JC, Chen X. A New Semantic Functional Similarity over Gene Ontology. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:322-334. [PMID: 26357220 DOI: 10.1109/tcbb.2014.2343963] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
Identifying functionally similar or closely related genes and gene products has significant impacts on biological and clinical studies as well as drug discovery. In this paper, we propose an effective and practically useful method measuring both gene and gene product similarity by integrating the topology of gene ontology, known functional domains and their functional annotations. The proposed method is comprehensively evaluated through statistical analysis of the similarities derived from sequence, structure and phylogenetic profiles, and clustering analysis of disease gene clusters. Our results show that the proposed method clearly outperforms other conventional methods. Furthermore, literature analysis also reveals that the proposed method is both statistically and biologically promising for identifying functionally similar genes or gene products. In particular, we demonstrate that the proposed functional similarity metric is capable of discovering new disease-related genes or gene products.

6. Memišević V, Wallqvist A, Reifman J. Reconstituting protein interaction networks using parameter-dependent domain-domain interactions. BMC Bioinformatics 2013; 14:154. [PMID: 23651452 PMCID: PMC3660195 DOI: 10.1186/1471-2105-14-154] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/06/2012] [Accepted: 04/05/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND We can describe protein-protein interactions (PPIs) as sets of distinct domain-domain interactions (DDIs) that mediate the physical interactions between proteins. Experimental data confirm that DDIs are more consistent than their corresponding PPIs, lending support to the notion that analyses of DDIs may improve our understanding of PPIs and lead to further insights into cellular function, disease, and evolution. However, currently available experimental DDI data cover only a small fraction of all existing PPIs and, in the absence of structural data, determining which particular DDI mediates any given PPI is a challenge. RESULTS We present two contributions to the field of domain interaction analysis. First, we introduce a novel computational strategy to merge domain annotation data from multiple databases. We show that when we merged yeast domain annotations from six annotation databases we increased the average number of domains per protein from 1.05 to 2.44, bringing it closer to the estimated average value of 3. Second, we introduce a novel computational method, parameter-dependent DDI selection (PADDS), which, given a set of PPIs, extracts a small set of domain pairs that can reconstruct the original set of protein interactions, while attempting to minimize false positives. Based on a set of PPIs from multiple organisms, our method extracted 27% more experimentally detected DDIs than existing computational approaches. CONCLUSIONS We have provided a method to merge domain annotation data from multiple sources, ensuring large and consistent domain annotation for any given organism. Moreover, we provided a method to extract a small set of DDIs from the underlying set of PPIs and we showed that, in contrast to existing approaches, our method was not biased towards DDIs with low or high occurrence counts. Finally, we used these two methods to highlight the influence of the underlying annotation density on the characteristics of extracted DDIs. 
Although increased annotations greatly expanded the possible DDIs, the lack of knowledge of the true biological false positive interactions still prevents an unambiguous assignment of domain interactions responsible for all protein network interactions. Executable files and examples are given at: http://www.bhsai.org/downloads/padds/
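The idea of extracting a small set of domain pairs that can reconstruct a set of PPIs resembles a set-cover problem. The sketch below uses a plain greedy set-cover heuristic to show the shape of that problem; it is explicitly not the PADDS algorithm, has no parameters for controlling false positives, and the function name `greedy_domain_pairs` and the toy proteins and domains are hypothetical.

```python
from itertools import product


def greedy_domain_pairs(ppis, domains):
    """Greedy set cover: pick domain pairs until every explainable PPI is covered.

    ppis    -- list of (protein_a, protein_b) tuples
    domains -- dict: protein -> set of domain identifiers
    Returns the selected domain pairs in the order they were chosen.
    """
    # Map each candidate domain pair to the PPIs it could explain.
    covers = {}
    for a, b in ppis:
        for da, db in product(domains.get(a, ()), domains.get(b, ())):
            pair = tuple(sorted((da, db)))
            covers.setdefault(pair, set()).add((a, b))

    # Only PPIs with at least one candidate domain pair can be covered.
    uncovered = set().union(*covers.values()) if covers else set()
    selected = []
    while uncovered:
        # Choose the pair explaining the most uncovered PPIs;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda p: len(covers[p] & uncovered))
        selected.append(best)
        uncovered -= covers[best]
    return selected


# Toy input (hypothetical proteins P1..P4 and domains D1..D4).
toy_domains = {"P1": {"D1"}, "P2": {"D2", "D3"}, "P3": {"D2"}, "P4": {"D4"}}
toy_ppis = [("P1", "P2"), ("P1", "P3"), ("P4", "P2")]
toy_selection = greedy_domain_pairs(toy_ppis, toy_domains)
```

In this toy input the pair (D1, D2) explains two PPIs and is chosen first. A greedy choice of this kind tends to favor frequently occurring domain pairs, which is the sort of occurrence-count bias the abstract says PADDS avoids.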
Affiliation(s)
- Vesna Memišević
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD 21702, USA

7. Diversity in genetic in vivo methods for protein-protein interaction studies: from the yeast two-hybrid system to the mammalian split-luciferase system. Microbiol Mol Biol Rev 2012; 76:331-82. [PMID: 22688816 DOI: 10.1128/mmbr.05021-11] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Indexed: 12/14/2022]
Abstract
The yeast two-hybrid system pioneered the field of in vivo protein-protein interaction methods and undisputedly gave rise to a palette of ingenious techniques that are constantly pushing further the limits of the original method. Sensitivity and selectivity have improved because of various technical tricks and experimental designs. Here we present an exhaustive overview of the genetic approaches available to study in vivo binary protein interactions, based on two-hybrid and protein fragment complementation assays. These methods have been engineered and employed successfully in microorganisms such as Saccharomyces cerevisiae and Escherichia coli, but also in higher eukaryotes. From single binary pairwise interactions to whole-genome interactome mapping, the self-reassembly concept has been employed widely. Innovative studies report the use of proteins such as ubiquitin, dihydrofolate reductase, and adenylate cyclase as reconstituted reporters. Protein fragment complementation assays have extended the possibilities in protein-protein interaction studies, with technologies that enable spatial and temporal analyses of protein complexes. In addition, one-hybrid and three-hybrid systems have broadened the types of interactions that can be studied and the findings that can be obtained. Applications of these technologies are discussed, together with the advantages and limitations of the available assays.

8. Kim Y, Min B, Yi GS. IDDI: integrated domain-domain interaction and protein interaction analysis system. Proteome Sci 2012; 10 Suppl 1:S9. [PMID: 22759586 PMCID: PMC3380739 DOI: 10.1186/1477-5956-10-s1-s9] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/11/2022]
Abstract
Background Deciphering protein-protein interactions (PPIs) at the domain level provides valuable information about the binding mechanisms and functional roles of interacting proteins. The 3D structures of protein complexes are a reliable source of domain-domain interactions (DDIs), but the number of solved structures is very limited. Several resources of computationally predicted DDIs have been generated, but they are scattered in various places and their predictions show erratic performance. A well-organized PPI and DDI analysis system integrating these data with a fair scoring system is necessary. Method We integrated three structure-based DDI datasets and twenty computationally predicted DDI datasets and constructed an interaction analysis system, named IDDI, which enables users to browse protein and domain interactions together with their relationships. To integrate heterogeneous DDI information, a novel scoring scheme is introduced to determine the reliability of each DDI by considering the prediction scores of each DDI, the confidence levels of each prediction method in the datasets, and the independence between predicted datasets. In addition, we connected this DDI information to comprehensive PPI information and developed a unified interface for exploring interaction networks at both the protein and domain levels. Result IDDI provides 204,705 DDIs among a total of 7,351 Pfam domains in the current version. The total number of DDIs is eight times that of previous studies. Owing to this increase in data, 50.4% of PPIs could be correlated with DDIs, which is more than twice the proportion of previous resources. The newly designed scoring scheme also outperformed the previous system in accuracy. The user interface of the IDDI system provides interactive, interconnected investigation of proteins and domains in interactions. A specific example is presented to show the efficiency of the system in acquiring comprehensive information on a target protein with its PPI and DDI relationships. IDDI is freely available at http://pcode.kaist.ac.kr/iddi/.
Affiliation(s)
- Yul Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
- Bumki Min
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea
- Gwan-Su Yi
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea

9. Jeong JC, Lin X, Chen XW. On position-specific scoring matrix for protein function prediction. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:308-315. [PMID: 20855926 DOI: 10.1109/tcbb.2010.93] [Citation(s) in RCA: 117] [Impact Index Per Article: 9.0] [Indexed: 05/29/2023]
Abstract
While genome sequencing projects have generated tremendous amounts of protein sequence data for a vast number of genomes, substantial portions of most genomes are still unannotated. Despite the success of experimental methods for identifying protein functions, they are often lab intensive and time consuming. Thus, it is only practical to use in silico methods for genome-wide functional annotation. In this paper, we propose new features extracted from protein sequence only and machine learning-based methods for computational function prediction. These features are derived from a position-specific scoring matrix, which has shown great potential in other bioinformatics problems. We evaluate these features using four different classifiers and yeast protein data. Our experimental results show that features derived from the position-specific scoring matrix are appropriate for automatic function annotation.
Affiliation(s)
- Jong Cheol Jeong
  - Electrical Engineering and Computer Science Department, University of Kansas, Lawrence, KS 66045, USA.

10. Cai L, Pan H, Trzciński K, Thompson CM, Wu Q, Kramnik I. MYBBP1A: a new Ipr1's binding protein in mice. Mol Biol Rep 2010; 37:3863-8. [PMID: 20221700 PMCID: PMC3084015 DOI: 10.1007/s11033-010-0042-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 11/25/2009] [Accepted: 02/24/2010] [Indexed: 12/16/2022]
Abstract
Infection with Mycobacterium tuberculosis (MTB) can cause different outcomes in hosts with variant genetic backgrounds. Previously, we identified the intracellular pathogen resistance 1 (Ipr1) gene, which confers resistance to MTB infection in a mouse model. However, until now, little has been known about its binding proteins, even for its human homolog, SP110. In this study, the canine homolog of mouse Ipr1 was found to have an extra domain structure, h.1.5.1. Thirty potential candidate proteins were predicted to bind canine Ipr1, each characterized by an interacting structure with h.1.5.1. Among them, MYBBP1A was verified to bind both Ipr1 and eGFP-Ipr1 in mouse macrophage J774A.1 clone 21 cells using co-immunoprecipitation. Using the constructed high-confidence Ipr1-involved network, we suggest that Ipr1 might be involved in the apoptosis pathway via MYBBP1A.
Affiliation(s)
- Lei Cai
  - Department of Immunology and Infectious Diseases, Harvard School of Public Health, 667 Huntington Avenue, Boston, MA 02115, USA.

11. Yellaboina S, Tasneem A, Zaykin DV, Raghavachari B, Jothi R. DOMINE: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Res 2010; 39:D730-5. [PMID: 21113022 PMCID: PMC3013741 DOI: 10.1093/nar/gkq1229] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Indexed: 11/20/2022]
Abstract
DOMINE is a comprehensive collection of known and predicted domain–domain interactions (DDIs) compiled from 15 different sources. The updated DOMINE includes 2285 new DDIs inferred from experimentally characterized high-resolution three-dimensional structures, and about 3500 novel predictions by five computational approaches published over the last 3 years. These additions bring the total number of unique DDIs in the updated version to 26 219 among 5140 unique Pfam domains, a 23% increase compared to 20 513 unique DDIs among 4346 unique domains in the previous version. The updated version now contains 6634 known DDIs, and features a new classification scheme to assign confidence levels to predicted DDIs. DOMINE will serve as a valuable resource to those studying protein and domain interactions. Most importantly, DOMINE will serve as an excellent reference not only to bench scientists testing for new interactions but also to bioinformaticians seeking to predict novel protein–protein interactions based on the DDIs. The contents of DOMINE are available at http://domine.utdallas.edu.
Affiliation(s)
- Sailu Yellaboina
  - Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA

12. Kerrigan JJ, Xie Q, Ames RS, Lu Q. Production of protein complexes via co-expression. Protein Expr Purif 2010; 75:1-14. [PMID: 20692346 DOI: 10.1016/j.pep.2010.07.015] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 06/08/2010] [Revised: 07/22/2010] [Accepted: 07/31/2010] [Indexed: 12/21/2022]
Abstract
Multi-protein complexes are involved in essentially all cellular processes. A protein's function is defined by a combination of its own properties, its interacting partners, and the stoichiometry of each. Depending on binding partners, a transcription factor can function as an activator in one instance and a repressor in another. The study of protein function or malfunction is best performed in the relevant context. While many protein complexes can be reconstituted from individual component proteins after being produced individually, many others require co-expression of their native partners in the host cells for proper folding, stability, and activity. Protein co-expression has led to the production of a variety of biologically active complexes in sufficient quantities for biochemical, biophysical, and structural studies, and for high-throughput screens. This article summarizes examples of such cases and discusses critical considerations in selecting co-expression partners, and strategies to achieve successful production of protein complexes.
Affiliation(s)
- John J Kerrigan
  - Biological Reagents & Assay Development, Platform Technology & Science, GlaxoSmithKline R&D, 1250 South Collegeville Road, Collegeville, PA 19426, USA

13. Liu Y, Tozeren A. Modular composition predicts kinase/substrate interactions. BMC Bioinformatics 2010; 11:349. [PMID: 20579376 PMCID: PMC2912303 DOI: 10.1186/1471-2105-11-349] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/04/2009] [Accepted: 06/25/2010] [Indexed: 01/14/2023]
Abstract
Background Phosphorylation events direct the flow of signals and metabolites along cellular protein networks. Current annotations of kinase-substrate binding events are far from complete. In this study, we scanned the entire set of human protein sequences using the PROSITE domain annotation tool to identify patterns of domain composition in kinases and their substrates. We identified statistically enriched pairs of strings of domains (signature pairs) in kinase-substrate couples presented in the 2006 version of the PTM database. Results The signature pairs enriched in kinase-substrate binding interactions turned out to be highly specific to kinase subtypes. The resulting list of signature pairs predicted kinase-substrate interactions in a validation dataset not used in learning with high statistical accuracy. Conclusions The method presented here produces predictions of protein phosphorylation events with high accuracy and mid-level coverage. Our method can be used in expanding the currently available drafts of cell signaling pathways and thus will be an important tool in the development of combination drug therapies targeting complex diseases.
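One common way to decide whether a signature pair is statistically enriched among kinase-substrate couples is a one-sided hypergeometric test. The sketch below shows that generic test only; the abstract does not state which statistic the authors used, and the function name `hypergeom_pvalue`, its argument names, and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """One-sided hypergeometric tail probability P(X >= k).

    N -- total number of candidate protein pairs considered
    K -- pairs among them that carry the signature pair
    n -- known kinase-substrate pairs (the "drawn" sample)
    k -- known kinase-substrate pairs that carry the signature pair
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k):
        raise ValueError("counts must satisfy 0 <= K, n <= N and k >= 0")
    upper = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., upper signature-carrying pairs.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical counts: 10 candidate pairs, 3 carry the signature,
# 3 are known kinase-substrate pairs, and all 3 of those carry it.
p_value = hypergeom_pvalue(k=3, K=3, n=3, N=10)  # 1 / C(10, 3)
```

A small p-value indicates that the signature pair occurs among known kinase-substrate couples more often than expected by chance; with many signature pairs tested, a multiple-testing correction would normally be applied as well.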
Affiliation(s)
- Yichuan Liu
  - Center for Integrated Bioinformatics, Drexel University, Philadelphia, PA 19104, USA