1. Rasti S, Vogiatzis C. A survey of computational methods in protein–protein interaction networks. Ann Oper Res 2019; 276:35-87. [DOI: 10.1007/s10479-018-2956-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/03/2025]
2. Functional characterization of human genes from exon expression and RNA interference results. Methods Mol Biol 2013; 910:33-53. [PMID: 22821591 DOI: 10.1007/978-1-61779-965-5_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Complex biological systems comprise a large number of interacting molecules. Identifying and characterizing in detail the functions of the genes and proteins involved is crucial for modeling and understanding such systems. To interrogate the various cellular processes, high-throughput techniques such as the Affymetrix Exon Array or RNA interference (RNAi) screens are powerful experimental approaches for functional genomics. However, they typically yield long gene lists that require computational methods to further analyze and functionally annotate the experimental results and to gain more insight into important molecular interactions. Here, we focus on bioinformatics software tools for the functional interpretation of exon expression data to discover alternative splicing events and their impact on gene and protein architecture, molecular networks, and pathways. We additionally demonstrate how to explore large lists of candidate genes, such as those resulting from RNAi screens. In particular, our example application studies show how to analyze the function of human genes that play a major role in human stem cells or viral infections.
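The chapter does not spell out how splicing events are scored. One widely used measure for exon arrays is the splicing index: the log2 ratio, between two conditions, of each exon's signal after normalizing it by the gene-level signal of the same condition. The Python sketch below assumes summarized exon and gene intensities are already available; the function names, the toy values and the cutoff of 1.0 (a two-fold change) are illustrative choices, not the tools described in the chapter.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Splicing index of one exon between conditions A and B.

    Each exon signal is first divided by the gene-level signal of the same
    condition, so that an overall change in gene expression cancels out.
    """
    normalized_a = exon_a / gene_a
    normalized_b = exon_b / gene_b
    return math.log2(normalized_a / normalized_b)


def flag_exons(exons, cutoff=1.0):
    """Return IDs of exons whose absolute splicing index reaches the cutoff.

    `exons` maps an exon ID to (exon_a, gene_a, exon_b, gene_b) intensities.
    """
    return [
        exon_id
        for exon_id, (exon_a, gene_a, exon_b, gene_b) in exons.items()
        if abs(splicing_index(exon_a, gene_a, exon_b, gene_b)) >= cutoff
    ]


# Toy intensities: the gene doubles overall between conditions,
# but exon e2 does not follow it.
exons = {
    "e1": (200.0, 1000.0, 400.0, 2000.0),  # tracks the gene: index 0
    "e2": (400.0, 1000.0, 200.0, 2000.0),  # relatively depleted in B: index 2
}
print(flag_exons(exons))  # ['e2']
```

Normalizing by the gene-level signal is what separates a splicing change from a plain change in expression: exon e1 doubles along with its gene and is not flagged.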
3. Mazandu GK, Mulder NJ. Function prediction and analysis of Mycobacterium tuberculosis hypothetical proteins. Int J Mol Sci 2012; 13:7283-7302. [PMID: 22837694 PMCID: PMC3397526 DOI: 10.3390/ijms13067283] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Received: 04/02/2012] [Revised: 05/28/2012] [Accepted: 06/07/2012] [Indexed: 11/16/2022]
Abstract
High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including crucial microbial pathogens of humans, animals and plants. However, up to 50% of genes within a genome are often labeled "unknown", "uncharacterized" or "hypothetical", limiting our understanding of virulence and pathogenicity of these organisms. Even though biological functions of proteins encoded by these genes are not known, many of them have been predicted to be involved in key processes in these organisms. In particular, for Mycobacterium tuberculosis, some of these "hypothetical" proteins, for example those belonging to the Pro-Glu or Pro-Pro-Glu (PE/PPE) family, have been suspected to play a crucial role in the intracellular lifestyle of this pathogen, and may contribute to its survival in different environments. We have generated a functional interaction network for Mycobacterium tuberculosis proteins and used this to predict functions for many of its hypothetical proteins. Here we performed functional enrichment analysis of these proteins based on their predicted biological functions to identify annotations that are statistically relevant, and analysed and compared network properties of hypothetical proteins to the known proteins. From the statistically significant annotations and network information, we have tried to derive biologically meaningful annotations related to infection and disease. This quantitative analysis provides an overview of the functional contributions of Mycobacterium tuberculosis "hypothetical" proteins to many basic cellular functions, including its adaptability in the host system and its ability to evade the host immune response.
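Functional enrichment of a protein set is commonly assessed with a one-sided hypergeometric test: given a background of N annotated proteins, K of which carry a term, how likely is it to see k or more term-carrying proteins in a study set of n? The sketch below is a generic version of that test, assuming annotations are supplied as a mapping from protein ID to a set of terms; the identifiers are made up, and the paper's exact statistic and multiple-testing correction are not stated in the abstract.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def enrichment_pvalue(study, annotations, term):
    """One-sided enrichment p-value of `term` in a study set.

    `annotations` maps every protein in the background population to its set
    of annotation terms; study proteins outside that population are ignored.
    """
    population = set(annotations)
    study_set = set(study) & population
    n_pop = len(population)
    n_term = sum(1 for p in population if term in annotations[p])
    n_study = len(study_set)
    k = sum(1 for p in study_set if term in annotations[p])
    return hypergeom_sf(k, n_pop, n_term, n_study)


# Toy background of 10 proteins, 3 of which carry the term "T".
annotations = {f"p{i}": ({"T"} if i < 3 else set()) for i in range(10)}
print(enrichment_pvalue(["p0", "p1", "p2"], annotations, "T"))  # 1/120, about 0.0083
```

When many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before any annotation is called statistically relevant.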
Affiliation(s)
- Nicola J. Mulder
- Author to whom correspondence should be addressed; E-Mail: ; Tel.: +27-21-406-6058; Fax: +27-21-406-6068
4. Zhang YN, Pan XY, Huang Y, Shen HB. Adaptive compressive learning for prediction of protein-protein interactions from primary sequence. J Theor Biol 2011; 283:44-52. [PMID: 21635901 DOI: 10.1016/j.jtbi.2011.05.023] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 11/11/2010] [Revised: 04/20/2011] [Accepted: 05/16/2011] [Indexed: 12/11/2022]
Abstract
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to identifying novel PPIs by integrating experimental biological knowledge, many difficulties remain because of a lack of sufficient protein structural and functional information. Methods that predict PPIs from amino acid sequences alone are therefore highly desirable. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence, and it has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower-dimensional, more condensed space while taking the sparsity of the original signal into account. What makes compressed sensing especially attractive for protein sequence analysis is that the compressed signal can be reconstructed from far fewer measurements than are usually considered necessary under traditional Nyquist sampling theory. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional discrete protein models and has great potential to be extended to many other complicated biological systems.
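The compression step described above amounts to multiplying a high-dimensional feature vector x by a short, wide measurement matrix Φ, giving y = Φx with far fewer entries. The sketch below uses a Gaussian random matrix, a standard choice in compressed sensing theory; the paper's adaptive scheme, its actual feature vectors and the dimensions used here are not taken from the source. Recovering x from y would additionally require a sparse solver such as orthogonal matching pursuit or L1 minimization, which is not shown.

```python
import random


def gaussian_measurement_matrix(m, n, seed=0):
    """Return an m x n matrix with i.i.d. N(0, 1/m) entries (m << n).

    A fixed seed makes the same projection apply to every protein.
    """
    rng = random.Random(seed)
    sigma = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Measurement step y = phi @ x: map an n-dim vector to m dims."""
    if any(len(row) != len(x) for row in phi):
        raise ValueError("feature vector length must match matrix width")
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


# Hypothetical sizes: a sparse 400-dim feature vector compressed to 40 values.
n, m = 400, 40
phi = gaussian_measurement_matrix(m, n)
x = [0.0] * n
x[7], x[123], x[350] = 2.0, -1.0, 0.5  # only three non-zero features
y = compress(phi, x)
print(len(y))  # 40
```

Because the projection is linear and fixed, the compressed vectors of different proteins remain directly comparable and can be fed to any downstream classifier.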
Affiliation(s)
- Ya-Nan Zhang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
5.
Abstract
Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46 900 domain interactions predicted by four computational methods: domain phylogenetic profiling, the domain pair exclusion algorithm, correlated mutations and domain interaction prediction in a discriminative way. Additionally, predictions are filtered to exclude domain pairs that the Negatome database reports as non-interacting. The DIMA Web site allows users to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
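Two of the operations named above are easy to illustrate: discarding predicted domain pairs listed in a negative set such as Negatome, and extracting the interaction network around a domain of interest. The sketch below assumes domain pairs are given as plain tuples of identifiers; the identifiers and function names are hypothetical and do not reflect how DIMA is implemented.

```python
def filter_predictions(predicted, negatome):
    """Drop predicted domain pairs that a negative set lists as non-interacting.

    Pairs are treated as unordered, so (A, B) and (B, A) match.
    """
    negative = {frozenset(pair) for pair in negatome}
    return [pair for pair in predicted if frozenset(pair) not in negative]


def domain_network(pairs, domain):
    """Return the interaction network among `domain` and its direct partners."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    nodes = {domain} | adjacency.get(domain, set())
    # Keep only edges whose two ends both lie inside the neighborhood.
    return {d: adjacency.get(d, set()) & nodes for d in nodes}


known = [("D1", "D5")]                                  # e.g. structure-based
predicted = [("D1", "D2"), ("D1", "D3"), ("D2", "D4")]  # computational
negatome = [("D3", "D1")]                               # reported non-interacting

kept = filter_predictions(predicted, negatome)
network = domain_network(known + kept, "D1")
print(sorted(network["D1"]))  # ['D2', 'D5']
```

Filtering before the network is built means a vetoed pair such as D1–D3 never appears in any neighborhood view.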
Affiliation(s)
- Qibin Luo
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
6. Pan XY, Zhang YN, Shen HB. Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features. J Proteome Res 2010; 9:4992-5001. [PMID: 20698572 DOI: 10.1021/pr100618t] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Indexed: 11/28/2022]
Abstract
Protein-protein interaction (PPI) is at the core of the entire interactomic system of any living organism. Although many human protein-protein interactions have been determined experimentally, their number is still small compared with the estimated ∼300,000 protein-protein interactions in humans. Hence, developing automated computational methods that predict protein-protein interactions accurately and efficiently remains urgent and challenging. In this paper, we propose a novel hierarchical LDA-RF (latent Dirichlet allocation-random forest) model that predicts human protein-protein interactions directly from protein primary sequences. The model achieves a high success rate and handles large-scale data sets well by uncovering the hidden internal structures buried in noisy amino acid sequences within a low-dimensional latent semantic space. First, local sequential features represented by conjoint triads are constructed from the sequences. Then the generative LDA model projects the original feature space into the latent semantic space to obtain low-dimensional latent topic features, which reflect the hidden structures between proteins. Finally, a random forest model predicts the probability that two proteins interact. Our results show that the proposed latent topic features are very promising for PPI prediction and could also become a powerful strategy for many other bioinformatics problems. LDA-RF is freely available as a web server at http://www.csbio.sjtu.edu.cn/bioinf/LR_PPI for academic use.
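Conjoint triad features group the 20 standard amino acids into seven classes and count every run of three consecutive residues by its class pattern, giving a 7 x 7 x 7 = 343-dimensional vector per protein. The sketch below assumes the seven-class grouping commonly used in the conjoint triad literature; the paper may group residues differently or normalize the counts, the concatenated pair vector is a hypothetical choice, and the LDA projection and random forest stages are not shown.

```python
# Seven-class grouping of the 20 standard amino acids, assumed here from the
# conjoint triad literature; the paper may use a different grouping.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Count class triads of consecutive residues in a 343-dim vector.

    Triads containing non-standard residues (e.g. X, U) are skipped.
    Counts are left raw; normalization is a separate, optional step.
    """
    counts = [0] * 343
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue
        counts[a * 49 + b * 7 + c] += 1
    return counts


def pair_features(seq_a, seq_b):
    """Concatenate both proteins' vectors into one 686-dim pair vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


vec = conjoint_triad("AGVC")
print([i for i, v in enumerate(vec) if v])  # [0, 6]
```

Mapping residues to classes before counting keeps the vector size fixed at 343 instead of the 8000 possible residue triples, which is part of why these features suit large-scale data.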
Affiliation(s)
- Xiao-Yong Pan
- Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, Shanghai, China
7. Zhang S, Chen H, Liu K, Sun Z. Inferring protein function by domain context similarities in protein-protein interaction networks. BMC Bioinformatics 2009; 10:395. [PMID: 19954509 PMCID: PMC2793267 DOI: 10.1186/1471-2105-10-395] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 03/26/2009] [Accepted: 12/02/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genome sequencing projects generate massive amounts of sequence data, but the functions of many proteins remain unknown. The availability of large-scale protein-protein interaction data sets makes it possible to develop new function prediction methods based on protein-protein interaction (PPI) networks. Although several existing methods combine multiple information resources, there has been no study that integrates protein domain information and PPI networks to predict protein functions. RESULTS Domain context similarity can be a useful index for predicting protein function similarity. The prediction accuracy of our method in yeast is between 63% and 67%, and the method outperforms the other methods in terms of ROC curves. CONCLUSION This paper presents a novel protein function prediction method that combines protein domain composition information and PPI networks. Performance evaluations show that this method outperforms existing methods.
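The abstract does not define domain context similarity precisely. As a simplified stand-in, the sketch below scores each candidate function of a protein by summing, over its PPI partners annotated with that function, the Jaccard similarity between the two proteins' domain sets. The Jaccard measure, the additive voting scheme and the toy identifiers are assumptions made for illustration, not the paper's method.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_functions(protein, neighbors, domains, functions):
    """Rank candidate functions for `protein` by domain-weighted neighbor votes.

    Each PPI partner votes for its known functions with a weight equal to the
    Jaccard similarity between the two proteins' domain sets.
    """
    own_domains = domains.get(protein, set())
    scores = {}
    for partner in neighbors.get(protein, []):
        weight = jaccard(own_domains, domains.get(partner, set()))
        for function in functions.get(partner, set()):
            scores[function] = scores.get(function, 0.0) + weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


domains = {"P": {"d1", "d2"}, "A": {"d1", "d2"}, "B": {"d3"}, "C": {"d1"}}
neighbors = {"P": ["A", "B", "C"]}
functions = {"A": {"f1"}, "B": {"f2"}, "C": {"f1", "f3"}}
print(predict_functions("P", neighbors, domains, functions))
# [('f1', 1.5), ('f3', 0.5), ('f2', 0.0)]
```

Partner B shares no domains with P, so its function f2 gets zero weight even though B is a direct interaction partner; a plain neighbor-counting method would rank f2 equal to f3.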
Affiliation(s)
- Song Zhang
- MOE Key Laboratory of Bioinformatics, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing, PR China.
8. Björkholm P, Sonnhammer ELL. Comparative analysis and unification of domain–domain interaction networks. Bioinformatics 2009; 25:3020-5. [DOI: 10.1093/bioinformatics/btp522] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
9. Blankenburg H, Finn RD, Prlić A, Jenkinson AM, Ramírez F, Emig D, Schelhorn SE, Büch J, Lengauer T, Albrecht M. DASMI: exchanging, annotating and assessing molecular interaction data. Bioinformatics 2009; 25:1321-8. [PMID: 19420069 PMCID: PMC2677739 DOI: 10.1093/bioinformatics/btp142] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Ever-increasing amounts of biological interaction data are being accumulated worldwide, but they are currently not readily accessible to the biologist at a single site. New techniques are required for retrieving, sharing and presenting data spread over the Internet. RESULTS We introduce the DASMI system for the dynamic exchange, annotation and assessment of molecular interaction data. DASMI is based on the widely used Distributed Annotation System (DAS) and consists of a data exchange specification, web servers for providing the interaction data and clients for data integration and visualization. The decentralized architecture of DASMI affords the online retrieval of the most recent data from distributed sources and databases. DASMI can also be extended easily by adding new data sources and clients. We describe all DASMI components and demonstrate their use for protein and domain interactions. AVAILABILITY The DASMI tools are available at http://www.dasmi.de/ and http://ipfam.sanger.ac.uk/graph. The DAS registry and the DAS 1.53E specification are available at http://www.dasregistry.org/.
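In DASMI the interaction data come from distributed DAS servers queried over the Internet. The sketch below replaces those servers with in-memory lists (an assumption for illustration; no DAS requests are made, and the source names are hypothetical) and shows one simple way a client could integrate them: collapsing undirected duplicates and recording which sources support each interaction.

```python
def merge_sources(sources):
    """Map each undirected interaction to the set of sources reporting it.

    `sources` maps a source name to a list of (protein_a, protein_b) pairs.
    """
    merged = {}
    for name, interactions in sources.items():
        for a, b in interactions:
            key = tuple(sorted((a, b)))  # (P2, P1) and (P1, P2) collapse
            merged.setdefault(key, set()).add(name)
    return merged


def supported_by(merged, min_sources=2):
    """Interactions reported by at least `min_sources` independent sources."""
    return sorted(pair for pair, names in merged.items() if len(names) >= min_sources)


# In-memory stand-ins for remote DAS interaction servers.
sources = {
    "source_a": [("P1", "P2"), ("P2", "P3")],
    "source_b": [("P2", "P1")],
}
merged = merge_sources(sources)
print(supported_by(merged))  # [('P1', 'P2')]
```

Keeping the set of supporting sources, rather than just a count, lets a client later show the user where each interaction came from.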
Affiliation(s)
- Hagen Blankenburg
- Max Planck Institute for Informatics, Campus E 1.4, 66123 Saarbrücken, Germany
10. Blankenburg H, Ramírez F, Büch J, Albrecht M. DASMIweb: online integration, analysis and assessment of distributed protein interaction data. Nucleic Acids Res 2009; 37:W122-8. [PMID: 19502495 PMCID: PMC2703953 DOI: 10.1093/nar/gkp438] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
In recent years, we have witnessed a substantial increase in the amount of available protein interaction data. However, most data are currently not readily accessible to the biologist at a single site, but are scattered over multiple online repositories. Therefore, we have developed the DASMIweb server, which affords the integration, analysis and qualitative assessment of distributed sources of interaction data in a dynamic fashion. Since DASMIweb allows many different resources of protein and domain interactions to be queried simultaneously, it serves as an important starting point for interactome studies and helps the user find publicly accessible interaction data with minimal effort. The pool of queried resources is fully configurable and supports the inclusion of the user's own interaction data or confidence scores. In particular, DASMIweb integrates confidence measures such as functional similarity scores to assess individual interactions. The retrieved results can be exported in different file formats such as MITAB or SIF. DASMIweb is freely available at http://www.dasmiweb.de.
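SIF, one of the export formats mentioned above, is a line-based format in which each line lists a source node, an interaction type and a target node. The sketch below writes tab-separated SIF lines for interactions whose confidence score meets a threshold; the `pp` relationship label, the threshold and the assumption that each score (for example a functional similarity score) has already been computed are illustrative choices, not DASMIweb's behavior. The scores are made up.

```python
def to_sif(interactions, min_score=0.0, relation="pp"):
    """Serialize scored interactions as tab-separated SIF lines.

    Each line is: source <TAB> relationship type <TAB> target. Interactions
    scoring below `min_score` are left out.
    """
    lines = [
        f"{a}\t{relation}\t{b}"
        for a, b, score in interactions
        if score >= min_score
    ]
    return "".join(line + "\n" for line in lines)


# (protein_a, protein_b, precomputed confidence score)
interactions = [("P1", "P2", 0.9), ("P1", "P3", 0.2), ("P2", "P4", 0.7)]
print(to_sif(interactions, min_score=0.5), end="")
```

SIF carries no scores itself, so the threshold has to be applied before export; MITAB, the richer tabular format also mentioned above, can keep confidence values as columns.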
Affiliation(s)
- Hagen Blankenburg
- Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany.
11. Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, Orengo C, Thornton J, Tramontano A. Protein function annotation by homology-based inference. Genome Biol 2009; 10:207. [PMID: 19226439 PMCID: PMC2688287 DOI: 10.1186/gb-2009-10-2-207] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.3] [Indexed: 11/10/2022]
Abstract
Where information on homologous proteins is available, progress is being made in automated prediction of protein function from sequence and structure. With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.
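In its simplest form, homology-based inference copies annotations from the most similar characterized protein. The sketch below assumes search hits have already been parsed (for example from a BLAST run) into dictionaries with `subject`, `identity` and `bitscore` keys; those keys, the 40% identity cutoff and the term IDs are hypothetical. Practical pipelines usually also weigh alignment coverage, E-values and domain architecture before transferring anything.

```python
def transfer_annotations(hits, annotations, min_identity=40.0):
    """Copy annotation terms from the best-scoring annotated homolog.

    `hits` is a list of dicts with hypothetical keys 'subject', 'identity'
    (percent) and 'bitscore'; `annotations` maps subject IDs to term sets.
    Returns an empty set when no hit passes the identity cutoff.
    """
    eligible = [
        hit for hit in hits
        if hit["identity"] >= min_identity and hit["subject"] in annotations
    ]
    if not eligible:
        return set()
    best = max(eligible, key=lambda hit: hit["bitscore"])
    return set(annotations[best["subject"]])


hits = [
    {"subject": "Q1", "identity": 85.0, "bitscore": 300.0},
    {"subject": "Q2", "identity": 30.0, "bitscore": 500.0},  # below cutoff
]
annotations = {"Q1": {"term_A"}, "Q2": {"term_B"}}
print(transfer_annotations(hits, annotations))  # {'term_A'}
```

The identity filter runs before the bit-score ranking, so the higher-scoring but distant hit Q2 cannot donate its annotation at the default cutoff.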
Affiliation(s)
- Yaniv Loewenstein
- Department of Biological Chemistry, The Hebrew University of Jerusalem, Sudarsky Center, Jerusalem 91904, Israel
12. Schelhorn SE, Lengauer T, Albrecht M. An integrative approach for predicting interactions of protein regions. Bioinformatics 2008; 24:i35-41. [PMID: 18689837 DOI: 10.1093/bioinformatics/btn290] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein-protein interactions are commonly mediated by the physical contact of distinct protein regions. Computational identification of interacting protein regions aids in the detailed understanding of protein networks and supports the prediction of novel protein interactions and the reconstruction of protein complexes. RESULTS We introduce an integrative approach for predicting protein region interactions using a probabilistic model fitted to an observed protein network. In particular, we consider globular domains, short linear motifs and coiled-coil regions as potential protein-binding regions. Possible cooperation between multiple regions within the same protein is taken into account. A fine-grained confidence system allows for varying the impact of specific protein interactions and region annotations on the modeling process. We apply our prediction approach to a large training set using a maximum likelihood method, compare different scoring functions for region interactions and validate the predicted interactions against a collection of experimentally observed interactions. In addition, we analyze prediction performance with respect to the inclusion of different region types, the incorporation of confidence values for training data and the utilization of predicted protein interactions. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
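A common way to model protein interactions in terms of their regions is a noisy-OR: each region pair (r, s) interacts with probability λ_rs, and two proteins interact if at least one pair of their regions does, so P(i ~ j) = 1 − ∏(1 − λ_rs) over all region pairs. The sketch below evaluates that probability and the log-likelihood of an observed network under it. The abstract does not give the paper's exact model; its handling of cooperating regions, its confidence weights and the maximum likelihood fitting of λ are not reproduced, and the region labels and probabilities are hypothetical.

```python
import math


def interaction_probability(regions_i, regions_j, lam):
    """Noisy-OR: two proteins interact if at least one region pair does.

    `lam` maps frozenset({r, s}) to the probability that regions r and s
    interact; unlisted pairs are assumed never to interact.
    """
    p_no_contact = 1.0
    for r in regions_i:
        for s in regions_j:
            p_no_contact *= 1.0 - lam.get(frozenset((r, s)), 0.0)
    return 1.0 - p_no_contact


def log_likelihood(observations, regions, lam, eps=1e-12):
    """Log-likelihood of observed interacting / non-interacting protein pairs."""
    total = 0.0
    for (i, j), interacts in observations.items():
        p = interaction_probability(regions[i], regions[j], lam)
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if interacts else math.log(1.0 - p)
    return total


# Hypothetical region annotations and region-pair probabilities.
regions = {"P": ["SH3"], "Q": ["PRM", "KIN"], "R": ["KIN"]}
lam = {frozenset(("SH3", "PRM")): 0.5, frozenset(("SH3", "KIN")): 0.2}
print(interaction_probability(regions["P"], regions["Q"], lam))  # about 0.6
```

A maximum likelihood method would search for the λ values that make `log_likelihood` as large as possible on the training network; that optimization step is left out here.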
13. Protein-protein interactions: analysis and prediction. Modern Genome Annotation 2008. [PMCID: PMC7120725 DOI: 10.1007/978-3-211-75123-7_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Proteins represent the tools and appliances of the cell — they assemble into larger structural elements, catalyze the biochemical reactions of metabolism, transmit signals, move cargo across membrane boundaries and carry out many other tasks. For most of these functions proteins cannot act in isolation but require close cooperation with other proteins to accomplish their task. Often, this collaborative action implies physical interaction of the proteins involved. Accordingly, experimental detection, in silico prediction and computational analysis of protein-protein interactions (PPI) have attracted great attention in the quest for discovering functional links among proteins and deciphering the complex networks of the cell.