1
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China
2
Jha K, Saha S, Karmakar S. Prediction of Protein-Protein Interactions Using Vision Transformer and Language Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3215-3225. [PMID: 37027644 DOI: 10.1109/tcbb.2023.3248797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The knowledge of protein-protein interaction (PPI) helps us to understand proteins' functions, the causes and growth of several diseases, and can aid in designing new drugs. The majority of existing PPI research has relied mainly on sequence-based approaches. With the availability of multi-omics datasets (sequence, 3D structure) and advancements in deep learning techniques, it is feasible to develop a deep multi-modal framework that fuses the features learned from different sources of information to predict PPI. In this work, we propose a multi-modal approach utilizing protein sequence and 3D structure. To extract features from the 3D structure of proteins, we use a pre-trained vision transformer model that has been fine-tuned on the structural representation of proteins. The protein sequence is encoded into a feature vector using a pre-trained language model. The feature vectors extracted from the two modalities are fused and then fed to the neural network classifier to predict the protein interactions. To showcase the effectiveness of the proposed methodology, we conduct experiments on two popular PPI datasets, namely, the human dataset and the S. cerevisiae dataset. Our approach outperforms the existing methodologies to predict PPI, including multi-modal approaches. We also evaluate the contributions of each modality by designing uni-modal baselines. We perform experiments with three modalities as well, having gene ontology as the third modality.
3
Mohseni Behbahani Y, Saighi P, Corsi F, Laine E, Carbone A. LEVELNET to visualize, explore, and compare protein-protein interaction networks. Proteomics 2023; 23:e2200159. [PMID: 37403279 DOI: 10.1002/pmic.202200159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Revised: 04/27/2023] [Accepted: 04/28/2023] [Indexed: 07/06/2023]
Abstract
Physical interactions between proteins are central to all biological processes. Yet, the current knowledge of who interacts with whom in the cell and in what manner relies on partial, noisy, and highly heterogeneous data. Thus, there is a need for methods comprehensively describing and organizing such data. LEVELNET is a versatile and interactive tool for visualizing, exploring, and comparing protein-protein interaction (PPI) networks inferred from different types of evidence. LEVELNET helps to break down the complexity of PPI networks by representing them as multi-layered graphs and by facilitating the direct comparison of their subnetworks toward biological interpretation. It focuses primarily on the protein chains whose 3D structures are available in the Protein Data Bank. We showcase some potential applications, such as investigating the structural evidence supporting PPIs associated to specific biological processes, assessing the co-localization of interaction partners, comparing the PPI networks obtained through computational experiments versus homology transfer, and creating PPI benchmarks with desired properties.
Affiliation(s)
- Yasser Mohseni Behbahani
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Paul Saighi
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Flavia Corsi
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Alessandra Carbone
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
4
Jha K, Karmakar S, Saha S. Graph-BERT and language model-based framework for protein-protein interaction identification. Sci Rep 2023; 13:5663. [PMID: 37024543 PMCID: PMC10079975 DOI: 10.1038/s41598-023-31612-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/28/2022] [Accepted: 03/14/2023] [Indexed: 04/08/2023] Open
Abstract
Identification of protein-protein interactions (PPI) is among the critical problems in the domain of bioinformatics. Previous studies have utilized different AI-based models for PPI classification with advances in artificial intelligence (AI) techniques. The input to these models is the features extracted from different sources of protein information, mainly sequence-derived features. In this work, we present an AI-based PPI identification model utilizing a PPI network and protein sequences. The PPI network is represented as a graph where each node is a protein pair, and an edge is defined between two nodes if there exists a common protein between these nodes. Each node in a graph has a feature vector. In this work, we have used the language model to extract feature vectors directly from protein sequences. The feature vectors for protein in pairs are concatenated and used as a node feature vector of a PPI network graph. Finally, we have used the Graph-BERT model to encode the PPI network graph with sequence-based features and learn the hidden representation of the feature vector for each node. The next step involves feeding the learned representations of nodes to the fully connected layer, the output of which is fed into the softmax layer to classify the protein interactions. To assess the efficacy of the proposed PPI model, we have performed experiments on several PPI datasets. The experimental results demonstrate that the proposed approach surpasses the existing PPI works and designed baselines in classifying PPI.
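The graph construction described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_pair_graph`, the protein identifiers, and the toy two-dimensional embeddings are hypothetical, and the real model obtains its feature vectors from a protein language model and encodes the graph with Graph-BERT.

```python
from itertools import combinations


def build_pair_graph(pairs, embeddings):
    """Build a graph whose nodes are protein pairs (hypothetical helper).

    Two nodes are linked when their pairs share at least one protein.
    Each node feature is the concatenation of the two per-protein vectors.
    """
    nodes = [tuple(pair) for pair in pairs]
    features = [list(embeddings[a]) + list(embeddings[b]) for a, b in nodes]
    edges = [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if set(nodes[i]) & set(nodes[j])  # common protein between the two pairs
    ]
    return nodes, features, edges


# Toy data: identifiers and embedding values are made up for illustration.
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
embeddings = {
    "P1": [0.1, 0.2],
    "P2": [0.3, 0.4],
    "P3": [0.5, 0.6],
    "P4": [0.7, 0.8],
    "P5": [0.9, 1.0],
}
nodes, features, edges = build_pair_graph(pairs, embeddings)
```

Here only the first two pairs share a protein (P2), so the graph has a single edge between node 0 and node 1, while the third pair is an isolated node.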
Affiliation(s)
- Kanchan Jha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
- Sourav Karmakar
  - Department of Computer Science and Engineering, National Institute of Technology Durgapur, Durgapur, West Bengal, 713209, India
- Sriparna Saha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
5
Jha K, Saha S. Analyzing Effect of Multi-Modality in Predicting Protein-Protein Interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:162-173. [PMID: 35259112 DOI: 10.1109/tcbb.2022.3157531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Multiple sources of information about proteins are now available, such as protein sequences, 3D structures, and Gene Ontology (GO). Most work on protein-protein interaction (PPI) identification has used these sources individually, mainly the sequence. New advances in deep learning techniques allow us to leverage multiple sources/modalities of proteins, which complement each other. Some recent works have shown that multi-modal PPI models perform better than uni-modal approaches. This paper investigates whether the performance of multi-modal PPI models is always consistent or depends on other factors such as dataset distribution and the algorithms used to learn features. We have used three modalities for this study: protein sequence, 3D structure, and GO. Various techniques, including deep learning algorithms, are employed to extract features from these sources. The feature vectors from different modalities are then integrated in several combinations (bi-modal and tri-modal) to predict PPI. To conduct this study, we have used Human and S. cerevisiae PPI datasets. The results demonstrate the potential of a multi-modal approach and deep learning techniques in predicting protein interactions. However, the predictive capability of a PPI model also depends on the feature extraction methods, and adding modalities does not always improve performance. In this study, the PPI model integrating two modalities outperforms the designed uni-modal and tri-modal PPI models.
6
Shang L, Zhang Y, Liu Y, Jin C, Yuan Y, Tian C, Ni M, Bo X, Zhang L, Li D, He F, Wang J. A Yeast BiFC-seq Method for Genome-wide Interactome Mapping. Genomics, Proteomics & Bioinformatics 2022; 20:795-807. [PMID: 34314873 PMCID: PMC9880813 DOI: 10.1016/j.gpb.2021.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2020] [Revised: 12/14/2020] [Accepted: 03/10/2021] [Indexed: 01/31/2023]
Abstract
Genome-wide physical protein-protein interaction (PPI) mapping remains a major challenge for current technologies. Here, we reported a high-efficiency BiFC-seq method, yeast-enhanced green fluorescent protein-based bimolecular fluorescence complementation (yEGFP-BiFC) coupled with next-generation DNA sequencing, for interactome mapping. We first applied yEGFP-BiFC method to systematically investigate an intraviral network of the Ebola virus. Two-thirds (9/14) of known interactions of EBOV were recaptured, and five novel interactions were discovered. Next, we used the BiFC-seq method to map the interactome of the tumor protein p53. We identified 97 interactors of p53, more than three-quarters of which were novel. Furthermore, in a more complex background, we screened potential interactors by pooling two BiFC libraries together and revealed a network of 229 interactions among 205 proteins. These results show that BiFC-seq is a highly sensitive, rapid, and economical method for genome-wide interactome mapping.
Affiliation(s)
- Limin Shang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yuehui Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yuchen Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Chaozhi Jin
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Yanzhi Yuan
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Chunyan Tian
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Ming Ni
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Li Zhang
  - Department of Rehabilitation Medicine, Nan Lou; Key Laboratory of Wound Repair and Regeneration of PLA, College of Life Sciences, Chinese PLA General Hospital, Beijing 100853, China
- Dong Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Fuchu He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Jian Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China; School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
7
Protein-protein interaction and non-interaction predictions using gene sequence natural vector. Commun Biol 2022; 5:652. [PMID: 35780196 PMCID: PMC9250521 DOI: 10.1038/s42003-022-03617-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/04/2022] [Accepted: 06/21/2022] [Indexed: 12/02/2022] Open
Abstract
Predicting protein–protein interaction and non-interaction are two important different aspects of multi-body structure predictions, which provide vital information about protein function. Some computational methods have recently been developed to complement experimental methods, but still cannot effectively detect real non-interacting protein pairs. We proposed a gene sequence-based method, named NVDT (Natural Vector combine with Dinucleotide and Triplet nucleotide), for the prediction of interaction and non-interaction. For protein–protein non-interactions (PPNIs), the proposed method obtained accuracies of 86.23% for Homo sapiens and 85.34% for Mus musculus, and it performed well on three types of non-interaction networks. For protein-protein interactions (PPIs), we obtained accuracies of 99.20, 94.94, 98.56, 95.41, and 94.83% for Saccharomyces cerevisiae, Drosophila melanogaster, Helicobacter pylori, Homo sapiens, and Mus musculus, respectively. Furthermore, NVDT outperformed established sequence-based methods and demonstrated high prediction results for cross-species interactions. NVDT is expected to be an effective approach for predicting PPIs and PPNIs. Protein-protein non-interactions and interactions are distinguished and predicted by gene sequence using single nucleotide and contiguous nucleotides combined with machine learning models.
8
Jha K, Saha S, Singh H. Prediction of protein-protein interaction using graph neural networks. Sci Rep 2022; 12:8360. [PMID: 35589837 PMCID: PMC9120162 DOI: 10.1038/s41598-022-12201-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 02/23/2022] [Accepted: 04/18/2022] [Indexed: 01/09/2023] Open
Abstract
Proteins are the essential biological macromolecules required to perform nearly all biological processes and cellular functions. Proteins rarely carry out their tasks in isolation but interact with other proteins (known as protein-protein interaction) present in their surroundings to complete biological activities. Knowledge of protein-protein interactions (PPIs) unravels cellular behavior and functionality. Computational methods automate the prediction of PPI and are less expensive than experimental methods in terms of resources and time. So far, most work on PPI has focused mainly on sequence information. Here, we use a graph convolutional network (GCN) and a graph attention network (GAT) to predict the interaction between proteins by utilizing the proteins' structural information and sequence features. We build the graphs of proteins from their PDB files, which contain 3D coordinates of atoms. The protein graph represents the amino acid network, also known as the residue contact network, where each node is a residue. Two nodes are connected if they have a pair of atoms (one from each node) within the threshold distance. To extract the node/residue features, we use the protein language model. The input to the language model is the protein sequence, and the output is the feature vector for each amino acid of the underlying sequence. We validate the predictive capability of the proposed graph-based approach on two PPI datasets: Human and S. cerevisiae. The results demonstrate the effectiveness of the proposed approach, as it outperforms the previous leading methods. The source code for training and the data used to train the model are available at https://github.com/JhaKanchan15/PPI_GNN.git.
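The contact rule stated in this abstract (two residues are linked if any pair of their atoms lies within a threshold distance) can be illustrated with a small sketch. The function name `residue_contact_edges`, the toy coordinates, and the 5.0 cutoff are assumptions chosen for illustration; the abstract does not give the threshold value, and the authors' code reads coordinates from PDB files rather than from hand-written lists.

```python
import math


def residue_contact_edges(residues, threshold):
    """Return residue index pairs that are in contact (hypothetical helper).

    residues: one inner list of (x, y, z) atom coordinates per residue.
    Two residues are connected when any atom of one lies within
    `threshold` of any atom of the other.
    """
    edges = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            in_contact = any(
                math.dist(atom_a, atom_b) <= threshold
                for atom_a in residues[i]
                for atom_b in residues[j]
            )
            if in_contact:
                edges.append((i, j))
    return edges


# Toy coordinates: three residues placed along the x axis.
residues = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(4.0, 0.0, 0.0)],
    [(20.0, 0.0, 0.0)],
]
edges = residue_contact_edges(residues, threshold=5.0)  # cutoff is an assumed value
```

With these coordinates, the closest atoms of residues 0 and 1 are 3.0 apart, so they form the only edge; residue 2 is too far from both. The all-pairs loop is quadratic in the number of atoms, which is fine for a sketch but would normally be replaced by a spatial index for full-size structures.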
Affiliation(s)
- Kanchan Jha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
- Sriparna Saha
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, Bihar, 801103, India
- Hiteshi Singh
  - Department of Electrical Engineering, Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, 342030, India
9
Muscolino A, Di Maria A, Rapicavoli RV, Alaimo S, Bellomo L, Billeci F, Borzì S, Ferragina P, Ferro A, Pulvirenti A. NETME: on-the-fly knowledge network construction from biomedical literature. Applied Network Science 2022; 7:1. [PMID: 35013714 PMCID: PMC8733431 DOI: 10.1007/s41109-021-00435-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/31/2021] [Accepted: 09/21/2021] [Indexed: 06/14/2023]
Abstract
Background: The rapidly increasing biological literature is a key resource to automatically extract and gain knowledge concerning biological elements and their relations. Knowledge Networks are helpful tools in the context of biological knowledge discovery and modeling.
Results: We introduce a novel system called NETME, which, starting from a set of full-texts obtained from PubMed, through an easy-to-use web interface, interactively extracts biological elements from ontological databases and then synthesizes a network inferring relations among such elements. The results clearly show that our tool is capable of inferring comprehensive and reliable biological networks.
Affiliation(s)
- Antonio Di Maria
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Salvatore Alaimo
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Lorenzo Bellomo
  - Department of Computer Science, University of Pisa, Pisa, Italy
- Fabrizio Billeci
  - Department of Maths and Computer Science, University of Catania, Catania, Italy
- Stefano Borzì
  - Department of Maths and Computer Science, University of Catania, Catania, Italy
- Paolo Ferragina
  - Department of Computer Science, University of Pisa, Pisa, Italy
- Alfredo Ferro
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti
  - Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
10
Hu X, Feng C, Ling T, Chen M. Deep learning frameworks for protein–protein interaction prediction. Comput Struct Biotechnol J 2022; 20:3223-3233. [PMID: 35832624 PMCID: PMC9249595 DOI: 10.1016/j.csbj.2022.06.025] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/15/2022] [Revised: 05/27/2022] [Accepted: 06/12/2022] [Indexed: 11/26/2022] Open
11
Khazen G, Gyulkhandanian A, Issa T, Maroun RC. Getting to know each other: PPIMem, a novel approach for predicting transmembrane protein-protein complexes. Comput Struct Biotechnol J 2021; 19:5184-5197. [PMID: 34630938 PMCID: PMC8476896 DOI: 10.1016/j.csbj.2021.09.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/07/2021] [Revised: 08/23/2021] [Accepted: 09/12/2021] [Indexed: 02/03/2023] Open
Abstract
Because of their considerable number and diversity, membrane proteins and their macromolecular complexes represent the functional units of cells. Their quaternary structure may be stabilized by interactions between the α-helices of different proteins in the hydrophobic region of the cell membrane. Membrane proteins are also potential pharmacological targets par excellence for various diseases. Unfortunately, experimental 3D structures of these proteins and of their complexes with other intramembrane protein partners are scarce due to technical difficulties. To overcome this key problem, we devised PPIMem, a computational approach for the specific prediction of higher-order structures of α-helical transmembrane proteins. The novel approach involves proper identification of the amino acid residues at the interface of molecular complexes with a 3D structure. The identified residues then compose nonlinear interaction motifs that are conveniently expressed as mathematical regular expressions. These are efficiently implemented for motif search in amino acid sequence databases, and for the accurate prediction of intramembrane protein-protein complexes. Our template interface-based approach predicted 21,544 binary complexes between 1,504 eukaryotic plasma membrane proteins across 39 species. We compare our predictions to experimental datasets of protein-protein interactions as a first validation method. The online database that results from the PPIMem algorithm, with the annotated predicted interactions, is implemented as a web server and can be accessed directly at https://transint.univ-evry.fr.
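To make the idea of interface motifs expressed as regular expressions concrete, the sketch below scans a tiny set of sequences with one pattern. Everything in it is hypothetical: the motif `L.{3}G.{2}[AV]`, the sequence names, and the sequences are invented for illustration and are not taken from PPIMem, whose actual motifs are derived from interface residues of known 3D complexes.

```python
import re

# Hypothetical motif: L, any three residues, G, any two residues, then A or V.
MOTIF = re.compile(r"L.{3}G.{2}[AV]")


def find_motif_hits(motif, sequences):
    """Map each sequence name to the 0-based start positions of motif matches.

    Sequences without a match are omitted. Note that `finditer` reports
    non-overlapping matches only.
    """
    hits = {}
    for name, sequence in sequences.items():
        starts = [match.start() for match in motif.finditer(sequence)]
        if starts:
            hits[name] = starts
    return hits


# Made-up sequences standing in for entries of a sequence database.
sequences = {
    "protA": "MKTLAAAGLLVWW",
    "protB": "MKTWWWWGGG",
}
hits = find_motif_hits(MOTIF, sequences)
```

In this toy input only `protA` contains the pattern, starting at position 3 (`LAAAGLLV`). The gaps written as `.{n}` are what allow a motif to describe residues that are distant in sequence but adjacent at the interface.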
Affiliation(s)
- Georges Khazen
  - Computer Science and Mathematics Department, Lebanese American University, Byblos, Lebanon
- Aram Gyulkhandanian
  - Inserm U1204/Université d'Evry/Université Paris-Saclay, Structure-Activité des Biomolécules Normales et Pathologiques, 91025 Evry, France
- Tina Issa
  - Computer Science and Mathematics Department, Lebanese American University, Byblos, Lebanon
- Rachid C Maroun
  - Inserm U1204/Université d'Evry/Université Paris-Saclay, Structure-Activité des Biomolécules Normales et Pathologiques, 91025 Evry, France
12
Mukherjee I, Chakrabarti S. Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes. Comput Struct Biotechnol J 2021; 19:3779-3795. [PMID: 34285778 PMCID: PMC8271121 DOI: 10.1016/j.csbj.2021.06.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022] Open
Abstract
Proteins involved in interactions throughout the course of evolution tend to co-evolve, and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue pair alterations may prove to be detrimental for functional interactions. Hence, determining co-evolutionary pairings that could be structurally or functionally relevant for maintaining the conservation of an inter-protein interaction is important. Inter-protein co-evolution analysis in several complexes utilizing multiple existing methodologies suggested that co-evolutionary pairings can occur in spatially proximal and distant regions in inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method based on mutual information and Bhattacharyya coefficient was developed, validated, and found to perform relatively better than CAPS and EV-complex. Interestingly, while applying the Co-Var measure and EV-complex program on a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in interface and non-interface regions in protein complexes. The Co-Var approach involves determining high degree co-evolutionary pairings that include multiple co-evolutionary connections between particular co-evolved residue positions in one protein with multiple residue positions in the binding partner. Detailed analyses of high degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most of the residue positions forming such co-evolutionary connections mainly occurred within functional domains of constituent proteins and substitution mutations were also common among these positions. The physiological relevance of these predictions suggested that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions. Finally, the Co-Var web server (http://www.hpppi.iicb.res.in/ishi/covar/index.html), which implements this methodology, identifies co-evolutionary pairings in intra- and inter-protein interactions.
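The two quantities this abstract names, mutual information and the Bhattacharyya coefficient, have standard textbook definitions that the sketch below computes for toy alignment columns. This is not the Co-Var score: the abstract does not say how the two are combined, so the code only shows the building blocks, and the function names and example columns are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions given as {symbol: probability}."""
    return sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in set(p) | set(q))


# Toy columns: one residue per aligned sequence, paired across two proteins.
covarying = mutual_information("AAGG", "LLVV")    # the columns change together
independent = mutual_information("AAGG", "LVLV")  # no association
same = bhattacharyya_coefficient({"A": 0.5, "G": 0.5}, {"A": 0.5, "G": 0.5})
disjoint = bhattacharyya_coefficient({"A": 1.0}, {"G": 1.0})
```

Perfectly co-varying columns with two equally frequent states give one bit of mutual information, while independent columns give zero. The Bhattacharyya coefficient is 1 for identical distributions and 0 for distributions with no shared symbols.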
Affiliation(s)
- Ishita Mukherjee
  - Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR) - Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR) - Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
13
Das S, Chakrabarti S. Classification and prediction of protein-protein interaction interface using machine learning algorithm. Sci Rep 2021; 11:1761. [PMID: 33469042 PMCID: PMC7815773 DOI: 10.1038/s41598-020-80900-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/04/2020] [Accepted: 12/15/2020] [Indexed: 01/29/2023] Open
Abstract
Structural insight into the protein-protein interaction (PPI) interface can provide knowledge about the kinetics, thermodynamics and molecular functions of the complex while elucidating its role in diseases and further enabling it as a potential therapeutic target. However, owing to the experimental lag in solving protein-protein complex structures, three-dimensional (3D) knowledge of PPI interfaces can be gained via computational approaches like molecular docking and post-docking analyses. Despite the development of numerous docking tools and techniques, success in identifying native-like interfaces based on docking score functions is limited. Hence, we carried out an in-depth investigation of the structural features of the interface that might successfully delineate native complexes from non-native ones. We identify interface properties that show statistically significant differences between native and non-native interfaces in homomeric and heteromeric protein-protein complexes. Utilizing these properties, a support vector machine (SVM) based classification scheme has been implemented to differentiate native and non-native-like complexes generated using docking decoys. Benchmarking and comparative analyses suggest very good performance of our SVM classifiers. Further, protein interactions that are proven via experimental findings but not resolved structurally were subjected to this approach, where 3D models of the complexes were generated and the most likely interfaces were predicted. A web server called Protein Complex Prediction by Interface Properties (PCPIP) was developed to predict whether the interface of a given protein-protein dimer complex resembles known protein interfaces. The server is freely available at http://www.hpppi.iicb.res.in/pcpip/.
Affiliation(s)
- Subhrangshu Das
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India
14
Slater O, Miller B, Kontoyianni M. Decoding Protein-protein Interactions: An Overview. Curr Top Med Chem 2021; 20:855-882. [PMID: 32101126 DOI: 10.2174/1568026620666200226105312] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/13/2019] [Revised: 11/27/2019] [Accepted: 11/27/2019] [Indexed: 12/24/2022]
Abstract
Drug discovery has focused on the paradigm "one drug, one target" for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.
Affiliation(s)
- Olivia Slater
  - Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Bethany Miller
  - Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Maria Kontoyianni
  - Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
15
Jané P, Gógl G, Kostmann C, Bich G, Girault V, Caillet-Saguy C, Eberling P, Vincentelli R, Wolff N, Travé G, Nominé Y. Interactomic affinity profiling by holdup assay: Acetylation and distal residues impact the PDZome-binding specificity of PTEN phosphatase. PLoS One 2020; 15:e0244613. [PMID: 33382810 PMCID: PMC7774954 DOI: 10.1371/journal.pone.0244613] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Accepted: 12/12/2020] [Indexed: 12/15/2022]
Abstract
Protein domains often recognize short linear protein motifs composed of a core conserved consensus sequence surrounded by less critical, modulatory positions. PTEN, a lipid phosphatase involved in phosphatidylinositol 3-kinase (PI3K) pathway, contains such a short motif located at the extreme C-terminus capable to recognize PDZ domains. It has been shown that the acetylation of this motif could modulate the interaction with several PDZ domains. Here we used an accurate experimental approach combining high-throughput holdup chromatographic assay and competitive fluorescence polarization technique to measure quantitative binding affinity profiles of the PDZ domain-binding motif (PBM) of PTEN. We substantially extended the previous knowledge towards the 266 known human PDZ domains, generating the full PDZome-binding profile of the PTEN PBM. We confirmed that inclusion of N-terminal flanking residues, acetylation or mutation of a lysine at a modulatory position significantly altered the PDZome-binding profile. A numerical specificity index is also introduced as an attempt to quantify the specificity of a given PBM over the complete PDZome. Our results highlight the impact of modulatory residues and post-translational modifications on PBM interactomes and their specificity.
Affiliation(s)
- Pau Jané
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
- Gergő Gógl
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
- Camille Kostmann
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
- Goran Bich
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
- Virginie Girault
  - Unité Récepteurs-canaux, Institut Pasteur, UMR 3571/CNRS, Paris, France
- Pascal Eberling
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
- Renaud Vincentelli
  - Architecture et Fonction des Macromolécules Biologiques (AFMB), CNRS/Aix-Marseille Université, Marseille, France
- Nicolas Wolff
  - Unité Récepteurs-canaux, Institut Pasteur, UMR 3571/CNRS, Paris, France
- Gilles Travé
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
- Yves Nominé
  - (Equipe labelisée Ligue, 2015) Department of Integrative Structural Biology, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), INSERM U1258/CNRS UMR 7104/Université de Strasbourg, Illkirch, France
16
Amalgamation of 3D structure and sequence information for protein-protein interaction prediction. Sci Rep 2020; 10:19171. [PMID: 33154416 PMCID: PMC7645622 DOI: 10.1038/s41598-020-75467-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/18/2020] [Accepted: 09/17/2020] [Indexed: 11/08/2022]
Abstract
Protein is the primary building block of living organisms. It interacts with other proteins and is then involved in various biological processes. Protein-protein interactions (PPIs) help in predicting and hence help in understanding the functionality of the proteins, causes and growth of diseases, and designing new drugs. However, there is a vast gap between the available protein sequences and the identification of protein-protein interactions. To bridge this gap, researchers proposed several computational methods to reveal the interactions between proteins. These methods merely depend on sequence-based information of proteins. With the advancement of technology, different types of information related to proteins are available such as 3D structure information. Nowadays, deep learning techniques are adopted successfully in various domains, including bioinformatics. So, current work focuses on the utilization of different modalities, such as 3D structures and sequence-based information of proteins, and deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first get several illustrations of proteins using their 3D coordinates information, and three attributes, such as hydropathy index, isoelectric point, and charge of amino acids. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a subclass of a convolutional neural network, is utilized to extract features from these representations of proteins. Autocovariance and conjoint triad are two widely used sequence-based methods to encode proteins, which are used here as another modality of protein sequences. A stacked autoencoder is utilized to get the compact form of sequence-based information. Finally, the features obtained from different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs. 
We have experimented on the human PPIs dataset and Saccharomyces cerevisiae PPIs dataset and compared our results with the state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experimentations on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
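The conjoint triad encoding mentioned in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes the commonly used grouping of the 20 amino acids into seven classes, and it returns raw triad counts (the normalisation step, which varies between implementations, is left out). The names `CLASSES` and `conjoint_triad` are hypothetical.

```python
from itertools import product

# Assumed seven-class grouping of residues (by dipole and side-chain volume),
# as commonly used for the conjoint triad descriptor.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(7), repeat=3))  # 7 * 7 * 7 = 343 class triads


def conjoint_triad(sequence: str) -> list:
    """Return a 343-dimensional vector of raw class-triad counts."""
    counts = dict.fromkeys(TRIADS, 0)
    classes = [RESIDUE_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues along the sequence.
    for triad in zip(classes, classes[1:], classes[2:]):
        if None not in triad:  # skip windows containing non-standard residues
            counts[triad] += 1
    return [counts[t] for t in TRIADS]
```

Each protein becomes a fixed-length vector regardless of sequence length, which is what allows the features of two proteins to be concatenated and passed to a classifier.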
17
Zhan XK, You ZH, Li LP, Li Y, Wang Z, Pan J. Using Random Forest Model Combined With Gabor Feature to Predict Protein-Protein Interaction From Protein Sequence. Evol Bioinform Online 2020; 16:1176934320934498. [PMID: 32655275 PMCID: PMC7328357 DOI: 10.1177/1176934320934498] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/05/2020] [Accepted: 05/20/2020] [Indexed: 12/12/2022]
Abstract
Protein-protein interactions (PPIs) play a crucial role in the life cycles of living cells. Thus, it is important to understand the underlying mechanisms of PPIs. Although many high-throughput technologies have generated large amounts of PPI data in different organisms, the experiments for detecting PPIs are still costly and time-consuming. Therefore, novel computational methods are urgently needed for predicting PPIs. For this reason, developing a new computational method for predicting PPIs is drawing more and more attention. In this study, we proposed a novel computational method based on texture features of protein sequences for predicting PPIs. Specifically, the Gabor feature is used to extract texture features and protein evolutionary information from the Position-Specific Scoring Matrix, which is generated by the Position-Specific Iterated Basic Local Alignment Search Tool. Then, random forest–based classifiers are used to infer the protein interactions. When applied to PPI data sets of yeast, human, and Helicobacter pylori, the method obtained good results with average accuracies of 92.10%, 97.03%, and 86.45%, respectively. To better evaluate the proposed method, we compared the Gabor feature, Discrete Cosine Transform, and Local Phase Quantization. Our results show that the proposed method is both feasible and stable and that the Gabor feature descriptor is reliable in extracting protein sequence information. Furthermore, additional experiments have been conducted to predict PPIs on 4 other species data sets. The promising results indicate that our proposed method is both powerful and robust.
Affiliation(s)
- Xin-Ke Zhan
  - School of Information Engineering, Xijing University, Xi'an, China
- Zhu-Hong You
  - School of Information Engineering, Xijing University, Xi'an, China
- Li-Ping Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Yang Li
  - School of Information Engineering, Xijing University, Xi'an, China
- Zheng Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Jie Pan
  - School of Information Engineering, Xijing University, Xi'an, China
18
Armean IM, Lilley KS, Trotter MWB, Pilkington NCV, Holden SB. Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation. Bioinformatics 2019; 34:1884-1892. [PMID: 29390084 PMCID: PMC5972588 DOI: 10.1093/bioinformatics/btx803] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/02/2017] [Accepted: 01/29/2018] [Indexed: 12/11/2022]
Abstract
Motivation: Protein–protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. The standardization and recording of experimental findings is increasingly stored in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. Results: PPI annotations are built combinatorically using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance area under the ROC curve of ≤0.97, outperforming go2ppi, a previously established prediction tool for protein-protein interactions (PPI) based on Gene Ontology (GO) annotations. Availability and implementation: https://github.com/ima23/maxent-ppi. Supplementary information: Supplementary data are available at Bioinformatics online.
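For readers unfamiliar with the area-under-the-ROC-curve figure quoted above, the following minimal sketch computes AUC from its pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score. It is a generic illustration, not taken from the maxent-ppi repository, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(pos) * len(neg))
```

The quadratic loop is adequate for small evaluation sets; a rank-based formulation would be preferable for large ones.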
Affiliation(s)
- Irina M Armean
  - Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge CB2 1GA, UK
- Kathryn S Lilley
  - Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge CB2 1GA, UK
- Matthew W B Trotter
  - Celgene Institute for Translational Research Europe (CITRE), Sevilla 41092, Spain
- Nicholas C V Pilkington
  - Department of Computer Science, Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK
- Sean B Holden
  - Department of Computer Science, Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK
19
Bischof J, Duffraisse M, Furger E, Ajuria L, Giraud G, Vanderperre S, Paul R, Björklund M, Ahr D, Ahmed AW, Spinelli L, Brun C, Basler K, Merabet S. Generation of a versatile BiFC ORFeome library for analyzing protein-protein interactions in live Drosophila. eLife 2018; 7:38853. [PMID: 30247122 PMCID: PMC6177257 DOI: 10.7554/elife.38853] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/01/2018] [Accepted: 09/18/2018] [Indexed: 11/24/2022]
Abstract
Transcription factors achieve specificity by establishing intricate interaction networks that will change depending on the cell context. Capturing these interactions in live condition is however a challenging issue that requires sensitive and non-invasive methods. We present a set of fly lines, called ‘multicolor BiFC library’, which covers most of the Drosophila transcription factors for performing Bimolecular Fluorescence Complementation (BiFC). The multicolor BiFC library can be used to probe two different binary interactions simultaneously and is compatible for large-scale interaction screens. The library can also be coupled with established Drosophila genetic resources to analyze interactions in the developmentally relevant expression domain of each protein partner. We provide proof of principle experiments of these various applications, using Hox proteins in the live Drosophila embryo as a case study. Overall this novel collection of ready-to-use fly lines constitutes an unprecedented genetic toolbox for the identification and analysis of protein-protein interactions in vivo.
Affiliation(s)
- Johannes Bischof
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Edy Furger
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Mikael Björklund
  - Zhejiang University-University of Edinburgh Institute, Zhejiang University, Haining, China
- Christine Brun
  - INSERM, Aix-Marseille Université, Marseille, France
  - TAGC, Centre National de la Recherche Scientifique, Marseille, France
- Konrad Basler
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
20
Kalmykova SD, Arapidi GP, Urban AS, Osetrova MS, Gordeeva VD, Ivanov VT, Govorun VM. In Silico Analysis of Peptide Potential Biological Functions. Russian Journal of Bioorganic Chemistry 2018. [DOI: 10.1134/s106816201804009x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/23/2022]
21
Tran L, Hamp T, Rost B. ProfPPIdb: Pairs of physical protein-protein interactions predicted for entire proteomes. PLoS One 2018; 13:e0199988. [PMID: 30020956 PMCID: PMC6051629 DOI: 10.1371/journal.pone.0199988] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/08/2018] [Accepted: 06/17/2018] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Protein-protein interactions (PPIs) play a key role in many cellular processes. Most annotations of PPIs mix experimental and computational data. The mix optimizes coverage, but obfuscates the annotation origin. Some resources excel at focusing on reliable experimental data. Here, we focused on new pairs of interacting proteins for several model organisms based solely on sequence-based prediction methods. RESULTS: We extracted reliable experimental data about which proteins interact (binary) for eight diverse model organisms from public databases, namely from Escherichia coli, Schizosaccharomyces pombe, Plasmodium falciparum, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, and for the previously used Homo sapiens and Saccharomyces cerevisiae. Those data were the base to develop a PPI prediction method for each model organism. The method used evolutionary information through a profile-kernel Support Vector Machine (SVM). With the resulting eight models, we predicted all possible protein pairs in each organism and made the top predictions available through a web application. Almost all of the PPIs made available were predicted between proteins that have not been observed in any interaction, in particular for less well-studied organisms. Thus, our work complements existing resources and is particularly helpful for designing experiments because of its uniqueness. Experimental annotations and computational predictions are strongly influenced by the fact that some proteins have many partners and others few. To optimize machine learning, recent methods explicitly ignored such a network-structure and rely either on domain knowledge or sequence-only methods. Our approach is independent of domain-knowledge and leverages evolutionary information. The database interface representing our results is accessible from https://rostlab.org/services/ppipair/.
The data can also be downloaded from https://figshare.com/collections/ProfPPI-DB/4141784.
Affiliation(s)
- Linh Tran
  - Imperial College London (ICL), Department of Computing, United Kingdom
  - Technical University of Munich (TUM), Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr, Germany
- Tobias Hamp
  - Technical University of Munich (TUM), Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr, Germany
- Burkhard Rost
  - Technical University of Munich (TUM), Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr, Germany
  - Technical University of Munich (TUM), Institute for Advanced Study (TUM-IAS), Lichtenbergstr, Germany
22
Hu Y, Fang Z, Yang Y, Fan T, Wang J. Analyzing the pathways enriched in genes associated with nicotine dependence in the context of human protein-protein interaction network. J Biomol Struct Dyn 2018; 37:1177-1188. [PMID: 29546796 DOI: 10.1080/07391102.2018.1453377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
Abstract
Nicotine dependence is the primary addictive stage of cigarette smoking. Although many studies have been performed to explore the molecular mechanism underlying nicotine dependence, our understanding of this disorder is still far from complete. Over the past decades, an increasing number of candidate genes involved in nicotine dependence have been identified by different technical approaches, including genetic association analysis. In this study, we performed a comprehensive collection of candidate genes reported to be genetically associated with nicotine dependence. Then, the biochemical pathways enriched in these genes were identified by considering each gene's propensity to be related to nicotine dependence. One of the most widely used pathway enrichment analysis approaches, over-representation analysis, ignores the functional non-equivalence of genes in the candidate gene set and may have low discriminative power in identifying some dysfunctional pathways. To overcome such drawbacks, we constructed a comprehensive human protein-protein interaction network, and then assigned a function weighting score to each candidate gene based on its network topological features. Evaluation indicated that the function weighting score scheme was consistent with available evidence. Finally, the function weighting scores of the candidate genes were incorporated into pathway analysis to identify the dysfunctional pathways involved in nicotine dependence, and the interactions between pathways were detected by pathway crosstalk analysis. Compared to a conventional over-representation-based pathway analysis tool, the modified method exhibited improved discriminative power and detected some novel pathways potentially underlying nicotine dependence.
In summary, we conducted a comprehensive collection of genes associated with nicotine dependence and then detected the biochemical pathways enriched in these genes using a modified pathway enrichment analysis approach with function weighting score of candidate genes integrated. Our results may provide insight into the molecular mechanism underlying nicotine dependence.
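As background for the comparison drawn in this abstract, conventional over-representation analysis is usually scored with a one-sided hypergeometric test, in which every candidate gene counts equally. The sketch below shows that baseline only; it does not implement the authors' function-weighted variant, and the function name and parameter names are hypothetical.

```python
from math import comb


def ora_p_value(universe: int, pathway: int, candidates: int, overlap: int) -> float:
    """P(X >= overlap) when drawing `candidates` genes from `universe` genes,
    of which `pathway` genes belong to the pathway (hypergeometric upper tail)."""
    upper = min(pathway, candidates)  # largest overlap that is possible
    total = comb(universe, candidates)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, candidates - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `ora_p_value(10, 4, 3, 2)` asks how likely it is that at least 2 of 3 candidate genes fall in a 4-gene pathway when there are 10 genes in total. Because each gene contributes the same weight, the test cannot express that some candidates are more strongly tied to the phenotype than others, which is the limitation the abstract describes.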
Affiliation(s)
- Ying Hu
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Zhonghai Fang
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Yichen Yang
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Ting Fan
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
- Ju Wang
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin 300070, China
23
Abstract
The knowledge of protein-protein interactions (PPIs) and PPI networks (PPINs) is the key to starting to understand the biological processes inside the cell. Many computational tools have been designed to help explore PPIs and PPINs, such as those for interaction detection, reliability assessment and interaction network construction. Here, the application of computational tools is reviewed from three perspectives: PPI database construction, PPI prediction, and interaction network construction and analysis. This overview will provide researchers guidance on choosing appropriate methods for exploring PPIs.
Affiliation(s)
- Shaowei Dong
  - Department of Cell and System Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
- Nicholas J Provart
  - Department of Cell and System Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
24
Detecting pathway relationship in the context of human protein-protein interaction network and its application to Parkinson’s disease. Methods 2017; 131:93-103. [DOI: 10.1016/j.ymeth.2017.08.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/15/2017] [Revised: 07/31/2017] [Accepted: 08/03/2017] [Indexed: 02/06/2023]
25
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. Mass Spectrometry Reviews 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Affiliation(s)
- Pieter Meysman
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
- Kevin Titeca
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Sven Eyckerman
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Jan Tavernier
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Bart Goethals
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Lennart Martens
  - Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Dirk Valkenborg
  - Flemish Institute for Technological Research (VITO), Mol, Belgium
  - IBioStat, Hasselt University, Hasselt, Belgium
  - CFP-CeProMa, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
26
Sun T, Zhou B, Lai L, Pei J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 2017; 18:277. [PMID: 28545462 PMCID: PMC5445391 DOI: 10.1186/s12859-017-1700-2] [Citation(s) in RCA: 190] [Impact Index Per Article: 23.8] [Received: 03/06/2017] [Accepted: 05/18/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPI to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPI have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. RESULTS: We used a stacked autoencoder, a type of deep-learning algorithm, to study the sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. CONCLUSIONS: To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.
Affiliation(s)
- Tanlin Sun
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Bo Zhou
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Luhua Lai
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
  - Beijing National Laboratory for Molecular Science, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
27
Wei L, Xing P, Zeng J, Chen J, Su R, Guo F. Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier. Artif Intell Med 2017; 83:67-74. [PMID: 28320624 DOI: 10.1016/j.artmed.2017.03.001] [Citation(s) in RCA: 159] [Impact Index Per Article: 19.9] [Received: 12/20/2016] [Revised: 02/17/2017] [Accepted: 03/01/2017] [Indexed: 10/20/2022]
Abstract
Computational methods are employed in bioinformatics to predict protein-protein interactions (PPIs). PPIs and protein-protein non-interactions (PPNIs) display different levels of development, and the number of PPIs is considerably greater than that of PPNIs. This significant difference in the number of PPIs and PPNIs increases the cost of constructing a balanced dataset. PPIs can be classified as either physical or genetic. However, ready-made PPNI databases were proven only to have no physical interactions and were not proven to have no genetic interactions. Hence, ready-made PPNI databases contain false negative non-interactions. In this study, two PPNI datasets were artificially generated from a PPI database. In contrast to various traditional PPI feature extraction methods based on sequential information, two types of novel feature extraction methods were proposed. One is based on secondary structure information, and the other is based on the physicochemical properties of proteins. The experimental results of the RandomPairs dataset validate the efficiency and effectiveness of the proposed prediction model. These results reveal the potential of constructing a PPI negative dataset to reduce false negatives. Related datasets, tools, and source codes are accessible at http://lab.malab.cn/soft/PPIPre/PPIPre.html.
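The abstract does not spell out how the artificial non-interaction sets were built. One simple construction often used for this purpose, shown here purely as an assumed illustration, is to sample random pairs of proteins from the positive set and reject any pair already recorded as interacting. The function name `random_negative_pairs` and its parameters are hypothetical.

```python
import random


def random_negative_pairs(positive_pairs, n_pairs, seed=0):
    """Sample unordered protein pairs that are absent from the positive set."""
    known = {frozenset(p) for p in positive_pairs}
    proteins = sorted({prot for pair in positive_pairs for prot in pair})
    # Number of distinct two-protein combinations not already known to interact.
    known_distinct = sum(1 for k in known if len(k) == 2)
    max_possible = len(proteins) * (len(proteins) - 1) // 2 - known_distinct
    if n_pairs > max_possible:
        raise ValueError("not enough unobserved pairs to sample from")
    rng = random.Random(seed)  # seeded for reproducible datasets
    negatives = set()
    while len(negatives) < n_pairs:
        a, b = rng.sample(proteins, 2)  # two distinct proteins
        pair = frozenset((a, b))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)
```

Pairs sampled this way are only presumed non-interacting, so some false negatives remain; that trade-off is the concern the abstract raises about negative datasets in general.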
Affiliation(s)
- Leyi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Pengwei Xing
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Jiancang Zeng
- School of Information Science and Technology, Xiamen University, Xiamen, China
- JinXiu Chen
- School of Information Science and Technology, Xiamen University, Xiamen, China
- Ran Su
- School of Software, Tianjin University, Tianjin, China
- Fei Guo
- School of Computer Science and Technology, Tianjin University, Tianjin, China
|
28
|
You ZH, Zhou M, Luo X, Li S. Highly Efficient Framework for Predicting Interactions Between Proteins. IEEE Transactions on Cybernetics 2017; 47:731-743. [PMID: 28113829 DOI: 10.1109/tcyb.2016.2524994] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 06/06/2023]
Abstract
Protein-protein interactions (PPIs) play a central role in many biological processes. Although a large amount of human PPI data has been generated by high-throughput experimental techniques, they are very limited compared to the estimated 130 000 protein interactions in humans. Hence, automatic methods for human PPI-detection are highly desired. This work proposes a novel framework, i.e., Low-rank approximation-kernel Extreme Learning Machine (LELM), for detecting human PPI from a protein's primary sequences automatically. It has three main steps: 1) mapping each protein sequence into a matrix built on all kinds of adjacent amino acids; 2) applying the low-rank approximation model to the obtained matrix to solve its lowest rank representation, which reflects its true subspace structures; and 3) utilizing a powerful kernel extreme learning machine to predict the probability for PPI based on this lowest rank representation. Experimental results on a large-scale human PPI dataset demonstrate that the proposed LELM has significant advantages in accuracy and efficiency over the state-of-art approaches. Hence, this work establishes a new and effective way for the automatic detection of PPI.
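The first of the three steps above, mapping a sequence into a matrix built on adjacent amino acids, can be pictured as follows. This is a minimal sketch that assumes the matrix is a 20x20 table of normalized counts of adjacent residue pairs; the construction used in the paper may differ, and the name `adjacent_pair_matrix` and the example sequence are hypothetical. The low-rank approximation and kernel extreme learning machine steps are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def adjacent_pair_matrix(sequence):
    """Return a 20x20 matrix of normalized counts of adjacent residue pairs."""
    counts = [[0.0] * 20 for _ in range(20)]
    total = 0
    for left, right in zip(sequence, sequence[1:]):
        # Skip pairs containing non-standard residues such as X or U.
        if left in INDEX and right in INDEX:
            counts[INDEX[left]][INDEX[right]] += 1
            total += 1
    if total:
        counts = [[c / total for c in row] for row in counts]
    return counts


# Hypothetical toy sequence.
matrix = adjacent_pair_matrix("MKKLLA")
print(matrix[INDEX["K"]][INDEX["K"]])
```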
|
29
|
Launay G, Ceres N, Martin J. Non-interacting proteins may resemble interacting proteins: prevalence and implications. Sci Rep 2017; 7:40419. [PMID: 28084410 PMCID: PMC5289270 DOI: 10.1038/srep40419] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/26/2016] [Accepted: 12/07/2016] [Indexed: 12/13/2022]
Abstract
The vast majority of proteins do not form functional interactions in physiological conditions. We have considered several sets of protein pairs from S. cerevisiae with no functional interaction reported, denoted as non-interacting pairs, and compared their 3D structures to available experimental complexes. We identified some non-interacting pairs with significant structural similarity with experimental complexes, indicating that, even though they do not form functional interactions, they have compatible structures. We estimate that up to 8.7% of non-interacting protein pairs could have compatible structures. This number of interactions exceeds the number of functional interactions (around 0.2% of the total interactions) by a factor 40. Network analysis suggests that the interactions formed by non-interacting pairs with compatible structures could be particularly hazardous to the protein-protein interaction network. From a structural point of view, these interactions display no aberrant structural characteristics, and are even predicted as relatively stable and enriched in potential physical interactors, suggesting a major role of regulation to prevent them.
Affiliation(s)
- Guillaume Launay
- Univ Lyon, CNRS, UMR 5086 MMSB, 7 passage du Vercors F-69367, Lyon, France
- Nicoletta Ceres
- Univ Lyon, CNRS, UMR 5086 MMSB, 7 passage du Vercors F-69367, Lyon, France
- Juliette Martin
- Univ Lyon, CNRS, UMR 5086 MMSB, 7 passage du Vercors F-69367, Lyon, France
|
30
|
Chen H, Shen J, Wang L, Song J. Towards Data Analytics of Pathogen-Host Protein-Protein Interaction: A Survey. 2016 IEEE International Congress on Big Data (BigData Congress) 2016:377-388. [DOI: 10.1109/bigdatacongress.2016.60] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2025]
|
31
|
Kumar G, Kumar R, Kumar Pal M, Gupta P, Gupta R, Mehra S. Improving extraction of protein-protein interaction datasets from KUPS using hashing approach. 2016 International Conference on Bioinformatics and Systems Biology (BSB) 2016:1-4. [DOI: 10.1109/bsb.2016.7552135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
32
|
Rigid-Docking Approaches to Explore Protein-Protein Interaction Space. Advances in Biochemical Engineering/Biotechnology 2016; 160:33-55. [PMID: 27830312 DOI: 10.1007/10_2016_41] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
Abstract
Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.
|
33
|
Türei D, Földvári-Nagy L, Fazekas D, Módos D, Kubisch J, Kadlecsik T, Demeter A, Lenti K, Csermely P, Vellai T, Korcsmáros T. Autophagy Regulatory Network - a systems-level bioinformatics resource for studying the mechanism and regulation of autophagy. Autophagy 2015; 11:155-65. [PMID: 25635527 PMCID: PMC4502651 DOI: 10.4161/15548627.2014.994346] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Indexed: 12/22/2022]
Abstract
Autophagy is a complex cellular process having multiple roles, depending on tissue, physiological, or pathological conditions. Major post-translational regulators of autophagy are well known, however, they have not yet been collected comprehensively. The precise and context-dependent regulation of autophagy necessitates additional regulators, including transcriptional and post-transcriptional components that are listed in various datasets. Prompted by the lack of systems-level autophagy-related information, we manually collected the literature and integrated external resources to gain a high coverage autophagy database. We developed an online resource, Autophagy Regulatory Network (ARN; http://autophagy-regulation.org), to provide an integrated and systems-level database for autophagy research. ARN contains manually curated, imported, and predicted interactions of autophagy components (1,485 proteins with 4,013 interactions) in humans. We listed 413 transcription factors and 386 miRNAs that could regulate autophagy components or their protein regulators. We also connected the above-mentioned autophagy components and regulators with signaling pathways from the SignaLink 2 resource. The user-friendly website of ARN allows researchers without computational background to search, browse, and download the database. The database can be downloaded in SQL, CSV, BioPAX, SBML, PSI-MI, and in a Cytoscape CYS file formats. ARN has the potential to facilitate the experimental validation of novel autophagy components and regulators. In addition, ARN helps the investigation of transcription factors, miRNAs and signaling pathways implicated in the control of the autophagic pathway. The list of such known and predicted regulators could be important in pharmacological attempts against cancer and neurodegenerative diseases.
Affiliation(s)
- Dénes Türei
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
|
34
|
Yao J, Guo H, Yang X. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction. Int J Genomics 2015; 2015:608042. [PMID: 26539460 PMCID: PMC4619929 DOI: 10.1155/2015/608042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/07/2015] [Revised: 07/22/2015] [Accepted: 07/26/2015] [Indexed: 12/15/2022]
Abstract
Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.
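The area under the ROC curve used to test PPCM can be read as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The sketch below computes it that way in plain Python. All scores and labels are made-up toy values, and the merged score is a simple mean of two hypothetical classifiers, used here only as a stand-in; it is not the Random Forests merger described in the abstract, and it does not use GO2PPI or Phyloprof output.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]  # hypothetical classifier A
scores_b = [0.3, 0.8, 0.7, 0.6, 0.4, 0.1]  # hypothetical classifier B
# Simple mean of the two scores (an assumption, not Random Forests).
merged = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]

print(auc(scores_a, labels), auc(scores_b, labels), auc(merged, labels))
```

The toy values were chosen so that the two classifiers make different mistakes; real data need not show the same gain from merging.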
Affiliation(s)
- Jianzhuang Yao
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
- Hong Guo
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
- Xiaohan Yang
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
|
35
|
Trepte P, Buntru A, Klockmeier K, Willmore L, Arumughan A, Secker C, Zenkner M, Brusendorf L, Rau K, Redel A, Wanker EE. DULIP: A Dual Luminescence-Based Co-Immunoprecipitation Assay for Interactome Mapping in Mammalian Cells. J Mol Biol 2015; 427:3375-88. [PMID: 26264872 DOI: 10.1016/j.jmb.2015.08.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 03/24/2015] [Revised: 07/31/2015] [Accepted: 08/03/2015] [Indexed: 12/30/2022]
Abstract
Mapping of protein-protein interactions (PPIs) is critical for understanding protein function and complex biological processes. Here, we present DULIP, a dual luminescence-based co-immunoprecipitation assay, for systematic PPI mapping in mammalian cells. DULIP is a second-generation luminescence-based PPI screening method for the systematic and quantitative analysis of co-immunoprecipitations using two different luciferase tags. Benchmarking studies with positive and negative PPI reference sets revealed that DULIP allows the detection of interactions with high sensitivity and specificity. Furthermore, the analysis of a PPI reference set with known binding affinities demonstrated that both low- and high-affinity interactions can be detected with DULIP assays. Finally, using the well-characterized interaction between Syntaxin-1 and Munc18, we found that DULIP is capable of detecting the effects of point mutations on interaction strength. Taken together, our studies demonstrate that DULIP is a sensitive and reliable method of great utility for systematic interactome research. It can be applied for interaction screening and validation of PPIs in mammalian cells. Moreover, DULIP permits the specific analysis of mutation-dependent binding patterns.
Affiliation(s)
- Philipp Trepte
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Alexander Buntru
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Konrad Klockmeier
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Lindsay Willmore
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Anup Arumughan
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Christopher Secker
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Martina Zenkner
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Lydia Brusendorf
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Kirstin Rau
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Alexandra Redel
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
- Erich E Wanker
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Robert-Roessle-Straße 10, 13125 Berlin, Germany
|
36
|
You ZH, Chan KCC, Hu P. Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest. PLoS One 2015; 10:e0125811. [PMID: 25946106 PMCID: PMC4422660 DOI: 10.1371/journal.pone.0125811] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 07/14/2014] [Accepted: 03/04/2015] [Indexed: 11/18/2022]
Abstract
The study of protein-protein interactions (PPIs) can be very important for the understanding of biological cellular functions. However, detecting PPIs in the laboratory is both time-consuming and expensive. For this reason, there has been much recent effort to develop techniques for computational prediction of PPIs, as this can complement laboratory procedures and provide an inexpensive way of predicting the most likely set of interactions at the entire proteome scale. Although much progress has already been achieved in this direction, the problem is still far from being solved. More effective approaches are still required to overcome the limitations of the current ones. In this study, a novel Multi-scale Local Descriptor (MLD) feature representation scheme is proposed to extract features from a protein sequence. This scheme can capture multi-scale local information by varying the length of protein-sequence segments. Based on the MLD, an ensemble learning method, the Random Forest (RF) method, is used as the classifier. The MLD feature representation scheme facilitates the mining of interaction information from multi-scale continuous amino acid segments, making it easier to capture multiple overlapping continuous binding patterns within a protein sequence. When the proposed method is tested with the PPI data of Saccharomyces cerevisiae, it achieves a prediction accuracy of 94.72% with 94.34% sensitivity at the precision of 98.91%. Extensive experiments are performed to compare our method with existing sequence-based methods. Experimental results show that our predictor also outperforms several other state-of-the-art predictors on the H. pylori dataset. These good results can largely be credited to the learning capabilities of the RF model and the novel MLD feature representation scheme. The experimental results show that the proposed approach is promising for predicting PPIs and can be a useful tool for future proteomic studies.
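The idea of capturing local information at several scales by varying segment length can be illustrated with a much-simplified descriptor. The sketch below splits a sequence into 1, 2, and 4 contiguous segments and concatenates the amino acid composition of each segment. It is an assumption-laden illustration only: the segment counts, the plain 20-letter composition, and the names `composition` and `multi_scale_features` are hypothetical and do not reproduce the MLD scheme of the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in a segment."""
    n = len(segment) or 1  # avoid division by zero for empty segments
    return [segment.count(aa) / n for aa in AMINO_ACIDS]


def multi_scale_features(sequence, scales=(1, 2, 4)):
    """Concatenate segment compositions computed at several segment counts."""
    features = []
    for k in scales:
        # Split the sequence into k contiguous, near-equal segments.
        bounds = [round(i * len(sequence) / k) for i in range(k + 1)]
        for start, end in zip(bounds, bounds[1:]):
            features.extend(composition(sequence[start:end]))
    return features


# Hypothetical toy sequence; (1 + 2 + 4) segments x 20 residues per segment.
vec = multi_scale_features("MKTAYIAKQRQISFVKSHFSRQ")
print(len(vec))
```

A feature vector of this kind could then be passed to any classifier, such as a random forest, which is the classifier named in the abstract.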
Affiliation(s)
- Zhu-Hong You
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Keith C C Chan
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
- Pengwei Hu
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
|
37
|
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022]
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
|
38
|
You ZH, Yu JZ, Zhu L, Li S, Wen ZK. A MapReduce based parallel SVM for large-scale predicting protein-protein interactions. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.05.072] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 10/25/2022]
|
39
|
Protein-protein interaction predictions using text mining methods. Methods 2014; 74:47-53. [PMID: 25448298 DOI: 10.1016/j.ymeth.2014.10.026] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 04/02/2014] [Revised: 09/05/2014] [Accepted: 10/21/2014] [Indexed: 01/10/2023]
Abstract
It is beyond any doubt that proteins and their interactions play an essential role in most complex biological processes. Understanding their function, both individually and in the form of protein complexes, is of great importance. Nowadays, despite the plethora of high-throughput experimental approaches for detecting protein-protein interactions, many computational methods aiming to predict new interactions have appeared and gained interest. In this review, we focus on text-mining based computational methodologies, which aim to extract information about proteins and their interactions from public repositories such as the literature and various biological databases. We discuss their strengths and weaknesses and how they complement existing experimental techniques, and we comment on the biological databases that hold such information and the benchmark datasets that can be used for evaluating new tools.
|
40
|
Goldfarb D, Hast BE, Wang W, Major MB. Spotlite: web application and augmented algorithms for predicting co-complexed proteins from affinity purification-mass spectrometry data. J Proteome Res 2014; 13:5944-55. [PMID: 25300367 DOI: 10.1021/pr5008416] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Protein-protein interactions defined by affinity purification and mass spectrometry (APMS) suffer from high false discovery rates. Consequently, lists of potential interactions must be pruned of contaminants before network construction and interpretation, historically an expensive, time-intensive, and error-prone task. In recent years, numerous computational methods were developed to identify genuine interactions from the hundreds of candidates. Here, comparative analysis of three popular algorithms, HGSCore, CompPASS, and SAINT, revealed complementarity in their classification accuracies, which is supported by their divergent scoring strategies. We improved each algorithm by an average area under a receiver operating characteristics curve increase of 16% by integrating a variety of indirect data known to correlate with established protein-protein interactions, including mRNA coexpression, gene ontologies, domain-domain binding affinities, and homologous protein interactions. Each APMS scoring approach was incorporated into a separate logistic regression model along with the indirect features; the resulting three classifiers demonstrate improved performance on five diverse APMS data sets. To facilitate APMS data scoring within the scientific community, we created Spotlite, a user-friendly and fast web application. Within Spotlite, data can be scored with the augmented classifiers, annotated, and visualized ( http://cancer.unc.edu/majorlab/software.php ). The utility of the Spotlite platform to reveal physical, functional, and disease-relevant characteristics within APMS data is established through a focused analysis of the KEAP1 E3 ubiquitin ligase.
Affiliation(s)
- Dennis Goldfarb
- Department of Computer Science, University of North Carolina at Chapel Hill, Box #3175, Chapel Hill, North Carolina 27599, United States
|
41
|
Qi Q, Li J, Cheng J. Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods. BMC Proc 2014; 8:S5. [PMID: 25374614 PMCID: PMC4202177 DOI: 10.1186/1753-6561-8-s6-s5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/08/2023]
Abstract
Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped onto organism-specific ones based on genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene/protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods.
Affiliation(s)
- Qi Qi
- Department of Computer Science, University of Missouri, Columbia, MO 65201, USA
- Jilong Li
- Department of Computer Science, University of Missouri, Columbia, MO 65201, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65201, USA; Informatics Institute, University of Missouri, Columbia, MO 65201, USA
|
42
|
Large-scale protein-protein interactions detection by integrating big biosensing data with computational model. BioMed Research International 2014; 2014:598129. [PMID: 25215285 PMCID: PMC4151593 DOI: 10.1155/2014/598129] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 06/23/2014] [Accepted: 07/24/2014] [Indexed: 01/12/2023]
Abstract
Protein-protein interactions are the basis of biological functions, and studying these interactions on a molecular level is of crucial importance for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, high-throughput experimental methods for identifying PPIs are both time-consuming and expensive. Moreover, high-throughput PPI data are often associated with high false-positive and high false-negative rates. To address these problems, we propose a method for PPI detection that integrates biosensor-based PPI data with a novel computational model. This method is based on the extreme learning machine algorithm combined with a novel protein sequence descriptor. When applied to a large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy with 84.08% sensitivity at a specificity of 85.53%. We conducted more extensive experiments to compare the proposed method with a state-of-the-art technique, the support vector machine. The results demonstrate that our approach is very promising for detecting new PPIs and can be a helpful supplement to biosensor-based PPI detection.
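Accuracy, sensitivity, and specificity of the kind reported in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and precision from counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for a balanced test set: 100 positives, 100 negatives.
metrics = binary_metrics(tp=84, fp=14, tn=86, fn=16)
print(metrics)
```

On a balanced test set, accuracy equals the mean of sensitivity and specificity. The reported figures fit that pattern, since (84.08% + 85.53%) / 2 is about 84.8%, although whether the test set was balanced is an assumption here.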
|
43
|
Murakami Y, Mizuguchi K. Homology-based prediction of interactions between proteins using Averaged One-Dependence Estimators. BMC Bioinformatics 2014; 15:213. [PMID: 24953126 PMCID: PMC4229973 DOI: 10.1186/1471-2105-15-213] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 01/30/2014] [Accepted: 06/17/2014] [Indexed: 02/02/2023]
Abstract
Background: Identification of protein-protein interactions (PPIs) is essential for a better understanding of biological processes, pathways and functions. However, experimental identification of the complete set of PPIs in a cell/organism (“an interactome”) is still a difficult task. To circumvent limitations of current high-throughput experimental techniques, it is necessary to develop high-performance computational methods for predicting PPIs.
Results: In this article, we propose a new computational method to predict interaction between a given pair of protein sequences using features derived from known homologous PPIs. The proposed method is capable of predicting interaction between two proteins (of unknown structure) using Averaged One-Dependence Estimators (AODE) and three features calculated for the protein pair: (a) sequence similarities to a known interacting protein pair (FSeq), (b) statistical propensities of domain pairs observed in interacting proteins (FDom) and (c) a sum of edge weights along the shortest path between homologous proteins in a PPI network (FNet). Feature vectors were defined to lie in a half-space of the symmetrical high-dimensional feature space to make them independent of the protein order. The predictability of the method was assessed by a 10-fold cross validation on a recently created human PPI dataset with randomly sampled negative data, and the best model achieved an Area Under the Curve of 0.79 (pAUC0.5% = 0.16). In addition, the AODE trained on all three features (named PSOPIA) showed better prediction performance on a separate independent data set than a recently reported homology-based method.
Conclusions: Our results suggest that FNet, a feature representing proximity in a known PPI network between two proteins that are homologous to a target protein pair, contributes to the prediction of whether the target proteins interact or not. PSOPIA will help identify novel PPIs and estimate complete PPI networks. The method proposed in this article is freely available on the web at http://mizuguchilab.org/PSOPIA.
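The FNet feature is described as a sum of edge weights along the shortest path between homologous proteins in a PPI network. A quantity of that kind can be computed with Dijkstra's algorithm, as in the minimal sketch below. The toy network, its weights, and the names `shortest_path_weight` and `undirected` are hypothetical; how PSOPIA assigns edge weights and selects homologs is not shown.

```python
import heapq


def shortest_path_weight(graph, source, target):
    """Sum of edge weights along the lightest path, or None if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


def undirected(edges):
    """Build an adjacency map from (protein_a, protein_b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


# Hypothetical toy PPI network with non-negative edge weights.
ppi = undirected([("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0), ("D", "E", 1.0)])
print(shortest_path_weight(ppi, "A", "C"))
print(shortest_path_weight(ppi, "A", "D"))
```

Because the graph is built as undirected, the result is the same for either protein order, which matches the order independence that the abstract asks of its feature vectors.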
Affiliation(s)
- Yoichi Murakami
- Bioinformatics Project, National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan
|
44
|
Youngs N, Penfold-Brown D, Bonneau R, Shasha D. Negative example selection for protein function prediction: the NoGO database. PLoS Comput Biol 2014; 10:e1003644. [PMID: 24922051 PMCID: PMC4055410 DOI: 10.1371/journal.pcbi.1003644] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 11/24/2013] [Accepted: 04/08/2014] [Indexed: 12/28/2022]
Abstract
Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
Many machine learning methods have been applied to the task of predicting the biological function of proteins based on a variety of available data. The majority of these methods require negative examples: proteins that are known not to perform a function, in order to achieve meaningful predictions, but negative examples are often not available. In addition, past heuristic methods for negative example selection suffer from a high error rate. Here, we rigorously compare two novel algorithms against past heuristics, as well as some algorithms adapted from a similar task in text-classification. Through this comparison, performed on several different benchmarks, we demonstrate that our algorithms make significantly fewer mistakes when predicting negative examples. We also provide a database of negative examples for general use in machine learning for protein function prediction (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
Affiliation(s)
- Noah Youngs: Department of Computer Science, New York University, New York, New York, United States of America
- Duncan Penfold-Brown: Social Media and Political Participation Lab, New York University, New York, New York, United States of America
- Richard Bonneau (corresponding author): Department of Computer Science, New York University, New York, New York, United States of America; Department of Biology, New York University, New York, New York, United States of America; Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Dennis Shasha (corresponding author): Department of Computer Science, New York University, New York, New York, United States of America; Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
|
45
|
The domain landscape of virus-host interactomes. Biomed Res Int 2014; 2014:867235. [PMID: 24991570 PMCID: PMC4065681 DOI: 10.1155/2014/867235] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/04/2014] [Accepted: 03/19/2014] [Indexed: 12/31/2022]
Abstract
Viral infections cause millions of deaths worldwide. A thorough analysis of virus-host interactomes may reveal insights into viral infection and pathogenic strategies. In this study, we present a landscape of virus-host interactomes based on protein domain interactions. Compared with analysis at the protein level, this domain-domain interactome provides a unique abstraction of the protein-protein interactome. Through comparisons among DNA, RNA, and retrotranscribing viruses, we identified a core of human domains that viruses use to hijack the cellular machinery and evade the immune system, and that might be promising antiviral drug targets. We showed that viruses preferentially interact with host hub and bottleneck domains, and that degree and betweenness centrality differ significantly among the three categories of viruses. Further analysis at the functional level highlighted that different viruses perturb the host cellular molecular network through both common and unique strategies. Most importantly, we propose a viral disease network linking viral domains, human domains, and the corresponding diseases, which uncovered several previously unknown virus-disease relationships that need further verification. Overall, we expect these findings to deepen the understanding of viral infection and to contribute to the development of antiviral therapy.
|
46
|
Song X, Li L, Srimani PK, Yu PS, Wang JZ. Measure the Semantic Similarity of GO Terms Using Aggregate Information Content. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:468-476. [PMID: 26356015 DOI: 10.1109/tcbb.2013.176] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 06/05/2023]
Abstract
The rapid development of the Gene Ontology (GO) and the huge amount of biomedical data annotated with GO terms necessitate computing the semantic similarity of GO terms and, in turn, measuring the functional similarity of genes based on their annotations. In this paper we propose a novel and efficient method to measure the semantic similarity of GO terms. The proposed method addresses the limitations of existing GO term similarity measurement techniques; it computes the semantic content of a GO term by considering the information content of all of its ancestor terms in the graph. The aggregate information content (AIC) of all ancestor terms of a GO term implicitly reflects the GO term's location in the GO graph and also represents how human beings use this GO term and all its ancestor terms to annotate genes. We show that the semantic similarity of GO terms obtained by our method closely matches human perception. Extensive experimental studies show that this novel method also outperforms all existing methods in terms of correlation with gene expression data. We have developed web services for measuring the semantic similarity of GO terms and the functional similarity of genes using the proposed AIC method and other popular methods. These web services are available at http://bioinformatics.clemson.edu/G-SESAME.
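The idea of scoring a term by the information content of all of its ancestors can be illustrated with a small sketch. This is a simplified illustration and not the published AIC formula: the toy ontology, the probabilities, and the function names (`ancestors`, `aggregate_ic`, `similarity`) are all hypothetical, and the similarity here is assumed to be the shared ancestor information content divided by the mean aggregate content of the two terms.

```python
import math

# Hypothetical toy ontology (child -> direct parents); not real GO terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Assumed annotation probabilities: fraction of genes annotated with each
# term or any of its descendants. Values are illustrative only.
PROB = {"root": 1.0, "A": 0.5, "B": 0.4, "C": 0.1, "D": 0.2}


def ancestors(term):
    """Return the term itself plus every ancestor reachable via PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); rarer terms carry more information."""
    return -math.log(PROB[term])


def aggregate_ic(term):
    """Sum the information content over the term and all of its ancestors."""
    return sum(information_content(t) for t in ancestors(term))


def similarity(a, b):
    """Shared ancestor IC relative to the mean aggregate IC of both terms."""
    shared = sum(information_content(t) for t in ancestors(a) & ancestors(b))
    total = aggregate_ic(a) + aggregate_ic(b)
    # Two terms with no information at all (e.g. root vs root) count as identical.
    return 2.0 * shared / total if total > 0 else 1.0
```

In this toy graph, `similarity("C", "D")` is positive because both terms descend from `A`, while `similarity("C", "B")` is zero because the only common ancestor is the uninformative root.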
|
47
|
Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities. Bioinformatics 2014; 30:1601-8. [PMID: 24493033 PMCID: PMC4029037 DOI: 10.1093/bioinformatics/btu074] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/11/2022]
Abstract
Motivation: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups. Description: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets. Results: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations. Availability: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite. Contact: gian.tartaglia@crg.es. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Petr Klus, Benedetta Bolognesi, Federico Agostini, Domenica Marchese, Andreas Zanzoni, Gian Gaetano Tartaglia: Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
|
48
|
Casado-Vela J, Fuentes M, Franco-Zorrilla JM. Screening of Protein–Protein and Protein–DNA Interactions Using Microarrays. Adv Protein Chem Struct Biol 2014; 95:231-81. [DOI: 10.1016/b978-0-12-800453-1.00008-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
|
49
|
Ohue M, Matsuzaki Y, Shimoda T, Ishida T, Akiyama Y. Highly precise protein-protein interaction prediction based on consensus between template-based and de novo docking methods. BMC Proc 2013; 7:S6. [PMID: 24564962 PMCID: PMC4044902 DOI: 10.1186/1753-6561-7-s7-s6] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
Background: Elucidation of protein-protein interaction (PPI) networks is important for understanding disease mechanisms and for drug discovery. Tertiary-structure-based in silico PPI prediction methods have been developed with two typical approaches: a method based on template matching with known protein structures and a method based on de novo protein docking. However, the template-based method has a narrow applicable range because of its use of template information, and the de novo docking based method does not have good prediction performance. In addition, both of these in silico prediction methods have insufficient precision, and require validation of the predicted PPIs by biological experiments, leading to considerable expenditure; therefore, PPI prediction methods with greater precision are needed. Results: We have proposed a new structure-based PPI prediction method by combining template-based prediction and de novo docking prediction. When we applied the method to the human apoptosis signaling pathway, we obtained a precision value of 0.333, which is higher than that achieved using conventional methods (0.231 for PRISM, a template-based method, and 0.145 for MEGADOCK, a non-template-based method), while maintaining an F-measure value (0.285) comparable to that obtained using conventional methods (0.296 for PRISM, and 0.220 for MEGADOCK). Conclusions: Our consensus method successfully predicted a PPI network with greater precision than conventional template/non-template methods, which may thus reduce the cost of validation by laboratory experiments for confirming novel PPIs from predicted PPIs. Therefore, our method may serve as an aid for promoting interactome analysis.
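The precision and F-measure comparison reported above can be reproduced in miniature. The sketch below assumes that "consensus" means a pair is kept only when both predictors report it; the abstract does not specify the combination rule, so this is an assumption. The protein names, the prediction sets, and the helper names (`normalize`, `precision_recall_f`) are hypothetical and are not PRISM or MEGADOCK output.

```python
def normalize(pairs):
    """Treat (A, B) and (B, A) as the same unordered interaction."""
    return {frozenset(pair) for pair in pairs}


def precision_recall_f(predicted, known):
    """Return (precision, recall, F-measure) for predicted vs known pairs."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    denom = precision + recall
    f_measure = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical gold-standard interactions and predictions (toy data).
known = normalize([("A", "B"), ("B", "C"), ("C", "D")])
template_based = normalize([("A", "B"), ("B", "C"), ("A", "D"), ("B", "D")])
de_novo = normalize([("B", "A"), ("C", "D"), ("A", "E"), ("C", "E"), ("C", "B")])

# Assumed consensus rule: keep only pairs predicted by both methods.
consensus = template_based & de_novo

for name, predicted in [("template", template_based),
                        ("de novo", de_novo),
                        ("consensus", consensus)]:
    p, r, f = precision_recall_f(predicted, known)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Intersecting the two prediction sets discards pairs that only one method supports, which tends to raise precision at the cost of recall; this is the trade-off the reported precision and F-measure values describe.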
|
50
|
Template-based structure modeling of protein-protein interactions. Curr Opin Struct Biol 2013; 24:10-23. [PMID: 24721449 DOI: 10.1016/j.sbi.2013.11.005] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Received: 09/14/2013] [Revised: 10/29/2013] [Accepted: 11/21/2013] [Indexed: 01/21/2023]
Abstract
The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the protein-protein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement.
|