1
Cai W, Liu P, Wang Z, Jiang H, Liu C, Fei Z, Yang Z. Link prediction in protein-protein interaction network: A similarity multiplied similarity algorithm with paths of length three. J Theor Biol 2024; 589:111850. [PMID: 38740126 DOI: 10.1016/j.jtbi.2024.111850] [Received: 11/08/2023] [Revised: 03/26/2024] [Accepted: 05/03/2024] [Indexed: 05/16/2024]
Abstract
Protein-protein interactions (PPIs) are crucial for many biological processes, and predicting them remains a major challenge. The most common computational approach is link prediction, and link prediction methods based on network paths of length three (L3) have proven highly effective. In this paper, we propose a novel link prediction algorithm, named SMS, which is based on L3 paths and protein similarities. We first design a mixed similarity that combines the topological structure and attribute features of nodes. We then compute the predicted value for a protein pair by summing, over all L3 paths connecting the pair, the product of the similarities along each path. Furthermore, we propose the Max Similarity Multiplied Similarity (maxSMS) algorithm from the perspective of maximum impact. On six datasets, including S. cerevisiae and H. sapiens, maxSMS improves the precision of the top 500 predictions, the area under the precision-recall curve, and the normalized discounted cumulative gain by an average of 26.99%, 53.67%, and 6.7%, respectively, compared with the best competing methods.
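The L3 scoring rule in the abstract above (sum, over paths u-a-b-v, of the product of the similarities along the path) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is an adjacency dictionary without self-loops, and the paper's mixed topological/attribute similarity is replaced by a hypothetical stand-in, `closed_jaccard` (Jaccard overlap of closed neighborhoods). The function names are illustrative, and the maxSMS variant is not reproduced.

```python
from typing import Callable, Dict, List, Set, Tuple

# Undirected graph: protein id -> set of neighboring protein ids (no self-loops).
Graph = Dict[int, Set[int]]
Similarity = Callable[[Graph, int, int], float]


def closed_jaccard(adj: Graph, x: int, y: int) -> float:
    """Hypothetical stand-in for the paper's mixed similarity: Jaccard overlap
    of the closed neighborhoods N(x) | {x} and N(y) | {y}."""
    nbr_x, nbr_y = adj[x] | {x}, adj[y] | {y}
    return len(nbr_x & nbr_y) / len(nbr_x | nbr_y)


def sms_score(adj: Graph, u: int, v: int, sim: Similarity = closed_jaccard) -> float:
    """Sum, over every simple path u-a-b-v of length three, of
    sim(u, a) * sim(a, b) * sim(b, v)."""
    score = 0.0
    for a in adj[u]:
        if a == v:
            continue  # a path u-v-b-v would revisit v
        for b in adj[a]:
            if b in (u, v):
                continue  # b == u revisits u; b == v gives a length-2 path
            if v in adj[b]:
                score += sim(adj, u, a) * sim(adj, a, b) * sim(adj, b, v)
    return score


def rank_candidates(adj: Graph, top_k: int,
                    sim: Similarity = closed_jaccard) -> List[Tuple[float, int, int]]:
    """Score every non-adjacent pair and return the top_k (score, u, v) triples."""
    nodes = sorted(adj)
    scored = [
        (sms_score(adj, u, v, sim), u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if v not in adj[u]
    ]
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top_k]


if __name__ == "__main__":
    path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(rank_candidates(path_graph, top_k=2))
```

Passing a `sim` that always returns 1.0 reduces the score to a plain count of L3 paths, which makes it easy to see how the similarity weights re-rank candidates that plain L3 counting would tie.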
Affiliation(s)
- Wangmin Cai
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Peiqiang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Zunfang Wang
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Hong Jiang
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Chang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Zhaojie Fei
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Zhuang Yang
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
2
Pancino N, Gallegati C, Romagnoli F, Bongini P, Bianchini M. Protein-Protein Interfaces: A Graph Neural Network Approach. Int J Mol Sci 2024; 25:5870. [PMID: 38892057 PMCID: PMC11173158 DOI: 10.3390/ijms25115870] [Received: 04/14/2024] [Revised: 05/15/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]
Abstract
Protein-protein interactions (PPIs) are fundamental processes governing cellular functions and are crucial for understanding biological systems at the molecular level. Compared with experimental methods for PPI prediction and interaction-site identification, computational deep learning approaches offer an affordable and efficient solution. Because protein structure can be summarized as a graph, graph neural networks (GNNs) are an ideal deep learning architecture for the task. In this work, PPI prediction is modeled as a node-focused binary classification task, using a GNN to determine whether a given residue is part of the interface. Biological data were obtained from the Protein Data Bank in Europe (PDBe) through its Protein Interfaces, Surfaces, and Assemblies (PISA) service. To gain a deeper understanding of how proteins interact, the PISA data were assembled into three datasets, Whole, Interface, and Chain, containing data on whole proteins, pairs of interacting chains, and single chains, respectively. These three datasets correspond to three nuances of the problem: identifying interfaces between protein complexes, interfaces between chains of the same protein, and interface regions in general. The results indicate that GNNs solve each of the three tasks with very good performance.
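The node-focused formulation described above (one binary label per residue, predicted from the residue graph) can be sketched as a single message-passing step followed by a logistic read-out. This is a minimal, untrained illustration and not the authors' architecture: the assumption that residues are nodes joined by edges for spatial contacts, the mean-aggregation rule, and the names `mean_aggregate` and `interface_probabilities` are all hypothetical, and the weights would normally be learned from the PISA-derived labels.

```python
import math
from typing import Dict, List, Sequence


def mean_aggregate(features: List[List[float]],
                   adj: Dict[int, List[int]]) -> List[List[float]]:
    """One message-passing step: concatenate each residue's own features
    with the mean of its neighbors' features (zeros if it has none)."""
    dim = len(features[0])
    out = []
    for i, x in enumerate(features):
        nbrs = adj.get(i, [])
        if nbrs:
            mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim
        out.append(list(x) + mean)
    return out


def interface_probabilities(features: List[List[float]],
                            adj: Dict[int, List[int]],
                            weights: Sequence[float],
                            bias: float) -> List[float]:
    """Per-residue probability of belonging to the interface:
    sigmoid(w . h_i + b) on the aggregated representation h_i.
    `weights` must have length 2 * feature dimension (hypothetical, untrained)."""
    hidden = mean_aggregate(features, adj)
    probs = []
    for row in hidden:
        z = sum(w * h for w, h in zip(weights, row)) + bias
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


if __name__ == "__main__":
    residue_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    contacts = {0: [1, 2], 1: [0], 2: [0]}
    print(interface_probabilities(residue_features, contacts, [0.1, -0.2, 0.3, 0.4], 0.0))
```

Thresholding each probability (for example at 0.5) gives the binary interface/non-interface call per residue; stacking several such layers lets information from more distant residues reach each node.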
Affiliation(s)
- Niccolò Pancino
- Department of Information Engineering and Mathematics, University of Siena, Via Roma, 56, 53100 Siena, Italy; (C.G.); (P.B.); (M.B.)
3
Yin S, Mi X, Shukla D. Leveraging machine learning models for peptide-protein interaction prediction. RSC Chem Biol 2024; 5:401-417. [PMID: 38725911 PMCID: PMC11078210 DOI: 10.1039/d3cb00208j] [Received: 10/27/2023] [Accepted: 02/07/2024] [Indexed: 05/12/2024]
Abstract
Peptides play a pivotal role in a wide range of biological activities, participating in up to 40% of protein-protein interactions in cellular processes. They also show remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes with traditional computational approaches, such as docking and molecular dynamics simulations, remains challenging because of the high computational cost, the flexible nature of peptides, and the limited structural information available for peptide-protein complexes. In recent years, the surge of available biological data has given rise to a growing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches, as well as enhanced accuracy, robustness, and interpretability in their predictions. This review presents a comprehensive overview of the machine learning and deep learning models that have emerged in recent years for predicting peptide-protein interactions.
Affiliation(s)
- Song Yin
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
4
Ma W, Bi X, Jiang H, Zhang S, Wei Z. CollaPPI: A Collaborative Learning Framework for Predicting Protein-Protein Interactions. IEEE J Biomed Health Inform 2024; 28:3167-3177. [PMID: 38466584 DOI: 10.1109/jbhi.2024.3375621] [Indexed: 03/13/2024]
Abstract
Exploring protein-protein interactions (PPIs) is of paramount importance for elucidating the intrinsic mechanisms of various biological processes. However, experimental determination of PPIs can be both time-consuming and expensive, motivating the exploration of data-driven deep learning technologies as a viable, efficient, and accurate alternative. Most current deep learning-based methods treat the two proteins of a candidate pair as separate entities when extracting PPI features, and thus neglect knowledge sharing between the collaborative protein and the target protein. To address this issue, this study proposes CollaPPI, a collaborative learning framework that incorporates two kinds of collaboration (protein-level and task-level collaboration) to achieve not only knowledge sharing between a pair of proteins but also the complementation of such shared knowledge across biological domains closely related to PPI (i.e., protein function and subcellular location). Evaluation results demonstrate that CollaPPI outperforms state-of-the-art methods on two PPI benchmarks. In addition, evaluation of CollaPPI on a further PPI-type prediction task demonstrates its strong generalization ability.
5
Hu J, Li Z, Rao B, Thafar MA, Arif M. Improving protein-protein interaction prediction using protein language model and protein network features. Anal Biochem 2024; 693:115550. [PMID: 38679191 DOI: 10.1016/j.ab.2024.115550] [Received: 01/24/2024] [Revised: 04/12/2024] [Accepted: 04/25/2024] [Indexed: 05/01/2024]
Abstract
Interactions between proteins are ubiquitous in a wide variety of biological processes. Accurately identifying protein-protein interactions (PPIs) is of significant importance for understanding the mechanisms of protein function and for facilitating drug discovery. Although wet-lab methods are the best way to identify PPIs, their major constraints are that they are time-consuming, costly, and labor-intensive. Hence, much effort has been devoted to developing computational methods that improve PPI prediction performance. In this study, we propose a novel hybrid computational method, KSGPPI, that aims to improve PPI prediction by extracting discriminative information from protein sequences and interaction networks. The KSGPPI model comprises two feature extraction modules. In the first module, a large protein language model, ESM-2, is employed to exploit the global complex patterns concealed within protein sequences. Feature representations are then further extracted through CKSAAP (composition of k-spaced amino acid pairs), and a two-dimensional convolutional neural network (CNN) is used to capture local information. In the second module, the protein most similar to the query protein is retrieved from the STRING database using the sequence alignment tool NW-align, and Node2vec is then used to compute a graph embedding for the query protein from the protein interaction network of that similar protein. Finally, the features from the two modules are fused and fed into a multilayer perceptron to predict PPIs. Five-fold cross-validation on the benchmark datasets shows that KSGPPI achieves an average prediction accuracy of 88.96%. Additionally, the average Matthews correlation coefficient (MCC) of KSGPPI (0.781) is significantly higher than those of state-of-the-art PPI prediction methods.
The standalone package of KSGPPI is freely available at https://github.com/rickleezhe/KSGPPI.
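The CKSAAP step mentioned in the KSGPPI abstract can be illustrated with the classic sequence-level encoding: for each gap k, count every ordered residue pair separated by k positions and normalize by the number of pairs counted. This is a minimal sketch, not code from the KSGPPI package: how KSGPPI combines CKSAAP with the ESM-2 representations is not described above, and the default `k_max = 3` and the choice to skip non-standard residues are assumptions for illustration.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence: str, k_max: int = 3) -> List[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count residue pairs (s[i], s[i + k + 1]) and divide by the
    number of valid pairs, giving 400 * (k_max + 1) features in total."""
    seq = sequence.upper()
    features: List[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard letters such as X
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = cksaap("MKTAYIAKQR", k_max=2)
    print(len(vec), max(vec))
```

Each block of 400 values sums to 1 whenever the sequence yields at least one valid pair for that gap, so proteins of different lengths produce comparable fixed-length vectors.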
Affiliation(s)
- Jun Hu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Zhe Li
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
- Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Beijing, 100039, China
- Maha A Thafar
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
6
Gomes Souza F, Bhansali S, Pal K, Silveira Maranhão FD, Santos Oliveira M, Valladão VS, Brandão E Silva DS, Silva GB. A 30-Year Review on Nanocomposites: Comprehensive Bibliometric Insights into Microstructural, Electrical, and Mechanical Properties Assisted by Artificial Intelligence. Materials (Basel) 2024; 17:1088. [PMID: 38473560 DOI: 10.3390/ma17051088] [Received: 12/31/2023] [Revised: 02/18/2024] [Accepted: 02/22/2024] [Indexed: 03/14/2024]
Abstract
This study presents a groundbreaking bibliometric and sentiment analysis of the nanocomposite literature from 1990 to 2024, distinguishing itself from existing reviews through its unique computational methodology. Developed by our research group, this novel approach systematically investigates the evolution of nanocomposites, focusing on microstructural characterization, electrical properties, and mechanical behavior. By deploying advanced Boolean search strategies within the Scopus database, we achieve a meticulous extraction and in-depth exploration of thematic content, a methodological advancement in the field. Our analysis identifies critical trends and insights concerning nanocomposite microstructure, electrical attributes, and mechanical performance. The paper goes beyond traditional textual analytics and bibliometric evaluation, offering new interpretations of the data and highlighting significant collaborative efforts and influential studies within the nanocomposite domain. Our findings uncover the evolution of research language, thematic shifts, and global contributions, providing a distinct and comprehensive view of the dynamic evolution of nanocomposite research. A critical component of this study is the "State-of-the-Art and Gaps Extracted from Results and Discussions" section, which delves into the latest advancements in nanocomposite research. This section details various nanocomposite types and their properties and introduces novel interpretations of their applications, especially in nanocomposite films. By tracing historical progress and identifying emerging trends, this analysis emphasizes the significance of collaboration and influential studies in shaping the field. Moreover, the "Literature Review Guided by Artificial Intelligence" section showcases an innovative AI-guided approach to nanocomposite research, a first in this domain.
Focusing on articles from 2023, selected based on citation frequency, this method offers a new perspective on the interplay between nanocomposites and their electrical properties. It highlights the composition, structure, and functionality of various systems, integrating recent findings for a comprehensive overview of current knowledge. The sentiment analysis, with an average score of 0.638771, reflects a positive trend in academic discourse and an increasing recognition of the potential of nanocomposites. Our bibliometric analysis, another methodological novelty, maps the intellectual domain, emphasizing pivotal research themes and the influence of crosslinking time on nanocomposite attributes. While acknowledging its limitations, this study exemplifies the indispensable role of our innovative computational tools in synthesizing and understanding the extensive body of nanocomposite literature. This work not only elucidates prevailing trends but also contributes a unique perspective and novel insights, enhancing our understanding of the nanocomposite research field.
Affiliation(s)
- Fernando Gomes Souza
- Biopolymers & Sensors Lab., Instituto de Macromoléculas Professora Eloisa Mano, Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-853, Brazil
- Programa de Engenharia da Nanotecnologia, Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia (COPPE), Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-914, Brazil
- Shekhar Bhansali
- Biomolecular Sciences Institute, College of Engineering & Computing, Center for Aquatic Chemistry and Environment, Florida International University, 10555 West Flagler St EC3900, Miami, FL 33174, USA
- Kaushik Pal
- Department of Physics, University Center for Research and Development (UCRD), Chandigarh University, Mohali 140413, Punjab, India
- Fabíola da Silveira Maranhão
- Biopolymers & Sensors Lab., Instituto de Macromoléculas Professora Eloisa Mano, Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-853, Brazil
- Marcella Santos Oliveira
- Biopolymers & Sensors Lab., Instituto de Macromoléculas Professora Eloisa Mano, Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-853, Brazil
- Viviane Silva Valladão
- Biopolymers & Sensors Lab., Instituto de Macromoléculas Professora Eloisa Mano, Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-853, Brazil
- Daniele Silvéria Brandão E Silva
- Programa de Engenharia da Nanotecnologia, Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia (COPPE), Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-914, Brazil
- Gabriel Bezerra Silva
- Biopolymers & Sensors Lab., Instituto de Macromoléculas Professora Eloisa Mano, Universidade Federal do Rio de Janeiro, Centro de Tecnologia-Cidade Universitária, Rio de Janeiro 21941-853, Brazil
7
Qi X, Zhao Y, Qi Z, Hou S, Chen J. Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges. Molecules 2024; 29:903. [PMID: 38398653 PMCID: PMC10892089 DOI: 10.3390/molecules29040903] [Received: 01/15/2024] [Revised: 02/08/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024]
Abstract
Drug discovery plays a critical role in advancing human health by developing new medications and treatments to combat diseases. Accelerating the pace and reducing the cost of new drug discovery has long been a key concern for the pharmaceutical industry. By leveraging advanced algorithms, computational power, and biological big data, artificial intelligence (AI) technology, especially machine learning (ML), holds the promise of making the hunt for new drugs more efficient. Recently, Transformer-based models, which achieved revolutionary breakthroughs in natural language processing, have opened a new era of applications in drug discovery. Herein, we introduce the latest applications of ML in drug discovery, highlight the potential of advanced Transformer-based ML models, and discuss future prospects and challenges in the field.
Affiliation(s)
- Xin Qi
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Yuanchun Zhao
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Zhuang Qi
- School of Software, Shandong University, Jinan 250101, China
- Siyu Hou
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Jiajia Chen
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
8
Kole A, Bag AK, Pal AJ, De D. Generic model to unravel the deeper insights of viral infections: an empirical application of evolutionary graph coloring in computational network biology. BMC Bioinformatics 2024; 25:74. [PMID: 38365632 PMCID: PMC10874019 DOI: 10.1186/s12859-024-05690-0] [Received: 11/22/2023] [Accepted: 02/02/2024] [Indexed: 02/18/2024]
Abstract
PURPOSE Graph coloring has emerged as a valuable problem-solving tool, in both theoretical and practical settings, across various scientific disciplines, including biology. In this study, we demonstrate the effectiveness of graph coloring in computational network biology, more precisely in analyzing protein-protein interaction (PPI) networks to gain insights into viral infections and their consequences for human health. Accordingly, we propose a generic model that can highlight important hub proteins of virus-associated disease manifestations, changes in disease-associated biological pathways, potential drug targets, and the respective drugs. We test our model on infection by SARS-CoV-2, the highly transmissible virus responsible for the COVID-19 pandemic. The pandemic claimed many human lives, causing severe respiratory illness and symptoms ranging from fever and cough to gastrointestinal, cardiac, renal, neurological, and other manifestations. METHODS To investigate the mechanisms underlying SARS-CoV-2 infection-induced dysregulation of human pathobiology, we construct a two-level PPI network and employ a differential evolution-based graph coloring (DEGCP) algorithm to identify critical hub proteins that might serve as potential targets for resolving the associated issues. We first concentrate on the direct human interactors of SARS-CoV-2 proteins to construct the first-level PPI network and then apply the DEGCP algorithm to identify essential hub proteins within this network. We then build a second-level PPI network by incorporating the next-level human interactors of the first-level hub proteins and use the DEGCP algorithm to predict the second level of hub proteins. RESULTS We first identify the potential crucial hub proteins associated with SARS-CoV-2 infection at different levels.
Through comprehensive analysis, we then investigate the cellular localization of these identified hub proteins, their interactions with other viral families, their involvement in biological pathways and processes, their functional attributes, their gene regulation capabilities as transcription factors, and their associations with disease-associated symptoms. Our findings highlight the significance of these hub proteins and their intricate connections with disease pathophysiology. Furthermore, we predict potential drug targets among the hub proteins and identify specific drugs that hold promise for preventing or treating SARS-CoV-2 infection and its consequences. CONCLUSION Our generic model demonstrates the effectiveness of the DEGCP algorithm in analyzing biological PPI networks, provides valuable insights into disease biology, and offers a basis for developing novel therapeutic strategies for other viral infections that may cause future pandemics.
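The two-level network construction in the METHODS paragraph can be sketched as plain set operations over interaction lists. This is a minimal sketch under loudly stated assumptions: hub selection here is a simple degree ranking used purely as a placeholder and is NOT the paper's differential evolution-based graph coloring (DEGCP); the assumption that the first-level network contains only human-human edges among direct interactors, the edge-list inputs, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def first_level(virus_human: Iterable[Edge],
                human_human: Iterable[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Level 1: human proteins that directly bind a viral protein, plus the
    human-human interactions among them (an assumed construction)."""
    nodes = {human for _, human in virus_human}
    edges = {(a, b) for a, b in human_human if a in nodes and b in nodes}
    return nodes, edges


def top_hubs(nodes: Set[str], edges: Set[Edge], k: int) -> List[str]:
    """Placeholder hub selection by degree; NOT the paper's DEGCP algorithm."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(nodes, key=lambda n: (-degree[n], n))[:k]


def second_level(hubs: Iterable[str],
                 human_human: Iterable[Edge]) -> Tuple[Set[str], Set[Edge]]:
    """Level 2: the level-1 hubs plus their next-level human interactors."""
    human_human = list(human_human)  # iterated twice below
    hub_set = set(hubs)
    nodes = set(hub_set)
    for a, b in human_human:
        if a in hub_set:
            nodes.add(b)
        if b in hub_set:
            nodes.add(a)
    edges = {(a, b) for a, b in human_human if a in nodes and b in nodes}
    return nodes, edges


if __name__ == "__main__":
    viral = [("V1", "A"), ("V1", "B"), ("V2", "C")]
    human = [("A", "B"), ("A", "C"), ("C", "D"), ("A", "E")]
    n1, e1 = first_level(viral, human)
    hubs = top_hubs(n1, e1, k=1)
    print(hubs, second_level(hubs, human))
```

Swapping `top_hubs` for a coloring-based selection would leave the two construction steps unchanged, which is the point of separating them here.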
Affiliation(s)
- Arnab Kole
- Department of Computer Application, The Heritage Academy, Kolkata, W.B., 700107, India
- Arup Kumar Bag
- Beckman Research Institute of City of Hope, Duarte, CA, 91010, USA
- Debashis De
- Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Nadia, W.B., 741249, India
9
Yin S, Mi X, Shukla D. Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction. arXiv 2024:arXiv:2310.18249v2. [PMID: 37961736 PMCID: PMC10635286] [Indexed: 11/15/2023]
Abstract
Peptides play a pivotal role in a wide range of biological activities, participating in up to 40% of protein-protein interactions in cellular processes. They also show remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes with traditional computational approaches, such as docking and molecular dynamics simulations, remains challenging because of the high computational cost, the flexible nature of peptides, and the limited structural information available for peptide-protein complexes. In recent years, the surge of available biological data has given rise to a growing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches, as well as enhanced accuracy, robustness, and interpretability in their predictions. This review presents a comprehensive overview of the machine learning and deep learning models that have emerged in recent years for predicting peptide-protein interactions.
Affiliation(s)
- Song Yin
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- These authors contributed to the work equally
- Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- These authors contributed to the work equally
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
10
Krokidis MG, Dimitrakopoulos GN, Vrahatis AG, Exarchos TP, Vlamos P. Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases. Front Comput Neurosci 2024; 17:1323182. [PMID: 38250244 PMCID: PMC10796696 DOI: 10.3389/fncom.2023.1323182] [Received: 10/17/2023] [Accepted: 12/19/2023] [Indexed: 01/23/2024]
Affiliation(s)
- Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, Corfu, Greece
11
Xie S, Xie X, Zhao X, Liu F, Wang Y, Ping J, Ji Z. HNSPPI: a hybrid computational model combing network and sequence information for predicting protein-protein interaction. Brief Bioinform 2023; 24:bbad261. [PMID: 37480553 DOI: 10.1093/bib/bbad261] [Received: 03/05/2023] [Revised: 06/24/2023] [Accepted: 06/26/2023] [Indexed: 07/24/2023]
Abstract
Most life activities in organisms are regulated through protein complexes, which are mainly controlled via protein-protein interactions (PPIs). Discovering new interactions between proteins and revealing their biological functions are of great significance for understanding the molecular mechanisms of biological processes and for identifying potential targets in drug discovery. Current experimental methods capture only stable protein interactions, which limits their coverage; they are also expensive and time-consuming. In recent years, various computational methods have been successfully developed for predicting PPIs based solely on protein homology, primary protein sequences, or Gene Ontology information. However, computational efficiency and data complexity remain the main bottlenecks for algorithm generalization. In this study, we propose a novel computational framework, HNSPPI, to predict PPIs. As a hybrid supervised learning model, HNSPPI comprehensively characterizes the intrinsic relationship between two proteins by integrating amino acid sequence information with the connection properties of the PPI network. Experimental results show that HNSPPI performs very well on six benchmark datasets, and comparative analysis shows that our model significantly outperforms five other existing algorithms. Finally, we used HNSPPI to explore the SARS-CoV-2-human interaction system and found several potential regulatory relationships. In summary, HNSPPI is a promising model for predicting new protein interactions from known PPI data.
Affiliation(s)
- Shijie Xie
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
- Xiaojun Xie
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
- Xin Zhao
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Fei Liu
- Joint International Research Laboratory of Animal Health and Food Safety of Ministry of Education & Single Molecule Nanometry Laboratory (Sinmolab), Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Yiming Wang
- Key Laboratory of Biological Interactions and Crop Health, Department of Plant Pathology, Nanjing Agricultural University, 210095, Nanjing, China
- Jihui Ping
- MOE International Joint Collaborative Research Laboratory for Animal Health and Food Safety & Jiangsu Engineering Laboratory of Animal Immunology, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhiwei Ji
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
12
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand which topics within the field are being investigated with machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on the projects deposited in GitHub repositories, we identify the most popular Python libraries employed. We hope that this survey will serve as a resource for learning about machine learning, or specific architectures thereof, by identifying accessible code with accompanying papers on a topic basis. To this end, we also include open-source computational chemistry software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work (open data, open source code, and open models), we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute
of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department
of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute
of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
13
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023]
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Collapse
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
14
Saldinger JC, Raymond M, Elvati P, Violi A. Domain-agnostic predictions of nanoscale interactions in proteins and nanoparticles. Nat Comput Sci 2023; 3:393-402. [PMID: 38177838 DOI: 10.1038/s43588-023-00438-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2022] [Accepted: 03/24/2023] [Indexed: 01/06/2024]
Abstract
Although challenging, the accurate and rapid prediction of nanoscale interactions has broad applications for numerous biological processes and material properties. While several models have been developed to predict the interaction of specific biological components, they use system-specific information that hinders their application to more general materials. Here we present NeCLAS, a general and efficient machine learning pipeline that predicts the location of nanoscale interactions, providing human-intelligible predictions. NeCLAS outperforms current nanoscale prediction models for generic nanoparticles up to 10-20 nm, reproducing interactions for biological and non-biological systems. Two aspects contribute to these results: a low-dimensional representation of nanoparticles and molecules (to reduce the effect of data uncertainty), and environmental features (to encode the physicochemical neighborhood at multiple scales). This framework has several applications, from basic research to rapid prototyping and design in nanobiotechnology.
Affiliation(s)
- Matt Raymond
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Paolo Elvati
- Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Angela Violi
- Chemical Engineering, University of Michigan, Ann Arbor, MI, USA.
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA.
- Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA.
- Biophysics Program, University of Michigan, Ann Arbor, MI, USA.
15
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA