1
|
Sun X, Wu Z, Su J, Li C. A deep attention model for wide-genome protein-peptide binding affinity prediction at a sequence level. Int J Biol Macromol 2024; 276:133811. [PMID: 38996881 DOI: 10.1016/j.ijbiomac.2024.133811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2024] [Revised: 07/09/2024] [Accepted: 07/09/2024] [Indexed: 07/14/2024]
Abstract
Peptides are pivotal in numerous biological activities by engaging in up to 40 % of protein-protein interactions in many cellular processes. Due to their exceptional specificity and effectiveness, peptides have emerged as promising candidates for drug design. However, accurately predicting protein-peptide binding affinity remains a challenging. Aiming at the problem, we develop a prediction model PepPAP based on convolutional neural network and multi-head attention, which relies solely on sequence features. These features include physicochemical properties, intrinsic disorder, sequence encoding, and especially interface propensity which is extracted from 16,689 non-redundant protein-peptide complexes. Notably, the adopted regression stratification cross-validation scheme proposed in our previous work is beneficial to improve the prediction for the cases with extreme binding affinity values. On three benchmark test datasets: T100, a series of peptides targeting to PDZ domain and CXCR4, PepPAP shows excellent performance, outperforming the existing methods and demonstrating its good generalization ability. Furthermore, PepPAP has good results in binary interaction prediction, and the analysis of the feature space distribution visualization highlights PepPAP's effectiveness. To the best of our knowledge, PepPAP is the first sequence-based deep attention model for wide-genome protein-peptide binding affinity prediction, and holds the potential to offer valuable insights for the peptide-based drug design.
Collapse
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
| |
Collapse
|
2
|
Bogdanova EA, Novoseletsky VN. ProBAN: Neural network algorithm for predicting binding affinity in protein-protein complexes. Proteins 2024; 92:1127-1136. [PMID: 38722047 DOI: 10.1002/prot.26700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 03/22/2024] [Accepted: 04/26/2024] [Indexed: 08/07/2024]
Abstract
Determining binding affinities in protein-protein and protein-peptide complexes is a challenging task that directly impacts the development of peptide and protein pharmaceuticals. Although several models have been proposed to predict the value of the dissociation constant and the Gibbs free energy, they are currently not capable of making stable predictions with high accuracy, in particular for complexes consisting of more than two molecules. In this work, we present ProBAN, a new method for predicting binding affinity in protein-protein complexes based on a deep convolutional neural network. Prediction is carried out for the spatial structures of complexes, presented in the format of a 4D tensor, which includes information about the location of atoms and their abilities to participate in various types of interactions realized in protein-protein and protein-peptide complexes. The effectiveness of the model was assessed both on an internal test data set containing complexes consisting of three or more molecules, as well as on an external test for the PPI-Affinity service. As a result, we managed to achieve the best prediction quality on these data sets among all the analyzed models: on the internal test, Pearson correlation R = 0.6, MAE = 1.60, on the external test, R = 0.55, MAE = 1.75. The open-source code, the trained ProBAN model, and the collected dataset are freely available at the following link https://github.com/EABogdanova/ProBAN.
Collapse
|
3
|
Nguyen QH, Nguyen-Vo TH, Do TTT, Nguyen BP. An efficient hybrid deep learning architecture for predicting short antimicrobial peptides. Proteomics 2024; 24:e2300382. [PMID: 38837544 DOI: 10.1002/pmic.202300382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Revised: 05/02/2024] [Accepted: 05/07/2024] [Indexed: 06/07/2024]
Abstract
Short-length antimicrobial peptides (AMPs) have been demonstrated to have intensified antimicrobial activities against a wide spectrum of microbes. Therefore, exploration of novel and promising short AMPs is highly essential in developing various types of antimicrobial drugs or treatments. In addition to experimental approaches, computational methods have been developed to improve screening efficiency. Although existing computational methods have achieved satisfactory performance, there is still much room for model improvement. In this study, we proposed iAMP-DL, an efficient hybrid deep learning architecture, for predicting short AMPs. The model was constructed using two well-known deep learning architectures: the long short-term memory architecture and convolutional neural networks. To fairly assess the performance of the model, we compared our model with existing state-of-the-art methods using the same independent test set. Our comparative analysis shows that iAMP-DL outperformed other methods. Furthermore, to assess the robustness and stability of our model, the experiments were repeated 10 times to observe the variation in prediction efficiency. The results demonstrate that iAMP-DL is an effective, robust, and stable framework for detecting promising short AMPs. Another comparative study of different negative data sampling methods also confirms the effectiveness of our method and demonstrates that it can also be used to develop a robust model for predicting AMPs in general. The proposed framework was also deployed as an online web server with a user-friendly interface to support the research community in identifying short AMPs.
Collapse
Affiliation(s)
- Quang H Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
| | - Thanh-Hoang Nguyen-Vo
- School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
- School of Innovation, Design and Technology, Wellington Institute of Technology, Lower Hutt, New Zealand
| | - Trang T T Do
- Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
| | - Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
- Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
| |
Collapse
|
4
|
Martínez‐Mauricio KL, García‐Jacas CR, Cordoves‐Delgado G. Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow. Protein Sci 2024; 33:e4928. [PMID: 38501511 PMCID: PMC10949403 DOI: 10.1002/pro.4928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 01/28/2024] [Accepted: 01/30/2024] [Indexed: 03/20/2024]
Abstract
Molecular features play an important role in different bio-chem-informatics tasks, such as the Quantitative Structure-Activity Relationships (QSAR) modeling. Several pre-trained models have been recently created to be used in downstream tasks, either by fine-tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed as ESM-2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different-dimensional embeddings derived from the ESM-2 models to classify antimicrobial peptides (AMPs). To this end, we built a KNIME workflow to use the same modeling methodology across experiments in order to guarantee fair analyses. As a result, the 640- and 1280-dimensional embeddings derived from the 30- and 33-layer ESM-2 models, respectively, are the most valuable since statistically better performances were achieved by the QSAR models built from them. We also fused features of the different ESM-2 models, and it was concluded that the fusion contributes to getting better QSAR models than using features of a single ESM-2 model. Frequency studies revealed that only a portion of the ESM-2 embeddings is valuable for modeling tasks since between 43% and 66% of the features were never used. Comparisons regarding state-of-the-art deep learning (DL) models confirm that when performing methodologically principled studies in the prediction of AMPs, non-DL based QSAR models yield comparable-to-superior performances to DL-based QSAR models. The developed KNIME workflow is available-freely at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can be valuable to avoid unfair comparisons regarding new computational methods, as well as to propose new non-DL based QSAR models.
Collapse
Affiliation(s)
- Karla L. Martínez‐Mauricio
- Departamento de Ciencias de la ComputaciónCentro de Investigación Científica y de Educación Superior de Ensenada (CICESE)EnsenadaMexico
| | - César R. García‐Jacas
- Cátedras CONAHCYT – Departamento de Ciencias de la ComputaciónCentro de Investigación Científica y de Educación Superior de Ensenada (CICESE)EnsenadaMexico
| | - Greneter Cordoves‐Delgado
- Departamento de Ciencias de la ComputaciónCentro de Investigación Científica y de Educación Superior de Ensenada (CICESE)EnsenadaMexico
| |
Collapse
|
5
|
Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024; 2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/12/2024]
Abstract
An exponential increase in the number of publications that address artificial intelligence (AI) usage in life sciences has been noticed in recent years, while new modeling techniques are constantly being reported. The potential of these methods is vast-from understanding fundamental cellular processes to discovering new drugs and breakthrough therapies. Computational studies of protein-protein interactions, crucial for understanding the operation of biological systems, are no exception in this field. However, despite the rapid development of technology and the progress in developing new approaches, many aspects remain challenging to solve, such as predicting conformational changes in proteins, or more "trivial" issues as high-quality data in huge quantities.Therefore, this chapter focuses on a short introduction to various AI approaches to study protein-protein interactions, followed by a description of the most up-to-date algorithms and programs used for this purpose. Yet, given the considerable pace of development in this hot area of computational science, at the time you read this chapter, the development of the algorithms described, or the emergence of new (and better) ones should come as no surprise.
Collapse
Affiliation(s)
- Ilona Michalik
- Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
| | - Kamil J Kuder
- Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland.
| |
Collapse
|
6
|
Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 01/13/2024] Open
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, as similarly used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for the derivation of novel protein descriptors that are more informative.
Collapse
Affiliation(s)
- J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
| | - J.F. Buyel
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
- Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
| |
Collapse
|
7
|
Guo Z, Yamaguchi R. Machine learning methods for protein-protein binding affinity prediction in protein design. FRONTIERS IN BIOINFORMATICS 2022; 2:1065703. [PMID: 36591334 PMCID: PMC9800603 DOI: 10.3389/fbinf.2022.1065703] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Accepted: 12/01/2022] [Indexed: 12/23/2022] Open
Abstract
Protein-protein interactions govern a wide range of biological activity. A proper estimation of the protein-protein binding affinity is vital to design proteins with high specificity and binding affinity toward a target protein, which has a variety of applications including antibody design in immunotherapy, enzyme engineering for reaction optimization, and construction of biosensors. However, experimental and theoretical modelling methods are time-consuming, hinder the exploration of the entire protein space, and deter the identification of optimal proteins that meet the requirements of practical applications. In recent years, the rapid development in machine learning methods for protein-protein binding affinity prediction has revealed the potential of a paradigm shift in protein design. Here, we review the prediction methods and associated datasets and discuss the requirements and construction methods of binding affinity prediction models for protein design.
Collapse
Affiliation(s)
- Zhongliang Guo
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
| | - Rui Yamaguchi
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan,Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan,*Correspondence: Rui Yamaguchi,
| |
Collapse
|
8
|
Antimicrobial peptides with cell-penetrating activity as prophylactic and treatment drugs. Biosci Rep 2022; 42:231731. [PMID: 36052730 PMCID: PMC9508529 DOI: 10.1042/bsr20221789] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Revised: 08/31/2022] [Accepted: 09/01/2022] [Indexed: 01/18/2023] Open
Abstract
Health is fundamental for the development of individuals and evolution of species. In that sense, for human societies is relevant to understand how the human body has developed molecular strategies to maintain health. In the present review, we summarize diverse evidence that support the role of peptides in this endeavor. Of particular interest to the present review are antimicrobial peptides (AMP) and cell-penetrating peptides (CPP). Different experimental evidence indicates that AMP/CPP are able to regulate autophagy, which in turn regulates the immune system response. AMP also assists in the establishment of the microbiota, which in turn is critical for different behavioral and health aspects of humans. Thus, AMP and CPP are multifunctional peptides that regulate two aspects of our bodies that are fundamental to our health: autophagy and microbiota. While it is now clear the multifunctional nature of these peptides, we are still in the early stages of the development of computational strategies aimed to assist experimentalists in identifying selective multifunctional AMP/CPP to control nonhealthy conditions. For instance, both AMP and CPP are computationally characterized as amphipatic and cationic, yet none of these features are relevant to differentiate these peptides from non-AMP or non-CPP. The present review aims to highlight current knowledge that may facilitate the development of AMP’s design tools for preventing or treating illness.
Collapse
|
9
|
Romero-Molina S, Ruiz-Blanco YB, Mieres-Perez J, Harms M, Münch J, Ehrmann M, Sanchez-Garcia E. PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein-Peptide and Protein-Protein Binding Affinity. J Proteome Res 2022; 21:1829-1841. [PMID: 35654412 PMCID: PMC9361347 DOI: 10.1021/acs.jproteome.2c00020] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
![]()
Virtual screening
of protein–protein and protein–peptide
interactions is a challenging task that directly impacts the processes
of hit identification and hit-to-lead optimization in drug design
projects involving peptide-based pharmaceuticals. Although several
screening tools designed to predict the binding affinity of protein–protein
complexes have been proposed, methods specifically developed to predict
protein–peptide binding affinity are comparatively scarce.
Frequently, predictors trained to score the affinity of small molecules
are used for peptides indistinctively, despite the larger complexity
and heterogeneity of interactions rendered by peptide binders. To
address this issue, we introduce PPI-Affinity, a tool that leverages
support vector machine (SVM) predictors of binding affinity to screen
datasets of protein–protein and protein–peptide complexes,
as well as to generate and rank mutants of a given structure. The
performance of the SVM models was assessed on four benchmark datasets,
which include protein–protein and protein–peptide binding
affinity data. In addition, we evaluated our model on a set of mutants
of EPI-X4, an endogenous peptide inhibitor of the chemokine receptor
CXCR4, and on complexes of the serine proteases HTRA1 and HTRA3 with
peptides. PPI-Affinity is freely accessible at https://protdcal.zmb.uni-due.de/PPIAffinity.
Collapse
Affiliation(s)
- Sandra Romero-Molina
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
| | - Yasser B Ruiz-Blanco
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
| | - Joel Mieres-Perez
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
| | - Mirja Harms
- Institute of Molecular Virology, Ulm University Medical Center, Ulm 89081, Germany
| | - Jan Münch
- Institute of Molecular Virology, Ulm University Medical Center, Ulm 89081, Germany.,Core Facility Functional Peptidomics, Ulm University Medical Center, Ulm 89081, Germany
| | - Michael Ehrmann
- Faculty of Biology, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
| | - Elsa Sanchez-Garcia
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
| |
Collapse
|
10
|
Computational Design of Inhibitors Targeting the Catalytic β Subunit of Escherichia coli FOF1-ATP Synthase. Antibiotics (Basel) 2022; 11:antibiotics11050557. [PMID: 35625201 PMCID: PMC9138118 DOI: 10.3390/antibiotics11050557] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2022] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 01/27/2023] Open
Abstract
With the uncontrolled growth of multidrug-resistant bacteria, there is an urgent need to search for new therapeutic targets, to develop drugs with novel modes of bactericidal action. FoF1-ATP synthase plays a crucial role in bacterial bioenergetic processes, and it has emerged as an attractive antimicrobial target, validated by the pharmaceutical approval of an inhibitor to treat multidrug-resistant tuberculosis. In this work, we aimed to design, through two types of in silico strategies, new allosteric inhibitors of the ATP synthase, by targeting the catalytic β subunit, a centerpiece in communication between rotor subunits and catalytic sites, to drive the rotary mechanism. As a model system, we used the F1 sector of Escherichia coli, a bacterium included in the priority list of multidrug-resistant pathogens. Drug-like molecules and an IF1-derived peptide, designed through molecular dynamics simulations and sequence mining approaches, respectively, exhibited in vitro micromolar inhibitor potency against F1. An analysis of bacterial and Mammalia sequences of the key structural helix-turn-turn motif of the C-terminal domain of the β subunit revealed highly and moderately conserved positions that could be exploited for the development of new species-specific allosteric inhibitors. To our knowledge, these inhibitors are the first binders computationally designed against the catalytic subunit of FOF1-ATP synthase.
Collapse
|
11
|
Holch A, Bauer R, Olari LR, Rodriguez AA, Ständker L, Preising N, Karacan M, Wiese S, Walther P, Ruiz-Blanco YB, Sanchez-Garcia E, Schumann C, Münch J, Spellerberg B. Respiratory ß-2-Microglobulin exerts pH dependent antimicrobial activity. Virulence 2021; 11:1402-1414. [PMID: 33092477 PMCID: PMC7588194 DOI: 10.1080/21505594.2020.1831367] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
The respiratory tract is a major entry site for microbial pathogens. To combat bacterial infections, the immune system has various defense mechanisms at its disposal, including antimicrobial peptides (AMPs). To search for novel AMPs from the respiratory tract, a peptide library from human broncho-alveolar-lavage (BAL) fluid was screened for antimicrobial activity by radial diffusion assays allowing the efficient detection of antibacterial activity within a small sample size. After repeated testing-cycles and subsequent purification, we identified ß-2-microglobulin (B2M) in antibacterially active fractions. B2M belongs to the MHC-1 receptor complex present at the surface of nucleated cells. It is known to inhibit the growth of Listeria monocytogenes and Escherichia coli and to facilitate phagocytosis of Staphylococcus aureus. Using commercially available B2M we confirmed a dose-dependent inhibition of Pseudomonas aeruginosa and L. monocytogenes. To characterize AMP activity within the B2M sequence, peptide fragments of the molecule were tested for antimicrobial activity. Activity could be localized to the C-terminal part of B2M. Investigating pH dependency of the antimicrobial activity of B2M demonstrated an increased activity at pH values of 5.5 and below, a hallmark of infection and inflammation. Sytox green uptake into bacterial cells following the exposure to B2M was determined and revealed a pH-dependent loss of bacterial membrane integrity. TEM analysis showed areas of disrupted bacterial membranes in L. monocytogenes incubated with B2M and high amounts of lysed bacterial cells. In conclusion, B2M as part of a ubiquitous cell surface complex may represent a potent antimicrobial agent by interfering with bacterial membrane integrity.
Collapse
Affiliation(s)
- Armin Holch
- Institute of Medical Microbiology and Hygiene, University Hospital , Ulm, Germany
| | - Richard Bauer
- Institute of Medical Microbiology and Hygiene, University Hospital , Ulm, Germany
| | - Lia-Raluca Olari
- Institute of Molecular Virology, University Hospital , Ulm, Germany
| | - Armando A Rodriguez
- Core Facility Functional Peptidomics, Ulm University Medical Center , Ulm, Germany.,Core Unit Mass Spectrometry and Proteomics, Ulm University , Ulm, Germany
| | - Ludger Ständker
- Core Facility Functional Peptidomics, Ulm University Medical Center , Ulm, Germany
| | - Nico Preising
- Core Facility Functional Peptidomics, Ulm University Medical Center , Ulm, Germany
| | - Merve Karacan
- Core Facility Functional Peptidomics, Ulm University Medical Center , Ulm, Germany
| | - Sebastian Wiese
- Core Unit Mass Spectrometry and Proteomics, Ulm University , Ulm, Germany
| | - Paul Walther
- Central Facility for Electron Microscopy, Ulm University Medical Center , Ulm, Germany
| | - Yasser B Ruiz-Blanco
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen , Essen, Germany
| | - Elsa Sanchez-Garcia
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen , Essen, Germany
| | - Christian Schumann
- Pneumology, Thoracic Oncology, Sleep and Respiratory Critical Care Medicine, Clinics Kempten-Allgäu, Kempten and Immenstadt , Germany
| | - Jan Münch
- Institute of Molecular Virology, University Hospital , Ulm, Germany.,Core Facility Functional Peptidomics, Ulm University Medical Center , Ulm, Germany
| | - Barbara Spellerberg
- Institute of Medical Microbiology and Hygiene, University Hospital , Ulm, Germany
| |
Collapse
|
12
|
Pinacho-Castellanos SA, García-Jacas CR, Gilson MK, Brizuela CA. Alignment-Free Antimicrobial Peptide Predictors: Improving Performance by a Thorough Analysis of the Largest Available Data Set. J Chem Inf Model 2021; 61:3141-3157. [PMID: 34081438 DOI: 10.1021/acs.jcim.1c00251] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
In the last two decades, a large number of machine-learning-based predictors for the activities of antimicrobial peptides (AMPs) have been proposed. These predictors differ from one another in the learning method and in the training and testing data sets used. Unfortunately, the training data sets present several drawbacks, such as a low representativeness regarding the experimentally validated AMP space, and duplicated peptide sequences between negative and positive data sets. These limitations give a low confidence to most of the approaches to be used in prospective studies. To address these weaknesses, we propose novel modeling and assessing data sets from the largest experimentally validated nonredundant peptide data set reported to date. From these novel data sets, alignment-free quantitative sequence-activity models (AF-QSAMs) based on Random Forest are created to identify general AMPs and their antibacterial, antifungal, antiparasitic, and antiviral functional types. An applicability domain analysis is carried out to determine the reliability of the predictions obtained, which, to the best of our knowledge, is performed for the first time for AMP recognition. A benchmarking is undertaken between the models proposed and several models from the literature that are freely available in 13 programs (ClassAMP, iAMP-2L, ADAM, MLAMP, AMPScanner v2.0, AntiFP, AMPfun, PEPred-suite, AxPEP, CAMPR3, iAMPpred, APIN, and Meta-iAVP). The models proposed are those with the best performance in all of the endpoints modeled, while most of the methods from the literature have weak-to-random predictive agreements. The models proposed are also assessed through Y-scrambling and repeated k-fold cross-validation tests, demonstrating that the outcomes obtained by them are not given by chance. Three chemometric analyses also confirmed the relevance of the peptides descriptors used in the modeling. Therefore, it can be concluded that the models built by fixing the drawbacks existing in the literature contribute to identifying antibacterial, antifungal, antiparasitic, and antiviral peptides with high effectivity and reliability. Models are freely available via the AMPDiscover tool at https://biocom-ampdiscover.cicese.mx/.
Collapse
Affiliation(s)
- Sergio A Pinacho-Castellanos
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México.,Centro de Investigación y Desarrollo de Tecnología Digital (CITEDI), Instituto Politécnico Nacional (IPN), 22435 Tijuana, Baja California, México
| | - César R García-Jacas
- Cátedras CONACYT-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
| | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, United States
| | - Carlos A Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México
| |
Collapse
|
13
|
Ruiz-Blanco YB, Ávila-Barrientos LP, Hernández-García E, Antunes A, Agüero-Chapin G, García-Hernández E. Engineering protein fragments via evolutionary and protein-protein interaction algorithms: de novo design of peptide inhibitors for F O F 1 -ATP synthase. FEBS Lett 2020; 595:183-194. [PMID: 33151544 DOI: 10.1002/1873-3468.13988] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Revised: 10/23/2020] [Accepted: 10/30/2020] [Indexed: 11/08/2022]
Abstract
Enzyme subunit interfaces have remarkable potential in drug design as both target and scaffold for their own inhibitors. We show an evolution-driven strategy for the de novo design of peptide inhibitors targeting interfaces of the Escherichia coli FoF1-ATP synthase as a case study. The evolutionary algorithm ROSE was applied to generate diversity-oriented peptide libraries by engineering peptide fragments from ATP synthase interfaces. The resulting peptides were scored with PPI-Detect, a sequence-based predictor of protein-protein interactions. Two selected peptides were confirmed by in vitro inhibition and binding tests. The proposed methodology can be widely applied to design peptides targeting relevant interfaces of enzymatic complexes.
Collapse
Affiliation(s)
| | | | | | - Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Portugal.,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Portugal
| | - Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Portugal.,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Portugal
| | | |
Collapse
|
14
|
Aguilera-Mendoza L, Marrero-Ponce Y, García-Jacas CR, Chavez E, Beltran JA, Guillen-Ramirez HA, Brizuela CA. Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach. Sci Rep 2020; 10:18074. [PMID: 33093586 PMCID: PMC7583304 DOI: 10.1038/s41598-020-75029-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Accepted: 09/23/2020] [Indexed: 12/15/2022] Open
Abstract
The increasing interest in bioactive peptides with therapeutic potentials has been reflected in a large variety of biological databases published over the last years. However, the knowledge discovery process from these heterogeneous data sources is a nontrivial task, becoming the essence of our research endeavor. Therefore, we devise a unified data model based on molecular similarity networks for representing a chemical reference space of bioactive peptides, having an implicit knowledge that is currently not explicitly accessed in existing biological databases. Indeed, our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the "ocean" of known bioactive peptides. The workflow presented here relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method to identify an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks where nodes represent bioactive peptides, and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool ( http://mobiosd-hub.com/starpep/ ), to assist researchers in extracting useful information from an integrated collection of 45120 bioactive peptides, which is one of the largest and most diverse data in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent a biologically relevant chemical space known to date.
Collapse
Affiliation(s)
- Longendri Aguilera-Mendoza
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico
| | - Yovani Marrero-Ponce
- Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17-1200-841, Quito, Ecuador.
- Grupo GINUMED, Corporacion Universitaria Rafael Nuñez. Facultad de Salud, Programa de Medicina, Cartagena, Colombia.
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain.
| | - César R García-Jacas
- Cátedras Conacyt - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
| | - Edgar Chavez
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico
| | - Jesus A Beltran
- Department of Informatics, University of California, Irvine, Irvine, CA, USA
| | - Hugo A Guillen-Ramirez
- Department of BioMedical Research (DBMR), University of Bern, Bern, 3008, Switzerland
- Department of Medical Oncology, Inselspital, University Hospital and University of Bern, 3010, Bern, Switzerland
| | - Carlos A Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico.
| |
Collapse
|
15
|
Poot Velez AH, Fontove F, Del Rio G. Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes. Int J Mol Sci 2020; 21:E4787. [PMID: 32640745 PMCID: PMC7370293 DOI: 10.3390/ijms21134787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2020] [Revised: 06/20/2020] [Accepted: 06/28/2020] [Indexed: 01/22/2023] Open
Abstract
Predicting protein-protein interactions (PPI) represents an important challenge in structural bioinformatics. Current computational methods display different degrees of accuracy when predicting these interactions. Different factors were proposed to help improve these predictions, including choosing the proper descriptors of proteins to represent these interactions, among others. In the current work, we provide a representative protein structure that is amenable to PPI classification using machine learning approaches, referred to as residue cluster classes. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set to reproduce the situation of most proteins sharing sequence similarity with others. We identified a model with almost no PPI error (96-99% of correctly classified instances) and showed that residue cluster classes of protein pairs displayed a distinct pattern between positive and negative protein interactions. Our results indicated that residue cluster classes are structural features relevant to model PPI and provide a novel tool to mathematically model the protein structure/function relationship.
Collapse
Affiliation(s)
- Albros Hermes Poot Velez
- Department of biochemistry and structural biology, Instituto de fisiologia celular, UNAM Mexico City 04510, Mexico;
| | | | - Gabriel Del Rio
- Department of biochemistry and structural biology, Instituto de fisiologia celular, UNAM Mexico City 04510, Mexico;
| |
Collapse
|
16
|
Romero-Molina S, Ruiz-Blanco YB, Green JR, Sanchez-Garcia E. ProtDCal-Suite: A web server for the numerical codification and functional analysis of proteins. Protein Sci 2020; 28:1734-1743. [PMID: 31271472 DOI: 10.1002/pro.3673] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2019] [Revised: 06/21/2019] [Accepted: 06/24/2019] [Indexed: 12/24/2022]
Abstract
Computational tools for the analysis of protein data and the prediction of biological properties are essential in life sciences and biomedical research. Here, we introduce ProtDCal-Suite, a web server comprising a set of machine learning-based methods for studying proteins. The main module of ProtDCal-Suite is the ProtDCal software. ProtDCal translates the structural information of proteins into numerical descriptors that serve as input to machine-learning techniques. The ProtDCal-Suite server also incorporates a post-processing optional stage that allows ranking and filtering the obtained descriptors by computing their Shannon entropy values across the input set of proteins. ProtDCal's codification was used in the development of models for the prediction of specific protein properties. Thus, the other modules of ProtDCal-Suite are protein analysis tools implemented using ProtDCal's descriptors. Among them are PPI-Detect, for predicting the interaction likelihood of protein-protein and protein-peptide pairs, Enzyme Identifier, for identifying enzymes from amino acid sequences or 3D structures, and Pred-NGlyco, for predicting N-glycosylation sites. ProtDCal-Suite is freely accessible at https://protdcal.zmb.uni-due.de.
Collapse
Affiliation(s)
- Sandra Romero-Molina
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
| | - Yasser B Ruiz-Blanco
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
| | - James R Green
- Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
| | - Elsa Sanchez-Garcia
- Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
| |
Collapse
|
17
|
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022] Open
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Despite a new set of graph theory-derived sequence/structural descriptors that have been gaining relevance in the detection of remote homology, they have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success for identifying remote homologs. We also highlight the tendency of integrating AF features/measures with the AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, for improving the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the achieved experiences in remote homology detection by both the most popular AF tools and other less known ones, we provide our own using the graphical-numerical methodologies, MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
Collapse
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| | - Deborah Galpert
- Departamento de Ciencia de la Computación. Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), Santa Clara 54830, Cuba;
| | - Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), Santa Clara 54830, Cuba;
| | - Evys Ancede-Gallardo
- Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile;
| | - Gisselle Pérez-Machado
- EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain;
| | - Gustavo A. De la Riva
- Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico;
- Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| |
Collapse
|