1
Ayala-Ruano S, Marrero-Ponce Y, Aguilera-Mendoza L, Pérez N, Agüero-Chapin G, Antunes A, Aguilar AC. Network Science and Group Fusion Similarity-Based Searching to Explore the Chemical Space of Antiparasitic Peptides. ACS Omega 2022; 7:46012-46036. [PMID: 36570318 PMCID: PMC9773354 DOI: 10.1021/acsomega.2c03398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/31/2022] [Accepted: 11/21/2022] [Indexed: 05/13/2023]
Abstract
Antimicrobial peptides (AMPs) have appeared as promising compounds to treat a wide range of diseases. Their clinical potentialities reside in the wide range of mechanisms they can use for both killing microbes and modulating immune responses. However, the hugeness of the AMPs' chemical space (AMPCS), represented by more than 10^65 unique sequences, has represented a big challenge for the discovery of new promising therapeutic peptides and for the identification of common structural motifs. Here, we introduce network science and a similarity searching approach to discover new promising AMPs, specifically antiparasitic peptides (APPs). We exploited the network-based representation of APPs' chemical space (APPCS) to retrieve valuable information by using three network types: chemical space (CSN), half-space proximal (HSPN), and metadata (METN). Some centrality measures were applied to identify in each network the most important and nonredundant peptides. Then, these central peptides were considered as queries (Qs) in group fusion similarity-based searches against a comprehensive collection of known AMPs, stored in the graph database StarPepDB, to propose new potential APPs. The performance of the resulting multiquery similarity-based search models (mQSSMs) was evaluated in five benchmarking data sets of APP/non-APPs. The predictions performed by the best mQSSM showed a strong-to-very-strong performance since their external Matthews correlation coefficient (MCC) values ranged from 0.834 to 0.965. Outstanding MCC values (>0.85) were attained by the mQSSM with 219 Qs from both networks CSN and HSPN with 0.5 as similarity threshold in external data sets. Then, the performance of our best mQSSM was compared with the APPs prediction servers AMPDiscover and AMPFun. The proposed model showed its relevance by outperforming state-of-the-art machine learning models to predict APPs.
After applying the best mQSSM and additional filters on the non-APP space from StarPepDB, 95 AMPs were repurposed as potential APP hits. Due to the high sequence diversity of these peptides, different computational approaches were applied to identify relevant motifs for searching and designing new APPs. Lastly, we identified 11 promising APP lead candidates by using our best mQSSMs together with diversity-based network analyses, and 24 web servers for activity/toxicity and drug-like properties. These results support that network-based similarity searches can be an effective and reliable strategy to identify APPs. The proposed models and pipeline are freely available through the StarPep toolbox software at http://mobiosd-hub.com/starpep.
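As a rough illustration of the group fusion similarity search and MCC evaluation described in the abstract above, the following Python sketch scores each candidate peptide by its best similarity to any query and labels it a predicted APP when that score reaches the 0.5 threshold. The k-mer Tanimoto similarity, the MAX fusion rule, the function names, and the toy sequences and labels are all assumptions made for this example. The abstract does not state the similarity measure or fusion rule, and this is not the StarPep implementation.

```python
from math import sqrt

def kmer_set(seq, k=2):
    """Return the set of overlapping k-mers in a peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets (assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_fusion_score(candidate, queries, k=2):
    """MAX fusion (assumed rule): best similarity of the candidate to any query."""
    cand = kmer_set(candidate, k)
    return max(tanimoto(cand, kmer_set(q, k)) for q in queries)

def predict(candidates, queries, threshold=0.5, k=2):
    """Label a candidate positive when its fused score reaches the threshold."""
    return [group_fusion_score(c, queries, k) >= threshold for c in candidates]

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: sequences and labels are invented for illustration only.
queries = ["GLFDIVKKVV", "KWKLFKKIEK"]
candidates = ["GLFDIVKKIL", "KWKLFKKIGA", "AAAAAAAAAA", "DEDEDEDEDE"]
labels = [True, True, False, False]

predictions = predict(candidates, queries, threshold=0.5)
print(predictions, mcc(labels, predictions))
```

Using the maximum over queries means one close query is enough to flag a candidate; other fusion rules (for example, averaging the scores) would trade sensitivity for specificity.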
Affiliation(s)
- Sebastián Ayala-Ruano
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Universidad San Francisco de Quito, Av. Interoceánica Km 12 1/2 y Av. Florencia, Quito 17-1200-841, Ecuador
  - Colegio de Ciencias e Ingenierías “El Politécnico”, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Yovani Marrero-Ponce
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Universidad San Francisco de Quito, Av. Interoceánica Km 12 1/2 y Av. Florencia, Quito 17-1200-841, Ecuador
  - Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito 170901, Ecuador
  - Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California 22860, Mexico
- Longendri Aguilera-Mendoza
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California 22860, Mexico
- Noel Pérez
  - Colegio de Ciencias e Ingenierías “El Politécnico”, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Ana Cristina Aguilar
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Universidad San Francisco de Quito, Av. Interoceánica Km 12 1/2 y Av. Florencia, Quito 17-1200-841, Ecuador
3
Lara Ortiz MT, Martinell García V, Del Rio G. Saturation Mutagenesis of the Transmembrane Region of HokC in Escherichia coli Reveals Its High Tolerance to Mutations. Int J Mol Sci 2021; 22:ijms221910359. [PMID: 34638709 PMCID: PMC8509063 DOI: 10.3390/ijms221910359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022]
Abstract
Cells adapt to different stress conditions, such as the presence of antibiotics. This adaptation is sometimes achieved by changing relevant protein positions, the mutability of which is limited by structural constraints. Understanding the basis of these constraints represents an important challenge for both basic science and potential biotechnological applications. To study these constraints, we performed a systematic saturation mutagenesis of the transmembrane region of HokC, a toxin used by Escherichia coli to control its own population, and observed that 92% of single-point mutations are tolerated and that all the non-tolerated mutations have compensatory mutations that reverse their effect. We provide experimental evidence that HokC accumulates multiple compensatory mutations that are found as correlated mutations in the HokC family multiple sequence alignment. In agreement with these observations, transmembrane proteins show a higher probability of presenting correlated mutations and are less densely packed locally than globular proteins; previous mutagenesis results on transmembrane proteins further support our observations on the high tolerance to mutations of transmembrane regions of proteins. Thus, our experimental results reveal the high tolerance of the HokC transmembrane region to loss-of-function mutations, which is associated with low sequence conservation and a high rate of correlated mutations in the HokC family sequence alignment, features shared with other transmembrane proteins.
4
Poot Velez AH, Fontove F, Del Rio G. Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes. Int J Mol Sci 2020; 21:E4787. [PMID: 32640745 PMCID: PMC7370293 DOI: 10.3390/ijms21134787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/19/2020] [Revised: 06/20/2020] [Accepted: 06/28/2020] [Indexed: 01/22/2023]
Abstract
Predicting protein-protein interactions (PPI) represents an important challenge in structural bioinformatics. Current computational methods display different degrees of accuracy when predicting these interactions. Different factors were proposed to help improve these predictions, including choosing the proper descriptors of proteins to represent these interactions, among others. In the current work, we provide a representative protein structure that is amenable to PPI classification using machine learning approaches, referred to as residue cluster classes. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set to reproduce the situation of most proteins sharing sequence similarity with others. We identified a model with almost no PPI error (96-99% of correctly classified instances) and showed that residue cluster classes of protein pairs displayed a distinct pattern between positive and negative protein interactions. Our results indicated that residue cluster classes are structural features relevant to model PPI and provide a novel tool to mathematically model the protein structure/function relationship.
Affiliation(s)
- Albros Hermes Poot Velez
  - Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico
- Gabriel Del Rio
  - Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico
5
Fontove F, Del Rio G. Residue Cluster Classes: A Unified Protein Representation for Efficient Structural and Functional Classification. Entropy 2020; 22:e22040472. [PMID: 33286246 PMCID: PMC7516957 DOI: 10.3390/e22040472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/01/2020] [Revised: 03/30/2020] [Accepted: 04/07/2020] [Indexed: 11/16/2022]
Abstract
Proteins are characterized by their structures and functions, and these two fundamental aspects of proteins are assumed to be related. To model such a relationship, a single representation to model both protein structure and function would be convenient, yet so far, the most effective models for protein structure or function classification do not rely on the same protein representation. Here we provide a computationally efficient implementation for large datasets to calculate residue cluster classes (RCCs) from protein three-dimensional structures and show that such representations enable a random forest algorithm to effectively learn the structural and functional classifications of proteins, according to the CATH and Gene Ontology criteria, respectively. RCCs are derived from residue contact maps built from different distance criteria, and we show that 7 or 8 Å with or without amino acid side-chain atoms rendered the best classification models. The potential use of a unified representation of proteins is discussed and possible future areas for improvement and exploration are presented.
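The abstract above says that RCCs are derived from residue contact maps built at a distance cutoff, with 7 or 8 Å giving the best models. The Python sketch below shows only that first step under stated assumptions: it builds a contact map from one representative 3D point per residue and lists triples of residues that are all mutually in contact. The function names, the inclusive cutoff, the one-point-per-residue simplification, and the toy coordinates are hypothetical; the abstract does not describe how clusters are grouped into classes, so this is not the authors' RCC implementation.

```python
from itertools import combinations
from math import dist

def contact_map(coords, cutoff=7.0):
    """Return the set of residue index pairs (i, j), i < j, within cutoff (in Å)."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) <= cutoff  # inclusive cutoff is an assumption
    }

def three_residue_clusters(contacts, n_residues):
    """List residue triples in which every pair is in contact."""
    return [
        triple
        for triple in combinations(range(n_residues), 3)
        if all(pair in contacts for pair in combinations(triple, 2))
    ]

# Hypothetical coordinates (Å), one point per residue, invented for illustration.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 5.0, 0.0),
    (20.0, 0.0, 0.0),
]

for cutoff in (7.0, 8.0):
    contacts = contact_map(coords, cutoff)
    clusters = three_residue_clusters(contacts, len(coords))
    print(cutoff, sorted(contacts), clusters)
```

Even on this toy input the choice of cutoff changes which residue groups appear, which is consistent with the abstract treating the distance criterion as a parameter worth comparing.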
Affiliation(s)
- Gabriel Del Rio
  - Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico