1
Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018; 13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022] [Open Access]
Abstract
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts—precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such “informative” contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
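The dependence of prediction success on the contact definition can be illustrated with a small sketch. It is not the authors' pipeline: the distance cutoffs, the toy matrices, and the `combine` rule (z-scoring both matrices and rewarding low contact energy) are all assumptions made for illustration, and `precision_at_k` is a hypothetical helper name.

```python
import numpy as np


def precision_at_k(scores, distances, cutoff, k, min_separation=3):
    """Fraction of the k top-scoring residue pairs closer than `cutoff`.

    Only pairs (i, j) with j - i >= min_separation are ranked.
    """
    i, j = np.triu_indices(scores.shape[0], k=min_separation)
    top = np.argsort(scores[i, j])[::-1][:k]
    return float(np.mean(distances[i[top], j[top]] < cutoff))


def combine(coevolution, energy, weight=0.5):
    """Hypothetical combination rule: z-score both matrices, reward low energy."""
    def zscore(m):
        return (m - m.mean()) / (m.std() + 1e-12)
    return zscore(coevolution) - weight * zscore(energy)


# Toy 8-residue chain: distance grows linearly with sequence separation.
idx = np.arange(8)
distances = 2.0 * np.abs(idx[:, None] - idx[None, :])

# Made-up co-evolution scores and contact energies for four residue pairs.
coevolution = np.zeros((8, 8))
energy = np.zeros((8, 8))
for (a, b), s, e in [((0, 3), 0.9, 0.0), ((1, 5), 0.8, 2.0),
                     ((0, 7), 0.7, 2.0), ((2, 5), 0.6, -2.0)]:
    coevolution[a, b] = coevolution[b, a] = s
    energy[a, b] = energy[b, a] = e

strict = precision_at_k(coevolution, distances, cutoff=7.0, k=4)
permissive = precision_at_k(coevolution, distances, cutoff=9.0, k=4)
combined = precision_at_k(combine(coevolution, energy), distances, cutoff=7.0, k=2)
```

On this toy data the permissive cutoff yields a higher precision than the strict one for the same ranking, and re-ranking with the energy term raises the strict-cutoff precision of the top two pairs. The numbers say nothing about real proteins; they only show the mechanics of the comparison.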
Affiliation(s)
- Jack Holland
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Qinxin Pan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Gevorg Grigoryan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States of America
2
Ruiz-Blanco YB, Agüero-Chapin G, García-Hernández E, Álvarez O, Antunes A, Green J. Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinformatics 2017; 18:349. [PMID: 28732462 PMCID: PMC5521120 DOI: 10.1186/s12859-017-1758-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Accepted: 07/13/2017] [Indexed: 11/10/2022] [Open Access]
Affiliation(s)
- Yasser B Ruiz-Blanco
  - Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, 54830, Santa Clara, Cuba
  - Theoretical Chemistry, Max Planck Institute für Kohlenforschung, 45470, Mulheim an der Ruhr, Germany
- Guillermin Agüero-Chapin
  - CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Enrique García-Hernández
  - Instituto de Química, Universidad Nacional Autónoma de México (UNAM), 04360, D.F, México, Mexico
- Orlando Álvarez
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Agostinho Antunes
  - CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- James Green
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
3
Novel "extended sequons" of human N-glycosylation sites improve the precision of qualitative predictions: an alignment-free study of pattern recognition using ProtDCal protein features. Amino Acids 2016; 49:317-325. [PMID: 27896447 DOI: 10.1007/s00726-016-2362-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/10/2016] [Accepted: 11/05/2016] [Indexed: 10/20/2022]
Abstract
N-Glycosylation is a common post-translational modification that plays an important role in the proper folding and function of many proteins. This modification is largely dependent on the presence of a sequence motif called a "sequon" defined as Asn-Xxx-Ser/Thr. However, evidence has shown that the presence of such a "sequon" is insufficient to determine the occurrence of N-glycosylation with high precision. This study aims to elucidate patterns that can more accurately predict N-glycosylation sites in human proteins. The novel motifs are evaluated using benchmarking data from 188 organisms. Performance is largely sustained compared to the human data, which validates the robustness of the novel extracted "extended sequons". We, therefore, introduce new knowledge about sequence-related factors that control N-glycosylation.
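The canonical sequon named in the abstract, Asn-Xxx-Ser/Thr, is easy to locate with a regular expression. The sketch below is not the study's "extended sequon" method, which the abstract does not specify; `find_sequons` is a hypothetical helper, and the optional exclusion of proline at the middle position is a commonly used convention added here as an assumption.

```python
import re


def find_sequons(sequence, exclude_proline=False):
    """Return 0-based positions of Asn residues that start an N-X-S/T motif.

    A zero-width lookahead is used so that overlapping motifs are all reported.
    `exclude_proline` (an assumption, not taken from the abstract) rejects
    motifs whose middle residue is Pro.
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile(rf"(?=N{middle}[ST])")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Made-up peptide used only to exercise the function.
candidates = find_sequons("MNGSANPTNNTS")
without_proline = find_sequons("MNGSANPTNNTS", exclude_proline=True)
```

As the abstract notes, a match of this kind marks only a candidate site; it does not by itself establish that the asparagine is glycosylated.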
4
Kleandrova VV, Ruso JM, Speck-Planche A, Dias Soeiro Cordeiro MN. Enabling the Discovery and Virtual Screening of Potent and Safe Antimicrobial Peptides. Simultaneous Prediction of Antibacterial Activity and Cytotoxicity. ACS Comb Sci 2016; 18:490-8. [PMID: 27280735 DOI: 10.1021/acscombsci.6b00063] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]
Abstract
Antimicrobial peptides (AMPs) represent promising alternatives to fight against bacterial pathogens. However, cellular toxicity remains one of the main concerns in the early development of peptide-based drugs. This work introduces the first multitasking (mtk) computational model focused on performing simultaneous predictions of the antibacterial activities and cytotoxicities of peptides. The model was created from a data set containing 3592 cases, and it displayed accuracy higher than 96% for classifying/predicting peptides in both training and prediction (test) sets. The technique known as alanine scanning was computationally applied to illustrate the calculation of the quantitative contributions of the amino acids (in their respective positions of the sequence) to the biological effects of a defined peptide. A small library of 10 peptides was generated, where peptides were designed by considering the interpretations of the different descriptors in the mtk-computational model. All the peptides were predicted to exhibit high antibacterial activities against multiple bacterial strains, and low cytotoxicity against various cell types. The present mtk-computational model can be considered a very useful tool to support high-throughput research for the discovery of potent and safe AMPs.
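Computational alanine scanning, as described above, amounts to replacing each residue with alanine in turn and recording how a model's prediction changes. The following is a minimal sketch under stated assumptions: `alanine_scan` is a hypothetical helper, and `cationic_fraction` is a toy stand-in for a predictor, not the mtk model from the paper.

```python
def alanine_scan(peptide, predict):
    """Estimate per-position contributions by mutating each residue to Ala.

    Returns (position, residue, contribution) tuples, where contribution is
    the wild-type prediction minus the prediction for the alanine mutant.
    Positions that are already alanine get a contribution of 0.0.
    """
    baseline = predict(peptide)
    contributions = []
    for pos, residue in enumerate(peptide):
        if residue == "A":
            contributions.append((pos, residue, 0.0))
            continue
        mutant = peptide[:pos] + "A" + peptide[pos + 1:]
        contributions.append((pos, residue, baseline - predict(mutant)))
    return contributions


def cationic_fraction(peptide):
    """Toy scoring function (assumption): fraction of Lys/Arg residues."""
    return sum(residue in "KR" for residue in peptide) / len(peptide)


scan = alanine_scan("KLAR", cationic_fraction)
```

With a real model, `predict` would be any callable that maps a sequence to a predicted activity or toxicity score; a positive contribution then means that the original residue raises the predicted value relative to alanine.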
Affiliation(s)
- Valeria V. Kleandrova
  - Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, Moscow, Russia
- Juan M. Ruso
  - Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- Alejandro Speck-Planche
  - Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
  - LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
5
Speck-Planche A, Kleandrova VV, Ruso JM, Cordeiro MNDS. First Multitarget Chemo-Bioinformatic Model To Enable the Discovery of Antibacterial Peptides against Multiple Gram-Positive Pathogens. J Chem Inf Model 2016; 56:588-98. [PMID: 26960000 DOI: 10.1021/acs.jcim.5b00630] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
Antimicrobial peptides (AMPs) have emerged as promising therapeutic alternatives to fight against the diverse infections caused by different pathogenic microorganisms. In this context, theoretical approaches in bioinformatics have paved the way toward the creation of several in silico models capable of predicting antimicrobial activities of peptides. All current models have several significant handicaps, which prevent the efficient search for highly active AMPs. Here, we introduce the first multitarget (mt) chemo-bioinformatic model devoted to performing alignment-free prediction of antibacterial activity of peptides against multiple Gram-positive bacterial strains. The model was constructed from a data set containing 2488 cases of AMP sequences assayed against at least 1 out of 50 Gram-positive bacterial strains. This mt-chemo-bioinformatic model displayed percentages of correct classification higher than 90.00% in both training and prediction (test) sets. For the first time, two computational approaches derived from basic concepts in genetics and molecular biology were applied, allowing the calculation of the relative contribution of any amino acid (in a defined position) to the antibacterial activity of an AMP, depending on the bacterial strain used in the biological assay. The present mt-chemo-bioinformatic model constitutes a powerful tool to enable the discovery of potent and versatile AMPs.
Affiliation(s)
- Alejandro Speck-Planche
  - Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
  - REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
- Valeria V Kleandrova
  - Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, 125080 Moscow, Russia
- Juan M Ruso
  - Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
- M N D S Cordeiro
  - REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
6
Ruiz-Blanco YB, Paz W, Green J, Marrero-Ponce Y. ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins. BMC Bioinformatics 2015; 16:162. [PMID: 25982853 PMCID: PMC4432771 DOI: 10.1186/s12859-015-0586-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/15/2015] [Accepted: 04/22/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND The exponential growth of protein structural and sequence databases is enabling multifaceted approaches to understanding the long-sought sequence-structure-function relationship. Advances in computation now make it possible to apply well-established data mining and pattern recognition techniques to these data to learn models that effectively relate structure and function. However, extracting meaningful numerical descriptors of protein sequence and structure is a key issue that requires an efficient and widely available solution.
RESULTS We here introduce ProtDCal, a new computational software suite capable of generating tens of thousands of features considering both sequence-based and 3D-structural descriptors. We demonstrate, by means of principal component analysis and Shannon entropy tests, how ProtDCal's sequence-based descriptors provide new and more relevant information not encoded by currently available servers for sequence-based protein feature generation. The wide diversity of the 3D-structure-based features generated by ProtDCal is shown to provide additional complementary information and effectively completes its general protein encoding capability. As a demonstration of the utility of ProtDCal's features, prediction models of N-linked glycosylation sites are trained and evaluated. Classification performance compares favourably with that of contemporary predictors of N-linked glycosylation sites, in spite of not using domain-specific features as input information.
CONCLUSIONS ProtDCal provides a friendly and cross-platform graphical user interface, developed in the Java programming language, and is freely available at: http://bioinf.sce.carleton.ca/ProtDCal/ . ProtDCal introduces local and group-based encoding, which enhances the diversity of the information captured by the computed features. Furthermore, we have shown that adding structure-based descriptors contributes non-redundant additional information to the features-based characterization of polypeptide systems. This software is intended to provide a useful tool for general-purpose encoding of protein sequences and structures for applications in protein classification, similarity analyses and function prediction.
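The abstract does not say how its Shannon entropy tests were carried out, so the following is only a generic sketch of how the information content of a single numerical descriptor might be estimated. The equal-width binning, the default bin count, and the function name `shannon_entropy` are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=4):
    """Shannon entropy (in bits) of a descriptor after equal-width binning.

    A descriptor that takes the same value for every protein carries no
    information and scores 0.0; one spread evenly over all bins scores
    log2(bins).
    """
    low, high = min(values), max(values)
    if high == low:
        return 0.0
    width = (high - low) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - low) / width), bins - 1) for v in values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Made-up descriptor values for four proteins.
evenly_spread = shannon_entropy([0.0, 1.0, 2.0, 3.0])
skewed = shannon_entropy([0.0, 0.0, 0.0, 3.0])
constant = shannon_entropy([5.0, 5.0, 5.0])
```

Comparing such entropies across descriptors is one simple way to rank them by how much they vary over a data set, although it says nothing about redundancy between descriptors.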
Affiliation(s)
- Yasser B Ruiz-Blanco
  - Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Waldo Paz
  - Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba
  - Centre of Informatics Studies (CEI), Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba
- James Green
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Yovani Marrero-Ponce
  - Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba
  - Grupo de Investigación Microbiología y Ambiente (GIMA). Programa de Bacteriología, Facultad Ciencias de la Salud, Universidad de San Buenaventura, Calle Real de Ternera, Cartagena (Bolivar), Colombia
7
Ruiz-Blanco YB, Marrero-Ponce Y, Prieto PJ, Salgado J, García Y, Sotomayor-Torres CM. A Hooke's law-based approach to protein folding rate. J Theor Biol 2015; 364:407-17. [DOI: 10.1016/j.jtbi.2014.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/04/2014] [Revised: 08/28/2014] [Accepted: 09/02/2014] [Indexed: 10/24/2022]