1
|
Agüero-Chapin G, Galpert-Cañizares D, Domínguez-Pérez D, Marrero-Ponce Y, Pérez-Machado G, Teijeira M, Antunes A. Emerging Computational Approaches for Antimicrobial Peptide Discovery. Antibiotics (Basel) 2022; 11:antibiotics11070936. [PMID: 35884190 PMCID: PMC9311958 DOI: 10.3390/antibiotics11070936] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 02/05/2023] Open
Abstract
In the last two decades many reports have addressed the application of artificial intelligence (AI) in the search and design of antimicrobial peptides (AMPs). AI has been represented by machine learning (ML) algorithms that use sequence-based features for the discovery of new peptidic scaffolds with promising biological activity. From AI perspective, evolutionary algorithms have been also applied to the rational generation of peptide libraries aimed at the optimization/design of AMPs. However, the literature has scarcely dedicated to other emerging non-conventional in silico approaches for the search/design of such bioactive peptides. Thus, the first motivation here is to bring up some non-standard peptide features that have been used to build classical ML predictive models. Secondly, it is valuable to highlight emerging ML algorithms and alternative computational tools to predict/design AMPs as well as to explore their chemical space. Another point worthy of mention is the recent application of evolutionary algorithms that actually simulate sequence evolution to both the generation of diversity-oriented peptide libraries and the optimization of hit peptides. Last but not least, included here some new considerations in proteogenomic analyses currently incorporated into the computational workflow for unravelling AMPs in natural sources.
Collapse
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal;
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Correspondence: (G.A.-C.); (A.A.); Tel.: +351-22-340-1813 (G.A.-C. & A.A.)
| | - Deborah Galpert-Cañizares
- Departamento de Ciencia de la Computación, Universidad Central Marta Abreu de Las Villas (UCLV), Santa Clara 54830, Cuba;
| | - Dany Domínguez-Pérez
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal;
- Proquinorte, Unipessoal, Lda, Avenida 5 de Outubro, 124, 7º Piso, Avenidas Novas, 1050-061 Lisboa, Portugal
| | - Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas and Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Ecuador;
| | - Gisselle Pérez-Machado
- EpiDisease S.L—Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain;
| | - Marta Teijeira
- Departamento de Química Orgánica, Facultade de Química, Universidade de Vigo, 36310 Vigo, Spain;
- Instituto de Investigación Sanitaria Galicia Sur, Hospital Álvaro Cunqueiro, 36213 Vigo, Spain
| | - Agostinho Antunes
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal;
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Correspondence: (G.A.-C.); (A.A.); Tel.: +351-22-340-1813 (G.A.-C. & A.A.)
| |
Collapse
|
2
|
Almeida D, Domínguez-Pérez D, Matos A, Agüero-Chapin G, Osório H, Vasconcelos V, Campos A, Antunes A. Putative Antimicrobial Peptides of the Posterior Salivary Glands from the Cephalopod Octopus vulgaris Revealed by Exploring a Composite Protein Database. Antibiotics (Basel) 2020; 9:antibiotics9110757. [PMID: 33143020 PMCID: PMC7693380 DOI: 10.3390/antibiotics9110757] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2020] [Revised: 10/28/2020] [Accepted: 10/28/2020] [Indexed: 12/13/2022] Open
Abstract
Cephalopods, successful predators, can use a mixture of substances to subdue their prey, becoming interesting sources of bioactive compounds. In addition to neurotoxins and enzymes, the presence of antimicrobial compounds has been reported. Recently, the transcriptome and the whole proteome of the Octopus vulgaris salivary apparatus were released, but the role of some compounds—e.g., histones, antimicrobial peptides (AMPs), and toxins—remains unclear. Herein, we profiled the proteome of the posterior salivary glands (PSGs) of O. vulgaris using two sample preparation protocols combined with a shotgun-proteomics approach. Protein identification was performed against a composite database comprising data from the UniProtKB, all transcriptomes available from the cephalopods’ PSGs, and a comprehensive non-redundant AMPs database. Out of the 10,075 proteins clustered in 1868 protein groups, 90 clusters corresponded to venom protein toxin families. Additionally, we detected putative AMPs clustered with histones previously found as abundant proteins in the saliva of O. vulgaris. Some of these histones, such as H2A and H2B, are involved in systemic inflammatory responses and their antimicrobial effects have been demonstrated. These results not only confirm the production of enzymes and toxins by the O. vulgaris PSGs but also suggest their involvement in the first line of defense against microbes.
Collapse
Affiliation(s)
- Daniela Almeida
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
| | - Dany Domínguez-Pérez
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
| | - Ana Matos
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
- Biology Department of the Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
| | - Guillermin Agüero-Chapin
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
- Biology Department of the Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
| | - Hugo Osório
- i3S—Instituto de Investigação e Inovação em Saúde-i3S, University of Porto, 4200-135 Porto, Portugal;
- Ipatimup—Institute of Molecular Pathology and Immunology of the University of Porto, University of Porto, 4200-135 Porto, Portugal
- Department of Pathology and Oncology of the Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
| | - Vitor Vasconcelos
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
- Biology Department of the Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
| | - Alexandre Campos
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
| | - Agostinho Antunes
- CIIMAR/CIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal; (D.A.); (D.D.-P.); (A.M.); (G.A.-C.); (V.V.); (A.C.)
- Biology Department of the Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Correspondence:
| |
Collapse
|
3
|
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022] Open
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Despite a new set of graph theory-derived sequence/structural descriptors that have been gaining relevance in the detection of remote homology, they have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success for identifying remote homologs. We also highlight the tendency of integrating AF features/measures with the AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, for improving the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the achieved experiences in remote homology detection by both the most popular AF tools and other less known ones, we provide our own using the graphical-numerical methodologies, MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
Collapse
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| | - Deborah Galpert
- Departamento de Ciencia de la Computación. Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), Santa Clara 54830, Cuba;
| | - Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), Santa Clara 54830, Cuba;
| | - Evys Ancede-Gallardo
- Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile;
| | - Gisselle Pérez-Machado
- EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain;
| | - Gustavo A. De la Riva
- Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico;
- Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| |
Collapse
|
4
|
Sinha S, Nge CE, Leong CY, Ng V, Crasta S, Alfatah M, Goh F, Low KN, Zhang H, Arumugam P, Lezhava A, Chen SL, Kanagasundaram Y, Ng SB, Eisenhaber F, Eisenhaber B. Genomics-driven discovery of a biosynthetic gene cluster required for the synthesis of BII-Rafflesfungin from the fungus Phoma sp. F3723. BMC Genomics 2019; 20:374. [PMID: 31088369 PMCID: PMC6518819 DOI: 10.1186/s12864-019-5762-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2018] [Accepted: 05/02/2019] [Indexed: 12/20/2022] Open
Abstract
Background Phomafungin is a recently reported broad spectrum antifungal compound but its biosynthetic pathway is unknown. We combed publicly available Phoma genomes but failed to find any putative biosynthetic gene cluster that could account for its biosynthesis. Results Therefore, we sequenced the genome of one of our Phoma strains (F3723) previously identified as having antifungal activity in a high-throughput screen. We found a biosynthetic gene cluster that was predicted to synthesize a cyclic lipodepsipeptide that differs in the amino acid composition compared to Phomafungin. Antifungal activity guided isolation yielded a new compound, BII-Rafflesfungin, the structure of which was determined. Conclusions We describe the NRPS-t1PKS cluster ‘BIIRfg’ compatible with the synthesis of the cyclic lipodepsipeptide BII-Rafflesfungin [HMHDA-L-Ala-L-Glu-L-Asn-L-Ser-L-Ser-D-Ser-D-allo-Thr-Gly]. We report new Stachelhaus codes for Ala, Glu, Asn, Ser, Thr, and Gly. We propose a mechanism for BII-Rafflesfungin biosynthesis, which involves the formation of the lipid part by BIIRfg_PKS followed by activation and transfer of the lipid chain by a predicted AMP-ligase on to the first PCP domain of the BIIRfg_NRPS gene. Electronic supplementary material The online version of this article (10.1186/s12864-019-5762-6) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Swati Sinha
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore.
| | - Choy-Eng Nge
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Chung Yan Leong
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Veronica Ng
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Sharon Crasta
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Mohammad Alfatah
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Falicia Goh
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Kia-Ngee Low
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Huibin Zhang
- Genome Institue of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, #02-01 Genome, Singapore, 138672, Republic of Singapore
| | - Prakash Arumugam
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Alexander Lezhava
- Genome Institue of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, #02-01 Genome, Singapore, 138672, Republic of Singapore
| | - Swaine L Chen
- Genome Institue of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, #02-01 Genome, Singapore, 138672, Republic of Singapore.,Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, 1E Kent Ridge Road, NUHS Tower Block, Level 10, Singapore, 119228, Republic of Singapore
| | - Yoganathan Kanagasundaram
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Siew Bee Ng
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore.,School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), 50 Nanyang Drive, Singapore, 637553, Republic of Singapore
| | - Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore.
| |
Collapse
|
5
|
Galpert D, Fernández A, Herrera F, Antunes A, Molina-Ruiz R, Agüero-Chapin G. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics 2018; 19:166. [PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2017] [Accepted: 04/04/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Despite several pairwise protein features being combined in a supervised big data approach; they all, to some extent were alignment-based features and the proposed algorithms were evaluated on a unique test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. RESULTS The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, and built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Just when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management, a higher success rate (98.71%) within the twilight zone could be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition. CONCLUSIONS The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
Collapse
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), 54830, Santa Clara, Cuba
| | - Alberto Fernández
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
| | - Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
| | - Agostinho Antunes
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal.,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
| | - Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), 54830, Santa Clara, Cuba
| | - Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal. .,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal. .,Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), 54830, Santa Clara, Cuba.
| |
Collapse
|
6
|
Ruiz-Blanco YB, Agüero-Chapin G, García-Hernández E, Álvarez O, Antunes A, Green J. Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone. BMC Bioinformatics 2017; 18:349. [PMID: 28732462 PMCID: PMC5521120 DOI: 10.1186/s12859-017-1758-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2017] [Accepted: 07/13/2017] [Indexed: 11/10/2022] Open
Affiliation(s)
- Yasser B Ruiz-Blanco
- Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, 54830, Santa Clara, Cuba.,Theoretical Chemistry, Max Planck Institute für Kohlenforschung, 45470, Mulheim an der Ruhr, Germany
| | - Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal. .,Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), 54830, Santa Clara, Cuba. .,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal.
| | - Enrique García-Hernández
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), 04360, D.F, México, Mexico
| | - Orlando Álvarez
- Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), 54830, Santa Clara, Cuba
| | - Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal.,Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
| | - James Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
| |
Collapse
|
7
|
Agüero-Chapin G, Pérez-Machado G, Sánchez-Rodríguez A, Santos MM, Antunes A. Alignment-Free Methods for the Detection and Specificity Prediction of Adenylation Domains. Methods Mol Biol 2016; 1401:253-272. [PMID: 26831713 DOI: 10.1007/978-1-4939-3375-4_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Identifying adenylation domains (A-domains) and their substrate specificity can aid the detection of nonribosomal peptide synthetases (NRPS) at genome/proteome level and allow inferring the structure of oligopeptides with relevant biological activities. However, that is challenging task due to the high sequence diversity of A-domains (~10-40 % of amino acid identity) and their selectivity for 50 different natural/unnatural amino acids. Altogether these characteristics make their detection and the prediction of their substrate specificity a real challenge when using traditional sequence alignment methods, e.g., BLAST searches. In this chapter we describe two workflows based on alignment-free methods intended for the identification and substrate specificity prediction of A-domains. To identify A-domains we introduce a graphical-numerical method, implemented in TI2BioP version 2.0 (topological indices to biopolymers), which in a first step uses protein four-color maps to represent A-domains. In a second step, simple topological indices (TIs), called spectral moments, are derived from the graphical representations of known A-domains (positive dataset) and of unrelated but well-characterized sequences (negative set). Spectral moments are then used as input predictors for statistical classification techniques to build alignment-free models. Finally, the resulting alignment-free models can be used to explore entire proteomes for unannotated A-domains. In addition, this graphical-numerical methodology works as a sequence-search method that can be ensemble with homology-based tools to deeply explore the A-domain signature and cope with the diversity of this class (Aguero-Chapin et al., PLoS One 8(7):e65926, 2013). The second workflow for the prediction of A-domain's substrate specificity is based on alignment-free models constructed by transductive support vector machines (TSVMs) that incorporate information of uncharacterized A-domains. The construction of the models was implemented in the NRPSpredictor and in a first step uses the physicochemical fingerprint of the 34 residues lining the active site of the phenylalanine-adenylation domain of gramicidin synthetase A [PDB ID 1 amu] to derive a feature vector. Homologous positions were extracted for A-domains with known and unknown substrate specificities and turned into feature vectors. At the same time, A-domains with known specificities towards similar substrates were clustered by physicochemical properties of amino acids (AA). In a second step, support vector machines (SVMs) were optimized from feature vectors of characterized A-domains in each of the resulting clusters. Later, SVMs were used in the variant of TSVMs that integrate a fraction of uncharacterized A-domains during training to predict unknown specificities. Finally, uncharacterized A-domains were scored by each of the constructed alignment-free models (TSVM) representing each substrate specificity resulting from the clustering. The model producing the largest score for the uncharacterized A-domain assigns the substrate specificity to it (Rausch et al., Nucleic Acids Res 33:5799-5808, 2005).
Collapse
Affiliation(s)
- Guillermin Agüero-Chapin
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, Porto, 4050-123, Portugal
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara, 54830, Cuba
| | - Gisselle Pérez-Machado
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara, 54830, Cuba
| | - Aminael Sánchez-Rodríguez
- Departamento de Ciencias Naturales, Universidad Técnica Particular de Loja, San Cayetano Alto, S/N, Loja, Ecuador
| | - Miguel Machado Santos
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, Porto, 4050-123, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, Porto, 4169-007, Portugal
| | - Agostinho Antunes
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, Porto, 4050-123, Portugal.
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, Porto, 4169-007, Portugal.
| |
Collapse
|