1
de Abreu AP, Carvalho FC, Mariano D, Bastos LL, Silva JRP, de Oliveira LM, de Melo-Minardi RC, Sabino ADP. An Approach for Engineering Peptides for Competitive Inhibition of the SARS-COV-2 Spike Protein. Molecules 2024; 29:1577. [PMID: 38611856 PMCID: PMC11013848 DOI: 10.3390/molecules29071577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 02/29/2024] [Accepted: 03/22/2024] [Indexed: 04/14/2024]
Abstract
SARS-CoV-2 is the virus responsible for COVID-19, a respiratory disease that devastated global public health. Since 2020, there has been an intense effort by the scientific community to develop safe and effective prophylactic and therapeutic agents against this disease. In this context, peptides have emerged as an alternative for inhibiting the causative agent. However, designing peptides that bind efficiently is still an open challenge. Here, we present an algorithm for peptide engineering. Our strategy starts with a peptide whose structure is similar to the region of the human ACE2 protein that interacts with the spike protein, an interaction that is important for SARS-CoV-2 infection. Our methodology is based on a genetic algorithm that performs systematic steps of random mutation and protein-peptide docking (using the PyRosetta library), then selects the best-optimized peptides based on the contacts made at the peptide-protein interface. We performed three case studies to evaluate the tool parameters and compared our results with proposals presented in the literature. Additionally, we performed molecular dynamics (MD) simulations (three systems, 200 ns each) to probe whether our suggested peptides could interact with the spike protein. Our results suggest that our methodology could be a good strategy for designing peptides.
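The mutate, score and select cycle described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_contact_score` is a hypothetical stand-in for the PyRosetta docking and interface-contact evaluation, and the function names, population size, number of generations and elitist selection rule are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(peptide, rng):
    """Return a copy of `peptide` with one random position substituted."""
    pos = rng.randrange(len(peptide))
    choices = [aa for aa in AMINO_ACIDS if aa != peptide[pos]]
    return peptide[:pos] + rng.choice(choices) + peptide[pos + 1:]


def toy_contact_score(peptide):
    """HYPOTHETICAL stand-in for docking plus interface-contact counting.

    It only counts charged/polar residues and has no physical meaning.
    """
    return sum(peptide.count(aa) for aa in "DEKRNQ")


def evolve(seed, score, generations=20, population_size=10, survivors=3, rng=None):
    """Run a simple elitist genetic loop and return the surviving peptides."""
    rng = rng or random.Random(0)
    population = [seed]
    for _ in range(generations):
        # Random mutation: each child is a single-point mutant of a survivor.
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Selection: keep the best scorers; parents stay in the pool, so the
        # best score never decreases. Ties are broken alphabetically.
        pool = set(population) | set(children)
        population = sorted(pool, key=lambda p: (-score(p), p))[:survivors]
    return population


best = evolve("AAAAAAAA", toy_contact_score)[0]
print(best, toy_contact_score(best))
```

In a real pipeline the scoring call would be replaced by a docking run and a contact count at the peptide-protein interface, which is by far the most expensive step of each generation.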
Affiliation(s)
- Ana Paula de Abreu
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Institute of Exact Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Frederico Chaves Carvalho
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Institute of Exact Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Diego Mariano
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Institute of Exact Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Luana Luiza Bastos
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Institute of Exact Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Juliana Rodrigues Pereira Silva
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Leandro Morais de Oliveira
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Institute of Exact Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Raquel C. de Melo-Minardi
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Institute of Exact Sciences, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Adriano de Paula Sabino
- Laboratory of Clinical and Experimental Hematology, Clinical and Toxicological Analysis Department, Faculty of Pharmacy, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
2
Horácio ECA, de Carvalho LM, Pereira GG, Abrahim MC, Coelho MP, De Jesus DA, García GJY, de Melo-Minardi RC, Nagamatsu ST. Know-how of holding a Bioinformatics competition: Structure, model, overview, and perspectives. PLoS Comput Biol 2023; 19:e1011679. [PMID: 38127831 PMCID: PMC10735175 DOI: 10.1371/journal.pcbi.1011679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
The article presents a framework for a Bioinformatics competition that focuses on 4 key aspects: structure, model, overview, and perspectives. Structure represents the organizational framework employed to coordinate the main tasks involved in the competition. Model showcases the competition design, which encompasses 3 phases. Overview presents our case study, the League of Brazilian Bioinformatics (LBB) 2nd Edition. Finally, the section on perspectives provides a brief discussion of the LBB 2nd Edition, along with insights and feedback from participants. LBB is a biannual team competition launched in 2019 to promote the ongoing training of human resources in Bioinformatics and Computational Biology in Brazil. LBB aims to stimulate ongoing training in Bioinformatics by encouraging participation in competitions, promoting the organization of future Bioinformatics competitions, and fostering the integration of the Bioinformatics and Computational Biology community in the country, as well as collaboration among participants. The LBB 2nd Edition was launched in 2021 and featured 251 competitors forming 91 teams. Knowledge competitions promote learning, collaboration, and innovation, which are crucial for advancing scientific knowledge and solving real-world problems. In summary, this article serves as a valuable resource for individuals and organizations interested in developing knowledge competitions, offering a model based on our experience with LBB to benefit all levels of Bioinformatics trainees.
Affiliation(s)
- Elvira C. A. Horácio
- Postgraduate Program Genetics, Institute of Biological Sciences, Federal University of Minas Gerais, Minas Gerais, Brazil
- Rene Rachou Institute, Oswaldo Cruz Foundation, Minas Gerais, Brazil
- Lucas M. de Carvalho
- Center for Computing in Engineering and Sciences, State University of Campinas, São Paulo, Brazil
- Mayla C. Abrahim
- Laboratório de Tecnologia Imunológica, Instituto de Tecnologia em Imunobiológicos, Vice-Diretoria de Desenvolvimento Tecnológico, Bio-Manguinhos, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Brazil
- Mônica P. Coelho
- Division of Clinical Immunology and Allergy, Medical Research Laboratory, School of Medicine, University of São Paulo, São Paulo, Brazil
- Deivid A. De Jesus
- Graduate Program in Genetics, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Glen J. Y. García
- Bioinformatics Graduate Program, Institute of Biological Sciences, Federal University of Minas Gerais, Minas Gerais, Brazil
- Raquel C. de Melo-Minardi
- Computer Science Department, Institute of Exact Sciences, Federal University of Minas Gerais, Minas Gerais, Brazil
- Sheila T. Nagamatsu
- Yale University, School of Medicine, New Haven, Connecticut, United States of America
3
Salgado Á, de Melo-Minardi RC, Giovanetti M, Veloso A, Morais-Rodrigues F, Adelino T, de Jesus R, Tosta S, Azevedo V, Lourenco J, Alcantara LCJ. Machine learning models exploring characteristic single-nucleotide signatures in yellow fever virus. PLoS One 2022; 17:e0278982. [PMID: 36508435 PMCID: PMC9744328 DOI: 10.1371/journal.pone.0278982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Accepted: 11/29/2022] [Indexed: 12/14/2022]
Abstract
Yellow fever virus (YFV) is the agent of the most severe mosquito-borne disease in the tropics. Recently, Brazil suffered major YFV outbreaks with a high fatality rate affecting areas where the virus has not been reported for decades, consisting of urban areas where a large number of unvaccinated people live. We developed a machine learning framework combining three different algorithms (XGBoost, random forest and regularized logistic regression) to analyze YFV genomic sequences. This method was applied to 56 YFV sequences from human infections and 27 from non-human primate (NHP) infections to investigate the presence of genetic signatures possibly related to disease severity (in human-related sequences) and differences in PCR cycle threshold (Ct) values (in NHP-related sequences). Our analyses reveal four non-synonymous single nucleotide variations (SNVs) in sequences from human infections, in proteins NS3 (E614D), NS4a (I69V) and NS5 (R727G, V643A), and six non-synonymous SNVs in NHP sequences, in proteins E (L385F), NS1 (A171V), NS3 (I184V) and NS5 (N11S, I374V, E641D). We performed comparative protein structural analysis on these SNVs, describing possible impacts on protein function. Although the dataset is limited in size and this study does not consider virus-host interactions, our work highlights the use of machine learning as a versatile and fast initial approach to genomic data exploration.
Affiliation(s)
- Álvaro Salgado
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- * E-mail: (AS); (LCJA); (JL)
- Raquel C. de Melo-Minardi
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Marta Giovanetti
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratório de Flavivírus, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Adriano Veloso
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Francielly Morais-Rodrigues
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Talita Adelino
- Laboratório Central de Saúde Pública, Fundação Ezequiel Dias, Belo Horizonte, Minas Gerais, Brazil
- Ronaldo de Jesus
- Coordenação Geral dos Laboratórios de Saúde Pública, Secretaria de Vigilância em Saúde, Ministério da Saúde, Brasília, DF, Brazil
- Stephane Tosta
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Vasco Azevedo
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- José Lourenco
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- * E-mail: (AS); (LCJA); (JL)
- Luiz Carlos J. Alcantara
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratório de Flavivírus, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- * E-mail: (AS); (LCJA); (JL)
4
Santana CA, Izidoro SC, de Melo-Minardi RC, Tyzack JD, Ribeiro AJM, Pires DEV, Thornton JM, de A Silveira S. GRaSP-web: a machine learning strategy to predict binding sites based on residue neighborhood graphs. Nucleic Acids Res 2022; 50:W392-W397. [PMID: 35524575 PMCID: PMC9252730 DOI: 10.1093/nar/gkac323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/21/2022] [Revised: 04/14/2022] [Accepted: 04/22/2022] [Indexed: 11/14/2022]
Abstract
Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance for determining protein function and are a key step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable, as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5 h on average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSP-web is freely available at https://grasp.ufv.br.
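The abstract reports performance as MCC (Matthews correlation coefficient). As a reminder of what that figure measures, the sketch below computes MCC from per-residue binary labels using the standard definition, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the toy labels are illustrative assumptions and are not taken from the GRaSP code base.

```python
import math


def mcc(true_labels, predicted_labels):
    """Matthews correlation coefficient for two equal-length binary label lists."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Toy example: 1 marks a binding-site residue, 0 a non-binding residue.
truth = [1, 1, 0, 0, 0, 0]
prediction = [1, 0, 0, 0, 0, 1]
print(round(mcc(truth, prediction), 2))
```

MCC ranges from -1 to 1 and stays informative when binding residues are a small minority of all residues, which is one reason it is commonly used for this kind of prediction task.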
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil; Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil; Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Douglas E V Pires
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Australia
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
5
Martins PM, Santos LH, Mariano D, Queiroz FC, Bastos LL, Gomes IDS, Fischer PHC, Rocha REO, Silveira SA, de Lima LHF, de Magalhães MTQ, Oliveira MGA, de Melo-Minardi RC. Propedia: a database for protein-peptide identification based on a hybrid clustering algorithm. BMC Bioinformatics 2021; 22:1. [PMID: 33388027 PMCID: PMC7776311 DOI: 10.1186/s12859-020-03881-z] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Received: 07/01/2020] [Accepted: 11/13/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Protein-peptide interactions play a fundamental role in a wide variety of biological processes, such as cell signaling, regulatory networks, immune responses, and enzyme inhibition. Peptides are characterized by low toxicity and small interface areas; therefore, they are good targets for therapeutic strategies, rational drug planning and protein inhibition. Approximately 10% of the ethical pharmaceutical market is protein/peptide-based. Furthermore, it is estimated that 40% of protein interactions are mediated by peptides. Despite the fast increase in the volume of biological data, particularly on sequences and structures, there remains a lack of broad and comprehensive protein-peptide databases and tools that allow the retrieval, characterization and understanding of protein-peptide recognition and consequently support peptide design. RESULTS We introduce Propedia, a comprehensive and up-to-date database with a web interface that permits clustering, searching and visualizing of protein-peptide complexes according to varied criteria. Propedia comprises over 19,000 high-resolution structures from the Protein Data Bank including structural and sequence information from protein-peptide complexes. The main advantage of Propedia over other peptide databases is that it allows a more comprehensive analysis of similarity and redundancy. It was constructed based on a hybrid clustering algorithm that compares and groups peptides by sequences, interface structures and binding sites. Propedia is available through a graphical, user-friendly and functional interface where users can retrieve and analyze complexes and download each search data set. We performed case studies and verified the utility of Propedia scores for ranking promising interacting peptides. In a study involving predicting peptides to inhibit SARS-CoV-2 main protease, we showed that Propedia scores related to similarity between different peptide complexes with SARS-CoV-2 main protease are in agreement with molecular dynamics free energy calculations. CONCLUSIONS Propedia is a database and tool to support structure-based rational design of peptides for special purposes. Protein-peptide interactions can be useful for predicting, classifying and scoring complexes, as well as for designing new molecules. Propedia is up-to-date as a ready-to-use webserver with a friendly and resourceful interface and is available at: https://bioinfo.dcc.ufmg.br/propedia.
Affiliation(s)
- Pedro M. Martins
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
- Lucianna H. Santos
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
- Diego Mariano
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
- Felippe C. Queiroz
- Department of Computer Science, Universidade Federal de Viçosa, Av Peter Henry Rolfs, Viçosa, MG Brazil
- Luana L. Bastos
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
- Isabela de S. Gomes
- Department of Computer Science, Universidade Federal de Viçosa, Av Peter Henry Rolfs, Viçosa, MG Brazil
- Pedro H. C. Fischer
- Laboratory of Molecular Modeling and Bioinformatics, Department of Exact and Biological Sciences, Universidade Federal de São João Del-Rei, Rua Sétimo Moreira Martins, Sete Lagoas, MG Brazil
- Rafael E. O. Rocha
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
- Sabrina A. Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Av Peter Henry Rolfs, Viçosa, MG Brazil
- Leonardo H. F. de Lima
- Laboratory of Molecular Modeling and Bioinformatics, Department of Exact and Biological Sciences, Universidade Federal de São João Del-Rei, Rua Sétimo Moreira Martins, Sete Lagoas, MG Brazil
- Mariana T. Q. de Magalhães
- Macromolecule Biophysics Laboratory (LBM), Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
- Maria G. A. Oliveira
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Av Peter Henry Rolfs, Viçosa, MG Brazil
- Raquel C. de Melo-Minardi
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Av Pres. Antônio Carlos, Belo Horizonte, MG 31720-901 Brazil
6
Santana CA, Silveira SDA, Moraes JPA, Izidoro SC, de Melo-Minardi RC, Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. GRaSP: a graph-based residue neighborhood strategy to predict binding sites. Bioinformatics 2020; 36:i726-i734. [DOI: 10.1093/bioinformatics/btaa805] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 09/08/2020] [Indexed: 01/22/2023]
Abstract
Motivation
The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost.
Results
We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue-centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from the CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable, as it took 10–20 s on average to predict the binding site for a protein complex, whereas the state-of-the-art residue-centric method takes 2–5 h on average.
Availability and implementation
The source code and datasets are available at https://github.com/charles-abreu/GRaSP.
Supplementary information
Supplementary data are available at Bioinformatics online.
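To give a rough idea of what "modelling the residue environment as a graph at the atomic level" can look like, the following sketch links residues whose atoms lie within a distance cutoff. It is a simplified illustration and not the GRaSP implementation: the input format, the function name `residue_neighborhoods` and the 4.5 Å cutoff are assumptions, and the actual method's graph construction and features are described in the paper and its source code.

```python
import math
from collections import defaultdict


def residue_neighborhoods(atoms, cutoff=4.5):
    """Map each residue to the residues that have an atom within `cutoff`.

    `atoms` is a list of (residue_id, (x, y, z)) tuples; `cutoff` is in the
    same distance unit as the coordinates (assumed here to be angstroms).
    """
    neighbors = defaultdict(set)
    for i, (res_i, xyz_i) in enumerate(atoms):
        neighbors[res_i]  # make sure isolated residues still get an entry
        for res_j, xyz_j in atoms[i + 1:]:
            if res_i != res_j and math.dist(xyz_i, xyz_j) <= cutoff:
                neighbors[res_i].add(res_j)
                neighbors[res_j].add(res_i)
    return dict(neighbors)


# Toy coordinates: A1 and A2 are in contact, A3 is far from both.
toy_atoms = [
    ("A1", (0.0, 0.0, 0.0)),
    ("A1", (1.0, 0.0, 0.0)),
    ("A2", (3.0, 0.0, 0.0)),
    ("A3", (20.0, 0.0, 0.0)),
]
print(residue_neighborhoods(toy_atoms))
```

The all-pairs loop is quadratic in the number of atoms; for whole protein complexes a spatial index such as a k-d tree or a cell grid would normally replace it.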
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- João P A Moraes
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
7
Fassio AV, Santos LH, Silveira SA, Ferreira RS, de Melo-Minardi RC. nAPOLI: A Graph-Based Strategy to Detect and Visualize Conserved Protein-Ligand Interactions in Large-Scale. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1317-1328. [PMID: 30629512 DOI: 10.1109/tcbb.2019.2892099] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 06/09/2023]
Abstract
Essential roles in biological systems depend on protein-ligand recognition, which is mostly driven by specific non-covalent interactions. Consequently, investigating these interactions contributes to understanding how molecular recognition occurs. Nowadays, a large-scale data set of protein-ligand complexes is available in the Protein Data Bank, which has led to several tools being proposed in an effort to elucidate protein-ligand interactions. Nonetheless, there is no all-in-one tool that couples large-scale statistical, visual, and interactive analysis of conserved protein-ligand interactions. Therefore, we propose nAPOLI (Analysis of PrOtein-Ligand Interactions), a web server that combines large-scale analysis of conserved interactions in protein-ligand complexes at the atomic level, interactive visual representations, and comprehensive reports of the interacting residues/atoms to detect and explore conserved non-covalent interactions. We demonstrate the potential of nAPOLI in detecting important conserved interacting residues through four case studies: two involving a human cyclin-dependent kinase 2 (CDK2), one related to ricin, and another to the human nuclear receptor subfamily 3 (hNR3). nAPOLI proved to be suitable for identifying conserved interactions reported in the literature, as well as highlighting additional interactions. Finally, we illustrate, with a virtual screening ligand selection, how nAPOLI can be widely applied in structural biology and drug design. nAPOLI is freely available at bioinfo.dcc.ufmg.br/napoli/.
8
Ribeiro VS, Santana CA, Fassio AV, Cerqueira FR, da Silveira CH, Romanelli JPR, Patarroyo-Vargas A, Oliveira MGA, Gonçalves-Almeida V, Izidoro SC, de Melo-Minardi RC, Silveira SDA. visGReMLIN: graph mining-based detection and visualization of conserved motifs at 3D protein-ligand interface at the atomic level. BMC Bioinformatics 2020; 21:80. [PMID: 32164574 PMCID: PMC7068867 DOI: 10.1186/s12859-020-3347-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Background Interactions between proteins and non-protein small molecule ligands play important roles in the biological processes of living systems. Thus, the development of computational methods to support our understanding of the ligand-receptor recognition process is of fundamental importance since these methods are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. Results To illustrate the potential of visGReMLIN, we conducted two case studies in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in a totally visual manner. In addition, we found some motifs that we believe are relevant to protein-ligand interactions in the analyzed datasets. Conclusions We aimed to build a visual analytics-oriented web server to detect and visualize common motifs at the protein-ligand interface. visGReMLIN motifs can support users in gaining insights into the key atoms/residues responsible for protein-ligand interactions in a dataset of complexes.
Affiliation(s)
- Vagner S Ribeiro
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Charles A Santana
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Alexandre V Fassio
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Fabio R Cerqueira
- Department of Production Engineering, Universidade Federal Fluminense, Petrópolis, 25650-050, Brazil
- Carlos H da Silveira
- Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- João P R Romanelli
- Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- Adriana Patarroyo-Vargas
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Maria G A Oliveira
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil; Instituto de Biotecnologia aplicada à Agropecuária (BIOAGRO), Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Valdete Gonçalves-Almeida
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sandro C Izidoro
- Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
9
Fassio AV, Martins PM, Guimarães SDS, Junior SSA, Ribeiro VS, de Melo-Minardi RC, Silveira SDA. Vermont: a multi-perspective visual interactive platform for mutational analysis. BMC Bioinformatics 2017; 18:403. [PMID: 28929973 PMCID: PMC5606220 DOI: 10.1186/s12859-017-1789-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
BACKGROUND A huge amount of data about genomes and sequence variation is available and continues to grow on a large scale, which makes it infeasible to experimentally characterize these mutations with regard to disease association and effects on protein structure and function. Therefore, reliable computational approaches are needed to support the understanding of mutations and their impacts. Here, we present VERMONT 2.0, a visual interactive platform that combines sequence and structural parameters with interactive visualizations to make the impact of protein point mutations more understandable. RESULTS We aimed to contribute a novel visual analytics-oriented method to analyze and gain insight into the impact of protein point mutations. To assess the ability of VERMONT to do this, we visually examined a set of mutations that were experimentally characterized to determine whether VERMONT could identify damaging mutations and why they can be considered so. CONCLUSIONS VERMONT allowed us to understand mutations by interpreting position-specific structural and physicochemical properties. Additionally, we note some specific positions that we believe have an impact on protein function/structure in the case of mutation.
Affiliation(s)
- Alexandre V Fassio
- Department of Computer Science, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil
- Pedro M Martins
- Department of Computer Science, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil
- Samuel da S Guimarães
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
- Sócrates S A Junior
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
- Vagner S Ribeiro
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, 6627, Antônio Carlos avenue, Pampulha, Belo Horizonte, 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Peter Henry Rolfs avenue, Campus Universitário, Viçosa, 36570-900, Brazil
10
Gonçalves WRS, Gonçalves-Almeida VM, Arruda AL, Meira W, da Silveira CH, Pires DEV, de Melo-Minardi RC. PDBest: a user-friendly platform for manipulating and enhancing protein structures. Bioinformatics 2015; 31:2894-6. [PMID: 25910698 DOI: 10.1093/bioinformatics/btv223] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/06/2015] [Accepted: 04/19/2015] [Indexed: 11/14/2022] Open
Abstract
PDBest (PDB Enhanced Structures Toolkit) is a user-friendly, freely available platform for acquiring, manipulating and normalizing protein structures in a high-throughput and seamless fashion. With an intuitive graphical interface, it allows users with no programming background to download and manipulate their files. The platform also exports protocols, enabling users to easily share PDB searching and filtering criteria, enhancing analysis reproducibility. AVAILABILITY AND IMPLEMENTATION: PDBest installation packages are freely available for several platforms at http://www.pdbest.dcc.ufmg.br. CONTACT: wellisson@dcc.ufmg.br, dpires@dcc.ufmg.br, raquelcm@dcc.ufmg.br. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aleksander L Arruda
- Department of Computer Science, Universidade Federal de Minas Gerais, Brazil
- Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Brazil
11
Abstract
MOTIVATION: Currently, the function of 25% of the proteins annotated in Pfam is unknown. One way of predicting protein function is to look at the active site, which has two main parts: the catalytic site and the substrate binding site. The active site is more conserved than the other residues of the protein and can be a rich source of information for protein function prediction. This article presents a new heuristic method, named genetic active site search (GASS), which searches for given active site 3D templates in unknown proteins. The method can perform non-exact amino acid matches (conservative mutations), is able to find amino acids in different chains and does not impose any restrictions on the active site size. RESULTS: GASS results were compared with those catalogued in the Catalytic Site Atlas (CSA) in four different datasets and with two other methods: Amino acid pattern Search for Substructures And Motifs, and Catalytic Site Identification. The results show that GASS can correctly identify >90% of the templates searched. Experiments were also run using data from the substrate binding sites prediction competition CASP 10, where GASS ranked fourth among the 18 methods considered.
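The template search described above can be illustrated with a toy genetic algorithm. This is a minimal sketch under stated assumptions, not the GASS implementation: the fitness function (sum of absolute differences between template and candidate pairwise distances), the truncation selection, the one-point crossover, the point mutation and every name below are hypothetical choices made for illustration only.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(candidate, template, protein):
    """Distance-pattern mismatch (assumed scoring); 0 means a perfect match."""
    if len(set(candidate)) < len(candidate):
        return math.inf  # one residue cannot fill two template positions
    total = 0.0
    for i in range(len(template)):
        for j in range(i + 1, len(template)):
            d_template = dist(template[i][1], template[j][1])
            d_candidate = dist(protein[candidate[i]][1], protein[candidate[j]][1])
            total += abs(d_template - d_candidate)
    return total


def search(template, protein, substitutions=None, pop_size=30, generations=100, seed=0):
    """Evolve lists of residue indices toward the template's distance pattern."""
    rng = random.Random(seed)
    substitutions = substitutions or {}
    # Per template position: protein residues with the same or a conservative amino acid.
    choices = []
    for aa, _ in template:
        allowed = {aa} | set(substitutions.get(aa, ""))
        choices.append([k for k, (res, _) in enumerate(protein) if res in allowed])
    if any(not c for c in choices):
        return None, math.inf
    population = [[rng.choice(c) for c in choices] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda cand: fitness(cand, template, protein))
        parents = population[: pop_size // 2]  # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, len(template)) if len(template) > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            pos = rng.randrange(len(template))
            child[pos] = rng.choice(choices[pos])  # point mutation
            children.append(child)
        population = parents + children
    best = min(population, key=lambda cand: fitness(cand, template, protein))
    return best, fitness(best, template, protein)
```

Here `template` and `protein` are lists of `(one_letter_amino_acid, (x, y, z))` tuples, and `substitutions` maps a template amino acid to the residues accepted in its place, for example `{"D": "E"}`. Because the score compares only internal distances, it is independent of how the protein is positioned in space.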
Affiliation(s)
- Sandro C Izidoro
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
- Raquel C de Melo-Minardi
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
- Gisele L Pappa
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
12
Silveira SA, Fassio AV, Gonçalves-Almeida VM, de Lima EB, Barcelos YT, Aburjaile FF, Rodrigues LM, Meira W, de Melo-Minardi RC. VERMONT: Visualizing mutations and their effects on protein physicochemical and topological property conservation. BMC Proc 2014; 8:S4. [PMID: 25237391 PMCID: PMC4155615 DOI: 10.1186/1753-6561-8-s2-s4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022] Open
Abstract
In this paper, we propose an interactive visualization called VERMONT, which tackles the problem of visualizing mutations and inferring their possible effects on the conservation of physicochemical and topological properties in protein families. More specifically, we visualize a set of structure-based sequence alignments and integrate several structural parameters that should aid biologists in gaining insight into possible consequences of mutations. VERMONT allowed us to identify patterns of position-specific properties as well as exceptions that may help predict whether specific mutations could damage protein function.
Affiliation(s)
- Sabrina A Silveira
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Alexandre V Fassio
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Valdete M Gonçalves-Almeida
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Elisa B de Lima
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Yussif T Barcelos
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Flávia F Aburjaile
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Laerte M Rodrigues
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
13
Pires DEV, de Melo-Minardi RC, da Silveira CH, Campos FF, Meira W. aCSM: noise-free graph-based signatures to large-scale receptor-based ligand prediction. Bioinformatics 2013; 29:855-61. [PMID: 23396119 DOI: 10.1093/bioinformatics/btt058] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Receptor-ligand interactions are a central phenomenon in most biological systems. They are characterized by molecular recognition, a complex process mainly driven by physicochemical and structural properties of both receptor and ligand. Understanding and predicting these interactions are major steps towards protein ligand prediction, target identification, lead discovery and drug design. RESULTS: We propose a novel graph-based binding pocket signature called aCSM, which proved to be efficient and effective in handling large-scale protein ligand prediction tasks. We compare our results with those described in the literature and demonstrate that our algorithm outperforms competing techniques. Finally, we predict novel ligands for proteins from Trypanosoma cruzi, the parasite responsible for Chagas disease, and validate them in silico via a docking protocol, showing the applicability of the method in suggesting ligands for pockets in a real-world scenario. AVAILABILITY AND IMPLEMENTATION: Datasets and the source code are available at http://www.dcc.ufmg.br/~dpires/acsm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Douglas E V Pires
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha Belo Horizonte - MG, 31270-901, Brazil.
14
Pires DEV, de Melo-Minardi RC, dos Santos MA, da Silveira CH, Santoro MM, Meira W. Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns. BMC Genomics 2011; 12 Suppl 4:S12. [PMID: 22369665 PMCID: PMC3287581 DOI: 10.1186/1471-2164-12-s4-s12] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition (SVD) is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75. RESULTS: CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in the latest SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a comparable precision level. CONCLUSIONS: We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.
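As a rough illustration of how a feature vector of inter-residue distance patterns might be built, the sketch below counts, for each distance cutoff, how many residue pairs lie within that cutoff. The function name, the cutoff range and step, the cumulative counting and the use of one representative coordinate per residue are assumptions made for this example rather than details stated in the abstract, and the SVD preprocessing step is omitted.

```python
import math


def csm_signature(coords, d_min=0.0, d_max=30.0, step=1.0):
    """Count residue pairs within each cutoff (assumed cumulative variant).

    coords: one representative (x, y, z) point per residue.
    Returns one count per cutoff d_min + step, d_min + 2 * step, ..., d_max.
    """
    # All unordered residue pairs and their Euclidean distances.
    pair_distances = [
        math.dist(a, b)
        for i, a in enumerate(coords)
        for b in coords[i + 1:]
    ]
    n_steps = int(round((d_max - d_min) / step))
    # Derive each cutoff from its index to avoid accumulating float error.
    cutoffs = [d_min + k * step for k in range(1, n_steps + 1)]
    return [sum(1 for d in pair_distances if d <= c) for c in cutoffs]
```

Each protein then maps to a fixed-length vector regardless of its size, which is what lets such vectors be stacked into a matrix and passed to a dimensionality reduction step and a classifier.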
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil.
15
Bellinzoni M, Bastard K, Perret A, Zaparucha A, Perchat N, Vergne C, Wagner T, de Melo-Minardi RC, Artiguenave F, Cohen GN, Weissenbach J, Salanoubat M, Alzari PM. 3-Keto-5-aminohexanoate cleavage enzyme: a common fold for an uncommon Claisen-type condensation. J Biol Chem 2011; 286:27399-405. [PMID: 21632536 PMCID: PMC3149333 DOI: 10.1074/jbc.m111.253260] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/20/2011] [Revised: 05/17/2011] [Indexed: 11/06/2022] Open
Abstract
The exponential increase in genome sequencing output has led to the accumulation of thousands of predicted genes lacking a proper functional annotation. Among this mass of hypothetical proteins, enzymes catalyzing new reactions or using novel ways to catalyze already known reactions might still wait to be identified. Here, we provide a structural and biochemical characterization of the 3-keto-5-aminohexanoate cleavage enzyme (Kce), an enzymatic activity long known as being involved in the anaerobic fermentation of lysine but whose catalytic mechanism has remained elusive so far. Although the enzyme shows the ubiquitous triose phosphate isomerase (TIM) barrel fold and a Zn(2+) cation reminiscent of metal-dependent class II aldolases, our results based on a combination of x-ray snapshots and molecular modeling point to an unprecedented mechanism that proceeds through deprotonation of the 3-keto-5-aminohexanoate substrate, nucleophilic addition onto an incoming acetyl-CoA, intramolecular transfer of the CoA moiety, and final retro-Claisen reaction leading to acetoacetate and 3-aminobutyryl-CoA. This model also accounts for earlier observations showing the origin of carbon atoms in the products, as well as the absence of detection of any covalent acyl-enzyme intermediate. Kce is the first representative of a large family of prokaryotic hypothetical proteins, currently annotated as the "domain of unknown function" DUF849.
Affiliation(s)
- Marco Bellinzoni
- Unité de Microbiologie Structurale, Institut Pasteur, and CNRS-URA2185, 25 rue du Dr. Roux, 75724 Paris Cedex 15
- Karine Bastard
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Alain Perret
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Anne Zaparucha
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Nadia Perchat
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Carine Vergne
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Tristan Wagner
- Unité de Microbiologie Structurale, Institut Pasteur, and CNRS-URA2185, 25 rue du Dr. Roux, 75724 Paris Cedex 15
- Raquel C. de Melo-Minardi
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- François Artiguenave
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Georges N. Cohen
- Institut Pasteur, 28 rue du Dr. Roux, 75724 Paris Cedex 15, France
- Jean Weissenbach
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Marcel Salanoubat
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique (CEA), Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, 91057 Evry
- CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry
- Université d'Evry Val d'Essonne, boulevard François Mitterrand, 91057 Evry
- Pedro M. Alzari
- Unité de Microbiologie Structurale, Institut Pasteur, and CNRS-URA2185, 25 rue du Dr. Roux, 75724 Paris Cedex 15
16
de Melo-Minardi RC, Bastard K, Artiguenave F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics 2010; 26:3075-82. [PMID: 20980272 DOI: 10.1093/bioinformatics/btq595] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Current computational approaches to function prediction are mostly based on protein sequence classification and transfer of annotation from known proteins to their closest homologous sequences, relying on the orthology concept of function conservation. This approach suffers from a major weakness: annotation reliability depends on global sequence similarity to known proteins, and the approach performs poorly for enzyme superfamilies that catalyze different reactions. Structural biology offers a different strategy to overcome the problem of annotation by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on the detection of functionally polymorphic residues in an enzyme superfamily. Structural genomics programs are providing more and more novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and available structures. Computational methods such as homology modeling provide reliable approaches to bridge this gap and could be a new precise tool to annotate protein functions. RESULTS: Here, we present the Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of modeled active sites and a subsequent hierarchical conceptual classification. Comparison of profiles obtained from computed clusters allows the identification of residues correlated with subfamily function divergence, called specificity determining positions. The ASMC method has been validated on a benchmark of 42 Pfam families for which previously resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations. We will discuss how ASMC improves the annotation and understanding of protein family functions by giving some specific illustrative examples on nucleotidyl cyclases, protein kinases and serine proteases. AVAILABILITY: http://www.genoscope.fr/ASMC/.
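The last step, comparing cluster profiles to flag specificity determining positions, can be sketched as follows. This is a simplified illustration under stated assumptions, not the ASMC criterion: it takes already clustered, already aligned active-site residue strings as input, and it flags a position when every cluster has a dominant residue there and the dominant residues differ between clusters. The function names and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def dominant_residue(column, min_fraction=0.8):
    """Return the most frequent residue if it reaches min_fraction, else None."""
    residue, count = Counter(column).most_common(1)[0]
    return residue if count / len(column) >= min_fraction else None


def specificity_positions(clusters, min_fraction=0.8):
    """Positions conserved within every cluster but differing between clusters.

    clusters: list of clusters, each a list of equal-length aligned
    active-site strings (one character per pocket position).
    """
    length = len(clusters[0][0])
    positions = []
    for pos in range(length):
        dominant = [
            dominant_residue([seq[pos] for seq in cluster], min_fraction)
            for cluster in clusters
        ]
        # Every cluster must be conserved, and at least two must disagree.
        if None not in dominant and len(set(dominant)) > 1:
            positions.append(pos)
    return positions
```

For two clusters `["GDSA", "GDSA", "GDTA"]` and `["GESA", "GESA", "GESV"]`, only index 1 is flagged: it is D throughout the first cluster and E throughout the second, while the other positions are either shared by both clusters or not conserved enough within one of them.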