1
|
Xia Y, Pan X, Shen HB. A comprehensive survey on protein-ligand binding site prediction. Curr Opin Struct Biol 2024; 86:102793. [PMID: 38447285 DOI: 10.1016/j.sbi.2024.102793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Revised: 02/18/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor resources. Developing accurate and efficient computational methods for protein-ligand interaction prediction is essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods from their input features, computational algorithms, and ligand types. Furthermore, we investigate the specificity of allosteric site identification as a particular LBS type. Finally, we discuss the prospective directions for machine learning-based LBS prediction in the near future.
Collapse
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
Collapse
|
2
|
Kryshtafovych A, Antczak M, Szachniuk M, Zok T, Kretsch RC, Rangan R, Pham P, Das R, Robin X, Studer G, Durairaj J, Eberhardt J, Sweeney A, Topf M, Schwede T, Fidelis K, Moult J. New prediction categories in CASP15. Proteins 2023; 91:1550-1557. [PMID: 37306011 PMCID: PMC10713864 DOI: 10.1002/prot.26515] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 06/13/2023]
Abstract
Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.
Collapse
Affiliation(s)
| | - Maciej Antczak
- Institute of Computing Science, Poznan University of TechnologyPoznanPoland
- Institute of Bioorganic Chemistry, Polish Academy of SciencesPoznanPoland
| | - Marta Szachniuk
- Institute of Computing Science, Poznan University of TechnologyPoznanPoland
- Institute of Bioorganic Chemistry, Polish Academy of SciencesPoznanPoland
| | - Tomasz Zok
- Institute of Computing Science, Poznan University of TechnologyPoznanPoland
- Institute of Bioorganic Chemistry, Polish Academy of SciencesPoznanPoland
| | - Rachael C. Kretsch
- Biophysics Program, Stanford University School of MedicineStanfordCaliforniaUSA
| | - Ramya Rangan
- Biophysics Program, Stanford University School of MedicineStanfordCaliforniaUSA
| | - Phillip Pham
- Biochemistry DepartmentStanford University School of MedicineStanfordCaliforniaUSA
| | - Rhiju Das
- Biochemistry DepartmentStanford University School of MedicineStanfordCaliforniaUSA
- Howard Hughes Medical Institute, Stanford UniversityStanfordCaliforniaUSA
| | - Xavier Robin
- Biozentrum, University of BaselBaselSwitzerland
- SIB Swiss Institute of BioinformaticsBaselSwitzerland
| | - Gabriel Studer
- Biozentrum, University of BaselBaselSwitzerland
- SIB Swiss Institute of BioinformaticsBaselSwitzerland
| | - Janani Durairaj
- Biozentrum, University of BaselBaselSwitzerland
- SIB Swiss Institute of BioinformaticsBaselSwitzerland
| | - Jerome Eberhardt
- Biozentrum, University of BaselBaselSwitzerland
- SIB Swiss Institute of BioinformaticsBaselSwitzerland
| | - Aaron Sweeney
- Centre for Structural Systems Biology (CSSB), Leibniz‐Institut für Virologie (LIV)HamburgGermany
| | - Maya Topf
- Centre for Structural Systems Biology (CSSB), Leibniz‐Institut für Virologie (LIV)HamburgGermany
- Universitätsklinikum Hamburg Eppendorf (UKE)HamburgGermany
| | - Torsten Schwede
- Biozentrum, University of BaselBaselSwitzerland
- SIB Swiss Institute of BioinformaticsBaselSwitzerland
| | | | - John Moult
- Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular genetics, University of MarylandRockvilleMarylandUSA
| |
Collapse
|
3
|
Gu L, Li B, Ming D. A multilayer dynamic perturbation analysis method for predicting ligand-protein interactions. BMC Bioinformatics 2022; 23:456. [PMID: 36324073 PMCID: PMC9628359 DOI: 10.1186/s12859-022-04995-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Accepted: 10/19/2022] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND Ligand-protein interactions play a key role in defining protein function, and detecting natural ligands for a given protein is thus a very important bioengineering task. In particular, with the rapid development of AI-based structure prediction algorithms, batch structural models with high reliability and accuracy can be obtained at low cost, giving rise to the urgent requirement for the prediction of natural ligands based on protein structures. In recent years, although several structure-based methods have been developed to predict ligand-binding pockets and ligand-binding sites, accurate and rapid methods are still lacking, especially for the prediction of ligand-binding regions and the spatial extension of ligands in the pockets. RESULTS In this paper, we proposed a multilayer dynamics perturbation analysis (MDPA) method for predicting ligand-binding regions based solely on protein structure, which is an extended version of our previously developed fast dynamic perturbation analysis (FDPA) method. In MDPA/FDPA, ligand binding tends to occur in regions that cause large changes in protein conformational dynamics. MDPA, examined using a standard validation dataset of ligand-protein complexes, yielded an averaged ligand-binding site prediction Matthews coefficient of 0.40, with a prediction precision of at least 50% for 71% of the cases. In particular, for 80% of the cases, the predicted ligand-binding region overlaps the natural ligand by at least 50%. The method was also compared with other state-of-the-art structure-based methods. CONCLUSIONS MDPA is a structure-based method to detect ligand-binding regions on protein surface. Our calculations suggested that a range of spaces inside the protein pockets has subtle interactions with the protein, which can significantly impact on the overall dynamics of the protein. This work provides a valuable tool as a starting point upon which further docking and analysis methods can be used for natural ligand detection in protein functional annotation. The source code of MDPA method is freely available at: https://github.com/mingdengming/mdpa .
Collapse
Affiliation(s)
- Lin Gu
- grid.412022.70000 0000 9389 5210College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816 Jiangsu People’s Republic of China
| | - Bin Li
- grid.412022.70000 0000 9389 5210College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816 Jiangsu People’s Republic of China
| | - Dengming Ming
- grid.412022.70000 0000 9389 5210College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangbei New District, Nanjing City, 211816 Jiangsu People’s Republic of China
| |
Collapse
|
4
|
Zemla AT, Allen JE, Kirshner D, Lightstone FC. PDBspheres: a method for finding 3D similarities in local regions in proteins. NAR Genom Bioinform 2022; 4:lqac078. [PMID: 36225529 PMCID: PMC9549786 DOI: 10.1093/nargab/lqac078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Revised: 08/06/2022] [Accepted: 09/29/2022] [Indexed: 11/05/2022] Open
Abstract
We present a structure-based method for finding and evaluating structural similarities in protein regions relevant to ligand binding. PDBspheres comprises an exhaustive library of protein structure regions ('spheres') adjacent to complexed ligands derived from the Protein Data Bank (PDB), along with methods to find and evaluate structural matches between a protein of interest and spheres in the library. PDBspheres uses the LGA (Local-Global Alignment) structure alignment algorithm as the main engine for detecting structural similarities between the protein of interest and template spheres from the library, which currently contains >2 million spheres. To assess confidence in structural matches, an all-atom-based similarity metric takes side chain placement into account. Here, we describe the PDBspheres method, demonstrate its ability to detect and characterize binding sites in protein structures, show how PDBspheres-a strictly structure-based method-performs on a curated dataset of 2528 ligand-bound and ligand-free crystal structures, and use PDBspheres to cluster pockets and assess structural similarities among protein binding sites of 4876 structures in the 'refined set' of the PDBbind 2019 dataset.
Collapse
Affiliation(s)
- Adam T Zemla
- To whom correspondence should be addressed. Tel: +1 925 423 5571; Fax: +1 925 423 6437;
| | - Jonathan E Allen
- Global Security Computing Applications, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
| | - Dan Kirshner
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
| | - Felice C Lightstone
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
| |
Collapse
|
5
|
Paiva VDA, Gomes IDS, Monteiro CR, Mendonça MV, Martins PM, Santana CA, Gonçalves-Almeida V, Izidoro SC, Melo-Minardi RCD, Silveira SDA. Protein structural bioinformatics: An overview. Comput Biol Med 2022; 147:105695. [DOI: 10.1016/j.compbiomed.2022.105695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 06/01/2022] [Accepted: 06/02/2022] [Indexed: 11/27/2022]
|
6
|
Liu Y, Yang X, Gan J, Chen S, Xiao ZX, Cao Y. CB-Dock2: improved protein-ligand blind docking by integrating cavity detection, docking and homologous template fitting. Nucleic Acids Res 2022; 50:W159-W164. [PMID: 35609983 PMCID: PMC9252749 DOI: 10.1093/nar/gkac394] [Citation(s) in RCA: 300] [Impact Index Per Article: 150.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2022] [Revised: 04/24/2022] [Accepted: 05/05/2022] [Indexed: 11/14/2022] Open
Abstract
Protein-ligand blind docking is a powerful method for exploring the binding sites of receptors and the corresponding binding poses of ligands. It has seen wide applications in pharmaceutical and biological researches. Previously, we proposed a blind docking server, CB-Dock, which has been under heavy use (over 200 submissions per day) by researchers worldwide since 2019. Here, we substantially improved the docking method by combining CB-Dock with our template-based docking engine to enhance the accuracy in binding site identification and binding pose prediction. In the benchmark tests, it yielded the success rate of ∼85% for binding pose prediction (RMSD < 2.0 Å), which outperformed original CB-Dock and most popular blind docking tools. This updated docking server, named CB-Dock2, reconfigured the input and output web interfaces, together with a highly automatic docking pipeline, making it a particularly efficient and easy-to-use tool for the bioinformatics and cheminformatics communities. The web server is freely available at https://cadd.labshare.cn/cb-dock2/.
Collapse
Affiliation(s)
- Yang Liu
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
| | - Xiaocong Yang
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
| | - Jianhong Gan
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
| | - Shuang Chen
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China
| | - Zhi-Xiong Xiao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
| | - Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China.,Animal Disease Prevention and Food Safety Key Laboratory of Sichuan Province, Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Chengdu, China
| |
Collapse
|
7
|
Paiva VA, Mendonça MV, Silveira SA, Ascher DB, Pires DEV, Izidoro SC. GASS-Metal: identifying metal-binding sites on protein structures using genetic algorithms. Brief Bioinform 2022; 23:6590153. [PMID: 35595534 DOI: 10.1093/bib/bbac178] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Revised: 04/18/2022] [Accepted: 04/20/2022] [Indexed: 12/12/2022] Open
Abstract
Metals are present in >30% of proteins found in nature and assist them to perform important biological functions, including storage, transport, signal transduction and enzymatic activity. Traditional and experimental techniques for metal-binding site prediction are usually costly and time-consuming, making computational tools that can assist in these predictions of significant importance. Here we present Genetic Active Site Search (GASS)-Metal, a new method for protein metal-binding site prediction. The method relies on a parallel genetic algorithm to find candidate metal-binding sites that are structurally similar to curated templates from M-CSA and MetalPDB. GASS-Metal was thoroughly validated using homologous proteins and conservative mutations of residues, showing a robust performance. The ability of GASS-Metal to identify metal-binding sites was also compared with state-of-the-art methods, outperforming similar methods and achieving an MCC of up to 0.57 and detecting up to 96.1% of the sites correctly. GASS-Metal is freely available at https://gassmetal.unifei.edu.br. The GASS-Metal source code is available at https://github.com/sandroizidoro/gassmetal-local.
Collapse
Affiliation(s)
- Vinícius A Paiva
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, Brazil
| | - Murillo V Mendonça
- Institute of Technological Sciences, Campus Theodomiro Carneiro Santiago, Universidade Federal de Itajubá, Itabira, Brazil
| | - Sabrina A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, Brazil
| | - David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - Sandro C Izidoro
- Institute of Technological Sciences, Campus Theodomiro Carneiro Santiago, Universidade Federal de Itajubá, Itabira, Brazil
| |
Collapse
|
8
|
Santana CA, Izidoro SC, de Melo-Minardi RC, Tyzack JD, Ribeiro AJM, Pires DEV, Thornton JM, de A Silveira S. GRaSP-web: a machine learning strategy to predict binding sites based on residue neighborhood graphs. Nucleic Acids Res 2022; 50:W392-W397. [PMID: 35524575 PMCID: PMC9252730 DOI: 10.1093/nar/gkac323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Revised: 04/14/2022] [Accepted: 04/22/2022] [Indexed: 11/14/2022] Open
Abstract
Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance to determine protein function, being a fundamental step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5h on the average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSPWeb is freely available at https://grasp.ufv.br.
Collapse
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil.,Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
| | - Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
| | - Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil.,Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Douglas E V Pires
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Australia
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
| |
Collapse
|
9
|
McGreig JE, Uri H, Antczak M, Sternberg MJE, Michaelis M, Wass MN. 3DLigandSite: structure-based prediction of protein-ligand binding sites. Nucleic Acids Res 2022; 50:W13-W20. [PMID: 35412635 PMCID: PMC9252821 DOI: 10.1093/nar/gkac250] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Revised: 03/13/2022] [Accepted: 04/03/2022] [Indexed: 01/13/2023] Open
Abstract
3DLigandSite is a web tool for the prediction of ligand-binding sites in proteins. Here, we report a significant update since the first release of 3DLigandSite in 2010. The overall methodology remains the same, with candidate binding sites in proteins inferred using known binding sites in related protein structures as templates. However, the initial structural modelling step now uses the newly available structures from the AlphaFold database or alternatively Phyre2 when AlphaFold structures are not available. Further, a sequence-based search using HHSearch has been introduced to identify template structures with bound ligands that are used to infer the ligand-binding residues in the query protein. Finally, we introduced a machine learning element as the final prediction step, which improves the accuracy of predictions and provides a confidence score for each residue predicted to be part of a binding site. Validation of 3DLigandSite on a set of 6416 binding sites obtained 92% recall at 75% precision for non-metal binding sites and 52% recall at 75% precision for metal binding sites. 3DLigandSite is available at https://www.wass-michaelislab.org/3dligandsite. Users submit either a protein sequence or structure. Results are displayed in multiple formats including an interactive Mol* molecular visualization of the protein and the predicted binding sites.
Collapse
Affiliation(s)
- Jake E McGreig
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Hannah Uri
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Magdalena Antczak
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Martin Michaelis
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Mark N Wass
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| |
Collapse
|
10
|
Hendrix SG, Chang KY, Ryu Z, Xie ZR. DeepDISE: DNA Binding Site Prediction Using a Deep Learning Method. Int J Mol Sci 2021; 22:ijms22115510. [PMID: 34073705 PMCID: PMC8197219 DOI: 10.3390/ijms22115510] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Revised: 04/30/2021] [Accepted: 05/19/2021] [Indexed: 11/18/2022] Open
Abstract
It is essential for future research to develop a new, reliable prediction method of DNA binding sites because DNA binding sites on DNA-binding proteins provide critical clues about protein function and drug discovery. However, the current prediction methods of DNA binding sites have relatively poor accuracy. Using 3D coordinates and the atom-type of surface protein atom as the input, we trained and tested a deep learning model to predict how likely a voxel on the protein surface is to be a DNA-binding site. Based on three different evaluation datasets, the results show that our model not only outperforms several previous methods on two commonly used datasets, but also demonstrates its robust performance to be consistent among the three datasets. The visualized prediction outcomes show that the binding sites are also mostly located in correct regions. We successfully built a deep learning model to predict the DNA binding sites on target proteins. It demonstrates that 3D protein structures plus atom-type information on protein surfaces can be used to predict the potential binding sites on a protein. This approach should be further extended to develop the binding sites of other important biological molecules.
Collapse
Affiliation(s)
- Samuel Godfrey Hendrix
- Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA; (S.G.H.); (Z.R.)
| | - Kuan Y. Chang
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 202, Taiwan;
| | - Zeezoo Ryu
- Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA; (S.G.H.); (Z.R.)
- Department of Computer Science, Franklin College of Arts and Sciences, University of Georgia, Athens, GA 30602, USA
| | - Zhong-Ru Xie
- Computational Drug Discovery Laboratory, School of Electrical and Computer Engineering, College of Engineering, University of Georgia, Athens, GA 30602, USA; (S.G.H.); (Z.R.)
- Correspondence:
| |
Collapse
|
11
|
Brackenridge DA, McGuffin LJ. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods with a Focus on FunFOLD3. Methods Mol Biol 2021; 2365:43-58. [PMID: 34432238 DOI: 10.1007/978-1-0716-1665-9_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Proteins are essential molecules with a diverse range of functions; elucidating their biological and biochemical characteristics can be difficult and time consuming using in vitro and/or in vivo methods. Additionally, in vivo protein-ligand binding site elucidation is unable to keep place with current growth in sequencing, leaving the majority of new protein sequences without known functions. Therefore, the development of new methods, which aim to predict the protein-ligand interactions and ligand-binding site residues directly from amino acid sequences, is becoming increasingly important. In silico prediction can utilise either sequence information, structural information or a combination of both. In this chapter, we will discuss the broad range of methods for ligand-binding site prediction from protein structure and we will describe our method, FunFOLD3, for the prediction of protein-ligand interactions and ligand-binding sites based on template-based modelling. Additionally, we will describe the step-by-step instructions using the FunFOLD3 downloadable application along with examples from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) where FunFOLD3 has been used to aid ligand and ligand-binding site prediction. Finally, we will introduce our newer method, FunFOLD3-D, a version of FunFOLD3 which aims to improve template-based protein-ligand binding site prediction through the integration of docking, using AutoDock Vina.
Collapse
|
12
|
Santana CA, Silveira SDA, Moraes JPA, Izidoro SC, de Melo-Minardi RC, Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. GRaSP: a graph-based residue neighborhood strategy to predict binding sites. Bioinformatics 2020; 36:i726-i734. [DOI: 10.1093/bioinformatics/btaa805] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/08/2020] [Indexed: 01/22/2023] Open
Abstract
Abstract
Motivation
The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites was proposed as they can be scalable, fast and present low cost.
Results
We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10–20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2–5 h on average.
Availability and implementation
The source code and datasets are available at https://github.com/charles-abreu/GRaSP.
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
| | - Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
| | - João P A Moraes
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
| | - Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
| | - Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
| | - António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
13
|
Predicting binding sites from unbound versus bound protein structures. Sci Rep 2020; 10:15856. [PMID: 32985584 PMCID: PMC7522209 DOI: 10.1038/s41598-020-72906-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Accepted: 07/27/2020] [Indexed: 11/30/2022] Open
Abstract
We present the application of seven binding-site prediction algorithms to a meticulously curated dataset of ligand-bound and ligand-free crystal structures for 304 unique protein sequences (2528 crystal structures). We probe the influence of starting protein structures on the results of binding-site prediction, so the dataset contains a minimum of two ligand-bound and two ligand-free structures for each protein. We use this dataset in a brief survey of five geometry-based, one energy-based, and one machine-learning-based methods: Surfnet, Ghecom, LIGSITEcsc, Fpocket, Depth, AutoSite, and Kalasanty. Distributions of the F scores and Matthew’s correlation coefficients for ligand-bound versus ligand-free structure performance show no statistically significant difference in structure type versus performance for most methods. Only Fpocket showed a statistically significant but low magnitude enhancement in performance for holo structures. Lastly, we found that most methods will succeed on some crystal structures and fail on others within the same protein family, despite all structures being relatively high-quality structures with low structural variation. We expected better consistency across varying protein conformations of the same sequence. Interestingly, the success or failure of a given structure cannot be predicted by quality metrics such as resolution, Cruickshank Diffraction Precision index, or unresolved residues. Cryptic sites were also examined.
Collapse
|
14
|
Yang J, Kwon S, Bae SH, Park KM, Yoon C, Lee JH, Seok C. GalaxySagittarius: Structure- and Similarity-Based Prediction of Protein Targets for Druglike Compounds. J Chem Inf Model 2020; 60:3246-3254. [PMID: 32401021 DOI: 10.1021/acs.jcim.0c00104] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Computational techniques for predicting interactions of proteins and druglike molecules have often been used to search for compounds that bind a given protein with high affinity. More recently, such tools have also been applied to the reverse procedure of searching protein targets for a given compound. Among methods for predicting protein-ligand interactions, ligand-based methods relying on similarity to ligands of known interactions are effective only when similar protein-ligand interactions are known. Receptor-based methods predicting protein-ligand interactions by molecular docking are effective only when high-accuracy receptor structures and binding sites are available. Moreover, the computational cost of molecular docking tends to be too high to be applied to the entire protein structure database. In this paper, an effective target prediction method, which combines ligand similarity-based and receptor structure-based approaches, is introduced. In this method, protein-ligand docking is performed after efficient structure- and similarity-based screening. The enriched protein target database by predicted binding ligands and sites allows detection of protein targets with previously unknown ligand interactions. The method, called GalaxySagittarius, is freely available as a web server at http://galaxy.seoklab.org/sagittarius.
Collapse
Affiliation(s)
- Jinsol Yang
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
| | - Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
| | - Sang-Hun Bae
- Dr. Noah Biotech, Gwanggyo Ace Tower 2, 256 Changyoung-daero, Yeongtong-gu, Suwon 16229, Republic of Korea
| | - Kyoung Mii Park
- Dr. Noah Biotech, Gwanggyo Ace Tower 2, 256 Changyoung-daero, Yeongtong-gu, Suwon 16229, Republic of Korea
| | - Changsik Yoon
- Dr. Noah Biotech, Gwanggyo Ace Tower 2, 256 Changyoung-daero, Yeongtong-gu, Suwon 16229, Republic of Korea
| | - Ji-Hyun Lee
- Dr. Noah Biotech, Gwanggyo Ace Tower 2, 256 Changyoung-daero, Yeongtong-gu, Suwon 16229, Republic of Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
| |
Collapse
|
15
|
Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies. J Comput Aided Mol Des 2019; 33:887-903. [PMID: 31628659 DOI: 10.1007/s10822-019-00235-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 10/25/2022]
Abstract
In the current "genomic era" the number of identified genes is growing exponentially. However, the biological function of a large number of the corresponding proteins is still unknown. Recognition of small molecule ligands (e.g., substrates, inhibitors, allosteric regulators, etc.) is pivotal for protein functions in the vast majority of the cases and knowledge of the region where these processes take place is essential for protein function prediction and drug design. In this regard, computational methods represent essential tools to tackle this problem. A significant number of software tools have been developed in the last few years which exploit either protein sequence information, structure information or both. This review describes the most recent developments in protein function recognition and binding site prediction, in terms of both freely-available and commercial solutions and tools, detailing the main characteristics of the considered tools and providing a comparative analysis of their performance.
Collapse
|
16
|
Zhou H, Cao H, Skolnick J. FINDSITE comb2.0: A New Approach for Virtual Ligand Screening of Proteins and Virtual Target Screening of Biomolecules. J Chem Inf Model 2018; 58:2343-2354. [PMID: 30278128 PMCID: PMC6437778 DOI: 10.1021/acs.jcim.8b00309] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Computational approaches for predicting protein-ligand interactions can facilitate drug lead discovery and drug target determination. We have previously developed a threading/structural-based approach, FINDSITEcomb, for the virtual ligand screening of proteins that has been extensively experimentally validated. Even when low resolution predicted protein structures are employed, FINDSITEcomb has the advantage of being faster and more accurate than traditional high-resolution structure-based docking methods. It also overcomes the limitations of traditional QSAR methods that require a known set of seed ligands that bind to the given protein target. Here, we further improve FINDSITEcomb by enhancing its template ligand selection from the PDB/DrugBank/ChEMBL libraries of known protein-ligand interactions by (1) parsing the template proteins and their corresponding binding ligands in the DrugBank and ChEMBL libraries into domains so that the ligands with falsely matched domains to the targets will not be selected as template ligands; (2) applying various thresholds to filter out falsely matched template structures in the structure comparison process and thus their corresponding ligands for template ligand selection. With a sequence identity cutoff of 30% of target to templates and modeled target structures, FINDSITEcomb2.0 is shown to significantly improve upon FINDSITEcomb on the DUD-E benchmark set by increasing the 1% enrichment factor from 16.7 to 22.1, with a p-value of 4.3 × 10-3 by the Student t-test. With an 80% sequence identity cutoff of target to templates for the DUD-E set and modeled target structures, FINDSITEcomb2.0, having a 1% ROC enrichment factor of 52.39, also outperforms state-of-the-art methods that employ machine learning such as a deep convolutional neural network, CNN, with an enrichment of 29.65. Thus, FINDSITEcomb2.0 represents a significant improvement in the state-of-the-art. The FINDSITEcomb2.0 web service is freely available for academic users at http://pwp.gatech.edu/cssb/FINDSITE-COMB-2 .
Collapse
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332-2000
| | - Hongnan Cao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332-2000
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332-2000
| |
Collapse
|
17
|
Krivák R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminform 2018; 10:39. [PMID: 30109435 PMCID: PMC6091426 DOI: 10.1186/s13321-018-0285-8] [Citation(s) in RCA: 181] [Impact Index Per Article: 30.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2017] [Accepted: 06/29/2018] [Indexed: 01/29/2023] Open
Abstract
Background Ligand binding site prediction from protein structure has many applications related to elucidation of protein function and structure based drug discovery. It often represents only one step of many in complex computational drug design efforts. Although many methods have been published to date, only few of them are suitable for use in automated pipelines or for processing large datasets.
These use cases require stability and speed, which disqualifies many of the recently introduced tools that are either template based or available only as web servers. Results We present P2Rank, a stand-alone template-free tool for prediction of ligand binding sites based on machine learning. It is based on prediction of ligandability of local chemical neighbourhoods that are centered on points placed on the solvent accessible surface of a protein.
We show that P2Rank outperforms several existing tools, which include two widely used stand-alone tools (Fpocket, SiteHound), a comprehensive consensus based tool (MetaPocket 2.0), and a recent deep learning based method (DeepSite). P2Rank belongs to the fastest available tools (requires under 1 s for prediction on one protein), with additional advantage of multi-threaded implementation. Conclusions P2Rank is a new open source software package for ligand binding site prediction from protein structure. It is available as a user-friendly stand-alone command line program and a Java library. P2Rank has a lightweight installation and does not depend on other bioinformatics tools or large structural or sequence databases. Thanks to its speed and ability to make fully automated predictions, it is particularly well suited for processing large datasets or as a component of scalable structural bioinformatics pipelines. Electronic supplementary material The online version of this article (10.1186/s13321-018-0285-8) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Radoslav Krivák
- Department of Software Engineering, Charles University, Prague, Czech Republic.
| | - David Hoksza
- Department of Software Engineering, Charles University, Prague, Czech Republic.
| |
Collapse
|
18
|
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023] Open
Abstract
Background Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. Results In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. Conclusions Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
Collapse
Affiliation(s)
- Min Han
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Yifan Song
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Jiaqiang Qian
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China.
| |
Collapse
|
19
|
Hu X, Wang K, Dong Q. Protein ligand-specific binding residue predictions by an ensemble classifier. BMC Bioinformatics 2016; 17:470. [PMID: 27855637 PMCID: PMC5114821 DOI: 10.1186/s12859-016-1348-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Accepted: 11/10/2016] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Prediction of ligand binding sites is important to elucidate protein functions and is helpful for drug design. Although much progress has been made, many challenges still need to be addressed. Prediction methods need to be carefully developed to account for chemical and structural differences between ligands. RESULTS In this study, we present ligand-specific methods to predict the binding sites of protein-ligand interactions. First, a sequence-based method is proposed that only extracts features from protein sequence information, including evolutionary conservation scores and predicted structure properties. An improved AdaBoost algorithm is applied to address the serious imbalance problem between the binding and non-binding residues. Then, a combined method is proposed that combines the current template-free method and four other well-established template-based methods. The above two methods predict the ligand binding sites along the sequences using a ligand-specific strategy that contains metal ions, acid radical ions, nucleotides and ferroheme. Testing on a well-established dataset showed that the proposed sequence-based method outperformed the profile-based method by 4-19% in terms of the Matthews correlation coefficient on different ligands. The combined method outperformed each of the individual methods, with an improvement in the average Matthews correlation coefficients of 5.55% over all ligands. The results also show that the ligand-specific methods significantly outperform the general-purpose methods, which confirms the necessity of developing elaborate ligand-specific methods for ligand binding site prediction. CONCLUSIONS Two efficient ligand-specific binding site predictors are presented. The standalone package is freely available for academic usage at http://dase.ecnu.edu.cn/qwdong/TargetCom/TargetCom_standalone.tar.gz or request upon the corresponding author.
Collapse
Affiliation(s)
- Xiuzhen Hu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051 People’s Republic of China
| | - Kai Wang
- College of Animal Science and Technology, Jilin Agricultural University, Changchun, 130118 People’s Republic of China
| | - Qiwen Dong
- Institute for Data Science and Engineering, East China Normal University, Shanghai, 200062 People’s Republic of China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055 People’s Republic of China
- Present Address: School of Computer Science and Software Engineering, East China Normal University, #3663, North Zhongshan RD, Shanghai, 200062 China
| |
Collapse
|
20
|
Chen P, Hu S, Zhang J, Gao X, Li J, Xia J, Wang B. A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:901-912. [PMID: 26661785 DOI: 10.1109/tcbb.2015.2505286] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
BACKGROUND Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures. RESULTS This paper proposes a dynamic ensemble approach to identify protein-ligand binding residues by using sequence information only. To avoid problems resulting from highly imbalanced samples between the ligand-binding sites and non ligand-binding sites, we constructed several balanced data sets and we trained a random forest classifier for each of them. We dynamically selected a subset of classifiers according to the similarity between the target protein and the proteins in the training data set. The combination of the predictions of the classifier subset to each query protein target yielded the final predictions. The ensemble of these classifiers formed a sequence-based predictor to identify protein-ligand binding sites. CONCLUSIONS Experimental results on two Critical Assessment of protein Structure Prediction datasets and the ccPDB dataset demonstrated that of our proposed method compared favorably with the state-of-the-art. AVAILABILITY http://www2.ahu.edu.cn/pchen/web/LigandDSES.htm.
Collapse
|
21
|
Jian JW, Elumalai P, Pitti T, Wu CY, Tsai KC, Chang JY, Peng HP, Yang AS. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms. PLoS One 2016; 11:e0160315. [PMID: 27513851 PMCID: PMC4981321 DOI: 10.1371/journal.pone.0160315] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2015] [Accepted: 07/18/2016] [Indexed: 11/18/2022] Open
Abstract
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites.
Collapse
Affiliation(s)
- Jhih-Wei Jian
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan 11221
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan 115
| | | | - Thejkiran Pitti
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan 115
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan 30013
| | - Chih Yuan Wu
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
| | - Keng-Chang Tsai
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
| | - Jeng-Yih Chang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
| | - Hung-Pin Peng
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
| | - An-Suei Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan 115
- * E-mail:
| |
Collapse
|
22
|
Huwe PJ, Xu Q, Shapovalov MV, Modi V, Andrake MD, Dunbrack RL. Biological function derived from predicted structures in CASP11. Proteins 2016; 84 Suppl 1:370-91. [PMID: 27181425 DOI: 10.1002/prot.24997] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2015] [Revised: 01/10/2016] [Accepted: 01/18/2016] [Indexed: 12/26/2022]
Abstract
In CASP11, the organizers sought to bring the biological inferences from predicted structures to the fore. To accomplish this, we assessed the models for their ability to perform quantifiable tasks related to biological function. First, for 10 targets that were probable homodimers, we measured the accuracy of docking the models into homodimers as a function of GDT-TS of the monomers, which produced characteristic L-shaped plots. At low GDT-TS, none of the models could be docked correctly as homodimers. Above GDT-TS of ∼60%, some models formed correct homodimers in one of the largest docked clusters, while many other models at the same values of GDT-TS did not. Docking was more successful when many of the templates shared the same homodimer. Second, we docked a ligand from an experimental structure into each of the models of one of the targets. Docking to the models with two different programs produced poor ligand RMSDs with the experimental structure. Measures that evaluated similarity of contacts were reasonable for some of the models, although there was not a significant correlation with model accuracy. Finally, we assessed whether models would be useful in predicting the phenotypes of missense mutations in three human targets by comparing features calculated from the models with those calculated from the experimental structures. The models were successful in reproducing accessible surface areas but there was little correlation of model accuracy with calculation of FoldX evaluation of the change in free energy between the wild-type and the mutant. Proteins 2016; 84(Suppl 1):370-391. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Peter J Huwe
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | - Qifang Xu
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | | | - Vivek Modi
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | - Mark D Andrake
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | | |
Collapse
|
23
|
Heo L, Lee H, Baek M, Seok C. Binding Site Prediction of Proteins with Organic Compounds or Peptides Using GALAXY Web Servers. Methods Mol Biol 2016; 1414:33-45. [PMID: 27094284 DOI: 10.1007/978-1-4939-3569-7_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We introduce two GALAXY web servers called GalaxySite and GalaxyPepDock that predict protein complex structures with small organic compounds and peptides, respectively. GalaxySite predicts ligands that may bind the input protein and generates complex structures of the protein with the predicted ligands from the protein structure given as input or predicted from the input sequence. GalaxyPepDock takes a protein structure and a peptide sequence as input and predicts structures for the protein-peptide complex. Both GalaxySite and GalaxyPepDock rely on available experimentally resolved structures of protein-ligand complexes evolutionarily related to the target. With the continuously increasing size of the protein structure database, the probability of finding related proteins in the database is increasing. The servers further relax the complex structures to refine the structural aspects that are missing in the available structures or that are not compatible with the given protein by optimizing physicochemical interactions. GalaxyPepDock allows conformational change of the protein receptor induced by peptide binding. The atomistic interactions with ligands predicted by the GALAXY servers may offer important clues for designing new molecules or proteins with desired binding properties.
Collapse
Affiliation(s)
- Lim Heo
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Hasup Lee
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Minkyung Baek
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, Republic of Korea.
| |
Collapse
|
24
|
Abstract
Protein-ligand binding site prediction methods aim to predict, from amino acid sequence, protein-ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein-ligand interactions has become extremely important to help determine a protein's functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analysis, such as drug discovery. Thus, in silico prediction of protein-ligand interactions must be utilized to aid in functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein-ligand interactions, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD for the structurally informed prediction of protein-ligand interactions. Furthermore, we provide a step-by-step guide on using the FunFOLD web server and FunFOLD3 downloadable application, along with some real world examples, where the FunFOLD methods have been used to aid functional elucidation.
Collapse
|
25
|
Roche DB, Brackenridge DA, McGuffin LJ. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods. Int J Mol Sci 2015; 16:29829-42. [PMID: 26694353 PMCID: PMC4691145 DOI: 10.3390/ijms161226202] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2015] [Revised: 12/02/2015] [Accepted: 12/10/2015] [Indexed: 01/14/2023] Open
Abstract
Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.
Collapse
Affiliation(s)
- Daniel Barry Roche
- Institut de Biologie Computationnelle, LIRMM, CNRS, Université de Montpellier, Montpellier 34095, France.
- Centre de Recherche de Biochimie Macromoléculaire, CNRS-UMR 5237, Montpellier 34293, France.
| | | | | |
Collapse
|
26
|
Pramanik S, Kutzner A, Heese K. 3D Structure, Dimerization Modeling, and Lead Discovery by Ligand-protein Interaction Analysis of p60 Transcription Regulator Protein (p60TRP). Mol Inform 2015; 35:99-108. [DOI: 10.1002/minf.201500035] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2015] [Accepted: 05/20/2015] [Indexed: 12/28/2022]
|
27
|
McGuffin LJ, Atkins JD, Salehe BR, Shuid AN, Roche DB. IntFOLD: an integrated server for modelling protein structures and functions from amino acid sequences. Nucleic Acids Res 2015; 43:W169-73. [PMID: 25820431 PMCID: PMC4489238 DOI: 10.1093/nar/gkv236] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2015] [Accepted: 03/08/2015] [Indexed: 11/13/2022] Open
Abstract
IntFOLD is an independent web server that integrates our leading methods for structure and function prediction. The server provides a simple unified interface that aims to make complex protein modelling data more accessible to life scientists. The server web interface is designed to be intuitive and integrates a complex set of quantitative data, so that 3D modelling results can be viewed on a single page and interpreted by non-expert modellers at a glance. The only required input to the server is an amino acid sequence for the target protein. Here we describe major performance and user interface updates to the server, which comprises an integrated pipeline of methods for: tertiary structure prediction, global and local 3D model quality assessment, disorder prediction, structural domain prediction, function prediction and modelling of protein-ligand interactions. The server has been independently validated during numerous CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiments, as well as being continuously evaluated by the CAMEO (Continuous Automated Model Evaluation) project. The IntFOLD server is available at: http://www.reading.ac.uk/bioinf/IntFOLD/.
Collapse
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Jennifer D Atkins
- School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Bajuna R Salehe
- School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Ahmad N Shuid
- School of Biological Sciences, University of Reading, Reading, RG6 6AS, UK
| | - Daniel B Roche
- Institut de Biologie Computationnelle, LIRMM, CNRS, Université de Montpellier, Montpellier 34095, France Centre de Recherches de Biochimie Macromoléculaire, CNRS- UMR 5237, Montpellier 34293, France
| |
Collapse
|
28
|
Petrey D, Chen TS, Deng L, Garzon JI, Hwang H, Lasso G, Lee H, Silkov A, Honig B. Template-based prediction of protein function. Curr Opin Struct Biol 2015; 32:33-8. [PMID: 25678152 DOI: 10.1016/j.sbi.2015.01.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2014] [Revised: 01/13/2015] [Accepted: 01/19/2015] [Indexed: 12/11/2022]
Abstract
We discuss recent approaches for structure-based protein function annotation. We focus on template-based methods where the function of a query protein is deduced from that of a template for which both the structure and function are known. We describe the different ways of identifying a template. These are typically based on sequence analysis but new methods based on purely structural similarity are also being developed that allow function annotation based on structural relationships that cannot be recognized by sequence. The growing number of available structures of known function, improved homology modeling techniques and new developments in the use of structure allow template-based methods to be applied on a proteome-wide scale and in many different biological contexts. This progress significantly expands the range of applicability of structural information in function annotation to a level that previously was only achievable by sequence comparison.
Collapse
Affiliation(s)
- Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States.
| | - T Scott Chen
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Lei Deng
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Jose Ignacio Garzon
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Howook Hwang
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Gorka Lasso
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Hunjoong Lee
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Antonina Silkov
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| | - Barry Honig
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, United States
| |
Collapse
|
29
|
Berman HM, Gabanyi MJ, Groom CR, Johnson JE, Murshudov GN, Nicholls RA, Reddy V, Schwede T, Zimmerman MD, Westbrook J, Minor W. Data to knowledge: how to get meaning from your result. IUCRJ 2015; 2:45-58. [PMID: 25610627 PMCID: PMC4285880 DOI: 10.1107/s2052252514023306] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/16/2014] [Accepted: 10/22/2014] [Indexed: 05/19/2023]
Abstract
Structural and functional studies require the development of sophisticated 'Big Data' technologies and software to increase the knowledge derived and ensure reproducibility of the data. This paper presents summaries of the Structural Biology Knowledge Base, the VIPERdb Virus Structure Database, evaluation of homology modeling by the Protein Model Portal, the ProSMART tool for conformation-independent structure comparison, the LabDB 'super' laboratory information management system and the Cambridge Structural Database. These techniques and technologies represent important tools for the transformation of crystallographic data into knowledge and information, in an effort to address the problem of non-reproducibility of experimental results.
Collapse
Affiliation(s)
- Helen M. Berman
- Center for Integrative Proteomics Research, Department of Chemistry and Chemical Biology, Rutgers, State University of New Jersey, Piscataway, NJ 08854, USA
| | - Margaret J. Gabanyi
- Center for Integrative Proteomics Research, Department of Chemistry and Chemical Biology, Rutgers, State University of New Jersey, Piscataway, NJ 08854, USA
| | - Colin R. Groom
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
| | - John E. Johnson
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA 92037, USA
| | - Garib N. Murshudov
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, England
| | - Robert A. Nicholls
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, England
| | - Vijay Reddy
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA 92037, USA
| | - Torsten Schwede
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland
- SIB-Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Matthew D. Zimmerman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
| | - John Westbrook
- Center for Integrative Proteomics Research, Department of Chemistry and Chemical Biology, Rutgers, State University of New Jersey, Piscataway, NJ 08854, USA
| | - Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
| |
Collapse
|
30
|
Izidoro SC, de Melo-Minardi RC, Pappa GL. GASS: identifying enzyme active sites with genetic algorithms. ACTA ACUST UNITED AC 2014; 31:864-70. [PMID: 25388152 DOI: 10.1093/bioinformatics/btu746] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
MOTIVATION Currently, 25% of proteins annotated in Pfam have their function unknown. One way of predicting proteins function is by looking at their active site, which has two main parts: the catalytic site and the substrate binding site. The active site is more conserved than the other residues of the protein and can be a rich source of information for protein function prediction. This article presents a new heuristic method, named genetic active site search (GASS), which searches for given active site 3D templates in unknown proteins. The method can perform non-exact amino acid matches (conservative mutations), is able to find amino acids in different chains and does not impose any restrictions on the active site size. RESULTS GASS results were compared with those catalogued in the catalytic site atlas (CSA) in four different datasets and compared with two other methods: amino acid pattern search for substructures and motif and catalytic site identification. The results show GASS can correctly identify >90% of the templates searched. Experiments were also run using data from the substrate binding sites prediction competition CASP 10, and GASS is ranked fourth among the 18 methods considered.
Collapse
Affiliation(s)
- Sandro C Izidoro
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
| | - Raquel C de Melo-Minardi
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
| | - Gisele L Pappa
- Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
| |
Collapse
|
31
|
Heo L, Shin WH, Lee MS, Seok C. GalaxySite: ligand-binding-site prediction by using molecular docking. Nucleic Acids Res 2014; 42:W210-4. [PMID: 24753427 PMCID: PMC4086128 DOI: 10.1093/nar/gku321] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open
Abstract
Knowledge of ligand-binding sites of proteins provides invaluable information for
functional studies, drug design and protein design. Recent progress in
ligand-binding-site prediction methods has demonstrated that using information
from similar proteins of known structures can improve predictions. The
GalaxySite web server, freely accessible at http://galaxy.seoklab.org/site, combines such information with
molecular docking for more precise binding-site prediction for non-metal
ligands. According to the recent critical assessments of structure prediction
methods held in 2010 and 2012, this server was found to be superior or
comparable to other state-of-the-art programs in the category of
ligand-binding-site prediction. A strong merit of the GalaxySite program is that
it provides additional predictions on binding ligands and their binding poses in
terms of the optimized 3D coordinates of the protein–ligand complexes,
whereas other methods predict only identities of binding-site residues or copy
binding geometry from similar proteins. The additional information on the
specific binding geometry would be very useful for applications in functional
studies and computer-aided drug discovery.
Collapse
Affiliation(s)
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
| | - Woong-Hee Shin
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
| | - Myeong Sup Lee
- Department of Biomedical Sciences, University of Ulsan College of Medicine, Seoul 138-736, Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
| |
Collapse
|
32
|
Kryshtafovych A, Moult J, Bales P, Bazan JF, Biasini M, Burgin A, Chen C, Cochran FV, Craig TK, Das R, Fass D, Garcia-Doval C, Herzberg O, Lorimer D, Luecke H, Ma X, Nelson DC, van Raaij MJ, Rohwer F, Segall A, Seguritan V, Zeth K, Schwede T. Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10. Proteins 2014; 82 Suppl 2:26-42. [PMID: 24318984 PMCID: PMC4072496 DOI: 10.1002/prot.24489] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2013] [Revised: 11/01/2013] [Accepted: 11/09/2013] [Indexed: 11/12/2022]
Abstract
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins.
Collapse
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, California 95616,
| | - John Moult
- Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular genetics, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA;
| | - Patrick Bales
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA;
| | - J. Fernando Bazan
- (1) Departments of Protein Engineering and (2) Structural Biology, Genentech, 1 DNA Way, South San Francisco, CA 94080, (3) Present address: 44th & Aspen Life Sciences, 924 4th St. N., Stillwater, MN 55082,
| | - Marco Biasini
- (1) Biozentrum, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland; (2) SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50, 4056 Basel, Switzerland;
| | - Alex Burgin
- Broad Institute, 5 Cambridge Center, Cambridge, MA 02142, USA;
| | - Chen Chen
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA;
| | - Frank V. Cochran
- Department of Biochemistry, Stanford University, Stanford, California, 94305, USA;
| | | | - Rhiju Das
- (1) Department of Biochemistry, Stanford University, Stanford, California, 94305, USA; (2) Department of Physics, Stanford University, Stanford, California, 94305, USA,
| | - Deborah Fass
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100 Israel, Tel: +972-8-934-3214; Fax: +972-8-934-4136;
| | - Carmela Garcia-Doval
- Centro Nactional de Biotecnologia (CNB-CSIC), calle Darwin 3, E-28049 Madrid, Spain.
| | - Osnat Herzberg
- (1) Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA; (2) Department of Chemistry and Biochemistry, University of Maryland, College Park;
| | - Donald Lorimer
- Emerald Bio, 7869 NE Day Rd W, Bainbridge Isle, WA 98110, USA;
| | - Hartmut Luecke
- Center for Biomembrane Systems and Depts. of Biochemistry, Biophysics & Computer Science, 3205 McGaugh Hall, University of California, Irvine, CA 92697-3900, USA;
| | - Xiaolei Ma
- (1) Departments of Protein Engineering and (2) Structural Biology, Genentech, 1 DNA Way, South San Francisco, CA 94080 (3) Present address: Novartis Institutes for Biomedical Research, 4560 Horton St., Emeryville, CA 94608, USA;
| | - Daniel C. Nelson
- (1) Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA; (2) Department of Veterinary Medicine, University of Maryland, College Park,
| | - Mark J. van Raaij
- Centro Nactional de Biotecnologia (CNB-CSIC), calle Darwin 3, E-28049 Madrid, Spain.
| | - Forest Rohwer
- Department of Biology, San Diego State University, San Diego, CA 92182, USA;
| | - Anca Segall
- Department of Biology, San Diego State University, San Diego, CA 92182, USA;
| | - Victor Seguritan
- Department of Biology, San Diego State University, San Diego, CA 9218
| | - Kornelius Zeth
- Unidad de Biofisica (CSIC-UPV/EHU), Barrio Sarriena s/n 48940, Leioa, Vizcaya, SPAIN, and IKERBASQUE, Basque Foundation for Science, Bilbao, Spain;
| | - Torsten Schwede
- (1) Biozentrum, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland; (2) SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50, 4056 Basel, Switzerland;
| |
Collapse
|
33
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round x. Proteins 2014. [PMID: 24344053 DOI: 10.1002/prot.24452.critical] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/16/2023]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Collapse
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, 20850
| | | | | | | | | |
Collapse
|
34
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round x. Proteins 2014; 82 Suppl 2:1-6. [PMID: 24344053 PMCID: PMC4394854 DOI: 10.1002/prot.24452] [Citation(s) in RCA: 312] [Impact Index Per Article: 31.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2013] [Accepted: 10/21/2013] [Indexed: 12/28/2022]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Collapse
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland 20850
| | | | | | - Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Anna Tramontano
- Department of Physics and Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza University of Rome, 00185 Rome, Italy
| |
Collapse
|
35
|
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Collapse
|