1
Carpenter KA, Altman RB. Databases of ligand-binding pockets and protein-ligand interactions. Comput Struct Biotechnol J 2024; 23:1320-1338. [PMID: 38585646 PMCID: PMC10997877 DOI: 10.1016/j.csbj.2024.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/16/2024] [Accepted: 03/17/2024] [Indexed: 04/09/2024] Open
Abstract
Many research groups and institutions have created a variety of databases curating experimental and predicted data related to protein-ligand binding. The landscape of available databases is dynamic, with new databases emerging and established databases becoming defunct. Here, we review the current state of databases that contain binding pockets and protein-ligand binding interactions. We have compiled a list of such databases, fifty-three of which are currently available for use. We discuss variation in how binding pockets are defined and summarize pocket-finding methods. We organize the fifty-three databases into subgroups based on goals and contents, and describe standard use cases. We also illustrate that pockets within the same protein are characterized differently across different databases. Finally, we assess critical issues of sustainability, accessibility and redundancy.
Affiliation(s)
- Kristy A. Carpenter
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Russ B. Altman
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Medicine, Stanford University, Stanford, CA 94305, USA
2
Xiao Y, Woods RJ. Protein-Ligand CH-π Interactions: Structural Informatics, Energy Function Development, and Docking Implementation. J Chem Theory Comput 2023; 19:5503-5515. [PMID: 37493980 PMCID: PMC10448718 DOI: 10.1021/acs.jctc.3c00300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Indexed: 07/27/2023]
Abstract
Here, we develop an empirical energy function based on quantum mechanical data for the interaction between methane and benzene that captures the contribution from CH-π interactions. Such interactions are frequently observed in protein-ligand crystal structures, particularly for carbohydrate ligands, but have been hard to quantify due to the absence of a model for CH-π interactions in typical molecular mechanical force fields or docking scoring functions. The CH-π term was added to the AutoDock Vina (AD VINA) scoring function enabling its performance to be evaluated against a cohort of more than 1600 occurrences in 496 experimental structures of protein-ligand complexes. By employing a conformational grid search algorithm, inclusion of the CH-π term was shown to improve the prediction of the preferred orientation of flexible ligands in protein-binding sites and to enhance the detection of carbohydrate-binding sites that display CH-π interactions. Last but not least, this term was also shown to improve docking performance for the CASF-2016 benchmark set and a carbohydrate set.
Affiliation(s)
- Yao Xiao
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States
- Robert J. Woods
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States
3
Atre NM, Alagarasu K, Shil P. ArVirInd-a database of arboviral antigenic proteins from the Indian subcontinent. PeerJ 2022; 10:e13851. [PMID: 36299508 PMCID: PMC9590419 DOI: 10.7717/peerj.13851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 07/16/2022] [Indexed: 01/24/2023] Open
Abstract
Background: Studies of arboviral antigenic proteins are important for diagnostics and vaccine development. India and its neighboring countries carry a heavy burden of arboviral diseases. Mining existing bioinformatics databases for country-specific sequences is cumbersome and time-consuming. This necessitated the development of a database of antigenic proteins from arboviruses isolated in the countries of the Indian subcontinent. Methods: Arboviral antigenic protein sequences were obtained from the NCBI and other databases. In silico antigenic characterization (epitope prediction) was performed and the data were incorporated into the database. The front end was designed and developed using HTML, CSS, and PHP; the back end uses MySQL. Results: A database named ArVirInd was created as a repository of information on curated antigenic proteins. It lists sequences by country and by year of outbreak or origin of the viral strain. For each entry, antigenic information is provided along with functional sites. Researchers can search the database by virus or protein name, country, and year of collection (alone or in combination), and can run peptide searches for epitopes. It is publicly available at http://www.arvirind.co.in. ArVirInd will be useful for studies in immune informatics, diagnostics, and vaccinology for arboviruses.
Affiliation(s)
- Nitin Motilal Atre
- Bioinformatics, ICMR National Institute of Virology Pune, Pune, Maharashtra, India
- Kalichamy Alagarasu
- Bioinformatics, ICMR National Institute of Virology Pune, Pune, Maharashtra, India
- Pratip Shil
- Bioinformatics, ICMR National Institute of Virology Pune, Pune, Maharashtra, India
4
Wang B, Mei C, Wang Y, Zhou Y, Cheng MT, Zheng CH, Wang L, Zhang J, Chen P, Xiong Y. Imbalance Data Processing Strategy for Protein Interaction Sites Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:985-994. [PMID: 31751283 DOI: 10.1109/tcbb.2019.2953908] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 06/10/2023]
Abstract
Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites helps researchers understand life activities and is therefore useful for drug design. However, the number of experimentally determined protein interaction sites is far smaller than the number of protein sites in protein-protein interactions or protein complexes. As a result, negative and positive samples are usually imbalanced, which is common but biases the computational prediction of protein interaction sites. In this work, we present three imbalance data processing strategies to reconstruct the original dataset, and then extract protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalance data processing strategies clearly reduce the prediction bias and thereby improve the prediction of protein interaction sites. The experimental results show that our prediction models can achieve better prediction performance, such as a prediction accuracy of 0.758 or an F-measure of 0.737, which demonstrates the effectiveness of our method.
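The abstract does not name its three imbalance processing strategies, so the following is only a generic illustration of one common way to rebalance a dataset: random undersampling of the majority class. The function name `random_undersample` and the toy labels (0 for non-interface, 1 for interface) are assumptions made for this sketch and are not taken from the paper.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Downsample every class to the size of the rarest class.

    Hypothetical helper for illustration; not the authors' method.
    """
    rng = random.Random(seed)
    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # The rarest class sets the target size for all classes.
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Draw `target` samples without replacement from each class.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy data: 8 non-interface residues (label 0) and 2 interface residues (label 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2

balanced = random_undersample(samples, labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling discards majority-class examples, which is simple but loses information; oversampling or cost-sensitive learning are common alternatives when the minority class is small.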
5
Wang Y, Mei C, Zhou Y, Wang Y, Zheng C, Zhen X, Xiong Y, Chen P, Zhang J, Wang B. Semi-supervised prediction of protein interaction sites from unlabeled sample information. BMC Bioinformatics 2019; 20:699. [PMID: 31874616 PMCID: PMC6929468 DOI: 10.1186/s12859-019-3274-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/30/2022] Open
Abstract
Background: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug design. However, most sites on protein sequences cannot be labeled as interface or non-interface sites because only a small fraction of protein interactions has been identified, which limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve prediction performance by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge. Results: In this work, three semi-supervised support vector machine-based methods are proposed to improve the prediction of protein interaction sites by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server to represent the residues within the protein sequence: residue spatial sequence spectrum, residue sequence information entropy and relative entropy, residue sequence conserved weight, and residual base evolution rate. Three predictors are then built to identify interface residues on the protein surface using three types of semi-supervised support vector machine algorithms. Conclusion: The experimental results demonstrate that the semi-supervised approaches effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors; the best of them achieves an accuracy of 70.7%, a sensitivity of 62.67% and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
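For readers unfamiliar with the three metrics reported in the conclusion, the sketch below shows their standard definitions in terms of confusion-matrix counts. The helper name `binary_metrics` and the toy counts are assumptions for illustration; they are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard accuracy, sensitivity, and specificity from confusion-matrix counts.

    Hypothetical helper for illustration only.
    """
    total = tp + fp + tn + fn
    return {
        # Fraction of all residues classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of true interface residues that were recovered.
        "sensitivity": tp / (tp + fn),
        # Fraction of true non-interface residues that were rejected.
        "specificity": tn / (tn + fp),
    }


# Toy counts: 10 interface residues (6 found) and 10 non-interface residues (8 rejected).
metrics = binary_metrics(tp=6, fp=2, tn=8, fn=4)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for interface prediction because a classifier can score high accuracy on imbalanced data while missing most interface residues.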
Affiliation(s)
- Ye Wang
- School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Changqing Mei
- School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Yuming Zhou
- School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Yan Wang
- School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Chunhou Zheng
- Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, Anhui, China
- Xiao Zhen
- School of Computer Science and Technology, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Yan Xiong
- School of Computer Science and Technology, University of Science & Technology, Hefei, 230026, Anhui, China
- Peng Chen
- Institute of Health Sciences, Anhui University, Hefei, 230601, Anhui, China
- Jun Zhang
- College of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
- Bing Wang
- School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, 243002, Anhui, China
- Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, Anhui, China