1
Lingė D, Gedgaudas M, Merkys A, Petrauskas V, Vaitkus A, Grybauskas A, Paketurytė V, Zubrienė A, Zakšauskas A, Mickevičiūtė A, Smirnovienė J, Baranauskienė L, Čapkauskaitė E, Dudutienė V, Urniežius E, Konovalovas A, Kazlauskas E, Shubin K, Schiöth HB, Chen WY, Ladbury JE, Gražulis S, Matulis D. PLBD: protein-ligand binding database of thermodynamic and kinetic intrinsic parameters. Database (Oxford) 2023; 2023:baad040. [PMID: 37290059 PMCID: PMC10250011 DOI: 10.1093/database/baad040] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Accepted: 05/15/2023] [Indexed: 06/10/2023]
Abstract
We introduce a protein-ligand binding database (PLBD) that presents thermodynamic and kinetic data of reversible protein interactions with small molecule compounds. The manually curated binding data are linked to protein-ligand crystal structures, enabling structure-thermodynamics correlations to be determined. The database contains over 5500 binding datasets of 556 sulfonamide compound interactions with the 12 catalytically active human carbonic anhydrase isozymes defined by fluorescent thermal shift assay, isothermal titration calorimetry, inhibition of enzymatic activity and surface plasmon resonance. In the PLBD, the intrinsic thermodynamic parameters of interactions are provided, which account for the binding-linked protonation reactions. In addition to the protein-ligand binding affinities, the database provides calorimetrically measured binding enthalpies, providing additional mechanistic understanding. The PLBD can be applied to investigations of protein-ligand recognition and could be integrated into small molecule drug design. Database URL https://plbd.org/.
Affiliation(s)
- Darius Lingė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Marius Gedgaudas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Andrius Merkys
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Vytautas Petrauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Antanas Vaitkus
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Algirdas Grybauskas
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Vaida Paketurytė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Asta Zubrienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Audrius Zakšauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Aurelija Mickevičiūtė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Joana Smirnovienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Lina Baranauskienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Edita Čapkauskaitė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Virginija Dudutienė
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Ernestas Urniežius
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Aleksandras Konovalovas
- Department of Biochemistry and Molecular Biology, Institute of Biosciences, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Egidijus Kazlauskas
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Kirill Shubin
- Latvian Institute of Organic Synthesis, Aizkraukles Street 21, Riga LV-1006, Latvia
- Helgi B Schiöth
- Functional Pharmacology and Neuroscience, Department of Surgical Sciences, Uppsala University, Kirurgiska Vetenskaper, Box 593, Uppsala 751 24, Sweden
- Wen-Yih Chen
- Department of Chemical and Materials Engineering, National Central University, No. 300, Zhongda Rd., Zhongli Dist., Taoyuan City, Jhong-Li 320, Taiwan
- John E Ladbury
- School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, United Kingdom
- Saulius Gražulis
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
- Daumantas Matulis
- Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius LT-10257, Lithuania
2
Norval LW, Krämer SD, Gao M, Herz T, Li J, Rath C, Wöhrle J, Günther S, Roth G. KOFFI and Anabel 2.0-a new binding kinetics database and its integration in an open-source binding analysis software. Database (Oxford) 2020; 2019:5585575. [PMID: 31608948 PMCID: PMC6790968 DOI: 10.1093/database/baz101] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/25/2019] [Revised: 07/12/2019] [Accepted: 07/23/2019] [Indexed: 12/31/2022]
Abstract
The kinetics of featured interactions (KOFFI) database is a novel tool and resource for binding kinetics data from biomolecular interactions. While binding kinetics data are abundant in literature, finding valuable information is a laborious task. We used text extraction methods to store binding rates (association, dissociation) as well as corresponding meta-information (e.g. methods, devices) in a novel database. To date, over 270 articles were manually curated and binding data on over 1705 interactions was collected and stored in the (KOFFI) database. Moreover, the KOFFI database application programming interface was implemented in Anabel (open-source software for the analysis of binding interactions), enabling users to directly compare their own binding data analyses with related experiments described in the database.
Affiliation(s)
- Leo William Norval
- ZBSA Center for Biological Systems Analysis, Albert-Ludwigs-University Freiburg, Habsburgerstrasse 49, D-79104 Freiburg, Germany; Institute of Pharmaceutical Sciences, Pharmaceutical Bioinformatics, Albert-Ludwigs-University Freiburg, Hermann-Herder-Straße 9, D-79104 Freiburg, Germany
- Stefan Daniel Krämer
- ZBSA Center for Biological Systems Analysis, Albert-Ludwigs-University Freiburg, Habsburgerstrasse 49, D-79104 Freiburg, Germany; Faculty for Biology, Albert-Ludwigs-University Freiburg, Schaenzlestrasse 1, D-79104 Freiburg, Germany
- Mingjie Gao
- Institute of Pharmaceutical Sciences, Pharmaceutical Bioinformatics, Albert-Ludwigs-University Freiburg, Hermann-Herder-Straße 9, D-79104 Freiburg, Germany
- Tobias Herz
- ZBSA Center for Biological Systems Analysis, Albert-Ludwigs-University Freiburg, Habsburgerstrasse 49, D-79104 Freiburg, Germany; Faculty for Biology, Albert-Ludwigs-University Freiburg, Schaenzlestrasse 1, D-79104 Freiburg, Germany
- Jianyu Li
- Institute of Pharmaceutical Sciences, Pharmaceutical Bioinformatics, Albert-Ludwigs-University Freiburg, Hermann-Herder-Straße 9, D-79104 Freiburg, Germany
- Christin Rath
- ZBSA Center for Biological Systems Analysis, Albert-Ludwigs-University Freiburg, Habsburgerstrasse 49, D-79104 Freiburg, Germany; Faculty for Biology, Albert-Ludwigs-University Freiburg, Schaenzlestrasse 1, D-79104 Freiburg, Germany; BioCopy GmbH, Spechtweg 25, D-79110 Freiburg, Germany; BIOSS Center for Biological Signalling Studies, Albert-Ludwigs-University Freiburg, Schänzlestrasse 18, D-79104 Freiburg, Germany
- Johannes Wöhrle
- ZBSA Center for Biological Systems Analysis, Albert-Ludwigs-University Freiburg, Habsburgerstrasse 49, D-79104 Freiburg, Germany; IMTEK Department of Microsystems Engineering, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 103, D-79110 Freiburg, Germany
- Stefan Günther
- Institute of Pharmaceutical Sciences, Pharmaceutical Bioinformatics, Albert-Ludwigs-University Freiburg, Hermann-Herder-Straße 9, D-79104 Freiburg, Germany
- Günter Roth
- ZBSA Center for Biological Systems Analysis, Albert-Ludwigs-University Freiburg, Habsburgerstrasse 49, D-79104 Freiburg, Germany; Faculty for Biology, Albert-Ludwigs-University Freiburg, Schaenzlestrasse 1, D-79104 Freiburg, Germany; BioCopy GmbH, Spechtweg 25, D-79110 Freiburg, Germany; BIOSS Center for Biological Signalling Studies, Albert-Ludwigs-University Freiburg, Schänzlestrasse 18, D-79104 Freiburg, Germany
3
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and studying the mechanisms of diseases. Therefore the development of methods to identify drug targets has become a popular issue. METHODS We systematically review the recent work on identifying drug targets from the view of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. Then divided the methods into two categories: biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. RESULTS Machine learning algorithms are the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
4
Thafar M, Raies AB, Albaradei S, Essack M, Bajic VB. Comparison Study of Computational Prediction Tools for Drug-Target Binding Affinities. Front Chem 2019; 7:782. [PMID: 31824921 PMCID: PMC6879652 DOI: 10.3389/fchem.2019.00782] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 08/09/2019] [Accepted: 10/30/2019] [Indexed: 12/30/2022] [Open Access]
Abstract
The drug development is generally arduous, costly, and success rates are low. Thus, the identification of drug-target interactions (DTIs) has become a crucial step in early stages of drug discovery. Consequently, developing computational approaches capable of identifying potential DTIs with minimum error rate are increasingly being pursued. These computational approaches aim to narrow down the search space for novel DTIs and shed light on drug functioning context. Most methods developed to date use binary classification to predict if the interaction between a drug and its target exists or not. However, it is more informative but also more challenging to predict the strength of the binding between a drug and its target. If that strength is not sufficiently strong, such DTI may not be useful. Therefore, the methods developed to predict drug-target binding affinities (DTBA) are of great value. In this study, we provide a comprehensive overview of the existing methods that predict DTBA. We focus on the methods developed using artificial intelligence (AI), machine learning (ML), and deep learning (DL) approaches, as well as related benchmark datasets and databases. Furthermore, guidance and recommendations are provided that cover the gaps and directions of the upcoming work in this research area. To the best of our knowledge, this is the first comprehensive comparison analysis of tools focused on DTBA with reference to AI/ML/DL.
Affiliation(s)
- Maha Thafar
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Arwa Bin Raies
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Somayah Albaradei
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Magbubah Essack
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Vladimir B. Bajic
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
5
Torres PHM, Sodero ACR, Jofily P, Silva-Jr FP. Key Topics in Molecular Docking for Drug Design. Int J Mol Sci 2019; 20:E4574. [PMID: 31540192 PMCID: PMC6769580 DOI: 10.3390/ijms20184574] [Citation(s) in RCA: 197] [Impact Index Per Article: 39.4] [Received: 06/03/2019] [Revised: 07/09/2019] [Accepted: 07/10/2019] [Indexed: 12/18/2022] [Open Access]
Abstract
Molecular docking has been widely employed as a fast and inexpensive technique in the past decades, both in academic and industrial settings. Although this discipline has now had enough time to consolidate, many aspects remain challenging and there is still not a straightforward and accurate route to readily pinpoint true ligands among a set of molecules, nor to identify with precision the correct ligand conformation within the binding pocket of a given target molecule. Nevertheless, new approaches continue to be developed and the volume of published works grows at a rapid pace. In this review, we present an overview of the method and attempt to summarise recent developments regarding four main aspects of molecular docking approaches: (i) the available benchmarking sets, highlighting their advantages and caveats, (ii) the advances in consensus methods, (iii) recent algorithms and applications using fragment-based approaches, and (iv) the use of machine learning algorithms in molecular docking. These recent developments incrementally contribute to an increase in accuracy and are expected, given time, and together with advances in computing power and hardware capability, to eventually accomplish the full potential of this area.
Affiliation(s)
- Pedro H M Torres
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.
- Ana C R Sodero
- Department of Drugs and Medicines; School of Pharmacy; Federal University of Rio de Janeiro, Rio de Janeiro 21949-900, RJ, Brazil.
- Paula Jofily
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21949-900, RJ, Brazil.
- Floriano P Silva-Jr
- Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro 21949-900, RJ, Brazil.
6
Thermodynamic, kinetic, and structural parameterization of human carbonic anhydrase interactions toward enhanced inhibitor design. Q Rev Biophys 2019; 51:e10. [PMID: 30912486 DOI: 10.1017/s0033583518000082] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/13/2022]
Abstract
The aim of rational drug design is to develop small molecules using a quantitative approach to optimize affinity. This should enhance the development of chemical compounds that would specifically, selectively, reversibly, and with high affinity interact with a target protein. It is not yet possible to develop such compounds using computational (i.e., in silico) approach and instead the lead molecules are discovered in high-throughput screening searches of large compound libraries. The main reason why in silico methods are not capable to deliver is our poor understanding of the compound structure-thermodynamics and structure-kinetics correlations. There is a need for databases of intrinsic binding parameters (e.g., the change upon binding in standard Gibbs energy (ΔGint), enthalpy (ΔHint), entropy (ΔSint), volume (ΔVintr), heat capacity (ΔCp,int), association rate (ka,int), and dissociation rate (kd,int)) between a series of closely related proteins and a chemically diverse, but pharmacophoric group-guided library of compounds together with the co-crystal structures that could help explain the structure-energetics correlations and rationally design novel compounds. Assembly of these data will facilitate attempts to provide correlations and train data for modeling of compound binding. Here, we report large datasets of the intrinsic thermodynamic and kinetic data including over 400 primary sulfonamide compound binding to a family of 12 catalytically active human carbonic anhydrases (CA). Thermodynamic parameters have been determined by the fluorescent thermal shift assay, isothermal titration calorimetry, and by the stopped-flow assay of the inhibition of enzymatic activity. Kinetic measurements were performed using surface plasmon resonance. Intrinsic thermodynamic and kinetic parameters of binding were determined by dissecting the binding-linked protonation reactions of the protein and sulfonamide. 
The compound structure-thermodynamics and kinetics correlations reported here helped to discover compounds that exhibited picomolar affinities, hour-long residence times, and million-fold selectivities over non-target CA isoforms. Drug-lead compounds are suggested for anticancer target CA IX and CA XII, antiglaucoma CA IV, antiobesity CA VA and CA VB, and other isoforms. Together with 85 X-ray crystallographic structures of 60 compounds bound to six CA isoforms, the database should be of help to continue developing the principles of rational target-based drug design.
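The parameters named in this abstract are linked by standard textbook relations, which the following minimal Python sketch illustrates. It is not the authors' procedure: it does not perform the protonation-linkage correction that turns observed values into intrinsic ones, and the function names, the 1 M standard state, and the default temperature are assumptions chosen for illustration.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def gibbs_from_kd(kd_molar, temperature_k=298.15):
    """Standard Gibbs energy of binding (J/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd), with Kd in mol/L relative to a 1 M standard state.
    """
    return R * temperature_k * math.log(kd_molar)


def entropy_term(delta_g, delta_h):
    """Return -T*dS (J/mol), rearranged from dG = dH - T*dS."""
    return delta_g - delta_h


def kd_from_rates(ka, kd_rate):
    """Dissociation constant (mol/L) from association rate ka (1/(M*s))
    and dissociation rate kd_rate (1/s)."""
    return kd_rate / ka


def residence_time(kd_rate):
    """Residence time (s) as the reciprocal of the dissociation rate (1/s)."""
    return 1.0 / kd_rate
```

With these relations, a picomolar dissociation constant corresponds to a strongly negative Gibbs energy, and a dissociation rate of the order of 1e-4 per second corresponds to a residence time of hours.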
7
Pawar G, Madden JC, Ebbrell D, Firman JW, Cronin MTD. In Silico Toxicology Data Resources to Support Read-Across and (Q)SAR. Front Pharmacol 2019; 10:561. [PMID: 31244651 PMCID: PMC6580867 DOI: 10.3389/fphar.2019.00561] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/21/2019] [Accepted: 05/03/2019] [Indexed: 12/14/2022] [Open Access]
Abstract
A plethora of databases exist online that can assist in in silico chemical or drug safety assessment. However, a systematic review and grouping of databases, based on purpose and information content, consolidated in a single source, has been lacking. To resolve this issue, this review provides a comprehensive listing of the key in silico data resources relevant to: chemical identity and properties, drug action, toxicology (including nano-material toxicity), exposure, omics, pathways, Absorption, Distribution, Metabolism and Elimination (ADME) properties, clinical trials, pharmacovigilance, patents-related databases, biological (genes, enzymes, proteins, other macromolecules etc.) databases, protein-protein interactions (PPIs), environmental exposure related, and finally databases relating to animal alternatives in support of 3Rs policies. More than nine hundred databases were identified and reviewed against criteria relating to accessibility, data coverage, interoperability or application programming interface (API), appropriate identifiers, types of in vitro, in vivo,-clinical or other data recorded and suitability for modelling, read-across, or similarity searching. This review also specifically addresses the need for solutions for mapping and integration of databases into a common platform for better translatability of preclinical data to clinical data.
Affiliation(s)
- Mark T. D. Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
8
Markosian C, Di Costanzo L, Sekharan M, Shao C, Burley SK, Zardecki C. Analysis of impact metrics for the Protein Data Bank. Sci Data 2018; 5:180212. [PMID: 30325351 PMCID: PMC6190746 DOI: 10.1038/sdata.2018.212] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 04/11/2018] [Accepted: 08/29/2018] [Indexed: 01/13/2023] [Open Access]
Abstract
Since 1971, the Protein Data Bank (PDB) archive has served as the single, global repository for open access to atomic-level data for biological macromolecules. The archive currently holds >140,000 structures (>1 billion atoms). These structures are the molecules of life found in all organisms. Knowing the 3D structure of a biological macromolecule is essential for understanding the molecule's function, providing insights in health and disease, food and energy production, and other topics of concern to prosperity and sustainability. PDB data are freely and publicly available, without restrictions on usage. Through bibliometric and usage studies, we sought to determine the impact of the PDB across disciplines and demographics. Our analysis shows that even though research areas such as molecular biology and biochemistry account for the most usage, other fields are increasingly using PDB resources. PDB usage is seen across 150 disciplines in applied sciences, humanities, and social sciences. Data are also re-used and integrated with >400 resources. Our study identifies trends in PDB usage and documents its utility across research disciplines.
Affiliation(s)
- Christopher Markosian
- Department of Molecular Biology and Biochemistry, School of Arts and Sciences, Rutgers, The State University of New Jersey, Piscataway, NJ USA
- Luigi Di Costanzo
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ USA
- Monica Sekharan
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ USA
- Chenghua Shao
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ USA
- Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ USA; RCSB Protein Data Bank, Skaggs School of Pharmacy and Pharmaceutical Sciences and San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ USA
- Christine Zardecki
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ USA
9
Abstract
Ligandability is a prerequisite for druggability and is a much easier concept to understand, model and predict because it does not depend on the complex pharmacodynamic and pharmacokinetic mechanisms in the human body. In this review, we consider a metric for quantifying ligandability from experimental data. We discuss ligandability in terms of the balance between effort and reward. The metric is evaluated for a standard set of well-studied drug targets - some traditionally considered to be ligandable and some regarded as difficult. We suggest that this metric should be used to systematically improve computational predictions of ligandability, which can then be applied to novel drug targets to predict their tractability.
10
Réau M, Langenfeld F, Zagury JF, Lagarde N, Montes M. Decoys Selection in Benchmarking Datasets: Overview and Perspectives. Front Pharmacol 2018; 9:11. [PMID: 29416509 PMCID: PMC5787549 DOI: 10.3389/fphar.2018.00011] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 11/10/2017] [Accepted: 01/05/2018] [Indexed: 11/24/2022] [Open Access]
Abstract
Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets.
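The abstract does not name a specific retrieval metric. As an illustration only, the sketch below computes the enrichment factor, one commonly used measure of how well a ranked list separates actives from decoys. The function name, the convention that a higher score means a better-ranked compound, and the toy data in the usage note are assumptions, not taken from the review.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor of actives in the top `fraction` of a ranked list.

    scores: predicted scores, higher meaning "more likely active" (assumed).
    labels: 1 for a known active, 0 for a decoy, aligned with `scores`.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(label for _, label in ranked[:n_top])
    # Hit rate in the selected top slice divided by the hit rate overall.
    return (hits_top / n_top) / (total_actives / len(ranked))
```

For a toy set of ten compounds in which the two actives hold the two best scores, `enrichment_factor(scores, labels, fraction=0.2)` gives 5.0, the maximum possible for that active-to-decoy ratio. A value of 1.0 means the top slice is no richer in actives than the whole dataset.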
Affiliation(s)
- Manon Réau
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, Paris, France
- Florent Langenfeld
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, Paris, France
- Jean-François Zagury
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, Paris, France
- Nathalie Lagarde
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, Paris, France
- Matthieu Montes
- Laboratoire GBA, EA4627, Conservatoire National des Arts et Métiers, Paris, France
11
Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, Wang R. Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions. Acc Chem Res 2017; 50:302-309. [PMID: 28182403 DOI: 10.1021/acs.accounts.6b00491] [Citation(s) in RCA: 207] [Impact Index Per Article: 29.6] [Indexed: 01/13/2023]
Abstract
In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions has been developed. Regardless of their technical differences, all scoring functions need data sets that combine protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. In addition, the standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. In the first project, we created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in the PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interactions and various other subjects. In particular, PDBbind has become a major data resource for scoring function development. In the second project, we established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so that scoring functions can be tested in a relatively pure context to reflect their quality. In our latest work on this track, i.e., CASF-2013, the performance of a scoring function was quantified in four aspects: "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress that has been made so far, the performance of today's scoring functions still does not meet people's expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that the researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark in the future to keep them as useful community resources.
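The "scoring power" aspect named in this abstract can be illustrated with a minimal sketch. It assumes, as a simplification that the abstract itself does not spell out, that scoring power is summarized by the Pearson correlation between predicted scores and experimental binding constants. The function name `pearson_r` and the five data points are hypothetical and invented purely for illustration; they are not taken from CASF or PDBbind.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances (unnormalized; the factor 1/n cancels out).
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_e)


# Hypothetical values, invented for illustration only (not real benchmark data).
experimental_pkd = [4.2, 5.1, 6.3, 7.0, 8.4]   # experimental binding constants (log units)
predicted_score = [4.8, 5.0, 5.9, 7.4, 7.9]    # scores from some scoring function

scoring_power = pearson_r(predicted_score, experimental_pkd)
print(f"Pearson R = {scoring_power:.3f}")
```

Because the score is computed on fixed, experimentally determined complexes rather than on docked poses, a metric of this kind evaluates the scoring step alone, which is the decoupling from sampling that the abstract describes.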
Affiliation(s)
- Zhihai Liu
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Minyi Su
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Li Han
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Jie Liu
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Qifan Yang
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Yan Li
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- Renxiao Wang
- State Key Laboratory of Bioorganic and Natural Products Chemistry, Collaborative Innovation Center of Chemistry for Life Sciences, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Lingling Road, Shanghai 200032, People’s Republic of China
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, People’s Republic of China
|
12
|
Yan Z, Wang J. Scoring Functions of Protein-Ligand Interactions. Oncology 2017. [DOI: 10.4018/978-1-5225-0549-5.ch036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Scoring functions of protein-ligand interactions are used to recognize the “native” binding pose of a ligand on the protein and to predict the binding affinity, so that active small molecules can be discriminated from inactive ones. Scoring functions are widely used in computational molecular docking and structure-based drug discovery. The development and improvement of scoring functions have broad implications for the pharmaceutical industry and academic research. During the past three decades, much progress has been made in the methodology and accuracy of scoring functions, and many successful cases have been witnessed in virtual database screening. In this chapter, the authors introduce the basic types of scoring functions and their derivations, the commonly used evaluation methods and benchmarks, as well as the underlying challenges and current solutions. Finally, the authors discuss promising directions for improving and developing scoring functions for future molecular docking-based drug discovery.
|
13
|
Abstract
The use of macromolecular structures is widespread for a variety of applications, from teaching protein structure principles all the way to ligand optimization in drug development. Applying data mining techniques on these experimentally determined structures requires a highly uniform, standardized structural data source. The Protein Data Bank (PDB) has evolved over the years toward becoming the standard resource for macromolecular structures. However, the process selecting the data most suitable for specific applications is still very much based on personal preferences and understanding of the experimental techniques used to obtain these models. In this chapter, we will first explain the challenges with data standardization, annotation, and uniformity in the PDB entries determined by X-ray crystallography. We then discuss the specific effect that crystallographic data quality and model optimization methods have on structural models and how validation tools can be used to make informed choices. We also discuss specific advantages of using the PDB_REDO databank as a resource for structural data. Finally, we will provide guidelines on how to select the most suitable protein structure models for detailed analysis and how to select a set of structure models suitable for data mining.
Affiliation(s)
- Bart van Beusekom
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Anastassis Perrakis
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Robbie P Joosten
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
|
14
|
Shin WH, Lee GR, Seok C. Evaluation of GalaxyDock Based on the Community Structure-Activity Resource 2013 and 2014 Benchmark Studies. J Chem Inf Model 2015; 56:988-95. [PMID: 26583962 DOI: 10.1021/acs.jcim.5b00309] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
We analyze the results of the GalaxyDock protein-ligand docking program in the two recent experiments of Community Structure-Activity Resource (CSAR), CSAR 2013 and 2014. GalaxyDock performs global optimization of a modified AutoDock3 energy function by employing the conformational space annealing method. The energy function of GalaxyDock is quite sensitive to atomic clashes. Such energy functions can be effective for sampling physically correct conformations but may not be effective for scoring when conformations are not fully optimized. In phase 1 of CSAR 2013, we successfully selected all four true binders of digoxigenin along with three false positives. However, the energy values were rather high due to insufficient optimization of the conformations docked to homology models. A posteriori relaxation of the model complex structures by GalaxyRefine improved the docking energy values and differentiated the true binders from the false positives better. In the scoring test of CSAR 2013 phase 2, we selected the best poses for each of the two targets. The results of CSAR 2013 phase 3 suggested that an improved method for generating initial conformations for GalaxyDock is necessary for targets involving bulky ligands. Finally, combining existing binding information with GalaxyDock energy-based optimization may be needed for more accurate binding affinity prediction.
Affiliation(s)
- Woong-Hee Shin
- Department of Chemistry, Seoul National University, Seoul 151-747, Republic of Korea
- Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Republic of Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Republic of Korea
|
15
|
Yan Z, Wang J. Optimizing the affinity and specificity of ligand binding with the inclusion of solvation effect. Proteins 2015; 83:1632-42. [PMID: 26111900 DOI: 10.1002/prot.24848] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/16/2015] [Revised: 06/03/2015] [Accepted: 06/21/2015] [Indexed: 01/08/2023]
Abstract
The solvation effect is an important factor in protein-ligand binding in aqueous solution. Previous scoring functions of protein-ligand interactions rarely incorporate a solvation model into the quantification of protein-ligand interactions, mainly because of the immense computational cost, especially in structure-based virtual screening, and the nontransferability of independently optimized atomic solvation parameters. To overcome these barriers, we combine knowledge-based atom-pair potentials with the atomic solvation energy of a charge-independent implicit solvent model in the optimization of binding affinity and specificity. The resulting scoring function with optimized atomic solvation parameters is named specificity and affinity with solvation effect (SPA-SE). The performance of SPA-SE is evaluated and compared to 20 other scoring functions, as well as SPA. The comparative results show that SPA-SE outperforms all other scoring functions in binding affinity prediction and "native" pose identification. Our optimization validates that the solvation effect is an important regulator of the stability and specificity of protein-ligand binding. The development strategy of SPA-SE sets an example for other scoring functions to account for the solvation effect in biomolecular recognition.
Affiliation(s)
- Zhiqiang Yan
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, 130022, China
- Jin Wang
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, 130022, China
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, New York, 11794-3400, USA
|
16
|
Bai F, Liao S, Gu J, Jiang H, Wang X, Li H. An Accurate Metalloprotein-Specific Scoring Function and Molecular Docking Program Devised by a Dynamic Sampling and Iteration Optimization Strategy. J Chem Inf Model 2015; 55:833-47. [DOI: 10.1021/ci500647f] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 01/27/2023]
Affiliation(s)
- Fang Bai
- Department of Engineering Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian, Liaoning 116023, China
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Sha Liao
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Junfeng Gu
- Department of Engineering Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian, Liaoning 116023, China
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Xicheng Wang
- Department of Engineering Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian, Liaoning 116023, China
- Honglin Li
- State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
17
|
Danishuddin M, Khan AU. Structure based virtual screening to discover putative drug candidates: Necessary considerations and successful case studies. Methods 2015; 71:135-45. [DOI: 10.1016/j.ymeth.2014.10.019] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 05/29/2014] [Revised: 09/25/2014] [Accepted: 10/17/2014] [Indexed: 12/19/2022]
|
18
|
Pires DEV, Blundell TL, Ascher DB. Platinum: a database of experimentally measured effects of mutations on structurally defined protein-ligand complexes. Nucleic Acids Res 2014; 43:D387-91. [PMID: 25324307 PMCID: PMC4384026 DOI: 10.1093/nar/gku966] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 12/12/2022]
Abstract
Drug resistance is a major challenge for the treatment of many diseases and a significant concern throughout the drug development process. The ability to understand and predict the effects of mutations on protein–ligand affinities and their roles in the emergence of resistance would significantly aid treatment and drug design strategies. In order to study and understand the impacts of missense mutations on the interaction of ligands with the proteome, we have developed Platinum (http://structure.bioc.cam.ac.uk/platinum). This manually curated, literature-derived database, comprising over 1000 mutations, associates for the first time experimental information on changes in affinity with three-dimensional structures of protein–ligand complexes. To minimize differences arising from experimental techniques and to directly compare binding affinities, Platinum considers only changes measured by the same group and with the same amino-acid sequence used for structure determination, providing a direct link between protein structure, how a ligand binds and how mutations alter the affinity of the ligand of the protein. We believe Platinum will be an invaluable resource for understanding the effects of mutations that give rise to drug resistance, a major problem emerging in pandemics including those caused by the influenza virus, in infectious diseases such as tuberculosis, in cancer and in many other life-threatening illnesses.
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- David B Ascher
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
|
19
|
Liu Z, Li Y, Han L, Li J, Liu J, Zhao Z, Nie W, Liu Y, Wang R. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2014; 31:405-12. [DOI: 10.1093/bioinformatics/btu626] [Citation(s) in RCA: 264] [Impact Index Per Article: 26.4] [Indexed: 01/06/2023]
|
20
|
Westermaier Y, Barril X, Scapozza L. Virtual screening: an in silico tool for interlacing the chemical universe with the proteome. Methods 2014; 71:44-57. [PMID: 25193260 DOI: 10.1016/j.ymeth.2014.08.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 06/08/2014] [Revised: 07/16/2014] [Accepted: 08/02/2014] [Indexed: 12/28/2022]
Abstract
In silico screening in both the forward sense (traditional virtual screening) and the reverse sense (inverse virtual screening, IVS) is a helpful technique for interlacing the chemical universe of small molecules with the proteome. The former, which uses a protein structure and a large chemical database, is well known to the scientific community. Here we have chosen to provide an overview of the latter, focusing on validation and target prioritization strategies. By comparing it to complementary or alternative wet-lab approaches, we put IVS in the broader context of chemical genomics, target discovery and drug design. By giving examples from the literature and an example of our own on how to validate the approach, we provide guidance on the issues related to IVS.
Affiliation(s)
- Yvonne Westermaier
- School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, 1211 Geneva 4, Switzerland; Computational Biology & Drug Design Group, Departament de Fisicoquímica, Facultat de Farmàcia, Universitat de Barcelona, Barcelona, Spain; Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Spain.
- Xavier Barril
- Computational Biology & Drug Design Group, Departament de Fisicoquímica, Facultat de Farmàcia, Universitat de Barcelona, Barcelona, Spain; Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain.
- Leonardo Scapozza
- School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, 1211 Geneva 4, Switzerland.
|
21
|
Inhester T, Rarey M. Protein-ligand interaction databases: advanced tools to mine activity data and interactions on a structural level. Wiley Interdiscip Rev Comput Mol Sci 2014. [DOI: 10.1002/wcms.1192] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Affiliation(s)
- Therese Inhester
- Center for Bioinformatics; University of Hamburg; Hamburg Germany
- Matthias Rarey
- Center for Bioinformatics; University of Hamburg; Hamburg Germany
|
22
|
HarmonyDOCK: The Structural Analysis of Poses in Protein-Ligand Docking. J Comput Biol 2014; 21:247-56. [DOI: 10.1089/cmb.2009.0111] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
|
23
|
de Souza A, Bittker JA, Lahr DL, Brudz S, Chatwin S, Oprea TI, Waller A, Yang JJ, Southall N, Guha R, Schürer SC, Vempati UD, Southern MR, Dawson ES, Clemons PA, Chung TDY. An Overview of the Challenges in Designing, Integrating, and Delivering BARD: A Public Chemical-Biology Resource and Query Portal for Multiple Organizations, Locations, and Disciplines. J Biomol Screen 2014; 19:614-27. [PMID: 24441647 DOI: 10.1177/1087057113517139] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/31/2013] [Accepted: 11/22/2013] [Indexed: 01/15/2023]
Abstract
Recent industry-academic partnerships involve collaboration among disciplines, locations, and organizations using publicly funded "open-access" and proprietary commercial data sources. These require the effective integration of chemical and biological information from diverse data sources, which presents key informatics, personnel, and organizational challenges. The BioAssay Research Database (BARD) was conceived to address these challenges and serve as a community-wide resource and intuitive web portal for public-sector chemical-biology data. Its initial focus is to enable scientists to more effectively use the National Institutes of Health Roadmap Molecular Libraries Program (MLP) data generated from the 3-year pilot and 6-year production phases of the Molecular Libraries Probe Production Centers Network (MLPCN), which is currently in its final year. BARD evolves the current data standards through structured assay and result annotations that leverage BioAssay Ontology and other industry-standard ontologies, and a core hierarchy of assay definition terms and data standards defined specifically for small-molecule assay data. We initially focused on migrating the highest-value MLP data into BARD and bringing it up to this new standard. We review the technical and organizational challenges overcome by the interdisciplinary BARD team, veterans of public- and private-sector data-integration projects, who are collaborating to describe (functional specifications), design (technical specifications), and implement this next-generation software solution.
Affiliation(s)
- David L Lahr
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Steve Brudz
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Simon Chatwin
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Tudor I Oprea
- University of New Mexico Center for Molecular Discovery, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Anna Waller
- University of New Mexico Center for Molecular Discovery, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Jeremy J Yang
- University of New Mexico Center for Molecular Discovery, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Noel Southall
- NIH Center for Advancing Translational Sciences, Rockville, MD, USA
- Rajarshi Guha
- NIH Center for Advancing Translational Sciences, Rockville, MD, USA
- Stephan C Schürer
- Center for Computational Science, University of Miami, Miami, FL, USA
- Uma D Vempati
- Center for Computational Science, University of Miami, Miami, FL, USA
- Mark R Southern
- The Translational Research Institute, The Scripps Research Institute, Jupiter, FL, USA
- Eric S Dawson
- The Vanderbilt Specialized Chemistry Center for Accelerated Probe Development, Vanderbilt University Medical Center, Nashville, TN, USA
- Thomas D Y Chung
- Conrad Prebys Center for Chemical Genomics, Sanford-Burnham Medical Research Institute, La Jolla, CA, USA
|
24
|
Abstract
This article gives an overview of basic computational methods that are commonly used for analyzing small molecule screening data in the chemical genomics field. First, we introduce cheminformatic concepts for analyzing drug-like small molecule structures and their properties. Second, we introduce compound selection approaches for assembling screening libraries using compound property and diversity analyses. Finally, we discuss methods for interpreting screening hits by analyzing compound structures and induced phenotypes using similarity search and clustering approaches. These are critical steps for optimizing screening hits, and relating structure to bioactivity and phenotype.
Affiliation(s)
- Tyler W H Backman
- Department of Bioengineering, University of California Riverside, Riverside, CA, USA
|
25
|
Meyer T, Knapp EW. Database of protein complexes with multivalent binding ability: Bival-bind. Proteins 2013; 82:744-51. [DOI: 10.1002/prot.24453] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/28/2013] [Revised: 10/15/2013] [Accepted: 10/21/2013] [Indexed: 01/13/2023]
Affiliation(s)
- Tim Meyer
- Fachbereich Biologie Chemie; Pharmazie/Institute of Chemistry and Biochemistry, Freie Universität Berlin; 14195 Berlin Germany
- Ernst-Walter Knapp
- Fachbereich Biologie Chemie; Pharmazie/Institute of Chemistry and Biochemistry, Freie Universität Berlin; 14195 Berlin Germany
|
26
|
Shin WH, Kim JK, Kim DS, Seok C. GalaxyDock2: Protein-ligand docking using beta-complex and global optimization. J Comput Chem 2013; 34:2647-56. [DOI: 10.1002/jcc.23438] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 04/02/2013] [Revised: 07/20/2013] [Accepted: 08/18/2013] [Indexed: 11/10/2022]
Affiliation(s)
- Woong-Hee Shin
- Department of Chemistry; Seoul National University; Seoul 151-747 Republic of Korea
- Jae-Kwan Kim
- Department of Industrial Engineering; Hanyang University; Seoul 133-791 Republic of Korea
- Deok-Soo Kim
- Department of Industrial Engineering; Hanyang University; Seoul 133-791 Republic of Korea
- Chaok Seok
- Department of Chemistry; Seoul National University; Seoul 151-747 Republic of Korea
|
27
|
Cao DS, Liang YZ, Deng Z, Hu QN, He M, Xu QS, Zhou GH, Zhang LX, Deng ZX, Liu S. Genome-scale screening of drug-target associations relevant to Ki using a chemogenomics approach. PLoS One 2013; 8:e57680. [PMID: 23577055 PMCID: PMC3618265 DOI: 10.1371/journal.pone.0057680] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 05/04/2012] [Accepted: 01/27/2013] [Indexed: 11/18/2022]
Abstract
The identification of interactions between drugs and target proteins plays a key role in genomic drug discovery. In the present study, the quantitative binding affinities of drug-target pairs are differentiated as a measurement to define whether a drug interacts with a protein or not, and then a chemogenomics framework using an unbiased set of general integrated features and random forest (RF) is employed to construct a predictive model which can accurately classify drug-target pairs. The predictability of the model is further investigated and validated by several independent validation sets. The built model is used to predict drug-target associations, some of which were confirmed by comparing experimental data from public biological resources. A drug-target interaction network with high confidence drug-target pairs was also reconstructed. This network provides further insight for the action of drugs and targets. Finally, a web-based server called PreDPI-Ki was developed to predict drug-target interactions for drug discovery. In addition to providing a high-confidence list of drug-target associations for subsequent experimental investigation guidance, these results also contribute to the understanding of drug-target interactions. We can also see that quantitative information of drug-target associations could greatly promote the development of more accurate models. The PreDPI-Ki server is freely available via: http://sdd.whu.edu.cn/dpiki.
Affiliation(s)
- Dong-Sheng Cao
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P. R. China
- Yi-Zeng Liang
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P. R. China
- * E-mail: (YZL); (QNH)
- Zhe Deng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan, P. R. China
- Qian-Nan Hu
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan, P. R. China
- * E-mail: (YZL); (QNH)
- Min He
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P. R. China
- Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha, P. R. China
- Guang-Hua Zhou
- The 163rd Hospital of The Chinese People's Liberation Army, Changsha, P. R. China
- Liu-Xia Zhang
- The 163rd Hospital of The Chinese People's Liberation Army, Changsha, P. R. China
- Zi-xin Deng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan, P. R. China
- Shao Liu
- Xiangya Hospital, Central South University, Changsha, P. R. China
|
28
|
Chang DTH, Ke CH, Lin JH, Chiang JH. AutoBind: automatic extraction of protein-ligand-binding affinity data from biological literature. Bioinformatics 2012; 28:2162-8. [PMID: 22753780 DOI: 10.1093/bioinformatics/bts367] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Determination of the binding affinity of a protein-ligand complex is important to quantitatively specify whether a particular small molecule will bind to the target protein. Besides, collection of comprehensive datasets for protein-ligand complexes and their corresponding binding affinities is crucial in developing accurate scoring functions for the prediction of the binding affinities of previously unknown protein-ligand complexes. In the past decades, several databases of protein-ligand-binding affinities have been created via visual extraction from literature. However, such approaches are time-consuming and most of these databases are updated only a few times per year. Hence, there is an immediate demand for an automatic extraction method with high precision for binding affinity collection.
RESULT: We have created a new database of protein-ligand-binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles where the binding affinities have been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles as well as a scoring function to rank the sentences that match our patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13 616 protein-ligand complexes and the corresponding binding affinities have been deposited in AutoBind from 17 221 articles.
AVAILABILITY: AutoBind is automatically updated on a monthly basis, and it is freely available at http://autobind.csie.ncku.edu.tw/ and http://autobind.mc.ntu.edu.tw/. All of the deposited binding affinities have been refined and approved manually before being released.
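The precision and recall figures quoted in this abstract follow the usual information-retrieval definitions, which the short sketch below illustrates. The function names and the counts of true positives, false positives and false negatives are hypothetical; the abstract reports only the final percentages, not the underlying counts. The F1 score is a derived quantity computed here for illustration and is not reported in the abstract.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw extraction counts."""
    extracted = true_positives + false_positives   # everything the extractor returned
    relevant = true_positives + false_negatives    # everything it should have returned
    if extracted == 0 or relevant == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return true_positives / extracted, true_positives / relevant


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        raise ValueError("F1 is undefined when precision and recall are both zero")
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, invented for illustration only.
p, r = precision_recall(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision = {p:.2f}, recall = {r:.2f}")

# Values reported in the abstract (84.22% precision, 79.07% recall).
reported_precision = 0.8422
reported_recall = 0.7907
f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported values = {f1:.4f}")
```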
Affiliation(s)
- Darby Tien-Hao Chang
- Department of Electrical Engineering, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 70101, Taiwan
|
29
|
Flachner B, Lörincz Z, Carotti A, Nicolotti O, Kuchipudi P, Remez N, Sanz F, Tóvári J, Szabó MJ, Bertók B, Cseh S, Mestres J, Dormán G. A chemocentric approach to the identification of cancer targets. PLoS One 2012; 7:e35582. [PMID: 22558171 PMCID: PMC3338416 DOI: 10.1371/journal.pone.0035582] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 07/15/2011] [Accepted: 03/19/2012] [Indexed: 01/01/2023]
Abstract
A novel chemocentric approach to identifying cancer-relevant targets is introduced. Starting with a large chemical collection, the strategy uses the list of small molecule hits arising from a differential cytotoxicity screening on tumor HCT116 and normal MRC-5 cell lines to identify proteins associated with cancer emerging from a differential virtual target profiling of the most selective compounds detected in both cell lines. It is shown that this smart combination of differential in vitro and in silico screenings (DIVISS) is capable of detecting a list of proteins that are already well accepted cancer drug targets, while complementing it with additional proteins that, targeted selectively or in combination with others, could lead to synergistic benefits for cancer therapeutics. The complete list of 115 proteins identified as being hit uniquely by compounds showing selective antiproliferative effects for tumor cell lines is provided.
|
30
|
Johnston MA, Farrell D, Nielsen JE. A collaborative environment for developing and validating predictive tools for protein biophysical characteristics. J Comput Aided Mol Des 2012; 26:387-96. [DOI: 10.1007/s10822-012-9564-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2011] [Accepted: 03/18/2012] [Indexed: 11/29/2022]
|
31
|
|
32
|
Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 2011; 40:D1100-7. [PMID: 21948594 PMCID: PMC3245175 DOI: 10.1093/nar/gkr777] [Citation(s) in RCA: 2463] [Impact Index Per Article: 189.5] [Indexed: 12/21/2022]
Abstract
ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.
Affiliation(s)
- Anna Gaulton
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
33
|
Antony J, Grimme S, Liakos DG, Neese F. Protein-ligand interaction energies with dispersion corrected density functional theory and high-level wave function based methods. J Phys Chem A 2011; 115:11210-20. [PMID: 21842894 DOI: 10.1021/jp203963f] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 11/28/2022]
Abstract
Intermolecular interaction energies for a diverse set of noncovalently bound protein-ligand complexes from the Protein Data Bank are calculated with dispersion-corrected density functional theory (DFT-D3). The focus is on major contacts occurring between the drug molecule and the binding site. Generalized gradient approximation (GGA), meta-GGA, and hybrid functionals are used. DFT-D3 interaction energies are benchmarked against the best available wave function based results that are provided by the estimated complete basis set (CBS) limit of the local pair natural orbital coupled-electron pair approximation (LPNO-CEPA/1) and compared to MP2 and semiempirical data. The size of the complexes and their interaction energies (ΔE(PL)) vary between 50 and 300 atoms and from -1 to -65 kcal/mol, respectively. Basis set effects are considered by applying extended sets of triple- to quadruple-ζ quality. Computed total ΔE(PL) values show a good correlation with the dispersion contribution despite the fact that the protein-ligand complexes contain many hydrogen bonds. It is concluded that an adequate, for example, asymptotically correct, treatment of dispersion interactions is necessary for the realistic modeling of protein-ligand binding. Inclusion of the dispersion correction drastically reduces the dependence of the computed interaction energies on the density functional compared to uncorrected DFT results. DFT-D3 methods provide results that are consistent with LPNO-CEPA/1 and MP2, the differences of about 1-2 kcal/mol on average (<5% of ΔE(PL)) being on the order of their accuracy, while dispersion-corrected semiempirical AM1 and PM3 approaches show a deviating behavior. The DFT-D3 results are found to depend insignificantly on the choice of the short-range damping model. We propose to use DFT-D3 as an essential ingredient in a QM/MM approach for advanced virtual screening of protein-ligand interactions, to be combined with similarly "first-principle" accounts for the estimation of solvation and entropic effects.
Affiliation(s)
- Jens Antony
- Organisch-Chemisches Institut, Universität Münster, Münster, Germany
|
34
|
Dunbar JB, Smith RD, Yang CY, Ung PMU, Lexa KW, Khazanov NA, Stuckey JA, Wang S, Carlson HA. CSAR benchmark exercise of 2010: selection of the protein-ligand complexes. J Chem Inf Model 2011; 51:2036-46. [PMID: 21728306 PMCID: PMC3180202 DOI: 10.1021/ci200082t] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Indexed: 12/04/2022]
Abstract
A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) aims to collect available data from industry and academia which may be used for this purpose (www.csardock.org). Also, CSAR is charged with organizing community-wide exercises based on the collected data. The first of these exercises aimed to gauge the overall state of docking and scoring, using a large and diverse data set of protein–ligand complexes. Participants were asked to calculate the affinity of the complexes as provided and then recalculate with changes which may improve their specific method. This first data set was selected from existing PDB entries which had binding data (Kd or Ki) in Binding MOAD, augmented with entries from PDBbind. The final data set contains 343 diverse protein–ligand complexes and spans 14 pKd units. Sixteen proteins have three or more complexes in the data set, from which a user could start an inspection of congeneric series. Inherent experimental error limits the possible correlation between scores and measured affinity; R² is limited to ∼0.9 when fitting to the data set without overparametrizing. R² is limited to ∼0.8 when scoring the data set with a method trained on outside data. The details of how the data set was initially selected, and the process by which it matured to better fit the needs of the community, are presented. Many groups generously participated in improving the data set, and this underscores the value of a supportive, collaborative effort in moving our field forward.
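To put the pKd range quoted in this abstract on an energy scale, the standard relation ΔG° = RT ln Kd = -RT ln(10) · pKd can be used. The sketch below is only illustrative: the temperature of 298.15 K is an assumption (the abstract does not state one), and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def pkd_to_delta_g(pkd, temperature=298.15):
    """Standard binding free energy (kcal/mol) for a given pKd.

    Uses dG = -R * T * ln(10) * pKd; the default temperature of
    298.15 K is an assumption, not a value taken from the abstract.
    """
    return -R_KCAL * temperature * math.log(10) * pkd


# One pKd unit corresponds to roughly 1.36 kcal/mol at 298.15 K, so a
# range of 14 pKd units corresponds to roughly 19 kcal/mol.
per_unit = pkd_to_delta_g(1.0)
span = pkd_to_delta_g(14.0)
```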
Affiliation(s)
- James B Dunbar
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1065, United States.
|
35
|
Meslamani J, Rognan D. Enhancing the Accuracy of Chemogenomic Models with a Three-Dimensional Binding Site Kernel. J Chem Inf Model 2011; 51:1593-603. [DOI: 10.1021/ci200166t] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Affiliation(s)
- Jamel Meslamani
- Structural Chemogenomics, Laboratory of Therapeutical Innovation, UMR 7200 CNRS, University of Strasbourg, F-67400 Illkirch, France
- Didier Rognan
- Structural Chemogenomics, Laboratory of Therapeutical Innovation, UMR 7200 CNRS, University of Strasbourg, F-67400 Illkirch, France
|
36
|
Novikov FN, Zeifman AA, Stroganov OV, Stroylov VS, Kulkov V, Chilov GG. CSAR scoring challenge reveals the need for new concepts in estimating protein-ligand binding affinity. J Chem Inf Model 2011; 51:2090-6. [PMID: 21612285 DOI: 10.1021/ci200034y] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
The dG prediction accuracy by the Lead Finder docking software on the CSAR test set was characterized by R² = 0.62 and rmsd = 1.93 kcal/mol, and the method of preparation of the full-atom structures of the test set did not significantly affect the resulting accuracy of predictions. The primary factors determining the correlation between the predicted and experimental values were the van der Waals interactions and solvation effects. Those two factors alone accounted for R² = 0.50. The other factors that affected the accuracy of predictions, listed in order of decreasing importance, were the change of the ligand's internal energy upon binding to the protein, the electrostatic interactions, and the hydrogen bonds. It appears that those latter factors contributed to the independence of the prediction results from the method of full-atom structure preparation. Then, we turned our attention to the other factors that could potentially improve the scoring function in order to raise the accuracy of the dG prediction. It turned out that the ligand-centric factors, including Mw, cLogP, PSA, etc., or protein-centric factors, such as the functional class of protein, did not improve the prediction accuracy. Following that, we explored whether the weak molecular interactions such as X-H...Ar, X-H...Hal, CO...Hal, C-H...X, stacking and π-cationic interactions (where X is N or O), which are generally of interest to medicinal chemists despite their lack of proper molecular mechanical parametrization, could improve dG prediction. Our analysis revealed that out of these new interactions only CO...Hal is statistically significant for dG predictions using the Lead Finder scoring function. Accounting for the CO...Hal interaction resulted in the reduction of the rmsd from 2.19 to 0.69 kcal/mol for the corresponding structures. The other weak interaction factors were not statistically significant and therefore irrelevant to the accuracy of dG prediction. On the basis of our findings from our participation in the CSAR scoring challenge we conclude that a significant increase in prediction accuracy necessitates breakthrough scoring approaches. We anticipate that the explicit accounting for water molecules, protein flexibility, and a more thermodynamically accurate method of dG calculation rather than single point energy calculation may lead to such breakthroughs.
|
37
|
Morita M, Terada T, Nakamura S, Shimizu K. BUDDY-system: A web site for constructing a dataset of protein pairs between ligand-bound and unbound states. BMC Res Notes 2011; 4:143. [PMID: 21600047 PMCID: PMC3124414 DOI: 10.1186/1756-0500-4-143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/21/2010] [Accepted: 05/22/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Elucidating molecular recognition by proteins, such as in enzyme-substrate and receptor-ligand interactions, is a key to understanding biological phenomena. To delineate these protein interactions, it is important to perform structural bioinformatics studies relevant to molecular recognition. Such studies require a dataset of protein structure pairs between ligand-bound and unbound states. In many studies, the same well-designed and high-quality dataset has been used repeatedly, which has spurred the development of subsequent relevant research. Using previously constructed datasets, researchers are able to fairly compare obtained results with those of other studies; in addition, much effort and time is saved. Therefore, it is important to construct a refined dataset that will appeal to many researchers. However, constructing such datasets is not a trivial task. FINDINGS We have developed the BUDDY-system, a web site designed to support the building of a dataset comprising pairs of protein structures between ligand-bound and unbound states, which are widely used in various areas associated with molecular recognition. In addition to constructing a dataset, the BUDDY-system also allows the user to search for ligand-bound protein structures by their unbound state or by their ligand, and to search for ligands by a particular receptor protein. CONCLUSIONS The BUDDY-system receives input from the user as a single entry or a dataset consisting of a list of ligand-bound state protein structures, unbound state protein structures, or ligands and returns to the user a list of protein structure pairs between the ligand-bound and the corresponding unbound states. This web site is designed for researchers who are involved not only in structural bioinformatics but also in experimental studies. The BUDDY-system is freely available on the web.
Affiliation(s)
- Mizuki Morita
- Department of Fundamental Research, National Institute of Biomedical Innovation (NIBIO), 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085, Japan.
|
38
|
Backman TWH, Cao Y, Girke T. ChemMine tools: an online service for analyzing and clustering small molecules. Nucleic Acids Res 2011; 39:W486-91. [PMID: 21576229 PMCID: PMC3125754 DOI: 10.1093/nar/gkr320] [Citation(s) in RCA: 320] [Impact Index Per Article: 24.6] [Indexed: 02/02/2023]
Abstract
ChemMine Tools is an online service for small molecule data analysis. It provides a web interface to a set of cheminformatics and data mining tools that are useful for various analysis routines performed in chemical genomics and drug discovery. The service also offers programmable access options via the R library ChemmineR. The primary functionalities of ChemMine Tools fall into five major application areas: data visualization, structure comparisons, similarity searching, compound clustering and prediction of chemical properties. First, users can upload compound data sets to the online Compound Workbench. Numerous utilities are provided for compound viewing, structure drawing and format interconversion. Second, pairwise structural similarities among compounds can be quantified. Third, interfaces to ultra-fast structure similarity search algorithms are available to efficiently mine the chemical space in the public domain. These include fingerprint and embedding/indexing algorithms. Fourth, the service includes a Clustering Toolbox that integrates cheminformatic algorithms with data mining utilities to enable systematic structure and activity based analyses of custom compound sets. Fifth, physicochemical property descriptors of custom compound sets can be calculated. These descriptors are important for assessing the bioactivity profile of compounds in silico and for quantitative structure-activity relationship (QSAR) analyses. ChemMine Tools is available at: http://chemmine.ucr.edu.
Affiliation(s)
- Tyler W H Backman
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
|
39
|
Zoete V, Cuendet MA, Grosdidier A, Michielin O. SwissParam: A fast force field generation tool for small organic molecules. J Comput Chem 2011; 32:2359-68. [DOI: 10.1002/jcc.21816] [Citation(s) in RCA: 1090] [Impact Index Per Article: 83.8] [Received: 11/18/2010] [Revised: 02/10/2011] [Accepted: 03/20/2011] [Indexed: 11/08/2022]
|
40
|
Huang SY, Grinter SZ, Zou X. Scoring functions and their evaluation methods for protein-ligand docking: recent advances and future directions. Phys Chem Chem Phys 2010; 12:12899-908. [PMID: 20730182 PMCID: PMC11103779 DOI: 10.1039/c0cp00151a] [Citation(s) in RCA: 294] [Impact Index Per Article: 21.0] [Indexed: 11/21/2022]
Abstract
The scoring function is one of the most important components in structure-based drug design. Despite considerable success, accurate and rapid prediction of protein-ligand interactions is still a challenge in molecular docking. In this perspective, we have reviewed three basic types of scoring functions (force-field, empirical, and knowledge-based) and the consensus scoring technique that are used for protein-ligand docking. The commonly-used assessment criteria and publicly available protein-ligand databases for performance evaluation of the scoring functions have also been presented and discussed. We end with a discussion of the challenges faced by existing scoring functions and possible future directions for developing improved scoring functions.
Affiliation(s)
- Sheng-You Huang
- Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, and Informatics Institute University of Missouri Columbia, MO 65211
- Sam Z. Grinter
- Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, and Informatics Institute University of Missouri Columbia, MO 65211
- Xiaoqin Zou
- Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, and Informatics Institute University of Missouri Columbia, MO 65211
|
41
|
Sándor M, Kiss R, Keseru GM. Virtual fragment docking by Glide: a validation study on 190 protein-fragment complexes. J Chem Inf Model 2010; 50:1165-72. [PMID: 20459088 DOI: 10.1021/ci1000407] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 11/29/2022]
Abstract
The docking accuracy of Glide was evaluated using 16 different docking protocols on 190 protein-fragment complexes representing 78 targets. Standard precision docking (Glide SP) based protocols showed the best performance. The average root-mean-square deviation (rmsd) between the docked and cocrystallized poses achieved by Glide SP with pre- and postprocessing was 1.17 Å, and an acceptable binding mode with rmsd < 2 Å could be found in 80% of the cases. Comparison of the docking results produced by different protocols suggests that the sampling efficacy of Glide is adequate for fragment docking. The docking accuracy seems to be limited by the performance of scoring schemes, which is supported by the weak correlation between experimental binding affinities and GlideScores. Cross-docking experiments performed on 8 targets represented by 63 complexes revealed that Glide SP gave results similar to those of the computationally more intensive Glide XP. The average rmsd achieved by Glide SP with pre- and postprocessing was 2.06 Å, and an acceptable binding mode with rmsd < 2 Å could be found in 63% of the cases. These cross-docking results were improved significantly by selecting the optimal X-ray structure for each target (average rmsd = 1.3 Å, success rate = 77%), indicating the importance of enrichment studies and the use of multiple X-ray structures in virtual fragment screening.
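The two metrics reported in this abstract, the rmsd between docked and cocrystallized poses and the fraction of cases below a 2 Å cutoff, can be sketched as follows. This is a minimal illustration, not the protocol used in the study: it assumes both poses list the same heavy atoms in the same order in a shared coordinate frame, and it applies no superposition or symmetry correction. The function names are hypothetical.

```python
import math


def pose_rmsd(docked, crystal):
    """In-place rmsd (same units as the input) between two poses.

    Each pose is a list of (x, y, z) tuples with matching atom order;
    that matching is an assumption of this sketch.
    """
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_d, atom_c in zip(docked, crystal)
        for a, b in zip(atom_d, atom_c)
    )
    return math.sqrt(squared / len(docked))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose rmsd is strictly below the cutoff."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)


# Toy example: every atom is displaced by 1 Å along z.
docked_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = pose_rmsd(docked_pose, crystal_pose)
toy_rate = success_rate([1.0, 2.5, 1.5, 3.0])
```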
Affiliation(s)
- Márk Sándor
- Discovery Chemistry, Gedeon Richter plc., P.O. Box 27, H-1475 Budapest, Hungary
|
42
|
Abstract
In this paper we provide an overview of our current knowledge of the mapping between small molecule ligands and protein domains. We give an overview of the present data resources available on the Web, which provide information about protein-ligand interactions, as well as discussing our own PROCOGNATE database. We present an update of ligand binding in large protein superfamilies and identify those ligands most frequently utilized by nature. Finally we discuss potential uses for this type of data.
Affiliation(s)
- Matthew Bashton
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
|
43
|
Saravanan SE, Karthi R, Sathish K, Kokila K, Sabarinathan R, Sekar K. MLDB: macromolecule ligand database. J Appl Crystallogr 2009. [DOI: 10.1107/s0021889809048626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
MLDB (macromolecule ligand database) is a knowledgebase containing ligands co-crystallized with the three-dimensional structures available in the Protein Data Bank. The proposed knowledgebase serves as an open resource for the analysis and visualization of all ligands and their interactions with macromolecular structures. MLDB can be used to search ligands, and their interactions can be visualized both in text and graphical formats. MLDB will be updated at regular intervals (weekly) with automated Perl scripts. The knowledgebase is intended to serve the scientific community working in the areas of molecular and structural biology. It is available free to users around the clock and can be accessed at http://dicsoft2.physics.iisc.ernet.in/mldb/.
|
44
|
Søndergaard CR, Garrett AE, Carstensen T, Pollastri G, Nielsen JE. Structural artifacts in protein-ligand X-ray structures: implications for the development of docking scoring functions. J Med Chem 2009; 52:5673-84. [PMID: 19711919 DOI: 10.1021/jm8016464] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
The development of docking scoring functions requires high-resolution 3D structures of protein-ligand complexes for which the binding affinity of the ligand has been measured experimentally. Protein-ligand binding affinities are measured in solution experiments, and high resolution protein-ligand structures can be determined only by X-ray crystallography. Protein-ligand scoring functions must therefore reproduce solution binding energies using analyses of proteins in a crystal environment. We present an analysis of the prevalence of crystal-induced artifacts and water-mediated contacts in protein-ligand complexes and demonstrate the effect that these can have on the performance of protein-ligand scoring functions. We find 36% of ligands in the PDBBind 2007 refined data set to be influenced by crystal contacts and find the performance of a scoring function to be affected by these. A Web server for detecting crystal contacts in protein-ligand complexes is available at http://enzyme.ucd.ie/LIGCRYST.
Affiliation(s)
- Chresten R Søndergaard
- School of Biomolecular and Biomedical Science, Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
|
45
|
Molecular docking: theoretical background, practical applications and perspectives. Mendeleev Communications 2009. [DOI: 10.1016/j.mencom.2009.09.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/24/2022]
|
46
|
Abstract
Supramolecular chemistry has expanded dramatically in recent years both in terms of potential applications and in its relevance to analogous biological systems. The formation and function of supramolecular complexes occur through a multiplicity of often difficult to differentiate noncovalent forces. The aim of this Review is to describe the crucial interaction mechanisms in context, and thus classify the entire subject. In most cases, organic host-guest complexes have been selected as examples, but biologically relevant problems are also considered. An understanding and quantification of intermolecular interactions is of importance both for the rational planning of new supramolecular systems, including intelligent materials, as well as for developing new biologically active agents.
Affiliation(s)
- Hans-Jörg Schneider
- Organische Chemie, Universität des Saarlandes, 66041 Saarbrücken, Deutschland.
|
47
|
Chen IJ, Hubbard RE. Lessons for fragment library design: analysis of output from multiple screening campaigns. J Comput Aided Mol Des 2009; 23:603-20. [PMID: 19495994 DOI: 10.1007/s10822-009-9280-5] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Received: 03/08/2009] [Accepted: 05/07/2009] [Indexed: 11/26/2022]
Abstract
Over the past 8 years, we have developed, refined and applied a fragment based discovery approach to a range of protein targets. Here we report computational analyses of various aspects of our fragment library and the results obtained for fragment screening. We reinforce the finding of others that the experimentally observed hit rate for screening fragments can be related to a computationally defined druggability index for the target. In general, the physicochemical properties of the fragment hits display the same profile as the library, as is expected for a truly diverse library which probes the relevant chemical space. An analysis of the fragment hits against various protein classes has shown that the physicochemical properties of the fragments are complementary to the properties of the target binding site. The effectiveness of some fragments appears to be achieved by an appropriate mix of pharmacophore features and enhanced aromaticity, with hydrophobic interactions playing an important role. The analysis emphasizes that it is possible to identify small fragments that are specific for different binding sites. To conclude, we discuss how the results could inform further development and improvement of our fragment library.
Affiliation(s)
- I-Jen Chen
- Vernalis (R&D) Ltd, Granta Park, Cambridge, CB21 6GB, UK
|
48
|
|
49
|
Kirchmair J, Markt P, Distinto S, Schuster D, Spitzer GM, Liedl KR, Langer T, Wolber G. The Protein Data Bank (PDB), its related services and software tools as key components for in silico guided drug discovery. J Med Chem 2009; 51:7021-40. [PMID: 18975926 DOI: 10.1021/jm8005977] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
Affiliation(s)
- Johannes Kirchmair
- Department of Pharmaceutical Chemistry, Faculty of Chemistry and Pharmacy and Center for Molecular Biosciences, University of Innsbruck, Innrain 52, A-6020 Innsbruck, Austria
|
50
|
Stroganov OV, Novikov FN, Stroylov VS, Kulkov V, Chilov GG. Lead finder: an approach to improve accuracy of protein-ligand docking, binding energy estimation, and virtual screening. J Chem Inf Model 2009; 48:2371-85. [PMID: 19007114 DOI: 10.1021/ci800166p] [Citation(s) in RCA: 145] [Impact Index Per Article: 9.7] [Indexed: 11/30/2022]
Abstract
An innovative molecular docking algorithm and three specialized high accuracy scoring functions are introduced in the Lead Finder docking software. Lead Finder's algorithm for ligand docking combines the classical genetic algorithm with various local optimization procedures and resourceful exploitation of the knowledge generated during the docking process. Lead Finder's scoring functions are based on a molecular mechanics functional which explicitly accounts for different types of energy contributions scaled with empirical coefficients to produce three scoring functions tailored for (a) accurate binding energy predictions; (b) correct energy-ranking of docked ligand poses; and (c) correct rank-ordering of active and inactive compounds in virtual screening experiments. The predicted values of the free energy of protein-ligand binding were benchmarked against a set of experimentally measured binding energies for 330 diverse protein-ligand complexes, yielding an rmsd of 1.50 kcal/mol. The accuracy of ligand docking was assessed on a set of 407 structures, which included almost all published test sets of the following programs: FlexX, Glide SP, Glide XP, Gold, LigandFit, MolDock, and Surflex. An rmsd of 2 Å or less was observed for 80-96% of the structures in the test sets (80.0% on the Glide XP and FlexX test sets, 96.0% on the Surflex and MolDock test sets). The ability of Lead Finder to distinguish between active and inactive compounds during virtual screening experiments was benchmarked against 34 therapeutically relevant protein targets. Impressive enrichment factors were obtained for almost all of the targets, with the average area under the receiver operating characteristic curve being equal to 0.92.
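The area under the receiver operating characteristic curve quoted in this abstract can be read as the probability that a randomly chosen active compound is scored above a randomly chosen inactive one. The sketch below computes it directly from that pairwise definition. It is an illustration only: it assumes that a higher score means "more likely active" (sign conventions differ between docking programs), and the function name and toy scores are hypothetical.

```python
def roc_auc(active_scores, inactive_scores):
    """Area under the ROC curve from pairwise comparisons.

    Counts the fraction of (active, inactive) pairs in which the active
    compound has the higher score; ties count as half a win. Assumes a
    higher score means a better-ranked compound.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for active in active_scores:
        for inactive in inactive_scores:
            if active > inactive:
                wins += 1.0
            elif active == inactive:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Toy screen: three actives and two inactives; five of the six pairs
# are ranked correctly.
toy_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

The pairwise loop is quadratic in the number of compounds, which is fine for a sketch; rank-based formulas give the same value more efficiently for large screens.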
Affiliation(s)
- Oleg V Stroganov
- MolTech Ltd., Leninskie gory, 1/75A, Moscow 119992, Russian Federation, and BioMolTech Corp., 226 York Mills Road, Toronto, Ontario M2L 1L1, Canada
|