1
Melnikov F, Anger LT, Hasselgren C. Toward Quantitative Models in Safety Assessment: A Case Study to Show Impact of Dose-Response Inference on hERG Inhibition Models. Int J Mol Sci 2022; 24:ijms24010635. [PMID: 36614078 PMCID: PMC9820331 DOI: 10.3390/ijms24010635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 12/23/2022] [Accepted: 12/24/2022] [Indexed: 12/31/2022] Open
Abstract
Due to challenges with historical data and the diversity of assay formats, in silico models for safety-related endpoints are often based on discretized data instead of the data on a natural continuous scale. Models for discretized endpoints have limitations in usage and interpretation that can impact compound design. Here, we present a consistent data inference approach, exemplified on two data sets of Ether-à-go-go-Related Gene (hERG) K⁺ inhibition data, for dose-response and screening experiments that are generally applicable for in vitro assays. hERG inhibition has been associated with severe cardiac effects and is one of the more prominent safety targets assessed in drug development, using a wide array of in vitro and in silico screening methods. In this study, the IC50 for hERG inhibition is estimated from diverse historical proprietary data. The IC50 derived from a two-point proprietary screening data set demonstrated high correlation (R = 0.98, MAE = 0.08) with IC50s derived from six-point dose-response curves. Similar IC50 estimation accuracy was obtained on a public thallium flux assay data set (R = 0.90, MAE = 0.2). The IC50 data were used to develop a robust quantitative model. The model's MAE (0.47) and R² (0.46) were on par with literature statistics and approached assay reproducibility. Using a continuous model has high value for pharmaceutical projects, as it enables rank ordering of compounds and evaluation of compounds against project-specific inhibition thresholds. This data inference approach can be widely applicable to assays with quantitative readouts and has the potential to impact experimental design and improve model performance, interpretation, and acceptance across many standard safety endpoints.
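As a rough illustration of how an IC50 can be inferred from screening data with only one or two concentrations, the sketch below inverts the standard Hill equation. It is not the inference procedure used in the paper, which the abstract does not specify; the function names, the fixed Hill slope of 1, and the log-space averaging of per-point estimates are all assumptions made for illustration.

```python
import math


def ic50_from_single_point(conc, pct_inhibition, hill=1.0):
    """Invert the Hill equation I = 100 * C^h / (IC50^h + C^h) for IC50.

    conc and the returned IC50 share the same units. The Hill slope is an
    assumed value (1.0 by default), not something taken from the paper.
    """
    if conc <= 0:
        raise ValueError("concentration must be positive")
    if not 0.0 < pct_inhibition < 100.0:
        raise ValueError("inhibition must lie strictly between 0 and 100 %")
    frac = pct_inhibition / 100.0
    return conc * ((1.0 - frac) / frac) ** (1.0 / hill)


def ic50_from_points(points, hill=1.0):
    """Combine several (conc, pct_inhibition) readings.

    Averages the per-point estimates on a log10 scale (a geometric mean).
    This is a hypothetical aggregation rule chosen for simplicity.
    """
    logs = [math.log10(ic50_from_single_point(c, p, hill)) for c, p in points]
    return 10.0 ** (sum(logs) / len(logs))


def pic50(ic50_um):
    """Convert an IC50 in micromolar to pIC50 (-log10 of molar IC50)."""
    return 6.0 - math.log10(ic50_um)


# 50 % inhibition at 10 uM implies an IC50 of 10 uM under these assumptions.
estimate = ic50_from_single_point(10.0, 50.0)
```

Readings at or beyond 0 % or 100 % inhibition carry no usable information in this closed form, which is why the sketch rejects them rather than extrapolating.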
2
Kappler MA, Lowden CT, Culberson J. BioChemUDM: a unified data model for compounds and assays. Pure Appl Chem 2022. [DOI: 10.1515/pac-2021-1004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We present a simple, biochemistry data model (BioChemUDM) to represent compounds and assays for the purpose of capturing, reporting, and sharing data, both biological and chemical. We describe an approach to register a compound based solely on a stereo-enhanced sketch, thereby replacing the need for additional user-specified “flags” at the time of compound registration. We describe a convention for string-based labels that enables inter-organizational compound and assay data sharing. By co-adopting the BioChemUDM, we have successfully enabled same-day exchange and utilization of chemical and biological information with various stakeholders.
Affiliation(s)
- Michael A. Kappler
- IDEAYA Biosciences Inc, 7000 Shoreline Blvd Ste 350, South San Francisco, CA 94080, USA
- J. Chris Culberson
- Workflow Informatics Corp, 9316 Bramden Ct, Wake Forest, NC 27587, USA
3
Nayarisseri A, Khandelwal R, Madhavi M, Selvaraj C, Panwar U, Sharma K, Hussain T, Singh SK. Shape-based Machine Learning Models for the Potential Novel COVID-19 Protease Inhibitors Assisted by Molecular Dynamics Simulation. Curr Top Med Chem 2020; 20:2146-2167. [PMID: 32621718 DOI: 10.2174/1568026620666200704135327] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/25/2020] [Revised: 03/20/2020] [Accepted: 04/25/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND The vast geographical expansion of the novel coronavirus and an increasing number of COVID-19 cases have overwhelmed health and public health services. Artificial Intelligence (AI) and Machine Learning (ML) algorithms have taken on a major role in tracking disease patterns and in identifying possible treatments. OBJECTIVE This study aims to identify potential COVID-19 protease inhibitors through shape-based Machine Learning assisted by Molecular Docking and Molecular Dynamics simulations. METHODS 31 repurposed compounds were selected targeting the main coronavirus protease (6LU7), and a machine learning approach was employed to generate shape-based molecules starting from the 3D shape to the pharmacophoric features of their seed compound. Ligand-Receptor Docking was performed with Optimized Potential for Liquid Simulations (OPLS) algorithms to identify high-affinity compounds from the list of selected candidates for 6LU7, which were subjected to Molecular Dynamics Simulations followed by ADMET studies and other analyses. RESULTS Shape-based Machine Learning reported remdesivir, valrubicin, aprepitant, and fulvestrant as the best therapeutic agents with the highest affinity for the target protein. Among the best shape-based compounds, a novel compound identified was not indexed in any chemical databases (PubChem, Zinc, or ChEMBL). Hence, the novel compound was named 'nCorv-EMBS'. Further, toxicity analysis showed nCorv-EMBS to be suitable for further consideration as the main protease inhibitor in COVID-19. CONCLUSION Effective ACE-II, GAK, AAK1, and protease 3C blockers can serve as a novel therapeutic approach to block the binding and attachment of the main COVID-19 protease (PDB ID: 6LU7) to the host cell and thus inhibit the infection at AT2 receptors in the lung. The novel compound nCorv-EMBS herein proposed stands as a promising inhibitor to be evaluated further for COVID-19 treatment.
Affiliation(s)
- Anuraj Nayarisseri
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore-452010, Madhya Pradesh, India; Bioinformatics Research Laboratory, LeGene Biosciences Pvt Ltd., Mahalakshmi Nagar, Indore-452010, Madhya Pradesh, India; Research Chair for Biomedical Applications of Nanomaterials, Biochemistry Department, College of Science, King Saud University, Riyadh, Saudi Arabia; Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630 003, Tamil Nadu, India
- Ravina Khandelwal
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore-452010, Madhya Pradesh, India
- Maddala Madhavi
- Department of Zoology, Nizam College, Osmania University, Hyderabad-500001, Telangana State, India
- Chandrabose Selvaraj
- Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630 003, Tamil Nadu, India
- Umesh Panwar
- Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630 003, Tamil Nadu, India
- Khushboo Sharma
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore-452010, Madhya Pradesh, India
- Tajamul Hussain
- Center of Excellence in Biotechnology Research, College of Science, King Saud University, Riyadh, Saudi Arabia; Research Chair for Biomedical Applications of Nanomaterials, Biochemistry Department, College of Science, King Saud University, Riyadh, Saudi Arabia
- Sanjeev Kumar Singh
- Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630 003, Tamil Nadu, India
4
Baker CM, Kidley NJ, Papachristos K, Hotson M, Carson R, Gravestock D, Pouliot M, Harrison J, Dowling A. Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry. J Chem Inf Model 2020; 60:3781-3791. [PMID: 32644790 DOI: 10.1021/acs.jcim.0c00232] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
Abstract
Databases of small, potentially bioactive molecules are ubiquitous across the industry and academia. Designed such that each unique compound should appear only once, the multiplicity of ways in which many compounds can be represented means that these databases require methods for standardizing the representation of chemistry. This is commonly achieved through the use of "Chemistry Business Rules", sets of predefined rules that describe the "house style" of the database in question. At Syngenta, the historical approach to the design of chemistry business rules has been to focus on consistency of representation, with chemical relevance given secondary consideration. In this work, we overturn that convention. Through the use of quantum chemistry calculations, we define a set of chemistry business rules for tautomer standardization that reproduces gas-phase energetic preferences. We go on to show that, compared to our historic approach, this method yields tautomers that are in better agreement with those observed experimentally in condensed phases and that are better suited for use in predictive models.
Affiliation(s)
- Christopher M Baker
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
- Nathan J Kidley
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
- Matthew Hotson
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
- Rob Carson
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
- David Gravestock
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
- Martin Pouliot
- Syngenta Crop Protection, Schaffhauserstrasse, Stein CH-4332, Switzerland
- Jim Harrison
- Datacraft Technologies, 110 Parkwood Place, Anstead, QLD 4070, Australia
- Alan Dowling
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
5
Hähnke VD, Kim S, Bolton EE. PubChem chemical structure standardization. J Cheminform 2018; 10:36. [PMID: 30097821 PMCID: PMC6086778 DOI: 10.1186/s13321-018-0293-8] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 04/18/2018] [Accepted: 08/01/2018] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND PubChem is a chemical information repository, consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored into Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes them for their success rates, reasons that cause structures to be rejected, and modifications applied to structures during the standardization process. Furthermore, the PubChem standardization is compared to the structure normalization of the IUPAC International Chemical Identifier (InChI) software, as manifested by conversion of the InChI back into a chemical structure. RESULTS The observed rejection rate for substances processed by PubChem standardization was 0.36%, which is predominantly attributed to structures with invalid atom valences that cannot be readily corrected without additional information from contributors. Of all structures that pass standardization, 44% are modified in the process, reducing the count of unique structures from 53,574,724 in substance to 45,808,881 in compound as identified by de-aromatized canonical isomeric SMILES. Even though the processing time is very low on average (only 0.4% of structures have individual standardization time above 0.1 s), total standardization time is completely dominated by edge cases: 90% of the time to standardize all structures in PubChem substance is spent on the 2.05% of structures with the highest individual standardization time. It is worth noting that 60% of the structures obtained from PubChem structure standardization are not identical to the chemical structure resulting from the InChI (primarily due to preferences for a different tautomeric form). 
CONCLUSIONS Standardization of chemical structures is complicated by the diversity of chemical information and its representation approaches. The PubChem standardization is an effective and efficient tool to account for molecular diversity and to eliminate invalid/incomplete structures. Further development will concentrate on improved tautomer consideration and an expanded stereocenter definition. Modifications are difficult to thoroughly validate, with slight changes often affecting many thousands of structures and various edge cases. The PubChem structure standardization service is accessible as a public resource (https://pubchem.ncbi.nlm.nih.gov/standardize), and via programmatic interfaces.
Affiliation(s)
- Volker D. Hähnke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA
- Present Address: European Patent Office, Patentlaan 2, 2288 EE Rijswijk, The Netherlands
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA
- Evan E. Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA
6
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
7
Galgonek J, Hurt T, Michlíková V, Onderka P, Schwarz J, Vondrášek J. Advanced SPARQL querying in small molecule databases. J Cheminform 2016; 8:31. [PMID: 27275187 PMCID: PMC4893829 DOI: 10.1186/s13321-016-0144-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/06/2015] [Accepted: 05/25/2016] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. RESULTS We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. CONCLUSIONS Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nám. 2, 166 10 Prague 6, Czech Republic
- Tomáš Hurt
- Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Vendula Michlíková
- Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Petr Onderka
- Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Jan Schwarz
- Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nám. 2, 166 10 Prague 6, Czech Republic
8
Warr WA. Many InChIs and quite some feat. J Comput Aided Mol Des 2015; 29:681-94. [PMID: 26081259 DOI: 10.1007/s10822-015-9854-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/28/2015] [Accepted: 06/10/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, Holmes Chapel, Crewe, Cheshire, CW4 7HZ, UK
9
Galgonek J, Vondrášek J. On InChI and evaluating the quality of cross-reference links. J Cheminform 2014; 6:15. [PMID: 24742140 PMCID: PMC4005828 DOI: 10.1186/1758-2946-6-15] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/30/2013] [Accepted: 03/25/2014] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. Another approach is to use the manually curated links stored directly in databases. This study employs well-established InChI identifiers to measure the consistency and completeness of the manually curated links by comparing them with the automatically generated ones. RESULTS We used two different tools to generate InChI identifiers and observed some ambiguities in their outputs. In part, these ambiguities were caused by indistinctness in interpretation of the structural data used. InChI identifiers were used successfully to find duplicate entries in databases. We found that the InChI inconsistencies in the manually curated links are very high (28.85% in the worst case). Even using a weaker definition of consistency, the measured values were very high in general. The completeness of the manually curated links was also very poor (only 93.8% in the best case) compared with that of the automatically generated links. CONCLUSIONS We observed several problems with the InChI tools and the files used as their inputs. There are large gaps in the consistency and completeness of manually curated links if they are measured using InChI identifiers. However, inconsistency can be caused both by errors in manually curated links and the inherent limitations of the InChI method.
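The idea of using InChI identifiers both to find duplicate entries and to generate cross-reference links automatically can be sketched as a simple grouping step. The record layout, database names, entry IDs, and function names below are hypothetical and chosen only for illustration; the study's actual tooling and consistency definitions are not described in the abstract.

```python
from collections import defaultdict


def group_by_inchi(records):
    """Group (database, entry_id, inchi) records by their InChI string."""
    groups = defaultdict(list)
    for database, entry_id, inchi in records:
        groups[inchi].append((database, entry_id))
    return dict(groups)


def duplicates_within(groups):
    """Return InChIs that map to more than one entry in the same database."""
    result = {}
    for inchi, entries in groups.items():
        per_db = defaultdict(list)
        for database, entry_id in entries:
            per_db[database].append(entry_id)
        dups = {db: ids for db, ids in per_db.items() if len(ids) > 1}
        if dups:
            result[inchi] = dups
    return result


def cross_links(groups, db_a, db_b):
    """Automatically generated links: entry pairs from db_a and db_b sharing an InChI."""
    links = set()
    for entries in groups.values():
        in_a = [eid for db, eid in entries if db == db_a]
        in_b = [eid for db, eid in entries if db == db_b]
        links.update((a, b) for a in in_a for b in in_b)
    return links


# Hypothetical records: two databases, ethanol and methanol.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"
records = [
    ("A", "a1", ETHANOL),
    ("A", "a2", ETHANOL),
    ("B", "b1", ETHANOL),
    ("B", "b2", METHANOL),
]
groups = group_by_inchi(records)
auto_links = cross_links(groups, "A", "B")
```

A set of manually curated links could then be compared with `auto_links` using ordinary set operations, for example `curated - auto_links` for links that the InChI comparison does not support.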
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nam. 2, 166 10 Prague 6, Czech Republic
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nam. 2, 166 10 Prague 6, Czech Republic
10
Martin E, Monge A, Duret JA, Gualandi F, Peitsch MC, Pospisil P. Building an R&D chemical registration system. J Cheminform 2012; 4:11. [PMID: 22650418 PMCID: PMC3430593 DOI: 10.1186/1758-2946-4-11] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 02/13/2012] [Accepted: 05/11/2012] [Indexed: 11/16/2022] Open
Abstract
Small molecule chemistry is of central importance to a number of R&D companies in diverse areas such as the pharmaceutical, nutraceutical, food flavoring, and cosmeceutical industries. In order to store and manage thousands of chemical compounds in such an environment, we have built a state-of-the-art master chemical database with unique structure identifiers. Here, we present the concept and methodology we used to build the system that we call the Unique Compound Database (UCD). In the UCD, each molecule is registered only once (uniqueness), structures with alternative representations are entered in a uniform way (normalization), and the chemical structure drawings are recognizable to chemists and to a cartridge. In brief, structural molecules are entered as neutral entities which can be associated with a salt. The salts are listed in a dictionary and bound to the molecule with the appropriate stoichiometric coefficient in an entity called “substance”. The substances are associated with batches. Once a molecule is registered, some properties (e.g., ADMET prediction, IUPAC name, chemical properties) are calculated automatically. The UCD has both automated and manual data controls. Moreover, the UCD concept enables the management of user errors in the structure entry by reassigning or archiving the batches. It also allows updating of the records to include newly discovered properties of individual structures. As our research spans a wide variety of scientific fields, the database enables registration of mixtures of compounds, enantiomers, tautomers, and compounds with unknown stereochemistries.
Affiliation(s)
- Elyette Martin
- Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland