1. Duke R, McCoy R, Risko C, Bursten JRS. Promises and Perils of Big Data: Philosophical Constraints on Chemical Ontologies. J Am Chem Soc 2024; 146:11579-11591. [PMID: 38640489 DOI: 10.1021/jacs.3c11399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2024]
Abstract
Chemistry is experiencing a paradigm shift in the way it interacts with data. So-called "big data" are collected and used at unprecedented scales with the idea that algorithms can be designed to aid in chemical discovery. As data-enabled practices become ever more ubiquitous, chemists must consider the organization and curation of their data, especially as it is presented to both humans and increasingly intelligent algorithms. One of the most promising organizational schemes for big data is a construct termed an ontology. In data science, ontologies are systems that represent relations among objects and properties in a domain of discourse. As chemistry encounters larger and larger data sets, the ontologies that support chemical research will likewise increase in complexity, and the future of chemistry will be shaped by the choices made in developing big data chemical ontologies. How such ontologies will work should therefore be a subject of significant attention in the chemical community. Now is the time for chemists to ask questions about ontology design and use: How should chemical data be organized? What can be reasonably expected from an organizational structure? Is a universal ontology tenable? As some of these questions may be new to chemists, we recommend an interdisciplinary approach that draws on the long history of philosophers of science asking questions about the organization of scientific concepts, constructs, models, and theories. This Perspective presents insights from these long-standing studies and initiates new conversations between chemists and philosophers.
Affiliation(s)
- Rebekah Duke
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States
- Ryan McCoy
  - Department of Philosophy, University of Kentucky, Lexington, Kentucky 40508, United States
- Chad Risko
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States
- Julia R S Bursten
  - Department of Philosophy, University of Kentucky, Lexington, Kentucky 40508, United States

2. Kondinski A, Bai J, Mosbach S, Akroyd J, Kraft M. Knowledge Engineering in Chemistry: From Expert Systems to Agents of Creation. Acc Chem Res 2022; 56:128-139. [PMID: 36516456 PMCID: PMC9850921 DOI: 10.1021/acs.accounts.2c00617] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Passing knowledge from human to human is a natural process that has continued since the beginning of humankind. Over the past few decades, we have witnessed that knowledge is no longer passed only between humans but also from humans to machines. The latter form of knowledge transfer represents a cornerstone in artificial intelligence (AI) and lays the foundation for knowledge engineering (KE). In order to pass knowledge to machines, humans need to structure, formalize, and make knowledge machine-readable. Subsequently, humans also need to develop software that emulates their decision-making process. In order to engineer chemical knowledge, chemists are often required to challenge their understanding of chemistry and thinking processes, which may help improve the structure of chemical knowledge. Knowledge engineering in chemistry dates from the development of expert systems that emulated the thinking process of analytical and organic chemists. Since then, many different expert systems employing rather limited knowledge bases have been developed, solving problems in retrosynthesis, analytical chemistry, chemical risk assessment, etc. However, toward the end of the 20th century, the AI winters slowed down the development of expert systems for chemistry. At the same time, the increasing complexity of chemical research, alongside the limitations of the available computing tools, made it difficult for many chemistry expert systems to keep pace. In the past two decades, the semantic web, the popularization of object-oriented programming, and the increase in computational power have revitalized knowledge engineering. Knowledge formalization through ontologies has become commonplace, triggering the subsequent development of knowledge graphs and cognitive software agents. These tools enable interoperability, the representation of more complex systems, inference capabilities, and the synthesis of new knowledge. This Account introduces the history, the core principles of KE, and its applications within the broad realm of chemical research and engineering. In this regard, we first discuss how chemical knowledge is formalized and how a chemist's cognition can be emulated with the help of reasoning algorithms. Following this, we discuss various applications of knowledge graph and agent technology used to solve problems in chemistry related to molecular engineering, chemical mechanisms, multiscale modeling, automation of calculations and experiments, and chemist-machine interactions. These developments are discussed in the context of a universal and dynamic knowledge ecosystem, referred to as The World Avatar (TWA).
Affiliation(s)
- Aleksandar Kondinski
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Jiaru Bai
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Sebastian Mosbach
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, 138602 Singapore
- Jethro Akroyd
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, U.K.
- Markus Kraft
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
  - CARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, 138602 Singapore
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459 Singapore

3. Gallmetzer JM, Kröll S, Werner D, Wielend D, Irimia-Vladu M, Portenkirchner E, Sariciftci NS, Hofer TS. Anthraquinone and its derivatives as sustainable materials for electrochemical applications - a joint experimental and theoretical investigation of the redox potential in solution. Phys Chem Chem Phys 2022; 24:16207-16219. [PMID: 35757985 PMCID: PMC9258729 DOI: 10.1039/d2cp01717b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Anthraquinone (AQ) has long been identified as a highly promising lead structure for various applications in organic electronics. Considering the enormous number of possible substitution patterns of the AQ lead structure, with only a minority being commercially available, a systematic experimental screening of the associated electrochemical potentials represents a highly challenging and time-consuming task, which can be greatly accelerated via suitable virtual pre-screening techniques. In this work the calculated electrochemical reduction potentials of pristine AQ and 12 hydroxy- and/or amino-substituted AQ derivatives in N,N-dimethylformamide have been correlated against newly measured experimental data. In addition to the calculations performed using density functional theory (DFT), the performance of different semi-empirical density functional tight binding (DFTB) approaches has been critically assessed. It was shown that the SCC DFTB/3ob parametrization in conjunction with the COSMO solvation model provides a highly adequate description of the electrochemical potentials also in the case of the two-fold reduced species. While the quality of the correlation against the experimental data proved to be slightly inferior compared to the employed DFT approach, the highly advantageous cost-accuracy ratio of the SCC DFTB/3ob/COSMO framework has important implications in the formulation of hierarchical screening strategies for materials associated with organic electronics. Based on the observed performance, the low-cost method provides sufficiently accurate results to execute efficient pre-screening protocols, which may then be followed by a DFT-based refinement of the best candidate structures to facilitate a systematic search for new, high-performance organic electronic materials.
Affiliation(s)
- Josef M Gallmetzer
  - Theoretical Chemistry Division, Institute of General, Inorganic and Theoretical Chemistry, Center for Chemistry and Biomedicine, University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria
- Stefanie Kröll
  - Theoretical Chemistry Division, Institute of General, Inorganic and Theoretical Chemistry, Center for Chemistry and Biomedicine, University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria
- Daniel Werner
  - Institute of Physical Chemistry, Josef-Möller-Haus, University of Innsbruck, Innrain 52c, A-6020 Innsbruck, Austria
- Dominik Wielend
  - Linz Institute for Organic Solar Cells (LIOS), Institute of Physical Chemistry, Johannes Kepler University Linz, Altenberger Strasse 69, 4040 Linz, Austria
- Mihai Irimia-Vladu
  - Linz Institute for Organic Solar Cells (LIOS), Institute of Physical Chemistry, Johannes Kepler University Linz, Altenberger Strasse 69, 4040 Linz, Austria
- Engelbert Portenkirchner
  - Institute of Physical Chemistry, Josef-Möller-Haus, University of Innsbruck, Innrain 52c, A-6020 Innsbruck, Austria
- Niyazi Serdar Sariciftci
  - Linz Institute for Organic Solar Cells (LIOS), Institute of Physical Chemistry, Johannes Kepler University Linz, Altenberger Strasse 69, 4040 Linz, Austria
- Thomas S Hofer
  - Theoretical Chemistry Division, Institute of General, Inorganic and Theoretical Chemistry, Center for Chemistry and Biomedicine, University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria

4. Antinucci G, Dereli B, Vittoria A, Budzelaar PHM, Cipullo R, Goryunov GP, Kulyabin PS, Uborsky DV, Cavallo L, Ehm C, Voskoboynikov AZ, Busico V. Selection of Low-Dimensional 3-D Geometric Descriptors for Accurate Enantioselectivity Prediction. ACS Catal 2022. [DOI: 10.1021/acscatal.2c00976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Giuseppe Antinucci
  - Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Busra Dereli
  - Catalysis Research Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Antonio Vittoria
  - Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Peter H. M. Budzelaar
  - Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Roberta Cipullo
  - Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Georgy P. Goryunov
  - Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Pavel S. Kulyabin
  - Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Dmitry V. Uborsky
  - Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Luigi Cavallo
  - Catalysis Research Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Christian Ehm
  - Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Alexander Z. Voskoboynikov
  - Department of Chemistry, Lomonosov Moscow State University, 1/3 Leninskie Gory, 119991 Moscow, Russia
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands
- Vincenzo Busico
  - Dipartimento di Scienze Chimiche, Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
  - DPI, P.O. Box 902, 5600 AX Eindhoven, the Netherlands

5. Large-scale comparison between the diffraction-component precision indexes favors Cruickshank’s Rfree function. JOURNAL OF THE SERBIAN CHEMICAL SOCIETY 2022. [DOI: 10.2298/jsc200518076a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
This study aims to provide a first large-scale comparison between the various diffraction-component precision index (DPI) equations, assess the applicability of the parameter, and make recommendations on DPI computation. The DPI estimates the average accuracy of the atomic coordinates obtained by the structural refinement of protein diffraction data, with application in crystallography and cheminformatics. Although Cruickshank and Blow proposed DPI equations based on R and Rfree, DPI values remain scarcely employed in the quality assessment of Protein Data Bank (PDB) files, owing to unclear data extraction protocols (to assign variables), complex equations, the lack of extensive applicability studies, and limited access to automated computations. To address these shortcomings, the entire RCSB PDB database was evaluated using Cruickshank's and Blow's R and Rfree DPI variations. Computations on 143070 X-ray structures indicate that Rfree-based DPI equations apply to 30% more protein structures than R-based DPI equations, with Cruickshank's Rfree-based DPI (CRF) exceeding the number of successful Blow's Rfree-based DPI (BRF) computations. Although our results indicate that, in general, resolutions < 2 Å assure consistency among the various DPI computations (differences < 0.05 Å), we recommend the use of the CRF DPI because of its wider applicability.

6. Ohms J. Current methodologies for chemical compound searching in patents: A case study. WORLD PATENT INFORMATION 2021. [DOI: 10.1016/j.wpi.2021.102055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]

7. López-López E, Bajorath J, Medina-Franco JL. Informatics for Chemistry, Biology, and Biomedical Sciences. J Chem Inf Model 2020; 61:26-35. [PMID: 33382611 DOI: 10.1021/acs.jcim.0c01301] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/18/2022]
Abstract
Informatics is growing across disciplines, impacting several areas of chemistry, biology, and biomedical sciences. Besides the well-established bioinformatics discipline, other informatics-based interdisciplinary fields have been evolving over time, such as chemoinformatics and biomedical informatics. Other related research areas such as pharmacoinformatics, food informatics, epi-informatics, materials informatics, and neuroinformatics have emerged more recently and continue to develop as independent subdisciplines. The goals and impacts of each of these disciplines have typically been separately reviewed in the literature. Hence, it remains challenging to identify commonalities and key differences. Herein, we discuss in context three major informatics disciplines in the natural and life sciences including bioinformatics, chemoinformatics, and biomedical informatics and briefly comment on related subdisciplines. We focus the discussion on the definitions, historical background, actual impact, main similarities, and differences and evaluate the dissemination and teaching of bioinformatics, chemoinformatics, and biomedical informatics.
Affiliation(s)
- Edgar López-López
  - Department of Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), Av Instituto Politécnico Nacional 2508, Mexico City 07360, Mexico
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Endenicher Allee 19c, Rheinische Friedrich-Wilhelms-Universität, D-53115 Bonn, Germany
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Av Universidad 3000, Mexico City 04510, Mexico

8.
Abstract
There is significant potential for electronic structure methods to improve the quality of the predictions furnished by the tools of computer-aided drug design, which typically rely on empirically derived functions. In this perspective, we consider some recent examples of how quantum mechanics has been applied in predicting protein-ligand geometries, protein-ligand binding affinities and ligand strain on binding. We then outline several significant developments in quantum mechanics methodology likely to influence these approaches: in particular, we note the advent of more computationally expedient ab initio quantum mechanical methods that can provide chemical accuracy for larger molecular systems than hitherto possible. We highlight the emergence of increasingly accurate semiempirical quantum mechanical methods and the associated role of machine learning and molecular databases in their development. Indeed, the convergence of improved algorithms for solving and analyzing electronic structure, modern machine learning methods, and increasingly comprehensive benchmark data sets of molecular geometries and energies provides a context in which the potential of quantum mechanics will be increasingly realized in driving future developments and applications in structure-based drug discovery.
Affiliation(s)
- Richard A Bryce
  - Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Manchester, UK

9. Guha R. Implementing cheminformatics. J Cheminform 2019; 11:12. [PMID: 30719588 PMCID: PMC6689878 DOI: 10.1186/s13321-019-0333-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2018] [Accepted: 01/14/2019] [Indexed: 11/18/2022]

10. Himanen L, Geurts A, Foster AS, Rinke P. Data-Driven Materials Science: Status, Challenges, and Perspectives. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2019; 6:1900808. [PMID: 31728276 PMCID: PMC6839624 DOI: 10.1002/advs.201900808] [Citation(s) in RCA: 138] [Impact Index Per Article: 27.6] [Received: 04/08/2019] [Revised: 06/20/2019] [Indexed: 05/06/2023]
Abstract
Data-driven science is heralded as a new paradigm in materials science. In this field, data is the new resource, and knowledge is extracted from materials datasets that are too big or complex for traditional human reasoning, typically with the intent to discover new or improved materials or materials phenomena. Multiple factors, including the open science movement, national funding, and progress in information technology, have fueled its development. Such related tools as materials databases, machine learning, and high-throughput methods are now established as parts of the materials research toolset. However, there are a variety of challenges that impede progress in data-driven materials science: data veracity, integration of experimental and computational data, data longevity, standardization, and the gap between industrial interests and academic efforts. In this perspective article, the historical development and current state of data-driven materials science, building from the early evolution of open science to the rapid expansion of materials data infrastructures, are discussed. Key successes and challenges so far are also reviewed, providing a perspective on the future development of the field.
Affiliation(s)
- Lauri Himanen
  - Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
- Amber Geurts
  - Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
  - Department of Management Studies, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
  - TNO, Netherlands Organization for Applied Scientific Research, Expertise Center for Strategy and Policy, Anna van Beurenplein 1, 2595 DA The Hague, Netherlands
- Adam Stuart Foster
  - Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
  - Graduate School Materials Science in Mainz, Staudinger Weg 9, 55128 Mainz, Germany
  - WPI Nano Life Science Institute (WPI-NanoLSI), Kanazawa University, Kakuma-machi, Kanazawa 920-1192, Japan
- Patrick Rinke
  - Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
  - Theoretical Chemistry and Catalysis Research Centre, Technische Universität München, Lichtenbergstr. 4, D-85747 Garching, Germany

11. Cheminformatics Explorations of Natural Products. PROGRESS IN THE CHEMISTRY OF ORGANIC NATURAL PRODUCTS 2019; 110:1-35. [PMID: 31621009 DOI: 10.1007/978-3-030-14632-0_1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
The chemistry of natural products is fascinating and has continuously attracted the attention of the scientific community for many reasons including, but not limited to, biosynthesis pathways, chemical diversity, the source of bioactive compounds and their marked impact on drug discovery. There is a broad range of experimental and computational techniques (molecular modeling and cheminformatics) that have evolved over the years and have assisted the investigation of natural products. Herein, we discuss cheminformatics strategies to explore the chemistry and applications of natural products. Since the potential synergisms between cheminformatics and natural products are vast, we will focus on three major aspects: (1) exploration of the chemical space of natural products to identify bioactive compounds, with emphasis on drug discovery; (2) assessment of the toxicity profile of natural products; and (3) diversity analysis of natural product collections and the design of chemical collections inspired by natural sources.

12.
Abstract
The Chemical Information Science Gateway (CISG) of F1000Research was originally conceptualized as a forum for high-quality publications in chemical information science (CIS) including chemoinformatics. Adding a publication venue with open access and open peer review to the CIS field was a prime motivation for the introduction of CISG, aiming to support open science in this area. Herein, the CISG concept is revisited and the development of the gateway over the past four years is reviewed. In addition, opportunities are discussed to better position CISG within the publication spectrum of F1000Research and further increase its visibility and attractiveness for scientific contributions.

13. Duarte Y, Márquez-Miranda V, Miossec MJ, González-Nilo F. Integration of target discovery, drug discovery and drug delivery: A review on computational strategies. WILEY INTERDISCIPLINARY REVIEWS-NANOMEDICINE AND NANOBIOTECHNOLOGY 2019; 11:e1554. [PMID: 30932351 DOI: 10.1002/wnan.1554] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/05/2017] [Revised: 12/14/2018] [Accepted: 01/23/2019] [Indexed: 12/22/2022]
Abstract
Most of the computational tools involved in drug discovery developed during the 1980s were largely based on computational chemistry, quantitative structure-activity relationship (QSAR) and cheminformatics. Subsequently, the advent of genomics in the 2000s gave rise to a huge number of databases and computational tools developed to analyze large quantities of data, through bioinformatics, to obtain valuable information about the genomic regulation of different organisms. Target identification and validation is a long process during which evidence for and against a target is accumulated in the pursuit of developing new drugs. Finally, the drug delivery system appears as a novel approach to improve drug targeting and release into the cells, leading to new opportunities to improve drug efficiency and avoid potential secondary effects. In each area (target discovery, drug discovery and drug delivery), different computational strategies are being developed to accelerate the process of selection and discovery of new tools to be applied to different scientific fields. Research on these three topics is growing rapidly, but a global view of this landscape is still required to identify the most challenging bottlenecks and how computational tools could be integrated in each topic. This review describes the current state of the art in computational strategies for target discovery, drug discovery and drug delivery and how these fields could be integrated. Finally, we discuss the current needs in these fields and how the continuous development of databases and computational tools will affect the improvement of those areas. This article is categorized under: Therapeutic Approaches and Drug Discovery > Emerging Technologies; Therapeutic Approaches and Drug Discovery > Nanomedicine for Infectious Disease; Nanotechnology Approaches to Biology > Nanoscale Systems in Biology.
Affiliation(s)
- Yorley Duarte
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- Valeria Márquez-Miranda
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- Matthieu J Miossec
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- Fernando González-Nilo
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
  - Centro Interdisciplinario de Neurociencias de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile

14. Zahrt AF, Denmark SE. Evaluating continuous chirality measure as a 3D descriptor in chemoinformatics applied to asymmetric catalysis. Tetrahedron 2019; 75:1841-1851. [PMID: 31983782 PMCID: PMC6980240 DOI: 10.1016/j.tet.2019.02.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Continuous Chirality Measure (CCM) is a computational metric by which to quantify the chirality of a compound. In enantioselective catalysis, prior work has postulated that CCM is correlated to selectivity and can be used to understand which structural features dictate catalyst efficacy. Herein, the investigation of CCM as a metric capable of guiding catalyst optimization is explored. Conformer-dependent CCM is also explored. Finally, CCM is used with Sterimol parameters to significantly improve the performance of Random Forest models.
Affiliation(s)
- Scott E. Denmark
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA

15. From chemoinformatics to deep learning: an open road to drug discovery. Future Med Chem 2019; 11:371-374. [DOI: 10.4155/fmc-2018-0449] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]

16. Zahrt AF, Henle JJ, Rose BT, Wang Y, Darrow WT, Denmark SE. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 2019; 363:eaau5631. [PMID: 30655414 DOI: 10.1126/science.aau5631] [Citation(s) in RCA: 236] [Impact Index Per Article: 47.2] [Received: 06/22/2018] [Accepted: 12/03/2018] [Indexed: 12/18/2022]
Abstract
Catalyst design in asymmetric reaction development has traditionally been driven by empiricism, wherein experimentalists attempt to qualitatively recognize structural patterns to improve selectivity. Machine learning algorithms and chemoinformatics can potentially accelerate this process by recognizing otherwise inscrutable patterns in large datasets. Herein we report a computationally guided workflow for chiral catalyst selection using chemoinformatics at every stage of development. Robust molecular descriptors that are agnostic to the catalyst scaffold allow for selection of a universal training set on the basis of steric and electronic properties. This set can be used to train machine learning methods to make highly accurate predictive models over a broad range of selectivity space. Using support vector machines and deep feed-forward neural networks, we demonstrate accurate predictive modeling in the chiral phosphoric acid-catalyzed thiol addition to N-acylimines.
Affiliation(s)
- Andrew F Zahrt
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Jeremy J Henle
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Brennan T Rose
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Yang Wang
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- William T Darrow
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA
- Scott E Denmark
  - Roger Adams Laboratory, Department of Chemistry, University of Illinois, Urbana, IL 61801, USA

17. Grimme S, Schreiner PR. Computerchemie: das Schicksal aktueller Methoden und zukünftige Herausforderungen [Computational Chemistry: The Fate of Current Methods and Future Challenges]. Angew Chem Int Ed Engl 2017. [DOI: 10.1002/ange.201709943] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/17/2023]
Affiliation(s)
- Stefan Grimme
  - Mulliken Center for Theoretical Chemistry, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
- Peter R. Schreiner
  - Institut für Organische Chemie, Justus-Liebig-Universität, Heinrich-Buff-Ring 17, 35392 Gießen, Germany
18
|
Grimme S, Schreiner PR. Computational Chemistry: The Fate of Current Methods and Future Challenges. Angew Chem Int Ed Engl 2017; 57:4170-4176. [PMID: 29105929 DOI: 10.1002/anie.201709943] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 09/26/2017] [Indexed: 11/12/2022]
Abstract
"Where do we go from here?" is the underlying question regarding the future (perhaps foreseeable) developments in computational chemistry. Although this young discipline has already permeated practically all of chemistry, it is likely to become even more powerful with the rapid development of computational hard- and software.
Affiliation(s)
- Stefan Grimme
  - Mulliken Center for Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115, Bonn, Germany
- Peter R Schreiner
  - Institute of Organic Chemistry, Justus-Liebig University, Heinrich-Buff-Ring 17, 35392, Giessen, Germany
19
|
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
  - Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
  - Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
  - ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain
  - Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain
  - CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
  - Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
  - Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain
  - Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
20
|
Jacob PM, Lan T, Goodman JM, Lapkin AA. A possible extension to the RInChI as a means of providing machine readable process data. J Cheminform 2017; 9:23. [PMID: 29086180 PMCID: PMC5388667 DOI: 10.1186/s13321-017-0210-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/24/2016] [Accepted: 04/01/2017] [Indexed: 12/21/2022] [Open Access]
Abstract
The algorithmic, large-scale use and analysis of reaction databases such as Reaxys is currently hindered by the absence of widely adopted standards for publishing reaction data in machine readable formats. Crucial data such as yields of all products or stoichiometry are frequently not explicitly stated in the published papers and, hence, not reported in the database entry for those reactions, limiting their usefulness for algorithmic analysis. This paper presents a possible extension to the IUPAC RInChI standard via an auxiliary layer, termed ProcAuxInfo, which is a standardised, extensible form in which to report certain key reaction parameters such as declaration of all products and reactants as well as auxiliaries known in the reaction, reaction stoichiometry, amounts of substances used, conversion, yield and operating conditions. The standard is demonstrated via creation of the RInChI including the ProcAuxInfo layer based on three published reactions and demonstrates accurate data recoverability via reverse translation of the created strings. Implementation of this or another method of reporting process data by the publishing community would ensure that databases, such as Reaxys, would be able to abstract crucial data for big data analysis of their contents.
Affiliation(s)
- Philipp-Maximilian Jacob
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Tian Lan
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Alexei A. Lapkin
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
21
|
Mussa HY, Mitchell JBO, Glen RC. A note on utilising binary features as ligand descriptors. J Cheminform 2015; 7:58. [PMID: 26628925 PMCID: PMC4665894 DOI: 10.1186/s13321-015-0105-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2015] [Accepted: 11/11/2015] [Indexed: 11/28/2022] [Open Access]
Abstract
It is common in cheminformatics to represent the properties of a ligand as a string of 1’s and 0’s, with the intention of elucidating, inter alia, the relationship between the chemical structure of a ligand and its bioactivity. In this commentary we note that, where relevant but non-redundant features are binary, they inevitably lead to a classifier capable of capturing only a linear relationship between structural features and activity. If, instead, we were to use relevant but non-redundant real-valued features, the resulting predictive model would be capable of describing a non-linear structure-activity relationship. Hence, we suggest that real-valued features, where available, are to be preferred in this scenario.
Affiliation(s)
- Hamse Y Mussa
  - Centre for Molecular Informatics, Department of Chemistry, Cambridge University, Lensfield Road, Cambridge, CB2 1EW, UK
  - EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, KY16 9ST, Scotland
- John B O Mitchell
  - EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, KY16 9ST, Scotland
- Robert C Glen
  - Centre for Molecular Informatics, Department of Chemistry, Cambridge University, Lensfield Road, Cambridge, CB2 1EW, UK
22
|
Abstract
The F1000Research publishing platform offers the opportunity to launch themed article collections as a part of its dynamic publication environment. The idea of article collections is further expanded through the generation of publication channels that focus on specific scientific areas or disciplines. This editorial introduces the Chemical Information Science channel of F1000Research designed to collate high-quality publications and foster a culture of open peer review. Articles will be selected by guest editor(s) and a group of experts, the channel Editorial Board, and subjected to open peer review.
Affiliation(s)
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, Bonn, D-53113, Germany
23
|
Mussa HY, Marcus D, Mitchell JBO, Glen RC. Verifying the fully "Laplacianised" posterior Naïve Bayesian approach and more. J Cheminform 2015; 7:27. [PMID: 26075027 PMCID: PMC4464057 DOI: 10.1186/s13321-015-0075-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/26/2015] [Accepted: 05/12/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the "Laplacian Corrected Modified Naïve Bayes" (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG's work and introduces a new version of the SNB classifier: "Tapered Naïve Bayes" (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.
RESULTS: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the "optimal" number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the "optimal" number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the "optimal" number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
CONCLUSIONS: The classification results obtained in this study concur with the mathematical based guidelines given in MMG's paper: that is, ignoring the role of absence of a feature out of hand does not necessarily improve classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale, on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.
Affiliation(s)
- Hamse Y Mussa
  - Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, England CB2 1EW, UK
  - EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK
- David Marcus
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, England CB10 1SD, UK
- John B O Mitchell
  - EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK
- Robert C Glen
  - Department of Chemistry, Centre for Molecular Informatics, Lensfield Road, Cambridge, England CB2 1EW, UK
24
|
Technical advances in molecular simulation since the 1980s. Arch Biochem Biophys 2015; 582:3-9. [PMID: 25772387 DOI: 10.1016/j.abb.2015.03.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/03/2015] [Revised: 03/05/2015] [Accepted: 03/06/2015] [Indexed: 12/14/2022]
Abstract
This review describes how the theory and practice of molecular simulation have evolved since the beginning of the 1980s when the author started his career in this field. The account is of necessity brief and subjective and highlights the changes that the author considers have had significant impact on his research and mode of working.
25
|
Bird CL, Frey JG. Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences. Chem Soc Rev 2014; 42:6754-76. [PMID: 23686012 DOI: 10.1039/c3cs60050e] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 01/15/2023]
Abstract
Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced has encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
Affiliation(s)
- Colin L Bird
  - Chemistry, Faculty of Natural and Environmental Sciences, University of Southampton, University Road, Highfield, Southampton SO17 1BJ, UK
26
|
Frey JG, Bird CL. Cheminformatics and the Semantic Web: adding value with linked data and enhanced provenance. Wiley Interdiscip Rev Comput Mol Sci 2013; 3:465-481. [PMID: 24432050 PMCID: PMC3884755 DOI: 10.1002/wcms.1127] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Accepted: 01/08/2013] [Indexed: 12/16/2022]
Abstract
Cheminformatics is evolving from being a field of study associated primarily with drug discovery into a discipline that embraces the distribution, management, access, and sharing of chemical data. The relationship with the related subject of bioinformatics is becoming stronger and better defined, owing to the influence of Semantic Web technologies, which enable researchers to integrate heterogeneous sources of chemical, biochemical, biological, and medical information. These developments depend on a range of factors: the principles of chemical identifiers and their role in relationships between chemical and biological entities; the importance of preserving provenance and properly curated metadata; and an understanding of the contribution that the Semantic Web can make at all stages of the research lifecycle. The movements toward open access, open source, and open collaboration all contribute to progress toward the goals of integration.
Affiliation(s)
- Jeremy G Frey
  - Chemistry, Faculty of Natural Environmental Science, University of Southampton, Highfield, Southampton, SO17 1BJ, UK
- Colin L Bird
  - Chemistry, Faculty of Natural Environmental Science, University of Southampton, Highfield, Southampton, SO17 1BJ, UK
27
|
Medina-Franco JL. Interrogating Novel Areas of Chemical Space for Drug Discovery using Chemoinformatics. Drug Dev Res 2012. [DOI: 10.1002/ddr.21034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022]
28
|
Edberg A, Soeria-Atmadja D, Bergman Laurila J, Johansson F, Gustafsson MG, Hammerling U. Assessing Relative Bioactivity of Chemical Substances Using Quantitative Molecular Network Topology Analysis. J Chem Inf Model 2012; 52:1238-49. [DOI: 10.1021/ci200429f] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
Affiliation(s)
- Anna Edberg
  - Division of Food Data, National Food Agency, SE-75126 Uppsala, Sweden
- Daniel Soeria-Atmadja
  - Division of R&D Information, AstraZeneca Research and Development, SE-15185, Södertälje, Sweden
- Fredrik Johansson
  - Division of Information Technology, National Food Agency, SE-75126 Uppsala, Sweden
- Mats G. Gustafsson
  - Division of Cancer Pharmacology and Computational Medicine, Department of Medical Sciences, Uppsala University and Uppsala Academic Hospital, SE-75185 Uppsala, Sweden
- Ulf Hammerling
  - Department of Risk Benefit Assessment, National Food Agency, SE-75126 Uppsala, Sweden
29
|
|