1
Mora JR, Marquez EA, Pérez-Pérez N, Contreras-Torres E, Perez-Castillo Y, Agüero-Chapin G, Martinez-Rios F, Marrero-Ponce Y, Barigye SJ. Rethinking the applicability domain analysis in QSAR models. J Comput Aided Mol Des 2024; 38:9. [PMID: 38351144 DOI: 10.1007/s10822-024-00550-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 02/05/2024] [Indexed: 02/16/2024]
Abstract
Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.
Affiliation(s)
- Jose R Mora
- Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y Vía Interoceánica, Quito, 170901, Ecuador
- Edgar A Marquez
- Grupo de Investigaciones en Química Y Biología, Departamento de Química Y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla, 081007, Colombia
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Cátedras Conacyt, Ensenada, Baja California, México
- Noel Pérez-Pérez
- Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- Ernesto Contreras-Torres
- Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
- Yunierkis Perez-Castillo
- Bio-Chemoinformatics Research Group, Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, 170504, Ecuador
- Guillermin Agüero-Chapin
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, Porto, 4450-208, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, Porto, 4169-007, Portugal
- Felix Martinez-Rios
- Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
- Yovani Marrero-Ponce
- Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
- Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador
- Stephen J Barigye
- Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), Madrid, 28049, Spain.
2
Abrahamsson D, Siddharth A, Robinson JF, Soshilov A, Elmore S, Cogliano V, Ng C, Khan E, Ashton R, Chiu WA, Fung J, Zeise L, Woodruff TJ. Modeling the transplacental transfer of small molecules using machine learning: a case study on per- and polyfluorinated substances (PFAS). J Expo Sci Environ Epidemiol 2022; 32:808-819. [PMID: 36207486 PMCID: PMC9742309 DOI: 10.1038/s41370-022-00481-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Revised: 09/14/2022] [Accepted: 09/15/2022] [Indexed: 05/10/2023]
Abstract
BACKGROUND Despite their large numbers and widespread use, very little is known about the extent to which per- and polyfluoroalkyl substances (PFAS) can cross the placenta and expose the developing fetus. OBJECTIVE The aim of our study is to develop a computational approach that can be used to evaluate the extent to which small molecules, and in particular PFAS, can cross the placenta and partition to cord blood. METHODS We collected experimental values of the concentration ratio between cord and maternal blood (RCM) for 260 chemical compounds and calculated their physicochemical descriptors using the cheminformatics package Mordred. We used the compiled database to train and test an artificial neural network (ANN), and then applied the best performing model to predict RCM for a large dataset of PFAS chemicals (n = 7982). Finally, we examined the calculated physicochemical descriptors of the chemicals to identify which properties correlated significantly with RCM. RESULTS We determined that 7855 compounds were within the applicability domain and 127 compounds were outside the applicability domain of our model. Our predictions of RCM for PFAS suggested that 3623 compounds had a log RCM > 0, indicating preferable partitioning to cord blood. Some examples of these compounds were bisphenol AF, 2,2-bis(4-aminophenyl)hexafluoropropane, and nonafluoro-tert-butyl 3-methylbutyrate. SIGNIFICANCE These observations have important public health implications as many PFAS have been shown to interfere with fetal development. In addition, as these compounds are highly persistent and many of them can readily cross the placenta, they are expected to remain in the population for a long time as they are being passed from parent to offspring. IMPACT Understanding the behavior of chemicals in the human body during pregnancy is critical in preventing harmful exposures during critical periods of development.
Many chemicals can cross the placenta and expose the fetus; however, the mechanism by which this transport occurs is not well understood. In our study, we developed a machine learning model that describes the transplacental transfer of chemicals as a function of their physicochemical properties. The model was then used to make predictions for a set of 7982 per- and polyfluorinated alkyl substances that are listed on EPA's CompTox Chemicals Dashboard. The model can be applied to make predictions for other chemical categories of interest, such as plasticizers and pesticides. Accurate predictions of RCM can help scientists and regulators to prioritize chemicals that have the potential to cause harm by exposing the fetus.
Affiliation(s)
- Dimitri Abrahamsson
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA.
- Adi Siddharth
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA
- Joshua F Robinson
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA
- Anatoly Soshilov
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
- Sarah Elmore
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
- Vincent Cogliano
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
- Carla Ng
- Department of Civil and Environmental Engineering, University of Pittsburgh, 3700 O'Hara St, Pittsburgh, PA, 15261, USA
- Elaine Khan
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
- Randolph Ashton
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, 330 N Orchard St, Madison, WI, 53715, USA
- The Stem Cell and Regenerative Medicine Center, University of Wisconsin, Madison, 1111 Highland Avenue, Madison, WI, 53705, USA
- Department of Biomedical Engineering, University of Wisconsin - Madison, 1550 Engineering Drive, Madison, WI, 53706, USA
- Weihsueh A Chiu
- Department of Veterinary Physiology and Pharmacology, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, 77843, USA
- Jennifer Fung
- Department of Obstetrics, Gynecology, and Reproductive Science and the Center of Reproductive Science, University of California, San Francisco, San Francisco, CA, 94143-2240, USA
- Lauren Zeise
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
- Tracey J Woodruff
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA.
3
Rim KT. In silico prediction of toxicity and its applications for chemicals at work. Toxicol Environ Health Sci 2020; 12:191-202. [PMID: 32421081 PMCID: PMC7223298 DOI: 10.1007/s13530-020-00056-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Accepted: 04/21/2020] [Indexed: 04/14/2023]
Abstract
OBJECTIVE AND METHODS This study reviewed the concept of in silico prediction of chemical toxicity for prevention of occupational cancer and future prospects in workers' health. In this review, a new approach to determine the credibility of in silico predictions with raw data is explored, and the method of determining the confidence level of evaluation based on the credibility of data is discussed. I searched various papers and books related to the in silico prediction of chemical toxicity and carcinogenicity. The intention was to utilize the most recent reports after 2015 regarding in silico prediction. RESULTS AND CONCLUSION The application of in silico methods is increasing with the prediction of toxic risks to humans and the environment. The various toxic effects of industrial chemicals have triggered the recognition of the importance of using a combination of in silico models in risk assessments. In silico occupational exposure models, industrial accidents, and occupational cancers are effectively managed and chemicals evaluated. It is important to identify and manage hazardous substances proactively through the rigorous evaluation of chemicals.
Affiliation(s)
- Kyung-Taek Rim
- Chemicals Research Bureau, Occupational Safety and Health Research Institute, Korea Occupational Safety and Health Agency, Daejeon, Korea
4
Abstract
Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The application of unsupervised methods, such as Kohonen's self-organizing maps and related approaches, which allow for combining predictions with data analysis and visualization, is also considered. The necessity of applying a wide range of machine learning methods in computational toxicology is underlined.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russian Federation.
- Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russian Federation.
5
Schleicher J, Conrad T, Gustafsson M, Cedersund G, Guthke R, Linde J. Facing the challenges of multiscale modelling of bacterial and fungal pathogen-host interactions. Brief Funct Genomics 2017; 16:57-69. [PMID: 26857943 PMCID: PMC5439285 DOI: 10.1093/bfgp/elv064] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]
Abstract
Recent and rapidly evolving progress on high-throughput measurement techniques and computational performance has led to the emergence of new disciplines, such as systems medicine and translational systems biology. At the core of these disciplines lies the desire to produce multiscale models: mathematical models that integrate multiple scales of biological organization, ranging from molecular, cellular and tissue models to organ, whole-organism and population scale models. Using such models, hypotheses can systematically be tested. In this review, we present state-of-the-art multiscale modelling of bacterial and fungal infections, considering both the pathogen and host as well as their interaction. Multiscale modelling of the interactions of bacteria, especially Mycobacterium tuberculosis, with the human host is quite advanced. In contrast, models for fungal infections are still in their infancy, in particular regarding infections with the most important human pathogenic fungi, Candida albicans and Aspergillus fumigatus. We reflect on the current availability of computational approaches for multiscale modelling of host-pathogen interactions and point out current challenges. Finally, we provide an outlook for future requirements of multiscale modelling.
Affiliation(s)
- Jörg Linde
- Corresponding author: Jörg Linde, Leibniz Institute for Natural Product Research and Infection Biology—Hans Knöll Institute, Jena, Germany. Tel.: +49-3641-532-1290; E-mail:
6
Ford KA. Refinement, Reduction, and Replacement of Animal Toxicity Tests by Computational Methods. ILAR J 2017; 57:226-233. [DOI: 10.1093/ilar/ilw031] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 03/11/2016] [Revised: 10/12/2016] [Indexed: 12/16/2022]
7
Abstract
It is widely accepted that modern QSAR began in the early 1960s. However, as long ago as 1816 scientists were making predictions about physical and chemical properties. The first investigations into the correlation of biological activities with physicochemical properties such as molecular weight and aqueous solubility began in 1841, almost 60 years before the important work of Overton and Meyer linking aquatic toxicity to lipid-water partitioning. Throughout the 20th century QSAR progressed, though there were many lean years. In 1962 came the seminal work of Corwin Hansch and co-workers, which stimulated a huge interest in the prediction of biological activities. Initially that interest lay largely within medicinal chemistry and drug design, but in the 1970s and 1980s, with increasing ecotoxicological concerns, QSAR modelling of environmental toxicities began to grow, especially once regulatory authorities became involved. Since then QSAR has continued to expand, with over 1400 publications annually from 2011 onwards.
8
Abstract
INTRODUCTION Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. AREAS COVERED In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed, including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. EXPERT OPINION Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It's anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- David Winkler
- CSIRO Manufacturing, Clayton, VIC, Australia
- Monash Institute for Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia
- Latrobe Institute for Molecular Science, Bundoora, VIC, Australia
- School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA, Australia
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Neuherberg, Germany
- BigChem GmbH, Neuherberg, Germany
9
Venkiteshwaran K, Bocher B, Maki J, Zitomer D. Relating Anaerobic Digestion Microbial Community and Process Function. Microbiol Insights 2016; 8:37-44. [PMID: 27127410 PMCID: PMC4841157 DOI: 10.4137/mbi.s33593] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 08/26/2015] [Revised: 01/19/2016] [Accepted: 01/25/2016] [Indexed: 01/01/2023]
Abstract
Anaerobic digestion (AD) involves a consortium of microorganisms that convert substrates into biogas containing methane for renewable energy. The technology has suffered from the perception of being periodically unstable due to limited understanding of the relationship between microbial community structure and function. The emphasis of this review is to describe microbial communities in digesters and quantitative and qualitative relationships between community structure and digester function. Progress has been made in the past few decades to identify key microorganisms influencing AD. Yet, more work is required to realize robust, quantitative relationships between microbial community structure and functions such as methane production rate and resilience after perturbations. Other promising areas of research for improved AD may include methods to increase/control (1) hydrolysis rate, (2) direct interspecies electron transfer to methanogens, (3) community structure-function relationships of methanogens, (4) methanogenesis via acetate oxidation, and (5) bioaugmentation to study community-activity relationships or improve engineered bioprocesses.
Affiliation(s)
- Kaushik Venkiteshwaran
- Department of Civil, Construction and Environmental Engineering, Marquette University, Milwaukee, WI, USA
- James Maki
- Department of Biological Sciences, Marquette University, Milwaukee, WI, USA
- Daniel Zitomer
- Department of Civil, Construction and Environmental Engineering, Marquette University, Milwaukee, WI, USA