Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Williams AJ, Ekins S, Tkachenko V. Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discov Today 2012;17:685-701. [PMID: 22426180 DOI: 10.1016/j.drudis.2012.02.013] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Received: 11/18/2011] [Revised: 01/17/2012] [Accepted: 02/28/2012] [Indexed: 01/25/2023]

Cited by Other Article(s)

Noga M, Jurowski K. Toxicity of Bromo-DragonFLY as a New Psychoactive Substance: Application of In Silico Methods for the Prediction of Key Toxicological Parameters Important to Clinical and Forensic Toxicology. Chem Res Toxicol 2024. [PMID: 39119730 DOI: 10.1021/acs.chemrestox.4c00105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]

Kim S, Yu B, Li Q, Bolton EE. PubChem synonym filtering process using crowdsourcing. J Cheminform 2024;16:69. [PMID: 38880887 PMCID: PMC11181558 DOI: 10.1186/s13321-024-00868-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 06/09/2024] [Indexed: 06/18/2024] Open

Abstract

PubChem ( https://pubchem.ncbi.nlm.nih.gov ) is a public chemical information resource containing more than 100 million unique chemical structures. One of the most requested tasks in PubChem and other chemical databases is to search chemicals by name (also commonly called a "chemical synonym"). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors to PubChem. In addition, these synonyms are used for many purposes, including creating links between chemicals and PubMed articles (using Medical Subject Headings (MeSH) terms). However, these depositor-provided name-structure associations are subject to substantial discrepancies within and between depositors, making it difficult to unambiguously map a chemical name to a specific chemical structure. The present paper describes PubChem's crowdsourcing-based synonym filtering strategy, which resolves inter- and intra-depositor discrepancies in synonym-structure associations as well as in the chemical-MeSH associations. The PubChem synonym filtering process was developed based on the analysis of four crowd-voting strategies, which differ in the consistency threshold value employed (60% vs 70%) and how to resolve intra-depositor discrepancies (a single vote vs. multiple votes per depositor) prior to inter-depositor crowd-voting. The agreement of voting was determined at six levels of chemical equivalency, which considers varying isotopic composition, stereochemistry, and connectivity of chemical structures and their primary components. While all four strategies showed comparable results, Strategy I (one vote per depositor with a 60% consistency threshold) resulted in the most synonyms assigned to a single chemical structure as well as the most synonym-structure associations disambiguated at the six chemical equivalency contexts. 
Based on the results of this study, Strategy I was implemented in PubChem's filtering process that cleans up synonym-structure associations as well as chemical-MeSH associations. This consistency-based filtering process is designed to look for a consensus in name-structure associations but cannot attest to their correctness. As a result, it can fail to recognize correct name-structure associations (or incorrect ones), for example, when a synonym is provided by only one depositor or when many contributors are incorrect. However, this filtering process is an important starting point for quality control in name-structure associations in large chemical databases like PubChem.
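
As a rough illustration of the consistency-based voting described in this abstract, the Python sketch below collapses each depositor to a single vote and accepts a synonym-structure association only when one structure receives at least 60% of the depositor votes. The data layout, the function name, and the rule used to resolve intra-depositor discrepancies (most frequent structure, with internally tied depositors casting no vote) are assumptions made for illustration; this is not PubChem's actual implementation.

```python
from collections import Counter, defaultdict


def filter_synonym(associations, threshold=0.60):
    """Return the consensus structure for ONE synonym, or None.

    associations: iterable of (depositor, structure_id) pairs.
    threshold: minimum share of depositor votes needed for consensus.
    """
    # Intra-depositor step (hypothetical rule): each depositor casts one
    # vote for the structure it most often attaches to this synonym.
    per_depositor = defaultdict(Counter)
    for depositor, structure in associations:
        per_depositor[depositor][structure] += 1

    votes = Counter()
    for counts in per_depositor.values():
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # depositor is internally tied: no vote cast
        votes[ranked[0][0]] += 1

    if not votes:
        return None

    # Inter-depositor step: accept the top structure only if its share
    # of the votes reaches the consistency threshold.
    structure, count = votes.most_common(1)[0]
    return structure if count / sum(votes.values()) >= threshold else None
```

With three depositors of which two agree, the association passes at the 60% threshold but not at 70%. A synonym supplied by a single depositor passes trivially, which is the limitation the abstract points out: agreement is not the same as correctness.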

Eriksen CA, Andersen JL, Fagerberg R, Merkle D. Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases. J Comput Biol 2024;31:498-512. [PMID: 38758924 DOI: 10.1089/cmb.2024.0520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024] Open

Mansouri K, Moreira-Filho JT, Lowe CN, Charest N, Martin T, Tkachenko V, Judson R, Conway M, Kleinstreuer NC, Williams AJ. Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling. J Cheminform 2024;16:19. [PMID: 38378618 PMCID: PMC10880251 DOI: 10.1186/s13321-024-00814-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/10/2024] [Indexed: 02/22/2024] Open

Abstract

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative modeling QSAR projects to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. 
Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community. Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.

On the ability of machine learning methods to discover novel scaffolds. J Mol Model 2022;29:22. [PMID: 36574054 DOI: 10.1007/s00894-022-05359-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 10/21/2022] [Indexed: 12/28/2022]

Abstract

The recent advances in the application of machine learning to drug discovery have made it a 'hot topic' for research, with hundreds of academic groups and companies integrating machine learning into their drug discovery projects. Nevertheless, there remains great uncertainty regarding the most appropriate ways to evaluate the relative performance of these powerful methods against more traditional cheminformatics approaches, and many pitfalls remain for the unwary. In 2020, researchers at MIT (Stokes et al., Cell 180(4), 688-702, 2020) reported the discovery of a new compound with antibacterial activity, halicin, through the use of a neural network machine learning method. A robust ability to identify new active chemotypes through computational methods would be very useful. In this study, we have used the Stokes et al. dataset to compare the performance of this method to two other approaches, Mapping of Activity Through Dichotomic Scores (MADS) by Todeschini et al. (J Chemom 32(4):e2994, 2018) and Random Matrix Theory (RMT) by Lee et al. (Proc Natl Acad Sci 116(9):3373-3378, 2019). Our results demonstrate that all three methods are capable of predicting halicin as an active antibacterial compound, but that this result is dependent on the dataset composition, pre-processing and the molecular fingerprint used. We have further assessed overall performance as determined by several performance metrics. We also investigated the scaffold hopping potential of the methods by modifying the dataset by removal of the β-lactam and fluoroquinolone chemotypes. MADS and RMT are able to identify actives in the test set that contained these substructures. This ability arises because of high scoring fragments of the withheld chemotypes that are in common with other active antibiotic classes. Interestingly, MADS is relatively better compared to the other two methods based on general predictive performance.

Li L, Zhang Z, Men Y, Baskaran S, Sangion A, Wang S, Arnot JA, Wania F. Retrieval, Selection, and Evaluation of Chemical Property Data for Assessments of Chemical Emissions, Fate, Hazard, Exposure, and Risks. ACS Environ Au 2022;2:376-395. [PMID: 37101455 PMCID: PMC10125307 DOI: 10.1021/acsenvironau.2c00010] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/02/2022] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 04/28/2023]

Baig MH, Ahmad K, Moon JS, Park SY, Ho Lim J, Chun HJ, Qadri AF, Hwang YC, Jan AT, Ahmad SS, Ali S, Shaikh S, Lee EJ, Choi I. Myostatin and its Regulation: A Comprehensive Review of Myostatin Inhibiting Strategies. Front Physiol 2022;13:876078. [PMID: 35812316 PMCID: PMC9259834 DOI: 10.3389/fphys.2022.876078] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/16/2022] [Accepted: 06/06/2022] [Indexed: 12/12/2022] Open

Affiliation(s)

Mohammad Hassan Baig: Department of Family Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea
Khurshid Ahmad: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea; Research Institute of Cell Culture, Yeungnam University, Gyeongsan, South Korea
Jun Sung Moon: Department of Internal Medicine, College of Medicine, Yeungnam University, Daegu, South Korea
So-Young Park: Department of Physiology, College of Medicine, Yeungnam University, Daegu, South Korea
Jeong Ho Lim: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea; Research Institute of Cell Culture, Yeungnam University, Gyeongsan, South Korea
Hee Jin Chun: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea; Research Institute of Cell Culture, Yeungnam University, Gyeongsan, South Korea
Afsha Fatima Qadri: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea
Ye Chan Hwang: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea
Arif Tasleem Jan: School of Biosciences and Biotechnology, Baba Ghulam Shah Badshah University, Rajouri, India
Syed Sayeed Ahmad: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea
Shahid Ali: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea
Sibhghatulla Shaikh: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea
Eun Ju Lee: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea; Research Institute of Cell Culture, Yeungnam University, Gyeongsan, South Korea
Inho Choi: Department of Medical Biotechnology, Yeungnam University, Gyeongsan, South Korea; Research Institute of Cell Culture, Yeungnam University, Gyeongsan, South Korea
*Correspondence: Eun Ju Lee; Inho Choi

He K. Pharmacological affinity fingerprints derived from bioactivity data for the identification of designer drugs. J Cheminform 2022;14:35. [PMID: 35672835 PMCID: PMC9171973 DOI: 10.1186/s13321-022-00607-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/02/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022] Open

Dolciami D, Villasclaras-Fernandez E, Kannas C, Meniconi M, Al-Lazikani B, Antolin AA. canSAR chemistry registration and standardization pipeline. J Cheminform 2022;14:28. [PMID: 35643512 PMCID: PMC9148294 DOI: 10.1186/s13321-022-00606-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/04/2022] [Accepted: 04/04/2022] [Indexed: 11/10/2022] Open

Abstract

Background: Integration of medicinal chemistry data from numerous public resources is an increasingly important part of academic drug discovery and translational research because it can bring a wealth of important knowledge related to compounds in one place. However, different data sources can report the same or related compounds in various forms (e.g., tautomers, racemates, etc.), thus highlighting the need of organising related compounds in hierarchies that alert the user on important bioactivity data that may be relevant. To generate these compound hierarchies, we have developed and implemented canSARchem, a new compound registration and standardization pipeline as part of the canSAR public knowledgebase. canSARchem builds on previously developed ChEMBL and PubChem pipelines and is developed using KNIME. We describe the pipeline which we make publicly available, and we provide examples on the strengths and limitations of the use of hierarchies for bioactivity data exploration. Finally, we identify canonicalization enrichment in FDA-approved drugs, illustrating the benefits of our approach.

Results: We created a chemical registration and standardization pipeline in KNIME and made it freely available to the research community. The pipeline consists of five steps to register the compounds and create the compounds’ hierarchy: 1. Structure checker, 2. Standardization, 3. Generation of canonical tautomers and representative structures, 4. Salt strip, and 5. Generation of abstract structure to generate the compound hierarchy. Unlike ChEMBL’s RDKit pipeline, we carry out compound canonicalization ahead of getting the parent structure, similar to PubChem’s OpenEye pipeline. canSARchem has a lower rejection rate compared to both PubChem and ChEMBL. We use our pipeline to assess the impact of grouping the compounds in hierarchies for bioactivity data exploration. We find that FDA-approved drugs show statistically significant sensitivity to canonicalization compared to the majority of bioactive compounds which demonstrates the importance of this step.

Conclusions: We use canSARchem to standardize all the compounds uploaded in canSAR (> 3 million) enabling efficient data integration and the rapid identification of alternative compound forms with useful bioactivity data. Comparison with PubChem and ChEMBL pipelines evidenced comparable performances in compound standardization, but only PubChem and canSAR canonicalize tautomers and canSAR has a slightly lower rejection rate. Our results highlight the importance of compound hierarchies for bioactivity data exploration. We make canSARchem available under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) at https://gitlab.icr.ac.uk/cansar-public/compound-registration-pipeline.

Jacobs A, Williams D, Hickey K, Patrick N, Williams AJ, Chalk S, McEwen L, Willighagen E, Walker M, Bolton E, Sinclair G, Sanford A. CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community. J Chem Inf Model 2022;62:2737-2743. [PMID: 35559614 PMCID: PMC9199008 DOI: 10.1021/acs.jcim.2c00268] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/29/2022]

Talley KR, White R, Wunder N, Eash M, Schwarting M, Evenson D, Perkins JD, Tumas W, Munch K, Phillips C, Zakutayev A. Research data infrastructure for high-throughput experimental materials science. Patterns (N Y) 2021;2:100373. [PMID: 34950901 PMCID: PMC8672147 DOI: 10.1016/j.patter.2021.100373] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/23/2021] [Revised: 08/13/2021] [Accepted: 09/30/2021] [Indexed: 11/26/2022]

Abstract

The High-Throughput Experimental Materials Database (HTEM-DB, htem.nrel.gov) is a repository of inorganic thin-film materials data collected during combinatorial experiments at the National Renewable Energy Laboratory (NREL). This data asset is enabled by NREL's Research Data Infrastructure (RDI), a set of custom data tools that collect, process, and store experimental data and metadata. Here, we describe the experimental data flow from the RDI to the HTEM-DB to illustrate the strategies and best practices currently used for materials data at NREL. Integration of the data tools with experimental instruments establishes a data communication pipeline between experimental researchers and data scientists. This work motivates the creation of similar workflows at other institutions to aggregate valuable data and increase their usefulness for future machine learning studies. In turn, such data-driven studies can greatly accelerate the pace of discovery and design in the materials science domain.

• Automated curation of experimental materials data
• Integration of data tools into the experimental laboratory
• Simple, effective, and flexible data archival system
• Collection of metadata for enhanced total data value

For machine learning to make significant contributions to a scientific domain, algorithms must ingest and learn from high-quality, large-volume datasets. The Research Data Infrastructure (RDI) that feeds the High-Throughput Experimental Materials Database (HTEM-DB, htem.nrel.gov) provides such a dataset from existing experimental data streams at the National Renewable Energy Laboratory (NREL). The described methods for curating experimental data can be applied to other materials research laboratory settings, paving the way for increased application of machine learning to materials science. In turn, the resulting new materials and new knowledge will benefit the society by advancing new technologies in energy, fuels, computing, security, and other important areas.

Santana K, do Nascimento LD, Lima e Lima A, Damasceno V, Nahum C, Braga RC, Lameira J. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. Front Chem 2021;9:662688. [PMID: 33996755 PMCID: PMC8117418 DOI: 10.3389/fchem.2021.662688] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/01/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022] Open

Bugeac CA, Ancuceanu R, Dinu M. QSAR Models for Active Substances against Pseudomonas aeruginosa Using Disk-Diffusion Test Data. Molecules 2021;26:1734. [PMID: 33808845 PMCID: PMC8003670 DOI: 10.3390/molecules26061734] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/04/2021] [Revised: 03/14/2021] [Accepted: 03/15/2021] [Indexed: 12/02/2022] Open

Hu B, Lin A, Brinson LC. ChemProps: A RESTful API enabled database for composite polymer name standardization. J Cheminform 2021;13:22. [PMID: 33712066 PMCID: PMC7955638 DOI: 10.1186/s13321-021-00502-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 03/01/2021] [Indexed: 11/24/2022] Open

Costanzi S, Slavick CK, Hutcheson BO, Koblentz GD, Cupitt RT. Lists of Chemical Warfare Agents and Precursors from International Nonproliferation Frameworks: Structural Annotation and Chemical Fingerprint Analysis. J Chem Inf Model 2020;60:4804-4816. [DOI: 10.1021/acs.jcim.0c00896] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]

Baker CM, Kidley NJ, Papachristos K, Hotson M, Carson R, Gravestock D, Pouliot M, Harrison J, Dowling A. Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry. J Chem Inf Model 2020;60:3781-3791. [PMID: 32644790 DOI: 10.1021/acs.jcim.0c00232] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]

Ambure P, Cordeiro MNDS. Importance of Data Curation in QSAR Studies Especially While Modeling Large-Size Datasets. Methods in Pharmacology and Toxicology 2020. [DOI: 10.1007/978-1-0716-0150-1_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]

Toukach PV, Egorova KS. New Features of Carbohydrate Structure Database Notation (CSDB Linear), As Compared to Other Carbohydrate Notations. J Chem Inf Model 2019;60:1276-1289. [DOI: 10.1021/acs.jcim.9b00744] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]

Pham N, van Heck RGA, van Dam JCJ, Schaap PJ, Saccenti E, Suarez-Diez M. Consistency, Inconsistency, and Ambiguity of Metabolite Names in Biochemical Databases Used for Genome-Scale Metabolic Modelling. Metabolites 2019;9:E28. [PMID: 30736318 PMCID: PMC6409771 DOI: 10.3390/metabo9020028] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/28/2018] [Revised: 01/24/2019] [Accepted: 01/31/2019] [Indexed: 12/22/2022] Open

Sobus JR, Wambaugh JF, Isaacs KK, Williams AJ, McEachran AD, Richard AM, Grulke CM, Ulrich EM, Rager JE, Strynar MJ, Newton SR. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. J Expo Sci Environ Epidemiol 2018;28:411-426. [PMID: 29288256 PMCID: PMC6661898 DOI: 10.1038/s41370-017-0012-y] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 05/27/2017] [Revised: 08/04/2017] [Accepted: 08/25/2017] [Indexed: 05/18/2023]

Affiliation(s)

Jon R Sobus: U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
John F Wambaugh: U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Kristin K Isaacs: U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Antony J Williams: U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Andrew D McEachran: Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Ann M Richard: U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Christopher M Grulke: U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Elin M Ulrich: U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Julia E Rager: Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA; ToxStrategies, Inc., 9390 Research Blvd., Suite 100, Austin, TX, 78759, USA
Mark J Strynar: U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Seth R Newton: U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA

McEachran AD, Mansouri K, Grulke C, Schymanski EL, Ruttkies C, Williams AJ. "MS-Ready" structures for non-targeted high-resolution mass spectrometry screening studies. J Cheminform 2018;10:45. [PMID: 30167882 PMCID: PMC6117229 DOI: 10.1186/s13321-018-0299-2] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 05/16/2018] [Accepted: 08/21/2018] [Indexed: 02/05/2023] Open

Basith S, Cui M, Macalino SJY, Park J, Clavio NAB, Kang S, Choi S. Exploring G Protein-Coupled Receptors (GPCRs) Ligand Space via Cheminformatics Approaches: Impact on Rational Drug Design. Front Pharmacol 2018;9:128. [PMID: 29593527 PMCID: PMC5854945 DOI: 10.3389/fphar.2018.00128] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 12/08/2017] [Accepted: 02/06/2018] [Indexed: 01/14/2023] Open

Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform 2018. [PMID: 29520515 PMCID: PMC5843579 DOI: 10.1186/s13321-018-0263-1] [Citation(s) in RCA: 271] [Impact Index Per Article: 45.2] [Indexed: 12/24/2022] Open

Abstract

The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q² of the models varied from 0.72 to 0.95, with an average of 0.86 and an R² test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission’s Joint Research Center to be OECD compliant. 
All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency’s CompTox Chemistry Dashboard.
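
The weighted k-nearest-neighbor approach mentioned in this abstract can be illustrated with a short, generic Python sketch. It predicts a property value as the inverse-distance-weighted mean of the k closest training chemicals in descriptor space. The Euclidean metric, the inverse-distance weights, the default k, and all names are assumptions chosen for illustration; they are not taken from OPERA and do not reproduce its descriptors or models.

```python
import math


def weighted_knn_predict(train_X, train_y, query, k=5, eps=1e-9):
    """Predict a value for `query` from its k nearest training points.

    train_X: list of descriptor vectors (equal-length tuples of floats).
    train_y: list of measured property values, aligned with train_X.
    eps: small constant that avoids division by zero for exact matches.
    """
    # Pair every training value with its Euclidean distance to the query
    # and keep the k closest neighbours.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]

    # Closer neighbours get larger weights (hypothetical 1/d weighting).
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return weighted_sum / sum(weights)
```

A query that sits midway between two training chemicals receives the mean of their values, while a query that coincides with a training chemical is dominated by that chemical's value.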

Patel M, Chilton ML, Sartini A, Gibson L, Barber C, Covey-Crump L, Przybylak KR, Cronin MTD, Madden JC. Assessment and Reproducibility of Quantitative Structure–Activity Relationship Models by the Nonexpert. J Chem Inf Model 2018;58:673-682. [DOI: 10.1021/acs.jcim.7b00523] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]

Filimonov D, Druzhilovskiy D, Lagunin A, Gloriozova T, Rudik A, Dmitriev A, Pogodin P, Poroikov V. Computer-aided prediction of biological activity spectra for chemical compounds: opportunities and limitation. Biomedical Chemistry: Research and Methods 2018. [DOI: 10.18097/bmcrm00004] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Indexed: 11/23/2022]

Abstract

An essential characteristic of chemical compounds is their biological activity, since its presence can become the basis for the therapeutic use of a substance or, on the contrary, limit its practical application because of side effects and toxicity. Computer assessment of biological activity spectra makes it possible to determine the most promising directions for studying the pharmacological action of particular substances and to filter out potentially dangerous molecules at the early stages of research. For more than 25 years, we have been developing and improving the computer program PASS (Prediction of Activity Spectra for Substances), designed to predict the biological activity spectrum of a substance from the structural formula of its molecules. The prediction is carried out by analysis of structure-activity relationships for the training set, which currently contains information on structures and known biological activities for more than one million molecules. The structure of an organic compound is represented in PASS using Multilevel Neighborhoods of Atoms descriptors; the activity prediction for new compounds is performed by a naive Bayes classifier using the structure-activity relationships determined by analysis of the training set. We have created and improved both local versions of the PASS program and freely available web resources based on PASS (http://www.way2drug.com).
They predict several thousand biological activities (pharmacological effects, molecular mechanisms of action, specific toxicity and adverse effects, interaction with unwanted targets, metabolism and action on molecular transport), cytotoxicity for tumor and non-tumor cell lines, carcinogenicity, induced changes in gene expression profiles, metabolic sites for the major enzymes of the first and second phases of xenobiotic biotransformation, and whether a compound is a substrate and/or metabolite of metabolic enzymes. The web resource Way2Drug is used by over 18,000 researchers from more than 90 countries, who have obtained over 600,000 predictions and published about 500 papers describing the results. Analysis of the published works shows that in some cases the authors' interpretation of the prediction results requires adjustment. In this work, we provide the theoretical basis and consider, using particular examples, the opportunities and limitations of computer-aided prediction of biological activity spectra.

Olier I, Sadawi N, Bickerton GR, Vanschoren J, Grosan C, Soldatova L, King RD. Meta-QSAR: a large-scale application of meta-learning to drug design and discovery. Mach Learn 2017;107:285-311. [PMID: 31997851 PMCID: PMC6956898 DOI: 10.1007/s10994-017-5685-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/09/2016] [Accepted: 10/04/2017] [Indexed: 11/03/2022]

Korotcov A, Tkachenko V, Russo DP, Ekins S. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. Mol Pharm 2017;14:4462-4475. [PMID: 29096442 PMCID: PMC5741413 DOI: 10.1021/acs.molpharmaceut.7b00578] [Citation(s) in RCA: 184] [Impact Index Per Article: 26.3] [Indexed: 11/30/2022]

Abstract

Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, physicochemical properties as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or data sets, Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar type plots indicates when models are inferior or perhaps overtrained.
These results also suggest the need for assessing deep learning further using multiple metrics with much larger scale comparisons, prospective testing as well as assessment of different fingerprints and DNN architectures beyond those used.
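Three of the metrics named in this abstract (F1 score, Cohen's kappa, and the Matthews correlation coefficient) can all be derived from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function names and the toy labels are illustrative assumptions, not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, tn, fp, fn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Correlation between predicted and true labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (made up): 4 actives and 4 inactives, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(f1_score(*counts), cohen_kappa(*counts), matthews_corrcoef(*counts))
```

Because F1 ignores true negatives while kappa and MCC use all four counts, the metrics can rank models differently on imbalanced data, which is one reason to report several of them together.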

Low YS, Daugherty AC, Schroeder EA, Chen W, Seto T, Weber S, Lim M, Hastie T, Mathur M, Desai M, Farrington C, Radin AA, Sirota M, Kenkare P, Thompson CA, Yu PP, Gomez SL, Sledge GW, Kurian AW, Shah NH. Synergistic drug combinations from electronic health records and gene expression. J Am Med Inform Assoc 2017;24:565-576. [PMID: 27940607 PMCID: PMC6080645 DOI: 10.1093/jamia/ocw161] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022] Open

Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 2017;9:61. [PMID: 29185060 PMCID: PMC5705535 DOI: 10.1186/s13321-017-0247-6] [Citation(s) in RCA: 568] [Impact Index Per Article: 81.1] [Received: 09/30/2017] [Accepted: 11/18/2017] [Indexed: 11/10/2022] Open

Abstract

Despite an abundance of online databases providing access to chemical data, there is increasing demand for high-quality, structure-curated, open data to meet the various needs of the environmental sciences and computational toxicology communities. The U.S. Environmental Protection Agency's (EPA) web-based CompTox Chemistry Dashboard is addressing these needs by integrating diverse types of relevant domain data through a cheminformatics layer, built upon a database of curated substances linked to chemical structures. These data include physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data, surfaced through an integration hub with link-outs to additional EPA data and public domain online resources. Batch searching allows for direct chemical identifier (ID) mapping and downloading of multiple data streams in several different formats. This facilitates fast access to available structure, property, toxicity, and bioassay data for collections of chemicals (hundreds to thousands at a time). Advanced search capabilities are available to support, for example, non-targeted analysis and identification of chemicals using mass spectrometry. The contents of the chemistry database, presently containing ~ 760,000 substances, are available as public domain data for download. The chemistry content underpinning the Dashboard has been aggregated over the past 15 years by both manual and auto-curation techniques within EPA's DSSTox project. DSSTox chemical content is subject to strict quality controls to enforce consistency among chemical substance-structure identifiers, as well as list curation review to ensure accurate linkages of DSSTox substances to chemical lists and associated data. The Dashboard, publicly launched in April 2016, has expanded considerably in content and user traffic over the past year. 
It is continuously evolving with the growth of DSSTox into high-interest or data-rich domains of interest to EPA, such as chemicals on the Toxic Substances Control Act listing, while providing the user community with a flexible and dynamic web-based platform for integration, processing, visualization and delivery of data and resources. The Dashboard provides support for a broad array of research and regulatory programs across the worldwide community of toxicologists and environmental scientists.

Liu J, Patlewicz G, Williams AJ, Thomas RS, Shah I. Predicting Organ Toxicity Using in Vitro Bioactivity Data and Chemical Structure. Chem Res Toxicol 2017;30:2046-2059. [PMID: 28768096 DOI: 10.1021/acs.chemrestox.7b00084] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 12/22/2022]

Abstract

Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ-level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using 5-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors was the most predictive. Model performance improved with more chemicals (up to a maximum of 24%), and these gains were correlated (ρ = 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.

Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017;18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information are also vital for reaching top performance.

METHOD

We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions.

RESULTS

VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units yielding significantly improved computational time.

CONCLUSION

In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.

Koutsoukas A, Monaghan KJ, Li X, Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017;9:42. [PMID: 29086090 PMCID: PMC5489441 DOI: 10.1186/s13321-017-0226-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 09/30/2016] [Accepted: 05/27/2017] [Indexed: 01/03/2023] Open

Abstract

Background

In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold: first, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of DNNs and could act as starting points when tuning DNNs; second, their performance was compared to popular methods widely employed in the field of cheminformatics, namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines. Moreover, the robustness of machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVIDIA GPUs were utilized to carry out this study, allowing a large number of DNN configurations to be explored.

Results

We show that feed-forward deep neural networks are capable of achieving strong classification performance and outperform shallow methods across diverse activity classes when optimized. Hyper-parameters that were found to play a critical role are the activation function, dropout regularization, number of hidden layers and number of neurons. When compared to the other methods, tuned DNNs were found to perform statistically better, with p value <0.01 based on the Wilcoxon statistical test. On average, DNNs achieved an MCC 0.149 units higher than NB, 0.092 higher than kNN, 0.052 higher than SVM with a linear kernel, 0.021 higher than RF and 0.009 higher than SVM with a radial basis function kernel. When exploring robustness to noise, non-linear methods were found to perform well at low levels of noise (lower than or equal to 20%); however, at higher levels of noise (higher than 30%), the Naïve Bayes method was found to perform well and, at the highest noise level of 50%, even to outperform more sophisticated methods across several datasets.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.

Gally JM, Bourg S, Do QT, Aci-Sèche S, Bonnet P. VSPrep: A General KNIME Workflow for the Preparation of Molecules for Virtual Screening. Mol Inform 2017;36. [PMID: 28586180 DOI: 10.1002/minf.201700023] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 02/01/2017] [Accepted: 05/05/2017] [Indexed: 12/27/2022]

Web Resources for Discovery and Development of New Medicines. Pharm Chem J 2017. [DOI: 10.1007/s11094-017-1563-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/04/2023]

Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017;117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]

Card ML, Gomez-Alvarez V, Lee WH, Lynch DG, Orentas NS, Lee MT, Wong EM, Boethling RS. History of EPI Suite™ and future perspectives on chemical property estimation in US Toxic Substances Control Act new chemical risk assessments. Environ Sci Process Impacts 2017;19:203-212. [PMID: 28275775 DOI: 10.1039/c7em00064b] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 05/21/2023]

Mansouri K, Grulke CM, Richard AM, Judson RS, Williams AJ. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ Res 2016;27:939-965. [PMID: 27885862 DOI: 10.1080/1062936x.2016.1253611] [Citation(s) in RCA: 81] [Impact Index Per Article: 10.1] [Received: 09/03/2016] [Accepted: 10/24/2016] [Indexed: 05/18/2023]

Ekins S. The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 2016;33:2594-603. [PMID: 27599991 DOI: 10.1007/s11095-016-2029-7] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Received: 07/10/2016] [Accepted: 08/23/2016] [Indexed: 01/22/2023]

Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, Yang C, Rathman J, Martin MT, Wambaugh JF, Knudsen TB, Kancherla J, Mansouri K, Patlewicz G, Williams AJ, Little SB, Crofton KM, Thomas RS. ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem Res Toxicol 2016;29:1225-51. [PMID: 27367298 DOI: 10.1021/acs.chemrestox.6b00135] [Citation(s) in RCA: 386] [Impact Index Per Article: 48.3] [Indexed: 11/27/2022]

Abstract

The U.S. Environmental Protection Agency's (EPA) ToxCast program is testing a large library of Agency-relevant chemicals using in vitro high-throughput screening (HTS) approaches to support the development of improved toxicity prediction models. Launched in 2007, Phase I of the program screened 310 chemicals, mostly pesticides, across hundreds of ToxCast assay end points. In Phase II, the ToxCast library was expanded to 1878 chemicals, culminating in the public release of screening data at the end of 2013. Subsequent expansion in Phase III has resulted in more than 3800 chemicals actively undergoing ToxCast screening, 96% of which are also being screened in the multi-Agency Tox21 project. The chemical library underpinning these efforts plays a central role in defining the scope and potential application of ToxCast HTS results. The history of the phased construction of EPA's ToxCast library is reviewed, followed by a survey of the library contents from several different vantage points. CAS Registry Numbers are used to assess ToxCast library coverage of important toxicity, regulatory, and exposure inventories. Structure-based representations of ToxCast chemicals are then used to compute physicochemical properties, substructural features, and structural alerts for toxicity and biotransformation. Cheminformatics approaches using these varied representations are applied to defining the boundaries of HTS testability, evaluating chemical diversity, and comparing the ToxCast library to potential target application inventories, such as those used in EPA's Endocrine Disruption Screening Program (EDSP). Through several examples, the ToxCast chemical library is demonstrated to provide comprehensive coverage of the knowledge domains and target inventories of potential interest to EPA.
Furthermore, the varied representations and approaches presented here define local chemistry domains potentially worthy of further investigation (e.g., not currently covered in the testing library or defined by toxicity "alerts") to strategically support data mining and predictive toxicology modeling moving forward.

Affiliation(s)

Ann M Richard National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Richard S Judson National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Keith A Houck National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Christopher M Grulke National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Patra Volarath Center for Food Safety and Nutrition, U.S. Food and Drug Administration, 5100 Paint Branch Parkway, College Park, Maryland 20740, United States
Inthirany Thillainadarajah Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
Chihae Yang Molecular Networks GmbH, Henkestraße 91, 91052 Erlangen, Germany; Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States
James Rathman Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States; Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
Matthew T Martin National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
John F Wambaugh National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Thomas B Knudsen National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Jayaram Kancherla ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
Kamel Mansouri ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
Grace Patlewicz National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Antony J Williams National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Stephen B Little National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Kevin M Crofton National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
Russell S Thomas National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States

Croset S, Rupp J, Romacker M. Flexible data integration and curation using a graph-based approach. Bioinformatics 2016;32:918-25. [PMID: 26556384 DOI: 10.1093/bioinformatics/btv644] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/05/2015] [Accepted: 10/21/2015] [Indexed: 11/14/2022] Open

Ball N, Cronin MTD, Shen J, Blackburn K, Booth ED, Bouhifd M, Donley E, Egnash L, Hastings C, Juberg DR, Kleensang A, Kleinstreuer N, Kroese ED, Lee AC, Luechtefeld T, Maertens A, Marty S, Naciff JM, Palmer J, Pamies D, Penman M, Richarz AN, Russo DP, Stuard SB, Patlewicz G, van Ravenzwaay B, Wu S, Zhu H, Hartung T. Toward Good Read-Across Practice (GRAP) guidance. ALTEX 2016;33:149-66. [PMID: 26863606 PMCID: PMC5581000 DOI: 10.14573/altex.1601251] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 01/21/2016] [Accepted: 02/11/2016] [Indexed: 12/04/2022]

Affiliation(s)

Nicholas Ball The Dow Chemical Company, Midland, MI, USA
Mark T D Cronin School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
Jie Shen Research Institute for Fragrance Materials, Inc. Woodcliff Lake, NJ, USA
Karen Blackburn The Procter and Gamble Co., Cincinnati, OH, USA
Ewan D Booth Syngenta Ltd, Jealott's Hill International Research Centre, Bracknell, Berkshire, UK
Mounir Bouhifd Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Elizabeth Donley Stemina Biomarker Discovery Inc., Madison, WI, USA
Laura Egnash Stemina Biomarker Discovery Inc., Madison, WI, USA
Charles Hastings BASF SE, Ludwigshafen am Rhein, Germany, and Research Triangle Park, NC, USA
Daland R Juberg The Dow Chemical Company, Midland, MI, USA
Andre Kleensang Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Nicole Kleinstreuer National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
E Dinant Kroese Risk Analysis for Products in Development, TNO Zeist, The Netherlands
Adam C Lee DuPont Haskell Global Centers for Health and Environmental Sciences, Newark, DE, USA
Thomas Luechtefeld Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Alexandra Maertens Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Sue Marty The Dow Chemical Company, Midland, MI, USA
Jorge M Naciff The Procter and Gamble Co., Cincinnati, OH, USA
Jessica Palmer Stemina Biomarker Discovery Inc., Madison, WI, USA
David Pamies Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
Mike Penman Penman Consulting, Brussels, Belgium
Andrea-Nicole Richarz School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
Daniel P Russo Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Sharon B Stuard The Procter and Gamble Co., Cincinnati, OH, USA
Grace Patlewicz US EPA/ORD, National Center for Computational Toxicology, Research Triangle Park, NC, USA
Bennard van Ravenzwaay Risk Analysis for Products in Development, TNO Zeist, The Netherlands
Shengde Wu The Procter and Gamble Co., Cincinnati, OH, USA
Hao Zhu Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Thomas Hartung Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA; University of Konstanz, CAAT-Europe, Konstanz, Germany

Akhondi SA, Muresan S, Williams AJ, Kors JA. Ambiguity of non-systematic chemical identifiers within and between small-molecule databases. J Cheminform 2015;7:54. [PMID: 26579214 PMCID: PMC4646925 DOI: 10.1186/s13321-015-0102-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/17/2015] [Accepted: 10/30/2015] [Indexed: 11/18/2022] Open

Abstract

Background

A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but it is possible to automatically check their ambiguity (i.e., whether an identifier matches more than one structure). In this study we have quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers.

Results

The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7–60.2 %, median of 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points).

Conclusions

Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-015-0102-6) contains supplementary material, which is available to authorized users.
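
The ambiguity measure described in this abstract (the share of identifiers that match more than one structure) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the case-insensitive matching, the placeholder structure keys, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def ambiguity_percent(records):
    """Return the percentage of identifiers linked to more than one structure.

    `records` is an iterable of (identifier, structure_key) pairs.
    Assumption: identifiers are compared case-insensitively after trimming
    whitespace; the abstract does not say how names were normalized.
    """
    structures_by_name = defaultdict(set)
    for name, structure_key in records:
        structures_by_name[name.strip().lower()].add(structure_key)
    if not structures_by_name:
        return 0.0
    ambiguous = sum(1 for keys in structures_by_name.values() if len(keys) > 1)
    return 100.0 * ambiguous / len(structures_by_name)


# Toy, made-up records: the structure keys are placeholders, not real data.
records = [
    ("aspirin", "STRUCT-A"),
    ("Aspirin", "STRUCT-A"),    # same name, same structure: not ambiguous
    ("vitamin e", "STRUCT-B"),
    ("vitamin e", "STRUCT-C"),  # same name, two structures: ambiguous
    ("caffeine", "STRUCT-D"),
    ("ibuprofen", "STRUCT-E"),
]

print(ambiguity_percent(records))  # 1 of 4 distinct identifiers is ambiguous
```

In practice the structure key would be whatever standardized representation is being compared. Dropping stereochemistry from that key before grouping is the kind of standardization step whose effect the abstract reports.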

Hersey A, Chambers J, Bellis L, Patrícia Bento A, Gaulton A, Overington JP. Chemical databases: curation or integration by user-defined equivalence? Drug Discov Today Technol 2015;14:17-24. [PMID: 26194583 PMCID: PMC6294287 DOI: 10.1016/j.ddtec.2015.01.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2014] [Revised: 01/15/2015] [Accepted: 01/16/2015] [Indexed: 11/30/2022]

Tarasova OA, Urusova AF, Filimonov DA, Nicklaus MC, Zakharov AV, Poroikov VV. QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors. J Chem Inf Model 2015;55:1388-99. [PMID: 26046311 DOI: 10.1021/acs.jcim.5b00019] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Ai N, Fan X, Ekins S. In silico methods for predicting drug-drug interactions with cytochrome P-450s, transporters and beyond. Adv Drug Deliv Rev 2015;86:46-60. [PMID: 25796619 DOI: 10.1016/j.addr.2015.03.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2014] [Revised: 01/05/2015] [Accepted: 03/11/2015] [Indexed: 12/13/2022]

Karapetyan K, Batchelor C, Sharpe D, Tkachenko V, Williams AJ. The Chemical Validation and Standardization Platform (CVSP): large-scale automated validation of chemical structure datasets. J Cheminform 2015;7:30. [PMID: 26155308 PMCID: PMC4494041 DOI: 10.1186/s13321-015-0072-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2014] [Accepted: 04/28/2015] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Hundreds of online databases currently host millions of chemical compounds and associated data. Because many cheminformatics software tools can be used to produce the data, because the various cheminformatics platforms differ subtly, and because software users are often inexperienced, chemical structure representations online can suffer from a myriad of issues. To help facilitate validation and standardization of chemical structure datasets from various sources, we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets.

RESULTS

The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or that may require manual review. Each identified issue is assigned one of three levels of severity (Information, Warning, or Error) in order to conveniently inform the user of the need to browse and review subsets of their data. The validation process includes validation of atoms and bonds (e.g., flagging query atoms and bonds), valences, and stereochemistry. The standard form of submission of collections of data, the SDF file, allows the user to map the data fields to predefined CVSP fields for the purpose of cross-validating associated SMILES and InChIs with the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project. In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and the ChEMBL 17 data set. The CVSP website is located at http://cvsp.chemspider.com/.

CONCLUSION

A platform for the validation and standardization of chemical structure representations of various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files to produce more homogeneous compound representations for exchange and interchange between online databases. While the CVSP platform is designed with flexibility inherent to the rules that can be used for processing the data, we have produced a recommended rule set based on our own experience with large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.
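
The idea of rule-based validation with three severity levels, as described in this abstract, can be sketched roughly as follows. This is not CVSP's implementation or rule set: the `Atom` and `Issue` types, the query-atom symbols, the simplified valence table, and the severity assigned to each rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # The three level names come from the abstract; the ordering is assumed.
    INFORMATION = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Atom:
    symbol: str
    bond_order_sum: int  # total bond order around the atom


@dataclass(frozen=True)
class Issue:
    severity: Severity
    message: str


# Hypothetical, heavily simplified rules (neutral atoms only).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}
QUERY_SYMBOLS = {"*", "A", "Q"}


def validate(atoms):
    """Return the issues found in `atoms`, most severe first."""
    issues = []
    for index, atom in enumerate(atoms):
        if atom.symbol in QUERY_SYMBOLS:
            issues.append(Issue(Severity.WARNING,
                                f"atom {index}: query atom '{atom.symbol}'"))
        elif atom.symbol not in MAX_VALENCE:
            issues.append(Issue(Severity.INFORMATION,
                                f"atom {index}: no valence rule for '{atom.symbol}'"))
        elif atom.bond_order_sum > MAX_VALENCE[atom.symbol]:
            issues.append(Issue(Severity.ERROR,
                                f"atom {index}: valence {atom.bond_order_sum} "
                                f"exceeds {MAX_VALENCE[atom.symbol]} for '{atom.symbol}'"))
    return sorted(issues, key=lambda issue: issue.severity, reverse=True)


# Toy input: a five-valent carbon, a query atom, an element without a rule,
# and a well-formed oxygen.
atoms = [Atom("C", 5), Atom("*", 1), Atom("Cl", 1), Atom("O", 2)]
issues = validate(atoms)
for issue in issues:
    print(issue.severity.name, issue.message)
```

Sorting by severity reflects the purpose stated in the abstract: the levels exist so that a user can decide which subsets of their data to review first.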

Warr WA. Many InChIs and quite some feat. J Comput Aided Mol Des 2015;29:681-94. [PMID: 26081259 DOI: 10.1007/s10822-015-9854-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2015] [Accepted: 06/10/2015] [Indexed: 12/14/2022]

Brito-Sánchez Y, Marrero-Ponce Y, Barigye SJ, Yaber-Goenaga I, Morell Pérez C, Le-Thi-Thu H, Cherkasov A. Towards Better BBB Passage Prediction Using an Extensive and Curated Data Set. Mol Inform 2015;34:308-30. [PMID: 27490276 DOI: 10.1002/minf.201400118] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2014] [Accepted: 01/20/2015] [Indexed: 12/25/2022]

Affiliation(s)

Yoan Brito-Sánchez Vancouver Prostate Centre, University of British Columbia, Vancouver, British Columbia, V6H 3Z6, Canada; Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research, International Network (CAMD-BIR International Network), Los Laureles L76MD, Nuevo Bosque, 130015, Cartagena de Indias, Bolivar, Colombia. Homepage: http://www.uv.es/yoma/ Homepage: http://sites.google.com/site/ymponce/home
Yovani Marrero-Ponce Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research, International Network (CAMD-BIR International Network), Los Laureles L76MD, Nuevo Bosque, 130015, Cartagena de Indias, Bolivar, Colombia. Homepage: http://www.uv.es/yoma/ Homepage: http://sites.google.com/site/ymponce/home; Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Parque Industrial y Tecnológico Carlos Vélez Pombo Km 1 vía Turbaco, 130010, Cartagena de Indias, Bolívar, Colombia; Facultad de Química Farmacéutica, Universidad de Cartagena, Cartagena de Indias, Bolívar, Colombia
Stephen J Barigye Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research, International Network (CAMD-BIR International Network), Los Laureles L76MD, Nuevo Bosque, 130015, Cartagena de Indias, Bolivar, Colombia. Homepage: http://www.uv.es/yoma/ Homepage: http://sites.google.com/site/ymponce/home; Department of Chemistry, Federal University of Lavras, P.O. Box 3037, 37200-000, Lavras, MG, Brazil
Iván Yaber-Goenaga Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Parque Industrial y Tecnológico Carlos Vélez Pombo Km 1 vía Turbaco, 130010, Cartagena de Indias, Bolívar, Colombia
Carlos Morell Pérez Center of Studies on Informatics, Universidad "Marta Abreu" de Las Villas, Santa Clara, 54830, Villa Clara, Cuba
Huong Le-Thi-Thu School of Medicine and Pharmacy, Vietnam National University, Hanoi (VNU) 144 Xuan Thuy, CauGiay, Hanoi, Vietnam
Artem Cherkasov Vancouver Prostate Centre, University of British Columbia, Vancouver, British Columbia, V6H 3Z6, Canada

Clark AM, Williams AJ, Ekins S. Machines first, humans second: on the importance of algorithmic interpretation of open chemistry data. J Cheminform 2015;7:9. [PMID: 25798198 PMCID: PMC4369291 DOI: 10.1186/s13321-015-0057-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2014] [Accepted: 02/23/2015] [Indexed: 11/12/2022] Open

Lab notebook entries must target both visualisation by scientists and use by machine learning algorithms

Alex M Clark Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, H3J 2S1, QC Canada
Antony J Williams Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587 USA
Sean Ekins Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526 USA ; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010 USA

Abstract

This brief review of current research progress on Charcot-Marie-Tooth (CMT) disease is a summary of discussions initiated at the Hereditary Neuropathy Foundation (HNF) scientific advisory board meeting on November 7, 2014. It covers recent published and unpublished in vitro and in vivo research. We discuss recent promising preclinical work for CMT1A, the development of new biomarkers, the characterization of different animal models, and the analysis of the frequency of gene mutations in patients with CMT. We also describe how progress in related fields may benefit CMT therapeutic development, including the potential of gene therapy and stem cell research. We also discuss the potential to assess and improve the quality of life of CMT patients. This summary of CMT research identifies some of the gaps which may have an impact on upcoming clinical trials. We provide some priorities for CMT research and areas which HNF can support. The goal of this review is to inform the scientific community about ongoing research and to avoid unnecessary overlap, while also highlighting areas ripe for further investigation. The general collaborative approach we have taken may be useful for other rare neurological diseases.

Affiliation(s)

Sean Ekins Hereditary Neuropathy Foundation, New York, NY, 10016, USA ; Collaborations in Chemistry, Fuquay Varina, NC, 27526, USA ; Collaborative Drug Discovery, Burlingame, CA, 94010, USA
Nadia K Litterman Collaborative Drug Discovery, Burlingame, CA, 94010, USA
Renée J G Arnold Arnold Consultancy & Technology LLC, New York, NY, 10023, USA ; Master of Public Health Program, Mount Sinai School of Medicine, New York, NY, 10029, USA ; Quorum Consulting, Inc, San Francisco, CA, 94104, USA
Robert W Burgess The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Joel S Freundlich Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University - New Jersey Medical School, Newark, NJ, 07103, USA
Steven J Gray Gene Therapy Center and Dept. of Ophthalmology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7352, USA
Joseph J Higgins Quest Diagnostics, Athena Brand, Marlborough, MA, 01572, USA
Brett Langley Burke-Cornell Medical Research Institute, White Plains, NY, 10605, USA ; Department of Neurology and Neuroscience, Weill Medical College of Cornell University, New York, NY, 10065, USA
Dianna E Willis Burke-Cornell Medical Research Institute, White Plains, NY, 10605, USA
Lucia Notterpek Department of Neuroscience, College of Medicine, McKnight Brain Institute, University of Florida, Gainesville, FL, 32611, USA
David Pleasure Institute for Pediatric Regenerative Medicine, University of California Davis, School of Medicine, Sacramento, CA, 95817, USA ; Department of Neurology, University of California, Davis, School of Medicine, c/o Shriners Hospital, Sacramento, CA, 95817, USA
Michael W Sereda Department of Neurogenetics, Max Planck Institute (MPI) of Experimental Medicine, Göttingen, 37075, Germany ; Department of Clinical Neurophysiology, University Medical Center (UMG), Göttingen, D-37075, Germany
Allison Moore Hereditary Neuropathy Foundation, New York, NY, 10016, USA
