Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Glavatskikh M, Leguy J, Hunault G, Cauchy T, Da Mota B. Dataset's chemical diversity limits the generalizability of machine learning predictions. J Cheminform 2019;11:69. [PMID: 33430991 PMCID: PMC6852905 DOI: 10.1186/s13321-019-0391-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 07/26/2019] [Accepted: 10/28/2019] [Indexed: 01/18/2023] Open

Cited by Other Article(s)

Zhu Y, Li M, Xu C, Lan Z. Quantum Chemistry Dataset with Ground- and Excited-state Properties of 450 Kilo Molecules. Sci Data 2024;11:948. [PMID: 39209851 PMCID: PMC11362161 DOI: 10.1038/s41597-024-03788-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 08/15/2024] [Indexed: 09/04/2024] Open

Sarangi R, Maity S, Acharya A. Machine Learning Approach to Vertical Energy Gap in Redox Processes. J Chem Theory Comput 2024;20:6747-6755. [PMID: 39044422 PMCID: PMC11325558 DOI: 10.1021/acs.jctc.4c00715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2024]

Terrones GG, Huang SP, Rivera MP, Yue S, Hernandez A, Kulik HJ. Metal-Organic Framework Stability in Water and Harsh Environments from Data-Driven Models Trained on the Diverse WS24 Data Set. J Am Chem Soc 2024;146:20333-20348. [PMID: 38984798 DOI: 10.1021/jacs.4c05879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]

Tempke R, Musho T. Autonomous generation of single photon emitting materials. Nanoscale 2024;16:10239-10249. [PMID: 38726673 DOI: 10.1039/d3nr04944b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]

Raush E, Abagyan R, Totrov M. Efficient Generation of Conformer Ensembles Using Internal Coordinates and a Generative Directional Graph Convolution Neural Network. J Chem Theory Comput 2024;20:4054-4063. [PMID: 38669307 DOI: 10.1021/acs.jctc.4c00280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]

Lee AS, Elliott S, Harb H, Ward L, Foster I, Curtiss L, Assary RS. E_min: A First-Principles Thermochemical Descriptor for Predicting Molecular Synthesizability. J Chem Inf Model 2024;64:1277-1289. [PMID: 38359461 DOI: 10.1021/acs.jcim.3c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2024]

Viswanathan K, Goel M, Laghuvarapu S, Varma G, Priyakumar UD. Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines. Sci Rep 2023;13:21069. [PMID: 38030689 PMCID: PMC10686981 DOI: 10.1038/s41598-023-42952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 09/16/2023] [Indexed: 12/01/2023] Open

Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023;19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]

Li CH, Tabor DP. Generative organic electronic molecular design informed by quantum chemistry. Chem Sci 2023;14:11045-11055. [PMID: 37860647 PMCID: PMC10583709 DOI: 10.1039/d3sc03781a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Accepted: 09/11/2023] [Indexed: 10/21/2023] Open

Nakata M, Maeda T. PubChemQC B3LYP/6-31G*//PM6 Data Set: The Electronic Structures of 86 Million Molecules Using B3LYP/6-31G* Calculations. J Chem Inf Model 2023;63:5734-5754. [PMID: 37677147 DOI: 10.1021/acs.jcim.3c00899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]

Abstract

The presented "PubChemQC B3LYP/6-31G*//PM6" data set is composed of the electronic properties of 85,938,443 molecules, encompassing a broad spectrum of molecules from essential compounds to biomolecules with a molecular weight up to 1000. These molecules account for 94.0% of the original PubChem Compound catalog as of August 29, 2016. The electronic properties, including orbitals, orbital energies, total energies, dipole moments, and other pertinent properties, were computed by using the B3LYP/6-31G* and PM6 methods. The data set, available in three formats, namely, GAMESS quantum chemistry program files, selected JSON output files, and a PostgreSQL database, provides researchers with the ability to query molecular properties. It is further subdivided into five subdata sets for more specific data. The first two subsets encompass molecules with carbon, hydrogen, oxygen, and nitrogen with molecular weights under 300 and 500, respectively. The third and fourth subsets incorporate molecules with carbon, hydrogen, nitrogen, oxygen, phosphorus, sulfur, fluorine, and chlorine, with molecular weights under 300 and 500, respectively. The fifth subset comprises molecules with carbon, hydrogen, nitrogen, oxygen, phosphorus, sulfur, fluorine, chlorine, sodium, potassium, magnesium, and calcium, with a molecular weight of under 500. The coefficients of determination for the highest occupied molecular orbital-lowest unoccupied molecular orbital energy gap range from 0.892 (for CHON500) to 0.803 (for the whole data set). These comprehensive results pave the way for applications in drug discovery and materials science, among others. The data sets can be accessed under the Creative Commons Attribution 4.0 International license at the following web address: https://nakatamaho.riken.jp/pubchemqc.riken.jp/b3lyp_pm6_datasets.html.

Morgan JP, Paiement A, Klinke C. Domain-informed graph neural networks: A quantum chemistry case study. Neural Netw 2023;165:938-952. [PMID: 37453397 DOI: 10.1016/j.neunet.2023.06.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 05/05/2023] [Accepted: 06/24/2023] [Indexed: 07/18/2023]

Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/03/2023]

Zhao Q, Vaddadi SM, Woulfe M, Ogunfowora LA, Garimella SS, Isayev O, Savoie BM. Comprehensive exploration of graphically defined reaction spaces. Sci Data 2023;10:145. [PMID: 36935430 PMCID: PMC10025260 DOI: 10.1038/s41597-023-02043-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/17/2022] [Accepted: 02/27/2023] [Indexed: 03/21/2023] Open

Belenahalli Shekarappa S, Kandagalla S, Lee J. Development of machine learning models based on molecular fingerprints for selection of small molecule inhibitors against JAK2 protein. J Comput Chem 2023;44:1493-1504. [PMID: 36929511 DOI: 10.1002/jcc.27103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 02/18/2023] [Accepted: 02/24/2023] [Indexed: 03/18/2023]

Guo J, Sun M, Zhao X, Shi C, Su H, Guo Y, Pu X. General Graph Neural Network-Based Model To Accurately Predict Cocrystal Density and Insight from Data Quality and Feature Representation. J Chem Inf Model 2023;63:1143-1156. [PMID: 36734616 DOI: 10.1021/acs.jcim.2c01538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]

Abstract

Cocrystal engineering as an effective way to modify solid-state properties has inspired great interest from diverse material fields while cocrystal density is an important property closely correlated with the material function. In order to accurately predict the cocrystal density, we develop a graph neural network (GNN)-based deep learning framework by considering three key factors of machine learning (data quality, feature presentation, and model architecture). The result shows that different stoichiometric ratios of molecules in cocrystals can significantly influence the prediction performances, highlighting the importance of data quality. In addition, the feature complementary is not suitable for augmenting the molecular graph representation in the cocrystal density prediction, suggesting that the complementary strategy needs to consider whether extra features can sufficiently supplement the lacked information in the original representation. Based on these results, 4144 cocrystals with 1:1 stoichiometry ratio are selected as the dataset, supplemented by the data augmentation of exchanging a pair of coformers. The molecular graph is determined to learn feature representation to train the GNN-based model. Global attention is introduced to further optimize the feature space and identify important atoms to realize the interpretability of the model. Benefited from the advantages, our model significantly outperforms three competitive models and exhibits high prediction accuracy for unseen cocrystals, showcasing its robustness and generality. Overall, our work not only provides a general cocrystal density prediction tool for experimental investigations but also provides useful guidelines for the machine learning application. All source codes are freely available at https://github.com/Xiao-Gua00/CCPGraph.

Kondratyev V, Dryzhakov M, Gimadiev T, Slutskiy D. Generative model based on junction tree variational autoencoder for HOMO value prediction and molecular optimization. J Cheminform 2023;15:11. [PMID: 36732800 PMCID: PMC9893566 DOI: 10.1186/s13321-023-00681-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 01/06/2023] [Indexed: 02/04/2023] Open

Kříž K, Schmidt L, Andersson AT, Walz MM, van der Spoel D. An Imbalance in the Force: The Need for Standardized Benchmarks for Molecular Simulation. J Chem Inf Model 2023;63:412-431. [PMID: 36630710 PMCID: PMC9875315 DOI: 10.1021/acs.jcim.2c01127] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/06/2022] [Indexed: 01/12/2023]

Xia S, Zhang D, Zhang Y. Multitask Deep Ensemble Prediction of Molecular Energetics in Solution: From Quantum Mechanics to Experimental Properties. J Chem Theory Comput 2023;19:10.1021/acs.jctc.2c01024. [PMID: 36607141 PMCID: PMC10323048 DOI: 10.1021/acs.jctc.2c01024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]

Alygizakis N, Giannakopoulos T, Thomaidis NS, Slobodnik J. Detecting the sources of chemicals in the Black Sea using non-target screening and deep learning convolutional neural networks. Sci Total Environ 2022;847:157554. [PMID: 35878861 DOI: 10.1016/j.scitotenv.2022.157554] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/21/2022] [Revised: 07/17/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]

Abstract

The Black Sea is an important ecosystem, which is affected by various anthropogenic pressures, such as shipping activities and wastewater inputs from large coastal cities. Significant loads of chemical pollutants are being continuously brought in by major European rivers. This study investigated the spatial distribution of chemicals in the Ukrainian shelf (the northwestern part of the Black Sea) and their main sources. Chemical occurrence data used in the study was generated within the Joint Black Sea Surveys (JBSS), which took place in 2016 and 2017 as a part of the EU/UNDP EMBLAS II project (www.emblasproject.org). During the JBSS, seawater samples were analyzed by a non-target screening workflow using liquid chromatography high-resolution mass spectrometry (LC-HRMS). Open-source algorithms were applied to generate a combined dataset of 30,489 detected chemical signals and their intensities. Out of these, 35 compounds were tentatively identified by the application of a non-target screening identification workflow based on automated matching of their mass spectra against those in available mass spectral libraries. The dataset was used to generate images, representing spatial distribution of each of the signals. These images were then used as an input to a deep learning convolutional neural network classification model. The study resulted in the development of an open-source end-to-end workflow for the estimation of the pollution load by chemicals contributed by the two major inflowing rivers (Danube and Dnieper) and other, so far unidentified, sources. A dedicated dashboard was built to facilitate data visualization per detected signal/compound. The presented model proved to be especially useful at the prioritization of signals of unknown compounds, which is of key importance for the follow up structure elucidation efforts of bulky non-target screening data. 
The deep learning approach for peak prioritization of unknown chemicals in the environment has been used for the first time.

Rahman ASMZ, Liu C, Sturm H, Hogan AM, Davis R, Hu P, Cardona ST. A machine learning model trained on a high-throughput antibacterial screen increases the hit rate of drug discovery. PLoS Comput Biol 2022;18:e1010613. [PMID: 36228001 PMCID: PMC9624395 DOI: 10.1371/journal.pcbi.1010613] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 11/01/2022] [Accepted: 09/26/2022] [Indexed: 01/24/2023] Open

Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022;20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022] Open

Singh K, Münchmeyer J, Weber L, Leser U, Bande A. Graph Neural Networks for Learning Molecular Excitation Spectra. J Chem Theory Comput 2022;18:4408-4417. [PMID: 35671364 DOI: 10.1021/acs.jctc.2c00255] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]

Isert C, Atz K, Jiménez-Luna J, Schneider G. QMugs, quantum mechanical properties of drug-like molecules. Sci Data 2022;9:273. [PMID: 35672335 PMCID: PMC9174255 DOI: 10.1038/s41597-022-01390-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 08/15/2021] [Accepted: 05/17/2022] [Indexed: 12/16/2022] Open

Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS Omega 2022;7:15695-15710. [PMID: 35571767 PMCID: PMC9096921 DOI: 10.1021/acsomega.2c00642] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 01/31/2022] [Accepted: 04/11/2022] [Indexed: 05/17/2023]

Autonomous design of new chemical reactions using a variational autoencoder. Commun Chem 2022;5:40. [PMID: 36697652 PMCID: PMC9814385 DOI: 10.1038/s42004-022-00647-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2021] [Accepted: 02/16/2022] [Indexed: 01/28/2023] Open

Jacobson LD, Stevenson JM, Ramezanghorbani F, Ghoreishi D, Leswing K, Harder ED, Abel R. Transferable Neural Network Potential Energy Surfaces for Closed-Shell Organic Molecules: Extension to Ions. J Chem Theory Comput 2022;18:2354-2366. [PMID: 35290063 DOI: 10.1021/acs.jctc.1c00821] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 12/30/2022]

Gebauer NWA, Gastegger M, Hessmann SSP, Müller KR, Schütt KT. Inverse design of 3d molecular structures with conditional generative neural networks. Nat Commun 2022;13:973. [PMID: 35190542 PMCID: PMC8861047 DOI: 10.1038/s41467-022-28526-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 09/10/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022] Open

Shi X, Lin X, Luo R, Wu S, Li L, Zhao ZJ, Gong J. Dynamics of Heterogeneous Catalytic Processes at Operando Conditions. JACS Au 2021;1:2100-2120. [PMID: 34977883 PMCID: PMC8715484 DOI: 10.1021/jacsau.1c00355] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/17/2021] [Indexed: 05/02/2023]

Affiliation(s)

Xiangcheng Shi Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, China
Xiaoyun Lin Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
Ran Luo Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
Shican Wu Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
Lulu Li Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
Zhi-Jian Zhao Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
Jinlong Gong Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China Joint School of National University of Singapore and Tianjin University, International Campus of Tianjin University, Fuzhou 350207, China

Busk J, Bjørn Jørgensen P, Bhowmik A, Schmidt MN, Winther O, Vegge T. Calibrated uncertainty for molecular property prediction using ensembles of message passing neural networks. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/ac3eb3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022] Open

Meng F, Xi Y, Huang J, Ayers PW. A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors. Sci Data 2021;8:289. [PMID: 34716354 PMCID: PMC8556334 DOI: 10.1038/s41597-021-01069-5] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/28/2021] [Accepted: 09/22/2021] [Indexed: 01/31/2023] Open

Leguy J, Glavatskikh M, Cauchy T, Da Mota B. Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization. J Cheminform 2021;13:76. [PMID: 34600576 PMCID: PMC8487551 DOI: 10.1186/s13321-021-00554-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2021] [Accepted: 09/15/2021] [Indexed: 01/21/2023] Open

Sattari K, Xie Y, Lin J. Data-driven algorithms for inverse design of polymers. Soft Matter 2021;17:7607-7622. [PMID: 34397078 DOI: 10.1039/d1sm00725d] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 06/13/2023]

Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021;121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 171] [Impact Index Per Article: 57.0] [Received: 07/17/2020] [Indexed: 12/11/2022]

Huang B, von Lilienfeld OA. Ab Initio Machine Learning in Chemical Compound Space. Chem Rev 2021;121:10001-10036. [PMID: 34387476 PMCID: PMC8391942 DOI: 10.1021/acs.chemrev.0c01303] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 12/21/2020] [Indexed: 12/11/2022]

Kerner J, Dogan A, von Recum H. Machine learning and big data provide crucial insight for future biomaterials discovery and research. Acta Biomater 2021;130:54-65. [PMID: 34087445 DOI: 10.1016/j.actbio.2021.05.053] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 02/06/2023]

Abstract

Machine learning have been widely adopted in a variety of fields including engineering, science, and medicine revolutionizing how data is collected, used, and stored. Their implementation has led to a drastic increase in the number of computational models for the prediction of various numerical, categorical, or association events given input variables. We aim to examine recent advances in the use of machine learning when applied to the biomaterial field. Specifically, quantitative structure properties relationships offer the unique ability to correlate microscale molecular descriptors to larger macroscale material properties. These new models can be broken down further into four categories: regression, classification, association, and clustering. We examine recent approaches and new uses of machine learning in the three major categories of biomaterials: metals, polymers, and ceramics for rapid property prediction and trend identification. While current research is promising, limitations in the form of lack of standardized reporting and available databases complicates the implementation of described models. Herein, we hope to provide a snapshot of the current state of the field and a beginner's guide to navigating the intersection of biomaterials research and machine learning. STATEMENT OF SIGNIFICANCE: Machine learning and its methods have found a variety of uses beyond the field of computer science but have largely been neglected by those in realm of biomaterials. Through the use of more computational methods, biomaterials development can be expediated while reducing the need for standard trial and error methods. Within, we introduce four basic models that readers can potentially apply to their current research as well as current applications within the field. Furthermore, we hope that this article may act as a "call to action" for readers to realize and address the current lack of implementation within the biomaterials field.

Gawriljuk VO, Foil DH, Puhl AC, Zorn KM, Lane TR, Riabova O, Makarov V, Godoy AS, Oliva G, Ekins S. Development of Machine Learning Models and the Discovery of a New Antiviral Compound against Yellow Fever Virus. J Chem Inf Model 2021;61:3804-3813. [PMID: 34286575 DOI: 10.1021/acs.jcim.1c00460] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/14/2022]

Vazquez-Salazar LI, Boittier ED, Unke OT, Meuwly M. Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies. J Chem Theory Comput 2021;17:4769-4785. [PMID: 34288675 DOI: 10.1021/acs.jctc.1c00363] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]

Abstract

An essential aspect for adequate predictions of chemical properties by machine learning models is the database used for training them. However, studies that analyze how the content and structure of the databases used for training impact the prediction quality are scarce. In this work, we analyze and quantify the relationships learned by a machine learning model (Neural Network) trained on five different reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict tautomerization energies from molecules in Tautobase. For this, characteristics such as the number of heavy atoms in a molecule, number of atoms of a given element, bond composition, or initial geometry on the quality of the predictions are considered. The results indicate that training on a chemically diverse database is crucial for obtaining good results and also that conformational sampling can partly compensate for limited coverage of chemical diversity. The overall best-performing reference database (ANI-1x) performs on average by 1 kcal/mol better than PC9, which, however, contains about 2 orders of magnitude fewer reference structures. On the other hand, PC9 is chemically more diverse by a factor of ∼5 as quantified by the number of atom-in-molecule-based fragments (amons) it contains compared with the ANI family of databases. A quantitative measure for deficiencies is the Kullback-Leibler divergence between reference and target distributions. It is explicitly demonstrated that when certain types of bonds need to be covered in the target database (Tautobase) but are undersampled in the reference databases, the resulting predictions are poor. Examples of this include the poor performance of all databases analyzed to predict C(sp²)-C(sp²) double bonds close to heteroatoms and azoles containing N-N and N-O bonds. 
Analysis of the results with a Tree MAP algorithm provides deeper understanding of specific deficiencies in predicting tautomerization energies by the reference datasets due to inadequate coverage of chemical space. Capitalizing on this information can be used to either improve existing databases or generate new databases of sufficient diversity for a range of machine learning (ML) applications in chemistry.

Häse F, Aldeghi M, Hickman RJ, Roch LM, Christensen M, Liles E, Hein JE, Aspuru-Guzik A. Olympus: a benchmarking framework for noisy optimization and experiment planning. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/abedc8] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/21/2022] Open

Stuke A, Rinke P, Todorović M. Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/abee59] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/26/2022] Open

Lu J, Xia S, Lu J, Zhang Y. Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning. J Chem Inf Model 2021;61:1095-1104. [PMID: 33683885 DOI: 10.1021/acs.jcim.1c00007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Shen WX, Zeng X, Zhu F, Wang YL, Qin C, Tan Y, Jiang YY, Chen YZ. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00301-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]

Koge D, Ono N, Huang M, Altaf‐Ul‐Amin M, Kanaya S. Embedding of Molecular Structure Using Molecular Hypergraph Variational Autoencoder with Metric Learning. Mol Inform 2021;40:e2000203. [PMID: 33164295 PMCID: PMC7900996 DOI: 10.1002/minf.202000203] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Accepted: 10/29/2020] [Indexed: 11/06/2022]

Leguy J, Cauchy T, Glavatskikh M, Duval B, Da Mota B. EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation. J Cheminform 2020;12:55. [PMID: 33431049 PMCID: PMC7494000 DOI: 10.1186/s13321-020-00458-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open

Smith DGA, Altarawy D, Burns LA, Welborn M, Naden LN, Ward L, Ellis S, Pritchard BP, Crawford TD. The MolSSI QCArchive project: An open‐source platform to compute, organize, and share quantum chemistry data. Wiley Interdiscip Rev Comput Mol Sci 2020. [DOI: 10.1002/wcms.1491] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Mancuso JL, Mroz AM, Le KN, Hendon CH. Electronic Structure Modeling of Metal-Organic Frameworks. Chem Rev 2020;120:8641-8715. [PMID: 32672939 DOI: 10.1021/acs.chemrev.0c00148] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]

Abstract

Owing to their molecular building blocks, yet highly crystalline nature, metal-organic frameworks (MOFs) sit at the interface between molecule and material. Their diverse structures and compositions enable them to be useful materials as catalysts in heterogeneous reactions, electrical conductors in energy storage and transfer applications, chromophores in photoenabled chemical transformations, and beyond. In all cases, density functional theory (DFT) and higher-level methods for electronic structure determination provide valuable quantitative information about the electronic properties that underpin the functions of these frameworks. However, there are only two general modeling approaches in conventional electronic structure software packages: those that treat materials as extended, periodic solids, and those that treat materials as discrete molecules. Each approach has features and benefits; both have been widely employed to understand the emergent chemistry that arises from the formation of the metal-organic interface. This Review canvases these approaches to date, with emphasis placed on the application of electronic structure theory to explore reactivity and electron transfer using periodic, molecular, and embedded models. This includes (i) computational chemistry considerations such as how functional, k-grid, and other model variables are selected to enable insights into MOF properties, (ii) extended solid models that treat MOFs as materials rather than molecules, (iii) the mechanics of cluster extraction and subsequent chemistry enabled by these molecular models, (iv) catalytic studies using both solids and clusters thereof, and (v) embedded, mixed-method approaches, which simulate a fraction of the material using one level of theory and the remainder of the material using another dissimilar theoretical implementation.


Rauer C, Bereau T. Hydration free energies from kernel-based machine learning: Compound-database bias. J Chem Phys 2020;153:014101. [DOI: 10.1063/5.0012230] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Dral PO. Quantum Chemistry in the Age of Machine Learning. J Phys Chem Lett 2020;11:2336-2347. [PMID: 32125858 DOI: 10.1021/acs.jpclett.9b03664] [Citation(s) in RCA: 191] [Impact Index Per Article: 47.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]

Schindl A, Hawker RR, Schaffarczyk McHale KS, Liu KTC, Morris DC, Hsieh AY, Gilbert A, Prescott SW, Haines RS, Croft AK, Harper JB, Jäger CM. Controlling the outcome of S_N2 reactions in ionic liquids: from rational data set design to predictive linear regression models. Phys Chem Chem Phys 2020;22:23009-23018. [PMID: 33043942 DOI: 10.1039/d0cp04224b] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]