1
Fu L, Shi S, Yi J, Wang N, He Y, Wu Z, Peng J, Deng Y, Wang W, Wu C, Lyu A, Zeng X, Zhao W, Hou T, Cao D. ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res 2024; 52:W422-W431. [PMID: 38572755 PMCID: PMC11223840 DOI: 10.1093/nar/gkae236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/10/2024] [Accepted: 03/21/2024] [Indexed: 04/05/2024] Open
Abstract
ADMETlab 3.0 is the second updated version of the web server that provides a comprehensive and efficient platform for evaluating ADMET-related parameters as well as physicochemical properties and medicinal chemistry characteristics involved in the drug discovery process. This new release addresses the limitations of the previous version and offers broader coverage, improved performance, API functionality, and decision support. In terms of supporting data and endpoints, this version includes 119 features, an increase of 31 compared to the previous version. The number of entries is 1.5 times larger than in the previous version, with over 400 000 entries. ADMETlab 3.0 incorporates a multi-task DMPNN architecture coupled with molecular descriptors, a method that not only ensures fast simultaneous calculation of all endpoints but also achieves superior accuracy and robustness. In addition, an API has been introduced to meet the growing demand for programmatic access to large amounts of data in ADMETlab 3.0. Moreover, this version includes uncertainty estimates in the prediction results, aiding in the confident selection of candidate compounds for further studies and experiments. ADMETlab 3.0 is publicly accessible without the need for registration at: https://admetlab3.scbdd.com.
Affiliation(s)
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Shaohua Shi
  - School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Jiacai Yi
  - School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Ningning Wang
  - Xiangya Hospital of Central South University, Changsha, Hunan 410008, P.R. China
- Yuanhang He
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Jinfu Peng
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Youchao Deng
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Wenxuan Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Chengkun Wu
  - School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Aiping Lyu
  - School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha, Hunan 410082, P.R. China
- Wentao Zhao
  - School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
2
Stienstra CMK, Hebert L, Thomas P, Haack A, Guo J, Hopkins WS. Graphormer-IR: Graph Transformers Predict Experimental IR Spectra Using Highly Specialized Attention. J Chem Inf Model 2024; 64:4613-4629. [PMID: 38845400 DOI: 10.1021/acs.jcim.4c00378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Infrared (IR) spectroscopy is an important analytical tool in various chemical and forensic domains, and a great deal of effort has gone into developing in silico methods for predicting experimental spectra. A key challenge in this regard is generating highly accurate spectra quickly to enable real-time feedback between computation and experiment. Here, we employ Graphormer, a graph neural network (GNN) transformer, to predict IR spectra using only simplified molecular-input line-entry system (SMILES) strings. Our data set includes 53,528 high-quality spectra, measured in five different experimental media (i.e., phases), for molecules containing the elements H, C, N, O, F, Si, S, P, Cl, Br, and I. When using only atomic numbers for node encodings, Graphormer-IR achieved a mean test spectral information similarity (SISμ) value of 0.8449 ± 0.0012 (n = 5), which surpasses that of the current state-of-the-art model Chemprop-IR (SISμ = 0.8409 ± 0.0014, n = 5) with only 36% of the encoded information. Augmenting node embeddings with additional node-level descriptors in learned embeddings generated through a multilayer perceptron improves scores to SISμ = 0.8523 ± 0.0006, a total improvement of 19.7σ (t = 19). These improved scores show how Graphormer-IR excels in capturing long-range interactions like hydrogen bonding, anharmonic peak positions in experimental spectra, and stretching frequencies of uncommon functional groups. Scaling our architecture to 210 attention heads demonstrates specialist-like behavior for distinct IR frequencies that improves model performance. Our model utilizes novel architectures, including a global node for phase encoding, learned node feature embeddings, and a one-dimensional (1D) smoothing convolutional neural network (CNN). Graphormer-IR's innovations underscore its value over traditional message-passing neural networks (MPNNs) due to its expressive embeddings and ability to capture long-range intramolecular relationships.
Affiliation(s)
- Cailum M K Stienstra
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Liam Hebert
  - Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Patrick Thomas
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Alexander Haack
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Jason Guo
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Scott Hopkins
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
  - Watermine Innovation, Waterloo, Ontario N0B 2T0, Canada
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
3
Ma H, Pan SQ, Wang WL, Yue X, Xi XH, Yan S, Wu DY, Wang X, Liu G, Ren B. Surface-Enhanced Raman Spectroscopy: Current Understanding, Challenges, and Opportunities. ACS Nano 2024; 18:14000-14019. [PMID: 38764194 DOI: 10.1021/acsnano.4c02670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2024]
Abstract
Surface-enhanced Raman spectroscopy (SERS) has advanced substantially since its discovery in the 1970s, which makes this an opportune moment to celebrate achievements, consider ongoing endeavors, and anticipate the future trajectory of SERS. In this perspective, we summarize the latest breakthroughs in understanding the electromagnetic enhancement mechanisms of SERS and revisit the charge-transfer (CT) mechanisms of semiconductors. We then summarize the strategies to improve sensitivity, selectivity, and reliability. After addressing experimental advancements, we comprehensively survey the progress on spectrum-structure correlation of SERS, showcasing its important role in promoting SERS development. Finally, we anticipate forthcoming directions and opportunities, especially in deepening our insights into chemical or biological processes and establishing a clear spectrum-structure correlation.
Affiliation(s)
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Si-Qi Pan
  - State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, Xiamen University, Xiamen 361102, China
- Wei-Li Wang
  - State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, Xiamen University, Xiamen 361102, China
- Xiaxia Yue
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Xiao-Han Xi
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Sen Yan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- De-Yin Wu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Xiang Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Guokun Liu
  - State Key Laboratory of Marine Environmental Science, College of the Environment and Ecology, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, Xiamen University, Xiamen 361102, China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (i-ChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
4
Chen J, Schwaller P. Molecular hypergraph neural networks. J Chem Phys 2024; 160:144307. [PMID: 38597317 DOI: 10.1063/5.0193557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 03/14/2024] [Indexed: 04/11/2024] Open
Abstract
Graph neural networks (GNNs) have demonstrated promising performance across various chemistry-related tasks. However, conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher order connections, such as multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNNs) to predict the optoelectronic properties of organic semiconductors, where hyperedges represent conjugated structures. A general algorithm is designed for irregular high-order connections, which can efficiently operate on molecular hypergraphs with hyperedges of various orders. The results show that MHNN outperforms all baseline models on most tasks of organic photovoltaic, OCELOT chromophore v1, and PCQM4Mv2 datasets. Notably, MHNN achieves this without any 3D geometric information, surpassing the baseline model that utilizes atom positions. Moreover, MHNN achieves better performance than pretrained GNNs under limited training data, underscoring its excellent data efficiency. This work provides a new strategy for more general molecular representations and property prediction tasks related to high-order connections.
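To make the hypergraph idea concrete, the following is a minimal Python sketch and not the authors' MHNN implementation. It assumes a hypothetical four-atom conjugated chain (atoms 0 to 3), stores each bond as an order-2 hyperedge and the conjugated system as a single order-4 hyperedge, and runs one mean-aggregation round from atoms to hyperedges and back. The function names, the scalar atom features, and the choice of mean aggregation are all illustrative assumptions.

```python
# Illustrative sketch only: a toy molecular hypergraph, not the MHNN code.

def incidence_matrix(num_atoms, hyperedges):
    """Return H where H[i][j] == 1 if atom i belongs to hyperedge j."""
    return [[1 if atom in edge else 0 for edge in hyperedges]
            for atom in range(num_atoms)]

def message_pass(features, hyperedges):
    """One round: atoms -> hyperedges (mean), then hyperedges -> atoms (mean)."""
    # Each hyperedge summarizes the atoms it contains, whatever its order.
    edge_feats = [sum(features[a] for a in edge) / len(edge) for edge in hyperedges]
    new_feats = []
    for atom in range(len(features)):
        incident = [edge_feats[j] for j, edge in enumerate(hyperedges) if atom in edge]
        # An atom in no hyperedge keeps its old feature.
        new_feats.append(sum(incident) / len(incident) if incident else features[atom])
    return new_feats

# Hypothetical butadiene-like chain: three bonds plus one conjugated system.
bonds = [(0, 1), (1, 2), (2, 3)]
conjugated = [(0, 1, 2, 3)]
hyperedges = bonds + conjugated

H = incidence_matrix(4, hyperedges)
atom_features = [1.0, 0.0, 0.0, 0.0]  # made-up scalar feature per atom
updated = message_pass(atom_features, hyperedges)
```

In this toy case atom 3 receives a nonzero feature after a single round only because the order-4 hyperedge links it directly to atom 0; with the pairwise bonds alone it would still be zero, which is the kind of higher-order connection the abstract refers to.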
Affiliation(s)
- Junwu Chen
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
5
An H, Liu X, Cai W, Shao X. Explainable Graph Neural Networks with Data Augmentation for Predicting pKa of C-H Acids. J Chem Inf Model 2024; 64:2383-2392. [PMID: 37706462 DOI: 10.1021/acs.jcim.3c00958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2023]
Abstract
The pKa of C-H acids is an important parameter in the fields of organic synthesis, drug discovery, and materials science. However, the prediction of pKa is still a great challenge due to the limit of experimental data and the lack of chemical insight. Here, a new model for predicting the pKa values of C-H acids is proposed on the basis of graph neural networks (GNNs) and data augmentation. A message passing unit (MPU) was used to extract the topological and target-related information from the molecular graph data, and a readout layer was utilized to retrieve the information on the ionization site C atom. The retrieved information then was adopted to predict pKa by a fully connected network. Furthermore, to increase the diversity of the training data, a knowledge-infused data augmentation technique was established by replacing the H atoms in a molecule with substituents exhibiting different electronic effects. The MPU was pretrained with the augmented data. The efficacy of data augmentation was confirmed by visualizing the distribution of compounds with different substituents and by classifying compounds. The explainability of the model was studied by examining the change of pKa values when a specific atom was masked. This explainability was used to identify the key substituents for pKa. The model was evaluated on two data sets from the iBonD database. Dataset1 includes the experimental pKa values of C-H acids measured in DMSO, while dataset2 comprises the pKa values measured in water. The results show that the knowledge-infused data augmentation technique greatly improves the predictive accuracy of the model, especially when the number of samples is small.
Affiliation(s)
- Hongle An
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
6
King-Smith E. Transfer learning for a foundational chemistry model. Chem Sci 2024; 15:5143-5151. [PMID: 38577363 PMCID: PMC10988575 DOI: 10.1039/d3sc04928k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/19/2023] [Accepted: 11/15/2023] [Indexed: 04/06/2024] Open
Abstract
Data-driven chemistry has garnered much interest concurrent with improvements in hardware and the development of new machine learning models. However, obtaining sufficiently large, accurate datasets of a desired chemical outcome for data-driven chemistry remains a challenge. The community has made significant efforts to democratize and curate available information for more facile machine learning applications, but the limiting factor is usually the laborious nature of generating large-scale data. Transfer learning has been noted in certain applications to alleviate some of the data burden, but this protocol is typically carried out on a case-by-case basis, with the transfer learning task expertly chosen to fit the finetuning. Herein, I develop a machine learning framework capable of accurate chemistry-relevant prediction amid general sources of low data. First, a chemical "foundational model" is trained using a dataset of ∼1 million experimental organic crystal structures. A task specific module is then stacked atop this foundational model and subjected to finetuning. This approach achieves state-of-the-art performance on a diverse set of tasks: toxicity prediction, yield prediction, and odor prediction.
7
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Affiliation(s)
- Andre Nicolle
  - Aramco Fuel Research Center, Rueil-Malmaison 92852, France
- Sili Deng
  - Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States
- Matthias Ihme
  - Stanford University, Stanford 94305, California, United States
- Emad Al Ibrahim
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Aamir Farooq
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
8
King-Smith E, Faber FA, Reilly U, Sinitskiy AV, Yang Q, Liu B, Hyek D, Lee AA. Predictive Minisci late stage functionalization with transfer learning. Nat Commun 2024; 15:426. [PMID: 38225239 PMCID: PMC10789750 DOI: 10.1038/s41467-023-42145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/01/2023] [Indexed: 01/17/2024] Open
Abstract
Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and 13C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
Affiliation(s)
- Emma King-Smith
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Felix A Faber
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Usa Reilly
  - Development & Medical, Pfizer Worldwide Research, Groton, CT, USA
- Anton V Sinitskiy
  - Machine Learning Computational Sciences, Pfizer Worldwide Research, Cambridge, MA, USA
- Qingyi Yang
  - Development & Medical, Pfizer Worldwide Research, Cambridge, MA, USA
- Bo Liu
  - Spectrix Analytic Services, LLC., North Haven, CT, USA
- Dennis Hyek
  - Spectrix Analytic Services, LLC., North Haven, CT, USA
- Alpha A Lee
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
9
Heid E, Greenman KP, Chung Y, Li SC, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ. Chemprop: A Machine Learning Package for Chemical Property Prediction. J Chem Inf Model 2024; 64:9-17. [PMID: 38147829 PMCID: PMC10777403 DOI: 10.1021/acs.jcim.3c01250] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/28/2023]
Abstract
Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.
Affiliation(s)
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
- Kevin P. Greenman
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Shih-Cheng Li
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- David E. Graff
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Florence H. Vermeire
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H. Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Charles J. McGill
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States
10
Tetsassi Feugmo CG. Accurately predicting molecular spectra with deep learning. Nature Computational Science 2023; 3:918-919. [PMID: 38177595 DOI: 10.1038/s43588-023-00553-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
11
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
12
Cheng F, Yang C, Zhu H, Li Y, Lan L, Wang K. Semi-Supervised Deep Learning-Based Multi-component Spectral Calibration Modeling for UV-vis and Near-Infrared Spectroscopy without Information Loss. Anal Chem 2023; 95:13446-13455. [PMID: 37638661 DOI: 10.1021/acs.analchem.3c01132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/29/2023]
Abstract
Spectral analysis is an important method for characterizing and identifying chemical species. However, quantitative spectral analysis of multiple chemical properties in the real world has always been a challenging problem due to the strong correlation, massive noise, and serious information overlapping of the spectral features. Here, we present a new semi-supervised spectral calibration method based on information lossless decoupling of spectral features named NICEM. To realize the separation and extraction of key latent features, the method uses the flow-based model non-linear independent component estimation (NICE) to learn the sample distribution. The spectral data information is transformed into independent latent variables obeying Gaussian distribution by the reversible structure of deep network without information loss, so as to find the essential properties and realize the feature nonlinear decomposition. Moreover, the association between the input latent feature variables and attributes is evaluated by the maximum mutual information coefficient to eliminate the adverse effects of irrelevant information in the latent variable space and mine key information. Since the latent variables are independent in each dimension, the NICEM method is easier to establish an accurate semi-supervised multi-component calibration model even for high overlapping and complex spectral data. The applicability of the proposed spectral modeling method is demonstrated by using three ultraviolet-visible and near-infrared spectral data sets with 15 physical and chemical properties including diesel fuels, corn, and multi-metal ions solution. Results show that the proposed NICEM method has the highest determination coefficient (R²) and significantly improves extrapolation compared with the seven state-of-the-art methods.
The proposed method is intuitive because it obviates complex feature engineering and prior knowledge and is a promising spectral calibration tool for quantitative analysis in other spectroscopy applications.
Affiliation(s)
- Fei Cheng
  - School of Automation, Central South University, Changsha 410083, China
- Chunhua Yang
  - School of Automation, Central South University, Changsha 410083, China
- Hongqiu Zhu
  - School of Automation, Central South University, Changsha 410083, China
- Yonggang Li
  - School of Automation, Central South University, Changsha 410083, China
- Lijuan Lan
  - School of Automation, Central South University, Changsha 410083, China
- Kai Wang
  - School of Automation, Central South University, Changsha 410083, China
|
13
|
Biswas S, Chung Y, Ramirez J, Wu H, Green WH. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. J Chem Inf Model 2023; 63:4574-4588. [PMID: 37487557 DOI: 10.1021/acs.jcim.3c00546] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/26/2023]
Abstract
Knowledge of critical properties, such as critical temperature, pressure, density, as well as acentric factor, is essential to calculate thermo-physical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining using estimated data to optimize model performance. Our final model utilizes a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits where it shows state-of-the-art accuracies. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.
Affiliation(s)
- Sayandeep Biswas
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Josephine Ramirez
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
14
|
Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows to follow the interconversions between the conformers through graphs of transitions in time. Examples of gas phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
  - Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
- Marie Bricage
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Ylène Aboulfath
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Dominique Barth
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Marie-Pierre Gaigeot
  - Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
|
15
|
McNaughton AD, Joshi RP, Knutson CR, Fnu A, Luebke KJ, Malerich JP, Madrid PB, Kumar N. Machine Learning Models for Predicting Molecular UV-Vis Spectra with Quantum Mechanical Properties. J Chem Inf Model 2023; 63:1462-1471. [PMID: 36847578 DOI: 10.1021/acs.jcim.2c01662] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 03/01/2023]
Abstract
Accurate understanding of ultraviolet-visible (UV-vis) spectra is critical for the high-throughput synthesis of compounds for drug discovery. Experimentally determining UV-vis spectra can become expensive when dealing with a large quantity of novel compounds. This provides us an opportunity to drive computational advances in molecular property predictions using quantum mechanics and machine learning methods. In this work, we use both quantum mechanically (QM) predicted and experimentally measured UV-vis spectra as input to devise four different machine learning architectures, UVvis-SchNet, UVvis-DTNN, UVvis-Transformer, and UVvis-MPNN, and assess the performance of each method. We find that the UVvis-MPNN model outperforms the other models when using optimized 3D coordinates and QM predicted spectra as input features. This model has the highest performance for predicting UV-vis spectra with a training RMSE of 0.06 and validation RMSE of 0.08. Most importantly, our model can be used for the challenging task of predicting differences in the UV-vis spectral signatures of regioisomers.
Affiliation(s)
- Andrew D McNaughton
  - Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Rajendra P Joshi
  - Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Carter R Knutson
  - Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Anubhav Fnu
  - Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Kevin J Luebke
  - SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025, United States
- Jeremiah P Malerich
  - SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025, United States
- Peter B Madrid
  - SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025, United States
- Neeraj Kumar
  - Pacific Northwest National Laboratory, Richland, Washington 99354, United States
|
16
|
Bougueroua S, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic graph theory for post-processing molecular dynamics trajectories. Mol Phys 2023. [DOI: 10.1080/00268976.2022.2162456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Affiliation(s)
- Sana Bougueroua
  - Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
- Ylène Aboulfath
  - Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
- Dominique Barth
  - Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
- Marie-Pierre Gaigeot
  - Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
|
17
|
Yang L, Chen P, He K, Wang R, Chen G, Shan G, Zhu L. Predicting bioconcentration factor and estrogen receptor bioactivity of bisphenol A and its analogues in adult zebrafish by directed message passing neural networks. Environment International 2022; 169:107536. [PMID: 36152365 DOI: 10.1016/j.envint.2022.107536] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
The bioconcentration factor (BCF) is a key parameter for bioavailability assessment of environmental pollutants in regulatory frameworks. The comparative toxicology and mechanism of action of congeners are also of concern. However, there are limitations to acquire them by conducting field and laboratory experiments while machine learning is emerging as a promising predictive tool to fill the gap. In this study, the Direct Message Passing Neural Network (DMPNN) was applied to predict logBCFs of bisphenol A (BPA) and its four analogues (bisphenol AF (BPAF), bisphenol B (BPB), bisphenol F (BPF) and bisphenol S (BPS)). For the test set, the Pearson correlation coefficient (PCC) and mean square error (MSE) were 0.85 and 0.52 respectively, suggesting a good predictive performance. The predicted logBCFs values by the DMPNN ranging from 0.35 (BPS) to 2.14 (BPAF) coincided well with those by the classical EPI Suite (BCFBAF model). Besides, estrogen receptor α (ERα) bioactivity of these bisphenols was also predicted well by the DMPNN, with a probability of 97.0 % (BPB) to 99.7 % (BPAF), which was validated by the extent of vitellogenin (VTG) induction in male zebrafish as a biomarker except BPS. Thus, with little need for expert knowledge, DMPNN is confirmed to be a useful tool to accurately predict logBCF and screen for estrogenic activity from molecular structures. Moreover, a gender difference was noted in the changes of three endpoints (logBCF, ER binding affinity and VTG levels), the rank order of which was BPAF > BPB > BPA > BPF > BPS consistently, and abnormal amino acid metabolism is featured as an omics signature of abnormal hormone protein expression.
Affiliation(s)
- Liping Yang
  - Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
- Pengyu Chen
  - Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
  - College of Oceanography, Hohai University, Nanjing 210098, China
- Keyan He
  - Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
- Ruihan Wang
  - College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Geng Chen
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
- Guoqiang Shan
  - Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
- Lingyan Zhu
  - Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
|
18
|
Johnson MS, Dong X, Grinberg Dana A, Chung Y, Farina D, Gillis RJ, Liu M, Yee NW, Blondal K, Mazeau E, Grambow CA, Payne AM, Spiekermann KA, Pang HW, Goldsmith CF, West RH, Green WH. RMG Database for Chemical Property Prediction. J Chem Inf Model 2022; 62:4906-4915. [PMID: 36222558 DOI: 10.1021/acs.jcim.2c00965] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters necessary for constructing a wide variety of chemical kinetic mechanisms. These datasets and estimators are mostly published and enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combination of 4564 entries and a group additivity scheme with 9 types of corrections including radical, polycyclic, and surface absorption corrections with 1580 total curated groups and parameters for a graph convolutional neural network trained using transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters containing a combined 21 000 reactions and contains rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use can be facilitated by interfacing directly with the RMG Python package which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. 
This helps to speed up and facilitate kinetic analysis by enabling easy hypothesis testing on pathways, by providing parameters for model construction, and by providing checks on kinetic parameters from other sources.
Affiliation(s)
- Matthew S Johnson
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiaorui Dong
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Alon Grinberg Dana
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - The Wolfson Department of Chemical Engineering, Grand Technion Energy Program (GTEP), Technion-Israel Institute of Technology, Haifa 3200003, Israel
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- David Farina
  - Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
- Ryan J Gillis
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mengjie Liu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Nathan W Yee
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Katrin Blondal
  - School of Engineering, Brown University, Providence, Rhode Island 02912, United States
- Emily Mazeau
  - Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
- Colin A Grambow
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- A Mark Payne
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Kevin A Spiekermann
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Hao-Wei Pang
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- C Franklin Goldsmith
  - School of Engineering, Brown University, Providence, Rhode Island 02912, United States
- Richard H West
  - Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
19
|
Retention Time Prediction with Message-Passing Neural Networks. Separations 2022. [DOI: 10.3390/separations9100291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023] Open
Abstract
Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D-convolutional neural networks that are trained on the METLIN small molecule retention time dataset (SMRT). These approaches demonstrate accurate predictions comparable with the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to various systems. The accuracy of this step depends both on the method of mapping and on the accuracy of the general model trained on SMRT. Therefore, improvements to both parts of prediction workflows may lead to improved compound annotations. Here, we evaluate capabilities of message-passing neural networks (MPNN) that have demonstrated outstanding performance on many chemical tasks to accurately predict retention times. The model was initially trained on SMRT, providing mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was further fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and demonstrated up to 30% improvement of prediction accuracy for these sets compared with the state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention with the thresholds obtained from ROC curves eliminates up to 50% of false identities.
|
20
|
Spiekermann K, Pattanaik L, Green WH. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci Data 2022; 9:417. [PMID: 35851390 PMCID: PMC9293986 DOI: 10.1038/s41597-022-01529-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/18/2022] [Accepted: 06/30/2022] [Indexed: 12/13/2022] Open
Abstract
Quantitative chemical reaction data, including activation energies and reaction rates, are crucial for developing detailed kinetic mechanisms and accurately predicting reaction outcomes. However, such data are often difficult to find, and high-quality datasets are especially rare. Here, we use CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP to obtain high-quality single point calculations for nearly 22,000 unique stable species and transition states. We report the results from these quantum chemistry calculations and extract the barrier heights and reaction enthalpies to create a kinetics dataset of nearly 12,000 gas-phase reactions. These reactions involve H, C, N, and O, contain up to seven heavy atoms, and have cleaned atom-mapped SMILES. Our higher-accuracy coupled-cluster barrier heights differ significantly (RMSE of ∼5 kcal mol⁻¹) relative to those calculated at ωB97X-D3/def2-TZVP. We also report accurate transition state theory rate coefficients k∞(T) between 300 K and 2000 K and the corresponding Arrhenius parameters for a subset of rigid reactions. We believe this data will accelerate development of automated and reliable methods for quantitative reaction prediction.
Measurement(s): Barrier Heights • Enthalpies • Rate Coefficients. Technology Type(s): ab initio quantum chemistry computational method.
Affiliation(s)
- Kevin Spiekermann
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
- Lagnajit Pattanaik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
|
21
|
Chen P, Wang R, Chen G, An B, Liu M, Wang Q, Tao Y. Thyroid endocrine disruption and hepatotoxicity induced by bisphenol AF: Integrated zebrafish embryotoxicity test and deep learning. The Science of the Total Environment 2022; 822:153639. [PMID: 35131240 DOI: 10.1016/j.scitotenv.2022.153639] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/18/2021] [Revised: 01/28/2022] [Accepted: 01/29/2022] [Indexed: 06/14/2023]
Abstract
Bisphenol AF (BPAF) is an emerging contaminant prevalent in the environment as one of main substitutes of bisphenol A (BPA). It was found that BPAF exhibited estrogenic effects in zebrafish larvae in our previous study, while little is known about its effects on the thyroid and liver. A 7 d zebrafish embryotoxicity test was conducted to study the potential thyroid disruption and hepatotoxicity of BPAF. BPAF decreased levels of thyroid hormones and deiodinases but increased expressions of transthyretin at 12.5 and 125 μg/L after 7 d exposure, indicating that both the metabolism and transport of thyroid hormones were perturbed. The thyroid hormone receptor (TR) levels decreased significantly upon exposure to ≥12.5 μg/L BPAF, implying that BPAF acts as a TR antagonist, which coincided well with the prediction from the Direct Message Passing Neural Network. The liver impairment (mainly cell necrosis of hepatocytes) and apoptosis were triggered by 125 μg/L and ≥12.5 μg/L BPAF respectively, accompanied by the increased activities of caspase 3 and caspase 9. Thus BPAF might not be a safe alternative to BPA given the thyroid and liver toxicity. DMPNN appears useful to screen for thyroid disrupting activity from molecular structures.
Affiliation(s)
- Pengyu Chen
  - College of Oceanography, Hohai University, Nanjing 210024, China
- Ruihan Wang
  - College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Geng Chen
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
- Baihui An
  - College of Oceanography, Hohai University, Nanjing 210024, China
- Ming Liu
  - College of Oceanography, Hohai University, Nanjing 210024, China
- Qiang Wang
  - Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China
- Yuqiang Tao
  - College of Oceanography, Hohai University, Nanjing 210024, China
|
22
|
Shao J, Liu Y, Yan J, Yan ZY, Wu Y, Ru Z, Liao JY, Miao X, Qian L. Prediction of Maximum Absorption Wavelength Using Deep Neural Networks. J Chem Inf Model 2022; 62:1368-1375. [PMID: 35290042 DOI: 10.1021/acs.jcim.1c01449] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
Fluorescent molecules are important tools in biological detection, and numerous efforts have been made to develop compounds to meet the desired photophysical properties. For example, tuning the wavelength allows an appropriate penetration depth with minimal interference from the autofluorescence/scattering for a better signal-to-noise contrast. However, there are limited guidelines to rationally design or computationally predict the optical properties from first principles, and factors like the solvent effects will make it more complicated. Herein, we established a database (SMFluo1) of 1181 solvated small-molecule fluorophores covering the ultraviolet-visible-near-infrared absorption window and developed new machine learning models based on deep neural networks for accurately predicting photophysical parameters. The optimal system was applied to 120 out-of-sample compounds, and it exhibited remarkable accuracy with a mean relative error of 1.52%. In this new paradigm, a deep learning algorithm is promising to complement conventional theoretical and experimental studies of fluorophores and to greatly accelerate the discovery of new dyes. Due to its simplicity and efficiency, data from newly developed fluorophores can be easily supplemented to this system to further improve the accuracy across various dye families.
Affiliation(s)
- Jinning Shao
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
- Yue Liu
  - Center for Data Science, Zhejiang University, Hangzhou, China 310058
  - Polytechnic Institute, Zhejiang University, Hangzhou, China 310058
- Jiaqi Yan
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
- Ze-Yi Yan
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
  - Polytechnic Institute, Zhejiang University, Hangzhou, China 310058
- Yangyang Wu
  - Center for Data Science, Zhejiang University, Hangzhou, China 310058
- Zhongying Ru
  - Center for Data Science, Zhejiang University, Hangzhou, China 310058
  - Polytechnic Institute, Zhejiang University, Hangzhou, China 310058
- Jia-Yu Liao
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou, China 310018
- Xiaoye Miao
  - Center for Data Science, Zhejiang University, Hangzhou, China 310058
- Linghui Qian
  - Institute of Drug Metabolism and Pharmaceutical Analysis, Zhejiang Province Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Cancer Center, & Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, China 310058
|