1051
|
Generalized Sparse Convolutional Neural Networks for Semantic Segmentation of Point Clouds Derived from Tri-Stereo Satellite Imagery. REMOTE SENSING 2020. [DOI: 10.3390/rs12081289] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
Abstract
We studied the applicability of point clouds derived from tri-stereo satellite imagery to semantic segmentation with generalized sparse convolutional neural networks, using an Austrian study area as an example. We examined, in particular, whether the distorted geometric information, in addition to color, influences the performance of segmenting clutter, roads, buildings, trees, and vehicles. In this regard, we trained a fully convolutional neural network that uses generalized sparse convolution once solely on 3D geometric information (i.e., a 3D point cloud derived by dense image matching), and twice on 3D geometric as well as color information. In the first experiment, we did not use class weights, whereas in the second we did. We compared the results with a fully convolutional neural network that was trained on a 2D orthophoto, and with a decision tree that was trained once on hand-crafted 3D geometric features and once on hand-crafted 3D geometric as well as color features. The decision tree using hand-crafted features has been successfully applied to aerial laser scanning data in the literature. Hence, we compared our main interest of study, a representation learning technique, with another representation learning technique and a non-representation learning technique. Our study area is located in Waldviertel, a region in Lower Austria. The territory is a hilly region covered mainly by forests, agriculture, and grasslands. Our classes of interest are heavily unbalanced. However, we did not use any data augmentation techniques to counter overfitting. For our study area, we found that combining geometric and color information improves the performance of the Generalized Sparse Convolutional Neural Network (GSCNN) only on the dominant class, which leads to a higher overall performance in our case. We also found that training the network with median class weighting partially reverts the effects of adding color. The network also started to learn the classes with lower occurrences. The fully convolutional neural network that was trained on the 2D orthophoto generally outperforms the other two with a kappa score of over 90% and an average per class accuracy of 61%. However, the decision tree trained on colors and hand-crafted geometric features has a 2% higher accuracy for roads.
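The abstract mentions median class weighting without giving the formula. A minimal sketch of one common scheme, median-frequency balancing, is shown here, where each class weight is the median class frequency divided by that class's frequency. Whether the authors used exactly this variant is an assumption, and the label counts below are hypothetical, not the paper's data.

```python
from collections import Counter
from statistics import median

def median_frequency_weights(labels):
    """Return {class: median(frequencies) / frequency_of_class}."""
    counts = Counter(labels)
    total = sum(counts.values())
    freqs = {c: n / total for c, n in counts.items()}
    med = median(freqs.values())
    # Rare classes get weights above 1, dominant classes below 1.
    return {c: med / f for c, f in freqs.items()}

# Hypothetical, heavily unbalanced label set (illustration only).
labels = (["clutter"] * 80 + ["trees"] * 10 + ["buildings"] * 6
          + ["roads"] * 3 + ["vehicles"] * 1)
weights = median_frequency_weights(labels)
for name, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {w:.3f}")
```

With weights like these in the loss, errors on rare classes cost more, which is consistent with the abstract's remark that the network started to learn the classes with lower occurrences.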
|
1052
|
Jin H, Zhang H, Li J, Wang T, Wan L, Guo H, Wei Y. Discovery of Novel Two-Dimensional Photovoltaic Materials Accelerated by Machine Learning. J Phys Chem Lett 2020; 11:3075-3081. [PMID: 32239944 DOI: 10.1021/acs.jpclett.0c00721] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/11/2023]
Abstract
Searching for novel, high-performance, two-dimensional photovoltaic (2DPV) materials is an important pursuit for solar cell applications. In this work, an efficient method based on a machine learning algorithm combined with high-throughput screening is developed. Twenty-six 2DPV candidates are successfully screened from 187093 experimentally identified inorganic crystal structures, and their conversion efficiencies are predicted by density functional theory calculations. Our results indicate that Sb2Se2Te, Sb2Te3, and Bi2Se3 exhibit conversion efficiencies that are much higher than those of the others, which makes them promising 2DPV candidates for further applications. The superior photovoltaic performance is then analyzed, and the hidden structure-related relationships with photovoltaic properties are established, thus providing important information for the further examination of 2DPV materials. Given the rapid development of materials databases, this approach not only provides an efficient way of searching for novel 2DPV materials but also can be applied to the exploration of a broad range of functional materials.
Affiliation(s)
- Hao Jin
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
| | - Huijun Zhang
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
| | - Jianwei Li
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
| | - Tao Wang
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
| | - Langhui Wan
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
| | - Hong Guo
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
- Centre for the Physics of Materials and Department of Physics, McGill University, H3A 2T8 Montréal, Canada
| | - Yadong Wei
- Shenzhen Key Laboratory of Advanced Thin Films and Applications, College of Physics and Optoelectronic Engineering, Shenzhen University, 518060 Shenzhen, P. R. China
|
1053
|
Wang X, Ye S, Hu W, Sharman E, Liu R, Liu Y, Luo Y, Jiang J. Electric Dipole Descriptor for Machine Learning Prediction of Catalyst Surface-Molecular Adsorbate Interactions. J Am Chem Soc 2020; 142:7737-7743. [PMID: 32297511 DOI: 10.1021/jacs.0c01825] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 11/28/2022]
Abstract
The challenge of evaluating catalyst surface-molecular adsorbate interactions holds the key for rational design of catalysts. Finding an experimentally measurable and theoretically computable descriptor for evaluating surface-adsorbate interactions is a significant step toward achieving this goal. Here we show that the electric dipole moment can serve as a convenient yet accurate descriptor for establishing structure-property relationships for molecular adsorbates on metal catalyst surfaces. By training a machine learning neural network with a large data set of first-principles calculations, we achieve quick and accurate predictions of molecular adsorption energy and transferred charge. The training model using NO/CO@Au(111) can be extended to study additional substrates such as Au(001) or Ag(111), thus exhibiting extraordinary transferability. These findings validate the effectiveness of the electric dipole descriptor, providing an efficient modality for future catalyst design.
Affiliation(s)
- Xijun Wang
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27606, United States
| | - Sheng Ye
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
| | - Wei Hu
- Shandong Provincial Key Laboratory of Molecular Engineering, School of Chemistry and Pharmaceutical Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 250353, People's Republic of China
| | - Edward Sharman
- Department of Neurology, University of California, Irvine, California 92697, United States
| | - Ran Liu
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
| | - Yan Liu
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
| | - Yi Luo
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
| | - Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China
|
1054
|
Machine learning as a tool to design glasses with controlled dissolution for healthcare applications. Acta Biomater 2020; 107:286-298. [PMID: 32114183 DOI: 10.1016/j.actbio.2020.02.037] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/06/2019] [Revised: 02/04/2020] [Accepted: 02/24/2020] [Indexed: 01/30/2023]
Abstract
The advancement of glass science has played a pivotal role in enhancing the quality and length of human life. However, with an ever-increasing demand for glasses in a variety of healthcare applications, especially with controlled degradation rates, it is becoming difficult to design new glass compositions using conventional approaches. For example, it is difficult, if not impossible, to design new gene-activation bioactive glasses, with controlled release of functional ions tailored for specific patient states, using trial-and-error based approaches. Nevertheless, it is possible to design new glasses with controlled release of functional ions by using artificial intelligence-based methods, for example, supervised machine learning (ML). In this paper, we present an ensemble ML model for reliable prediction of time- and composition-dependent dissolution behavior of a wide variety of oxide glasses relevant for various biomedical applications. A comprehensive database, comprising over 1300 data-records consolidated from original glass dissolution experiments, has been used for training and subsequent testing of prediction performance of the ML model. Results demonstrate that the ensemble ML model can predict chemical degradation behavior of glasses in aqueous solutions over a wide range of pH relevant for their usage in a human body, where the environment can be highly acidic (for example, pH = 3) due to secretion of citric acid by osteoclasts, or highly alkaline (pH ≈ 10) due to the release of alkali cations from bioactive glasses. Outcomes of this study can be leveraged to design glasses with controlled dissolution behavior in various biological environments. STATEMENT OF SIGNIFICANCE: In this paper, we present an ensemble machine learning (ML) model for prediction of dissolution behavior of a wide variety of oxide glasses relevant for various biomedical applications. The results demonstrate that the ML model can predict the chemical degradation behavior of glasses in aqueous solutions over a wide range of pH relevant for their usage in a human body, where the environment can be highly acidic (for example, pH = 3) due to secretion of citric acid by osteoclasts, or highly alkaline (pH ≈ 10) due to the release of alkali cations from bioactive glasses. Outcomes of this study can be leveraged to design new biomedical glasses with controlled (desired) dissolution behavior in various biological environments.
|
1055
|
Zhai Y, Caruso A, Gao S, Paesani F. Active learning of many-body configuration space: Application to the Cs+–water MB-nrg potential energy function as a case study. J Chem Phys 2020; 152:144103. [DOI: 10.1063/5.0002162] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/09/2023]
Affiliation(s)
- Yaoguang Zhai
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
| | - Alessandro Caruso
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
| | - Sicun Gao
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
| | - Francesco Paesani
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
- Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
|
1056
|
Eng J, Penfold TJ. Understanding and Designing Thermally Activated Delayed Fluorescence Emitters: Beyond the Energy Gap Approximation. CHEM REC 2020; 20:831-856. [PMID: 32267093 DOI: 10.1002/tcr.202000013] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 01/31/2020] [Accepted: 03/13/2020] [Indexed: 11/08/2022]
Abstract
In this article, recent progress in the development of molecules exhibiting Thermally Activated Delayed Fluorescence (TADF) is discussed with a particular focus upon their application as emitters in highly efficient organic light emitting diodes (OLEDs). The key aspects controlling the desirable functional properties, e.g., fast intersystem crossing, high radiative rate and unity quantum yield, are introduced with a particular focus upon the competition between the key requirements needed to achieve high performance OLEDs. The design rules required for organic and metal organic materials are discussed, and the correlation between them is outlined. Recent progress towards understanding the influence of the interaction between a molecule and its environment is explained, as is the role of the mechanism for excited state formation in OLEDs. Finally, all of these aspects are combined to discuss the ability to implement high level design rules for achieving higher quality materials for commercial applications. This article highlights the significant progress that has been made in recent years, but also outlines the significant challenges which persist in achieving a full understanding of the TADF mechanism and improving the stability and performance of these materials.
Affiliation(s)
- Julien Eng
- Chemistry, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
| | - Thomas J Penfold
- Chemistry, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
|
1057
|
Wang H, Xie Y, Li D, Deng H, Zhao Y, Xin M, Lin J. Rapid Identification of X-ray Diffraction Patterns Based on Very Limited Data by Interpretable Convolutional Neural Networks. J Chem Inf Model 2020; 60:2004-2011. [PMID: 32208721 DOI: 10.1021/acs.jcim.0c00020] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 01/06/2023]
Abstract
Large volumes of data from material characterizations call for rapid and automatic data analysis to accelerate materials discovery. Herein, we report a convolutional neural network (CNN) that was trained based on theoretical data and very limited experimental data for fast identification of experimental X-ray diffraction (XRD) patterns of metal-organic frameworks (MOFs). To augment the data for training the model, noise was extracted from experimental data and shuffled; then it was merged with the main peaks that were extracted from theoretical spectra to synthesize new spectra. For the first time, one-to-one material identification was achieved. Theoretical MOFs patterns (1012) were augmented to a whole data set of 72 864 samples. It was then randomly shuffled and split into training (58 292 samples) and validation (14 572 samples) data sets at a ratio of 4:1. For the task of discriminating, the optimized model showed the highest identification accuracy of 96.7% for the top 5 ranking on a test data set of 30 hold-out samples. Neighborhood component analysis (NCA) on the experimental XRD samples shows that the samples from the same material are clustered in groups in the NCA map. Analysis on the class activation maps of the last CNN layer further discloses the mechanism by which the CNN model successfully identifies individual MOFs from the XRD patterns. This CNN model trained by the data augmentation technique would not only open numerous potential applications for identifying XRD patterns for different materials, but also pave avenues to autonomously analyze data by other characterization tools such as FTIR, Raman, and NMR spectroscopies.
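The augmentation and split described in this abstract can be sketched roughly as follows. This is a minimal illustration: the function names, the toy peak and noise values, and the choice to round the training share up are all assumptions rather than details from the paper. The reported total is consistent with 72 synthetic samples per theoretical pattern (1012 × 72 = 72 864), and rounding the 4:1 split up reproduces the reported 58 292 and 14 572 samples.

```python
import random

def synthesize_pattern(peaks, noise, rng):
    """Return the peak pattern plus a shuffled copy of the noise trace."""
    shuffled = list(noise)
    rng.shuffle(shuffled)
    return [p + n for p, n in zip(peaks, shuffled)]

def shuffle_split(samples, train_parts, val_parts, rng):
    """Shuffle, then cut at ceil(n * train_parts / (train_parts + val_parts))."""
    items = list(samples)
    rng.shuffle(items)
    total_parts = train_parts + val_parts
    cut = -(-len(items) * train_parts // total_parts)  # integer ceiling
    return items[:cut], items[cut:]

rng = random.Random(0)

# Toy stand-ins for a theoretical pattern and an experimental noise trace.
peaks = [0.0, 5.0, 0.0, 1.0]
noise = [0.1, 0.2, 0.3, 0.4]
pattern = synthesize_pattern(peaks, noise, rng)

# Sample indices stand in for the 72 864 augmented patterns.
train, val = shuffle_split(range(72864), 4, 1, rng)
print(len(train), len(val))  # 58292 14572
```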
|
1058
|
Wang Z, Han Y, Li J, He X. Combining the Fragmentation Approach and Neural Network Potential Energy Surfaces of Fragments for Accurate Calculation of Protein Energy. J Phys Chem B 2020; 124:3027-3035. [PMID: 32208716 DOI: 10.1021/acs.jpcb.0c01370] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/16/2022]
Abstract
Accurate and efficient all-atom quantum mechanical (QM) calculations for biomolecules still present a challenge to computational physicists and chemists. In this study, an extensible generalized molecular fractionation with a conjugate caps method combined with neural networks (NN-GMFCC) is developed for efficient QM calculation of protein energy. In the NN-GMFCC scheme, the total energy of a given protein is calculated by taking a proper combination of the high-precision neural network potential energies of all capped residues and overlapping conjugate caps. In addition, the two-body interaction energies of residue pairs are calculated by molecular mechanics (MM). With reference to the GMFCC/MM calculation at the ωB97XD/6-31G* level, the overall mean unsigned errors of the energy deviations and atomic force root-mean-squared errors calculated by NN-GMFCC are only 2.01 kcal/mol and 0.68 kcal/mol/Å, respectively, for 14 proteins (containing up to 13,728 atoms). Meanwhile, the NN-GMFCC approach is about 4 orders of magnitude faster than the GMFCC/MM method. The NN-GMFCC method could be systematically improved by inclusion of two-body QM interaction and multibody electronic polarization effect. Moreover, the NN-GMFCC approach can also be applied to other macromolecular systems such as DNA/RNA, and it is capable of providing a powerful and efficient approach for exploration of structures and functions of proteins with QM accuracy.
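The "proper combination" of fragment energies is not spelled out in this abstract. A minimal sketch, assuming the usual fragmentation bookkeeping in which the capped-residue energies are summed, the energies of the overlapping conjugate caps are subtracted to remove double counting, and the two-body MM interaction energies are added, might look like the following. The function name, the sign convention, and the numbers are illustrative assumptions, not values or code from the paper.

```python
def gmfcc_style_total_energy(capped_residue_energies,
                             conjugate_cap_energies,
                             two_body_mm_energies):
    """Combine fragment energies into a total energy (assumed convention).

    capped_residue_energies: energies of all capped residues
        (from neural network potentials in the NN-GMFCC scheme)
    conjugate_cap_energies:  energies of the overlapping conjugate caps,
        subtracted because each cap region is counted twice
    two_body_mm_energies:    pairwise interaction energies from MM
    """
    return (sum(capped_residue_energies)
            - sum(conjugate_cap_energies)
            + sum(two_body_mm_energies))

# Hypothetical energies for a three-residue chain (arbitrary units).
total = gmfcc_style_total_energy(
    capped_residue_energies=[-10.0, -12.0, -11.0],
    conjugate_cap_energies=[-4.0, -5.0],
    two_body_mm_energies=[-0.5],
)
print(total)  # -24.5
```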
Affiliation(s)
- Zhilong Wang
- Key Laboratory of Thin Film and Micro Fabrication, Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yanqiang Han
- Key Laboratory of Thin Film and Micro Fabrication, Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Jinjin Li
- Key Laboratory of Thin Film and Micro Fabrication, Ministry of Education, Department of Micro/Nano-electronics, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
1059
|
Langner S, Häse F, Perea JD, Stubhan T, Hauch J, Roch LM, Heumueller T, Aspuru-Guzik A, Brabec CJ. Beyond Ternary OPV: High-Throughput Experimentation and Self-Driving Laboratories Optimize Multicomponent Systems. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2020; 32:e1907801. [PMID: 32049386 DOI: 10.1002/adma.201907801] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 12/03/2019] [Revised: 01/09/2020] [Indexed: 05/07/2023]
Abstract
Fundamental advances to increase the efficiency as well as stability of organic photovoltaics (OPVs) are achieved by designing ternary blends, which represents a clear trend toward multicomponent active layer blends. The development of high-throughput and autonomous experimentation methods is reported for the effective optimization of multicomponent polymer blends for OPVs. A method for automated film formation enabling the fabrication of up to 6048 films per day is introduced. Equipping this automated experimentation platform with a Bayesian optimization, a self-driving laboratory is constructed that autonomously evaluates measurements to design and execute the next experiments. To demonstrate the potential of these methods, a 4D parameter space of quaternary OPV blends is mapped and optimized for photostability. While with conventional approaches, roughly 100 mg of material would be necessary, the robot-based platform can screen 2000 combinations with less than 10 mg, and machine-learning-enabled autonomous experimentation identifies stable compositions with less than 1 mg.
Affiliation(s)
- Stefan Langner
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander University Erlangen-Nürnberg, Martensstrasse 7, Erlangen, 91058, Germany
| | - Florian Häse
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, 02138, USA
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
| | - José Darío Perea
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander University Erlangen-Nürnberg, Martensstrasse 7, Erlangen, 91058, Germany
| | - Tobias Stubhan
- Forschungszentrum Jülich GmbH, Helmholtz-Institut Erlangen-Nürnberg for Renewable Energy (IEK-11), Immerwahrstraße 2, Erlangen, 91058, Germany
| | - Jens Hauch
- Forschungszentrum Jülich GmbH, Helmholtz-Institut Erlangen-Nürnberg for Renewable Energy (IEK-11), Immerwahrstraße 2, Erlangen, 91058, Germany
| | - Loïc M Roch
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, 02138, USA
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
| | - Thomas Heumueller
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander University Erlangen-Nürnberg, Martensstrasse 7, Erlangen, 91058, Germany
| | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
- Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, M5S 1M1, Canada
| | - Christoph J Brabec
- Institute of Materials for Electronics and Energy Technology (i-MEET), Department of Materials Science and Engineering, Friedrich-Alexander University Erlangen-Nürnberg, Martensstrasse 7, Erlangen, 91058, Germany
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, 02138, USA
|
1060
|
Zhu X, Liu P, Ge Y, Wu R, Xue T, Sheng Y, Ai S, Tang K, Wen Y. MoS2/MWCNTs porous nanohybrid network with oxidase-like characteristic as electrochemical nanozyme sensor coupled with machine learning for intelligent analysis of carbendazim. J Electroanal Chem (Lausanne) 2020. [DOI: 10.1016/j.jelechem.2020.113940] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/31/2023]
|
1061
|
Konstantopoulos G, Koumoulos EP, Charitidis CA. Testing Novel Portland Cement Formulations with Carbon Nanotubes and Intrinsic Properties Revelation: Nanoindentation Analysis with Machine Learning on Microstructure Identification. NANOMATERIALS 2020; 10:nano10040645. [PMID: 32235614 PMCID: PMC7221838 DOI: 10.3390/nano10040645] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/11/2020] [Revised: 03/27/2020] [Accepted: 03/27/2020] [Indexed: 02/04/2023]
Abstract
Nanoindentation was utilized as a non-destructive technique to identify Portland Cement hydration phases. Artificial Intelligence (AI) and semi-supervised Machine Learning (ML) were used to gain knowledge on the effect of carbon nanotubes on nanomechanics in novel cement formulations. Data labelling was performed with unsupervised ML using k-means clustering. Supervised ML classification was used to predict the composition of the hydration products, and 97.6% accuracy was achieved. The analysis included multiple nanoindentation raw data variables and required less time to execute than conventional single component probability density analysis (PDA). Also, PDA was less informative than ML regarding information exchange and re-usability of input in design predictions. In principle, ML is the appropriate science for predictive modeling, such as cement phase identification, and facilitates the acquisition of precise results. This study introduces unbiased structure-property relations with ML to monitor cement durability based on cement phases nanomechanics, compared to PDA, which offers a solution based on local optima of a multidimensional space solution. Evaluation of nanomaterials inclusion in composite reinforcement using semi-supervised ML proved feasible. This methodology is expected to contribute to design informatics due to the high prediction metrics, which holds promise for the transfer learning potential of these models for studying other novel cement formulations.
Affiliation(s)
- Georgios Konstantopoulos
- RNANO Lab—Research Unit of Advanced, Composite, Nano Materials & Nanotechnology, School of Chemical Engineering, National Technical University of Athens, GR-15773 Zographos Athens, Greece; (G.K.); (C.A.C.)
| | - Elias P. Koumoulos
- RNANO Lab—Research Unit of Advanced, Composite, Nano Materials & Nanotechnology, School of Chemical Engineering, National Technical University of Athens, GR-15773 Zographos Athens, Greece; (G.K.); (C.A.C.)
- Innovation in Research & Engineering Solutions (IRES), Boulevard Edmond Machtens 79/22, 1080 Brussels, Belgium
| | - Costas A. Charitidis
- RNANO Lab—Research Unit of Advanced, Composite, Nano Materials & Nanotechnology, School of Chemical Engineering, National Technical University of Athens, GR-15773 Zographos Athens, Greece; (G.K.); (C.A.C.)
|
1062
|
He Y, Ye Z, Liu X, Wei Z, Qiu F, Li HF, Zheng Y, Ouyang D. Can machine learning predict drug nanocrystals? J Control Release 2020; 322:274-285. [PMID: 32234511 DOI: 10.1016/j.jconrel.2020.03.043] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/29/2019] [Revised: 03/26/2020] [Accepted: 03/28/2020] [Indexed: 12/20/2022]
Abstract
Nanocrystals have shown great advantages for enhancing the dissolution rate of water-insoluble drugs because their size is reduced to the nanoscale. However, current pharmaceutical approaches to nanocrystal formulation development depend heavily on expert experience and trial-and-error attempts, which remain time and resource consuming. In this research, we utilized machine learning techniques to predict the particle size and polydispersity index (PDI) of nanocrystals. First, 910 nanocrystal size data and 341 PDI data from three preparation methods (ball wet milling (BWM), high-pressure homogenization (HPH), and antisolvent precipitation (ASP)) were collected for the construction of the prediction models. The results demonstrated that the light gradient boosting machine (LightGBM) performed well for nanocrystal size and PDI prediction with the BWM and HPH methods, but gave relatively poor predictions for the ASP method. A possible reason for the poor prediction is the low quality of the data, caused by the poor reproducibility and instability of nanocrystals prepared by the ASP method, which is also consistent with the fact that current commercialized products are mainly manufactured by the BWM and HPH approaches. Notably, the contribution of the influence factors was ranked by LightGBM, which demonstrated that milling time, cycle index, and concentration of stabilizer are crucial factors for nanocrystals prepared by BWM, HPH, and ASP, respectively. Furthermore, the model generalization and prediction accuracy of LightGBM were confirmed experimentally with newly prepared nanocrystals. In conclusion, machine learning techniques can be successfully utilized for the prediction of nanocrystals prepared by the BWM and HPH methods. Our research also reveals a new way for nanotechnology manufacture.
Affiliation(s)
- Yuan He
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Zhuyifan Ye
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Xinyang Liu
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Zhengjie Wei
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Fen Qiu
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
| | - Hai-Feng Li
- Institute of Applied Physics and Materials Engineering, University of Macau, Macau, China
| | - Ying Zheng
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China.
| | - Defang Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China.
|
1063
|
Dreßler C, Kabbe G, Brehm M, Sebastiani D. Dynamical matrix propagator scheme for large-scale proton dynamics simulations. J Chem Phys 2020; 152:114114. [PMID: 32199428 DOI: 10.1063/1.5140635] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/16/2023]
Abstract
We derive a matrix formalism for the simulation of long range proton dynamics for extended systems and timescales. On the basis of an ab initio molecular dynamics simulation, we construct a Markov chain, which allows us to store the entire proton dynamics in an M × M transition matrix (where M is the number of oxygen atoms). In this article, we start from common topology features of the hydrogen bond network of good proton conductors and utilize them as constituent constraints of our dynamic model. We present a thorough mathematical derivation of our approach and verify its uniqueness and correct asymptotic behavior. We propagate the proton distribution by means of transition matrices, which contain kinetic data from both ultra-short (sub-ps) and intermediate (ps) timescales. This concept allows us to keep the most relevant features from the microscopic level while effectively reaching larger time and length scales. We demonstrate the applicability of the transition matrices for the description of proton conduction trends in proton exchange membrane materials.
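The propagation step described in this abstract can be illustrated with a minimal sketch. It assumes a row-stochastic transition matrix and a row-vector convention for the proton distribution; the 3 × 3 matrix `T`, the initial distribution `p0`, and the helper `propagate` are hypothetical and are not taken from the paper.

```python
def propagate(p, T, steps):
    """Apply p <- p T repeatedly (row-vector convention)."""
    n = len(p)
    for _ in range(steps):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

# Hypothetical transition matrix for M = 3 oxygen sites.
# Row i holds the probabilities of a proton moving from site i to each
# site j within one time step, so every row sums to 1.
T = [
    [0.90, 0.10, 0.00],
    [0.10, 0.80, 0.10],
    [0.00, 0.10, 0.90],
]

p0 = [1.0, 0.0, 0.0]              # proton initially localized on site 0
p_one = propagate(p0, T, 1)       # distribution after a single step
p_long = propagate(p0, T, 500)    # long-time (asymptotic) distribution
```

Because this toy matrix is symmetric and connects all sites, repeated propagation conserves total probability and relaxes toward a uniform distribution, which is one simple way to check asymptotic behavior.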
Affiliation(s)
- Christian Dreßler
- Institute of Chemistry, Martin Luther University Halle-Wittenberg, Von-Danckelmann-Platz 4, 06120 Halle (Saale), Germany
- Gabriel Kabbe
- Institute of Chemistry, Martin Luther University Halle-Wittenberg, Von-Danckelmann-Platz 4, 06120 Halle (Saale), Germany
- Martin Brehm
- Institute of Chemistry, Martin Luther University Halle-Wittenberg, Von-Danckelmann-Platz 4, 06120 Halle (Saale), Germany
- Daniel Sebastiani
- Institute of Chemistry, Martin Luther University Halle-Wittenberg, Von-Danckelmann-Platz 4, 06120 Halle (Saale), Germany
|
1064
|
George J, Waroquiers D, Di Stefano D, Petretto G, Rignanese G, Hautier G. The Limited Predictive Power of the Pauling Rules. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.202000829] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 11/05/2022]
Affiliation(s)
- Janine George
- Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Chemin des étoiles 8, 1348 Louvain-la-Neuve, Belgium
- David Waroquiers
- Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Chemin des étoiles 8, 1348 Louvain-la-Neuve, Belgium
- Davide Di Stefano
- Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Chemin des étoiles 8, 1348 Louvain-la-Neuve, Belgium
- Guido Petretto
- Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Chemin des étoiles 8, 1348 Louvain-la-Neuve, Belgium
- Gian‐Marco Rignanese
- Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Chemin des étoiles 8, 1348 Louvain-la-Neuve, Belgium
- Geoffroy Hautier
- Institute of Condensed Matter and Nanosciences, Université catholique de Louvain, Chemin des étoiles 8, 1348 Louvain-la-Neuve, Belgium
|
1065
|
Abstract
As the quantum chemistry (QC) community embraces machine learning (ML), the number of new methods and applications based on the combination of QC and ML is surging. In this Perspective, a view of the current state of affairs in this new and exciting research field is offered, challenges of using machine learning in quantum chemistry applications are described, and potential future developments are outlined. Specifically, examples of how machine learning is used to improve the accuracy and accelerate quantum chemical research are shown. Generalization and classification of existing techniques are provided to ease the navigation in the sea of literature and to guide researchers entering the field. The emphasis of this Perspective is on supervised machine learning.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|
1066
|
Visual Analysis of Odor Interaction Based on Support Vector Regression Method. SENSORS 2020; 20:s20061707. [PMID: 32204317 PMCID: PMC7146738 DOI: 10.3390/s20061707] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/18/2020] [Revised: 03/08/2020] [Accepted: 03/16/2020] [Indexed: 12/02/2022]
Abstract
The complex odor interaction between odorants makes it difficult to predict the odor intensity of their mixtures. The analysis method is currently one of the factors limiting our understanding of the odor interaction laws. We used a support vector regression algorithm to establish odor intensity prediction models for binary mixtures of esters, aldehydes, and aromatic hydrocarbons, respectively. The prediction accuracy for both training samples and test samples demonstrated the high prediction capacity of the support vector regression model. The optimized model was then used to generate extra odor data by predicting the odor intensity of additional simulated samples with various mixing ratios and concentration levels. Based on these olfactory-measured and model-predicted data, the odor interaction was analyzed in the form of contour maps. This intuitive method showed more details about the odor interaction pattern in the binary mixture. We found that the antagonism effect was commonly observed in these binary mixtures and that the interaction degree was more intense when the components' mixing ratios were close. Meanwhile, the odor intensity level of the odor mixture barely influenced the interaction degree. Machine learning algorithms are considered promising tools in odor research.
|
1067
|
Abstract
The world needs new materials to stimulate the chemical industry in key sectors of our economy: environment and sustainability, information storage, optical telecommunications, and catalysis. Yet, nearly all functional materials are still discovered by "trial-and-error", of which the lack of predictability affords a major materials bottleneck to technological innovation. The average "molecule-to-market" lead time for materials discovery is currently 20 years. This is far too long for industrial needs, as highlighted by the Materials Genome Initiative, which has ambitious targets of up to 4-fold reductions in average molecule-to-market lead times. Such a large step change in progress can only be realistically achieved if one adopts an entirely new approach to materials discovery. Fortunately, a fundamentally new approach to materials discovery has been emerging, whereby data science with artificial intelligence offers a prospective solution to speed up these average molecule-to-market lead times. This approach is known as data-driven materials discovery. Its broad prospects have only recently become a reality, given the timely and major advances in "big data", artificial intelligence, and high-performance computing (HPC). Access to massive data sets has been stimulated by government-regulated open-access requirements for data and literature. Natural-language processing (NLP) and machine-learning (ML) tools that can mine data and find patterns therein are becoming mainstream. Exascale HPC capabilities that can aid data mining and pattern recognition and also generate their own data from calculations are now within our grasp.
These timely advances present an ideal opportunity to develop data-driven materials-discovery strategies to systematically design and predict new chemicals for a given device application. This Account shows how data science can afford materials discovery via a four-step "design-to-device" pipeline that entails (1) data extraction, (2) data enrichment, (3) material prediction, and (4) experimental validation. Massive databases of cognate chemical and property information are first forged from "chemistry-aware" natural-language-processing tools, such as ChemDataExtractor, and enriched using machine-learning methods and high-throughput quantum-chemical calculations. New materials for a bespoke application can then be predicted by mining these databases with algorithmic encodings of relationships between chemical structures and physical properties that are known to deliver functional materials. These may take the form of classification, enumeration, or machine-learning algorithms. A data-mining workflow short-lists these predictions to a handful of lead candidate materials that go forward to experimental validation. This design-to-device approach is being developed to offer a roadmap for the accelerated discovery of new chemicals for functional applications. Case studies presented demonstrate its utility for photovoltaic, optical, and catalytic applications. While this Account is focused on applications in the physical sciences, the generic pipeline discussed is readily transferable to other scientific disciplines such as biology and medicine.
Affiliation(s)
- Jacqueline M. Cole
- Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K
- Mathematical Institute, University of Oxford, Woodstock Road, Oxford OX2 6GG, U.K
|
1068
|
Lin R, Zhai Y, Xiong C, Li X. Inverse design of plasmonic metasurfaces by convolutional neural network. OPTICS LETTERS 2020; 45:1362-1365. [PMID: 32163966 DOI: 10.1364/ol.387404] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/21/2020] [Accepted: 02/03/2020] [Indexed: 06/10/2023]
Abstract
Artificial neural networks have shown effectiveness in the inverse design of nanophotonic structures; however, the numerical accuracy and algorithm efficiency were not analyzed adequately in previous reports. In this Letter, we demonstrate the convolutional neural network as an inverse design tool to achieve high numerical accuracy in plasmonic metasurfaces. A comparison of convolutional neural networks and fully connected neural networks shows that convolutional neural networks have higher generalization capabilities. We share practical guidelines for optimizing the neural network and analyze the hierarchy of accuracy in the multi-parameter inverse design of plasmonic metasurfaces. A high inverse design accuracy of $\pm 8\;{\rm nm}$ for the critical geometrical parameters is demonstrated.
|
1069
|
He J, Chen Y, Wu J, Stow DA, Christakos G. Space-time chlorophyll-a retrieval in optically complex waters that accounts for remote sensing and modeling uncertainties and improves remote estimation accuracy. WATER RESEARCH 2020; 171:115403. [PMID: 31901508 DOI: 10.1016/j.watres.2019.115403] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/15/2019] [Revised: 11/22/2019] [Accepted: 12/15/2019] [Indexed: 06/10/2023]
Abstract
Remote sensing reflectance (Rrs) values measured by satellite sensors involve large amounts of uncertainty, leading to non-negligible noise in remote Chlorophyll-a (Chl-a) concentration estimation. This work distinguished between two main stages in estimating distributions of Chl-a within the Gulf of St. Lawrence (Canada). At the model building stage, the retrieval algorithm used both in-situ Chl-a measurements and the corresponding Moderate Resolution Imaging Spectroradiometer (MODIS) L2-level estimated Rrs data at 412, 443, 469, 488, 531, 547, 555, 645, 667, and 678 nm at a 1 km spatial resolution during 2004-2013. Through the training and validation of various models and Rrs combinations of the eight techniques considered (support vector regression, artificial neural networks, gradient boosting machine, random forests, standard CI-OC3M, multiple linear regression, generalized additive regression, and principal component regression), the support vector regression (SVR) technique was shown to have the best performance in Chl-a concentration estimation using Rrs at 412, 443, 488, 531 and 678 nm. The accuracy indicators for both the training (850) and the validation (213) datasets were found to be very good to excellent (e.g., the R2 value varied between 0.7058 and 0.9068). At the space-time estimation stage, this work took a step forward by using the Bayesian maximum entropy (BME) theory to further process the SVR-estimated Chl-a concentrations by incorporating the inherent spatiotemporal dependency of the physical Chl-a distribution. A 56% reduction in the mean uncertainty of the validation data was achieved (from 1.2222 to 0.5322 mg/m3). Then, this novel BME/SVR framework was employed to estimate the daily Chl-a concentrations in the Gulf of St. Lawrence during Jan 1-Dec 31 of 2017 (1 km spatial resolution).
The results showed that the daily mean Chl-a concentration varied from 1.6630 to 3.3431 mg/m3, and that the daily mean Chl-a uncertainty reduction of the composite BME/SVR vs. the SVR estimation had a maximum reduction value of 1.0082 and an average reduction value of 0.6173 mg/m3. The monthly spatial Chl-a distribution covariances showed that the highest Chl-a concentration variability occurred during November and that the spatiotemporal Chl-a concentration pattern changed a lot during the period August to November. In conclusion, the proposed BME/SVR was shown to be a promising remote Chl-a retrieval approach that exhibited a significant ability in reducing the non-negligible uncertainty and improving the accuracy of remote sensing Chl-a concentration estimates.
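As a quick arithmetic check, the 56% figure quoted in the abstract follows from the two mean-uncertainty values it reports for the validation data. The variable names below are illustrative only.

```python
# Mean uncertainty of the validation data, in mg/m3, as quoted in the abstract.
before = 1.2222   # SVR estimate alone
after = 0.5322    # after BME post-processing

# Relative reduction: (before - after) / before
relative_reduction = (before - after) / before
percent = round(relative_reduction * 100)
```

The ratio evaluates to roughly 0.5646, which rounds to the reported 56%.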
Affiliation(s)
- Junyu He
- Ocean College, Zhejiang University, Zhoushan, China
- Yijun Chen
- School of Earth Sciences, Zhejiang University, Hangzhou, China
- Jiaping Wu
- Ocean College, Zhejiang University, Zhoushan, China
- Douglas A Stow
- Department of Geography, San Diego State University, San Diego, USA
- George Christakos
- Ocean College, Zhejiang University, Zhoushan, China; Department of Geography, San Diego State University, San Diego, USA
|
1070
|
Sidky H, Chen W, Ferguson AL. Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation. Mol Phys 2020. [DOI: 10.1080/00268976.2020.1737742] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/13/2022]
Affiliation(s)
- Hythem Sidky
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
- Wei Chen
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Andrew L. Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
|
1071
|
Hey T, Butler K, Jackson S, Thiyagalingam J. Machine learning and big scientific data. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2020; 378:20190054. [PMID: 31955675 PMCID: PMC7015290 DOI: 10.1098/rsta.2019.0054] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Accepted: 09/06/2019] [Indexed: 05/21/2023]
Abstract
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such 'Big Scientific Data' comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility and the UK's Central Laser Facility. Increasingly, scientists are now required to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now used the deep learning technology to develop their AlphaFold tool to make predictions for protein folding. Remarkably, it has been able to achieve some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the RAL, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from several different scientific domains. We conclude with some initial examples of our 'scientific machine learning' benchmark suite and of the research challenges these benchmarks will enable. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Affiliation(s)
- Tony Hey
- Scientific Computing Department, Rutherford Appleton Laboratory, Science and Technology Facilities Council, Didcot OX11 0QX, UK
|
1072
|
Single-particle spectroscopy for functional nanomaterials. Nature 2020; 579:41-50. [PMID: 32132689 DOI: 10.1038/s41586-020-2048-8] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 02/26/2019] [Accepted: 01/07/2020] [Indexed: 11/08/2022]
Abstract
Tremendous progress in nanotechnology has enabled advances in the use of luminescent nanomaterials in imaging, sensing and photonic devices. This translational process relies on controlling the photophysical properties of the building block, that is, single luminescent nanoparticles. In this Review, we highlight the importance of single-particle spectroscopy in revealing the diverse optical properties and functionalities of nanomaterials, and compare it with ensemble fluorescence spectroscopy. The information provided by this technique has guided materials science in tailoring the synthesis of nanomaterials to achieve optical uniformity and to develop novel applications. We discuss the opportunities and challenges that arise from pushing the resolution limit, integrating measurement and manipulation modalities, and establishing the relationship between the structure and functionality of single nanoparticles.
|
1073
|
Chen CT, Gu GX. Generative Deep Neural Networks for Inverse Materials Design Using Backpropagation and Active Learning. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2020; 7:1902607. [PMID: 32154072 PMCID: PMC7055566 DOI: 10.1002/advs.201902607] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/21/2019] [Revised: 11/11/2019] [Indexed: 05/19/2023]
Abstract
In recent years, machine learning (ML) techniques are seen to be promising tools to discover and design novel materials. However, the lack of robust inverse design approaches to identify promising candidate materials without exploring the entire design space causes a fundamental bottleneck. A general-purpose inverse design approach is presented using generative inverse design networks. This ML-based inverse design approach uses backpropagation to calculate the analytical gradients of an objective function with respect to design variables. This inverse design approach is capable of overcoming local minima traps by using backpropagation to provide rapid calculations of gradient information and running millions of optimizations with different initial values. Furthermore, an active learning strategy is adopted in the inverse design approach to improve the performance of candidate materials and reduce the amount of training data needed to do so. Compared to passive learning, the active learning strategy is capable of generating better designs and reducing the amount of training data by at least an order-of-magnitude in the case study on composite materials. The inverse design approach is compared with conventional gradient-based topology optimization and gradient-free genetic algorithms and the pros and cons of each method are discussed when applied to materials discovery and design problems.
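The idea of escaping local optima by running many gradient-based optimizations from different initial values can be sketched in a few lines. This is not the authors' network: the one-dimensional `objective` is a hypothetical toy figure of merit, and its derivative is written out by hand where the paper obtains gradients by backpropagation through a trained model.

```python
import math

def objective(x):
    """Toy figure of merit with several local maxima (hypothetical)."""
    return math.sin(3.0 * x) - 0.1 * (x - 2.0) ** 2

def gradient(x):
    """Hand-derived derivative of objective; stands in for a backpropagated gradient."""
    return 3.0 * math.cos(3.0 * x) - 0.2 * (x - 2.0)

def ascend(x, lr=0.05, steps=200):
    """Plain gradient ascent from a single starting design."""
    for _ in range(steps):
        x += lr * gradient(x)
    return x

# Run the same optimizer from several initial designs and keep the best result.
starts = [0.0, 1.0, 2.0, 3.0, 4.0]
candidates = [ascend(x0) for x0 in starts]
best_x = max(candidates, key=objective)

# For comparison: a single run that gets trapped in a nearby local maximum.
single_start_x = ascend(0.0)
```

In this toy setting the run started at 0.0 settles in a lower local maximum, while the multi-start search finds the higher one; with cheap gradients, the number of starts can be scaled up considerably.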
Affiliation(s)
- Chun-Teh Chen
- Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
- Grace X Gu
- Department of Mechanical Engineering, University of California, Berkeley, CA 94720, USA
|
1074
|
Zhang L, Mao H, Liu Q, Gani R. Chemical product design – recent advances and perspectives. Curr Opin Chem Eng 2020. [DOI: 10.1016/j.coche.2019.10.005] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 01/29/2023]
|
1075
|
Abstract
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Zixuan Cang
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
|
1076
|
Abstract
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.
Affiliation(s)
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany; Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Alexandre Tkatchenko
- Physics and Materials Science Research Unit, University of Luxembourg, 1511 Luxembourg, Luxembourg
- Klaus-Robert Müller
- Department of Computer Science, Technical University Berlin, 10587 Berlin, Germany; Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany; Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea
- Cecilia Clementi
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA; Department of Physics, Rice University, Houston, Texas 77005, USA
|
1077
|
Aguirre NF, Morgenstern A, Cawkwell MJ, Batista ER, Yang P. Development of Density Functional Tight-Binding Parameters Using Relative Energy Fitting and Particle Swarm Optimization. J Chem Theory Comput 2020; 16:1469-1481. [DOI: 10.1021/acs.jctc.9b00880] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
Affiliation(s)
- Néstor F. Aguirre
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Amanda Morgenstern
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- M. J. Cawkwell
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Enrique R. Batista
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ping Yang
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
|
1078
|
Hu Q, Weng M, Chen X, Li S, Pan F, Wang LW. Neural Network Force Fields for Metal Growth Based on Energy Decompositions. J Phys Chem Lett 2020; 11:1364-1369. [PMID: 32000486 DOI: 10.1021/acs.jpclett.9b03780] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
A method using machine learning (ML) is proposed to describe metal growth for simulations, which retains the accuracy of ab initio density functional theory (DFT) and results in a thousands-fold reduction in the computational time. This method is based on atomic energy decomposition from DFT calculations. Compared with other ML methods, our energy decomposition approach can yield much more information with the same DFT calculations. This approach is employed for the amorphous sodium system, where only 1000 DFT molecular dynamics images are enough for training an accurate model. The DFT and neural network potential (NNP) are compared for the dynamics to show that similar structural properties are generated. Finally, metal growth experiments from liquid to solid in a small and larger system are carried out to demonstrate the ability of using NNP to simulate the real growth process.
Affiliation(s)
- Qin Hu
- School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Mouyi Weng
- School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Xin Chen
- School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Shucheng Li
- School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Feng Pan
- School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Lin-Wang Wang
- Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
|
1079
|
Attia PM, Grover A, Jin N, Severson KA, Markov TM, Liao YH, Chen MH, Cheong B, Perkins N, Yang Z, Herring PK, Aykol M, Harris SJ, Braatz RD, Ermon S, Chueh WC. Closed-loop optimization of fast-charging protocols for batteries with machine learning. Nature 2020; 578:397-402. [DOI: 10.1038/s41586-020-1994-5] [Citation(s) in RCA: 236] [Impact Index Per Article: 47.2] [Received: 08/06/2019] [Accepted: 12/19/2019] [Indexed: 12/24/2022]
|
1080
|
Archibald RK, Doucet M, Johnston T, Young SR, Yang E, Heller WT. Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques. J Appl Crystallogr 2020. [DOI: 10.1107/s1600576720000552] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/10/2022]
Abstract
A consistent challenge for both new and expert practitioners of small-angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. Further, by integrating a stochastic gradient descent method during post-processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS.
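To make the weighted k nearest neighbors idea concrete, here is a minimal sketch. It assumes inverse-distance weighting, which is one common choice; the abstract does not state the weighting scheme used, and the function `wknn_predict`, the 2D feature vectors, and the class labels are hypothetical rather than drawn from SasView.

```python
import math
from collections import defaultdict

def wknn_predict(train_X, train_y, x, k=3, eps=1e-12):
    """Classify x by inverse-distance-weighted voting among its k nearest neighbors."""
    # Pair each training point's distance to x with its label, nearest first.
    neighbors = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for d, label in neighbors[:k]:
        votes[label] += 1.0 / (d + eps)  # closer neighbors count for more
    return max(votes, key=votes.get)

# Hypothetical training set: one "sphere" example and two "cylinder" examples.
train_X = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
train_y = ["sphere", "cylinder", "cylinder"]

# The query sits very close to the single sphere example.
label = wknn_predict(train_X, train_y, (0.1, 0.0), k=3)
```

With k = 3, an unweighted majority vote would pick "cylinder" here (two votes to one); the distance weighting lets the single very close neighbor dominate, so the weighted vote returns "sphere".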
|
1081
|
Abstract
Fast-scan cyclic voltammetry (FSCV) at carbon-fiber microelectrodes (CFMEs) is a versatile electrochemical technique to probe neurochemical dynamics in vivo. Progress in FSCV methodology continues to address analytical challenges arising from biological needs to measure low concentrations of neurotransmitters at specific sites. This review summarizes recent advances in FSCV method development in three areas: (1) waveform optimization, (2) electrode development, and (3) data analysis. First, FSCV waveform parameters such as holding potential, switching potential, and scan rate have been optimized to monitor new neurochemicals. The new waveform shapes introduce better selectivity toward specific molecules such as serotonin, histamine, hydrogen peroxide, octopamine, adenosine, guanosine, and neuropeptides. Second, CFMEs have been modified with nanomaterials such as carbon nanotubes or replaced with conducting polymers to enhance sensitivity, selectivity, and antifouling properties. Different geometries can be obtained by 3D-printing, manufacturing arrays, or fabricating carbon nanopipettes. Third, data analysis is important to sort through the thousands of CVs obtained. Recent developments in data analysis include preprocessing by digital filtering, principal components analysis for distinguishing analytes, and developing automated algorithms to detect peaks. Future challenges include multisite measurements, machine learning, and integration with other techniques. Advances in FSCV will accelerate research in neurochemistry to answer new biological questions about dynamics of signaling in the brain.
Affiliation(s)
- Pumidech Puthongkham
- Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA.
| | | |
|
1082
|
Zhang M, Li J, Kang L, Zhang N, Huang C, He Y, Hu M, Zhou X, Zhang J. Machine learning-guided design and development of multifunctional flexible Ag/poly (amic acid) composites using the differential evolution algorithm. NANOSCALE 2020; 12:3988-3996. [PMID: 32016252 DOI: 10.1039/c9nr09146g] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/07/2023]
Abstract
The development of flexible composites is of great significance in the field of flexible electronics. Combining machine learning with the design, synthesis, characterization, and application of flexible materials can greatly improve research efficiency. In this study, a back propagation (BP) neural network based on the differential evolution (DE) algorithm was applied to determine the electrical properties of the flexible Ag/poly (amic acid) (PAA) composite structure and to develop flexible materials for different applications. In the machine learning model, the concentration of PAA, the ion exchange time of AgNO3, and the concentration and reduction time of NaBH4 are set as input parameters, and the product of the sheet resistance of the Ag/PAA film and the processing time is set as the output. To prevent the BP neural network from becoming trapped in a local optimum, the initial thresholds and weights of the BP neural network and the data import model are optimized by the DE algorithm. Using 1077 learning samples and 49 predictive samples, a machine learning model with very high accuracy was established, with relative prediction errors below 1.96%. With this model, the optimized fabrication conditions of Ag/PAA composites suitable for strain sensors and electrodes were predicted. To verify the feasibility and applicability of the proposed algorithm, a strain gauge sensor, a triboelectric nanogenerator (TENG) and a capacitive pressure sensor array were fabricated successfully using the optimized process parameters. This work shows that machine learning can be used to quickly optimize the process and provide guidance for material and process design, which is of significance for the development of flexible materials and devices.
Affiliation(s)
- Mengyao Zhang
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Jia Li
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Ling Kang
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Nan Zhang
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Chun Huang
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Yaqin He
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Menghan Hu
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Xiaofeng Zhou
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China.
| | - Jian Zhang
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, 200241, Shanghai, China; and Shanghai Institute of Intelligent Electronics & Systems, Fudan University, Shanghai 200433, China.
| |
|
1083
|
Jirasek F, Alves RAS, Damay J, Vandermeulen RA, Bamler R, Bortz M, Mandt S, Kloft M, Hasse H. Machine Learning in Thermodynamics: Prediction of Activity Coefficients by Matrix Completion. J Phys Chem Lett 2020; 11:981-985. [PMID: 31964142 DOI: 10.1021/acs.jpclett.9b03657] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Activity coefficients, which are a measure of the nonideality of liquid mixtures, are a key property in chemical engineering with relevance to modeling chemical and phase equilibria as well as transport processes. Although experimental data on thousands of binary mixtures are available, prediction methods are needed to calculate the activity coefficients in many relevant mixtures that have not been explored to date. In this report, we propose a probabilistic matrix factorization model for predicting the activity coefficients in arbitrary binary mixtures. Although no physical descriptors for the considered components were used, our method outperforms the state-of-the-art method that has been refined over three decades while requiring much less training effort. This opens perspectives to novel methods for predicting physicochemical properties of binary mixtures with the potential to revolutionize modeling and simulation in chemical engineering.
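The matrix-completion view in this abstract can be sketched as follows. This is a simplified, non-probabilistic stand-in for the probabilistic matrix factorization the authors propose: it fits two low-rank factors by plain gradient descent on the observed entries only. The function name, the rank, the learning rate, and the synthetic 6 x 6 matrix are all assumptions for illustration; no thermodynamic data are involved.

```python
import numpy as np

def complete_matrix(M, mask, rank=2, lr=0.01, reg=0.01, steps=5000, seed=0):
    """Fill in a partially observed matrix with a low-rank factorization U @ V.T.

    M    : matrix of values (entries where mask == 0 are ignored)
    mask : 1.0 where an entry was measured, 0.0 where it is missing
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_cols, rank))
    for _ in range(steps):
        residual = mask * (U @ V.T - M)        # error on observed entries only
        grad_U = residual @ V + reg * U        # gradient of the L2-regularized squared error
        grad_V = residual.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

# Synthetic rank-2 "property matrix": rows and columns stand in for the two
# components of a binary mixture (hypothetical data, not activity coefficients).
a = np.column_stack([np.linspace(1.0, 2.0, 6), np.linspace(-1.0, 1.0, 6)])
b = np.column_stack([np.linspace(0.5, 1.5, 6), np.linspace(1.0, -1.0, 6)])
truth = a @ b.T

rows, cols = np.indices(truth.shape)
mask = ((rows + cols) % 4 != 0).astype(float)  # hide 9 of the 36 entries

estimate = complete_matrix(truth, mask)
observed_rmse = np.sqrt(np.mean((estimate - truth)[mask == 1] ** 2))
hidden_rmse = np.sqrt(np.mean((estimate - truth)[mask == 0] ** 2))
```

Note that, as in the abstract, no descriptors of the components enter the model: the factors are learned purely from which entries are known and what values they take.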
Affiliation(s)
- Fabian Jirasek
- Department of Computer Science, University of California, Irvine, California 92697, United States
- Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Rodrigo A S Alves
- Machine Learning Group, Department of Computer Science, TU Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Julie Damay
- Fraunhofer Institute for Industrial Mathematics ITWM, 67663 Kaiserslautern, Germany
| | - Robert A Vandermeulen
- Machine Learning Group, Department of Computer Science, TU Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Robert Bamler
- Department of Computer Science, University of California, Irvine, California 92697, United States
| | - Michael Bortz
- Fraunhofer Institute for Industrial Mathematics ITWM, 67663 Kaiserslautern, Germany
| | - Stephan Mandt
- Department of Computer Science, University of California, Irvine, California 92697, United States
| | - Marius Kloft
- Machine Learning Group, Department of Computer Science, TU Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Hans Hasse
- Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern, 67663 Kaiserslautern, Germany
| |
|
1084
|
Monteiro R, Miyazato I, Takahashi K. Rising Sun Envelope Method: An Automatic and Accurate Peak Location Technique for XANES Measurements. J Phys Chem A 2020; 124:1754-1762. [DOI: 10.1021/acs.jpca.9b11712] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Affiliation(s)
- Rafael Monteiro
- MathAM-OIL, AIST, c/o Advanced Institute for Materials Research, Tohoku University, Sendai 980-8577, Japan
| | - Itsuki Miyazato
- Department of Chemistry, Hokkaido University, N-10 W-8, Sapporo 060-0810, Japan
- Center for Materials Research By Information Integration (CMI2), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Keisuke Takahashi
- Department of Chemistry, Hokkaido University, N-10 W-8, Sapporo 060-0810, Japan
- Center for Materials Research By Information Integration (CMI2), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| |
|
1085
|
Winkler M, Sonner H, Gleiss M, Nirschl H. Fractionation of ultrafine particles: Evaluation of separation efficiency by UV–vis spectroscopy. Chem Eng Sci 2020. [DOI: 10.1016/j.ces.2019.115374] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
|
1086
|
Zhang X, Zhang K, Lin D, Zhu Y, Chen C, He L, Guo X, Chen K, Wang R, Liu Z, Wu X, Long E, Huang K, He Z, Liu X, Lin H. Artificial intelligence deciphers codes for color and odor perceptions based on large-scale chemoinformatic data. Gigascience 2020; 9:giaa011. [PMID: 32101298 PMCID: PMC7043059 DOI: 10.1093/gigascience/giaa011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2019] [Revised: 10/19/2019] [Accepted: 01/30/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Color vision is the ability to detect, distinguish, and analyze the wavelength distributions of light independent of the total intensity. It mediates the interaction between an organism and its environment from multiple important aspects. However, the physicochemical basis of color coding has not been explored completely, and how color perception is integrated with other sensory input, typically odor, is unclear. RESULTS Here, we developed an artificial intelligence platform to train algorithms for distinguishing color and odor based on the large-scale physicochemical features of 1,267 and 598 structurally diverse molecules, respectively. The predictive accuracies achieved using the random forest and deep belief network for the prediction of color were 100% and 95.23% ± 0.40% (mean ± SD), respectively. The predictive accuracies achieved using the random forest and deep belief network for the prediction of odor were 93.40% ± 0.31% and 94.75% ± 0.44% (mean ± SD), respectively. Twenty-four physicochemical features were sufficient for the accurate prediction of color, while 39 physicochemical features were sufficient for the accurate prediction of odor. A positive correlation between the color-coding and odor-coding properties of the molecules was predicted. A group of descriptors was found to interlink prominently in color and odor perceptions. CONCLUSIONS Our random forest model and deep belief network accurately predicted the colors and odors of structurally diverse molecules. These findings extend our understanding of the molecular and structural basis of color vision and reveal the interrelationship between color and odor perceptions in nature.
Affiliation(s)
- Xiayin Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Kai Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
- School of Computer Science and Technology, Xidian University, Tai Bai South Road 2#, Xi'an 710000, China
| | - Duoru Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Yi Zhu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
- Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, 1120 NW 14th Street, Miami, FL 33136, USA
| | - Chuan Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, 1120 NW 14th Street, Miami, FL 33136, USA
| | - Lin He
- School of Computer Science and Technology, Xidian University, Tai Bai South Road 2#, Xi'an 710000, China
| | - Xusen Guo
- Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education School of Data and Computer Science, Sun Yat-Sen University, Wai Huan East Road 132#, Guangzhou 510000, China
| | - Kexin Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Ruixin Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Zhenzhen Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Xiaohang Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Erping Long
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
| | - Kai Huang
- Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education School of Data and Computer Science, Sun Yat-Sen University, Wai Huan East Road 132#, Guangzhou 510000, China
| | - Zhiqiang He
- Key Laboratory of Universal Wireless Communications, Beijing University of Posts and Telecommunications, West Tu Cheng Road 10#, Beijing 100876, China
| | - Xiyang Liu
- School of Computer Science and Technology, Xidian University, Tai Bai South Road 2#, Xi'an 710000, China
| | - Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Xian Lie South Road 54#, Guangzhou 510060, China
- Center of Precision Medicine, Sun Yat-sen University, Xin Guang West Road 135#, Guangzhou 510080, China
| |
|
1087
|
Haghighatlari M, Vishwakarma G, Altarawy D, Subramanian R, Kota BU, Sonpal A, Setlur S, Hachmann J. ChemML: A machine learning and informatics program package for the analysis, mining, and modeling of chemical and materials data. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1458] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Mojtaba Haghighatlari
- Department of Chemical and Biological Engineering University at Buffalo, The State University of New York Buffalo New York
| | - Gaurav Vishwakarma
- Department of Chemical and Biological Engineering University at Buffalo, The State University of New York Buffalo New York
| | - Doaa Altarawy
- The Molecular Sciences Software Institute, Virginia Tech Blacksburg Virginia
- Computer and Systems Engineering Department Alexandria University Alexandria Egypt
| | - Ramachandran Subramanian
- Department of Computer Science and Engineering University at Buffalo, The State University of New York Buffalo New York
- Center for Unified Biometrics and Sensors University at Buffalo, The State University of New York Buffalo New York
| | - Bhargava U. Kota
- Department of Computer Science and Engineering University at Buffalo, The State University of New York Buffalo New York
- Center for Unified Biometrics and Sensors University at Buffalo, The State University of New York Buffalo New York
| | - Aditya Sonpal
- Department of Chemical and Biological Engineering University at Buffalo, The State University of New York Buffalo New York
| | - Srirangaraj Setlur
- Department of Computer Science and Engineering University at Buffalo, The State University of New York Buffalo New York
- Center for Unified Biometrics and Sensors University at Buffalo, The State University of New York Buffalo New York
- Center of Excellence for Document Analysis and Recognition, University at Buffalo The State University of New York Buffalo New York
| | - Johannes Hachmann
- Department of Chemical and Biological Engineering University at Buffalo, The State University of New York Buffalo New York
- Computational and Data‐Enabled Science and Engineering Graduate Program University at Buffalo, The State University of New York Buffalo New York
- New York State Center of Excellence in Materials Informatics Buffalo New York
| |
|
1088
|
Rohr B, Stein HS, Guevarra D, Wang Y, Haber JA, Aykol M, Suram SK, Gregoire JM. Benchmarking the acceleration of materials discovery by sequential learning. Chem Sci 2020; 11:2696-2706. [PMID: 34084328 PMCID: PMC8157525 DOI: 10.1039/c9sc05999g] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2019] [Accepted: 01/27/2020] [Indexed: 12/23/2022] Open
Abstract
Sequential learning (SL) strategies, i.e. iteratively updating a machine learning model to guide experiments, have been proposed to significantly accelerate materials discovery and research. Applications on computational datasets and a handful of optimization experiments have demonstrated the promise of SL, motivating a quantitative evaluation of its ability to accelerate materials discovery, specifically in the case of physical experiments. The benchmarking effort in the present work quantifies the performance of SL algorithms with respect to a breadth of research goals: discovery of any "good" material, discovery of all "good" materials, and discovery of a model that accurately predicts the performance of new materials. To benchmark the effectiveness of different machine learning models against these goals, we use datasets in which the performance of all materials in the search space is known from high-throughput synthesis and electrochemistry experiments. Each dataset contains all pseudo-quaternary metal oxide combinations from a set of six elements (chemical space); the performance metric chosen is the electrocatalytic activity (overpotential) for the oxygen evolution reaction (OER). A diverse set of SL schemes is tested on four chemical spaces, each containing 2121 catalysts. The presented work suggests that research can be accelerated by up to a factor of 20 compared to random acquisition in specific scenarios. The results also show that certain choices of SL models are ill-suited for a given research goal, resulting in substantial deceleration compared to random acquisition methods. The results provide quantitative guidance on how to tune an SL strategy for a given research goal and demonstrate the need for a new generation of materials-aware SL algorithms to further accelerate materials discovery.
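A benchmark of this kind, for the goal "discovery of any good material", can be sketched as a loop that counts experiments until the first hit and compares a model-guided acquisition against random acquisition. Everything concrete below is an assumption for illustration: the one-dimensional search space, the quadratic surrogate, the greedy acquisition rule, and the function names. The paper's datasets, models, and acquisition functions are not reproduced.

```python
import numpy as np

def choose_random(x, y, measured, remaining, rng):
    """Baseline: pick the next experiment uniformly at random."""
    return int(rng.choice(remaining))

def choose_model(x, y, measured, remaining, rng):
    """Sequential learning: refit a surrogate on the measured data, then
    run the candidate it predicts to perform best (greedy acquisition)."""
    coeffs = np.polyfit(x[measured], y[measured], deg=2)   # toy quadratic surrogate
    predicted = np.polyval(coeffs, x[remaining])
    return int(remaining[int(np.argmax(predicted))])

def experiments_until_hit(x, y, good, choose, n_init=3, seed=0):
    """Count experiments needed before any 'good' candidate has been measured."""
    rng = np.random.default_rng(seed)
    measured = [int(i) for i in rng.choice(len(y), size=n_init, replace=False)]
    while not good.intersection(measured):
        remaining = [i for i in range(len(y)) if i not in measured]
        measured.append(choose(x, y, measured, remaining, rng))
    return len(measured)

# Fully known toy search space, mirroring the benchmark setting in which the
# performance of every candidate is known in advance.
x = np.linspace(0.0, 1.0, 101)
y = -(x - 0.7) ** 2                 # hypothetical performance, best at x = 0.7
good = {int(np.argmax(y))}          # here: only the single best candidate counts

seeds = range(20)
sl_runs = [experiments_until_hit(x, y, good, choose_model, seed=s) for s in seeds]
random_runs = [experiments_until_hit(x, y, good, choose_random, seed=s) for s in seeds]
acceleration = np.mean(random_runs) / np.mean(sl_runs)
```

Because the toy objective is exactly quadratic, the surrogate is perfect after three measurements, which is far more favorable than any real dataset. A purely greedy rule like this one can also stall on rugged landscapes, which is one way an SL scheme may end up slower than random acquisition.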
Affiliation(s)
- Brian Rohr
- Accelerated Materials Design and Discovery, Toyota Research Institute Los Altos CA USA
| | - Helge S Stein
- Joint Center for Artificial Photosynthesis, California Institute of Technology Pasadena CA USA
| | - Dan Guevarra
- Joint Center for Artificial Photosynthesis, California Institute of Technology Pasadena CA USA
| | - Yu Wang
- Joint Center for Artificial Photosynthesis, California Institute of Technology Pasadena CA USA
| | - Joel A Haber
- Joint Center for Artificial Photosynthesis, California Institute of Technology Pasadena CA USA
| | - Muratahan Aykol
- Accelerated Materials Design and Discovery, Toyota Research Institute Los Altos CA USA
| | - Santosh K Suram
- Accelerated Materials Design and Discovery, Toyota Research Institute Los Altos CA USA
| | - John M Gregoire
- Joint Center for Artificial Photosynthesis, California Institute of Technology Pasadena CA USA
- Division of Engineering and Applied Science, California Institute of Technology Pasadena CA USA
| |
|
1089
|
Sun J, Tárnok A, Su X. Deep Learning-Based Single-Cell Optical Image Studies. Cytometry A 2020; 97:226-240. [PMID: 31981309 DOI: 10.1002/cyto.a.23973] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2019] [Revised: 01/03/2020] [Accepted: 01/10/2020] [Indexed: 12/17/2022]
Abstract
Optical imaging technology, with its advantages of high sensitivity and cost-effectiveness, greatly promotes the progress of nondestructive single-cell studies. Complex cellular image analysis tasks such as three-dimensional reconstruction call for machine-learning technology in cell optical image research. With the rapid development of high-throughput imaging flow cytometry, large volumes of cell optical images are routinely obtained, and these may require machine learning for data analysis. In recent years, deep learning has been prevalent in the field of machine learning for large-scale image processing and analysis, which, together with an explosive growth of data availability, brings a new dawn for single-cell optical image studies. Popular deep learning techniques offer new ideas for multimodal and multitask single-cell optical image research. This article provides an overview of the basic knowledge of deep learning and its applications in single-cell optical image studies. We explore the feasibility of applying deep learning techniques to single-cell optical image analysis, where popular techniques such as transfer learning, multimodal learning, multitask learning, and end-to-end learning are reviewed. Image preprocessing and deep learning model training methods are then summarized. Applications based on deep learning techniques in the field of single-cell optical image studies are reviewed, which include image segmentation, super-resolution image reconstruction, cell tracking, cell counting, cross-modal image reconstruction, and design and control of cell imaging systems. In addition, deep learning in popular single-cell optical imaging techniques such as label-free cell optical imaging, high-content screening, and high-throughput optical imaging cytometry is also discussed. Finally, the perspectives of deep learning technology for single-cell optical image analysis are discussed. © 2020 International Society for Advancement of Cytometry.
Affiliation(s)
- Jing Sun
- Institute of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China
| | - Attila Tárnok
- Department of Therapy Validation, Fraunhofer Institute for Cell Therapy and Immunology (IZI), Leipzig, Germany; Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, Leipzig, Germany
| | - Xuantao Su
- Institute of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, China
| |
|
1090
|
Joseph MB. Neural hierarchical models of ecological populations. Ecol Lett 2020; 23:734-747. [PMID: 31970895 DOI: 10.1111/ele.13462] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Revised: 10/17/2019] [Accepted: 12/23/2019] [Indexed: 01/20/2023]
Abstract
Neural networks are increasingly being used in science to infer hidden dynamics of natural systems from noisy observations, a task typically handled by hierarchical models in ecology. This article describes a class of hierarchical models parameterised by neural networks - neural hierarchical models. The derivation of such models analogises the relationship between regression and neural networks. A case study is developed for a neural dynamic occupancy model of North American bird populations, trained on millions of detection/non-detection time series for hundreds of species, providing insights into colonisation and extinction at a continental scale. Flexible models are increasingly needed that scale to large data and represent ecological processes. Neural hierarchical models satisfy this need, providing a bridge between deep learning and ecological modelling that combines the function representation power of neural networks with the inferential capacity of hierarchical models.
Affiliation(s)
- Maxwell B Joseph
- Earth Lab, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO, 80303, USA
| |
|
1091
|
Singh S, Pareek M, Changotra A, Banerjee S, Bhaskararao B, Balamurugan P, Sunoj RB. A unified machine-learning protocol for asymmetric catalysis as a proof of concept demonstration using asymmetric hydrogenation. Proc Natl Acad Sci U S A 2020; 117:1339-1345. [PMID: 31915295 PMCID: PMC6983389 DOI: 10.1073/pnas.1916392117] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Design of asymmetric catalysts generally involves time- and resource-intensive heuristic endeavors. In view of the steady increase in interest toward efficient catalytic asymmetric reactions and the rapid growth in the field of machine learning (ML) in recent years, we envisaged dovetailing these two important domains. We selected a set of quantum chemically derived molecular descriptors from five different asymmetric binaphthyl-derived catalyst families with the propensity to impact the enantioselectivity of asymmetric hydrogenation of alkenes and imines. The predictive power of the random forest (RF) built using the molecular parameters of a set of 368 substrate-catalyst combinations is found to be impressive, with a root-mean-square error (rmse) in the predicted enantiomeric excess (%ee) of about 8.4 ± 1.8 compared to the experimentally known values. The accuracy of RF is found to be superior to other ML methods such as convolutional neural network, decision tree, and eXtreme gradient boosting as well as stepwise linear regression. The proposed method is expected to provide a leap forward in the design of catalysts for asymmetric transformations.
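For reference, the root-mean-square error quoted in this abstract is conventionally defined as below. The symbols are assumptions introduced here: N is the number of substrate-catalyst combinations in the evaluation set, and the superscripts mark the predicted and experimentally known enantiomeric excess of combination i.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{ee}_i^{\mathrm{pred}} - \mathrm{ee}_i^{\mathrm{exp}}\right)^{2}}
```

With ee expressed in percent, the reported value of about 8.4 is therefore also in percentage points of ee.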
Affiliation(s)
- Sukriti Singh
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| | - Monika Pareek
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| | - Avtar Changotra
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| | - Sayan Banerjee
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| | - Bangaru Bhaskararao
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| | - P Balamurugan
- Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| | - Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, 400076 Mumbai, India
| |
|
1092
|
Hatakeyama-Sato K, Tezuka T, Umeki M, Oyaizu K. AI-Assisted Exploration of Superionic Glass-Type Li+ Conductors with Aromatic Structures. J Am Chem Soc 2020; 142:3301-3305. [DOI: 10.1021/jacs.9b11442] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
| | - Toshiki Tezuka
- Department of Applied Chemistry, Waseda University, Tokyo 169-8555, Japan
| | - Momoka Umeki
- Department of Applied Chemistry, Waseda University, Tokyo 169-8555, Japan
| | - Kenichi Oyaizu
- Department of Applied Chemistry, Waseda University, Tokyo 169-8555, Japan
| |
|
1093
|
Shao Y, Hellström M, Mitev PD, Knijff L, Zhang C. PiNN: A Python Library for Building Atomic Neural Networks of Molecules and Materials. J Chem Inf Model 2020; 60:1184-1193. [DOI: 10.1021/acs.jcim.9b00994] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Yunqi Shao
- Department of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| | - Matti Hellström
- Software for Chemistry and Materials B.V., De Boelelaan 1083, 1081HV Amsterdam, The Netherlands
| | - Pavlin D. Mitev
- Department of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| | - Lisanne Knijff
- Department of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| | - Chao Zhang
- Department of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| |
|
1094
|
Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat Protoc 2020; 15:479-512. [PMID: 31932775 DOI: 10.1038/s41596-019-0251-6] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2019] [Accepted: 10/04/2019] [Indexed: 01/01/2023]
Abstract
DNA methylation data-based precision cancer diagnostics is emerging as the state of the art for molecular tumor classification. Standards for choosing statistical methods with regard to well-calibrated probability estimates for these typically highly multiclass classification tasks are still lacking. To support this choice, we evaluated well-established machine learning (ML) classifiers including random forests (RFs), elastic net (ELNET), support vector machines (SVMs) and boosted trees in combination with post-processing algorithms and developed ML workflows that allow for unbiased class probability (CP) estimation. Calibrators included ridge-penalized multinomial logistic regression (MR) and Platt scaling by fitting logistic regression (LR) and Firth's penalized LR. We compared these workflows on a recently published brain tumor 450k DNA methylation cohort of 2,801 samples with 91 diagnostic categories using a 5 × 5-fold nested cross-validation scheme and demonstrated their generalizability on external data from The Cancer Genome Atlas. ELNET was the top stand-alone classifier with the best calibration profiles. The best overall two-stage workflow was MR-calibrated SVM with linear kernels closely followed by ridge-calibrated tuned RF. For calibration, MR was the most effective regardless of the primary classifier. The protocols developed as a result of these comparisons provide valuable guidance on choosing ML workflows and their tuning to generate well-calibrated CP estimates for precision diagnostics using DNA methylation data. Computation times vary depending on the ML algorithm from <15 min to 5 d using multi-core desktop PCs. Detailed scripts in the open-source R language are freely available on GitHub, targeting users with intermediate experience in bioinformatics and statistics and using R with Bioconductor extensions.
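Platt scaling, one of the calibrators named in this abstract, maps raw classifier scores to probabilities by fitting a logistic curve. The sketch below is a simplified binary version written in Python with NumPy, whereas the protocol itself is multiclass and distributed as R scripts. The function names, the gradient-descent fit, and the toy scores are assumptions for illustration, and Platt's original target smoothing is omitted.

```python
import numpy as np

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) by minimizing the log loss with gradient descent."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        a -= lr * np.mean((p - y) * s)     # d(log loss)/da
        b -= lr * np.mean(p - y)           # d(log loss)/db
    return a, b

def calibrate(scores, a, b):
    """Turn raw scores into calibrated probabilities using the fitted a and b."""
    s = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-(a * s + b)))

# Hypothetical held-out decision scores (e.g. SVM margins) with true labels.
# The two classes overlap slightly, so the fit has a finite optimum.
scores = [-2.0, -1.5, -1.0, -0.5, 0.2, -0.3, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
probabilities = calibrate(scores, a, b)
```

The calibrator should be fitted on scores the primary classifier did not see during training, which is consistent with the nested cross-validation scheme described in the abstract.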
|
1095
|
Brown KA, Brittman S, Maccaferri N, Jariwala D, Celano U. Machine Learning in Nanoscience: Big Data at Small Scales. NANO LETTERS 2020; 20:2-10. [PMID: 31804080 DOI: 10.1021/acs.nanolett.9b04090] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Recent advances in machine learning (ML) offer new tools to extract new insights from large data sets and to acquire small data sets more effectively. Researchers in nanoscience are experimenting with these tools to tackle challenges in many fields. In addition to ML's advancement of nanoscience, nanoscience provides the foundation for neuromorphic computing hardware to expand the implementation of ML algorithms. In this Mini Review, we highlight some recent efforts to connect the ML and nanoscience communities by focusing on three types of interaction: (1) using ML to analyze and extract new insights from large nanoscience data sets, (2) applying ML to accelerate material discovery, including the use of active learning to guide experimental design, and (3) the nanoscience of memristive devices to realize hardware tailored for ML. We conclude with a discussion of challenges and opportunities for future interactions between nanoscience and ML researchers.
Affiliation(s)
- Keith A Brown
- Department of Mechanical Engineering, Physics Department, and Division of Materials Science and Engineering, Boston University, Boston, Massachusetts 02215, United States
| | - Sarah Brittman
- U.S. Naval Research Laboratory, Washington, DC 20375, United States
| | - Nicolò Maccaferri
- Department of Physics and Materials Science, University of Luxembourg, 162a avenue de la Faïencerie, L-1511 Luxembourg, Luxembourg
| | - Deep Jariwala
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Umberto Celano
- imec, Kapeldreef 75, B-3001 Heverlee (Leuven), Belgium
| |
|
1096
|
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022] Open
Abstract
Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods face significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool to foster breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure-property/activity relationship for material property prediction can be established more accurately and efficiently. In addition, materials design can be accelerated far beyond what was previously possible through ML-enabled molecular generation and inverse molecular design. In this perspective, we review recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the challenges that must be solved in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which together serve as a tutorial for researchers who have little prior experience with ML and want to apply it in various applications. Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future to meet the tremendous demand for new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Zhiqiang Shen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Akshay Iyer
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Umar Farooq Ghumman
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Shan Tang
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China
| | - Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
| |
|
1097
|
Wang H, Liang X, Wang J, Jiao S, Xue D. Multifunctional inorganic nanomaterials for energy applications. NANOSCALE 2020; 12:14-42. [PMID: 31808494 DOI: 10.1039/c9nr07008g] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 06/10/2023]
Abstract
Our society faces increasingly serious challenges in achieving highly efficient utilization of energy. In the field of energy applications, multifunctional nanomaterials have been attracting increasing attention. Various energy applications, such as energy generation, conversion, storage, saving and transmission, are strongly dependent upon the electrical, thermal, mechanical, optical and catalytic functions of materials. In the nanoscale range, thermoelectric, piezoelectric, triboelectric, photovoltaic, catalytic and electrochromic materials have made major contributions to various energy applications. Inorganic nanomaterials' unique properties, such as excellent electrical and thermal conductivity, large surface area and chemical stability, make them highly competitive in energy applications. In this review, the latest research and development of multifunctional inorganic nanomaterials in energy applications are summarized from the perspective of different energy applications. Furthermore, we illustrate the unique functions of inorganic nanomaterials that improve their performance and the combination of the functions of nanomaterials into a device. However, challenges may be traced back to the limitations set by scaling the relations between multifunctional inorganic nanomaterials and energy devices.
Affiliation(s)
- Huilin Wang
- State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China, and University of Science and Technology of China, Hefei 230026, China
| | - Xitong Liang
- State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China, and University of Science and Technology of China, Hefei 230026, China
| | - Jiutian Wang
- State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China, and University of Science and Technology of China, Hefei 230026, China
| | - Shengjian Jiao
- State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China, and University of Science and Technology of China, Hefei 230026, China
| | - Dongfeng Xue
- State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, China, and University of Science and Technology of China, Hefei 230026, China
| |
|
1098
|
Méndez-Lucio O, Baillif B, Clevert DA, Rouquié D, Wichard J. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat Commun 2020; 11:10. [PMID: 31900408 PMCID: PMC6941972 DOI: 10.1038/s41467-019-13807-w] [Citation(s) in RCA: 188] [Impact Index Per Article: 37.6] [Received: 11/14/2018] [Accepted: 11/27/2019] [Indexed: 01/20/2023] Open
Abstract
Finding new molecules with a desired biological activity is an extremely difficult task. In this context, artificial intelligence and generative models have been used for molecular de novo design and compound optimization. Herein, we report a generative model that bridges systems biology and molecular design, conditioning a generative adversarial network with transcriptomic data. By doing so, we can automatically design molecules that have a high probability to induce a desired transcriptomic profile. As long as the gene expression signature of the desired state is provided, this model is able to design active-like molecules for desired targets without any previous target annotation of the training compounds. Molecules designed by this model are more similar to active compounds than the ones identified by similarity of gene expression signatures. Overall, this method represents an alternative approach to bridge chemistry and biology in the long and difficult road of drug discovery.
Affiliation(s)
- Oscar Méndez-Lucio
- Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France.
- Bloomoon, 13 Avenue Albert Einstein, 69100, Villeurbanne, France.
| | - Benoit Baillif
- Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France
| | - Djork-Arné Clevert
- Department of Machine Learning Research, Bayer AG, 13353, Berlin, Germany
| | - David Rouquié
- Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France.
| | - Joerg Wichard
- Department of Genetic Toxicology, Bayer AG, 13353, Berlin, Germany.
| |
|
1099
|
Witman M, Ling S, Grant DM, Walker GS, Agarwal S, Stavila V, Allendorf MD. Extracting an Empirical Intermetallic Hydride Design Principle from Limited Data via Interpretable Machine Learning. J Phys Chem Lett 2020; 11:40-47. [PMID: 31814416 DOI: 10.1021/acs.jpclett.9b02971] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/10/2023]
Abstract
An open question in the metal hydride community is whether there are simple, physics-based design rules that dictate the thermodynamic properties of these materials across the variety of structures and chemistry they can exhibit. While black box machine learning-based algorithms can predict these properties with some success, they do not directly provide the basis on which these predictions are made, therefore complicating the a priori design of novel materials exhibiting a desired property value. In this work we demonstrate how feature importance, as identified by a gradient boosting tree regressor, uncovers the strong dependence of the metal hydride equilibrium H2 pressure on a volume-based descriptor that can be computed from just the elemental composition of the intermetallic alloy. Elucidation of this simple structure-property relationship is valid across a range of compositions, metal substitutions, and structural classes exhibited by intermetallic hydrides. This permits rational targeting of novel intermetallics for high-pressure hydrogen storage (low-stability hydrides) by their descriptor values, and we predict a known intermetallic to form a low-stability hydride (as confirmed by density functional theory calculations) that has not yet been experimentally investigated.
Affiliation(s)
- Matthew Witman
- Sandia National Laboratories, Livermore, California 94551, United States
| | - Sanliang Ling
- Advanced Materials Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham NG7 2RD, U.K.
| | - David M Grant
- Advanced Materials Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham NG7 2RD, U.K.
| | - Gavin S Walker
- Advanced Materials Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham NG7 2RD, U.K.
| | - Sapan Agarwal
- Sandia National Laboratories, Livermore, California 94551, United States
| | - Vitalie Stavila
- Sandia National Laboratories, Livermore, California 94551, United States
| | - Mark D Allendorf
- Sandia National Laboratories, Livermore, California 94551, United States
| |
|
1100
|
Duan Q, Lee J, Zheng S, Chen J, Luo R, Feng Y, Xu Z. A color-spectral machine learning path for analysis of five mixed amino acids. Chem Commun (Camb) 2020; 56:1058-1061. [DOI: 10.1039/c9cc07186e] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
Abstract
A data path between mixed amino acid analysis and machine learning.
Affiliation(s)
- Qiannan Duan
- State Key Laboratory of Pollution Control and Resource Reuse, Jiangsu Key Laboratory of Vehicle Emissions Control, School of the Environment, Nanjing University, Nanjing 210023
| | - Jianchao Lee
- Department of Environmental Science, Shaanxi Normal University, Xi’an 710062, China
| | - Shourong Zheng
- State Key Laboratory of Pollution Control and Resource Reuse, Jiangsu Key Laboratory of Vehicle Emissions Control, School of the Environment, Nanjing University, Nanjing 210023
| | - Jiayuan Chen
- Department of Environmental Science, Shaanxi Normal University, Xi’an 710062, China
| | - Run Luo
- Department of Environmental Science, Shaanxi Normal University, Xi’an 710062, China
| | - Yunjin Feng
- Department of Environmental Science, Shaanxi Normal University, Xi’an 710062, China
| | - Zhaoyi Xu
- State Key Laboratory of Pollution Control and Resource Reuse, Jiangsu Key Laboratory of Vehicle Emissions Control, School of the Environment, Nanjing University, Nanjing 210023
| |
|