1
Zhuang Z, Barnard AS. Classification of battery compounds using structure-free Mendeleev encodings. J Cheminform 2024; 16:47. [PMID: 38671512 PMCID: PMC11055346 DOI: 10.1186/s13321-024-00836-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Machine learning is a valuable tool that can accelerate the discovery and design of materials occupying combinatorial chemical spaces. However, the prerequisite need for vast amounts of training data can be prohibitive when significant resources are needed to characterize or simulate candidate structures. Recent results have shown that structure-free encoding of complex materials, based entirely on chemical compositions, can overcome this impediment and perform well in unsupervised learning tasks. In this study, we extend this exploration to supervised classification, and show how structure-free encoding can accurately predict classes of material compounds for battery applications without time-consuming measurement of bonding networks, lattices or densities. SCIENTIFIC CONTRIBUTION: Structure-free encodings of complex materials are comprehensively evaluated in classification tasks, including binary and multi-class separation, using three classifiers based on different logic functions and measured with four metrics and learning curves. The encoding is applied to two data sets from computational and experimental sources, and the outcomes are visualised using five approaches to confirm the suitability and superiority of Mendeleev encoding. These methods are general and accessible using source software, providing simple, intuitive and interpretable materials informatics outcomes to accelerate materials design.
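As a rough illustration of what a composition-only (structure-free) encoding can look like, the sketch below maps a chemical formula to a fixed-length vector of atomic fractions. It is not the encoding used in the paper: the element list `ELEMENTS`, the function name `encode_composition`, and the simple formula parser (no brackets or hydrates) are all assumptions made for this example.

```python
import re

# Hypothetical, truncated element ordering used only for this illustration.
# A real encoding would cover the whole periodic table in a fixed order.
ELEMENTS = ["H", "Li", "C", "O", "P", "Mn", "Fe", "Co", "Ni"]


def encode_composition(formula):
    """Map a simple formula such as 'LiFePO4' to a vector of atomic fractions."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        if symbol not in ELEMENTS:
            raise ValueError(f"element {symbol!r} is not in the toy ordering")
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    total = sum(counts.values())
    # One slot per element, in the fixed order; absent elements get 0.
    return [counts.get(element, 0.0) / total for element in ELEMENTS]


vector = encode_composition("LiFePO4")
```

A vector of this kind needs no knowledge of bonding networks, lattices or densities, so it can be passed directly to any standard classifier.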
Affiliation(s)
- Zixin Zhuang
- School of Computing, Australian National University, 145 Science Road, Acton, 2601, ACT, Australia
- Amanda S Barnard
- School of Computing, Australian National University, 145 Science Road, Acton, 2601, ACT, Australia.
2
Kirschbaum T, von Seggern B, Dzubiella J, Bande A, Noé F. Machine Learning Frontier Orbital Energies of Nanodiamonds. J Chem Theory Comput 2023; 19:4461-4473. [PMID: 37053438 DOI: 10.1021/acs.jctc.2c01275] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/15/2023]
Abstract
Nanodiamonds have a wide range of applications including catalysis, sensing, tribology, and biomedicine. To leverage nanodiamond design via machine learning, we introduce the new data set ND5k, consisting of 5089 diamondoid and nanodiamond structures and their frontier orbital energies. ND5k structures are optimized via tight-binding density functional theory (DFTB) and their frontier orbital energies are computed using density functional theory (DFT) with the PBE0 hybrid functional. From this data set we derive a qualitative design suggestion for nanodiamonds in photocatalysis. We also compare recent machine learning models for predicting frontier orbital energies of structures similar to those they were trained on (interpolation on ND5k), and we test their ability to extrapolate predictions to larger structures. For both the interpolation and extrapolation tasks, we find the best performance using the equivariant message passing neural network PaiNN. The second-best results are achieved with a message passing neural network using a tailored set of atomic descriptors proposed here.
Affiliation(s)
- Thorren Kirschbaum
- Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Börries von Seggern
- Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany
- Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Arnimallee 22, 14195 Berlin, Germany
- Joachim Dzubiella
- Institute of Physics, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 3, 79104 Freiburg im Breisgau, Germany
- Annika Bande
- Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Microsoft Research AI4Science, Karl-Liebknecht Str. 32, 10178 Berlin, Germany
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
3
Bhat N, Barnard AS, Birbilis N. Unsupervised machine learning discovers classes in aluminium alloys. Royal Society Open Science 2023; 10:220360. [PMID: 36756073 PMCID: PMC9890099 DOI: 10.1098/rsos.220360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 10/26/2022] [Indexed: 06/18/2023]
Abstract
Aluminium (Al) alloys are critical to many applications. Although Al alloys have been commercially widespread for over a century, their development has predominantly taken a trial-and-error approach. Furthermore, many discrete studies regarding Al alloys, often application specific, have precluded a broader consolidation of Al alloy classification. Iterative label spreading (ILS), an unsupervised machine learning approach, was used to identify the different classes of Al alloys, drawing from a specifically curated dataset of 1154 Al alloys (including alloy composition and processing conditions). Using ILS, eight classes of Al alloys were identified based on a comprehensive feature set under two descriptors. Further, a decision tree classifier was used to validate the separation of classes.
Affiliation(s)
- Ninad Bhat
- College of Engineering and Computer Science, The Australian National University, Acton, ACT 2601, Australia
- Amanda S. Barnard
- College of Engineering and Computer Science, The Australian National University, Acton, ACT 2601, Australia
- Nick Birbilis
- College of Engineering and Computer Science, The Australian National University, Acton, ACT 2601, Australia
4
Roncaglia C, Ferrando R. Machine Learning Assisted Clustering of Nanoparticle Structures. J Chem Inf Model 2023; 63:459-473. [PMID: 36597194 PMCID: PMC9875306 DOI: 10.1021/acs.jcim.2c01203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
We propose a scheme for the automatic separation (i.e., clustering) of data sets composed of several nanoparticle (NP) structures by means of machine learning techniques. These data sets originate from atomistic simulations, such as global optimization searches and molecular dynamics simulations, which can produce large outputs that are often difficult to inspect by hand. By combining a description of NPs based on their local atomic environment with unsupervised learning algorithms, such as K-Means and Gaussian mixture models, we are able to distinguish between different structural motifs (e.g., icosahedra, decahedra, polyicosahedra, fcc fragments, twins, and so on). We show that this method improves on previously obtained results thanks to the successful implementation of a more detailed description of NPs, especially for systems showing a large variety of structures, including disordered ones.
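The clustering step described above can be sketched with a plain K-Means (Lloyd's algorithm) loop. This is only a minimal illustration, not the authors' pipeline: the two-component feature vectors stand in for hypothetical per-particle descriptors (for example, fractions of atoms in two kinds of local environment), and the function name `kmeans` and the toy data are assumptions.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm on equal-length feature tuples."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical descriptors for six particles: two visibly separated groups.
points = [(0.90, 0.05), (0.85, 0.10), (0.88, 0.02),
          (0.10, 0.80), (0.15, 0.90), (0.05, 0.85)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```

In practice the quality of the separation depends far more on the descriptor than on the clustering loop, which is the point the abstract makes about using a more detailed description of the particles.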
Affiliation(s)
- Cesare Roncaglia
- Physics Department, University of Genoa, Via Dodecaneso 33, 16146 Genoa, Italy
- Riccardo Ferrando
- Physics Department, University of Genoa and CNR-IMEM, Via Dodecaneso 33, 16146 Genoa, Italy
5
Chen X, Lv H. Intelligent control of nanoparticle synthesis on microfluidic chips with machine learning. NPG Asia Materials 2022; 14:69. [DOI: 10.1038/s41427-022-00416-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 01/12/2025]
Abstract
Nanoparticles play irreplaceable roles in optoelectronic sensing, medical therapy, material science, and chemistry due to their unique properties. There are many synthetic pathways used for the preparation of nanoparticles, and different synthetic pathways can produce nanoparticles with different properties. Therefore, it is crucial to control the properties of nanoparticles precisely to impart the desired functions. In general, the properties of nanoparticles are influenced by their sizes and morphologies. Current technology for the preparation of nanoparticles on microfluidic chips requires repeated experimental debugging and significant resources to synthesize nanoparticles with precisely the desired properties. Machine learning-assisted synthesis of nanoparticles is a sensible choice for addressing this challenge. In this paper, we review many recent studies on syntheses of nanoparticles assisted by machine learning. Moreover, we describe the working steps of machine learning, the main algorithms, and the main ways to obtain datasets. Finally, we discuss the current problems of this research and provide an outlook.
6
Lv H, Chen X. Intelligent control of nanoparticle synthesis through machine learning. Nanoscale 2022; 14:6688-6708. [PMID: 35450983 DOI: 10.1039/d2nr00124a] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/14/2023]
Abstract
The synthesis of nanoparticles is affected by many reaction conditions, and their properties are usually determined by factors such as their size, shape and surface chemistry. In order for the synthesized nanoparticles to have functions suitable for different fields (for example, optics, electronics, sensor applications and so on), precise control of their properties is essential. However, with the current technology of preparing nanoparticles on a microreactor, it is time-consuming and laborious to achieve precise synthesis. In order to improve the efficiency of synthesizing nanoparticles with the expected functionality, the application of machine learning-assisted synthesis is an intelligent choice. In this article, we mainly introduce the typical methods of preparing nanoparticles on microreactors, and explain the principles and procedures of machine learning, as well as the main ways of obtaining data sets. We have studied three types of representative nanoparticle preparation methods assisted by machine learning. Finally, the current problems in machine learning-assisted nanoparticle synthesis and future development prospects are discussed.
Affiliation(s)
- Honglin Lv
- College of Transportation, Ludong University, Yantai, Shandong 264025, China.
- Xueye Chen
- College of Transportation, Ludong University, Yantai, Shandong 264025, China.
7
Li S, Barnard AS. Inverse Design of Nanoparticles Using Multi‐Target Machine Learning. Advanced Theory and Simulations 2021. [DOI: 10.1002/adts.202100414] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
Affiliation(s)
- Sichao Li
- School of Computing, Australian National University, Acton, Australian Capital Territory 2601, Australia
- Amanda S. Barnard
- School of Computing, Australian National University, Acton, Australian Capital Territory 2601, Australia
8
Zhang H, Barnard AS. Impact of atomistic or crystallographic descriptors for classification of gold nanoparticles. Nanoscale 2021; 13:11887-11898. [PMID: 34190263 DOI: 10.1039/d1nr02258j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Machine learning models are known to be sensitive to the features used to train them, but there is currently no way to predict the impact of using different features prior to feature extraction. This is particularly important to fields such as nanotechnology that are highly multi-disciplinary, in which samples can be characterised in many different ways depending on the preferences of individual researchers. Does it matter if nanomaterials are described using the interatomic coordinations or more complex order parameters? In this study we compare results of supervised and unsupervised learning on a single set of gold nanoparticles that has been characterised by two different descriptors, each with a unique feature space. We find that there are some consistencies, and model selection is descriptor-agnostic, but the level of detail and the type of information that can be extracted from the results is sensitive to the way the particles are described. Unsupervised clustering revealed that an atomistic descriptor provides a finer-grained interpretation and clusters that are sub-clusters of a more sophisticated crystallographic descriptor, which is consistent with both how the features were calculated and how they are interpreted in the domain. A supervised classifier revealed that the types of features responsible for the separation are related to the bulk structure, regardless of the descriptor, but capture different types of information. For both the atomistic and crystallographic descriptors, the gradient boosting decision tree classifier gave superior results, with F1-scores of 0.96 and 0.98, respectively, and excellent precision and recall, even though the clustering presented a challenging multi-classification problem.
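For readers unfamiliar with the metric quoted above, the F1-score is the harmonic mean of precision and recall, and in a multi-class problem it is often averaged over classes. The sketch below computes per-class and macro-averaged F1 on made-up labels; the helper names and the toy data are assumptions, and the abstract does not state which averaging the authors used.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


# Toy three-class labels (hypothetical cluster ids), with one misclassification.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]
score = macro_f1(y_true, y_pred)
```

Because a single misclassified sample lowers the recall of one class and the precision of another, the macro average penalises errors on small classes more visibly than plain accuracy does.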
Affiliation(s)
- Haonan Zhang
- School of Computing, Australian National University, Acton 2601, Australia.
9
Parker AJ, Barnard AS. Unsupervised structure classes vs. supervised property classes of silicon quantum dots using neural networks. Nanoscale Horizons 2021; 6:277-282. [PMID: 33527922 DOI: 10.1039/d0nh00637h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Machine learning classification is a useful technique to predict structure/property relationships in samples of nanomaterials where distributions of sizes and mixtures of shapes are persistent. The separation of classes, however, can either be supervised based on domain knowledge (human intelligence), or based entirely on unsupervised machine learning (artificial intelligence). This raises the questions of which approach is more reliable and how they compare. In this study we combine an ensemble data set of electronic structure simulations of the size, shape and peak wavelength for the optical emission of hydrogen-passivated silicon quantum dots with artificial neural networks to explore the utility of different types of classes. By comparing the domain-driven and data-driven approaches we find there is a disconnect between what we see (optical emission) and assume (that a particular color band represents a special class), and what the data supports. Contrary to expectation, controlling a limited set of structural characteristics is not specific enough to classify a quantum dot based on color, even though it is experimentally intuitive.
Affiliation(s)
- Amanda J Parker
- CSIRO Data61, Door 34 Goods Shed Village St, Docklands, Victoria, Australia