1
Shamraeva MA, Visvikis T, Zoidis S, Anthony IGM, Van Nuffel S. The Application of a Random Forest Classifier to ToF-SIMS Imaging Data. Journal of the American Society for Mass Spectrometry 2024. [PMID: 39455427 DOI: 10.1021/jasms.4c00324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2024]
Abstract
Time-of-flight secondary ion mass spectrometry (ToF-SIMS) imaging is a potent analytical tool that provides spatially resolved chemical information on surfaces at the microscale. However, the hyperspectral nature of ToF-SIMS datasets can be challenging to analyze and interpret. Both supervised and unsupervised machine learning (ML) approaches are increasingly useful to help analyze ToF-SIMS data. Random Forest (RF) has emerged as a robust and powerful algorithm for processing mass spectrometry data. This machine learning approach offers several advantages, including accommodating nonlinear relationships, robustness to outliers in the data, managing the high-dimensional feature space, and mitigating the risk of overfitting. The application of RF to ToF-SIMS imaging facilitates the classification of complex chemical compositions and the identification of features contributing to these classifications. This tutorial aims to assist nonexperts in either machine learning or ToF-SIMS to apply Random Forest to complex ToF-SIMS datasets.
Affiliation(s)
- Mariya A Shamraeva
  - Maastricht MultiModal Molecular Imaging Institute (M4i), Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Theodoros Visvikis
  - Faculty of Science and Engineering, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht 6229EN, The Netherlands
- Stefanos Zoidis
  - Faculty of Science and Engineering, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht 6229EN, The Netherlands
- Ian G M Anthony
  - Maastricht MultiModal Molecular Imaging Institute (M4i), Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
- Sebastiaan Van Nuffel
  - Maastricht MultiModal Molecular Imaging Institute (M4i), Maastricht University, Universiteitssingel 50, 6229 ER Maastricht, The Netherlands
  - Faculty of Science and Engineering, Maastricht University, Paul-Henri Spaaklaan 1, Maastricht 6229EN, The Netherlands
2
Fransaert N, Robert A, Cleuren B, Manca JV, Valkenborg D. Identifying Process Differences with ToF-SIMS: An MVA Decomposition Strategy. Journal of the American Society for Mass Spectrometry 2024. [PMID: 39366671 DOI: 10.1021/jasms.4c00327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2024]
Abstract
In time-of-flight secondary ion mass spectrometry (ToF-SIMS), multivariate analysis (MVA) methods such as principal component analysis (PCA) are routinely employed to differentiate spectra. However, additional insights can often be gained by comparing processes, where each process is characterized by its own start and end spectra, such as when identical samples undergo slightly different treatments or when slightly different samples receive the same treatment. This study proposes a strategy to compare such processes by decomposing the loading vectors associated with them, which highlights differences in the relative behavior of the peaks. This strategy identifies key information beyond what is captured by the loading vectors or the end spectra alone. While PCA is widely used, partial least-squares discriminant analysis (PLS-DA) serves as a supervised alternative and is the preferred method for deriving process-related loading vectors when classes are narrowly separated. The effectiveness of the decomposition strategy is demonstrated using artificial spectra and applied to a ToF-SIMS materials science case study on the photodegradation of N719 dye, a common dye in photovoltaics, on a mesoporous TiO2 anode. The study revealed that the photodegradation process varies over time, and the resulting fragments have been identified accordingly. The proposed methodology, applicable to both labeled (supervised) and unlabeled (unsupervised) spectral data, can be seamlessly integrated into most modern mass spectrometry data analysis workflows to automatically generate a list of peaks whose relative behavior varies between two processes, and is particularly effective in identifying subtle differences between highly similar physicochemical processes.
Affiliation(s)
- Bart Cleuren
  - UHasselt, Theory Lab, Agoralaan, 3590 Diepenbeek, Belgium
- Jean V Manca
  - UHasselt, X-LAB, Agoralaan, 3590 Diepenbeek, Belgium
- Dirk Valkenborg
  - UHasselt, Data Science Institute, Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Center for Statistics, Agoralaan, 3590 Diepenbeek, Belgium
3
Bamford SE, Gardner W, Winkler DA, Muir BW, Alahakoon D, Pigram PJ. Self-Organizing Maps for Secondary Ion Mass Spectrometry. Journal of the American Society for Mass Spectrometry 2024; 35:2516-2528. [PMID: 39307990 DOI: 10.1021/jasms.4c00318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2024]
Abstract
Secondary ion mass spectrometry (SIMS) is a powerful analytical technique for characterizing the molecular and elemental composition of surfaces. Individual mass spectra can provide information about the mean surface composition, while spatial mapping can elucidate the spatial distributions of molecular species in 2D and 3D with no prior labeling of molecular targets. The data sets produced by SIMS techniques are large and inherently complex, often containing subtle relationships between spatial and molecular features. Machine learning algorithms are well suited to exploring this complexity, making them ideal for data analysis, interpretation, and visualization of SIMS data sets. One such algorithm, the self-organizing map (SOM), is particularly well suited to clustering similar samples and reducing the dimensionality of hyperspectral data sets. Here, we present an introduction to the SOM, a concise mathematical description, and recent examples of its use in SIMS and other related mass spectrometry techniques. These examples demonstrate how SOMs may be used to interpret high volumes of individual mass spectra, imaging, or depth profiling data sets. This review will be useful for specialists in SIMS and other mass spectral techniques seeking to explore self-organizing maps for data analysis.
Affiliation(s)
- Sarah E Bamford
  - Centre for Materials and Surface Science and Department of Mathematical and Physical Sciences, La Trobe University, Bundoora, Victoria 3086, Australia
- Wil Gardner
  - Centre for Materials and Surface Science and Department of Mathematical and Physical Sciences, La Trobe University, Bundoora, Victoria 3086, Australia
- David A Winkler
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Victoria 3086, Australia
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Damminda Alahakoon
  - Research Centre for Data Analytics and Cognition, La Trobe Business School, La Trobe University, Bundoora, Victoria 3086, Australia
- Paul J Pigram
  - Centre for Materials and Surface Science and Department of Mathematical and Physical Sciences, La Trobe University, Bundoora, Victoria 3086, Australia
4
Daoudi M, Nuns N, Schiffmann P, Frobert A, Hanoune B, Desgroux P, Faccinetto A. A mass defect-based approach for the automatic construction of peak lists for databases of mass spectra with limited resolution: Application to time-of-flight secondary ion mass spectrometry data. Rapid Communications in Mass Spectrometry 2024; 38:e9777. [PMID: 38797962 DOI: 10.1002/rcm.9777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 05/06/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024]
Abstract
RATIONALE: This study has developed a data processing protocol based on mass defect analysis for the automatic construction of unique peak lists addressing the need for the fast and efficient treatment of databases of mass spectra with limited mass resolution.
METHODS: The data processing protocol, implemented in MATLAB, is tested on a database of 126 mass spectra obtained from time-of-flight secondary ion mass spectrometry analysis of the exhaust of a laboratory diesel miniCAST burner deposited on Ti substrates.
RESULTS: The data processing protocol converts the mass spectra into a data matrix suitable for chemometrics (peak list) by combining mass defect analysis and multivariate analysis. In particular, the role of the mass defect analysis is expanded to improve mass calibration and automate the construction of the peak list.
CONCLUSIONS: In this context, mass defect analysis becomes an invaluable technique for the efficient processing of databases of mass spectra with limited mass resolution by allowing the fast and automated construction of a peak list common to all mass spectra, by improving the mass calibration, and finally by reducing the number of molecular formulae consistent with a given accurate mass, thus facilitating the identification of unknown ions.
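As a rough illustration of the quantity this abstract relies on, the sketch below computes a mass defect as exact mass minus nominal mass. It is not the authors' MATLAB protocol. The function names and the toy formulas are hypothetical, the nominal mass is approximated by rounding to the nearest integer (adequate at low m/z, less so for heavy ions), and the electron mass is ignored.

```python
from typing import Mapping

# Monoisotopic masses in unified atomic mass units (u) for a few light elements.
MONOISOTOPIC_MASS = {
    "H": 1.00782503,
    "C": 12.0,
    "N": 14.0030740,
    "O": 15.9949146,
}


def exact_mass(formula: Mapping[str, int]) -> float:
    """Sum monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def mass_defect(mass: float) -> float:
    """Mass defect, assumed here to be exact mass minus the nearest integer mass."""
    return mass - round(mass)


# Hydrogen-rich fragments have a positive defect, oxygen-rich ones a negative defect.
c2h3 = exact_mass({"C": 2, "H": 3})
co2 = exact_mass({"C": 1, "O": 2})
print(f"C2H3: mass {c2h3:.4f}, defect {mass_defect(c2h3):+.4f}")
print(f"CO2:  mass {co2:.4f}, defect {mass_defect(co2):+.4f}")
```

Because hydrocarbon-like and oxygen-rich ions of the same nominal mass fall at different defects, plotting defect against nominal mass can help separate ion families even when peaks are not fully resolved.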
Affiliation(s)
- Mouad Daoudi
  - Univ. Lille, CNRS, UMR 8522 - PC2A - Physicochimie des Processus de Combustion et de l'Atmosphère, Lille, France
  - IFP Energies Nouvelles, Institut Carnot IFPEN TE, Rueil-Malmaison, France
- Nicolas Nuns
  - Univ. Lille, CNRS, INRAE, Centrale Lille, Univ. Artois, FR 2638 - IMEC - Institut Michel-Eugène Chevreul, Lille, France
- Philipp Schiffmann
  - IFP Energies Nouvelles, Institut Carnot IFPEN TE, Rueil-Malmaison, France
- Arnaud Frobert
  - IFP Energies Nouvelles, Institut Carnot IFPEN TE, Rueil-Malmaison, France
- Benjamin Hanoune
  - Univ. Lille, CNRS, UMR 8522 - PC2A - Physicochimie des Processus de Combustion et de l'Atmosphère, Lille, France
- Pascale Desgroux
  - Univ. Lille, CNRS, UMR 8522 - PC2A - Physicochimie des Processus de Combustion et de l'Atmosphère, Lille, France
- Alessandro Faccinetto
  - Univ. Lille, CNRS, UMR 8522 - PC2A - Physicochimie des Processus de Combustion et de l'Atmosphère, Lille, France
5
Zhao Y, Otto SK, Lombardo T, Henss A, Koeppe A, Selzer M, Janek J, Nestler B. Identification of Lithium Compounds on Surfaces of Lithium Metal Anode with Machine-Learning-Assisted Analysis of ToF-SIMS Spectra. ACS Applied Materials & Interfaces 2023; 15:50469-50478. [PMID: 37852613 PMCID: PMC10623505 DOI: 10.1021/acsami.3c09643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/04/2023] [Accepted: 09/13/2023] [Indexed: 10/20/2023]
Abstract
Detailed knowledge about contamination and passivation compounds on the surface of lithium metal anodes (LMAs) is essential to enable their use in all-solid-state batteries (ASSBs). Time-of-flight secondary ion mass spectrometry (ToF-SIMS), a highly surface-sensitive technique, can be used to reliably characterize the surface status of LMAs. However, as ToF-SIMS data are usually highly complex, manual data analysis can be difficult and time-consuming. In this study, machine learning techniques, especially logistic regression (LR), are used to identify the characteristic secondary ions of 5 different pure lithium compounds. Furthermore, these models are applied to the mixture and LMA samples to enable identification of their compositions based on the measured ToF-SIMS spectra. This machine-learning-based analysis approach shows good performance in identifying characteristic ions of the analyzed compounds that fit well with their chemical nature. Moreover, satisfying accuracy in identifying the compositions of unseen new samples is achieved. In addition, the scope and limitations of such a strategy in practical applications are discussed. This work presents a robust analytical method that can assist researchers in simplifying the analysis of the studied lithium compound samples, offering the potential for broader applications in other material systems.
Affiliation(s)
- Yinghan Zhao
  - Institute for Applied Materials − Microstructure Modelling and Simulation, Karlsruhe Institute of Technology, D-76131 Karlsruhe, Germany
- Svenja-K. Otto
  - Institute of Physical Chemistry, Justus-Liebig-Universität Giessen, D-35392 Giessen, Germany
- Teo Lombardo
  - Institute of Physical Chemistry, Justus-Liebig-Universität Giessen, D-35392 Giessen, Germany
- Anja Henss
  - Institute of Physical Chemistry, Justus-Liebig-Universität Giessen, D-35392 Giessen, Germany
- Arnd Koeppe
  - Institute for Applied Materials − Microstructure Modelling and Simulation, Karlsruhe Institute of Technology, D-76131 Karlsruhe, Germany
- Michael Selzer
  - Institute for Applied Materials − Microstructure Modelling and Simulation, Karlsruhe Institute of Technology, D-76131 Karlsruhe, Germany
  - Institute for Digital Materials Science, Karlsruhe University of Applied Sciences, D-76133 Karlsruhe, Germany
- Jürgen Janek
  - Institute of Physical Chemistry, Justus-Liebig-Universität Giessen, D-35392 Giessen, Germany
- Britta Nestler
  - Institute for Applied Materials − Microstructure Modelling and Simulation, Karlsruhe Institute of Technology, D-76131 Karlsruhe, Germany
  - Institute for Digital Materials Science, Karlsruhe University of Applied Sciences, D-76133 Karlsruhe, Germany
6
Lang Y, Zhou L, Imamura Y. Development of Machine-Learning Techniques for Time-of-Flight Secondary Ion Mass Spectrometry Spectral Analysis: Application for the Identification of Silane Coupling Agents in Multicomponent Films. Anal Chem 2022; 94:2546-2553. [DOI: 10.1021/acs.analchem.1c04436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Yusheng Lang
  - Innovative Technology Laboratories, AGC Incorporated, Yokohama 230-0045, Japan
- Lilin Zhou
  - Innovative Technology Laboratories, AGC Incorporated, Yokohama 230-0045, Japan
- Yutaka Imamura
  - Innovative Technology Laboratories, AGC Incorporated, Yokohama 230-0045, Japan
7
Tuck M, Blanc L, Touti R, Patterson NH, Van Nuffel S, Villette S, Taveau JC, Römpp A, Brunelle A, Lecomte S, Desbenoit N. Multimodal Imaging Based on Vibrational Spectroscopies and Mass Spectrometry Imaging Applied to Biological Tissue: A Multiscale and Multiomics Review. Anal Chem 2020; 93:445-477. [PMID: 33253546 DOI: 10.1021/acs.analchem.0c04595] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 12/13/2022]
Affiliation(s)
- Michael Tuck
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
- Landry Blanc
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
- Rita Touti
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
- Nathan Heath Patterson
  - Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37232-8575, United States
- Sebastiaan Van Nuffel
  - Materials Research Institute, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Sandrine Villette
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
- Jean-Christophe Taveau
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
- Andreas Römpp
  - Bioanalytical Sciences and Food Analysis, University of Bayreuth, Universitätsstraße 30, 95440 Bayreuth, Germany
- Alain Brunelle
  - Laboratoire d'Archéologie Moléculaire et Structurale, LAMS UMR 8220, CNRS, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France
- Sophie Lecomte
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
- Nicolas Desbenoit
  - Institut de Chimie & Biologie des Membranes & des Nano-objets, CBMN UMR 5248, CNRS, Université de Bordeaux, 1 Allée Geoffroy Saint-Hilaire, 33600 Pessac, France
8
Analyzing 3D hyperspectral TOF-SIMS depth profile data using self-organizing map-relational perspective mapping. Biointerphases 2020; 15:061004. [PMID: 33198474 DOI: 10.1116/6.0000614] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022] Open
Abstract
The advantages of applying multivariate analysis to mass spectrometry imaging (MSI) data have been thoroughly demonstrated in recent decades. The identification and visualization of complex relationships between pixels in a hyperspectral data set can provide unique insights into the underlying surface chemistry. It is now recognized that most MSI data contain nonlinear relationships, which has led to increased application of machine learning approaches. Previously, we exemplified the use of the self-organizing map (SOM), a type of artificial neural network, for analyzing time-of-flight secondary ion mass spectrometry (TOF-SIMS) hyperspectral images. Recently, we developed a novel methodology, SOM-relational perspective mapping (RPM), which incorporates the algorithm RPM to improve visualization of the SOM for 2D TOF-SIMS images. Here, we use SOM-RPM to characterize and interpret 3D TOF-SIMS depth profile data, voxel-by-voxel. An organic Irganox™ multilayer standard sample was depth profiled using TOF-SIMS, and SOM-RPM was used to create 3D similarity maps of the depth-profiled sample, in which the mass spectral similarity of individual voxels is modeled with color similarity. We used this similarity map to segment the data into spatial features, demonstrating that the unsupervised method meaningfully differentiated between Irganox-3114 and Irganox-1010 nanometer-thin multilayer films. The method also identified unique clusters at the surface associated with environmental exposure and sample degradation. Key fragment ions characteristic of each cluster were identified, tying clusters to their underlying chemistries. SOM-RPM has the demonstrable ability to reduce vast data sets to simple 3D visualizations that can be used for clustering data and visualizing the complex relationships within.
9
Gardner W, Cutts SM, Phillips DR, Pigram PJ. Understanding mass spectrometry images: complexity to clarity with machine learning. Biopolymers 2020; 112:e23400. [PMID: 32937683 DOI: 10.1002/bip.23400] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/01/2020] [Revised: 08/25/2020] [Accepted: 08/26/2020] [Indexed: 11/08/2022]
Abstract
The application of artificial intelligence and machine learning to hyperspectral mass spectrometry imaging (MSI) data has received considerable attention over recent years. Various methodologies have shown great promise in their ability to handle the complexity and size of MSI data sets. Advances in this area have been particularly appealing for MSI of biological samples, which typically produce highly complicated data with often subtle relationships between features. There are many different machine learning approaches that have been applied to MSI data over the past two decades. In this review, we focus on a subset of non-linear machine learning techniques that have mostly only been applied in the past 5 years. Specifically, we review the use of the self-organizing map (SOM), SOM with relational perspective mapping (SOM-RPM), t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). While not their only functionality, we have grouped these techniques based on their ability to produce what we refer to as similarity maps. Similarity maps are color representations of hyperspectral data, in which spectral similarity between pixels (that is, their distance in high-dimensional space) is represented by relative color similarity. In discussing these techniques, we describe, briefly, their associated algorithms and functionalities, and also outline applications in MSI research with a strong focus on biological sample types. The aim of this review is therefore to introduce this relatively recent paradigm for visualizing and exploring hyperspectral MSI, while also providing a comparison between each technique discussed.
Affiliation(s)
- Wil Gardner
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
  - CSIRO Manufacturing, Clayton, Victoria, Australia
- Suzanne M Cutts
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
- Don R Phillips
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
- Paul J Pigram
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
10
Gardner W, Maliki R, Cutts SM, Muir BW, Ballabio D, Winkler DA, Pigram PJ. Self-Organizing Map and Relational Perspective Mapping for the Accurate Visualization of High-Dimensional Hyperspectral Data. Anal Chem 2020; 92:10450-10459. [DOI: 10.1021/acs.analchem.0c00986] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Affiliation(s)
- Wil Gardner
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria 3086, Australia
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
  - CSIRO Manufacturing, Clayton, Victoria 3168, Australia
- Ruqaya Maliki
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria 3086, Australia
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
- Suzanne M. Cutts
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
- Davide Ballabio
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, 20126, Milano, Italy
- David A. Winkler
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
  - CSIRO Data61, Melbourne, Victoria 3008, Australia
- Paul J. Pigram
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria 3086, Australia
11
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 338] [Impact Index Per Article: 84.5] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
12
Gardner W, Hook AL, Alexander MR, Ballabio D, Cutts SM, Muir BW, Pigram PJ. ToF-SIMS and Machine Learning for Single-Pixel Molecular Discrimination of an Acrylate Polymer Microarray. Anal Chem 2020; 92:6587-6597. [PMID: 32233419 PMCID: PMC7611022 DOI: 10.1021/acs.analchem.0c00349] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/07/2023]
Abstract
Combinatorial approaches to materials discovery offer promising potential for the rapid development of novel polymer systems. Polymer microarrays enable the high-throughput comparison of material physical and chemical properties (such as surface chemistry and properties like cell attachment or protein adsorption) in order to identify correlations that can progress materials development. A challenge for this approach is to accurately discriminate between highly similar polymer chemistries or identify heterogeneities within individual polymer spots. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) offers unique potential in this regard, capable of describing the chemistry associated with the outermost layer of a sample with high spatial resolution and chemical sensitivity. However, this comes at the cost of generating large scale, complex hyperspectral imaging data sets. We have demonstrated previously that machine learning is a powerful tool for interpreting ToF-SIMS images, describing a method for color-tagging the output of a self-organizing map (SOM). This reduces the entire hyperspectral data set to a single reconstructed color similarity map, in which the spectral similarity between pixels is represented by color similarity in the map. Here, we apply the same methodology to a ToF-SIMS image of a printed polymer microarray for the first time. We report complete, single-pixel molecular discrimination of the 70 unique homopolymer spots on the array while also identifying intraspot heterogeneities thought to be related to intermixing of the polymer and the pHEMA coating. In this way, we show that the SOM can identify layers of similarity and clusters in the data, both with respect to polymer backbone structures and their individual side groups. Finally, we relate the output of the SOM analysis with fluorescence data from polymer-protein adsorption studies, highlighting how polymer performance can be visualized within the context of the global topology of the data set.
Affiliation(s)
- Wil Gardner
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
  - CSIRO Manufacturing, Clayton, Victoria, Australia
- Andrew L. Hook
  - Advanced Materials and Healthcare Technologies, School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK
- Morgan R. Alexander
  - Advanced Materials and Healthcare Technologies, School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK
- Davide Ballabio
  - Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza della Scienza 1, 20126, Milano, Italy
- Suzanne M. Cutts
  - La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
- Paul J. Pigram
  - Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
13
Barnard AS, Motevalli B, Parker AJ, Fischer JM, Feigl CA, Opletal G. Nanoinformatics, and the big challenges for the science of small things. Nanoscale 2019; 11:19190-19201. [PMID: 31397835 DOI: 10.1039/c9nr05912a] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/10/2023]
Abstract
The combination of computational chemistry and computational materials science with machine learning and artificial intelligence provides a powerful way of relating structural features of nanomaterials with functional properties. However, combining these fundamentally different scientific approaches is not as straightforward as it seems. Machine learning methods were developed for large data sets with small numbers of consistent features. Typically nanomaterials data sets are small, with high dimensionality and high variance in the feature space, and suffer from numerous destructive biases. None of the established data science or machine learning methods in widespread use today were devised with (nano)materials data sets in mind, but there are ways to overcome these challenges and use them reliably. In this review we will discuss domain-specific constraints on data-driven nanomaterials design, and explore the differences between nanomaterials simulation and nanoinformatics that can be leveraged for greater impact.
Affiliation(s)
- A S Barnard
  - CSIRO Data61, Docklands, Victoria, Australia
- B Motevalli
  - CSIRO Data61, Docklands, Victoria, Australia
- A J Parker
  - CSIRO Data61, Docklands, Victoria, Australia
- J M Fischer
  - CSIRO Data61, Docklands, Victoria, Australia
- C A Feigl
  - CSIRO Data61, Docklands, Victoria, Australia
- G Opletal
  - CSIRO Data61, Docklands, Victoria, Australia
14
Madiona RMT, Welch NG, Muir BW, Winkler DA, Pigram PJ. Rapid evaluation of immobilized immunoglobulins using automated mass-segmented ToF-SIMS. Biointerphases 2019; 14:061002. [DOI: 10.1063/1.5121450] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/29/2023] Open
Affiliation(s)
- Robert M. T. Madiona
- Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
- CSIRO Manufacturing, Clayton, Victoria 3168, Australia
| | - Nicholas G. Welch
- Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
- CSIRO Manufacturing, Clayton, Victoria 3168, Australia
| | | | - David A. Winkler
- CSIRO Manufacturing, Clayton, Victoria 3168, Australia
- La Trobe Institute for Molecular Sciences, School of Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia; Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia; and School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Paul J. Pigram
- Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, Victoria 3086, Australia
| |
|
15
|
Gardner W, Cutts SM, Muir BW, Jones RT, Pigram PJ. Visualizing ToF-SIMS Hyperspectral Imaging Data Using Color-Tagged Toroidal Self-Organizing Maps. Anal Chem 2019; 91:13855-13865. [DOI: 10.1021/acs.analchem.9b03322] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/08/2023]
Affiliation(s)
- Wil Gardner
- Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
- La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
- CSIRO Manufacturing, Clayton, Victoria, Australia
| | - Suzanne M. Cutts
- La Trobe Institute for Molecular Sciences, La Trobe University, Melbourne, Victoria, Australia
| | | | - Robert T. Jones
- Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
| | - Paul J. Pigram
- Centre for Materials and Surface Science and Department of Chemistry and Physics, La Trobe University, Melbourne, Victoria, Australia
| |
|
16
|
Huang B, Lai H, Deng J, Xu H, Fan G. Study on the Interaction between Galena and Sphalerite During Grinding Based on the Migration of Surface Components. ACS OMEGA 2019; 4:12489-12497. [PMID: 31460368 PMCID: PMC6681990 DOI: 10.1021/acsomega.9b01173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/23/2019] [Accepted: 07/10/2019] [Indexed: 06/10/2023]
Abstract
In Pb-Zn ore flotation, unintentional activation of sphalerite often leads to difficult separation of Pb and Zn minerals, during which grinding plays a key role in unintentional activation. Therefore, the aim of this study was to evaluate the surface component changes of two different mineral particles and to propose the interaction between galena and sphalerite during mixed grinding using time-of-flight secondary ion mass spectrometry (ToF-SIMS). The results show that after mixed grinding of the galena and sphalerite, the Pb content on the sphalerite surface increased with the decrease of Zn and Fe contents on the sphalerite surface. The lead ions from galena were obviously absorbed onto the sphalerite surface, while the zinc and iron ions from sphalerite were not obviously migrated to the galena surface. Principal component analysis (PCA) of a dataset composed of 206 positive ion peaks of galena and sphalerite indicates that the surface components of galena and sphalerite migrated from either side to different degrees. This study successfully identified an important factor for unintentional activation of lead and zinc minerals during flotation: homogenization of surface components of different minerals during grinding.
Affiliation(s)
- Bo Huang
- School of Chemical & Environmental Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| | - Hao Lai
- State Key Laboratory of Complex Nonferrous Metal Resources Clean Utilization, Faculty of Land Resource Engineering, Kunming University of Science and Technology, Kunming 650093, China
| | - Jiushuai Deng
- School of Chemical & Environmental Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto M5S 3G8, Canada
| | - Hongxiang Xu
- School of Chemical & Environmental Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| | - Guixia Fan
- School of Chemical Engineering and Energy, Zhengzhou University, Zhengzhou 450001, China
| |
|
17
|
Madiona RMT, Bamford SE, Winkler DA, Muir BW, Pigram PJ. Distinguishing Chemically Similar Polyamide Materials with ToF-SIMS Using Self-Organizing Maps and a Universal Data Matrix. Anal Chem 2018; 90:12475-12484. [DOI: 10.1021/acs.analchem.8b01951] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Affiliation(s)
- Robert M. T. Madiona
- Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
- CSIRO Manufacturing, Clayton, VIC 3168, Australia
| | - Sarah E. Bamford
- Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
| | - David A. Winkler
- La Trobe Institute for Molecular Sciences, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
- CSIRO Manufacturing, Clayton, VIC 3168, Australia
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, U.K.
| | | | - Paul J. Pigram
- Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
| |
|