1. Cardoso Rial R. AI in analytical chemistry: Advancements, challenges, and future directions. Talanta 2024; 274:125949. [PMID: 38569367 DOI: 10.1016/j.talanta.2024.125949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/09/2024] [Accepted: 03/17/2024] [Indexed: 04/05/2024]
Abstract
This article explores the influence and applications of Artificial Intelligence (AI) in analytical chemistry, highlighting its potential to revolutionize the analysis of complex data sets and the development of innovative analytical methods. It also discusses the role of AI in interpreting large-scale data and optimizing experimental processes. AI has been fundamental in managing heterogeneous data and in the advanced analysis of complex spectra in areas such as spectroscopy and chromatography. The article further examines the historical development of AI in chemistry and its current challenges, including the interpretation of AI models and the integration of large volumes of data. Finally, it forecasts future trends and the potential impact of AI on analytical chemistry, emphasizing the need for ethical and secure approaches in the use of AI.
Affiliation(s)
- Rafael Cardoso Rial
  - Federal Institute of Mato Grosso do Sul, 79750-000, Nova Andradina, MS, Brazil
2. Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS Measurement Science Au 2024; 4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]
Abstract
Statistical analysis and modeling of mass spectrometry (MS) data have a long and rich history, with several modern MS-based applications using statistical and chemometric methods. Recently, machine learning (ML) has experienced a renaissance due to advances in computational hardware and the development of new algorithms for artificial neural networks (ANN) and deep learning architectures. Moreover, recent successes of new ANN and deep learning architectures in several areas of science, engineering, and society have further strengthened the ML field. Importantly, modern ML methods and architectures have enabled new approaches for tasks related to MS that are now widely adopted in several popular MS-based subdisciplines, such as mass spectrometry imaging and proteomics. Herein, we aim to provide an introductory summary of the practical aspects of ML methodology relevant to MS. Additionally, we seek to provide an up-to-date review of the most recent developments in ML integration with MS-based techniques while also providing critical insights into the future direction of the field.
Affiliation(s)
- Armen G. Beck
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Matthew Muhoberac
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Caitlin E. Randolph
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Connor H. Beveridge
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Prageeth R. Wijewardhane
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Hilkka I. Kenttämaa
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Gaurav Chopra
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
  - Department of Computer Science (by courtesy), Purdue University, West Lafayette, Indiana 47907, United States
  - Purdue Institute for Drug Discovery, Purdue Institute for Cancer Research, Regenstrief Center for Healthcare Engineering, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, Indiana 47907, United States
3. Orooji A, Shanbehzadeh M, Mirbagheri E, Kazemi-Arpanahi H. Comparing artificial neural network training algorithms to predict length of stay in hospitalized patients with COVID-19. BMC Infect Dis 2022; 22:923. [PMID: 36494613 PMCID: PMC9733380 DOI: 10.1186/s12879-022-07921-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 12/06/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The exponential spread of coronavirus disease 2019 (COVID-19) causes unexpected economic burdens to worldwide health systems with severe shortages in hospital resources (beds, staff, equipment). Managing patients' length of stay (LOS) to optimize clinical care and utilization of hospital resources is very challenging. Projecting the future demand requires reliable prediction of patients' LOS, which can be beneficial for taking appropriate actions. Therefore, the purpose of this research is to develop and validate models using a multilayer perceptron-artificial neural network (MLP-ANN) algorithm based on the best training algorithm for predicting COVID-19 patients' hospital LOS.
METHODS Using a single-center registry, the records of 1225 laboratory-confirmed COVID-19 hospitalized cases from February 9, 2020 to December 20, 2020 were analyzed. In this study, first, the correlation coefficient technique was developed to determine the most significant variables as the input of the ANN models. Only variables with a correlation coefficient at a P-value < 0.2 were used in model construction. Then, the prediction models were developed based on 12 training algorithms according to full and selected feature datasets (90% for training, with 10% used for model validation). Afterward, the root mean square error (RMSE) was used to assess the models' performance in order to select the best ANN training algorithm. Finally, a total of 343 patients were used for the external validation of the models.
RESULTS After implementing feature selection, a total of 20 variables were determined as the contributing factors to COVID-19 patients' LOS in order to build the models. The experiments indicated that the best performance was achieved by networks trained with the Bayesian regularization (BR) algorithm, with 20 and 10 neurons in the hidden layer for the whole and selected feature sets and RMSEs of 1.6213 and 2.2332, respectively.
CONCLUSIONS MLP-ANN-based models can reliably predict LOS in hospitalized patients with COVID-19 using readily available data at the time of admission. In this regard, the models developed in our study can help health systems to optimally allocate limited hospital resources and make informed evidence-based decisions.
Affiliation(s)
- Azam Orooji
  - Department of Medical Informatics, Department of Advanced Technologies, School of Medicine, North Khorasan University of Medical Science (NKUMS), North Khorasan, Iran
- Mostafa Shanbehzadeh
  - Department of Health Information Management, Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Esmat Mirbagheri
  - Department of Health Information Management, Iran University of Medical Sciences, Tehran, Iran
- Hadi Kazemi-Arpanahi
  - Department of Health Information Management, Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
  - Department of Health Information Management, Student Research Committee, Abadan University of Medical Sciences, Abadan, Iran
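The abstract of entry 3 selects the best training algorithm by root mean square error (RMSE). A minimal sketch of that selection criterion follows; the function names, the algorithm labels, and the length-of-stay values are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def best_by_rmse(y_true, predictions):
    """Return (name, rmse) of the candidate with the lowest RMSE."""
    scores = {name: rmse(y_true, y_pred) for name, y_pred in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical length-of-stay values in days (illustrative only).
observed = [3.0, 5.0, 7.0, 10.0]
candidates = {
    "algorithm_a": [4.0, 5.0, 6.0, 12.0],
    "algorithm_b": [3.0, 6.0, 7.0, 10.0],
}

print(best_by_rmse(observed, candidates))
```

In practice the predictions would come from held-out validation data, so that the comparison reflects generalization rather than fit to the training set.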
4. Boiko DA, Kozlov KS, Burykina JV, Ilyushenkova VV, Ananikov VP. Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning. J Am Chem Soc 2022; 144:14590-14606. [PMID: 35939718 DOI: 10.1021/jacs.2c03631] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/24/2022]
Abstract
Mass spectrometry (MS) is a convenient, highly sensitive, and reliable method for the analysis of complex mixtures, which is vital for materials science, life sciences fields such as metabolomics and proteomics, and mechanistic research in chemistry. Although it is one of the most powerful methods for individual compound detection, complete signal assignment in complex mixtures is still a great challenge. The unconstrained formula-generating algorithm, covering the entire spectra and revealing components, is a "dream tool" for researchers. We present the framework for efficient MS data interpretation, describing a novel approach for detailed analysis based on deisotoping performed by gradient-boosted decision trees and a neural network that generates molecular formulas from the fine isotopic structure, approaching the long-standing inverse spectral problem. The methods were successfully tested on three examples: fragment ion analysis in protein sequencing for proteomics, analysis of the natural samples for life sciences, and study of the cross-coupling catalytic system for chemistry.
Affiliation(s)
- Daniil A Boiko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
- Konstantin S Kozlov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
- Julia V Burykina
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
- Valentina V Ilyushenkova
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
- Valentine P Ananikov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
5. Solihat NN, Son S, Williams EK, Ricker MC, Plante AF, Kim S. Assessment of artificial neural network to identify compositional differences in ultrahigh-resolution mass spectra acquired from coal mine affected soils. Talanta 2022; 248:123623. [PMID: 35660996 DOI: 10.1016/j.talanta.2022.123623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/08/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 11/26/2022]
Abstract
This study assessed the applicability of artificial neural networks (ANNs) as a tool to identify compounds contributing to compositional differences in coal-contaminated soils. An artificial neural network model was constructed from laser desorption ionization ultrahigh-resolution mass spectra obtained from coal-contaminated soils. A good correlation (R² = 1.00 for model and R² = 0.99 for test) was observed between the measured and predicted values, thus validating the constructed model. To identify chemicals contributing to the coal contents of the soils, the weight values of the constructed model were evaluated. Condensed hydrocarbon and low-oxygen-containing compounds were found to have larger weight values and hence were the main contributors to the coal contents of soils. In contrast, compounds identified as lignin did not contribute to the coal contents of soils. These findings were consistent with the conventional knowledge on coal and results from the conventional partial least squares method. Therefore, we concluded that the weight interpretation following ANN analysis presented herein can be used to identify compounds that contribute to the compositional differences of natural organic matter (NOM) samples.
Affiliation(s)
- Nissa Nurfajrin Solihat
  - Research Center for Biomaterials, National Research and Innovation Agency (BRIN), Cibinong, 16911, Indonesia
- Seungwoo Son
  - Department of Chemistry, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu, 41566, Republic of Korea
- Sunghwan Kim
  - Department of Chemistry, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu, 41566, Republic of Korea
  - Mass Spectrometry Convergence Research Center and Green-Nano Materials Research Center, Daegu, 41566, Republic of Korea
6. Shanbehzadeh M, Nopour R, Kazemi-Arpanahi H. Design of an artificial neural network to predict mortality among COVID-19 patients. Informatics in Medicine Unlocked 2022; 31:100983. [PMID: 35664686 PMCID: PMC9148440 DOI: 10.1016/j.imu.2022.100983] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/13/2022] [Revised: 05/26/2022] [Accepted: 05/26/2022] [Indexed: 12/23/2022] Open
Abstract
Introduction: The rapid coronavirus disease 2019 (COVID-19) pandemic has challenged clinicians with many uncertainties and ambiguities regarding disease outcomes and complications. To deal with these uncertainties, our study aimed to develop and evaluate several artificial neural networks (ANNs) to predict the mortality risk in hospitalized COVID-19 patients.
Material and methods: The data of 1710 hospitalized COVID-19 patients were used in this retrospective and developmental study. First, a Chi-square test (P < 0.05), Eta coefficient (η > 0.4), and binary logistic regression (BLR) analysis were performed to determine the factors affecting COVID-19 mortality. Then, using the selected variables, two types of feed-forward (FF) models, including the back-propagation (BP) and distributed time delay (DTD), were trained. The models' performance was assessed using mean squared error (MSE), error histogram (EH), and area under the ROC curve (AUC-ROC) metrics.
Results: After applying the univariate and multivariate analysis, 13 variables were selected as important features in predicting COVID-19 mortality at P < 0.05. A comparison of the two ANN architectures using the MSE showed that the BP-ANN (validation error: 0.067, most of the classified samples having 0.049 and 0.05 error rates, and AUC-ROC: 0.888) was the best model.
Conclusions: Our findings show the acceptable performance of ANN for predicting the risk of mortality in hospitalized COVID-19 patients. Application of the developed ANN-based CDSS in a real clinical environment will improve patient safety and reduce disease severity and mortality.
7. Klingberg J, Keen B, Cawley A, Pasin D, Fu S. Developments in high-resolution mass spectrometric analyses of new psychoactive substances. Arch Toxicol 2022; 96:949-967. [PMID: 35141767 PMCID: PMC8921034 DOI: 10.1007/s00204-022-03224-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/30/2021] [Accepted: 01/12/2022] [Indexed: 11/17/2022]
Abstract
The proliferation of new psychoactive substances (NPS) has necessitated the development and improvement of current practices for the detection and identification of known NPS and newly emerging derivatives. High-resolution mass spectrometry (HRMS) is quickly becoming the industry standard for these analyses due to its ability to be operated in data-independent acquisition (DIA) modes, allowing for the collection of large amounts of data and enabling retrospective data interrogation as new information becomes available. The increasing popularity of HRMS has also prompted the exploration of new ways to screen for NPS, including broad-spectrum wastewater analysis to identify usage trends in the community and metabolomic-based approaches to examine the effects of drugs of abuse on endogenous compounds. In this paper, the novel applications of HRMS techniques to the analysis of NPS are reviewed. In particular, the development of innovative data analysis and interpretation approaches is discussed, including the application of machine learning and molecular networking to toxicological analyses.
Affiliation(s)
- Joshua Klingberg
  - Australian Racing Forensic Laboratory, Racing NSW, Sydney, NSW, 2000, Australia
- Bethany Keen
  - Centre for Forensic Science, University of Technology Sydney, Broadway, NSW, 2007, Australia
- Adam Cawley
  - Australian Racing Forensic Laboratory, Racing NSW, Sydney, NSW, 2000, Australia
- Daniel Pasin
  - Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
- Shanlin Fu
  - Centre for Forensic Science, University of Technology Sydney, Broadway, NSW, 2007, Australia
8. Interest of high-resolution mass spectrometry in analytical toxicology: Focus on pharmaceuticals. Toxicologie Analytique et Clinique 2022. [DOI: 10.1016/j.toxac.2021.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
9. Streun GL, Steuer AE, Ebert LC, Dobay A, Kraemer T. Interpretable machine learning model to detect chemically adulterated urine samples analyzed by high resolution mass spectrometry. Clin Chem Lab Med 2021; 59:1392-1399. [PMID: 33742969 DOI: 10.1515/cclm-2021-0010] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/04/2021] [Accepted: 03/05/2021] [Indexed: 11/15/2022]
Abstract
OBJECTIVES Urine sample manipulation including substitution, dilution, and chemical adulteration is a continuing challenge for workplace drug testing, abstinence control, and doping control laboratories. The simultaneous detection of sample manipulation and prohibited drugs within one single analytical measurement would be highly advantageous. Machine learning algorithms are able to learn from existing datasets and predict outcomes of new data, which are unknown to the model.
METHODS Authentic human urine samples were treated with pyridinium chlorochromate, potassium nitrite, hydrogen peroxide, iodine, sodium hypochlorite, and water as control. In total, 702 samples, measured with liquid chromatography coupled to quadrupole time-of-flight mass spectrometry, were used. After retention time alignment within Progenesis QI, an artificial neural network was trained with 500 samples, each featuring 33,448 values. The feature importance was analyzed with the local interpretable model-agnostic explanations approach.
RESULTS Following 10-fold cross-validation, the mean sensitivity, specificity, positive predictive value, and negative predictive value were 88.9, 92.0, 91.9, and 89.2%, respectively. A diverse test set (n=202) containing treated and untreated urine samples could be correctly classified with an accuracy of 95.4%. In addition, 14 important features and four potential biomarkers were extracted.
CONCLUSIONS With interpretable retention time aligned liquid chromatography high-resolution mass spectrometry data, a reliable machine learning model could be established that rapidly uncovers chemical urine manipulation. The incorporation of our model into routine clinical or forensic analysis allows simultaneous LC-MS analysis and sample integrity testing in one run, thus revolutionizing this field of drug testing.
Affiliation(s)
- Gabriel L Streun
  - Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- Andrea E Steuer
  - Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- Lars C Ebert
  - Department of Forensic Imaging/Virtopsy, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- Akos Dobay
  - Department of Forensic Imaging/Virtopsy, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
  - Department of Forensic Genetics, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- Thomas Kraemer
  - Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
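The abstract of entry 9 reports sensitivity, specificity, positive predictive value, and negative predictive value. As a rough sketch of how these four quantities follow from a binary confusion matrix, the example below assumes labels coded as 1 (adulterated) and 0 (untreated); the function names and the label vectors are hypothetical and not from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels coded as 0 and 1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0 under the 0/1 coding assumption
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical labels: 1 = adulterated sample, 0 = untreated sample.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(diagnostic_metrics(y_true, y_pred))
```

Under k-fold cross-validation, these metrics would typically be computed per fold and then averaged, which is one plausible reading of the "mean" values quoted in the abstract.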
10. Ge Z, Zhang K, Chen DDY, Yan B. Data-driven development of liquid chromatography-mass spectrometry methods for combined sample matrices. Talanta 2021; 224:121880. [PMID: 33379089 DOI: 10.1016/j.talanta.2020.121880] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2020] [Accepted: 11/05/2020] [Indexed: 11/25/2022]
Abstract
Herbal medicine formulas (HMFs), the combinations of two or more herbal medicine (HM) ingredients required in a single prescription, are a typical kind of combined sample matrices. LC-MS is a powerful platform for the analyses of such complex samples. The optimization of separation conditions may require a lot of experiments, because multiple analytes need to be separated from a plethora of possible interfering compounds in the sample mixture containing different herbal medicines. To greatly reduce the complexity needed for the optimization of separation conditions, this work proposes a data-driven approach for the systematic development of LC-MS methods for HMFs, using six HMFs created from four HMs (Atractylodis Macrocephalae Rhizoma, Paeoniae Radix Alba, Corydalis Rhizoma and Ophiopogonis Radix) as case-studies. In this approach, the chromatographic peak parameters (like retention times) of the analytes and interfering compounds under different separation conditions were extracted from the LC-MS database of the HMs. Then data-driven models between the chromatographic peak parameters and the separation parameters were built with machine learning methods (r > 0.996 for all the compounds) and used to predict the chromatographic peaks of the analytes and interfering compounds in HMF analyses. Based on the predictions, all of the separation parameters were optimized without any previous experiments on the HMFs. In the validation experiments for the six HMFs, all of the analytes were well separated. The data-driven approach demonstrated enables systematic and rapid development of LC-MS methods for HMFs, and the separation conditions can be efficiently adjusted for different analytes.
Affiliation(s)
- Zhiwei Ge
  - College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou, 310053, China
  - Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University, Hangzhou, 310058, China
- Kuanyong Zhang
  - College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou, 310053, China
- David Da Yong Chen
  - Department of Chemistry, University of British Columbia, Vancouver, V6T 1Z1, Canada
- Binjun Yan
  - College of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou, 310053, China
  - Department of Chemistry, University of British Columbia, Vancouver, V6T 1Z1, Canada
11. Prediction of Slag Characteristics Based on Artificial Neural Network for Molten Gasification of Hazardous Wastes. Energies 2020. [DOI: 10.3390/en13195115] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Molten gasification is considered as a promising technology for the processing and safe disposal of hazardous wastes. During this process, the organic components are completely converted while the hazardous materials are safely embedded in slag via the fusion-solidification-vitrification transformation. Ideally, the slag should be glassy with low viscosity to ensure the effective immobilization and steady discharge of hazardous materials. However, it is very difficult to predict the characteristics of slag using existing empirical equations or conventional mathematical methods, due to the complex non-linear relationship among the phase transformation, vitrification transition and chemical composition of slag. Equipped with a strong nonlinear mapping ability, an artificial neural network may be able to predict the properties of slags if a large amount of data is available for training. In this work, over 10,000 experimental data points were used to train and develop a slag classification model (glassy vs. non-glassy) based on a neural network. The optimal structure of the neural network was figured out and validated. The results suggest that the classification accuracy for the independent test samples reached 93.3%. Using 1 and 0 as model inputs to represent mildly reducing and inert atmospheres, a double hidden layer structure in the neural network enabled the accurate classification of slags under various atmospheres. Furthermore, the neural network for the prediction of glassy slag viscosity was optimized; it featured a double hidden layer structure. Under a mildly reducing atmosphere, the absolute error from the independent test data was generally within 4 Pa·s. By adding a gas atmosphere into the input of the neural network using a simple normalization method, a multi-atmosphere slag viscosity prediction model was developed. Said model is much more accurate than its counterpart that does not consider the effect of the atmosphere. 
In summary, the artificial neural network proved to be an effective approach to predicting the slag properties under different atmospheres. The data-driven models developed in this work are expected to facilitate the commercial deployment of molten gasification technology.
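The abstract of entry 11 describes feeding the gas atmosphere to the network as an extra input, with 1 for a mildly reducing and 0 for an inert atmosphere. The sketch below shows one way such an input vector might be assembled. The oxide names, the sum-to-one normalization of the composition, and the function name are assumptions made for illustration; the abstract does not specify the normalization the authors used.

```python
# Flag values follow the abstract: 1 = mildly reducing, 0 = inert.
ATMOSPHERE_FLAG = {"mildly_reducing": 1.0, "inert": 0.0}

# Hypothetical feature order; the real model's inputs are not given in the abstract.
OXIDE_ORDER = ["SiO2", "Al2O3", "CaO"]


def encode_sample(composition, atmosphere, oxide_order):
    """Build a feature vector: normalized composition plus atmosphere flag."""
    if atmosphere not in ATMOSPHERE_FLAG:
        raise ValueError(f"unknown atmosphere: {atmosphere!r}")
    total = sum(composition.get(oxide, 0.0) for oxide in oxide_order)
    if total <= 0:
        raise ValueError("composition must contain a positive amount")
    # Assumed normalization: scale the listed oxides so they sum to 1.
    features = [composition.get(oxide, 0.0) / total for oxide in oxide_order]
    features.append(ATMOSPHERE_FLAG[atmosphere])
    return features


# Hypothetical slag composition in weight percent.
sample = {"SiO2": 50.0, "Al2O3": 25.0, "CaO": 25.0}

print(encode_sample(sample, "mildly_reducing", OXIDE_ORDER))
print(encode_sample(sample, "inert", OXIDE_ORDER))
```

Encoding the atmosphere as one additional input lets a single network cover both conditions instead of training a separate model per atmosphere.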