1. Yang Q, Zhang H, Wang Y, Tan L, Xie T, Wang Y, Long J, Guo Z, Zhang Z, Lu H. MWFormer: Estimation of Molecular Weights from Electron Ionization Mass Spectra for Improved Library Searching. Anal Chem 2025; 97:212-219. [PMID: 39700345 DOI: 10.1021/acs.analchem.4c03781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Molecular weight (MW) is a crucial property for improving the accuracy of multidimensional compound identification. In this study, we have developed MWFormer, a novel method based on a Transformer encoder that predicts MWs solely from electron ionization mass spectrometry (EI-MS) spectra. MWFormer achieves a mean absolute error (MAE) of 6.38 Da on the test set, only one-sixth of the MAE of the peak interpretation method (PIM). The MWFormer-predicted MW, with its superior accuracy, can be used to eliminate false-positive molecules in multidimensional compound identification. The results show that the MW filter improves the recall@3 metric by nearly 4 percentage points compared with spectrum matching alone. Moreover, MWFormer can be combined with retention indices (RIs) to achieve GC-EI-MS 3D compound identification, improving the recall@3 metric by nearly 7 percentage points compared with spectrum matching alone. In addition, a user-friendly web service is provided to predict MWs in single or batch mode. All code, data, and models are available at https://github.com/zhanghailiangcsu/MWFormer.
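To illustrate how a predicted MW could act as a filter on spectrum-matching candidates, the following is a minimal Python sketch. It is not the MWFormer implementation: the candidate tuple layout, the function names `mw_filter` and `recall_at_k`, and the 20 Da tolerance are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (name, spectral match score, library MW in Da)
Candidate = Tuple[str, float, float]


def mw_filter(candidates: List[Candidate], predicted_mw: float,
              tolerance_da: float = 20.0) -> List[Candidate]:
    """Drop candidates whose MW is farther than tolerance_da from the
    predicted MW, then rank the survivors by match score (best first)."""
    kept = [c for c in candidates if abs(c[2] - predicted_mw) <= tolerance_da]
    return sorted(kept, key=lambda c: c[1], reverse=True)


def recall_at_k(ranked_names: List[str], true_name: str, k: int = 3) -> bool:
    """True if the correct compound appears among the top k ranked names."""
    return true_name in ranked_names[:k]
```

With such a filter, a correct compound that ranks fourth on spectral score alone can move into the top three once candidates with implausible MWs are removed, which is the kind of effect a recall@3 improvement measures.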
Affiliation(s)
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hailiang Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Lin Tan
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ting Xie
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yufei Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Jia Long
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zixuan Guo
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
2. Mazza L, Bory A, Luscher A, Kloehn J, Wolfender JL, van Delden C, Köhler T. Multidrug efflux pumps of Pseudomonas aeruginosa show selectivity for their natural substrates. Front Microbiol 2025; 15:1512472. [PMID: 39850140 PMCID: PMC11754269 DOI: 10.3389/fmicb.2024.1512472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Accepted: 11/29/2024] [Indexed: 01/25/2025] [Open Access]
Abstract
Antibiotic-resistant Gram-negative bacteria are an increasing threat to human health. Strategies to restore antibiotic efficacy include targeting multidrug efflux pumps with competitive efflux pump inhibitors. These could be derived from natural substrates of these efflux systems. In this work, we aimed to elucidate the natural substrates of the clinically relevant Mex efflux pumps of Pseudomonas aeruginosa by an untargeted metabolomic approach. We constructed a PA14 mutant lacking the major multidrug efflux pumps MexAB-OprM, MexCD-OprJ, MexXY-OprM, and MexEF-OprN, and expressed each efflux pump individually in this mutant from an inducible promoter. Comparative analysis of the exo-metabolomes identified 210 features that were more abundant in the supernatant of efflux pump overexpressors compared to the pump-deficient mutant. Most of the identified features were efflux pump specific, while only a few were shared among several Mex pumps. We identified by-products of secondary metabolites as well as signaling molecules. Supernatants of the pump-deficient mutant also showed decreased accumulation of fatty acids, including long-chain homoserine lactone quorum sensing molecules. Our data suggest that Mex efflux pumps of P. aeruginosa have dedicated roles in extruding signaling molecules, metabolic by-products, and oxidized fatty acids. These findings represent an interesting starting point for the development of competitive efflux pump inhibitors.
Affiliation(s)
- Léna Mazza
  - Service of Infectious Diseases, Geneva University Hospitals, Geneva, Switzerland
  - Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland
- Alexandre Bory
  - School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland
- Alexandre Luscher
  - Service of Infectious Diseases, Geneva University Hospitals, Geneva, Switzerland
  - Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland
- Joachim Kloehn
  - Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland
- Jean-Luc Wolfender
  - School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
  - Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Geneva, Switzerland
- Christian van Delden
  - Service of Infectious Diseases, Geneva University Hospitals, Geneva, Switzerland
  - Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland
- Thilo Köhler
  - Service of Infectious Diseases, Geneva University Hospitals, Geneva, Switzerland
  - Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland
3. Matyushin DD, Burov IA, Sholokhova AY. Uncertainty Quantification and Flagging of Unreliable Predictions in Predicting Mass Spectrometry-Related Properties of Small Molecules Using Machine Learning. Int J Mol Sci 2024; 25:13077. [PMID: 39684785 DOI: 10.3390/ijms252313077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 11/28/2024] [Accepted: 12/04/2024] [Indexed: 12/18/2024] [Open Access]
Abstract
Mass spectral identification (in particular, in metabolomics) can be refined by comparing the observed and predicted properties of molecules, such as chromatographic retention. Significant advancements have been made in predicting these values using machine learning and deep learning. Usually, model predictions do not contain any indication of the possible error (uncertainty), or only one criterion is used for this purpose. The spread of predictions of several models included in the ensemble, and the molecular similarity of the considered molecule and the most "similar" molecule from the training set, are values that allow us to estimate the uncertainty. The Euclidean distance between vectors, calculated based on real-valued molecular descriptors, can be used for the assessment of molecular similarity. Another factor indicating uncertainty is the molecule's belonging to one of the clusters (data set clustering). Together, all three factors can be used as features for the uncertainty assessment model. Classification models that predict whether a prediction belongs to the worst 15% were obtained. The area under the receiver operating characteristic curve is in the range of 0.73-0.82 for the considered tasks: the prediction of retention indices in gas chromatography, retention times in liquid chromatography, and collision cross-sections in ion mobility spectrometry.
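The three uncertainty factors named in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the use of the population standard deviation as the "spread", and the representation of cluster membership as an integer label are all hypothetical choices, not details taken from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def ensemble_spread(member_predictions: Sequence[float]) -> float:
    """Spread of the ensemble members' predictions for one molecule
    (assumed here to be the population standard deviation)."""
    return statistics.pstdev(member_predictions)


def nearest_training_distance(query: Sequence[float],
                              training: Sequence[Sequence[float]]) -> float:
    """Euclidean distance from the query descriptor vector to the most
    similar (closest) descriptor vector in the training set."""
    return min(math.dist(query, t) for t in training)


def uncertainty_features(member_predictions: Sequence[float],
                         query: Sequence[float],
                         training: Sequence[Sequence[float]],
                         cluster_id: int) -> Tuple[float, float, int]:
    """Bundle the three factors as features for a downstream classifier
    that flags potentially unreliable predictions."""
    return (ensemble_spread(member_predictions),
            nearest_training_distance(query, training),
            cluster_id)
```

A classifier trained on such feature tuples, with "prediction is in the worst 15%" as the label, would correspond to the flagging task the abstract describes.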
Affiliation(s)
- Dmitriy D Matyushin
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
- Ivan A Burov
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
- Anastasia Yu Sholokhova
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071 Moscow, Russia
4. Sholokhova AY, Matyushin DD. Ready-to-use Models Built Using a Diverse Set of 266 Aroma Compounds for the Estimation of Gas Chromatographic Retention Indices for the 50%-Cyanopropylphenyl-50%-Dimethylpolysiloxane Stationary Phase. J Sep Sci 2024; 47:e70016. [PMID: 39494751 DOI: 10.1002/jssc.70016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 10/18/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024]
Abstract
Retention index prediction based on the molecule structure is not often used in practice due to low accuracy, the need to use paid software to calculate molecular descriptors (MD), and the narrow applicability domain of many models. In recent years, relatively accurate and versatile deep learning (DL)-based models have emerged. These models are now used in practice as an additional criterion in gas chromatography-mass spectrometry identification. The DB-225ms stationary phase (usually described as 50%-cyanopropylphenyl-50%-dimethylpolysiloxane in available sources) is widely used, but ready-to-use retention index estimation models are not available for it. This study presents such models. The models are linear and use simple constitutional MD and retention indices predicted by DL for the DB-WAX and DB-624 stationary phases as MD (we show that it is their use that allows us to achieve satisfactory accuracy). The accuracy obtained for a completely unseen hold-out test set was as follows: root mean square error 73.2; mean absolute error 45.7; median absolute error 22.0. The models were trained using a retention data set of 266 volatile compounds. All calculations can be performed using the convenient open-source software CHERESHNYA. The final equations are implemented as a spreadsheet and a code snippet and are available online: https://doi.org/10.6084/m9.figshare.26800789.
Affiliation(s)
- Anastasia Yu Sholokhova
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia
- Dmitriy D Matyushin
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia
5. McGlynn DF, Rabe Andriamaharavo N, Kearsley AJ. Improved Discrimination of Mass Spectral Isomers Using the High-Dimensional Consensus Mass Spectral Similarity Algorithm. J Mass Spectrom 2024; 59:e5084. [PMID: 39262149 DOI: 10.1002/jms.5084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 07/19/2024] [Accepted: 08/06/2024] [Indexed: 09/13/2024]
Abstract
This study employs a high-dimensional consensus mass spectral (HDCMS) similarity scoring technique to discriminate isomers collected using an electron ionization mass spectrometer. The HDCMS method was previously introduced and applied to the discrimination of mass spectra of constitutional isomers, methamphetamine and phentermine, collected with direct analysis in real time mass spectrometry (DART-MS). The method formulates the problem of discriminating mass spectra in a mathematical Hilbert space and is hence called "high dimensional." It requires replicate mass spectra to build a Gaussian model and evaluate the inner products between these functions. The resulting measurement variability is used as a signature by which to discriminate spectra. In this work, HDCMS is tested on electron impact ionization (EI) mass spectra of 7 terpene and terpene-related (C10H16 and C10H14) isomers with experimental retention indices that differ by less than 30 and with traditional cosine similarity scores greater than 0.9, on a scale of 0 to 1, when compared with at least one other compound in the test set. Using identical instrument parameters, 15 replicate gas chromatography-electron ionization-mass spectrometry (GC-EI-MS) spectra of each isomer were collected and separated into distinct library and query sets. The HDCMS algorithm discriminated each isomer, indicating the method's potential. Because the method requires replicate measurements, observations from a simple heuristic study of the number of replicates required to discriminate these isomers are presented. The paper concludes with a discussion of compound discrimination using HDCMS and the benefits and drawbacks of applying the method to EI-MS data.
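For reference, the "traditional cosine similarity" mentioned above as the baseline (not the HDCMS method itself) can be sketched as follows. The spectrum representation as a dictionary from nominal m/z to intensity is an assumption for illustration, and the score here is unweighted; library search software often applies intensity or m/z weighting before computing it.

```python
import math
from typing import Dict

# Assumed representation: nominal m/z -> intensity
Spectrum = Dict[int, float]


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Unweighted cosine similarity between two spectra, on a 0 to 1 scale
    for non-negative intensities. Peaks missing from a spectrum count as 0."""
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(i * i for i in a.values()))
    norm_b = math.sqrt(sum(i * i for i in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

Isomers with near-identical fragmentation patterns yield scores close to 1 under this measure, which is why scores above 0.9 mark the hard cases the study targets.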
Affiliation(s)
- Deborah F McGlynn
  - Applied and Computational Mathematics Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
- Nirina Rabe Andriamaharavo
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
- Anthony J Kearsley
  - Applied and Computational Mathematics Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
6. Metz TO, Chang CH, Gautam V, Anjum A, Tian S, Wang F, Colby SM, Nunez JR, Blumer MR, Edison AS, Fiehn O, Jones DP, Li S, Morgan ET, Patti GJ, Ross DH, Shapiro MR, Williams AJ, Wishart DS. Introducing 'identification probability' for automated and transferable assessment of metabolite identification confidence in metabolomics and related studies. bioRxiv 2024:2024.07.30.605945. [PMID: 39131324 PMCID: PMC11312557 DOI: 10.1101/2024.07.30.605945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Methods for assessing compound identification confidence in metabolomics and related studies have been debated and actively researched for the past two decades. The earliest effort in 2007 focused primarily on mass spectrometry and nuclear magnetic resonance spectroscopy and resulted in four recommended levels of metabolite identification confidence - the Metabolomics Standards Initiative (MSI) Levels. In 2014, the original MSI Levels were expanded to five levels (including two sublevels) to facilitate communication of compound identification confidence in high resolution mass spectrometry studies. Further refinements in identification levels have occurred, for example to accommodate use of ion mobility spectrometry in metabolomics workflows, and alternate approaches to communicate compound identification confidence also have been developed based on identification points schema. However, neither qualitative levels of identification confidence nor quantitative scoring systems address the degree of ambiguity in compound identifications in the context of the chemical space being considered, are easily automated, or are transferable between analytical platforms. In this perspective, we propose that the metabolomics and related communities consider identification probability as an approach for automated and transferable assessment of compound identification and ambiguity in metabolomics and related studies. Identification probability is defined simply as 1/N, where N is the number of compounds in a reference library or chemical space that match an experimentally measured molecule within user-defined measurement precision(s), for example mass measurement or retention time accuracy. We demonstrate the utility of identification probability in an in silico analysis of multi-property reference libraries constructed from the Human Metabolome Database and computational property predictions, provide guidance to the community in transparent implementation of the concept, and invite the community to further evaluate this concept in parallel with their current preferred methods for assessing metabolite identification confidence.
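The 1/N definition above lends itself to a short sketch. The following Python is an illustration under assumptions, not the authors' code: the property names, the dictionary-based library layout, and the choice to return 0.0 when no library entry matches (a case the abstract does not define) are all hypothetical.

```python
from typing import Dict, List


def identification_probability(measured: Dict[str, float],
                               library: List[Dict[str, float]],
                               tolerances: Dict[str, float]) -> float:
    """Return 1/N, where N is the number of library entries that match the
    measured molecule within the given tolerance on every listed property."""
    n_matches = sum(
        all(abs(entry[prop] - measured[prop]) <= tol
            for prop, tol in tolerances.items())
        for entry in library
    )
    # Assumption: report 0.0 when nothing in the library matches.
    return 1.0 / n_matches if n_matches else 0.0
```

Adding a second measured property, or tightening a tolerance, can only keep N the same or shrink it, so the identification probability rises as measurements become more discriminating.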
Affiliation(s)
- Thomas O. Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Christine H. Chang
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Vasuk Gautam
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Afia Anjum
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Siyang Tian
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Fei Wang
  - Department of Computing Science, University of Alberta, Edmonton, AB, Canada
  - Alberta Machine Intelligence Institute, Edmonton, AB, Canada
- Sean M. Colby
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Jamie R. Nunez
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Madison R. Blumer
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Arthur S. Edison
  - Department of Biochemistry & Molecular Biology, Complex Carbohydrate Research Center and Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Oliver Fiehn
  - West Coast Metabolomics Center, University of California Davis, Davis, CA, USA
- Dean P. Jones
  - Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, Georgia, USA
- Shuzhao Li
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Edward T. Morgan
  - Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Atlanta, Georgia, USA
- Gary J. Patti
  - Center for Mass Spectrometry and Metabolic Tracing, Department of Chemistry, Department of Medicine, Washington University, Saint Louis, Missouri, USA
- Dylan H. Ross
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA USA
- Madelyn R. Shapiro
  - Artificial Intelligence & Data Analytics Division, Pacific Northwest National Laboratory, Richland, WA USA
- Antony J. Williams
  - U.S. Environmental Protection Agency, Office of Research & Development, Center for Computational Toxicology & Exposure (CCTE), Research Triangle Park, NC USA
- David S. Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
7. Wakoli J, Anjum A, Sajed T, Oler E, Wang F, Gautam V, LeVatte M, Wishart D. GCMS-ID: a webserver for identifying compounds from gas chromatography mass spectrometry experiments. Nucleic Acids Res 2024; 52:W381-W389. [PMID: 38783107 PMCID: PMC11223868 DOI: 10.1093/nar/gkae425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 04/28/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
GCMS-ID (Gas Chromatography Mass Spectrometry compound IDentifier) is a webserver designed to enable the identification of compounds from GC-MS experiments. GC-MS instruments produce both electron impact mass spectra (EI-MS) and retention index (RI) data for as few as one, to as many as hundreds of different compounds. Matching the measured EI-MS, RI or EI-MS + RI data to experimentally collected EI-MS and/or RI reference libraries allows facile compound identification. However, the number of available experimental RI and EI-MS reference spectra, especially for metabolomics or exposomics-related studies, is disappointingly small. Using machine learning to accurately predict the EI-MS spectra and/or RIs for millions of metabolomics and/or exposomics-relevant compounds could (partially) solve this spectral matching problem. This computational approach to compound identification is called in silico metabolomics. GCMS-ID brings this concept of in silico metabolomics closer to reality by intelligently integrating two of our previously published webservers: CFM-EI and RIpred. CFM-EI is an EI-MS spectral prediction webserver, and RIpred is a Kovats RI prediction webserver. We have found that GCMS-ID can accurately identify compounds from experimental RI, EI-MS or RI + EI-MS data through matching to its own large library of >1 million predicted RI/EI-MS values generated for metabolomics/exposomics-relevant compounds. GCMS-ID can also predict the RI or EI-MS spectrum from a user-submitted structure or annotate a user-submitted EI-MS spectrum. GCMS-ID is freely available at https://gcms-id.ca/.
Affiliation(s)
- Julia Wakoli
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Afia Anjum
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Tanvir Sajed
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Eponine Oler
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Fei Wang
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Vasuk Gautam
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Marcia LeVatte
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- David S Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
  - Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
8. Khrisanfov MD, Matyushin DD, Samokhin AS. A general procedure for finding potentially erroneous entries in the database of retention indices. Anal Chim Acta 2024; 1297:342375. [PMID: 38438243 DOI: 10.1016/j.aca.2024.342375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 02/02/2024] [Accepted: 02/13/2024] [Indexed: 03/06/2024]
Abstract
BACKGROUND: The NIST retention index database is one of the most widely used sources of retention indices. In both untargeted analysis and machine learning studies, filtering for potential errors is rather lacking or nonexistent. According to our estimates, about 80% of the compounds from both NIST 17 and NIST 20 retention index databases have only one RI value per stationary phase, which makes searching for erroneous values with statistical methods impossible. Manual inspection is also impractical because the database contains more than 300 000 entries. RESULTS: We suggest a two-step procedure to find potentially erroneous retention indices based on machine learning. The first step is to use five predictive models to obtain predicted retention index values for the whole database. The second is to compare these predicted values against the experimental ones. We consider a retention index erroneous if its accuracy (the difference between predicted and experimental value) is in the bottom 5% for each of the five models simultaneously. Using this method, we were able to detect 2093 outlier entries for standard and semi-standard non-polar stationary phases in the NIST 17 retention index database, 566 of which were corrected or removed by the developers in NIST 20. SIGNIFICANCE: This is a novel approach to finding potentially erroneous entries in a large-scale database with mostly unique entries, and it can be applied not only to retention indices. The procedure can help filter and report mishandled data to improve the quality of the dataset for machine learning applications and experimental use.
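The two-step flagging rule described in RESULTS can be sketched as a set intersection. This is a minimal Python sketch under assumptions: the function names are hypothetical, "bottom 5% accuracy" is interpreted as the 5% of entries with the largest absolute error for a given model, and the per-model count is rounded to the nearest integer.

```python
from typing import List, Optional, Sequence, Set


def worst_fraction_indices(abs_errors: Sequence[float],
                           fraction: float = 0.05) -> Set[int]:
    """Indices of the entries with the largest absolute errors,
    covering the given fraction of all entries."""
    n_flag = int(round(len(abs_errors) * fraction))
    order = sorted(range(len(abs_errors)),
                   key=lambda i: abs_errors[i], reverse=True)
    return set(order[:n_flag])


def flag_suspect_entries(experimental: Sequence[float],
                         predictions_per_model: Sequence[Sequence[float]],
                         fraction: float = 0.05) -> List[int]:
    """Flag an entry only if it falls in the worst fraction for every
    model simultaneously (intersection across models)."""
    flagged: Optional[Set[int]] = None
    for preds in predictions_per_model:
        errors = [abs(p - e) for p, e in zip(preds, experimental)]
        worst = worst_fraction_indices(errors, fraction)
        flagged = worst if flagged is None else flagged & worst
    return sorted(flagged or [])
```

Requiring agreement across all models makes the rule conservative: an entry that only one model predicts poorly is more likely a weakness of that model than an error in the database.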
Affiliation(s)
- Mikhail D Khrisanfov
  - Chemistry Department, Lomonosov Moscow State University, Leninskie Gory 1-3, 119991, Moscow, Russia; A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071, Moscow, Russia.
- Dmitriy D Matyushin
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071, Moscow, Russia.
- Andrey S Samokhin
  - Chemistry Department, Lomonosov Moscow State University, Leninskie Gory 1-3, 119991, Moscow, Russia.
9. Geer LY, Stein SE, Mallard WG, Slotta DJ. AIRI: Predicting Retention Indices and Their Uncertainties Using Artificial Intelligence. J Chem Inf Model 2024; 64:690-696. [PMID: 38230885 DOI: 10.1021/acs.jcim.3c01758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2024]
Abstract
The Kováts retention index (RI) is a quantity measured using gas chromatography and is commonly used in the identification of chemical structures. Creating libraries of observed RI values is a laborious task, so we explore the use of a deep neural network for predicting RI values from structure for standard semipolar columns. This network generated predictions with a mean absolute error of 15.1 and, in a quantification of the tail of the error distribution, a 95th percentile absolute error of 46.5. Because of the Artificial Intelligence Retention Indices (AIRI) network's accuracy, it was used to predict RI values for the NIST EI-MS spectral libraries. These RI values are used to improve chemical identification methods and the quality of the library. Estimating uncertainty is an important practical need when using prediction models. To quantify the uncertainty of our network for each individual prediction, we used the outputs of an ensemble of 8 networks to calculate a predicted standard deviation for each RI value prediction. This predicted standard deviation was corrected to follow the error between the observed and predicted RI values. The Z scores using these predicted standard deviations had a standard deviation of 1.52 and a 95th percentile absolute Z score corresponding to a mean RI value of 42.6.
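The ensemble-based uncertainty estimate described above can be sketched as follows. This is an illustrative sketch, not the AIRI code: the function names are hypothetical, the sample standard deviation is an assumed choice, and the "correction" of the predicted standard deviation is modeled here as a single multiplicative `scale` factor, which is an assumption about a step the abstract does not specify.

```python
import statistics
from typing import Sequence, Tuple


def ensemble_prediction(member_outputs: Sequence[float],
                        scale: float = 1.0) -> Tuple[float, float]:
    """Mean RI prediction and predicted standard deviation from the outputs
    of an ensemble of networks. `scale` is a hypothetical calibration factor."""
    mean = statistics.fmean(member_outputs)
    std = scale * statistics.stdev(member_outputs)
    return mean, std


def z_score(observed_ri: float, predicted_ri: float,
            predicted_std: float) -> float:
    """Error in units of the predicted standard deviation.
    Requires predicted_std > 0."""
    return (observed_ri - predicted_ri) / predicted_std
```

If the predicted standard deviations were perfectly calibrated, the Z scores over a test set would have a standard deviation near 1; the reported value of 1.52 indicates how far the estimates are from that ideal.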
Affiliation(s)
- Lewis Y Geer
  - National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Stephen E Stein
  - National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- William Gary Mallard
  - National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Douglas J Slotta
  - National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
10. Li T, Su W, Zhong L, Liang W, Feng X, Zhu B, Ruan T, Jiang G. An Integrated Workflow Assisted by In Silico Predictions To Expand the List of Priority Polycyclic Aromatic Compounds. Environ Sci Technol 2023; 57:20854-20863. [PMID: 38010983 DOI: 10.1021/acs.est.3c07087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The limited information in existing mass spectral libraries hinders an accurate understanding of the composition, behavior, and toxicity of organic pollutants. In this study, a total of 350 polycyclic aromatic compounds (PACs) in 9 categories were successfully identified in fine particulate matter by gas chromatography high resolution mass spectrometry. Using mass spectra and retention indexes predicted by in silico tools as complementary information, the scope of chemical identification was efficiently expanded by 27%. In addition, quantitative structure-activity relationship models provided toxicity data for over 70% of PACs, facilitating a comprehensive health risk assessment. On the basis of extensive identification, the cumulative noncarcinogenic risk of PACs warranted attention. Meanwhile, the carcinogenic risk of 53 individual analogues was noteworthy. These findings suggest that there is a pressing need for an updated list of priority PACs for routine monitoring and toxicological research since legacy polycyclic aromatic hydrocarbons (PAHs) contributed modestly to the overall abundance (18%) and carcinogenic risk (8%). A toxicological priority index approach was applied for relative chemical ranking considering the environmental occurrence, fate, toxicity, and analytical availability. A list of 39 priority analogues was compiled, which predominantly consisted of high-molecular-weight PAHs and alkyl derivatives. These priority PACs further enhanced source interpretation, and the highest carcinogenic risk was attributed to coal combustion.
Affiliation(s)
- Tingyu Li
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenyuan Su
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Laijin Zhong
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenqing Liang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaoxia Feng
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Bao Zhu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Ting Ruan
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
11. Anjum A, Liigand J, Milford R, Gautam V, Wishart DS. Accurate prediction of isothermal gas chromatographic Kováts retention indices. J Chromatogr A 2023; 1705:464176. [PMID: 37413909 DOI: 10.1016/j.chroma.2023.464176] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/26/2023] [Revised: 06/17/2023] [Accepted: 06/20/2023] [Indexed: 07/08/2023]
Abstract
We describe a freely available web server called Retention Index Predictor (RIpred) (https://ripred.ca) that rapidly and accurately predicts Gas Chromatographic Kováts Retention Indices (RI) using SMILES strings as chemical structure input. RIpred performs RI prediction for three different stationary phases (semi-standard non-polar (SSNP), standard non-polar (SNP), and standard polar (SP)) for both derivatized (trimethylsilyl (TMS) and tert‑butyldimethylsilyl (TBDMS) derivatized) and underivatized (base compound) forms of GC-amenable structures. RIpred was developed to address the need for freely available, fast, highly accurate RI predictions for a wide range of derivatized and underivatized chemicals for all common GC stationary phases. RIpred was trained using a Graph Neural Network (GNN) that used compound structures, their extracted features (mostly atom-level features) and the GC-RI data from the National Institute of Standards and Technology databases (NIST 17 and NIST 20). We curated this NIST 17 and NIST 20 GC-RI data, which is available for all three stationary phases, to create appropriate inputs (molecular graphs in this case) needed to enhance our model performance. The performance of different RIpred predictive models was evaluated using 10-fold cross validation (CV). The best performing RIpred models were identified and when tested on hold-out test sets from all stationary phases, achieved a Mean Absolute Error (MAE) of <73 RI units (SSNP: 16.5-29.5, SNP: 38.5-45.9, SP: 46.52-72.53). The Mean Absolute Percentage Error (MAPE) of these models were typically within 3% (SSNP: 0.78-1.62%, SNP: 1.87-2.88%, SP: 2.34-4.05%). When compared to the best performing model by Qu et al., 2021, RIpred performed similarly (MAE of 16.57 RI units [RIpred] vs. 16.84 RI units [Qu et al., 2021 predictor] for derivatized compounds). 
RIpred also includes ∼5 million predicted RI values for all GC-amenable compounds (∼57,000) in the Human Metabolome Database HMDB 5.0 (Wishart et al., 2022).
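The two error metrics quoted in this abstract, MAE and MAPE, can be illustrated with a minimal sketch. The retention index values below are hypothetical numbers chosen for illustration, not data from RIpred or NIST, and the function names are assumptions of this sketch rather than part of any published tool.

```python
def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted|, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """Average of |observed - predicted| / |observed|, expressed in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = (abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * sum(ratios) / len(observed)


# Hypothetical reference and predicted retention indices (RI units).
observed_ri = [1000.0, 1200.0, 1500.0, 2000.0]
predicted_ri = [1010.0, 1188.0, 1530.0, 1980.0]

mae = mean_absolute_error(observed_ri, predicted_ri)
mape = mean_absolute_percentage_error(observed_ri, predicted_ri)
print(f"MAE = {mae:.2f} RI units, MAPE = {mape:.2f}%")
```

Because retention indices are typically in the hundreds to thousands, a MAE of a few tens of RI units corresponds to a MAPE of only a few percent, which is why the abstract reports both.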
Affiliation(s)
- Afia Anjum
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Jaanus Liigand
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Ralph Milford
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Vasuk Gautam
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E9, Canada; Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada; Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada; Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada.
| |
|
12
|
Abstract
Chemometrics and machine learning are artificial intelligence-based methods stirring a transformative change in chemistry. Organic synthesis, drug discovery and analytical techniques are incorporating machine learning techniques at an accelerated pace. However, machine-assisted chemistry faces challenges while solving critical problems in chemistry due to complex relationships in data sets. Even with increasing publishing volumes on machine learning, its application in areas of chemistry is not a straightforward endeavour. A particular concern in applying machine learning in chemistry is data availability and reproducibility. The present review article discusses the various chemometric methods, expert systems, and machine learning techniques developed for solving problems of organic synthesis and drug discovery with selected examples. Further, a concise discussion of chemometrics and ML deployed in analytical techniques such as spectroscopy, microscopy and chromatography is presented. Finally, the review reflects on the challenges, opportunities and future perspectives of machine learning and automation in chemistry. The review concludes by pondering some tough questions on applying machine learning and their possibility of navigation in the different terrains of chemistry.
Affiliation(s)
- Payal B. Joshi
- Operations and Method Development, Shefali Research Laboratories, Ambernath (East), Thane, Maharashtra 421501 India
| |
|
13
|
Sholokhova AY, Grinevich OI, Matyushin DD, Buryak AK. Machine learning-assisted non-target analysis of a highly complex mixture of possible toxic unsymmetrical dimethylhydrazine transformation products with chromatography-mass spectrometry. CHEMOSPHERE 2022; 307:135764. [PMID: 35863423 DOI: 10.1016/j.chemosphere.2022.135764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2022] [Revised: 06/29/2022] [Accepted: 07/14/2022] [Indexed: 06/15/2023]
Abstract
Unsymmetrical dimethylhydrazine (UDMH) is a toxic and environmentally hostile compound that was massively introduced to the environment during previous decades due to its use in the space and rocket industry. The compound forms multiple transformation products, and many of them are as dangerous as UDMH or even more dangerous. The danger includes, but is not limited to, acute toxicity, chronic health hazards, carcinogenicity, and environmental damage. UDMH transformation products are poorly investigated. In this work, the mixture formed by long storage of the waste that contained UDMH was studied. Even a preliminary screening of such a mixture is a complex task. It consists of dozens of compounds, and most of them are missing in chemical and spectral databases. The complete preparative separation of such a mixture is very laborious. We applied several methods of gas chromatography-mass spectrometry and liquid chromatography-mass spectrometry, and several machine learning and chemoinformatics methods to make a preliminary but informative screening of the mixture. Machine learning allowed predicting retention indices and mass spectra of candidate structures. The combination of various ion sources and a comparison of the observed with the predicted spectra and retention was used to propose confident structures for 24 compounds. It was demonstrated that neither high-resolution mass spectrometry nor mass spectral library matching is enough to elucidate the structures of unknown UDMH transformation products. At the same time, the use of machine learning and a combination of methods significantly improves the identification power. Finally, machine learning was applied to estimate the acute toxicity of the discovered compounds. It was shown that many of them are comparable to or even more toxic than UDMH itself. 
Such an extremely wide and still underestimated variety of easily formed derivatives of UDMH can lead to a significant underestimation of the potential hazard of this compound.
Affiliation(s)
- Anastasia Yu Sholokhova
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia.
| | - Oksana I Grinevich
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia
| | - Dmitriy D Matyushin
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia
| | - Aleksey K Buryak
- A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia
| |
|
14
|
Comparative Prediction of Gas Chromatographic Retention Indices for GC/MS Identification of Chemicals Related to Chemical Weapons Convention by Incremental and Machine Learning Methods. SEPARATIONS 2022. [DOI: 10.3390/separations9100265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
During on-site verification activities conducted by the Technical Secretariat of the Organization for the Prohibition of Chemical Weapons, identification by gas chromatography retention index (RI) data, in addition to mass spectrometry data, increases the reliability of factual findings. However, reference RIs do not cover all possible chemical structures, which is why models that predict RIs are important. Although applicable only to narrow data sets of chemicals with a fixed scaffold (G- and V-series gases, for example), the non-learning incremental method demonstrated predictive median absolute and percentage errors of 2–4 units and 0.1–0.2%; these are comparable with the experimental bias in RI measurements in the same laboratory with the same GC conditions. It is more accurate than two reported machine learning methods, which show median absolute and percentage errors of 11–52 units and 0.5–2.8%. However, for the whole Chemical Weapons Convention (CWC) data set of chemicals, where a fixed scaffold is absent, the incremental method is not applicable; machine learning methods are then essential and achieved median absolute and percentage errors of 29–33 units and 0.5–2.2%, depending on the machine learning method. In addition, we have developed a homology tree approach as a convenient method for the visualization of the CWC chemical space. We conclude that non-learning incremental methods may be more accurate than state-of-the-art machine learning techniques in particular cases, such as predicting the RIs of homologues and isomers of chemicals related to the CWC.
|
15
|
Jiang H, Lin X, Wang L, Ren Y, Zhan S, Ma W. Predicting Material Properties by Deep Graph Networks. CRYSTAL RESEARCH AND TECHNOLOGY 2022. [DOI: 10.1002/crat.202200064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Hantong Jiang
- Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education Hefei 230601 China
- School of Computer and Information Engineering Hefei University of Technology Hefei 230601 China
| | - Xuanjie Lin
- Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education Hefei 230601 China
- School of Computer and Information Engineering Hefei University of Technology Hefei 230601 China
| | - Liquan Wang
- Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education Hefei 230601 China
- School of Computer and Information Engineering Hefei University of Technology Hefei 230601 China
| | - Yongsheng Ren
- National Engineering Laboratory of Vacuum Metallurgy Kunming 650093 China
- Faculty of Metallurgical and Energy Engineering Kunming University of Science and Technology Kunming 650093 China
| | - Shu Zhan
- Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education Hefei 230601 China
- School of Computer and Information Engineering Hefei University of Technology Hefei 230601 China
| | - Wenhui Ma
- National Engineering Laboratory of Vacuum Metallurgy Kunming 650093 China
- Faculty of Metallurgical and Energy Engineering Kunming University of Science and Technology Kunming 650093 China
| |
|
16
|
Qu C, Kearsley AJ, Schneider BI, Keyrouz W, Allison TC. Graph convolutional neural network applied to the prediction of normal boiling point. J Mol Graph Model 2022; 112:108149. [DOI: 10.1016/j.jmgm.2022.108149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/30/2021] [Revised: 01/19/2022] [Accepted: 02/02/2022] [Indexed: 11/29/2022]
|
17
|
Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee B, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui V, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth H, Greiner R, Gautam V. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res 2022; 50:D622-D631. [PMID: 34986597 PMCID: PMC8728138 DOI: 10.1093/nar/gkab1062] [Citation(s) in RCA: 961] [Impact Index Per Article: 320.3] [Received: 09/15/2021] [Revised: 10/13/2021] [Accepted: 10/19/2021] [Indexed: 01/23/2023] Open
Abstract
The Human Metabolome Database or HMDB (https://hmdb.ca) has been providing comprehensive reference information about human metabolites and their associated biological, physiological and chemical properties since 2007. Over the past 15 years, the HMDB has grown and evolved significantly to meet the needs of the metabolomics community and respond to continuing changes in internet and computing technology. This year's update, HMDB 5.0, brings a number of important improvements and upgrades to the database. These should make the HMDB more useful and more appealing to a larger cross-section of users. In particular, these improvements include: (i) a significant increase in the number of metabolite entries (from 114 100 to 217 920 compounds); (ii) enhancements to the quality and depth of metabolite descriptions; (iii) the addition of new structure, spectral and pathway visualization tools; (iv) the inclusion of many new and much more accurately predicted spectral data sets, including predicted NMR spectra, more accurately predicted MS spectra, predicted retention indices and predicted collision cross section data and (v) enhancements to the HMDB's search functions to facilitate better compound identification. Many other minor improvements and updates to the content, the interface, and general performance of the HMDB website have also been made. Overall, we believe these upgrades and updates should greatly enhance the HMDB's ease of use and its potential applications not only in human metabolomics but also in exposomics, lipidomics, nutritional science, biochemistry and clinical chemistry.
Affiliation(s)
- David S Wishart
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
| | - AnChi Guo
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Eponine Oler
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Fei Wang
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Afia Anjum
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Harrison Peters
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Raynard Dizon
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Zinat Sayeeda
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Siyang Tian
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Brian L Lee
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Mark Berjanskii
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Robert Mah
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Mai Yamamoto
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Juan Jovel
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | | | | | - Vicki W Lui
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Dorna Varshavi
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Dorsa Varshavi
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Dana Allen
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - David Arndt
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Nitya Khetarpal
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Aadhavya Sivakumaran
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Karxena Harford
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Selena Sanford
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Kristen Yee
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Xuan Cao
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Zachary Budinski
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Jaanus Liigand
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Lun Zhang
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Jiamin Zheng
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Rupasri Mandal
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - Naama Karu
- Leiden Academic Centre for Drug Research LACDR/Analytical Biosciences, Leiden University, Leiden, Netherlands
| | - Maija Dambrova
- Laboratory of Pharmaceutical Pharmacology, Latvian Institute of Organic Synthesis, Riga, Latvia
| | - Helgi B Schiöth
- Section of Functional Pharmacology, Department of Neuroscience, Uppsala University, Uppsala, Sweden
- Institute for Translational Medicine and Biotechnology, Sechenov First Moscow State Medical University, Moscow, Russia
| | - Russell Greiner
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Vasuk Gautam
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| |
|
18
|
Machado JL, Tomaz MA, da Luz JMR, Osório VM, Costa AV, Colodetti TV, Debona DG, Pereira LL. Evaluation of genetic divergence of coffee genotypes using the volatile compounds and sensory attributes profile. J Food Sci 2021; 87:383-395. [PMID: 34907528 DOI: 10.1111/1750-3841.15986] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/06/2021] [Revised: 10/09/2021] [Accepted: 10/25/2021] [Indexed: 12/30/2022]
Abstract
The quality of the coffee beverage is related to the chemical, physical, and sensory attributes of the coffee beans, which vary with the geographic location of the crop, genetic factors, and post-harvest processing. The objective of this study was to evaluate the genetic divergence of 27 genotypes of Coffea canephora using the volatile compounds and sensory attributes profile to select genotypes that produce a coffee beverage with high sensory quality. This genetic diversity was estimated from the Euclidean distance matrix using non-standard data and the Unweighted Pair-Group Method Using Arithmetic Averages (UPGMA). 2-Furyl-methanol, 4-ethenyl-2-methoxyphenol, furfural, 5-methylfurfural, methylpyrazine, and 2,6-dimethylpyrazine were the predominant volatile compounds in the genotypes. The sensory attributes had a positive Pearson's correlation with the total score. The volatile compounds differed in their relative contribution to the genetic divergence between the genotypes of C. canephora. 4-Ethenyl-2-methoxyphenol, 2-furyl-methanol, and furfural were the volatile compounds that contributed most to the formation of the groups in the UPGMA dendrogram. The relative contribution of sensory attributes to dissimilarity among genotypes was 6.42% to 20.20%. Therefore, this study verified the relative contribution of volatile compounds, especially 4-ethenyl-2-methoxyphenol, 2-furyl-methanol, and furfural, and sensory attributes (flavor, mouthfeel, and bitterness/sweetness) to the genetic divergence between the genotypes of the three clonal varieties. Thus, this work points out compounds that positively contribute to the sensory quality of the Conilon coffee beverage.
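The clustering procedure named in this abstract, UPGMA on a Euclidean distance matrix, can be sketched in a few lines. This is an illustrative pure-Python version, not the authors' analysis: the genotype labels and two-feature profiles are hypothetical, and the raw (unscaled) values are used directly on the assumption that this is what "non-standard data" refers to.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(profiles):
    """Cluster named profiles by UPGMA; return the merge history.

    Each history entry is (members_a, members_b, distance), where the
    members are tuples of sample names.
    """
    # Every cluster is a tuple of sample names; start from singletons.
    clusters = [(name,) for name in profiles]
    dist = {
        frozenset((a, b)): euclidean(profiles[a[0]], profiles[b[0]])
        for a, b in combinations(clusters, 2)
    }
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d = dist[frozenset((a, b))]
        merged = a + b
        # UPGMA update: size-weighted average of the two old distances,
        # which equals the mean distance over all cross-cluster pairs.
        for c in clusters:
            if c in (a, b):
                continue
            dist[frozenset((merged, c))] = (
                len(a) * dist[frozenset((a, c))]
                + len(b) * dist[frozenset((b, c))]
            ) / len(merged)
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, d))
    return merges


# Hypothetical genotypes described by two volatile-compound abundances.
profiles = {
    "G1": (1.0, 0.0),
    "G2": (1.0, 1.0),
    "G3": (5.0, 0.0),
    "G4": (5.0, 2.0),
}
merges = upgma(profiles)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.3f}")
```

The merge history is the information a dendrogram draws: each entry gives the two groups joined and the height at which they join. In practice a library routine such as average-linkage hierarchical clustering would normally be used instead of hand-written code.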
Affiliation(s)
- Jéssica Louzada Machado
- Graduate Program in Agrochemistry, Federal University of Espírito Santo/UFES, Alegre, Espírito Santo, Brazil
| | - Marcelo Antonio Tomaz
- Agronomy Department, Federal University of Espírito Santo/UFES, Alegre, Espírito Santo, Brazil
| | | | - Vanessa Moreira Osório
- Chemistry and Physical Department, Federal University of Espírito Santo/UFES, Alegre, Espírito Santo, Brazil
| | - Adilson Vidal Costa
- Chemistry and Physical Department, Federal University of Espírito Santo/UFES, Alegre, Espírito Santo, Brazil
| | | | - Danieli Grancieri Debona
- Department of Coffee Research Analysis Laboratory, Federal Institute of Espírito Santo, Venda Nova do Imigrante, Espírito Santo, Brazil
| | - Lucas Louzada Pereira
- Department of Coffee Research Analysis Laboratory, Federal Institute of Espírito Santo, Venda Nova do Imigrante, Espírito Santo, Brazil
| |
|
19
|
Ishtiaq M, Rauf A, Rubbab Q, Siddiqui MK, Ibrahim H. Algebraic Polynomial Based Topological Properties of Anti-Tumor Drug; Hyaluronic Acid-Doxorubicin (HAD). Polycycl Aromat Compd 2021. [DOI: 10.1080/10406638.2021.1995011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Affiliation(s)
- Muhammad Ishtiaq
- Department of Mathematics, Air University Islamabad, Multan Campus, Multan, Punjab, Pakistan
| | - Abdul Rauf
- Department of Mathematics, Air University Islamabad, Multan Campus, Multan, Punjab, Pakistan
| | - Qammar Rubbab
- Department of Mathematics, The Woman University Multan, Multan, Punjab, Pakistan
| | - Muhammad Kamran Siddiqui
- Department of Mathematics, Comsats University Islamabad, Lahore Campus, Lahore, Punjab, Pakistan
| | - Humaira Ibrahim
- Department of Mathematics, Air University Islamabad, Multan Campus, Multan, Punjab, Pakistan
| |
|
20
|
Deep Learning Based Prediction of Gas Chromatographic Retention Indices for a Wide Variety of Polar and Mid-Polar Liquid Stationary Phases. Int J Mol Sci 2021; 22:ijms22179194. [PMID: 34502099 PMCID: PMC8430916 DOI: 10.3390/ijms22179194] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/30/2021] [Revised: 08/23/2021] [Accepted: 08/24/2021] [Indexed: 01/12/2023] Open
Abstract
Prediction of gas chromatographic retention indices based on compound structure is an important task for analytical chemistry. The predicted retention indices can be used as a reference in a mass spectrometry library search despite the fact that their accuracy is worse in comparison with the experimental reference ones. In the last few years, deep learning was applied for this task. The use of deep learning drastically improved the accuracy of retention index prediction for non-polar stationary phases. In this work, we demonstrate for the first time the use of deep learning for retention index prediction on polar (e.g., polyethylene glycol, DB-WAX) and mid-polar (e.g., DB-624, DB-210, DB-1701, OV-17) stationary phases. The achieved accuracy lies in the range of 16–50 in terms of the mean absolute error for several stationary phases and test data sets. We also demonstrate that our approach can be directly applied to the prediction of the second dimension retention times (GC × GC) if a large enough data set is available. The achieved accuracy is considerably better compared with the previous results obtained using linear quantitative structure-retention relationships and ACD ChromGenius software. The source code and pre-trained models are available online.
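The idea of using predicted retention indices as a reference in a mass spectral library search can be sketched as a simple filter applied before ranking by spectral match. Everything below is an assumption made for illustration: the `Candidate` class, the `filter_by_ri` function, the compound names, the scores, and the 50-unit tolerance are hypothetical and do not come from the paper or its software.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    match_score: float   # spectral similarity to the query; higher is better
    predicted_ri: float  # model-predicted retention index for this structure


def filter_by_ri(candidates, observed_ri, tolerance):
    """Drop candidates whose predicted RI is too far from the observed RI,
    then rank the survivors by spectral match score (best first)."""
    kept = [c for c in candidates if abs(c.predicted_ri - observed_ri) <= tolerance]
    return sorted(kept, key=lambda c: c.match_score, reverse=True)


# Hypothetical library hits for one unknown peak.
hits = [
    Candidate("compound A", 0.91, 1210.0),
    Candidate("compound B", 0.95, 1190.0),
    Candidate("compound C", 0.97, 1400.0),  # best spectrum, implausible RI
]

# The tolerance should reflect the prediction error of the RI model;
# 50 RI units is an arbitrary value chosen for this example.
ranked = filter_by_ri(hits, observed_ri=1200.0, tolerance=50.0)
print([c.name for c in ranked])
```

A hard cut-off is the simplest design; because predicted indices are less accurate than experimental ones, the tolerance has to be wider than it would be for measured reference values, or the true compound risks being filtered out.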
|