1
Wakoli J, Anjum A, Sajed T, Oler E, Wang F, Gautam V, LeVatte M, Wishart D. GCMS-ID: a webserver for identifying compounds from gas chromatography mass spectrometry experiments. Nucleic Acids Res 2024; 52:W381-W389. [PMID: 38783107 PMCID: PMC11223868 DOI: 10.1093/nar/gkae425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 04/28/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
GCMS-ID (Gas Chromatography Mass Spectrometry compound IDentifier) is a webserver designed to enable the identification of compounds from GC-MS experiments. GC-MS instruments produce both electron impact mass spectra (EI-MS) and retention index (RI) data for as few as one, to as many as hundreds of different compounds. Matching the measured EI-MS, RI or EI-MS + RI data to experimentally collected EI-MS and/or RI reference libraries allows facile compound identification. However, the number of available experimental RI and EI-MS reference spectra, especially for metabolomics or exposomics-related studies, is disappointingly small. Using machine learning to accurately predict the EI-MS spectra and/or RIs for millions of metabolomics and/or exposomics-relevant compounds could (partially) solve this spectral matching problem. This computational approach to compound identification is called in silico metabolomics. GCMS-ID brings this concept of in silico metabolomics closer to reality by intelligently integrating two of our previously published webservers: CFM-EI and RIpred. CFM-EI is an EI-MS spectral prediction webserver, and RIpred is a Kovats RI prediction webserver. We have found that GCMS-ID can accurately identify compounds from experimental RI, EI-MS or RI + EI-MS data through matching to its own large library of >1 million predicted RI/EI-MS values generated for metabolomics/exposomics-relevant compounds. GCMS-ID can also predict the RI or EI-MS spectrum from a user-submitted structure or annotate a user-submitted EI-MS spectrum. GCMS-ID is freely available at https://gcms-id.ca/.
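The EI-MS + RI matching idea described in this abstract can be illustrated with a minimal sketch. It is not GCMS-ID's actual scoring method, which the abstract does not specify. It assumes cosine similarity for spectral comparison and a fixed RI tolerance window as a pre-filter. The function names, the `ri_tolerance` default, and the three-entry library are hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_spectrum, query_ri, library, ri_tolerance=50.0):
    """Drop library entries outside the RI window, then rank the rest by score.

    ri_tolerance is an assumed, illustrative value, not a GCMS-ID setting.
    """
    hits = []
    for entry in library:
        if abs(entry["ri"] - query_ri) > ri_tolerance:
            continue  # RI disagrees too much; skip the spectral comparison
        score = cosine_similarity(query_spectrum, entry["spectrum"])
        hits.append((entry["name"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical reference library (predicted or experimental values).
LIBRARY = [
    {"name": "compound_A", "ri": 1000.0, "spectrum": {43: 100.0, 57: 80.0, 71: 40.0}},
    {"name": "compound_B", "ri": 1020.0, "spectrum": {43: 20.0, 91: 100.0, 65: 30.0}},
    {"name": "compound_C", "ri": 1500.0, "spectrum": {43: 100.0, 57: 80.0, 71: 40.0}},
]

ranked = rank_candidates({43: 95.0, 57: 85.0, 71: 35.0}, 1010.0, LIBRARY)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy library, compound_C has the same spectrum as compound_A but is excluded by the RI window, which shows why combining RI with EI-MS can separate candidates that spectra alone cannot.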
Affiliation(s)
- Julia Wakoli: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Afia Anjum: Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Tanvir Sajed: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Eponine Oler: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Fei Wang: Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Vasuk Gautam: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Marcia LeVatte: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- David S Wishart: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada; Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada; Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada; Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
2
Khrisanfov MD, Matyushin DD, Samokhin AS. A general procedure for finding potentially erroneous entries in the database of retention indices. Anal Chim Acta 2024; 1297:342375. [PMID: 38438243 DOI: 10.1016/j.aca.2024.342375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 02/02/2024] [Accepted: 02/13/2024] [Indexed: 03/06/2024]
Abstract
BACKGROUND The NIST retention index database is one of the most widely used sources of retention indices. In both untargeted analysis and machine learning studies, filtering for potential errors is lacking or nonexistent. According to our estimates, about 80% of the compounds from both the NIST 17 and NIST 20 retention index databases have only one RI value per stationary phase, which makes searching for erroneous values with statistical methods impossible. Manual inspection is also impractical because the database contains more than 300 000 entries. RESULTS We suggest a two-step procedure, based on machine learning, to find potentially erroneous retention indices. The first step is to use five predictive models to obtain predicted retention index values for the whole database. The second is to compare these predicted values against the experimental ones. We consider a retention index erroneous if its accuracy (the difference between predicted and experimental value) is in the bottom 5% for each of the five models simultaneously. Using this method, we were able to detect 2093 outlier entries for standard and semi-standard non-polar stationary phases in the NIST 17 retention index database, 566 of which were corrected or removed by the developers in NIST 20. SIGNIFICANCE This is a novel approach to finding potentially erroneous entries in a large-scale database with mostly unique entries, and it can be applied to data other than retention indices. The procedure can help filter and report mishandled data to improve the quality of the dataset for machine learning applications and experimental use.
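The two-step flagging rule in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation. It assumes "bottom 5% accuracy" means the largest 5% of absolute differences between predicted and experimental RI, computed separately per model. The function names and the synthetic data are hypothetical.

```python
def worst_fraction_indices(errors, fraction=0.05):
    """Indices of the entries with the largest errors (the worst `fraction`)."""
    n_flag = max(1, int(len(errors) * fraction))
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return set(order[:n_flag])


def flag_suspect_entries(experimental, predictions_per_model, fraction=0.05):
    """Flag entries that fall in the worst `fraction` for every model at once."""
    flagged = None
    for predicted in predictions_per_model:
        # Assumption: accuracy is measured as the absolute RI difference.
        abs_errors = [abs(p - e) for p, e in zip(predicted, experimental)]
        worst = worst_fraction_indices(abs_errors, fraction)
        flagged = worst if flagged is None else flagged & worst
    return sorted(flagged or [])


# Synthetic data: 40 entries, with entry 7 deliberately mis-recorded.
true_ri = [1000.0 + 10.0 * i for i in range(40)]
experimental = list(true_ri)
experimental[7] = 2000.0  # simulated database error (true value is 1070)

# Five hypothetical models; each is slightly off on a different entry.
predictions = [
    [t + (5.0 if i == 10 + 3 * k else 1.0) for i, t in enumerate(true_ri)]
    for k in range(5)
]

suspects = flag_suspect_entries(experimental, predictions)
print(suspects)
```

Requiring agreement across all models is the key design choice: each model's own worst prediction lands in its worst 5%, but only the mis-recorded entry is in the worst 5% for all five, so single-model weaknesses are not reported as database errors.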
Affiliation(s)
- Mikhail D Khrisanfov: Chemistry Department, Lomonosov Moscow State University, Leninskie Gory 1-3, 119991, Moscow, Russia; A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071, Moscow, Russia
- Dmitriy D Matyushin: A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071, Moscow, Russia
- Andrey S Samokhin: Chemistry Department, Lomonosov Moscow State University, Leninskie Gory 1-3, 119991, Moscow, Russia
4
Knox C, Wilson M, Klinger C, Franklin M, Oler E, Wilson A, Pon A, Cox J, Chin NE, Strawbridge S, Garcia-Patino M, Kruger R, Sivakumaran A, Sanford S, Doshi R, Khetarpal N, Fatokun O, Doucet D, Zubkowski A, Rayat D, Jackson H, Harford K, Anjum A, Zakir M, Wang F, Tian S, Lee B, Liigand J, Peters H, Wang RQ, Nguyen T, So D, Sharp M, da Silva R, Gabriel C, Scantlebury J, Jasinski M, Ackerman D, Jewison T, Sajed T, Gautam V, Wishart D. DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Res 2024; 52:D1265-D1275. [PMID: 37953279 PMCID: PMC10767804 DOI: 10.1093/nar/gkad976] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 09/16/2023] [Revised: 10/09/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023] Open
Abstract
First released in 2006, DrugBank (https://go.drugbank.com) has grown to become the 'gold standard' knowledge resource for drug, drug-target and related pharmaceutical information. DrugBank is widely used across many diverse biomedical research and clinical applications, and averages more than 30 million views/year. Since its last update in 2018, we have been actively enhancing the quantity and quality of the drug data in this knowledgebase. In this latest release (DrugBank 6.0), the number of FDA approved drugs has grown from 2646 to 4563 (a 72% increase), the number of investigational drugs has grown from 3394 to 6231 (a 38% increase), the number of drug-drug interactions increased from 365 984 to 1 413 413 (a 300% increase), and the number of drug-food interactions expanded from 1195 to 2475 (a 200% increase). In addition to this notable expansion in database size, we have added thousands of new, colorful, richly annotated pathways depicting drug mechanisms and drug metabolism. Likewise, existing datasets have been significantly improved and expanded, by adding more information on drug indications, drug-drug interactions, drug-food interactions and many other relevant data types for 11 891 drugs. We have also added experimental and predicted MS/MS spectra, 1D/2D-NMR spectra, CCS (collision cross section), RT (retention time) and RI (retention index) data for 9464 of DrugBank's 11 710 small molecule drugs. These and other improvements should make DrugBank 6.0 even more useful to a much wider research audience ranging from medicinal chemists to metabolomics specialists to pharmacologists.
Affiliation(s)
- Craig Knox: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Mike Wilson: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Christen M Klinger: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Mark Franklin: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Eponine Oler: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Alex Wilson: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Allison Pon: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Jordan Cox: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Na Eun (Lucy) Chin: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Seth A Strawbridge: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Marysol Garcia-Patino: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Ray Kruger: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Aadhavya Sivakumaran: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Selena Sanford: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Rahil Doshi: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Nitya Khetarpal: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Omolola Fatokun: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Daphnee Doucet: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Ashley Zubkowski: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Dorsa Yahya Rayat: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Hayley Jackson: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Karxena Harford: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Afia Anjum: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Mahi Zakir: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Fei Wang: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Siyang Tian: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Brian Lee: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Jaanus Liigand: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada; Institute of Chemistry, University of Tartu, Tartu, Estonia
- Harrison Peters: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Ruo Qi (Rachel) Wang: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Tue Nguyen: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Denise So: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Matthew Sharp: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Rodolfo da Silva: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Cyrella Gabriel: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Joshua Scantlebury: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Marissa Jasinski: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- David Ackerman: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Timothy Jewison: OMx Personal Health Analytics, Inc., 700–10130 103 St NW, Edmonton, AB T5J 1B9, Canada
- Tanvir Sajed: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- Vasuk Gautam: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- David S Wishart: Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada; Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada; Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H1, Canada; Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 1C9, Canada