1
Zhu B, Li Z, Jin Z, Zhong Y, Lv T, Ge Z, Li H, Wang T, Lin Y, Liu H, Ma T, Wang S, Liao J, Fan X. Knowledge-based in silico fragmentation and annotation of mass spectra for natural products with MassKG. Comput Struct Biotechnol J 2024; 23:3327-3341. [PMID: 39310281 PMCID: PMC11415640 DOI: 10.1016/j.csbj.2024.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 09/04/2024] [Accepted: 09/04/2024] [Indexed: 09/25/2024] Open
Abstract
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is a potent analytical technique for identifying natural products (NPs) from complex sources. However, because of the structural diversity of NPs, annotating their LC-MS/MS data efficiently remains challenging, which hinders the discovery of novel active structures. Here, we introduce MassKG, an algorithm that combines a knowledge-based fragmentation strategy and a deep learning-based molecule generation model to aid in rapid dereplication and the discovery of novel NP structures. Specifically, MassKG has compiled 407,720 known NP structures and, based on this, generated 266,353 new structures using chemical language models for the discovery of potential novel compounds. Furthermore, MassKG demonstrates exceptional performance in spectra annotation compared to state-of-the-art algorithms. To enhance usability, MassKG has been implemented as a web server for annotating tandem mass spectral data (MS/MS, MS2) with a user-friendly interface, automatic reporting, and fragment tree visualization. Lastly, the interpretive capability of MassKG is comprehensively validated through composition analysis and MS annotation of Panax notoginseng, Ginkgo biloba, Codonopsis pilosula, and Astragalus membranaceus. MassKG is now accessible at https://xomics.com.cn/masskg.
Affiliation(s)
- Bingjie Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Zhenhao Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
- Zehua Jin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Yi Zhong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Tianhang Lv
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Zhiwei Ge
  - Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University, Hangzhou 310058, China
- Haoran Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Tianhao Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Yugang Lin
  - Department of Pharmacy, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, Jinhua 321000, China
- Huihui Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Tianyi Ma
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Shufang Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Jie Liao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Xiaohui Fan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
  - Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
  - The Joint-laboratory of Clinical Multi-Omics Research between Zhejiang University and Ningbo Municipal Hospital of TCM, Ningbo Municipal Hospital of TCM, 315100 Ningbo, China
2
Liu J, Bao C, Zhang J, Han Z, Fang H, Lu H. Artificial intelligence with mass spectrometry-based multimodal molecular profiling methods for advancing therapeutic discovery of infectious diseases. Pharmacol Ther 2024; 263:108712. [PMID: 39241918 DOI: 10.1016/j.pharmthera.2024.108712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 07/22/2024] [Accepted: 09/03/2024] [Indexed: 09/09/2024]
Abstract
Infectious diseases, driven by a diverse array of pathogens, can swiftly undermine public health systems. Accurate diagnosis and treatment of infectious diseases, centered around the identification of biomarkers and the elucidation of disease mechanisms, are in dire need of more versatile and practical analytical approaches. Mass spectrometry (MS)-based molecular profiling methods can deliver a wealth of information on a range of functional molecules, including nucleic acids, proteins, and metabolites. While MS-driven omics analyses can yield vast datasets, the sheer complexity and multi-dimensionality of MS data can significantly hinder the identification and characterization of functional molecules within specific biological processes and events. Artificial intelligence (AI) emerges as a potent complementary tool that can substantially enhance the processing and interpretation of MS data. AI applications in this context lead to the reduction of spurious signals, the improvement of precision, the creation of standardized analytical frameworks, and the increase of data integration efficiency. This critical review emphasizes the pivotal roles of MS-based omics strategies in the discovery of biomarkers and the clarification of infectious diseases. Additionally, the review underscores the transformative ability of AI techniques to enhance the utility of MS-based molecular profiling in the field of infectious diseases by refining the quality and practicality of data produced from omics analyses. In conclusion, we advocate for a forward-looking strategy that integrates AI with MS-based molecular profiling. This integration aims to transform the analytical landscape and the performance of biological molecule characterization, potentially down to the single-cell level. Such advancements are anticipated to propel the development of AI-driven predictive models, thus improving the monitoring of diagnostics and therapeutic discovery for the ongoing challenge related to infectious diseases.
Affiliation(s)
- Jingjing Liu
  - School of Chinese Medicine, Hong Kong Traditional Chinese Medicine Phenome Research Center, State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong 999077, China
- Chaohui Bao
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Jiaxin Zhang
  - School of Chinese Medicine, Hong Kong Traditional Chinese Medicine Phenome Research Center, State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong 999077, China
- Zeguang Han
  - Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
- Hai Fang
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
- Haitao Lu
  - School of Chinese Medicine, Hong Kong Traditional Chinese Medicine Phenome Research Center, State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong 999077, China
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
3
Scherbaum N, Bonnet U. [The challenges for psychiatric care posed by synthetic drugs]. Der Nervenarzt 2024; 95:818-823. [PMID: 39186107 DOI: 10.1007/s00115-024-01705-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/20/2024] [Indexed: 08/27/2024]
Abstract
BACKGROUND In addition to the drugs that have been known for decades, several hundred mainly synthetic substances have been identified as drugs for the first time in the last 20 years. AIM OF THE WORK Presentation of the various groups of substances and their psychotropic effects, the epidemiology of their use and the legal and social background of this development. MATERIAL Narrative literature review. RESULTS The most important new psychoactive substances (NPS) are synthetic cannabinoids, synthetic stimulants (cathinones), hallucinogens and new synthetic opioids (NSO), in particular fentanyl and related substances. The new substances do not have any qualitatively new psychotropic effects. They were brought onto the market in particular as substitutes for substances subject to the Narcotics Act but are often associated with dangerous side effects and even mortality. The increasing availability of these substances has gone hand in hand with the establishment of the Internet as a source of knowledge (e.g. for synthesis routes) and as a marketplace. Substance group-related regulations have also been established in Germany (New Psychoactive Substances Act). In Germany the prevalence of NPS use is significantly lower than that of cannabis; however, there are indications that the production and distribution of synthetic drugs is more profitable for drug dealers than that of conventional plant-based drugs, such as heroin. In the USA, for example, NSOs are the primary drugs used in opioid addiction. DISCUSSION It remains to be seen whether NPS and NSOs will replace conventional drugs. The availability of synthetic drugs is more difficult to reduce than that of plant-based drugs. Harm reduction measures should be expanded, e.g., early warning systems for new drugs, drug checking and naloxone programs.
Affiliation(s)
- Norbert Scherbaum
  - LVR-Universitätsklinik Essen, Klinik für Psychiatrie und Psychotherapie, Medizinische Fakultät der Universität Duisburg-Essen, Essen, Germany.
- Udo Bonnet
  - LVR-Universitätsklinik Essen, Klinik für Psychiatrie und Psychotherapie, Medizinische Fakultät der Universität Duisburg-Essen, Essen, Germany
  - Ev. Krankenhaus Castrop-Rauxel, Klinik für Seelische Gesundheit, Castrop-Rauxel, Germany
4
Che P, Chang C, Buzzini P, Stegemann L, Kool J, Davidson JT, Kohler I. Identification of synthetic cathinone positional isomers using electron activated dissociation mass spectrometry. Anal Chim Acta 2024; 1319:342949. [PMID: 39122291 DOI: 10.1016/j.aca.2024.342949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 07/05/2024] [Accepted: 07/05/2024] [Indexed: 08/12/2024]
Abstract
BACKGROUND Synthetic cathinones (SCs) are a large category of new psychoactive substances (NPS), which pose a serious threat to public health due to limited information about their toxicology and pharmacology. Many SCs are closely related in their chemical structures, with some substances being positional isomers. In this study, we propose a new workflow for the identification of SC isomers using liquid chromatography-high-resolution tandem mass spectrometry (LC-HRMS2) combined with electron activated dissociation (EAD) and chemometrics. Differentiation between isomeric SCs is essential for both legislative and public safety reasons, since minor differences in their molecular structures may change their legal status and pharmacological profiles. RESULTS The workflow was optimized using ring-substituted isomers of methylmethcathinones, methylethcathinones, and chloromethcathinones. The kinetic energy in the EAD cell was investigated at three levels (i.e., 15, 18, and 20 eV) for each group. Two data analysis methods (i.e., t-distributed stochastic neighbor embedding [t-SNE] and a Random Forest [RF] algorithm) were applied using the obtained EAD mass spectral data. The three sets of ring-substituted SCs were clearly distinguished using t-SNE and an RF algorithm. Moreover, the RF approach resulted in a 97 % classification accuracy for isomer identification using various combinations of compounds, isomers, and electron kinetic energies. This workflow was subsequently applied to the analysis of 26 blind street samples, resulting in a 92 % classification accuracy for isomer identification. However, the accuracy varied based on the electron kinetic energy. A subset of the original data set, focusing on 15-eV data only, was used, resulting in a classification accuracy of 100 %. SIGNIFICANCE This study presents the first LC-HRMS2 workflow based on EAD and chemometrics, which resulted in a classification accuracy of 100 % of authentic street samples. The developed LC-HRMS2 workflow demonstrates that EAD product ions and their characteristic ion ratios can be successfully used to identify ring-substituted positional isomers of SCs.
Affiliation(s)
- Peng Che
  - Vrije Universiteit Amsterdam, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Department of Chemistry and Pharmaceutical Sciences, Division of BioAnalytical Chemistry, Amsterdam, the Netherlands
  - Center for Analytical Sciences Amsterdam (CASA), Amsterdam, the Netherlands
- Christina Chang
  - Sam Houston State University, Department of Forensic Science, Huntsville, TX, USA
- Patrick Buzzini
  - Sam Houston State University, Department of Forensic Science, Huntsville, TX, USA
- Lavinia Stegemann
  - Drugs Information and Monitoring System (DIMS), Drug Monitoring and Policy, Trimbos Institute, Utrecht, the Netherlands
- Jeroen Kool
  - Vrije Universiteit Amsterdam, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Department of Chemistry and Pharmaceutical Sciences, Division of BioAnalytical Chemistry, Amsterdam, the Netherlands
  - Center for Analytical Sciences Amsterdam (CASA), Amsterdam, the Netherlands
- J Tyler Davidson
  - Sam Houston State University, Department of Forensic Science, Huntsville, TX, USA.
- Isabelle Kohler
  - Vrije Universiteit Amsterdam, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Department of Chemistry and Pharmaceutical Sciences, Division of BioAnalytical Chemistry, Amsterdam, the Netherlands
  - Center for Analytical Sciences Amsterdam (CASA), Amsterdam, the Netherlands
  - Co van Ledden Hulsebosch Center (CLHC), Amsterdam Center for Forensic Science and Medicine, Amsterdam, the Netherlands.
5
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2024:10.1007/s00216-024-05471-x. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Affiliation(s)
- Henrik Hupatz
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
- Ida Rahu
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
- Wei-Chieh Wang
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Pilleriin Peets
  - Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
- Emma H Palm
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden.
  - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden.
6
Iram A, Dong Y, Ignea C. Synthetic biology advances towards a bio-based society in the era of artificial intelligence. Curr Opin Biotechnol 2024; 87:103143. [PMID: 38781699 DOI: 10.1016/j.copbio.2024.103143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/04/2024] [Accepted: 05/04/2024] [Indexed: 05/25/2024]
Abstract
Synthetic biology is a rapidly emerging field with broad underlying applications in health, industry, agriculture, and the environment, enabling sustainable solutions for unmet needs of modern society. With the very recent addition of artificial intelligence (AI) approaches, this field is now growing at a rate that can help reach the envisioned goals of a bio-based society within the next few decades. Potentially disruptive applications that integrate AI with plant-based technologies, such as protein engineering, phytochemicals production, plant system engineering, or microbiome engineering, have already been reported. These include enzymatic synthesis of new-to-nature molecules, bioelectricity generation, and the use of biomass as construction material. Thus, in the not-so-distant future, synthetic biologists will help attain the overarching goal of a sustainable yet efficient production system for every aspect of society.
Affiliation(s)
- Attia Iram
  - Department of Bioengineering, McGill University, Montreal, QC H3A 0C3, Canada
- Yueming Dong
  - Department of Bioengineering, McGill University, Montreal, QC H3A 0C3, Canada
- Codruta Ignea
  - Department of Bioengineering, McGill University, Montreal, QC H3A 0C3, Canada.
7
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth of spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing this evolving landscape of spectrum-structure correlation, conventional chemometrics has become ill-equipped, and deep learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and the fundamental knowledge of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, along with corresponding potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
  - Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
  - School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
  - Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
8
Yang Y, Sun S, Yang S, Yang Q, Lu X, Wang X, Yu Q, Huo X, Qian X. Structural annotation of unknown molecules in a miniaturized mass spectrometer based on a transformer enabled fragment tree method. Commun Chem 2024; 7:109. [PMID: 38740942 DOI: 10.1038/s42004-024-01189-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 04/26/2024] [Indexed: 05/16/2024] Open
Abstract
Structural annotation of small molecules in tandem mass spectrometry has always been a central challenge in mass spectrometry analysis, especially when using a miniaturized mass spectrometer for on-site testing. Here, we propose the Transformer enabled Fragment Tree (TeFT) method, which combines various types of fragmentation tree models and a deep learning Transformer module. It aims to generate the specific structure of molecules de novo solely from mass spectrometry spectra. Evaluation on different open-source databases indicated that the majority of molecular structures of the compounds in the test set were successfully recognized. TeFT was also validated on a miniaturized mass spectrometer with low-resolution spectra for 16 flavonoid alcohols, achieving complete structure prediction for 8 substances. Finally, TeFT confirmed the structure of the compound contained in a Chinese medicine substance called the Anweiyang capsule. These results indicate that the TeFT method is suitable for annotating fragmentation peaks with clear fragmentation rules, particularly when applied to on-site mass spectrometry with lower mass resolution.
Affiliation(s)
- Yiming Yang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Shuang Sun
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Shuyuan Yang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qin Yang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xinqiong Lu
  - CHIN Instrument (Hefei) Co., Ltd., Hefei, 231200, China
- Xiaohao Wang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Quan Yu
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xinming Huo
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China.
- Xiang Qian
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
9
Sun H, Xue X, Liu X, Hu HY, Deng Y, Wang X. Cross-Modal Retrieval Between 13C NMR Spectra and Structures Based on Focused Libraries. Anal Chem 2024; 96:5763-5770. [PMID: 38564366 DOI: 10.1021/acs.analchem.3c04294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Library matching by comparing carbon-13 nuclear magnetic resonance (13C NMR) spectra with spectral data in the library is a crucial method for compound identification. In our previous paper, we introduced a deep contrastive learning system called CReSS, which used a library that contained more structures. However, CReSS has two limitations: there were no unknown structures in the library, and a redundant library reduces the structure-elucidation accuracy. Herein, we replaced the oversized traditional libraries with focused libraries containing a small number of molecules. A previous generative model, CMGNet, was used to generate focused libraries for CReSS. The combined model achieved a Top-10 accuracy of 54.03% when tested on 6,471 13C NMR spectra. In comparison, CReSS with a random reference structure library achieved an accuracy of only 9.17%. Furthermore, to expand the advantages of the focused libraries, we proposed SAmpRNN, which is a recurrent neural network (RNN). With the large focused library amplified by SAmpRNN, the structure-identification accuracy of the model increased in 70.0% of the 30 random example cases. In general, cross-modal retrieval between 13C NMR spectra and structures based on focused libraries (CFLS) achieved high accuracy and provided more accurate candidate structures than traditional libraries for compound identification.
Affiliation(s)
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
  - Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
  - Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China
10
|
Wallach I, Bernard D, Nguyen K, Ho G, Morrison A, Stecula A, Rosnik A, O’Sullivan AM, Davtyan A, Samudio B, Thomas B, Worley B, Butler B, Laggner C, Thayer D, Moharreri E, Friedland G, Truong H, van den Bedem H, Ng HL, Stafford K, Sarangapani K, Giesler K, Ngo L, Mysinger M, Ahmed M, Anthis NJ, Henriksen N, Gniewek P, Eckert S, de Oliveira S, Suterwala S, PrasadPrasad SVK, Shek S, Contreras S, Hare S, Palazzo T, O’Brien TE, Van Grack T, Williams T, Chern TR, Kenyon V, Lee AH, Cann AB, Bergman B, Anderson BM, Cox BD, Warrington JM, Sorenson JM, Goldenberg JM, Young MA, DeHaan N, Pemberton RP, Schroedl S, Abramyan TM, Gupta T, Mysore V, Presser AG, Ferrando AA, Andricopulo AD, Ghosh A, Ayachi AG, Mushtaq A, Shaqra AM, Toh AKL, Smrcka AV, Ciccia A, de Oliveira AS, Sverzhinsky A, de Sousa AM, Agoulnik AI, Kushnir A, Freiberg AN, Statsyuk AV, Gingras AR, Degterev A, Tomilov A, Vrielink A, Garaeva AA, Bryant-Friedrich A, Caflisch A, Patel AK, Rangarajan AV, Matheeussen A, Battistoni A, Caporali A, Chini A, Ilari A, Mattevi A, Foote AT, Trabocchi A, Stahl A, Herr AB, Berti A, Freywald A, Reidenbach AG, Lam A, Cuddihy AR, White A, Taglialatela A, Ojha AK, Cathcart AM, Motyl AAL, Borowska A, D’Antuono A, Hirsch AKH, Porcelli AM, Minakova A, Montanaro A, Müller A, Fiorillo A, Virtanen A, O’Donoghue AJ, Del Rio Flores A, Garmendia AE, Pineda-Lucena A, Panganiban AT, Samantha A, Chatterjee AK, Haas AL, Paparella AS, John ALS, Prince A, ElSheikh A, Apfel AM, Colomba A, O’Dea A, Diallo BN, Ribeiro BMRM, Bailey-Elkin BA, Edelman BL, Liou B, Perry B, Chua BSK, Kováts B, Englinger B, Balakrishnan B, Gong B, Agianian B, Pressly B, Salas BPM, Duggan BM, Geisbrecht BV, Dymock BW, Morten BC, Hammock BD, Mota BEF, Dickinson BC, Fraser C, Lempicki C, Novina CD, Torner C, Ballatore C, Bon C, Chapman CJ, Partch CL, Chaton CT, Huang C, Yang CY, Kahler CM, Karan C, Keller C, Dieck CL, Huimei C, Liu C, Peltier C, Mantri CK, Kemet CM, Müller CE, Weber C, Zeina CM, Muli CS, Morisseau C, Alkan 
C, Reglero C, Loy CA, Wilson CM, Myhr C, Arrigoni C, Paulino C, Santiago C, Luo D, Tumes DJ, Keedy DA, Lawrence DA, Chen D, Manor D, Trader DJ, Hildeman DA, Drewry DH, Dowling DJ, Hosfield DJ, Smith DM, Moreira D, Siderovski DP, Shum D, Krist DT, Riches DWH, Ferraris DM, Anderson DH, Coombe DR, Welsbie DS, Hu D, Ortiz D, Alramadhani D, Zhang D, Chaudhuri D, Slotboom DJ, Ronning DR, Lee D, Dirksen D, Shoue DA, Zochodne DW, Krishnamurthy D, Duncan D, Glubb DM, Gelardi ELM, Hsiao EC, Lynn EG, Silva EB, Aguilera E, Lenci E, Abraham ET, Lama E, Mameli E, Leung E, Giles E, Christensen EM, Mason ER, Petretto E, Trakhtenberg EF, Rubin EJ, Strauss E, Thompson EW, Cione E, Lisabeth EM, Fan E, Kroon EG, Jo E, García-Cuesta EM, Glukhov E, Gavathiotis E, Yu F, Xiang F, Leng F, Wang F, Ingoglia F, van den Akker F, Borriello F, Vizeacoumar FJ, Luh F, Buckner FS, Vizeacoumar FS, Bdira FB, Svensson F, Rodriguez GM, Bognár G, Lembo G, Zhang G, Dempsey G, Eitzen G, Mayer G, Greene GL, Garcia GA, Lukacs GL, Prikler G, Parico GCG, Colotti G, De Keulenaer G, Cortopassi G, Roti G, Girolimetti G, Fiermonte G, Gasparre G, Leuzzi G, Dahal G, Michlewski G, Conn GL, Stuchbury GD, Bowman GR, Popowicz GM, Veit G, de Souza GE, Akk G, Caljon G, Alvarez G, Rucinski G, Lee G, Cildir G, Li H, Breton HE, Jafar-Nejad H, Zhou H, Moore HP, Tilford H, Yuan H, Shim H, Wulff H, Hoppe H, Chaytow H, Tam HK, Van Remmen H, Xu H, Debonsi HM, Lieberman HB, Jung H, Fan HY, Feng H, Zhou H, Kim HJ, Greig IR, Caliandro I, Corvo I, Arozarena I, Mungrue IN, Verhamme IM, Qureshi IA, Lotsaris I, Cakir I, Perry JJP, Kwiatkowski J, Boorman J, Ferreira J, Fries J, Kratz JM, Miner J, Siqueira-Neto JL, Granneman JG, Ng J, Shorter J, Voss JH, Gebauer JM, Chuah J, Mousa JJ, Maynes JT, Evans JD, Dickhout J, MacKeigan JP, Jossart JN, Zhou J, Lin J, Xu J, Wang J, Zhu J, Liao J, Xu J, Zhao J, Lin J, Lee J, Reis J, Stetefeld J, Bruning JB, Bruning JB, Coles JG, Tanner JJ, Pascal JM, So J, Pederick JL, Costoya JA, Rayman JB, Maciag 
JJ, Nasburg JA, Gruber JJ, Finkelstein JM, Watkins J, Rodríguez-Frade JM, Arias JAS, Lasarte JJ, Oyarzabal J, Milosavljevic J, Cools J, Lescar J, Bogomolovas J, Wang J, Kee JM, Kee JM, Liao J, Sistla JC, Abrahão JS, Sishtla K, Francisco KR, Hansen KB, Molyneaux KA, Cunningham KA, Martin KR, Gadar K, Ojo KK, Wong KS, Wentworth KL, Lai K, Lobb KA, Hopkins KM, Parang K, Machaca K, Pham K, Ghilarducci K, Sugamori KS, McManus KJ, Musta K, Faller KME, Nagamori K, Mostert KJ, Korotkov KV, Liu K, Smith KS, Sarosiek K, Rohde KH, Kim KK, Lee KH, Pusztai L, Lehtiö L, Haupt LM, Cowen LE, Byrne LJ, Su L, Wert-Lamas L, Puchades-Carrasco L, Chen L, Malkas LH, Zhuo L, Hedstrom L, Hedstrom L, Walensky LD, Antonelli L, Iommarini L, Whitesell L, Randall LM, Fathallah MD, Nagai MH, Kilkenny ML, Ben-Johny M, Lussier MP, Windisch MP, Lolicato M, Lolli ML, Vleminckx M, Caroleo MC, Macias MJ, Valli M, Barghash MM, Mellado M, Tye MA, Wilson MA, Hannink M, Ashton MR, Cerna MVC, Giorgis M, Safo MK, Maurice MS, McDowell MA, Pasquali M, Mehedi M, Serafim MSM, Soellner MB, Alteen MG, Champion MM, Skorodinsky M, O’Mara ML, Bedi M, Rizzi M, Levin M, Mowat M, Jackson MR, Paige M, Al-Yozbaki M, Giardini MA, Maksimainen MM, De Luise M, Hussain MS, Christodoulides M, Stec N, Zelinskaya N, Van Pelt N, Merrill NM, Singh N, Kootstra NA, Singh N, Gandhi NS, Chan NL, Trinh NM, Schneider NO, Matovic N, Horstmann N, Longo N, Bharambe N, Rouzbeh N, Mahmoodi N, Gumede NJ, Anastasio NC, Khalaf NB, Rabal O, Kandror O, Escaffre O, Silvennoinen O, Bishop OT, Iglesias P, Sobrado P, Chuong P, O’Connell P, Martin-Malpartida P, Mellor P, Fish PV, Moreira POL, Zhou P, Liu P, Liu P, Wu P, Agogo-Mawuli P, Jones PL, Ngoi P, Toogood P, Ip P, von Hundelshausen P, Lee PH, Rowswell-Turner RB, Balaña-Fouce R, Rocha REO, Guido RVC, Ferreira RS, Agrawal RK, Harijan RK, Ramachandran R, Verma R, Singh RK, Tiwari RK, Mazitschek R, Koppisetti RK, Dame RT, Douville RN, Austin RC, Taylor RE, Moore RG, Ebright RH, Angell RM, Yan R, 
Kejriwal R, Batey RA, Blelloch R, Vandenberg RJ, Hickey RJ, Kelm RJ, Lake RJ, Bradley RK, Blumenthal RM, Solano R, Gierse RM, Viola RE, McCarthy RR, Reguera RM, Uribe RV, do Monte-Neto RL, Gorgoglione R, Cullinane RT, Katyal S, Hossain S, Phadke S, Shelburne SA, Geden SE, Johannsen S, Wazir S, Legare S, Landfear SM, Radhakrishnan SK, Ammendola S, Dzhumaev S, Seo SY, Li S, Zhou S, Chu S, Chauhan S, Maruta S, Ashkar SR, Shyng SL, Conticello SG, Buroni S, Garavaglia S, White SJ, Zhu S, Tsimbalyuk S, Chadni SH, Byun SY, Park S, Xu SQ, Banerjee S, Zahler S, Espinoza S, Gustincich S, Sainas S, Celano SL, Capuzzi SJ, Waggoner SN, Poirier S, Olson SH, Marx SO, Van Doren SR, Sarilla S, Brady-Kalnay SM, Dallman S, Azeem SM, Teramoto T, Mehlman T, Swart T, Abaffy T, Akopian T, Haikarainen T, Moreda TL, Ikegami T, Teixeira TR, Jayasinghe TD, Gillingwater TH, Kampourakis T, Richardson TI, Herdendorf TJ, Kotzé TJ, O’Meara TR, Corson TW, Hermle T, Ogunwa TH, Lan T, Su T, Banjo T, O’Mara TA, Chou T, Chou TF, Baumann U, Desai UR, Pai VP, Thai VC, Tandon V, Banerji V, Robinson VL, Gunasekharan V, Namasivayam V, Segers VFM, Maranda V, Dolce V, Maltarollo VG, Scoffone VC, Woods VA, Ronchi VP, Van Hung Le V, Clayton WB, Lowther WT, Houry WA, Li W, Tang W, Zhang W, Van Voorhis WC, Donaldson WA, Hahn WC, Kerr WG, Gerwick WH, Bradshaw WJ, Foong WE, Blanchet X, Wu X, Lu X, Qi X, Xu X, Yu X, Qin X, Wang X, Yuan X, Zhang X, Zhang YJ, Hu Y, Aldhamen YA, Chen Y, Li Y, Sun Y, Zhu Y, Gupta YK, Pérez-Pertejo Y, Li Y, Tang Y, He Y, Tse-Dinh YC, Sidorova YA, Yen Y, Li Y, Frangos ZJ, Chung Z, Su Z, Wang Z, Zhang Z, Liu Z, Inde Z, Artía Z, Heifets A. AI is a viable alternative to high throughput screening: a 318-target study. Sci Rep 2024; 14:7526. 
[PMID: 38565852 PMCID: PMC10987645 DOI: 10.1038/s41598-024-54655-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 02/15/2024] [Indexed: 04/04/2024] Open
Abstract
High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires physical compounds, which limits coverage of accessible chemical space. Computational approaches combined with vast on-demand chemical libraries can access far greater chemical space, provided that the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic area and protein class. We address historical limitations of computational screening by demonstrating success for target proteins without known binders, high-quality X-ray crystal structures, or manual cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical results suggest that computational methods can substantially replace HTS as the first step of small-molecule drug discovery.
11
Lin YC, Chien WC, Wang YX, Wang YH, Yang FS, Tseng LP, Hung JH. PS2MS: A Deep Learning-Based Prediction System for Identifying New Psychoactive Substances Using Mass Spectrometry. Anal Chem 2024; 96:4835-4844. [PMID: 38488022 PMCID: PMC10974679 DOI: 10.1021/acs.analchem.3c05019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/12/2024] [Accepted: 03/04/2024] [Indexed: 03/27/2024]
Abstract
The rapid proliferation of new psychoactive substances (NPS) poses significant challenges to conventional mass-spectrometry-based identification methods due to the absence of reference spectra for these emerging substances. This paper introduces PS2MS, an AI-powered predictive system designed specifically to address the limitations of identifying the emergence of unidentified novel illicit drugs. PS2MS builds a synthetic NPS database by enumerating feasible derivatives of known substances and uses deep learning to generate mass spectra and chemical fingerprints. When the mass spectrum of an analyte does not match any known reference, PS2MS simultaneously examines the chemical fingerprint and mass spectrum against the putative NPS database using integrated metrics to deduce possible identities. Experimental results affirm the effectiveness of PS2MS in identifying cathinone derivatives within real evidence specimens, signifying its potential for practical use in identifying emerging drugs of abuse for researchers and forensic experts.
Affiliation(s)
- Yi-Ching Lin
  - Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Department of Laboratory Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Doctoral Degree Program of Toxicology, College of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Wei-Chen Chien
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
- Yu-Xuan Wang
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
- Ying-Hau Wang
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
- Feng-Shuo Yang
  - Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
  - Department of Medicinal and Applied Chemistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Li-Ping Tseng
  - Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Jui-Hung Hung
  - Department of Computer Science, National Yang Ming Chiao Tung University, HsinChu 300, Taiwan
  - Program in Biomedical Artificial Intelligence, National Tsing Hua University, HsinChu 300, Taiwan
12
Lee KH, Won SJ, Oyinloye P, Shi L. Unlocking the Potential of High-Quality Dopamine Transporter Pharmacological Data: Advancing Robust Machine Learning-Based QSAR Modeling. bioRxiv 2024:2024.03.06.583803. [PMID: 38558976 PMCID: PMC10979915 DOI: 10.1101/2024.03.06.583803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The dopamine transporter (DAT) plays a critical role in the central nervous system and has been implicated in numerous psychiatric disorders. Ligand-based approaches, especially quantitative structure-activity relationship (QSAR) modeling, are instrumental in deciphering the structure-activity relationship (SAR) of DAT ligands. By gathering and analyzing data from literature and databases, we systematically assemble a diverse range of ligands binding to DAT, aiming to discern the general features of DAT ligands and uncover the chemical space for potential novel DAT ligand scaffolds. The aggregation of DAT pharmacological activity data, particularly from databases like ChEMBL, provides a foundation for constructing robust QSAR models. The compilation and meticulous filtering of these data, establishing high-quality training datasets with specific divisions of pharmacological assays and data types, along with the application of QSAR modeling, prove to be a promising strategy for navigating the pertinent chemical space. Through a systematic comparison of DAT QSAR models using training datasets from various ChEMBL releases, we underscore the positive impact of enhanced data set quality and increased data set size on the predictive power of DAT QSAR models.
Affiliation(s)
- Kuo Hao Lee
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Sung Joon Won
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Precious Oyinloye
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Lei Shi
  - Computational Chemistry and Molecular Biophysics Section, Molecular Targets and Medications Discovery Branch, National Institute on Drug Abuse – Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
13
Hackman L, Mack P, Ménard H. Behind every good research there are data. What are they and their importance to forensic science. Forensic Sci Int Synerg 2024; 8:100456. [PMID: 38362142 PMCID: PMC10867567 DOI: 10.1016/j.fsisyn.2024.100456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 02/17/2024]
Abstract
Data underpinning science have become one of the most precious assets in research, and while the principles of FAIR (Findable, Accessible, Interoperable and Reusable) have been put forward as a guide to how to approach data handling, data sharing and long-term storage still remain a challenge for many research areas including forensic science. The reporting and the sharing of data can be made easier by giving them structure, the use of suitable labels and the inclusion of descriptors collated into metadata prior to their deposition in repositories with persistent identifiers. Such a systematic approach would strengthen the quality and the integrity of research while providing greater transparency to published materials.
Affiliation(s)
- Lucina Hackman
  - Leverhulme Research Centre for Forensic Science, University of Dundee, Nethergate, Dundee, DD1 4HN, UK
- Pauline Mack
  - Leverhulme Research Centre for Forensic Science, University of Dundee, Nethergate, Dundee, DD1 4HN, UK
- Hervé Ménard
  - Leverhulme Research Centre for Forensic Science, University of Dundee, Nethergate, Dundee, DD1 4HN, UK
14
Wang F, Pasin D, Skinnider MA, Liigand J, Kleis JN, Brown D, Oler E, Sajed T, Gautam V, Harrison S, Greiner R, Foster LJ, Dalsgaard PW, Wishart DS. Deep Learning-Enabled MS/MS Spectrum Prediction Facilitates Automated Identification Of Novel Psychoactive Substances. Anal Chem 2023; 95:18326-18334. [PMID: 38048435 PMCID: PMC10733899 DOI: 10.1021/acs.analchem.3c02413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 11/10/2023] [Accepted: 11/13/2023] [Indexed: 12/06/2023]
Abstract
The market for illicit drugs has been reshaped by the emergence of more than 1100 new psychoactive substances (NPS) over the past decade, posing a major challenge to the forensic and toxicological laboratories tasked with detecting and identifying them. Tandem mass spectrometry (MS/MS) is the primary method used to screen for NPS within seized materials or biological samples. The most contemporary workflows necessitate labor-intensive and expensive MS/MS reference standards, which may not be available for recently emerged NPS on the illicit market. Here, we present NPS-MS, a deep learning method capable of accurately predicting the MS/MS spectra of known and hypothesized NPS from their chemical structures alone. NPS-MS is trained by transfer learning from a generic MS/MS prediction model on a large data set of MS/MS spectra. We show that this approach enables a more accurate identification of NPS from experimentally acquired MS/MS spectra than any existing method. We demonstrate the application of NPS-MS to identify a novel derivative of phencyclidine (PCP) within an unknown powder seized in Denmark without the use of any reference standards. We anticipate that NPS-MS will allow forensic laboratories to identify more rapidly both known and newly emerging NPS. NPS-MS is available as a web server at https://nps-ms.ca/, which provides MS/MS spectra prediction capabilities for given NPS compounds. Additionally, it offers MS/MS spectra identification against a vast database comprising approximately 8.7 million predicted NPS compounds from DarkNPS and 24.5 million predicted ESI-QToF-MS/MS spectra for these compounds.
Affiliation(s)
- Fei Wang
  - Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada
  - Alberta Machine Intelligence Institute, Edmonton, Alberta T5J 3B1, Canada
- Daniel Pasin
  - Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen 2100, Denmark
- Michael A. Skinnider
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
  - Ludwig Institute for Cancer Research, Princeton University, Princeton, New Jersey 08544, United States
- Jaanus Liigand
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
  - Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
- Jan-Niklas Kleis
  - Institute of Forensic Medicine, Forensic Toxicology, Johannes Gutenberg University Mainz, Mainz 55131, Germany
- David Brown
  - Forensic Science Laboratory, ChemCentre, Bentley, Western Australia 6102, Australia
  - School of Molecular and Life Sciences, Curtin University, Bentley, Western Australia 6009, Australia
- Eponine Oler
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Tanvir Sajed
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Vasuk Gautam
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- Stephen Harrison
  - Forensic Science Laboratory, ChemCentre, Bentley, Western Australia 6102, Australia
- Russell Greiner
  - Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada
  - Alberta Machine Intelligence Institute, Edmonton, Alberta T5J 3B1, Canada
- Leonard J. Foster
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia V6T 2A1, Canada
- Petur Weihe Dalsgaard
  - Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen 2100, Denmark
- David S. Wishart
  - Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
  - Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, Alberta T6G 1C9, Canada
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta T6G 2C8, Canada
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
15
Matey JM, Zapata F, Menéndez-Quintanal LM, Montalvo G, García-Ruiz C. Identification of new psychoactive substances and their metabolites using non-targeted detection with high-resolution mass spectrometry through diagnosing fragment ions/neutral loss analysis. Talanta 2023; 265:124816. [PMID: 37423179 DOI: 10.1016/j.talanta.2023.124816] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 04/24/2023] [Accepted: 06/12/2023] [Indexed: 07/11/2023]
Affiliation(s)
- José Manuel Matey
  - Department of Chemistry and Drugs, National Institute of Toxicology and Forensic Sciences, C/ José Echegaray Nº4, 28232, Las Rozas de Madrid, Madrid, Spain
  - Universidad de Alcalá, Instituto Universitario de Investigación en Ciencias Policiales (IUICP), calle Libreros 27, 28801, Alcalá de Henares, Madrid, España
  - Chemical and Forensic Sciences (CINQUIFOR) Research Group, University of Alcalá, Ctra. Madrid-Barcelona km 33.600, 28871, Alcalá de Henares, Madrid, Spain
- Félix Zapata
  - Department of Analytical Chemistry, University of Murcia, Campus Espinardo, 30100, Murcia, Spain
- Luis Manuel Menéndez-Quintanal
  - Department of Chemistry and Drugs, National Institute of Toxicology and Forensic Sciences, Campus de Ciencias de la Salud, La Cuesta, 38320, La Laguna (Sta. Cruz de Tenerife), Spain
- Gemma Montalvo
  - Universidad de Alcalá, Instituto Universitario de Investigación en Ciencias Policiales (IUICP), calle Libreros 27, 28801, Alcalá de Henares, Madrid, España
  - Chemical and Forensic Sciences (CINQUIFOR) Research Group, University of Alcalá, Ctra. Madrid-Barcelona km 33.600, 28871, Alcalá de Henares, Madrid, Spain
  - Universidad de Alcalá, Departamento de Química Analítica, Química Física e Ingeniería Química, Ctra. Madrid-Barcelona km 33,6, 28871 Alcalá de Henares, Madrid, España
- Carmen García-Ruiz
  - Universidad de Alcalá, Instituto Universitario de Investigación en Ciencias Policiales (IUICP), calle Libreros 27, 28801, Alcalá de Henares, Madrid, España
  - Chemical and Forensic Sciences (CINQUIFOR) Research Group, University of Alcalá, Ctra. Madrid-Barcelona km 33.600, 28871, Alcalá de Henares, Madrid, Spain
  - Universidad de Alcalá, Departamento de Química Analítica, Química Física e Ingeniería Química, Ctra. Madrid-Barcelona km 33,6, 28871 Alcalá de Henares, Madrid, España
16
Skinnider MA, Mérette SAM, Pasin D, Rogalski J, Foster LJ, Scheuermeyer F, Shapiro AM. Identification of Emerging Novel Psychoactive Substances by Retrospective Analysis of Population-Scale Mass Spectrometry Data Sets. Anal Chem 2023; 95:17300-17310. [PMID: 37966487 DOI: 10.1021/acs.analchem.3c03451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Over the last two decades, hundreds of new psychoactive substances (NPSs), also known as "designer drugs", have emerged on the illicit drug market. The toxic and potentially fatal effects of these compounds oblige laboratories around the world to screen for NPSs in seized materials and biological samples, commonly using high-resolution mass spectrometry. However, unambiguous identification of an NPS by mass spectrometry requires comparison to data from analytical reference materials, acquired on the same instrument. The sheer number of NPSs that are available on the illicit market, and the pace at which new compounds are introduced, mean that forensic laboratories must make difficult decisions about which reference materials to acquire. Here, we asked whether retrospective suspect screening of population-scale mass spectrometry data could provide a data-driven platform to prioritize emerging NPSs for assay development. We curated a suspect database of precursor and diagnostic fragment ion masses for 83 emerging NPSs and used this database to retrospectively screen mass spectrometry data from 12,727 urine drug screens from one Canadian province. We developed integrative computational strategies to prioritize the most reliable identifications and tracked the frequency of these identifications over a 3 year study period between August 2019 and August 2022. The resulting data were used to guide the acquisition of new reference materials, which were in turn used to validate a subset of the retrospective identifications. Last, we took advantage of matching clinical reports for all 12,727 samples to systematically benchmark the accuracy of our retrospective data analysis approach. Our work opens up new avenues to enable the rapid detection of emerging illicit drugs through large-scale reanalysis of mass spectrometry data.
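To make the idea of suspect screening against precursor and diagnostic fragment ion masses concrete, the sketch below shows one simple way such a lookup could work. It is an illustrative assumption, not the authors' pipeline: the `Suspect` record, the 5 ppm tolerance, the two-fragment minimum, and every m/z value are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Suspect:
    """One suspect-database entry (all values here are hypothetical)."""
    name: str
    precursor_mz: float
    fragment_mzs: Tuple[float, ...]


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def screen_spectrum(
    precursor_mz: float,
    fragment_mzs: Sequence[float],
    suspects: Sequence[Suspect],
    tol_ppm: float = 5.0,      # assumed mass tolerance
    min_fragments: int = 2,    # assumed minimum number of diagnostic fragments
) -> List[Tuple[str, int]]:
    """Return (suspect name, matched fragment count), best match first."""
    hits = []
    for suspect in suspects:
        # The precursor mass must match before fragments are considered.
        if not within_ppm(precursor_mz, suspect.precursor_mz, tol_ppm):
            continue
        matched = sum(
            1
            for ref in suspect.fragment_mzs
            if any(within_ppm(obs, ref, tol_ppm) for obs in fragment_mzs)
        )
        if matched >= min_fragments:
            hits.append((suspect.name, matched))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy suspect list and one observed MS/MS spectrum (made-up masses).
database = [
    Suspect("suspect_A", 300.1000, (150.0500, 100.0300, 77.0390)),
    Suspect("suspect_B", 310.2000, (150.0500, 100.0300)),
]
hits = screen_spectrum(300.1009, [150.0504, 100.0301, 91.0540], database)
no_hits = screen_spectrum(250.0000, [150.0504, 100.0301], database)
```

Requiring both a precursor match and several diagnostic fragments is one plausible way to reduce false positives; the abstract only says that integrative strategies were used to prioritize the most reliable identifications and does not specify them.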
Affiliation(s)
- Michael A Skinnider
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States
  - Ludwig Institute for Cancer Research, Princeton University, Princeton, New Jersey 08544, United States
- Sandrine A M Mérette
  - Provincial Toxicology Centre, Provincial Health Services Authority, Vancouver, British Columbia V5Z 4R4, Canada
- Daniel Pasin
  - Forensic Laboratory Division, Office of the Chief Medical Examiner, San Francisco, California 94124, United States
- Jason Rogalski
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Leonard J Foster
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Frank Scheuermeyer
  - Department of Emergency Medicine, St. Paul's Hospital and the University of British Columbia, Vancouver, British Columbia V6Z 1Y6, Canada
  - Centre for Health Evaluation and Outcome Sciences, St. Paul's Hospital, Vancouver, British Columbia V6Z 1Y6, Canada
- Aaron M Shapiro
  - Provincial Toxicology Centre, Provincial Health Services Authority, Vancouver, British Columbia V5Z 4R4, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 2B5, Canada
17
Skinnider MA. Hallucinating hallucinogens. Science 2023; 382:656-657. [PMID: 37943903 DOI: 10.1126/science.adk8626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
Fighting the designer drug epidemic with generative AI.
Affiliation(s)
- Michael A Skinnider
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
  - Ludwig Institute for Cancer Research, Princeton University, Princeton, NJ, USA
18
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Collapse
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China

19
Tay DWP, Yeo NZX, Adaikkappan K, Lim YH, Ang SJ. 67 million natural product-like compound database generated via molecular language processing. Sci Data 2023; 10:296. [PMID: 37208372 DOI: 10.1038/s41597-023-02207-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 04/21/2023] [Indexed: 05/21/2023]
Abstract
Natural products are a rich resource of bioactive compounds for valuable applications across multiple fields such as food, agriculture, and medicine. For natural product discovery, high throughput in silico screening offers a cost-effective alternative to traditional resource-heavy assay-guided exploration of structurally novel chemical space. In this data descriptor, we report a characterized database of 67,064,204 natural product-like molecules generated using a recurrent neural network trained on known natural products, demonstrating a significant 165-fold expansion in library size over the approximately 400,000 known natural products. This study highlights the potential of using deep generative models to explore novel natural product chemical space for high throughput in silico discovery.
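The fold expansion quoted in this abstract can be checked with a line of arithmetic. The short Python sketch below uses only the two counts given in the abstract; the variable names are illustrative. Because the abstract rounds the known set to "approximately 400,000", the ratio computed from the rounded figure differs slightly from the reported 165-fold value, and the second calculation shows what known-set size the reported figure would imply.

```python
# Counts quoted in the abstract above.
generated = 67_064_204   # natural product-like molecules in the database
known_approx = 400_000   # "approximately 400,000" known natural products
reported_fold = 165      # fold expansion stated in the abstract

# Fold expansion when the rounded count of known natural products is used.
fold_from_rounded = generated / known_approx

# Size of the known set that the reported 165-fold figure would imply.
implied_known = generated / reported_fold

print(f"fold using rounded count: {fold_from_rounded:.1f}")
print(f"known set implied by the reported fold: {implied_known:,.0f}")
```

The two results are consistent with each other once the rounding of the known-set size is taken into account.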
Affiliation(s)
- Dillon W P Tay
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
- Naythan Z X Yeo
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
  - Hwa Chong Institution, 661 Bukit Timah Road, Singapore, 269734, Republic of Singapore
- Krishnan Adaikkappan
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
  - National Junior College, 37 Hillcrest Road, Singapore, 288913, Republic of Singapore
- Yee Hwee Lim
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore, 117597, Republic of Singapore
- Shi Jun Ang
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros Building, Singapore, 138665, Republic of Singapore
  - Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore, 138632, Republic of Singapore

20
Kou X, Shi P, Gao C, Ma P, Xing H, Ke Q, Zhang D. Data-Driven Elucidation of Flavor Chemistry. J Agric Food Chem 2023; 71:6789-6802. [PMID: 37102791 PMCID: PMC10176570 DOI: 10.1021/acs.jafc.3c00909] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/11/2023]
Abstract
Flavor molecules are commonly used in the food industry to enhance product quality and consumer experiences but are associated with potential human health risks, highlighting the need for safer alternatives. To address these health-associated challenges and promote reasonable application, several databases for flavor molecules have been constructed. However, no existing studies have comprehensively summarized these data resources according to quality, focused fields, and potential gaps. Here, we systematically summarized 25 flavor molecule databases published within the last 20 years and revealed that data inaccessibility, untimely updates, and nonstandard flavor descriptions are the main limitations of current studies. We examined the development of computational approaches (e.g., machine learning and molecular simulation) for the identification of novel flavor molecules and discussed their major challenges regarding throughput, model interpretability, and the lack of gold-standard data sets for equitable model evaluation. Additionally, we discussed future strategies for the mining and designing of novel flavor molecules based on multi-omics and artificial intelligence to provide a new foundation for flavor science research.
Affiliation(s)
- Xingran Kou
  - Collaborative Innovation Center of Fragrance Flavour and Cosmetics, School of Perfume and Aroma Technology, Shanghai Institute of Technology, Shanghai 201418, China
- Peiqin Shi
  - Collaborative Innovation Center of Fragrance Flavour and Cosmetics, School of Perfume and Aroma Technology, Shanghai Institute of Technology, Shanghai 201418, China
- Chukun Gao
  - Laboratory for Physical Chemistry, ETH Zürich, 8093 Zürich, Switzerland
- Peihua Ma
  - Department of Nutrition and Food Science, University of Maryland, College Park, Maryland 20742, United States
- Huadong Xing
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Qinfei Ke
  - Collaborative Innovation Center of Fragrance Flavour and Cosmetics, School of Perfume and Aroma Technology, Shanghai Institute of Technology, Shanghai 201418, China
- Dachuan Zhang
  - National Centre of Competence in Research (NCCR) Catalysis, Institute of Environmental Engineering, ETH Zürich, 8093 Zürich, Switzerland

21
Seo S, Lim J, Kim WY. Molecular Generative Model via Retrosynthetically Prepared Chemical Building Block Assembly. Adv Sci (Weinh) 2023; 10:e2206674. [PMID: 36596675 PMCID: PMC10015872 DOI: 10.1002/advs.202206674] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/14/2022] [Indexed: 06/17/2023]
Abstract
Deep generative models are attracting attention as a smart molecular design strategy. However, previous models often render molecules with low synthesizability, hindering their real-world applications. Here, a novel graph-based conditional generative model which makes molecules by tailoring retrosynthetically prepared chemical building blocks until achieving target properties in an auto-regressive fashion is proposed. This strategy improves the synthesizability and property control of the resulting molecules and also helps learn how to select appropriate building blocks and bind them together to achieve target properties. By applying a negative sampling method to the selection process of building blocks, this model overcame a critical limitation of previous fragment-based models, which can only use molecules from the training set during generation. As a result, the model works equally well with unseen building blocks without sacrificing computational efficiency. It is demonstrated that the model can generate potential inhibitors with high docking scores against the 3CL protease of SARS-COV-2.
Affiliation(s)
- Seonghwan Seo
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jaechang Lim
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Woo Youn Kim
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea

22
Heinsvig PJ, Noble C, Dalsgaard PW, Mardal M. Forensic drug screening by liquid chromatography hyphenated with high-resolution mass spectrometry (LC-HRMS). Trends Analyt Chem 2023. [DOI: 10.1016/j.trac.2023.117023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]

23
Yang Y, Liu D, Hua Z, Xu P, Wang Y, Di B, Liao J, Su M. Machine Learning-Assisted Rapid Screening of Four Types of New Psychoactive Substances in Drug Seizures. J Chem Inf Model 2023; 63:815-825. [PMID: 36645156 DOI: 10.1021/acs.jcim.2c01342] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/17/2023]
Abstract
Over the past few years, new psychoactive substances (NPS) have become a global health and social problem because of their wide variety, constant structural renewal, vague legal definitions, and rapid adaptation to legal restrictions. The rapid structural modifications of NPS have posed significant challenges for the screening and identification of these new substances using traditional mass spectrometric techniques based on reference substances or a mass spectral database. Here, we propose supervised machine learning (ML) classification models such as k-nearest neighbors, support vector machine, random forest, and multigrained cascade forest for the rapid screening of NPS using mass spectrometric data. This approach utilizes ML methods to learn the statistical probability distributions of mass spectral data for NPS and non-NPS. Four classification ML models were generated and evaluated using a data set comprising 567 LC-MS and 732 GC-MS spectra. Through cross validation, we achieved an F1 score of 0.35-0.97. These algorithms were applied in conjunction with mass spectrometry techniques for the detection of six seizures including electronic cigarette oil and suspected powdered substances netted in drug trafficking cases. The models provided warning signals for synthetic cannabinoids, synthetic cathinones, and fentanyl. Thus, an early warning system was successfully established, which provided a useful method for reliable and effective identifications of unknown NPS.
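For readers less familiar with the F1 score reported in this abstract, the following minimal Python sketch shows how it is computed from confusion-matrix counts as the harmonic mean of precision and recall. The function name and the counts are hypothetical and chosen only to land on the two ends of the reported 0.35-0.97 range; they are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of flagged spectra that are truly NPS
    recall = tp / (tp + fn)     # fraction of true NPS spectra that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, invented for illustration only.
strong = f1_score(tp=97, fp=3, fn=3)   # precision = recall = 0.97
weak = f1_score(tp=7, fp=13, fn=13)    # precision = recall = 0.35

print(f"strong classifier F1: {strong:.2f}")
print(f"weak classifier F1:   {weak:.2f}")
```

Because F1 ignores true negatives, it is a common choice when the class of interest (here NPS spectra) is the minority class.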
Affiliation(s)
- Yuqing Yang
  - School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
  - China National Narcotics Control Commission - China Pharmaceutical University Joint Laboratory on Key Technologies of Narcotics Control, Nanjing 210009, China
- Dongping Liu
  - School of Science, China Pharmaceutical University, Nanjing 210009, China
- Zhendong Hua
  - China National Narcotics Control Commission - China Pharmaceutical University Joint Laboratory on Key Technologies of Narcotics Control, Nanjing 210009, China
  - Key Laboratory of Drug Monitoring and Control, Drug Intelligence and Forensic Center, Ministry of Public Security, Beijing 100741, P. R. China
- Peng Xu
  - China National Narcotics Control Commission - China Pharmaceutical University Joint Laboratory on Key Technologies of Narcotics Control, Nanjing 210009, China
  - Key Laboratory of Drug Monitoring and Control, Drug Intelligence and Forensic Center, Ministry of Public Security, Beijing 100741, P. R. China
- Youmei Wang
  - China National Narcotics Control Commission - China Pharmaceutical University Joint Laboratory on Key Technologies of Narcotics Control, Nanjing 210009, China
  - Key Laboratory of Drug Monitoring and Control, Drug Intelligence and Forensic Center, Ministry of Public Security, Beijing 100741, P. R. China
- Bin Di
  - School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
  - China National Narcotics Control Commission - China Pharmaceutical University Joint Laboratory on Key Technologies of Narcotics Control, Nanjing 210009, China
- Jun Liao
  - School of Science, China Pharmaceutical University, Nanjing 210009, China
- Mengxiang Su
  - School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
  - China National Narcotics Control Commission - China Pharmaceutical University Joint Laboratory on Key Technologies of Narcotics Control, Nanjing 210009, China

24
Winter B, Winter C, Schilling J, Bardow A. A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing. Digit Discov 2022; 1:859-869. [PMID: 36561987 PMCID: PMC9721150 DOI: 10.1039/d2dd00058j] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/10/2022] [Accepted: 09/27/2022] [Indexed: 12/25/2022]
Abstract
The knowledge of mixtures' phase equilibria is crucial in nature and technical chemistry. Phase equilibria calculations of mixtures require activity coefficients. However, experimental data on activity coefficients are often limited due to the high cost of experiments. For an accurate and efficient prediction of activity coefficients, machine learning approaches have been recently developed. However, current machine learning approaches still extrapolate poorly for activity coefficients of unknown molecules. In this work, we introduce a SMILES-to-properties-transformer (SPT), a natural language processing network, to predict binary limiting activity coefficients from SMILES codes. To overcome the limitations of available experimental data, we initially train our network on a large dataset of synthetic data sampled from COSMO-RS (10 million data points) and then fine-tune the model on experimental data (20 870 data points). This training strategy enables the SPT to accurately predict limiting activity coefficients even for unknown molecules, cutting the mean prediction error in half compared to state-of-the-art models for activity coefficient predictions such as COSMO-RS and UNIFACDortmund, and improving on recent machine learning approaches.
Affiliation(s)
- Benedikt Winter
  - Energy and Process System Engineering, ETH Zürich, Tannenstrasse 3, 8092 Zürich, Switzerland
- Johannes Schilling
  - Energy and Process System Engineering, ETH Zürich, Tannenstrasse 3, 8092 Zürich, Switzerland
- André Bardow
  - Energy and Process System Engineering, ETH Zürich, Tannenstrasse 3, 8092 Zürich, Switzerland

25
Zhang Y, Jiang Q, Li L, Li Z, Xu Z, Chen Y, Sun Y, Liu C, Mao Z, Chen F, Li H, Cao Y, Pian C. Predicting the structure of unexplored novel fentanyl analogues by deep learning model. Brief Bioinform 2022; 23:6741166. [PMID: 36184256 DOI: 10.1093/bib/bbac418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 08/21/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022]
Abstract
Fentanyl and its analogues are psychoactive substances, and fentanyl abuse has been a concern for decades. Because the structure of fentanyl is easy to modify, criminals may synthesize new fentanyl analogues to evade supervision. Drug supervision relies on matching structures against a database, and too few fentanyl analogues are included in that database, so it is necessary to identify more potential fentanyl analogues and expand the sample space of fentanyl analogues. In this study, we introduced two deep generative models (SeqGAN and MolGPT) to generate potential fentanyl analogues, and a total of 11 041 valid molecules were obtained. The results showed that the generated molecules not only have a property distribution similar to that of the original data but also include potential fentanyl analogues that are not closely similar to any molecule in the original data. Ten molecules that follow the rules for fentanyl analogues were selected for NMR, MS and IR validation. The results indicated that these molecules are all unreported fentanyl analogues. Furthermore, this study is the first to apply deep learning to the generation of fentanyl analogues; it greatly expands the space of fentanyl analogues that can be explored and supports the supervision of fentanyl.
Affiliation(s)
- Zutan Li
  - Bioinformatics Doctoral Student at Nanjing Agricultural University, China
- Zhihui Xu
  - Researcher in Simcere Diagnostics Co., Ltd, China
- Yuanyuan Chen
  - College of Sciences at Nanjing Agricultural University, China
- Yang Sun
  - Nanjing Medical University, China
- Cheng Liu
  - Department of Forensic Medicine, College of Basic Medical Science at Nanjing Medical University, China
- Zhengsheng Mao
  - Forensic Science Department at Nanjing Medical University, China
- Hualan Li
  - Bioinformatics Master Student at Nanjing Agricultural University, China
- Yue Cao
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- Cong Pian
  - College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China

26
MSNovelist: de novo structure generation from mass spectra. Nat Methods 2022; 19:865-870. [PMID: 35637304 PMCID: PMC9262714 DOI: 10.1038/s41592-022-01486-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 07/06/2021] [Accepted: 04/07/2022] [Indexed: 12/29/2022]
Abstract
Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder–decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS2) spectra. In an evaluation with 3,863 MS2 spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS2 dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds.

27
Klingberg J, Keen B, Cawley A, Pasin D, Fu S. Developments in high-resolution mass spectrometric analyses of new psychoactive substances. Arch Toxicol 2022; 96:949-967. [PMID: 35141767 PMCID: PMC8921034 DOI: 10.1007/s00204-022-03224-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/30/2021] [Accepted: 01/12/2022] [Indexed: 11/17/2022]
Abstract
The proliferation of new psychoactive substances (NPS) has necessitated the development and improvement of current practices for the detection and identification of known NPS and newly emerging derivatives. High-resolution mass spectrometry (HRMS) is quickly becoming the industry standard for these analyses due to its ability to be operated in data-independent acquisition (DIA) modes, allowing for the collection of large amounts of data and enabling retrospective data interrogation as new information becomes available. The increasing popularity of HRMS has also prompted the exploration of new ways to screen for NPS, including broad-spectrum wastewater analysis to identify usage trends in the community and metabolomic-based approaches to examine the effects of drugs of abuse on endogenous compounds. In this paper, the novel applications of HRMS techniques to the analysis of NPS are reviewed. In particular, the development of innovative data analysis and interpretation approaches is discussed, including the application of machine learning and molecular networking to toxicological analyses.
Affiliation(s)
- Joshua Klingberg
  - Australian Racing Forensic Laboratory, Racing NSW, Sydney, NSW, 2000, Australia
- Bethany Keen
  - Centre for Forensic Science, University of Technology Sydney, Broadway, NSW, 2007, Australia
- Adam Cawley
  - Australian Racing Forensic Laboratory, Racing NSW, Sydney, NSW, 2000, Australia
- Daniel Pasin
  - Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark
- Shanlin Fu
  - Centre for Forensic Science, University of Technology Sydney, Broadway, NSW, 2007, Australia

28
Shrivastava AD, Swainston N, Samanta S, Roberts I, Wright Muelas M, Kell DB. MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra. Biomolecules 2021; 11:1793. [PMID: 34944436 PMCID: PMC8699281 DOI: 10.3390/biom11121793] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/24/2021] [Revised: 11/14/2021] [Accepted: 11/27/2021] [Indexed: 12/15/2022]
Abstract
The 'inverse problem' of mass spectrometric molecular identification ('given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came') is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem ('calculate a small molecule's likely fragmentation and hence at least some of its mass spectrum from its structure alone') is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the 'translation' a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the 'true' molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are 'similar' to the top hit. 
In addition to using the 'top hits' directly, we can produce a rank order of these by 'round-tripping' candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to 'learn' millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
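The "translation" framing in this abstract, with a list of spectral peaks as the source language and SMILES as the target, can be illustrated by showing how one training pair might be turned into two token sequences. The Python sketch below is an assumption-laden illustration, not MassGenie's actual encoding, which the abstract does not describe: the function names, the two-decimal rounding of m/z values, the regex-based SMILES tokenisation, and the peak list itself are all hypothetical.

```python
import re

# A common regex-style SMILES tokenisation (an assumption, not MassGenie's own):
# bracket atoms first, then two-letter halogens, two-digit ring closures,
# and finally any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def peaks_to_source_tokens(peaks, decimals=2):
    """Turn (m/z, intensity) pairs into m/z strings sorted by ascending m/z."""
    return [f"{mz:.{decimals}f}" for mz, _ in sorted(peaks)]


def smiles_to_target_tokens(smiles):
    """Split a SMILES string into the tokens a decoder would emit."""
    return SMILES_TOKEN.findall(smiles)


# Illustrative peak list (made-up values, not measured data) paired with a
# SMILES string for caffeine as the target "sentence".
peaks = [(195.0877, 100.0), (138.0662, 45.0), (110.0713, 12.0)]
target = "Cn1cnc2c1c(=O)n(C)c(=O)n2C"

source_tokens = peaks_to_source_tokens(peaks)
target_tokens = smiles_to_target_tokens(target)

print(source_tokens)
print(target_tokens)
```

A sequence-to-sequence model would then be trained to map `source_tokens` to `target_tokens`; the regex keeps multi-character atoms such as `Cl`, `Br`, and bracketed atoms intact so that each chemical symbol is a single token.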
Affiliation(s)
- Aditya Divyakant Shrivastava
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Ivayla Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Marina Wright Muelas
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kongens Lyngby, Denmark