1
Sudo R, Asakura T, Ishikawa T, Hatakeyama R, Fujiwara A, Inoue K, Mochida K, Nomura K. Transcriptome analysis of the Japanese eel (Anguilla japonica) during larval metamorphosis. BMC Genomics 2024; 25:585. [PMID: 38862878 PMCID: PMC11165803 DOI: 10.1186/s12864-024-10459-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 05/27/2024] [Indexed: 06/13/2024] [Open]
Abstract
BACKGROUND Anguillid eels spend their larval period as leptocephalus larvae that have a unique and specialized body form with leaf-like and transparent features, and they undergo drastic metamorphosis to juvenile glass eels. Less is known about the transition of leptocephali to the glass eel stage, because it is difficult to catch the metamorphosing larvae in the open ocean. However, recent advances in rearing techniques for the Japanese eel have made it possible to study the larval metamorphosis of anguillid eels. In the present study, we investigated the dynamics of gene expression during the metamorphosis of Japanese eel leptocephali using RNA sequencing. RESULTS During metamorphosis, Japanese eels were classified into 7 developmental stages according to their morphological characteristics, and RNA sequencing was used to collect gene expression data from each stage. A total of 354.8 million clean reads were generated from the body and 365.5 million from the head, after the processing of raw reads. For filtering of genes that characterize developmental stages, a classification model created by a Random Forest algorithm was built. Using the importance of explanatory variables feature obtained from the created model, we identified 46 genes selected in the body and 169 genes selected in the head that were defined as the "most characteristic genes" during eel metamorphosis. Next, network analysis and subsequently gene clustering were conducted using the most characteristic genes and their correlated genes, and then 6 clusters in the body and 5 clusters in the head were constructed. Then, the characteristics of the clusters were revealed by Gene Ontology (GO) enrichment analysis. The expression patterns and GO terms of each stage were consistent with previous observations and experiments during the larval metamorphosis of the Japanese eel. CONCLUSION Genome and transcriptome resources have been generated for metamorphosing Japanese eels. 
Genes that characterized metamorphosis of the Japanese eel were identified through statistical modeling by a Random Forest algorithm. The functions of these genes were consistent with previous observations and experiments during the metamorphosis of anguillid eels.
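The network-analysis step described in this abstract (linking characteristic genes to their correlated genes, then grouping them into clusters) can be illustrated with a minimal sketch. The abstract does not state which correlation measure, threshold, or clustering algorithm was used, so the Pearson correlation, the 0.9 cut-off, the connected-component grouping, and the toy expression values over 7 stages below are all assumptions made for illustration only.

```python
import math

# Hypothetical expression values across 7 developmental stages (not from the paper).
EXPRESSION = {
    "geneA": [1, 2, 3, 4, 5, 6, 7],
    "geneB": [2, 4, 6, 8, 10, 12, 14],
    "geneC": [7, 6, 5, 4, 3, 2, 1],
    "geneD": [5, 1, 4, 2, 6, 1, 3],
    "geneE": [14, 12, 10, 8, 6, 4, 2],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(expr, threshold=0.9):
    """Return edges (gene, gene, r) for pairs whose correlation is >= threshold."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if r >= threshold:  # assumed rule: keep only strong positive co-expression
                edges.append((g, h, r))
    return edges

def clusters_from_edges(genes, edges):
    """Group genes into connected components of the correlation network."""
    neighbours = {g: set() for g in genes}
    for g, h, _ in edges:
        neighbours[g].add(h)
        neighbours[h].add(g)
    seen, clusters = set(), []
    for g in sorted(genes):
        if g in seen:
            continue
        stack, component = [g], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        clusters.append(sorted(component))
    return clusters

edges = correlation_network(EXPRESSION)
clusters = clusters_from_edges(EXPRESSION, edges)
print(clusters)
```

In this toy data set the rising genes and the falling genes end up in separate clusters, and the uncorrelated gene stays on its own; a real analysis would feed each cluster into GO enrichment, as the abstract describes.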
Affiliation(s)
- Ryusuke Sudo
  - Fisheries Technology Institute, Minamiizu Field Station, Japan Fisheries Research and Education Agency, Minamiizu, Kamo, Shizuoka, 415-0156, Japan
- Taiga Asakura
  - Fisheries Resources Institute, Yokohama Field Station, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa, 236-8648, Japan
- Takashi Ishikawa
  - Fisheries Technology Institute, Nansei Field Station, Japan Fisheries Research and Education Agency, Minamiise, Mie, 516-0193, Japan
- Rui Hatakeyama
  - Fisheries Technology Institute, Minamiizu Field Station, Japan Fisheries Research and Education Agency, Minamiizu, Kamo, Shizuoka, 415-0156, Japan
- Atushi Fujiwara
  - Fisheries Technology Institute, Nansei Field Station, Japan Fisheries Research and Education Agency, Minamiise, Mie, 516-0193, Japan
- Komaki Inoue
  - RIKEN Center for Sustainable Resource Science, Tsurumi-Ku, Yokohama, 230-0045, Japan
- Keiichi Mochida
  - RIKEN Center for Sustainable Resource Science, Tsurumi-Ku, Yokohama, 230-0045, Japan
  - School of Information and Data Sciences, Nagasaki University, 1-14 Bunkyo-machi, Nagasaki, 852-8521, Japan
  - Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka-cho, Totsuka-ku, Yokohama, Kanagawa, 244-0813, Japan
  - RIKEN Baton Zone Program, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Kazuharu Nomura
  - Fisheries Technology Institute, Nansei Field Station, Japan Fisheries Research and Education Agency, Minamiise, Mie, 516-0193, Japan
2
Fernandez ME, Martinez-Romero J, Aon MA, Bernier M, Price NL, de Cabo R. How is Big Data reshaping preclinical aging research? Lab Anim (NY) 2023; 52:289-314. [PMID: 38017182 DOI: 10.1038/s41684-023-01286-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 10/10/2023] [Indexed: 11/30/2023]
Abstract
The exponential scientific and technological progress during the past 30 years has favored the comprehensive characterization of aging processes with their multivariate nature, leading to the advent of Big Data in preclinical aging research. Spanning from molecular omics to organism-level deep phenotyping, Big Data demands large computational resources for storage and analysis, as well as new analytical tools and conceptual frameworks to gain novel insights leading to discovery. Systems biology has emerged as a paradigm that utilizes Big Data to gain insightful information enabling a better understanding of living organisms, visualized as multilayered networks of interacting molecules, cells, tissues and organs at different spatiotemporal scales. In this framework, where aging, health and disease represent emergent states from an evolving dynamic complex system, context given by, for example, strain, sex and feeding times, becomes paramount for defining the biological trajectory of an organism. Using bioinformatics and artificial intelligence, the systems biology approach is leading to remarkable advances in our understanding of the underlying mechanism of aging biology and assisting in creative experimental study designs in animal models. Future in-depth knowledge acquisition will depend on the ability to fully integrate information from different spatiotemporal scales in organisms, which will probably require the adoption of theories and methods from the field of complex systems. Here we review state-of-the-art approaches in preclinical research, with a focus on rodent models, that are leading to conceptual and/or technical advances in leveraging Big Data to understand basic aging biology and its full translational potential.
Affiliation(s)
- Maria Emilia Fernandez
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Jorge Martinez-Romero
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Epidemiology and Population Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Miguel A Aon
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Cardiovascular Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Michel Bernier
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Nathan L Price
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Rafael de Cabo
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
3
Wevers D, Ramautar R, Clark C, Hankemeier T, Ali A. Opportunities and challenges for sample preparation and enrichment in mass spectrometry for single-cell metabolomics. Electrophoresis 2023; 44:2000-2024. [PMID: 37667867 DOI: 10.1002/elps.202300105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 08/08/2023] [Accepted: 08/19/2023] [Indexed: 09/06/2023]
Abstract
Single-cell heterogeneity in metabolism, drug resistance and disease type creates a need for analytical techniques for single-cell analysis. As the metabolome provides the closest view of the status quo in the cell, studying the metabolome at single-cell resolution may unravel said heterogeneity. A challenge in single-cell metabolome analysis is that metabolites cannot be amplified, so one needs to deal with picolitre volumes and a wide range of analyte concentrations. Due to its high sensitivity and resolution, MS is preferred in single-cell metabolomics. Large numbers of cells need to be analysed for proper statistics; this requires high-throughput analysis, and hence automation of the analytical workflow. Significant advances in (micro)sampling methods, CE and ion mobility spectrometry have been made, some of which have been applied in high-throughput analyses. Microfluidics has enabled automation of cell picking and metabolite extraction; image recognition has enabled automated cell identification. Many techniques have been used for data analysis, varying from conventional techniques to novel combinations of advanced chemometric approaches. Steps have been taken toward making data more findable, accessible, interoperable and reusable, but significant opportunities for improvement remain. Herein, advances in single-cell analysis workflows and data analysis are discussed, and recommendations are made based on the experimental goal.
Affiliation(s)
- Dirk Wevers
  - Wageningen University and Research, Wageningen, The Netherlands
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Rawi Ramautar
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Charlie Clark
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Thomas Hankemeier
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Ahmed Ali
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
4
Liu W, Zhang L, Bao L, Shen G, Feng J. Accurate Classification and Prediction of Acute Myocardial Infarction through an ARMD Procedure. J Proteome Res 2023; 22:758-767. [PMID: 36710647 DOI: 10.1021/acs.jproteome.2c00488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
The risk stratification of acute myocardial infarction (AMI) patients is of prime importance for clinical management and prognosis assessment. Thus, we propose an ensemble machine learning analysis procedure named ADASYN-RFECV-MDA-DNN (ARMD) to address sample-unbalanced problems and enable stratification and prediction of AMI outcomes. The ARMD analysis procedure was applied to the NMR data of sera from 534 AMI-related subjects in four categories with an extremely imbalanced sample proportion. Firstly, the adaptive synthetic sampling (ADASYN) algorithm was used to address the issue of the original sample imbalance. Secondly, the recursive feature elimination with cross-validation (RFECV) processing and random forest mean decrease accuracy (RF-MDA) algorithm was performed to identify the differential metabolites corresponding to each AMI outcome. Finally, the deep neural network (DNN) was employed to classify and predict AMI events, and its performance was evaluated by comparing the four traditional machine learning methods. Compared with the other four machine learning models, DNN presented consistent superiority in almost all of the model parameters including precision, f1-score, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and classification accuracy, highlighting the potential of deep learning in classification and stratification of clinical diseases. The ARMD analysis procedure was a practical analysis tool for supervised classification and regression modeling of clinical diseases.
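The mean-decrease-accuracy (MDA) idea named in this abstract can be sketched in a few lines: permute one feature at a time and record how much the classification accuracy drops. This is only an illustration of the principle. A nearest-centroid classifier is swapped in for the random forest to keep the example free of dependencies, the data are invented, and out-of-bag evaluation, ADASYN, RFECV and the DNN are not reproduced.

```python
import random

# Hypothetical data: feature 0 separates the two classes, feature 1 is low-level noise.
X = [[0.1, 0.2], [0.3, 0.1], [-0.2, 0.3], [0.0, 0.2],
     [9.8, 0.1], [10.1, 0.3], [10.2, 0.2], [9.9, 0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, row) == label for row, label in zip(X, y)) / len(y)

def mean_decrease_accuracy(centroids, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(centroids, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances

centroids = fit_centroids(X, y)
baseline_accuracy = accuracy(centroids, X, y)
importances = mean_decrease_accuracy(centroids, X, y)
print(baseline_accuracy, importances)
```

Shuffling the informative feature breaks its link to the labels and lowers accuracy, whereas shuffling the noise feature leaves predictions unchanged; features with the largest drop are the candidates one would retain.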
Affiliation(s)
- Wuping Liu
  - Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
- Lirong Zhang
  - Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
- Lijun Bao
  - Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
- Guiping Shen
  - Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
- Jianghua Feng
  - Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, 422 Siming South Road, Siming District, Xiamen, Fujian 361005, China
5
Miyamoto H, Kikuchi J. An evaluation of homeostatic plasticity for ecosystems using an analytical data science approach. Comput Struct Biotechnol J 2023; 21:869-878. [PMID: 36698969 PMCID: PMC9860287 DOI: 10.1016/j.csbj.2023.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/13/2022] [Revised: 01/02/2023] [Accepted: 01/03/2023] [Indexed: 01/05/2023] [Open]
Abstract
The natural world is constantly changing, and planetary boundaries are issuing severe warnings about biodiversity and cycles of carbon, nitrogen, and phosphorus. From another perspective, social problems such as global warming and food shortages are spreading to various fields. These seemingly unrelated issues are closely related, but an integrated understanding of them is still a step away. However, progress in analytical technologies has been recognized in various fields and, from a microscopic perspective, with the development of instruments including next-generation sequencers (NGS), nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC/MS), and liquid chromatography-mass spectrometry (LC/MS), various forms of molecular information such as genome data, microflora structure, metabolome, proteome, and lipidome can be obtained. From a macroscopic perspective, the development of environmental analytical instruments and environmental measurement facilities such as satellites, drones, observation ships, and semiconductor sensors has increased the data availability for various environmental factors. Against this background, the role of computational science is to provide a mechanism for integrating and understanding these seemingly disparate data sets. This review describes machine learning and the need for structural equations and statistical causal inference on these data to solve these problems. In addition to introducing actual examples of how these technologies can be utilized, we discuss how to use them to implement environmentally friendly technologies in society.
Affiliation(s)
- Hirokuni Miyamoto
  - Graduate School of Horticulture, Chiba University, Matsudo, Chiba 271-8501, Japan
  - RIKEN Center for Integrative Medical Science, Yokohama, Kanagawa 230-0045, Japan
  - Sermas Co., Ltd., Ichikawa, Chiba 272-0033, Japan
  - Japan Eco-science (Nikkan Kagaku) Co. Ltd., Chiba, Chiba 260-0034, Japan
  - Graduate School of Medical Life Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan
- Jun Kikuchi
  - Graduate School of Medical Life Science, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan
  - RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa 230-0045, Japan
  - Graduate School of Bioagricultural Sciences, Nagoya University, Chikusa, Nagoya 464-8601, Japan
6
Hao H, Jia X, Ren T, Du Y, Wang J. Novel insight into the mechanism underlying synergistic cytotoxicity from two components in 5-Fluorouracil-phenylalanine co-crystal based on cell metabolomics. Eur J Pharm Biopharm 2022; 180:181-189. [DOI: 10.1016/j.ejpb.2022.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Revised: 09/21/2022] [Accepted: 10/03/2022] [Indexed: 11/04/2022]
7
Sidak D, Schwarzerová J, Weckwerth W, Waldherr S. Interpretable machine learning methods for predictions in systems biology from omics data. Front Mol Biosci 2022; 9:926623. [PMID: 36387282 PMCID: PMC9650551 DOI: 10.3389/fmolb.2022.926623] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/22/2022] [Accepted: 08/15/2022] [Indexed: 12/02/2022] [Open]
Abstract
Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by “omics” experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design “interpretable” models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: “What is interpretability?” We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.
Affiliation(s)
- David Sidak
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
- Jana Schwarzerová
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
  - Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czech Republic
- Wolfram Weckwerth
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
  - Vienna Metabolomics Center (VIME), Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Steffen Waldherr
  - Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
  - *Correspondence: Steffen Waldherr
8
Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:ijms232012272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] [Open]
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies decreases. Deep learning (DL) is a viable method to exploit this massive data stream, since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology, because both fields are evolving at a breakneck pace, making it hard for an individual to occupy the front lines of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could also be useful to researchers in biology who want to understand and utilize the power of DL to gain better insights into, and extract important information from, omics data.
9
Chardin D, Gille C, Pourcher T, Humbert O, Barlaud M. Learning a confidence score and the latent space of a new supervised autoencoder for diagnosis and prognosis in clinical metabolomic studies. BMC Bioinformatics 2022; 23:361. [PMID: 36050631 PMCID: PMC9434875 DOI: 10.1186/s12859-022-04900-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 07/27/2022] [Indexed: 11/15/2022] [Open]
Abstract
Background Presently, there is a wide variety of classification methods and deep neural network approaches in bioinformatics. Deep neural networks have proven their effectiveness for classification tasks, and have outperformed classical methods, but they suffer from a lack of interpretability. Therefore, these innovative methods are not appropriate for decision support systems in healthcare. Indeed, to allow clinicians to make informed and well thought out decisions, the algorithm should provide the main pieces of information used to compute the predicted diagnosis and/or prognosis, as well as a confidence score for this prediction. Methods Herein, we used a new supervised autoencoder (SAE) approach for classification of clinical metabolomic data. This new method has the advantage of providing a confidence score for each prediction thanks to a softmax classifier and a meaningful latent space visualization and to include a new efficient feature selection method, with a structured constraint, which allows for biologically interpretable results. Results Experimental results on three metabolomics datasets of clinical samples illustrate the effectiveness of our SAE and its confidence score. The supervised autoencoder provides an accurate localization of the patients in the latent space, and an efficient confidence score. Experiments show that the SAE outperforms classical methods (PLS-DA, Random Forests, SVM, and neural networks (NN)). Furthermore, the metabolites selected by the SAE were found to be biologically relevant. Conclusion In this paper, we describe a new efficient SAE method to support diagnostic or prognostic evaluation based on metabolomics analyses.
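The abstract attributes the confidence score to a softmax classifier. A minimal sketch of that idea follows, assuming the score is simply the largest softmax probability; the paper's exact definition may differ, and the logits and class labels here are hypothetical. The autoencoder, latent space and feature selection are not reproduced.

```python
import math

def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits, labels):
    """Return the most probable label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits for one patient sample and two diagnostic classes.
LABELS = ["case", "control"]
probabilities = softmax([2.0, 0.0])
label, confidence = predict_with_confidence([2.0, 0.0], LABELS)
print(label, round(confidence, 3))
```

A prediction whose confidence sits near 1 / (number of classes) is close to a coin flip, which is the kind of signal a clinician could use to decide whether to trust the output.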
Affiliation(s)
- David Chardin
  - Transporters in Imaging and Radiotherapy in Oncology (TIRO), Direction de la Recherche Fondamentale (DRF), Institut des sciences du vivant Fréderic Joliot, Commissariat à l'Energie Atomique et aux énergies alternatives (CEA), Université Côte d'Azur (UCA), Nice, France
  - Centre Antoine Lacassagne, Université Côte d'Azur (UCA), Nice, France
- Cyprien Gille
  - Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis (I3S), Centre de Recherche Scientifique (CNRS), Université Côte d'Azur (UCA), Sophia Antipolis, France
- Thierry Pourcher
  - Transporters in Imaging and Radiotherapy in Oncology (TIRO), Direction de la Recherche Fondamentale (DRF), Institut des sciences du vivant Fréderic Joliot, Commissariat à l'Energie Atomique et aux énergies alternatives (CEA), Université Côte d'Azur (UCA), Nice, France
- Olivier Humbert
  - Transporters in Imaging and Radiotherapy in Oncology (TIRO), Direction de la Recherche Fondamentale (DRF), Institut des sciences du vivant Fréderic Joliot, Commissariat à l'Energie Atomique et aux énergies alternatives (CEA), Université Côte d'Azur (UCA), Nice, France
  - Centre Antoine Lacassagne, Université Côte d'Azur (UCA), Nice, France
- Michel Barlaud
  - Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis (I3S), Centre de Recherche Scientifique (CNRS), Université Côte d'Azur (UCA), Sophia Antipolis, France
10
Parameter Visualization of Benchtop Nuclear Magnetic Resonance Spectra toward Food Process Monitoring. Processes (Basel) 2022. [DOI: 10.3390/pr10071264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2022] [Open]
Abstract
Low-cost and user-friendly benchtop low-field nuclear magnetic resonance (NMR) spectrometers are typically used to monitor food processes in the food industry. Because of excessive spectral overlap, it is difficult to characterize food mixtures using low-field NMR spectroscopy. In addition, for standard compounds, low-field benchtop NMR data are typically unavailable compared to high-field NMR data, which have been accumulated and are reusable in public databases. This work focused on NMR parameter visualization of the chemical structure and mobility of mixtures and the use of high-field NMR data to analyze benchtop NMR data to characterize food process samples. We developed a tool to easily process benchtop NMR data and obtain chemical shifts and T2 relaxation times of peaks, as well as transform high-field NMR data into low-field NMR data. Line broadening and time–frequency analysis methods were adopted for data processing. This tool can visualize NMR parameters to characterize changes in the components and mobilities of food process samples using benchtop NMR data. In addition, assignment errors were smaller when the spectra of standard compounds were identified by transferring the high-field NMR data to low-field NMR data rather than directly using experimentally obtained low-field NMR spectra.
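Why high-field reference data must be broadened before comparison with benchtop spectra can be seen from a small calculation: a linewidth fixed in Hz covers ten times more of the ppm axis at 60 MHz than at 600 MHz. The sketch below is not the authors' tool. The Lorentzian line shape, the 6 Hz linewidth, the two peak positions and the two spectrometer frequencies are assumptions chosen for illustration, and J-coupling and second-order effects are ignored.

```python
def lorentzian(offset_hz, fwhm_hz):
    """Unit-height Lorentzian line evaluated at a frequency offset in Hz."""
    return 1.0 / (1.0 + (2.0 * offset_hz / fwhm_hz) ** 2)

def spectrum_at(ppm, peak_centres_ppm, fwhm_hz, larmor_mhz):
    """Sum of Lorentzian peaks at one ppm position.

    A shift difference of 1 ppm corresponds to larmor_mhz Hz, so the same
    linewidth in Hz spans more ppm on a lower-field instrument.
    """
    return sum(lorentzian((ppm - centre) * larmor_mhz, fwhm_hz)
               for centre in peak_centres_ppm)

# Hypothetical pair of neighbouring signals and a common linewidth.
PEAKS_PPM = [1.20, 1.25]
FWHM_HZ = 6.0
MIDPOINT_PPM = 1.225

def is_resolved(larmor_mhz):
    """True if the intensity between the peaks dips below the peak position."""
    valley = spectrum_at(MIDPOINT_PPM, PEAKS_PPM, FWHM_HZ, larmor_mhz)
    top = spectrum_at(PEAKS_PPM[0], PEAKS_PPM, FWHM_HZ, larmor_mhz)
    return valley < top

resolved_600 = is_resolved(600.0)  # peaks are 30 Hz apart
resolved_60 = is_resolved(60.0)    # peaks are only 3 Hz apart
print(resolved_600, resolved_60)
```

With these numbers the two signals are separated at 600 MHz but merge into one envelope at 60 MHz, which is the spectral overlap the abstract refers to.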
11
Becker M, Jouda M, Kolchinskaya A, Korvink JG. Deep regression with ensembles enables fast, first-order shimming in low-field NMR. J Magn Reson 2022; 336:107151. [PMID: 35183922 DOI: 10.1016/j.jmr.2022.107151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2021] [Revised: 01/21/2022] [Accepted: 01/24/2022] [Indexed: 06/14/2023]
Abstract
Shimming in the context of nuclear magnetic resonance aims to achieve a uniform magnetic field distribution, as perfect as possible, and is crucial for useful spectroscopy and imaging. Currently, shimming precedes most acquisition procedures in the laboratory, and this mostly semi-automatic procedure often needs to be repeated, which can be cumbersome and time-consuming. The paper investigates the feasibility of completely automating and accelerating the shimming procedure by applying deep learning (DL). We show that DL can relate measured spectral shape to shim current specifications and thus rapidly predict three shim currents simultaneously, given only four input spectra. Due to the lack of accessible data for developing shimming algorithms, we also introduce a database that served as our DL training set, and allows inference of changes to 1H NMR signals depending on shim offsets. In situ experiments of deep regression with ensembles demonstrate a high success rate in spectral quality improvement for random shim distortions over different neural architectures and chemical substances. This paper presents a proof-of-concept that machine learning can simplify and accelerate the shimming problem, either as a stand-alone method, or in combination with traditional shimming methods. Our database and code are publicly available.
Affiliation(s)
- Moritz Becker
  - Karlsruhe Institute of Technology (KIT), Institute of Microstructure Technology, Karlsruhe 76131, Germany
- Mazin Jouda
  - Karlsruhe Institute of Technology (KIT), Institute of Microstructure Technology, Karlsruhe 76131, Germany
- Anastasiya Kolchinskaya
  - Karlsruhe Institute of Technology (KIT), Institute of Microstructure Technology, Karlsruhe 76131, Germany
- Jan G Korvink
  - Karlsruhe Institute of Technology (KIT), Institute of Microstructure Technology, Karlsruhe 76131, Germany
12
Debik J, Sangermani M, Wang F, Madssen TS, Giskeødegård GF. Multivariate analysis of NMR-based metabolomic data. NMR Biomed 2022; 35:e4638. [PMID: 34738674 DOI: 10.1002/nbm.4638] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/03/2021] [Revised: 09/08/2021] [Accepted: 09/29/2021] [Indexed: 06/13/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy allows for simultaneous detection of a wide range of metabolites and lipids. As metabolites act together in complex metabolic networks, they are often highly correlated, and optimal biological insight is achieved when using methods that take the correlation into account. For this reason, latent-variable-based methods, such as principal component analysis and partial least-squares discriminant analysis, are widely used in metabolomic studies. However, with increasing availability of larger population cohorts, and a shift from analysis of spectral data to using quantified metabolite levels, both more traditional statistical approaches and alternative machine learning methods have become more widely used. This review aims at providing an overview of the current state-of-the-art multivariate methods for the analysis of NMR-based metabolomic data as well as alternative methods, highlighting their strengths and limitations.
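The point that correlated metabolites are well summarized by latent variables can be illustrated with a minimal principal component analysis. The sketch below centres a tiny, invented table of two metabolite levels, builds the covariance matrix, and extracts the first component by power iteration; a real analysis would normally use an established library and include scaling, more components and validation.

```python
import math

# Hypothetical levels of two strongly correlated metabolites in four samples.
SAMPLES = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.1], [4.0, 7.9]]

def centre(X):
    """Subtract the column mean from every value."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[value - means[j] for j, value in enumerate(row)] for row in X]

def covariance(Xc):
    """Sample covariance matrix of already centred data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(row[i] * row[j] for row in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(C, v):
    return [sum(C[i][j] * v[j] for j in range(len(v))) for i in range(len(C))]

def first_principal_component(C, n_iter=100):
    """Leading eigenvector and eigenvalue of C via power iteration."""
    v = [1.0] * len(C)
    for _ in range(n_iter):
        w = mat_vec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(C, v)))  # Rayleigh quotient
    return v, eigenvalue

centred = centre(SAMPLES)
cov = covariance(centred)
pc1, variance_pc1 = first_principal_component(cov)
explained_ratio = variance_pc1 / sum(cov[i][i] for i in range(len(cov)))
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centred]
print([round(x, 3) for x in pc1], round(explained_ratio, 4))
```

Because the two metabolites move together, a single component captures almost all of the variance, which is why latent-variable methods are a natural fit for such data.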
Affiliation(s)
- Julia Debik
  - Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology-NTNU, Trondheim, Norway
- Matteo Sangermani
  - Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology-NTNU, Trondheim, Norway
- Feng Wang
  - Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology-NTNU, Trondheim, Norway
  - Clinic of Surgery, St. Olavs Hospital HF, Trondheim, Norway
- Torfinn S Madssen
  - Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology-NTNU, Trondheim, Norway
- Guro F Giskeødegård
  - Clinic of Surgery, St. Olavs Hospital HF, Trondheim, Norway
  - K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology-NTNU, Trondheim, Norway
13
|
Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and perspectives. Brief Bioinform 2021; 23:6425809. [PMID: 34791021 DOI: 10.1093/bib/bbab460] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/23/2021] [Revised: 09/29/2021] [Accepted: 10/07/2021] [Indexed: 02/07/2023] Open
Abstract
The innovation of biotechnologies has allowed the accumulation of omics data at an alarming rate, thus introducing the era of 'big data'. Extracting inherently valuable knowledge from various omics data remains a daunting problem in bioinformatics. Better solutions often require more innovative methods for efficient data handling and effective results. Recent advancements in integrated analysis and computational modeling of multi-omics data have helped address such needs in an increasingly harmonious manner. The development and application of machine learning have largely advanced our insights into biology and biomedicine and greatly promoted the development of therapeutic strategies, especially for precision medicine. Here, we present a comprehensive survey and discussion of what has happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advancements at the interface between machine learning and an ever-widening range of omics including genomics, transcriptomics, proteomics, metabolomics, radiomics, as well as those at the single-cell resolution. We also discuss and provide a synthesis of ideas, new insights, current challenges and perspectives of machine learning in omics.
Affiliation(s)
- Rufeng Li
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Lixin Li
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Yungang Xu
  - School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710129, China
- Juan Yang
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
  - Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education of China, Xi'an 710061, P. R. China
|
14
|
Kikuchi J, Yamada S. The exposome paradigm to predict environmental health in terms of systemic homeostasis and resource balance based on NMR data science. RSC Adv 2021; 11:30426-30447. [PMID: 35480260 PMCID: PMC9041152 DOI: 10.1039/d1ra03008f] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/18/2021] [Accepted: 08/31/2021] [Indexed: 12/22/2022] Open
Abstract
The environment, from microbial ecosystems to recycled resources, fluctuates dynamically due to many physical, chemical and biological factors, the profile of which reflects changes in overall state, such as environmental illness caused by a collapse of homeostasis. To evaluate and predict environmental health in terms of systemic homeostasis and resource balance, a comprehensive understanding of these factors requires an approach based on the "exposome paradigm", namely the totality of exposure to all substances. Furthermore, in considering sustainable development to meet global population growth, it is important to gain an understanding of both the circulation of biological resources and waste recycling in human society. From this perspective, natural environment, agriculture, aquaculture, wastewater treatment in industry, biomass degradation and biodegradable materials design are at the forefront of current research. In this respect, nuclear magnetic resonance (NMR) offers tremendous advantages in the analysis of samples of molecular complexity, such as crude bio-extracts, intact cells and tissues, fibres, foods, feeds, fertilizers and environmental samples. Here we outline examples to promote an understanding of recent applications of solution-state, solid-state, time-domain NMR and magnetic resonance imaging (MRI) to the complex evaluation of organisms, materials and the environment. We also describe useful databases and informatics tools, as well as machine learning techniques for NMR analysis, demonstrating that NMR data science can be used to evaluate the exposome in both the natural environment and human society towards a sustainable future.
Affiliation(s)
- Jun Kikuchi
  - Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
  - Graduate School of Bioagricultural Sciences, Nagoya University Furo-cho, Chikusa-ku Nagoya 464-8601 Japan
  - Graduate School of Medical Life Science, Yokohama City University 1-7-29 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
- Shunji Yamada
  - Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
  - Prediction Science Laboratory, RIKEN Cluster for Pioneering Research 7-1-26 Minatojima-minami-machi, Chuo-ku Kobe 650-0047 Japan
  - Data Assimilation Research Team, RIKEN Center for Computational Science 7-1-26 Minatojima-minami-machi, Chuo-ku Kobe 650-0047 Japan
|
15
|
An Improved Stacked Autoencoder for Metabolomic Data Classification. Computational Intelligence and Neuroscience 2021; 2021:1051172. [PMID: 34434226 PMCID: PMC8382558 DOI: 10.1155/2021/1051172] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/04/2021] [Revised: 06/28/2021] [Accepted: 07/28/2021] [Indexed: 12/24/2022]
Abstract
Naru3 (NR) is a traditional Mongolian medicine with high clinical efficacy and low incidence of side effects. Metabolomics is an approach that can facilitate the development of traditional drugs. However, metabolomic data are high-throughput, sparse, high-dimensional, and small-sample in nature, and their classification is challenging. Although deep learning methods have a wide range of applications, deep learning-based metabolomic studies have not been widely performed. We aimed to develop an improved stacked autoencoder (SAE) for metabolomic data classification. We established an NR-treated rheumatoid arthritis (RA) mouse model and classified the obtained metabolomic data using the Hessian-free SAE (HF-SAE) algorithm. During training, the unlabeled data were used for pretraining, and the labeled data were used for fine-tuning based on the HF algorithm for gradient descent optimization. The hybrid algorithm successfully classified the data. The results were compared with those of the support vector machine (SVM), k-nearest neighbor (KNN), and gradient descent SAE (GD-SAE) algorithms. Five-fold cross-validation was used for the classification experiment. In each fine-tuning process, the mean square error (MSE) and misclassification rates of the training and test data were recorded. We successfully established an NR animal model and an improved SAE for metabolomic data classification.
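The abstract above reports five-fold cross-validation in which the mean square error and misclassification rate were recorded. A minimal sketch of that bookkeeping in plain Python might look like the following. The function names, the contiguous (unshuffled) fold assignment, and the toy labels and scores are illustrative assumptions, not the authors' HF-SAE pipeline.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n_samples-1 into k contiguous (train, test) folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_score):
    """Average squared difference between targets and continuous outputs."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)


# Hypothetical toy data: binary labels and model scores for one test fold.
folds = k_fold_indices(10, k=5)
y_true = [0, 1, 1, 0]
y_score = [0.2, 0.9, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

fold_error_rate = misclassification_rate(y_true, y_pred)
fold_mse = mean_squared_error(y_true, y_score)
```

In practice the indices would usually be shuffled (and often stratified by class) before splitting; the contiguous split here only keeps the sketch short.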
|
16
|
Feng N, Geng X, Sun B. Study on Neural Network Integration Method Based on Morphological Associative Memory Framework. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10569-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/24/2023]
|
17
|
Date Y, Wei F, Tsuboi Y, Ito K, Sakata K, Kikuchi J. Relaxometric learning: a pattern recognition method for T2 relaxation curves based on machine learning supported by an analytical framework. BMC Chem 2021; 15:13. [PMID: 33610164 PMCID: PMC7897374 DOI: 10.1186/s13065-020-00731-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/22/2020] [Accepted: 12/15/2020] [Indexed: 11/10/2022] Open
Abstract
Nuclear magnetic resonance (NMR)-based relaxometry is widely used in various fields of research because of its advantages such as simple sample preparation, easy handling, and relatively low cost compared with metabolomics approaches. However, there have been no reports on the application of the T2 relaxation curves in metabolomics studies involving the evaluation of metabolic mixtures, such as geographical origin determination and feature extraction by pattern recognition and data mining. In this study, we describe a data mining method for relaxometric data (i.e., relaxometric learning). This method is based on a machine learning algorithm supported by the analytical framework optimized for the relaxation curve analyses. In the analytical framework, we incorporated a variable optimization approach and bootstrap resampling-based matrixing to enhance the classification performance and balance the sample size between groups, respectively. The relaxometric learning enabled the extraction of features related to the physical properties of fish muscle and the determination of the geographical origin of the fish by improving the classification performance. Our results suggest that relaxometric learning is a powerful and versatile alternative to conventional metabolomics approaches for evaluating fleshiness of chemical mixtures in food and for other biological and chemical research requiring a nondestructive, cost-effective, and time-saving method.
Affiliation(s)
- Yasuhiro Date
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Feifei Wei
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Yuuri Tsuboi
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Kengo Ito
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Kenji Sakata
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Jun Kikuchi
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  - Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
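The relaxometric learning abstract above mentions bootstrap resampling to balance the sample size between groups. As a rough illustration of that general idea (not the authors' "matrixing" procedure, whose details are not given here), the sketch below draws the same number of samples with replacement from every group. The function name `bootstrap_balance`, the group labels, and the toy values are all hypothetical.

```python
import random


def bootstrap_balance(groups, n_per_group, seed=0):
    """Draw n_per_group samples with replacement from each group.

    groups: dict mapping a group label to a list of samples.
    Returns a dict with the same labels and equally sized resampled lists.
    """
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    return {
        label: [rng.choice(samples) for _ in range(n_per_group)]
        for label, samples in groups.items()
    }


# Hypothetical, unbalanced groups (e.g. two sampling origins).
groups = {
    "origin_A": [0.81, 0.79, 0.85, 0.80, 0.83, 0.78],
    "origin_B": [0.62, 0.65],
}
balanced = bootstrap_balance(groups, n_per_group=6)
```

Because sampling is with replacement, the smaller group is filled out with repeated observations; this equalizes group sizes but adds no new information, so any model evaluation should still be done on data held out before resampling.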
|
18
|
Wei F, Ito K, Sakata K, Asakura T, Date Y, Kikuchi J. Fish ecotyping based on machine learning and inferred network analysis of chemical and physical properties. Sci Rep 2021; 11:3766. [PMID: 33580151 PMCID: PMC7881121 DOI: 10.1038/s41598-021-83194-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/02/2020] [Accepted: 01/27/2021] [Indexed: 01/13/2023] Open
Abstract
Functional diversity rather than species richness is critical for the understanding of ecological patterns and processes. This study aimed to develop novel integrated analytical strategies for the functional characterization of fish diversity based on the quantification, prediction and integration of the chemical and physical features in fish muscles. Machine learning models with an improved random forest algorithm applied on 1867 muscle nuclear magnetic resonance spectra belonging to 249 fish species successfully predicted the mobility patterns of fishes into four categories (migratory, territorial, rockfish, and demersal) with accuracies of 90.3-95.4%. Markov blanket-based feature selection method with an ecological-chemical-physical integrated network based on the Bayesian network inference algorithm highlighted the importance of nitrogen metabolism, which is critical for environmental adaptability of fishes in nutrient-rich environments, in the functional characterization of fish biodiversity. Our study provides valuable information and analytical strategies for fish home-range assessment on the basis of the chemical and physical characterization of fish muscle, which can serve as an ecological indicator for fish ecotyping and human impact monitoring.
Affiliation(s)
- Feifei Wei
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 235-0045, Japan
- Kengo Ito
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 235-0045, Japan
- Kenji Sakata
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 235-0045, Japan
- Taiga Asakura
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 235-0045, Japan
- Yasuhiro Date
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 235-0045, Japan
- Jun Kikuchi
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 235-0045, Japan
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehirocho, Tsurumi-ku, Yokohama, 230-0045, Japan
  - Graduate School of Bioagricultural Sciences and School of Agricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan
|
19
|
Yang Q, Ji H, Lu H, Zhang Z. Prediction of Liquid Chromatographic Retention Time with Graph Neural Networks to Assist in Small Molecule Identification. Anal Chem 2021; 93:2200-2206. [PMID: 33406817 DOI: 10.1021/acs.analchem.0c04071] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Indexed: 12/15/2022]
Abstract
The predicted liquid chromatographic retention times (RTs) of small molecules are not accurate enough for wide adoption in structural identification. In this study, we used a graph neural network to predict the retention time (GNN-RT) directly from the structures of small molecules, without requiring molecular descriptors. The prediction accuracy of GNN-RT was compared with that of random forests (RFs), Bayesian ridge regression, a convolutional neural network (CNN), and a deep-learning regression model (DLM) on the METLIN small molecule retention time (SMRT) dataset. GNN-RT achieved the highest prediction accuracy, with a mean relative error of 4.9% and a median relative error of 3.2%. Furthermore, the SMRT-trained GNN-RT model can easily be transferred to chromatographic systems of the same type. The predicted RT is valuable for structural identification as a complement to tandem mass spectra and can be used to assist in the identification of compounds. The results indicate that GNN-RT is a promising method to predict the RT for liquid chromatography and improve the accuracy of structural identification for small molecules.
Affiliation(s)
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongchao Ji
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
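The GNN-RT abstract above reports accuracy as a mean and a median relative error. For readers unfamiliar with these metrics, the sketch below computes both from paired observed and predicted retention times. The retention-time values are made up for illustration and are unrelated to the SMRT dataset, and the definition used here (absolute error divided by the observed value) is an assumption about how the metric is defined.

```python
from statistics import mean, median


def relative_errors(observed, predicted):
    """Absolute relative error |predicted - observed| / observed for each pair."""
    return [abs(p - o) / o for o, p in zip(observed, predicted)]


# Hypothetical retention times in seconds (illustrative values only).
observed = [100.0, 200.0, 400.0, 800.0]
predicted = [110.0, 190.0, 400.0, 760.0]

errors = relative_errors(observed, predicted)
mean_rel_error = mean(errors)
median_rel_error = median(errors)

print(f"mean relative error:   {mean_rel_error:.1%}")
print(f"median relative error: {median_rel_error:.1%}")
```

The median is less sensitive than the mean to a few badly predicted compounds, which is one reason a paper might report both.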
|
20
|
Applications of Deep Learning in Biomedicine. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11507-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
|
21
|
Wu K, Luo J, Zeng Q, Dong X, Chen J, Zhan C, Chen Z, Lin Y. Improvement in Signal-to-Noise Ratio of Liquid-State NMR Spectroscopy via a Deep Neural Network DN-Unet. Anal Chem 2020; 93:1377-1382. [DOI: 10.1021/acs.analchem.0c03087] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Ke Wu
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Jie Luo
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Qing Zeng
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Xi Dong
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Jinyong Chen
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Chaoqun Zhan
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Zhong Chen
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Yanqin Lin
  - Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
|
22
|
Pomyen Y, Wanichthanarak K, Poungsombat P, Fahrmann J, Grapov D, Khoomrung S. Deep metabolome: Applications of deep learning in metabolomics. Comput Struct Biotechnol J 2020; 18:2818-2825. [PMID: 33133423 PMCID: PMC7575644 DOI: 10.1016/j.csbj.2020.09.033] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 07/24/2020] [Revised: 09/21/2020] [Accepted: 09/21/2020] [Indexed: 01/11/2023] Open
Abstract
In the past few years, deep learning has been successfully applied to various omics data. However, applications of deep learning in metabolomics are still relatively few compared with other omics fields. Currently, data pre-processing using convolutional neural network architecture appears to benefit the most from deep learning. Compound/structure identification and quantification using artificial neural networks/deep learning performed relatively better than traditional machine learning techniques, whereas only marginally better results are observed in biological interpretations. Before deep learning can be effectively applied to metabolomics, several challenges should be addressed, including metabolome-specific deep learning architectures, dimensionality problems, and model evaluation regimes.
Key Words
- AI, Artificial Intelligence
- ANN, Artificial Neural Network
- AUC, Area Under the receiver-operating characteristic Curve
- Artificial neural network
- CCS value, Collision Cross Section value
- CFM-EI, Competitive Fragmentation Modeling-Electron Ionization
- CNN, Convolutional Neural Network
- DL, Deep Learning
- DNN, Deep Neural Network
- Deep learning
- ECFP, Extended Circular Fingerprint
- ER, Estrogen Receptor
- FID, Free Induction Decay
- FP score, Fingerprint correlation score
- FTIR, Fourier Transform Infrared
- GC–MS, Gas Chromatography-Mass Spectrometry
- HDLSS data, High Dimensional Low Sample Size data
- IST, Iterative Soft Thresholding
- LC-MS, Liquid Chromatography-Mass Spectrometry
- LSTM, Long Short-Term Memory
- ML, Machine Learning
- MLP, Multi-layered Perceptron
- MS, Mass Spectrometry
- Mass spectrometry
- Metabolomics
- NEIMS, Neural Electron-Ionization Mass Spectrometry
- NMR
- NMR, Nuclear Magnetic Resonance
- NUS, Non-Uniformly Sampling
- PARAFAC2, Parallel Factor Analysis 2
- RF, Random Forest
- RNN, Recurrent Neural Network
- ReLU, Rectified Linear Unit
- SMARTS, SMILES arbitrary target specification
- SMILE, Sparse Multidimensional Iterative Lineshape-enhanced
- SMILES, Simplified Molecular-Input Line-Entry System
- SRA, Sequence Read Archive
- VAE, Variational Autoencoder
- istHMS, Implementation of IST at Harvard Medical School
- m/z, mass/charge ratio
Affiliation(s)
- Yotsawat Pomyen
  - Translational Research Unit, Chulabhorn Research Institute, Bangkok, Thailand
- Kwanjeera Wanichthanarak
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Patcha Poungsombat
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Center for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
- Johannes Fahrmann
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030, USA
- Dmitry Grapov
  - CDS- Creative Data Solutions LLC, https://creative-data.solutions, USA
- Sakda Khoomrung
  - Metabolomics and Systems Biology, Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Center for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
|
23
|
Sen P, Lamichhane S, Mathema VB, McGlinchey A, Dickens AM, Khoomrung S, Orešič M. Deep learning meets metabolomics: a methodological perspective. Brief Bioinform 2020; 22:1531-1542. [PMID: 32940335 DOI: 10.1093/bib/bbaa204] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 05/04/2020] [Revised: 08/08/2020] [Accepted: 08/10/2020] [Indexed: 12/15/2022] Open
Abstract
Deep learning (DL), an emerging area of investigation in the fields of machine learning and artificial intelligence, has markedly advanced over the past years. DL techniques are being applied to assist medical professionals and researchers in improving clinical diagnosis, disease prediction and drug discovery. It is expected that DL will help to provide actionable knowledge from a variety of 'big data', including metabolomics data. In this review, we discuss the applicability of DL to metabolomics, while presenting and discussing several examples from recent research. We emphasize the use of DL in tackling bottlenecks in metabolomics data acquisition, processing, metabolite identification, as well as in metabolic phenotyping and biomarker discovery. Finally, we discuss how DL is used in genome-scale metabolic modelling and in interpretation of metabolomics data. The DL-based approaches discussed here may assist computational biologists with the integration, prediction and drawing of statistical inference about biological outcomes, based on metabolomics data.
Affiliation(s)
- Partho Sen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Santosh Lamichhane
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Vivek B Mathema
  - Metabolomics and Systems Biology, Department of Biochemistry, and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Aidan McGlinchey
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Alex M Dickens
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Sakda Khoomrung
  - Metabolomics and Systems Biology, Department of Biochemistry, and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Center for Innovation in Chemistry (PERCH), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
- Matej Orešič
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
|
24
|
Large-Scale Evaluation of Major Soluble Macromolecular Components of Fish Muscle from a Conventional 1H-NMR Spectral Database. Molecules 2020; 25:1966. [PMID: 32340308 PMCID: PMC7221887 DOI: 10.3390/molecules25081966] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/31/2020] [Revised: 04/18/2020] [Accepted: 04/21/2020] [Indexed: 01/03/2023] Open
Abstract
Conventional proton nuclear magnetic resonance (1H-NMR) has been widely used for identification and quantification of small molecular components in food. However, identification of major soluble macromolecular components from conventional 1H-NMR spectra is difficult. This is because the baseline appearance is masked by the dense and high-intensity signals from small molecular components present in the sample mixtures. In this study, we introduced an integrated analytical strategy based on the combination of additional measurement using a diffusion filter, covariation peak separation, and matrix decomposition in a small-scale training dataset. This strategy aims to extract signal profiles of soluble macromolecular components from conventional 1H-NMR spectral data in a large-scale dataset without the requirement of re-measurement. We applied this method to the conventional 1H-NMR spectra of water-soluble fish muscle extracts and investigated the distribution characteristics of fish diversity and muscle soluble macromolecular components, such as lipids and collagens. We identified a cluster of fish species with low lipid content and high collagen content in muscle, which showed great potential for the development of functional foods. Because this mechanical data processing method requires additional measurement of only a small-scale training dataset without special sample pretreatment, it should be immediately applicable to extract macromolecular signals from accumulated conventional 1H-NMR databases of other complex gelatinous mixtures in foods.
|
25
|
Hou L, Guan S, Jin Y, Sun W, Wang Q, Du Y, Zhang R. Cell metabolomics to study the cytotoxicity of carbon black nanoparticles on A549 cells using UHPLC-Q/TOF-MS and multivariate data analysis. The Science of the Total Environment 2020; 698:134122. [PMID: 31505349 DOI: 10.1016/j.scitotenv.2019.134122] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/21/2019] [Revised: 07/30/2019] [Accepted: 08/25/2019] [Indexed: 06/10/2023]
Abstract
Carbon black nanoparticles (CBNPs) are a core component of fine particulate matter (PM2.5) in the atmosphere. It has been reported that atmospheric particles with a smaller size and a larger specific surface area more easily pass the respiratory barrier, reach the deep respiratory tract or even the alveoli, and cause lung injury. Therefore, ultrafine or nanometer-sized particles are believed to be more toxic than larger particles. Moreover, it has been confirmed that CBNPs can induce inflammation, oxidative stress and changes in cell signaling and gene expression in mammalian cells and organs. However, their cytotoxicity mechanism remains uncertain. The aim of the present study was to explore the underlying mechanism of cytotoxicity induced by CBNPs in A549 cells. The viability of A549 cells was measured with the Cell Counting Kit-8 (CCK-8) assay. Metabolomics studies were then conducted to detect the cytotoxic effect of CBNPs on A549 cells with an IC50 value of 70 μg/mL for 48 h. Potential differential compounds were identified and quantified using a novel on-line acquisition method based on ultra-high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF/MS). The cytotoxicity mechanism of CBNPs on A549 cells was evaluated by multivariate data analysis and statistics. As a result, a total of 32 differential compounds were identified between the CBNP exposure and control groups. In addition, pathway analysis showed that the metabolic changes involved the tricarboxylic acid (TCA) cycle, alanine, aspartate and glutamate metabolism, and histidine metabolism, among others. The results also suggest that CBNPs may induce cytotoxicity by affecting normal energy metabolism and disturbing several vital signaling pathways, ultimately inducing cell apoptosis.
Affiliation(s)
- Ludan Hou
  - Department of Pharmaceutical Analysis, School of Pharmacy, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Shuai Guan
  - The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Yiran Jin
  - Department of Pharmaceutical Analysis, School of Pharmacy, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
  - The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Wenjing Sun
  - Department of Pharmaceutical Analysis, School of Pharmacy, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Qiao Wang
  - Department of Pharmaceutical Analysis, School of Pharmacy, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Yingfeng Du
  - Department of Pharmaceutical Analysis, School of Pharmacy, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Rong Zhang
  - Department of Occupational and Environmental Health, The School of Public Health, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
|
26
|
Mendez KM, Broadhurst DI, Reinke SN. The application of artificial neural networks in metabolomics: a historical perspective. Metabolomics 2019; 15:142. [PMID: 31628551 DOI: 10.1007/s11306-019-1608-0] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 06/18/2019] [Accepted: 10/11/2019] [Indexed: 02/08/2023]
Abstract
BACKGROUND Metabolomics data, with its complex covariance structure, is typically modelled by projection-based machine learning (ML) methods such as partial least squares (PLS) regression, which project data into a latent structure. Biological data are often non-linear, so it is reasonable to hypothesize that metabolomics data may also have a non-linear latent structure, which in turn would be best modelled using non-linear equations. A non-linear ML method with a similar projection equation structure to PLS is artificial neural networks (ANNs). While ANNs were first applied to metabolic profiling data in the 1990s, the lack of community acceptance combined with limitations in computational capacity and the lack of volume of data for robust non-linear model optimisation inhibited their widespread use. Due to recent advances in computational power, modelling improvements, community acceptance, and the more demanding needs for data science, ANNs have made a recent resurgence in interest across research communities, including a small yet growing usage in metabolomics. As metabolomics experiments become more complex and start to be integrated with other omics data, there is potential for ANNs to become a viable alternative to linear projection methods. AIM OF REVIEW We aim to first describe ANNs and their structural equivalence to linear projection-based methods, including PLS regression. We then review the historical, current, and future uses of ANNs in the field of metabolomics. KEY SCIENTIFIC CONCEPT OF REVIEW Is metabolomics ready for the return of artificial neural networks?
Affiliation(s)
- Kevin M Mendez: Centre for Integrative Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia
- David I Broadhurst: Centre for Integrative Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia
- Stacey N Reinke: Centre for Integrative Metabolomics & Computational Biology, School of Science, Edith Cowan University, Joondalup, 6027, Australia
27
Signal pattern plot: a simple tool for time-dependent metabolomics studies by 1H NMR spectroscopy. Anal Bioanal Chem 2019; 411:6857-6866. [DOI: 10.1007/s00216-019-02055-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2019] [Revised: 06/21/2019] [Accepted: 07/29/2019] [Indexed: 10/26/2022]
28
Liu Y, Zhou S, Han W, Liu W, Qiu Z, Li C. Convolutional neural network for hyperspectral data analysis and effective wavelengths selection. Anal Chim Acta 2019; 1086:46-54. [PMID: 31561793 DOI: 10.1016/j.aca.2019.08.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/30/2019] [Revised: 08/06/2019] [Accepted: 08/14/2019] [Indexed: 01/23/2023]
Abstract
Fusion of spectral and spatial information has proved to be an effective approach for improving model performance in near-infrared hyperspectral data analysis. However, most existing spectral-spatial classification methods require fairly complex pipelines and exact parameter selection, which depend mainly on the investigator's experience and the object under test. A convolutional neural network (CNN) is a powerful tool for representing complicated data and usually needs little "hand-engineering", making it an appropriate candidate for developing a general and automatic approach. In this paper, a two-branch convolutional neural network (2B-CNN) was developed for spectral-spatial classification and effective wavelengths (EWs) selection. The proposed network was evaluated on three classification data sets: herbal medicine, coffee bean and strawberry. The results showed that the 2B-CNN obtained the best classification accuracy (96.72% on average) when compared with a support vector machine (92.60% on average), a one-dimensional CNN (92.58% on average), and a grey level co-occurrence matrix based support vector machine (93.83% on average). Furthermore, the learned weights of the two-dimensional branch in the 2B-CNN were adopted as the indicator of EWs and compared with the successive projections algorithm. The 2B-CNN models built with wavelengths selected by the weight indicator achieved the best accuracy (96.02% on average) among all the examined EWs models. Unlike conventional EWs selection methods, the proposed algorithm works without any additional retraining and can comprehensively consider discriminative power in both the spectral and spatial domains.
Affiliation(s)
- Yisen Liu: Guangdong Institute of Intelligent Manufacturing, Guangzhou, China
- Songbin Zhou: Guangdong Institute of Intelligent Manufacturing, Guangzhou, China
- Wei Han: Guangdong Institute of Intelligent Manufacturing, Guangzhou, China
- Weixin Liu: Guangdong Institute of Intelligent Manufacturing, Guangzhou, China
- Zefan Qiu: Guangdong Institute of Intelligent Manufacturing, Guangzhou, China
- Chang Li: Guangdong Institute of Intelligent Manufacturing, Guangzhou, China
29
Steuer AE, Brockbals L, Kraemer T. Metabolomic Strategies in Biomarker Research - New Approach for Indirect Identification of Drug Consumption and Sample Manipulation in Clinical and Forensic Toxicology? Front Chem 2019; 7:319. [PMID: 31134189 PMCID: PMC6523029 DOI: 10.3389/fchem.2019.00319] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 01/21/2019] [Accepted: 04/23/2019] [Indexed: 11/13/2022] Open
Abstract
Drug of abuse (DOA) consumption is a growing problem worldwide, particularly with increasing numbers of new psychoactive substances (NPS) entering the drug market. Generally, little information on their adverse effects and toxicity is available. The direct detection and identification of NPS is an analytical challenge because of their ephemerality on the drug scene. An approach that does not focus directly on the structural detection of an analyte or its metabolites would be beneficial in this complex analytical scenario, and the development of alternative screening methods could help to provide a fast response to suspected NPS consumption. A metabolomics approach might represent such an alternative strategy for the identification of biomarkers for different questions in DOA testing. Metabolomics is the monitoring of changes in small (endogenous) molecules (<1,000 Da) in response to a certain stimulus, e.g., DOA consumption. For this review, a literature search targeting "metabolomics" and different DOAs or NPS was conducted. Two applications of metabolomic strategies in biomarker research for DOA identification were found: (a) as an additional tool for metabolism studies, with the major advantage that a priori unknown or unexpected metabolites in particular can be identified; and (b) for identification of endogenous biomarkers or metabolite patterns, e.g., for synthetic cannabinoids, or to indirectly detect urine manipulation attempts by chemical adulteration or replacement with artificial urine samples. The majority of the currently available literature in the field, however, deals with metabolomic studies of DOAs that aim to better assess their acute or chronic effects or to find biomarkers for drug addiction and tolerance. Certain changes in endogenous compounds are detected for all studied DOAs, but often similar compounds or pathways are influenced. When these studies are evaluated with regard to possible biomarkers for drug consumption, the observed changes appear, albeit statistically significant, too small to work reliably as biomarkers for drug consumption. Further, different drugs were shown to affect the same pathways. In conclusion, metabolomic approaches possess potential for detection of biomarkers indicating drug consumption. More studies, including more sensitive targeted analyses, multi-variant statistical models or deep-learning approaches, are needed to fully explore the potential of omics science in DOA testing.
Affiliation(s)
- Andrea E Steuer: Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- Lana Brockbals: Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- Thomas Kraemer: Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
30
Ji H, Xu Y, Lu H, Zhang Z. Deep MS/MS-Aided Structural-Similarity Scoring for Unknown Metabolite Identification. Anal Chem 2019; 91:5629-5637. [DOI: 10.1021/acs.analchem.8b05405] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/04/2023]
Affiliation(s)
- Hongchao Ji: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, People’s Republic of China
- Yamei Xu: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, People’s Republic of China
- Hongmei Lu: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, People’s Republic of China
- Zhimin Zhang: College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, People’s Republic of China
31
Misra BB, Mohapatra S. Tools and resources for metabolomics research community: A 2017-2018 update. Electrophoresis 2018; 40:227-246. [PMID: 30443919 DOI: 10.1002/elps.201800428] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 10/11/2018] [Revised: 11/09/2018] [Accepted: 11/09/2018] [Indexed: 01/09/2023]
Abstract
The scale at which MS- and NMR-based platforms generate metabolomics datasets for research, core, and clinical facilities to address challenges in the various sciences, ranging from biomedical to agricultural, is underappreciated. Metabolomics efforts spanning microbe, environment, plant, animal, and human systems have led to continual and concomitant growth of in silico resources for the analysis and interpretation of these datasets. These software tools, resources, and databases drive the field forward and help it keep pace with the amount of data being generated and with the sophisticated and diverse analytical platforms used to generate these metabolomics datasets. To address challenges in data preprocessing, metabolite annotation, statistical interrogation, visualization, interpretation, and integration, the metabolomics and informatics research community comes up with hundreds of tools every year. The purpose of the present review is to provide a brief and useful summary of more than 95 metabolomics tools, software, and databases that were either developed or significantly improved during 2017-2018. We hope this review helps readers, developers, and researchers to obtain informed access to these thorough lists of resources for further improvisation, implementation, and application in due course of time.
Affiliation(s)
- Biswapriya B Misra: Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA