1
Karasev DA, Sobolev BN, Lagunin AA, Filimonov DA, Poroikov VV. The method predicting interaction between protein targets and small-molecular ligands with the wide applicability domain. Comput Biol Chem 2022; 98:107674. [DOI: 10.1016/j.compbiolchem.2022.107674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Revised: 03/24/2022] [Accepted: 03/28/2022] [Indexed: 11/03/2022]
2
Wood JE, Gill BD, Longstaff WM, Crawford RA, Indyk HE, Kissling RC, Lin YH, Bergonia CA, Davis LM, Matuszek A. Dairy product quality using screening of aroma compounds by selected ion flow tube‒mass spectrometry: A chemometric approach. Int Dairy J 2021. [DOI: 10.1016/j.idairyj.2021.105107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
3
Almaghrabi F, Xu DL, Yang JB. An evidential reasoning rule based feature selection for improving trauma outcome prediction. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107112] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/30/2022]
4
Brnabic A, Hess LM. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak 2021; 21:54. [PMID: 33588830 PMCID: PMC7885605 DOI: 10.1186/s12911-021-01403-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/07/2020] [Accepted: 01/20/2021] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. METHODS This systematic literature review was conducted to identify published observational research of employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. RESULTS A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. CONCLUSIONS A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, the model selection strategy is clearly defined, and both internal and external validation are necessary to be sure that decisions for patient care are being made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
Affiliation(s)
- Lisa M Hess
- Eli Lilly and Company, Indianapolis, IN, USA.
5
Abstract
In recent years, mass spectrometry (MS)-based metabolomics has been extensively applied to characterize biochemical mechanisms, and study physiological processes and phenotypic changes associated with disease. Metabolomics has also been important for identifying biomarkers of interest suitable for clinical diagnosis. For the purpose of predictive modeling, in this chapter, we will review various supervised learning algorithms such as random forest (RF), support vector machine (SVM), and partial least squares-discriminant analysis (PLS-DA). In addition, we will also review feature selection methods for identifying the best combination of metabolites for an accurate predictive model. We conclude with best practices for reproducibility by including internal and external replication, reporting metrics to assess performance, and providing guidelines to avoid overfitting and to deal with imbalanced classes. An analysis of an example data will illustrate the use of different machine learning methods and performance metrics.
Affiliation(s)
- Tusharkanti Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Weiming Zhang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
6
Brown SD. Classification of tropical hardwood samples by species and geographical origin. Microchem J 2020. [DOI: 10.1016/j.microc.2020.105326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
7
Mascellani A, Hoca G, Babisz M, Krska P, Kloucek P, Havlik J. 1H NMR chemometric models for classification of Czech wine type and variety. Food Chem 2020; 339:127852. [PMID: 32889133 DOI: 10.1016/j.foodchem.2020.127852] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 05/18/2020] [Revised: 08/13/2020] [Accepted: 08/14/2020] [Indexed: 02/06/2023]
Abstract
A set of 917 wines of Czech origin were analysed using nuclear magnetic resonance spectroscopy (NMR) with the aim of building and evaluating multivariate statistical models and machine learning methods for the classification of 6 types based on colour and residual sugar content, 13 wine grape varieties and 4 locations based on 1H NMR spectra. The predictive models afforded greater than 93% correctness for classifying dry and medium dry, medium, and sweet white wines and dry red wines. The trained Random Forest (RF) model classified Pinot noir with 96% correctness, Blaufränkisch 96%, Riesling 92%, Cabernet Sauvignon 77%, Chardonnay 76%, Gewürztraminer 60%, Hibernal 60%, Grüner Veltliner 52%, Pinot gris 48%, Sauvignon Blanc 45%, and Pálava 40%. Pinot blanc and Chardonnay, varieties that are often mistakenly interchanged, were discriminated with 71% correctness. The findings support chemometrics as a tool for predicting important features in wine, particularly for quality assessment and fraud detection.
Affiliation(s)
- Anna Mascellani
- Department of Food Science, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, 165 00 Prague 6 - Suchdol, Czech Republic
- Gokce Hoca
- Department of Food Science, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, 165 00 Prague 6 - Suchdol, Czech Republic
- Marek Babisz
- The National Wine Centre, Zamek 1, 691 42 Valtice, Czech Republic
- Pavel Krska
- The National Wine Centre, Zamek 1, 691 42 Valtice, Czech Republic
- Pavel Kloucek
- Department of Food Science, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, 165 00 Prague 6 - Suchdol, Czech Republic
- Jaroslav Havlik
- Department of Food Science, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, 165 00 Prague 6 - Suchdol, Czech Republic.
8
Identifying unknown metabolites using NMR-based metabolic profiling techniques. Nat Protoc 2020; 15:2538-2567. [PMID: 32681152 DOI: 10.1038/s41596-020-0343-3] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 02/22/2019] [Accepted: 04/20/2020] [Indexed: 01/20/2023]
Abstract
Metabolic profiling of biological samples provides important insights into multiple physiological and pathological processes but is hindered by a lack of automated annotation and standardized methods for structure elucidation of candidate disease biomarkers. Here we describe a system for identifying molecular species derived from nuclear magnetic resonance (NMR) spectroscopy-based metabolic phenotyping studies, with detailed information on sample preparation, data acquisition and data modeling. We provide eight different modular workflows to be followed in a recommended sequential order according to their level of difficulty. This multi-platform system involves the use of statistical spectroscopic tools such as Statistical Total Correlation Spectroscopy (STOCSY), Subset Optimization by Reference Matching (STORM) and Resolution-Enhanced (RED)-STORM to identify other signals in the NMR spectra relating to the same molecule. It also uses two-dimensional NMR spectroscopic analysis, separation and pre-concentration techniques, multiple hyphenated analytical platforms and data extraction from existing databases. The complete system, using all eight workflows, would take up to a month, as it includes multi-dimensional NMR experiments that require prolonged experiment times. However, easier identification cases using fewer steps would take 2 or 3 days. This approach to biomarker discovery is efficient and cost-effective and offers increased chemical space coverage of the metabolome, resulting in faster and more accurate assignment of NMR-generated biomarkers arising from metabolic phenotyping studies. It requires a basic understanding of MATLAB to use the statistical spectroscopic tools and analytical skills to perform solid phase extraction (SPE), liquid chromatography (LC) fraction collection, LC-NMR-mass spectroscopy and one-dimensional and two-dimensional NMR experiments.
9
Bos TS, Knol WC, Molenaar SR, Niezen LE, Schoenmakers PJ, Somsen GW, Pirok BW. Recent applications of chemometrics in one- and two-dimensional chromatography. J Sep Sci 2020; 43:1678-1727. [PMID: 32096604 PMCID: PMC7317490 DOI: 10.1002/jssc.202000011] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 01/07/2020] [Revised: 02/20/2020] [Accepted: 02/21/2020] [Indexed: 12/28/2022]
Abstract
The proliferation of increasingly more sophisticated analytical separation systems, often incorporating increasingly more powerful detection techniques, such as high-resolution mass spectrometry, causes an urgent need for highly efficient data-analysis and optimization strategies. This is especially true for comprehensive two-dimensional chromatography applied to the separation of very complex samples. In this contribution, the requirement for chemometric tools is explained and the latest developments in approaches for (pre-)processing and analyzing data arising from one- and two-dimensional chromatography systems are reviewed. The final part of this review focuses on the application of chemometrics for method development and optimization.
Affiliation(s)
- Tijmen S. Bos
- Division of Bioanalytical Chemistry, Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
- Wouter C. Knol
- Analytical Chemistry Group, van ’t Hoff Institute for Molecular Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
- Stef R.A. Molenaar
- Analytical Chemistry Group, van ’t Hoff Institute for Molecular Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
- Leon E. Niezen
- Analytical Chemistry Group, van ’t Hoff Institute for Molecular Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
- Peter J. Schoenmakers
- Analytical Chemistry Group, van ’t Hoff Institute for Molecular Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
- Govert W. Somsen
- Division of Bioanalytical Chemistry, Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
- Bob W.J. Pirok
- Analytical Chemistry Group, van ’t Hoff Institute for Molecular Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
- Centre for Analytical Sciences Amsterdam (CASA), Amsterdam, The Netherlands
10
Bongers BJ, IJzerman AP, Van Westen GJP. Proteochemometrics - recent developments in bioactivity and selectivity modeling. Drug Discov Today Technol 2019; 32-33:89-98. [PMID: 33386099 DOI: 10.1016/j.ddtec.2020.08.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/28/2020] [Revised: 08/18/2020] [Accepted: 08/28/2020] [Indexed: 06/12/2023]
Abstract
Proteochemometrics is a machine learning based modeling approach relying on a combination of ligand and protein descriptors. With ongoing developments in machine learning and increases in public data the technique is more frequently applied in early drug discovery, typically in ligand-target binding prediction. Common applications include improvements to single target quantitative structure-activity relationship models, protein selectivity and promiscuity modeling, and large-scale deep learning approaches. The increase in predictive power using proteochemometrics is observed in multi-target bioactivity modeling, opening the door to more extensive studies covering whole protein families. On top of that, with deep learning fueling more complex and larger scale models, proteochemometrics allows faster and higher quality computational models supporting the design, make, test cycle.
Affiliation(s)
- Brandon J Bongers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J P Van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands.
11
Lee LC, Jemain AA. Predictive modelling of colossal ATR-FTIR spectral data using PLS-DA: empirical differences between PLS1-DA and PLS2-DA algorithms. Analyst 2019; 144:2670-2678. [PMID: 30849143 DOI: 10.1039/c8an02074d] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022]
Abstract
In response to our review paper [L. C. Lee et al., Analyst, 2018, 143, 3526-3539], we present a study that compares empirical differences between PLS1-DA and PLS2-DA algorithms in modelling a colossal ATR-FTIR spectral dataset. Over the past two decades, partial least squares-discriminant analysis (PLS-DA) has gained wide acceptance and huge popularity in the field of applied research, partly due to its dimensionality reduction capability and ability to handle multicollinear and correlated variables. To solve a K-class problem (K > 2) using PLS-DA and high-dimensional data like infrared spectra, one can construct either K one-versus-all PLS1-DA models or only one PLS2-DA model. The aim of this work is to explore empirical differences between the two PLS-DA algorithms in modeling a colossal ATR-FTIR spectral dataset. The practical task is to build a prediction model using the imbalanced, high dimensional, colossal and multi-class ATR-FTIR spectra of blue gel pen inks. Four different sub-datasets were prepared from the principal dataset by considering the raw and asymmetric least squares (AsLS) preprocessed forms: (a) Raw-global region; (b) Raw-local region; (c) AsLS-global region; and (d) AsLS-local region. A series of 50 models which includes the first 50 PLS components incrementally was constructed repeatedly using the four sub-datasets. Each model was evaluated using six different variants of v-fold cross validation, autoprediction and external testing methods. As a result, each PLS-DA algorithm was represented by a number of figures of merit. The differences between PLS1-DA and PLS2-DA algorithms were assessed using hypothesis tests with respect to model accuracy, stability and fitting. On the other hand, confusion matrices of the two PLS-DA algorithms were inspected carefully for assessment of model parsimony. Overall, both the algorithms presented satisfactory model accuracy and stability. 
Nonetheless, PLS1-DA models showed significantly higher accuracy rates than PLS2-DA models, whereas PLS2-DA models seem to be much more stable compared to PLS1-DA models. Eventually, PLS2-DA also proved to be less prone to overfitting and is more parsimonious than PLS1-DA. In conclusion, the relatively high accuracy of the PLS1-DA algorithm is achieved at the cost of rather low parsimony and stability, and with an increased risk of overfitting.
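The two response encodings contrasted in this abstract (K one-versus-all PLS1-DA models versus a single PLS2-DA model) can be illustrated with a minimal sketch. It only builds the response side of the problem; the PLS regression itself is omitted and would need a numerical library. The function names and the ink class labels are hypothetical and do not come from the paper.

```python
def one_vs_all_vectors(labels):
    """PLS1-DA style: K separate binary response vectors, one per class.

    Each vector would be the response of its own one-versus-all model.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def indicator_matrix(labels):
    """PLS2-DA style: one N x K dummy matrix shared by a single model."""
    classes = sorted(set(labels))
    return classes, [[1 if y == c else 0 for c in classes] for y in labels]


# Hypothetical class labels for four spectra (assumption, not paper data).
labels = ["ink_A", "ink_B", "ink_C", "ink_A"]

pls1_targets = one_vs_all_vectors(labels)      # K vectors of length N
classes, pls2_target = indicator_matrix(labels)  # one N x K matrix
```

The columns of the PLS2-DA matrix are exactly the PLS1-DA vectors; the two approaches differ in whether those columns are fitted jointly in one model or separately in K models.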
Affiliation(s)
- Loong Chuen Lee
- Forensic Science Programme, FSK, Universiti Kebangsaan Malaysia, Jalan Raja Muda Abdul Aziz, 50300 Kuala Lumpur, Malaysia.
12
Koziol P, Raczkowska MK, Skibinska J, McCollum NJ, Urbaniak-Wasik S, Paluszkiewicz C, Kwiatek WM, Wrobel TP. Denoising influence on discrete frequency classification results for quantum cascade laser based infrared microscopy. Anal Chim Acta 2018; 1051:24-31. [PMID: 30661616 DOI: 10.1016/j.aca.2018.11.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/14/2018] [Revised: 11/14/2018] [Accepted: 11/15/2018] [Indexed: 12/23/2022]
Abstract
Currently, there is great interest in bringing the application of IR spectroscopy into the clinic. This however will require a significant reduction in measurement time as Fourier Transform Infrared (FT-IR) imaging takes hours to days to scan a clinically relevant specimen. A potential remedy for this issue is the use of Quantum Cascade Laser Infrared (QCL IR) microscopy performed in Discrete Frequency (DF) mode for maximum speed gain. This gain could be furthermore improved by applying a proper denoising algorithm that takes into account the specific data structure. We have recently compared spectral and spatial denoising techniques in the context of Fourier Transform IR (FT-IR) imaging and showed that the optimal methods depend heavily on the exact data structure. In general multivariate denoising methods such as Principal Component Analysis (PCA) and Minimum Noise Fraction (MNF) are the most effective for a dataset containing multiple bands. Histologic classification of QCL IR images of pancreatic tissue using Random Forest was therefore performed to investigate which denoising schemes are the most optimal for such experimental data structure. This work is the first to show the effects of denoising on classification accuracy of QCL data and is likely to be transferable to other QCL microscopes and other modalities using DF imaging, e.g. AFM-IR or CARS/SRS imaging.
Affiliation(s)
- Paulina Koziol
- Institute of Nuclear Physics Polish Academy of Sciences, PL-31342 Krakow, Poland
- Magda K Raczkowska
- Institute of Nuclear Physics Polish Academy of Sciences, PL-31342 Krakow, Poland; Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Mickiewicza 30, Krakow, Poland
- Justyna Skibinska
- Institute of Nuclear Physics Polish Academy of Sciences, PL-31342 Krakow, Poland; Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Science and Technology, Mickiewicza 30, Krakow, Poland
- Wojciech M Kwiatek
- Institute of Nuclear Physics Polish Academy of Sciences, PL-31342 Krakow, Poland
- Tomasz P Wrobel
- Institute of Nuclear Physics Polish Academy of Sciences, PL-31342 Krakow, Poland.
13
Brunius C, Pedersen A, Malmodin D, Karlsson BG, Andersson LI, Tybring G, Landberg R. Prediction and modeling of pre-analytical sampling errors as a strategy to improve plasma NMR metabolomics data. Bioinformatics 2018; 33:3567-3574. [PMID: 29036400 PMCID: PMC5870544 DOI: 10.1093/bioinformatics/btx442] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/03/2017] [Accepted: 07/13/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation Biobanks are important infrastructures for life science research. Optimal sample handling regarding e.g. collection and processing of biological samples is highly complex, with many variables that could alter sample integrity and even more complex when considering multiple study centers or using legacy samples with limited documentation on sample management. Novel means to understand and take into account such variability would enable high-quality research on archived samples. Results This study investigated whether pre-analytical sample variability could be predicted and reduced by modeling alterations in the plasma metabolome, measured by NMR, as a function of pre-centrifugation conditions (1–36 h pre-centrifugation delay time at 4 °C and 22 °C) in 16 individuals. Pre-centrifugation temperature and delay times were predicted using random forest modeling and performance was validated on independent samples. Alterations in the metabolome were modeled at each temperature using a cluster-based approach, revealing reproducible effects of delay time on energy metabolism intermediates at both temperatures, but more pronounced at 22 °C. Moreover, pre-centrifugation delay at 4 °C resulted in large, specific variability at 3 h, predominantly of lipids. Pre-analytical sample handling error correction resulted in significant improvement of data quality, particularly at 22 °C. This approach offers the possibility to predict pre-centrifugation delay temperature and time in biobanked samples before use in costly downstream applications. Moreover, the results suggest potential to decrease the impact of undesired, delay-induced variability. However, these findings need to be validated in multiple, large sample sets and with analytical techniques covering a wider range of the metabolome, such as LC-MS. Availability and implementation The sampleDrift R package is available at https://gitlab.com/CarlBrunius/sampleDrift. 
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carl Brunius
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden; Department of Molecular Sciences, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden
- Anders Pedersen
- Swedish NMR Centre, University of Gothenburg, SE-405 30 Gothenburg, Sweden
- Daniel Malmodin
- Swedish NMR Centre, University of Gothenburg, SE-405 30 Gothenburg, Sweden
- B Göran Karlsson
- Swedish NMR Centre, University of Gothenburg, SE-405 30 Gothenburg, Sweden
- Rikard Landberg
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden; Department of Molecular Sciences, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden; Institute of Environmental Medicine, Karolinska Institutet, SE-171 77 Stockholm, Sweden
14
Acharjee A, Prentice P, Acerini C, Smith J, Hughes IA, Ong K, Griffin JL, Dunger D, Koulman A. The translation of lipid profiles to nutritional biomarkers in the study of infant metabolism. Metabolomics 2017; 13:25. [PMID: 28190990 PMCID: PMC5272886 DOI: 10.1007/s11306-017-1166-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 11/22/2016] [Accepted: 01/12/2017] [Indexed: 02/02/2023]
Abstract
INTRODUCTION Links between early life exposures and later health outcomes may, in part, be due to nutritional programming in infancy. This hypothesis is supported by observed long-term benefits associated with breastfeeding, such as better cognitive development in childhood, and lower risks of obesity and high blood pressure in later life. However, the possible underlying mechanisms are expected to be complex and may be difficult to disentangle due to the lack of understanding of the metabolic processes that differentiate breastfed infants compared to those receiving just formula feed. OBJECTIVE Our aim was to investigate the relationships between infant feeding and the lipid profiles and to validate specific lipids in separate datasets so that a small set of lipids can be used as nutritional biomarkers. METHOD We utilized a direct infusion high-resolution mass spectrometry method to analyse the lipid profiles of 3.2 mm dried blood spot samples collected at age 3 months from the Cambridge Baby Growth Study (CBGS-1), which formed the discovery cohort. For validation two sample sets were profiled: Cambridge Baby Growth Study (CBGS-2) and Pregnancy Outcome Prediction Study (POPS). Lipidomic profiles were compared between infant groups who were either exclusively breastfed, exclusively formula-fed or mixed-fed at various levels. Data analysis included supervised Random Forest method with combined classification and regression mode. Selection of lipids was based on an iterative backward elimination procedure without compromising the class error in the classification mode. CONCLUSION From this study, we were able to identify and validate three lipids: PC(35:2), SM(36:2) and SM(39:1) that can be used collectively as biomarkers for infant nutrition during early development. These biomarkers can be used to determine whether young infants (3-6 months) are breast-fed or receive formula milk.
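The iterative backward elimination mentioned in this abstract can be sketched roughly as follows. This is not the authors' procedure: a nearest-centroid classifier stands in for Random Forest so the example needs no external library, the class error is estimated by leave-one-out resampling (an assumption), and the toy data, function names and feature layout are all hypothetical.

```python
def loo_error(X, y, features):
    """Leave-one-out class error of a nearest-centroid classifier
    (a stand-in for Random Forest) using only the given feature indices."""
    errors = 0
    for i in range(len(X)):
        # Accumulate per-class sums over every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to a class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=dist)
        errors += predicted != y[i]
    return errors / len(X)


def backward_eliminate(X, y):
    """Repeatedly drop the feature whose removal gives the lowest error,
    stopping as soon as any removal would make the class error worse."""
    features = list(range(len(X[0])))
    best = loo_error(X, y, features)
    while len(features) > 1:
        err, drop = min(
            (loo_error(X, y, [g for g in features if g != f]), f)
            for f in features
        )
        if err > best:
            break
        features.remove(drop)
        best = err
    return features, best


# Toy data (hypothetical): feature 0 separates the groups, 1 and 2 do not.
X = [[0.0, 3.0, 1.0], [1.0, 1.0, 2.0], [0.5, 2.0, 3.0],
     [10.0, 3.0, 2.0], [9.0, 1.0, 3.0], [9.5, 2.0, 1.0]]
y = ["breast", "breast", "breast", "formula", "formula", "formula"]

selected, error = backward_eliminate(X, y)
```

On this toy set the two uninformative features are removed and only feature 0 is retained, since dropping it would raise the class error.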
Affiliation(s)
- Animesh Acharjee
- MRC Elsie Widdowson Laboratory, Cambridge, UK (ISNI 0000 0004 0606 2472; GRID grid.415055.0)
- Department of Biochemistry, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- Philippa Prentice
- Department of Paediatrics, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- Carlo Acerini
- Department of Paediatrics, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- James Smith
- MRC Elsie Widdowson Laboratory, Cambridge, UK (ISNI 0000 0004 0606 2472; GRID grid.415055.0)
- School of Food Science and Nutrition, University of Leeds, Leeds, UK (ISNI 0000 0004 1936 8403; GRID grid.9909.9)
- Ieuan A. Hughes
- Department of Paediatrics, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- Ken Ong
- Department of Paediatrics, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- Julian L. Griffin
- MRC Elsie Widdowson Laboratory, Cambridge, UK (ISNI 0000 0004 0606 2472; GRID grid.415055.0)
- Department of Biochemistry, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- David Dunger
- Department of Paediatrics, University of Cambridge, Cambridge, UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
- Albert Koulman
- MRC Elsie Widdowson Laboratory, Cambridge, UK (ISNI 0000 0004 0606 2472; GRID grid.415055.0)
- NIHR BRC Clinical Metabolomics and Lipidomics Laboratory, Level 4, Laboratory Block, Cambridge University Hospitals, University of Cambridge, Hills Road, Cambridge, CB2 0QQ UK (ISNI 0000 0001 2188 5934; GRID grid.5335.0)
15
Acharjee A, Ament Z, West JA, Stanley E, Griffin JL. Integration of metabolomics, lipidomics and clinical data using a machine learning method. BMC Bioinformatics 2016; 17:440. [PMID: 28185575 PMCID: PMC5133491 DOI: 10.1186/s12859-016-1292-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND The recent pandemic of obesity and the metabolic syndrome (MetS) has led to the realisation that new drug targets are needed to either reduce obesity or the subsequent pathophysiological consequences associated with excess weight gain. Certain nuclear hormone receptors (NRs) play a pivotal role in lipid and carbohydrate metabolism and have been highlighted as potential treatments for obesity. This realisation started a search for NR agonists in order to understand and successfully treat MetS and associated conditions such as insulin resistance, dyslipidaemia, hypertension, hypertriglyceridemia, obesity and cardiovascular disease. The most studied NRs for treating metabolic diseases are the peroxisome proliferator-activated receptors (PPARs), PPAR-α, PPAR-γ, and PPAR-δ. However, prolonged PPAR treatment in animal models has led to adverse side effects including increased risk of a number of cancers, but how these receptors change metabolism long term in terms of pathology, despite many beneficial effects shorter term, is not fully understood. In the current study, changes in male Sprague Dawley rat liver caused by dietary treatment with a PPAR-pan (PPAR-α, -γ, and -δ) agonist were profiled by classical toxicology (clinical chemistry) and high throughput metabolomics and lipidomics approaches using mass spectrometry. RESULTS In order to integrate an extensive set of nine different multivariate metabolic and lipidomics datasets with classical toxicological parameters we developed a hypotheses free, data driven machine learning approach. From the data analysis, we examined how the nine datasets were able to model dose and clinical chemistry results, with the different datasets having very different information content. CONCLUSIONS We found lipidomics (Direct Infusion-Mass Spectrometry) data the most predictive for different dose responses. 
In addition, associations with the metabolic and lipidomic data with aspartate amino transaminase (AST), a hepatic leakage enzyme to assess organ damage, and albumin, indicative of altered liver synthetic function, were established. Furthermore, by establishing correlations and network connections between eicosanoids, phospholipids and triacylglycerols, we provide evidence that these lipids function as a key link between inflammatory processes and intermediary metabolism.
Affiliation(s)
- Animesh Acharjee
- Medical Research Council, Elsie Widdowson Laboratory, 120 Fulbourn Road, Cambridge, CB1 9NL, UK; The Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Zsuzsanna Ament
- Medical Research Council, Elsie Widdowson Laboratory, 120 Fulbourn Road, Cambridge, CB1 9NL, UK
- James A West
- Medical Research Council, Elsie Widdowson Laboratory, 120 Fulbourn Road, Cambridge, CB1 9NL, UK
- Elizabeth Stanley
- Medical Research Council, Elsie Widdowson Laboratory, 120 Fulbourn Road, Cambridge, CB1 9NL, UK
- Julian L Griffin
- Medical Research Council, Elsie Widdowson Laboratory, 120 Fulbourn Road, Cambridge, CB1 9NL, UK; The Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
|
16
|
Grissa D, Pétéra M, Brandolini M, Napoli A, Comte B, Pujos-Guillot E. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data. Front Mol Biosci 2016; 3:30. [PMID: 27458587 PMCID: PMC4937038 DOI: 10.3389/fmolb.2016.00030] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 02/19/2016] [Accepted: 06/20/2016] [Indexed: 11/28/2022] Open
Abstract
Untargeted metabolomics is a powerful phenotyping tool for better understanding the biological mechanisms involved in human pathology development and for identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is the identification of early predictive markers of clinical outcome a few years before occurrence, it becomes essential to use appropriate algorithms and workflows to be able to discover subtle effects among this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy focused on evaluating a combination of numeric-symbolic approaches for feature selection, with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, especially machine learning methods (SVM-RFE, RF, RF-RFE) and univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As a resampling method, leave-one-out cross-validation (LOOCV) was applied to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these approaches were compared, which allowed the stability of the variables to be determined using Formal Concept Analysis. The results revealed the value of RF-Gini combined with ANOVA for feature selection, as these two complementary methods allowed the 48 best candidates for prediction to be selected. Using linear logistic regression on this reduced dataset gave the best performance in terms of prediction accuracy and number of false positives, with a model including the 5 top variables. These results therefore highlight the value of feature selection methods and the importance of working on reduced datasets for identifying predictive biomarkers from untargeted metabolomics data.
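The interplay between univariate ranking and leave-one-out cross-validation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' workflow: it ranks features by a one-way ANOVA F statistic and, as an assumption made for brevity, evaluates them with a nearest-centroid classifier instead of the logistic regression, RF, or SVM-RFE models named in the abstract. All function names and the toy data are hypothetical. The ranking is recomputed inside every fold so that the held-out sample never influences feature selection.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand = mean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, y, k):
    """Indices of the k features with the largest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid(X_train, y_train, x, features):
    """Predict the label whose class centroid is closest on the given features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [mean([r[j] for r in rows]) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """LOOCV accuracy with feature selection repeated inside each fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = top_k_features(X_train, y_train, k)
        hits += nearest_centroid(X_train, y_train, X[i], features) == y[i]
    return hits / len(X)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]
```

Calling `loocv_accuracy(X, y, k=1)` on the toy data selects feature 0 in every fold. Selecting features once on the full dataset before cross-validation would leak information from the held-out sample and inflate the estimate.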
Affiliation(s)
- Mélanie Pétéra
- INRA, UMR1019, Plateforme d'Exploration du Métabolisme, Clermont-Ferrand, France
- Marion Brandolini
- INRA, UMR1019, Plateforme d'Exploration du Métabolisme, Clermont-Ferrand, France
- Estelle Pujos-Guillot
- INRA, UMR1019, UNH-MAPPING, Clermont-Ferrand, France; INRA, UMR1019, Plateforme d'Exploration du Métabolisme, Clermont-Ferrand, France
|
17
|
Rinaudo P, Boudah S, Junot C, Thévenot EA. biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data. Front Mol Biosci 2016; 3:26. [PMID: 27446929 PMCID: PMC4914951 DOI: 10.3389/fmolb.2016.00026] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 12/21/2016] [Accepted: 06/03/2016] [Indexed: 01/02/2023] Open
Abstract
High-throughput technologies such as transcriptomics, proteomics, and metabolomics show great promise for the discovery of biomarkers for diagnosis and prognosis. Selection of the most promising candidates between the initial untargeted step and the subsequent validation phases is critical within the pipeline leading to clinical tests. Several statistical and data mining methods have been described for feature selection: in particular, wrapper approaches iteratively assess the performance of the classifier on distinct subsets of variables. Current wrappers, however, do not estimate the significance of the selected features. We therefore developed a new methodology to find the smallest feature subset which significantly contributes to the model performance, by using a combination of resampling, ranking of variable importance, significance assessment by permutation of the feature values in the test subsets, and half-interval search. We wrapped our biosigner algorithm around three reference binary classifiers (Partial Least Squares-Discriminant Analysis, Random Forest, and Support Vector Machines) which have been shown to achieve specific performances depending on the structure of the dataset. By using three real biological and clinical metabolomics and transcriptomics datasets (containing up to 7000 features), complementary signatures were obtained in a few minutes, generally providing higher prediction accuracies than the initial full model. Comparison with alternative feature selection approaches further indicated that our method provides signatures of restricted size and high stability. Finally, by using our methodology to seek metabolites discriminating type 1 from type 2 diabetic patients, several features were selected, including a fragment from the taurochenodeoxycholic bile acid. Our methodology, implemented in the biosigner R/Bioconductor package and Galaxy/Workflow4metabolomics module, should be of interest for both experimenters and statisticians to identify robust molecular signatures from large omics datasets in the process of developing new diagnostics.
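One ingredient named in this abstract, significance assessment by permuting a feature's values in the test subset, can be sketched in isolation. The code below is an illustrative assumption, not the biosigner package's implementation: it takes any fitted `predict` callable, shuffles one column of the test data, and reports how often the shuffled accuracy is at least as good as the original. The resampling, importance ranking, and half-interval search steps are omitted, and the function names, the 200-permutation default, and the toy classifier are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_p_value(predict, X_test, y_test, feature, n_perm=200, seed=0):
    """Empirical p-value that `feature` contributes to test accuracy.

    The feature's column is shuffled across test samples; a small p-value
    means shuffling almost always makes the predictions worse.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X_test, y_test)
    column = [row[feature] for row in X_test]
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = column[:]
        rng.shuffle(shuffled)
        X_perm = [
            row[:feature] + [value] + row[feature + 1:]
            for row, value in zip(X_test, shuffled)
        ]
        if accuracy(predict, X_perm, y_test) >= baseline:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_good + 1) / (n_perm + 1)


def toy_predict(row):
    """Hypothetical fitted classifier that only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Toy test subset: feature 0 is informative, feature 1 is ignored by the model.
X_test = [[0.0, float(i % 3)] for i in range(10)] + [[1.0, float(i % 3)] for i in range(10)]
y_test = [0] * 10 + [1] * 10
```

With this toy model, permuting feature 1 never changes the predictions, so its p-value is exactly 1.0, while permuting feature 0 almost always lowers accuracy and yields a small p-value.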
Affiliation(s)
- Philippe Rinaudo
- CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB, Gif-sur-Yvette, France
- Samia Boudah
- Laboratoire d'Etude du Métabolisme des Médicaments, DSV/iBiTec-S/SPI, MetaboHUB, CEA-Saclay, Gif-sur-Yvette, France
- Christophe Junot
- Laboratoire d'Etude du Métabolisme des Médicaments, DSV/iBiTec-S/SPI, MetaboHUB, CEA-Saclay, Gif-sur-Yvette, France
- Etienne A Thévenot
- CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB, Gif-sur-Yvette, France
|
18
|
Yaroshenko I, Kirsanov D, Kartsova L, Sidorova A, Sun Q, Wan H, He Y, Wang P, Legin A. Exploring bitterness of traditional Chinese medicine samples by potentiometric electronic tongue and by capillary electrophoresis and liquid chromatography coupled to UV detection. Talanta 2016; 152:105-11. [DOI: 10.1016/j.talanta.2016.01.058] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/15/2015] [Revised: 01/25/2016] [Accepted: 01/27/2016] [Indexed: 10/22/2022]
|
19
|
Yi L, Dong N, Yun Y, Deng B, Ren D, Liu S, Liang Y. Chemometric methods in data processing of mass spectrometry-based metabolomics: A review. Anal Chim Acta 2016; 914:17-34. [PMID: 26965324 DOI: 10.1016/j.aca.2016.02.001] [Citation(s) in RCA: 159] [Impact Index Per Article: 19.9] [Received: 08/25/2015] [Revised: 01/28/2016] [Accepted: 02/01/2016] [Indexed: 01/03/2023]
Abstract
This review focuses on recent and potential advances in chemometric methods for data processing in metabolomics, especially for data generated by mass spectrometric techniques. Metabolomics is gradually being regarded as a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data analysis strategies. Advanced skills in the preprocessing of raw data, identification of metabolites, variable selection, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, China
- Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, 999077, China
- Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China
- Baichuan Deng
- College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Dabing Ren
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, China
- Shao Liu
- Xiangya Hospital, Central South University, Changsha, 410008, China
- Yizeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China
|
20
|
Large-scale identification of potential drug targets based on the topological features of human protein–protein interaction network. Anal Chim Acta 2015; 871:18-27. [DOI: 10.1016/j.aca.2015.02.032] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 10/12/2014] [Revised: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 01/17/2023]
|
21
|
Scott IM, Ward JL, Miller SJ, Beale MH. Opposite variations in fumarate and malate dominate metabolic phenotypes of Arabidopsis salicylate mutants with abnormal biomass under chilling. Physiol Plant 2014; 152:660-674. [PMID: 24735077 DOI: 10.1111/ppl.12210] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/09/2014] [Revised: 03/12/2014] [Accepted: 03/13/2014] [Indexed: 06/03/2023]
Abstract
In chilling conditions (5°C), salicylic acid (SA)-deficient mutants (sid2, eds5 and NahG) of Arabidopsis thaliana produced more biomass than wild type (Col-0), whereas the SA overproducer cpr1 was extremely stunted. The hypothesis that these phenotypes were reflected in metabolism was explored using 600 MHz ¹H nuclear magnetic resonance (NMR) analysis of unfractionated polar shoot extracts. Biomass-related metabolic phenotypes were identified as multivariate data models of these NMR 'fingerprints'. These included principal components that correlated with biomass. Also, partial least squares-regression models were found to predict the relative size of plants in previously unseen experiments in different light intensities, or relative size of one genotype from the others. The dominant signal in these models was fumarate, which was high in SA-deficient mutants, intermediate in Col-0 and low in cpr1 at 5°C. Among signals negatively correlated with biomass, malate was prominent. Abundance of transcripts of the FUM2 cytosolic fumarase (At5g50950) showed strong positive correlation with fumarate levels and with biomass, whereas no significant differences were found for the FUM1 mitochondrial fumarase (At2g47510). It was confirmed that the morphological effects of SA under chilling find expression in the metabolome, with a role of fumarate highlighted.
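The partial least squares regression step mentioned in this abstract can be illustrated with a one-component PLS1 model, which projects mean-centred spectra onto the single direction of maximal covariance with the response and regresses the response on that score. This is a minimal sketch under stated assumptions: the number of components, any scaling, and the software used in the study are not given in the abstract, and the function names and toy values (two pseudo-signals and a relative biomass value per plant) are hypothetical.

```python
import math


def fit_pls1(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, weights, q).

    Raises ZeroDivisionError if X has zero covariance with y.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(value * value for value in w))
    w = [value / norm for value in w]
    # Scores of the training samples along that direction.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regression coefficient of y on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict_pls1(model, x):
    """Predict the response for one new sample."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


# Toy data: signal 0 tracks the response, signal 1 is constant.
spectra = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
biomass = [2.0, 4.0, 6.0, 8.0]
model = fit_pls1(spectra, biomass)
```

The weight vector indicates which signals dominate the model, which is the sense in which a single metabolite signal can be described as dominant in a PLS model. Real spectra would normally need more than one component and validation on unseen experiments.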
Affiliation(s)
- Ian M Scott
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3DA, UK
|
22
|
Yi L, Dong N, Yun Y, Deng B, Liu S, Zhang Y, Liang Y. WITHDRAWN: Recent advances in chemometric methods for plant metabolomics: A review. Biotechnol Adv 2014:S0734-9750(14)00183-9. [PMID: 25461504 DOI: 10.1016/j.biotechadv.2014.11.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/15/2014] [Revised: 11/17/2014] [Accepted: 11/18/2014] [Indexed: 12/17/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, China
- Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong 999077, Hong Kong, China
- Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Baichuan Deng
- Department of Chemistry, University of Bergen, Bergen N-5007, Norway
- Shao Liu
- Xiangya Hospital, Central South University, Changsha 410008, China
- Yi Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yizeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
|
23
|
Yi L, Dong N, Shi S, Deng B, Yun Y, Yi Z, Zhang Y. Metabolomic identification of novel biomarkers of nasopharyngeal carcinoma. RSC Adv 2014. [DOI: 10.1039/c4ra09860a] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022] Open
Abstract
This paper introduces a new strategy for identifying novel metabolic biomarkers of nasopharyngeal carcinoma (NPC).
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, China
- Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China
- Shuting Shi
- College of Chemistry and Chemical Engineering, Central South University, Changsha, China
- Baichuan Deng
- Department of Chemistry, University of Bergen, Bergen, Norway
- Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha, China
- Zhibiao Yi
- Dongguan Mathematical and Engineering Academy of Chinese Medicine, GuangZhou University of Chinese Medicine, Dongguan, China
- Yi Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, China
|