1
Gao Y, Zhao C. Development and applications of metabolic models in plant multi-omics research. Front Plant Sci 2024; 15:1361183. [PMID: 39483677 PMCID: PMC11524811 DOI: 10.3389/fpls.2024.1361183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Accepted: 04/15/2024] [Indexed: 11/03/2024]
Abstract
Plant growth and development are characterized by systematic and continuous processes, each involving intricate metabolic coordination mechanisms. Mathematical models are essential tools for investigating plant growth and development, metabolic regulation networks, and growth patterns across different stages. These models offer insights into secondary metabolism patterns in plants and the roles of metabolites. The proliferation of data related to plant genomics, transcriptomics, proteomics, and metabolomics in the last decade has underscored the growing importance of mathematical modeling in this field. This review aims to elucidate the principles and types of metabolic models employed in studying plant secondary metabolism, their strengths, and limitations. Furthermore, the application of mathematical models in various plant systems biology subfields will be discussed. Lastly, the review will outline how mathematical models can be harnessed to address research questions in this context.
Affiliation(s)
- Cheng Zhao
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
2
Pakkir Shah AK, Walter A, Ottosson F, Russo F, Navarro-Diaz M, Boldt J, Kalinski JCJ, Kontou EE, Elofson J, Polyzois A, González-Marín C, Farrell S, Aggerbeck MR, Pruksatrakul T, Chan N, Wang Y, Pöchhacker M, Brungs C, Cámara B, Caraballo-Rodríguez AM, Cumsille A, de Oliveira F, Dührkop K, El Abiead Y, Geibel C, Graves LG, Hansen M, Heuckeroth S, Knoblauch S, Kostenko A, Kuijpers MCM, Mildau K, Papadopoulos Lambidis S, Portal Gomes PW, Schramm T, Steuer-Lodd K, Stincone P, Tayyab S, Vitale GA, Wagner BC, Xing S, Yazzie MT, Zuffa S, de Kruijff M, Beemelmanns C, Link H, Mayer C, van der Hooft JJJ, Damiani T, Pluskal T, Dorrestein P, Stanstrup J, Schmid R, Wang M, Aron A, Ernst M, Petras D. Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data. Nat Protoc 2024:10.1038/s41596-024-01046-3. [PMID: 39304763 DOI: 10.1038/s41596-024-01046-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 07/02/2024] [Indexed: 09/22/2024]
Abstract
Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Especially users new to statistical analysis struggle to effectively handle and analyze complex data matrices. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data clean-up to statistical analysis. All code is shared in the form of Jupyter Notebooks ( https://github.com/Functional-Metabolomics-Lab/FBMN-STATS ). Additionally, the protocol is accompanied by a web application with a graphical user interface ( https://fbmn-statsguide.gnps2.org/ ) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
Affiliation(s)
- Abzer K Pakkir Shah
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Axel Walter
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
  - Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Filip Ottosson
  - Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark
- Francesco Russo
  - Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark
- Marcelo Navarro-Diaz
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Judith Boldt
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
  - German Center for Infection Research, Partner Site Braunschweig-Hannover, Braunschweig, Germany
- Jarmo-Charles J Kalinski
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
- Eftychia Eva Kontou
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - The Novo Nordisk Foundation for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- James Elofson
  - Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
- Alexandros Polyzois
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
- Carolina González-Marín
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Universidad EAFIT, Medellín, Antioquia, Colombia
- Shane Farrell
  - Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
  - School of Marine Sciences, Darling Marine Center, University of Maine, Walpole, ME, USA
- Marie R Aggerbeck
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Department of Environmental Science, Aarhus University, Roskilde, Denmark
- Thapanee Pruksatrakul
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Thailand Science Park, Pathum Thani, Thailand
- Nathan Chan
  - Department of Computer Science, University of California Riverside, Riverside, CA, USA
- Yunshu Wang
  - Department of Computer Science, University of California Riverside, Riverside, CA, USA
- Magdalena Pöchhacker
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Department of Food Chemistry and Toxicology, University of Vienna, Vienna, Austria
- Corinna Brungs
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Beatriz Cámara
  - Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Centro de Biotecnología DAL, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Andres Cumsille
  - Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Centro de Biotecnología DAL, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Fernanda de Oliveira
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - Department of Biotechnology, Engineering School of Lorena, University of São Paulo, Lorena, São Paulo, Brazil
- Kai Dührkop
  - Department of Bioinformatics, University of Jena, Jena, Germany
- Yasin El Abiead
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Christian Geibel
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Lana G Graves
  - Department of Environmental Systems Analysis, University of Tübingen, Tübingen, Germany
  - Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
- Martin Hansen
  - Department of Environmental Science, Aarhus University, Roskilde, Denmark
- Steffen Heuckeroth
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Simon Knoblauch
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Anastasiia Kostenko
  - Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
- Mirte C M Kuijpers
  - Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
- Kevin Mildau
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Department of Analytical Chemistry, University of Vienna, Vienna, Austria
  - Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
- Paulo Wender Portal Gomes
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Tilman Schramm
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
  - Department of Biochemistry, University of California Riverside, Riverside, CA, USA
- Karoline Steuer-Lodd
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
  - Department of Biochemistry, University of California Riverside, Riverside, CA, USA
- Paolo Stincone
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Sibgha Tayyab
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Giovanni Andrea Vitale
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Berenike C Wagner
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Shipei Xing
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Marquis T Yazzie
  - Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
- Simone Zuffa
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Martinus de Kruijff
  - Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research, Saarbrücken, Germany
- Christine Beemelmanns
  - Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research, Saarbrücken, Germany
  - Saarland University, Saarbrücken, Germany
- Hannes Link
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Christoph Mayer
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Justin J J van der Hooft
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Tito Damiani
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Tomáš Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Pieter Dorrestein
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Jan Stanstrup
  - Department of Nutrition, Exercise and Sports, University of Copenhagen, Frederiksberg C, Denmark
- Robin Schmid
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Mingxun Wang
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Department of Computer Science, University of California Riverside, Riverside, CA, USA
- Allegra Aron
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
- Madeleine Ernst
  - Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark
- Daniel Petras
  - Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
  - University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
  - Department of Biochemistry, University of California Riverside, Riverside, CA, USA
3
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
4
Lai Y, Koelmel JP, Walker DI, Price EJ, Papazian S, Manz KE, Castilla-Fernández D, Bowden JA, Nikiforov V, David A, Bessonneau V, Amer B, Seethapathy S, Hu X, Lin EZ, Jbebli A, McNeil BR, Barupal D, Cerasa M, Xie H, Kalia V, Nandakumar R, Singh R, Tian Z, Gao P, Zhao Y, Froment J, Rostkowski P, Dubey S, Coufalíková K, Seličová H, Hecht H, Liu S, Udhani HH, Restituito S, Tchou-Wong KM, Lu K, Martin JW, Warth B, Godri Pollitt KJ, Klánová J, Fiehn O, Metz TO, Pennell KD, Jones DP, Miller GW. High-Resolution Mass Spectrometry for Human Exposomics: Expanding Chemical Space Coverage. Environ Sci Technol 2024; 58:12784-12822. [PMID: 38984754 PMCID: PMC11271014 DOI: 10.1021/acs.est.4c01156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 07/11/2024]
Abstract
In the modern "omics" era, measurement of the human exposome is a critical missing link between genetic drivers and disease outcomes. High-resolution mass spectrometry (HRMS), routinely used in proteomics and metabolomics, has emerged as a leading technology to broadly profile chemical exposure agents and related biomolecules for accurate mass measurement, high sensitivity, rapid data acquisition, and increased resolution of chemical space. Non-targeted approaches are increasingly accessible, supporting a shift from conventional hypothesis-driven, quantitation-centric targeted analyses toward data-driven, hypothesis-generating chemical exposome-wide profiling. However, HRMS-based exposomics encounters unique challenges. New analytical and computational infrastructures are needed to expand the analysis coverage through streamlined, scalable, and harmonized workflows and data pipelines that permit longitudinal chemical exposome tracking, retrospective validation, and multi-omics integration for meaningful health-oriented inferences. In this article, we survey the literature on state-of-the-art HRMS-based technologies, review current analytical workflows and informatic pipelines, and provide an up-to-date reference on exposomic approaches for chemists, toxicologists, epidemiologists, care providers, and stakeholders in health sciences and medicine. We propose efforts to benchmark fit-for-purpose platforms for expanding coverage of chemical space, including gas/liquid chromatography-HRMS (GC-HRMS and LC-HRMS), and discuss opportunities, challenges, and strategies to advance the burgeoning field of the exposome.
Affiliation(s)
- Yunjia Lai
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Jeremy P. Koelmel
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Douglas I. Walker
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Elliott J. Price
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Stefano Papazian
  - Department of Environmental Science, Science for Life Laboratory, Stockholm University, SE-106 91 Stockholm, Sweden
  - National Facility for Exposomics, Metabolomics Platform, Science for Life Laboratory, Stockholm University, Solna 171 65, Sweden
- Katherine E. Manz
  - Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, United States
- Delia Castilla-Fernández
  - Department of Food Chemistry and Toxicology, Faculty of Chemistry, University of Vienna, 1010 Vienna, Austria
- John A. Bowden
  - Center for Environmental and Human Toxicology, Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, Florida 32611, United States
- Arthur David
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) − UMR_S, 1085 Rennes, France
- Vincent Bessonneau
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) − UMR_S, 1085 Rennes, France
- Bashar Amer
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Xin Hu
  - Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Elizabeth Z. Lin
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Akrem Jbebli
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Brooklynn R. McNeil
  - Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Dinesh Barupal
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
- Marina Cerasa
  - Institute of Atmospheric Pollution Research, Italian National Research Council, 00015 Monterotondo, Rome, Italy
- Hongyu Xie
  - Department of Environmental Science, Science for Life Laboratory, Stockholm University, SE-106 91 Stockholm, Sweden
- Vrinda Kalia
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Renu Nandakumar
  - Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Randolph Singh
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Zhenyu Tian
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Peng Gao
  - Department of Environmental and Occupational Health, and Department of Civil and Environmental Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
  - UPMC Hillman Cancer Center, Pittsburgh, Pennsylvania 15232, United States
- Yujia Zhao
  - Institute for Risk Assessment Sciences, Utrecht University, Utrecht 3584CM, The Netherlands
- Saurabh Dubey
  - Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Kateřina Coufalíková
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Hana Seličová
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Helge Hecht
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Sheng Liu
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Hanisha H. Udhani
  - Biomarkers Core Laboratory, Irving Institute for Clinical and Translational Research, Columbia University Irving Medical Center, New York, New York 10032, United States
- Sophie Restituito
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Kam-Meng Tchou-Wong
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Kun Lu
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Jonathan W. Martin
  - Department of Environmental Science, Science for Life Laboratory, Stockholm University, SE-106 91 Stockholm, Sweden
  - National Facility for Exposomics, Metabolomics Platform, Science for Life Laboratory, Stockholm University, Solna 171 65, Sweden
- Benedikt Warth
  - Department of Food Chemistry and Toxicology, Faculty of Chemistry, University of Vienna, 1010 Vienna, Austria
- Krystal J. Godri Pollitt
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06520, United States
- Jana Klánová
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Oliver Fiehn
  - West Coast Metabolomics Center, University of California−Davis, Davis, California 95616, United States
- Thomas O. Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Kurt D. Pennell
  - School of Engineering, Brown University, Providence, Rhode Island 02912, United States
- Dean P. Jones
  - Department of Medicine, School of Medicine, Emory University, Atlanta, Georgia 30322, United States
- Gary W. Miller
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
5
Pelletier SJ, Leclercq M, Roux-Dalvai F, de Geus MB, Leslie S, Wang W, Lam TT, Nairn AC, Arnold SE, Carlyle BC, Precioso F, Droit A. BERNN: Enhancing classification of Liquid Chromatography Mass Spectrometry data with batch effect removal neural networks. Nat Commun 2024; 15:3777. [PMID: 38710683 PMCID: PMC11074280 DOI: 10.1038/s41467-024-48177-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 04/24/2024] [Indexed: 05/08/2024]
Abstract
Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful method for profiling complex biological samples. However, batch effects typically arise from differences in sample processing protocols, experimental conditions, and data acquisition techniques, significantly impacting the interpretability of results. Correcting batch effects is crucial for the reproducibility of omics research, but current methods are not optimal for the removal of batch effects without compressing the genuine biological variation under study. We propose a suite of Batch Effect Removal Neural Networks (BERNN) to remove batch effects in large LC-MS experiments, with the goal of maximizing sample classification performance between conditions. More importantly, these models must efficiently generalize in batches not seen during training. A comparison of batch effect correction methods across five diverse datasets demonstrated that BERNN models consistently showed the strongest sample classification performance. However, the model producing the greatest classification improvements did not always perform best in terms of batch effect removal. Finally, we show that the overcorrection of batch effects resulted in the loss of some essential biological variability. These findings highlight the importance of balancing batch effect removal while preserving valuable biological diversity in large-scale LC-MS experiments.
Affiliation(s)
- Simon J Pelletier
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
- Mickaël Leclercq
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
- Florence Roux-Dalvai
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
  - Proteomics Platform, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
- Matthijs B de Geus
  - Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA
  - Leiden University Medical Center, Leiden, The Netherlands
- Shannon Leslie
  - Yale Department of Psychiatry, New Haven, CT, USA
  - Janssen Pharmaceuticals, San Diego, CA, USA
- Weiwei Wang
  - Keck MS & Proteomics Resource, Yale School of Medicine, New Haven, CT, USA
- TuKiet T Lam
  - Keck MS & Proteomics Resource, Yale School of Medicine, New Haven, CT, USA
  - Yale School of Medicine, Department of Molecular Biophysics and Biochemistry, New Haven, CT, USA
- Steven E Arnold
  - Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA
- Becky C Carlyle
  - Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA
  - Oxford University Department of Physiology Anatomy and Genetics, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, Oxford, UK
- Frédéric Precioso
  - Université Côte d'Azur, CNRS, INRIA, I3S, Sophia Antipolis, Nice, France
- Arnaud Droit
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
  - Proteomics Platform, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
6
Wanichthanarak K, In-on A, Fan S, Fiehn O, Wangwiwatsin A, Khoomrung S. Data processing solutions to render metabolomics more quantitative: case studies in food and clinical metabolomics using Metabox 2.0. Gigascience 2024; 13:giae005. [PMID: 38488666 PMCID: PMC10941642 DOI: 10.1093/gigascience/giae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 12/22/2023] [Accepted: 02/02/2024] [Indexed: 03/18/2024]
Abstract
In classic semiquantitative metabolomics, metabolite intensities are affected by biological factors and other unwanted variations. A systematic evaluation of the data processing methods is crucial to identify adequate processing procedures for a given experimental setup. Current comparative studies are mostly focused on peak area data but not on absolute concentrations. In this study, we evaluated data processing methods to produce outputs that were most similar to the corresponding absolute quantified data. We examined the data distribution characteristics, fold difference patterns between 2 metabolites, and sample variance. We used 2 metabolomic datasets from a retail milk study and a lupus nephritis cohort as test cases. When studying the impact of data normalization, transformation, scaling, and combinations of these methods, we found that the cross-contribution compensating multiple standard normalization (ccmn) method, followed by square root data transformation, was most appropriate for a well-controlled study such as the milk study dataset. Regarding the lupus nephritis cohort study, only ccmn normalization could slightly improve the data quality of the noisy cohort. Since the assessment accounted for the resemblance between processed data and the corresponding absolute quantified data, our results denote a helpful guideline for processing metabolomic datasets within a similar context (food and clinical metabolomics). Finally, we introduce Metabox 2.0, which enables thorough analysis of metabolomic data, including data processing, biomarker analysis, integrative analysis, and data interpretation. It was successfully used to process and analyze the data in this study. An online web version is available at http://metsysbio.com/metabox.
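The processing chain named in this abstract (normalization, then transformation, then scaling) can be illustrated with a minimal sketch. It is not the Metabox 2.0 implementation, and it does not implement ccmn: a simple sum normalization stands in for the normalization step, and all function names and the example peak areas are hypothetical.

```python
import math
import statistics


def sum_normalize(sample):
    """Divide each intensity by the sample total (a stand-in, NOT ccmn)."""
    total = sum(sample)
    return [x / total for x in sample]


def sqrt_transform(sample):
    """Square root transformation, applied value by value."""
    return [math.sqrt(x) for x in sample]


def autoscale(column):
    """Center one metabolite to mean 0 and scale to unit sample std dev."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def process(matrix):
    """Rows are samples, columns are metabolites (hypothetical layout)."""
    rows = [sqrt_transform(sum_normalize(r)) for r in matrix]
    cols = [autoscale(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# Made-up peak areas: three samples, three metabolites.
peak_areas = [
    [100.0, 400.0, 500.0],
    [200.0, 300.0, 500.0],
    [300.0, 300.0, 400.0],
]
processed = process(peak_areas)
```

Normalization and transformation act per sample, while scaling acts per metabolite, which is why the sketch transposes the matrix between the two stages.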
Affiliation(s)
- Kwanjeera Wanichthanarak
- Siriraj Center of Research Excellence in Metabolomics and Systems Biology (SiCORE-MSB), Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
| | - Ammarin In-on
- Siriraj Center of Research Excellence in Metabolomics and Systems Biology (SiCORE-MSB), Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
| | - Sili Fan
- Department of Biostatistics, University of California Davis, Davis, CA 95616, USA
| | - Oliver Fiehn
- West Coast Metabolomics Center, University of California Davis Genome Center, Davis, CA 95616, USA
| | - Arporn Wangwiwatsin
- Department of Systems Biosciences and Computational Medicine, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand
| | - Sakda Khoomrung
- Siriraj Center of Research Excellence in Metabolomics and Systems Biology (SiCORE-MSB), Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Center of Excellence for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Bangkok 10700, Thailand
|
7
|
Ai J, Zhao W, Yu Q, Qian X, Zhou J, Huo X, Tang F. SR-Unet: A Super-Resolution Algorithm for Ion Trap Mass Spectrometers Based on the Deep Neural Network. Anal Chem 2023; 95:17407-17415. [PMID: 37963290 DOI: 10.1021/acs.analchem.3c04172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
The mass spectrometer is an important tool for modern chemical analysis and detection. Especially, the emergence of miniature mass spectrometers has provided new tools for field analysis and detection. The resolution of a mass spectrometer reflects the ability of the instrument to discriminate between adjacent mass-to-charge ratio ions, and the higher the resolution, the better the discrimination of complex mixtures. Quadrupole ion traps are generally considered as a low-resolution mass spectrometry method, but they have gained wide attention and development in recent years because of their suitability for miniaturization and high qualitative capability. For an ion trap mass spectrometer, the mass sensitivity and resolution can be mutually constrained and need to be balanced by setting an appropriate scanning speed. In this study, a super-resolution U-net algorithm (SR-Unet) is proposed for ion trap mass spectrometry, which can estimate the possible ions from the overlapping ion peaks of low-resolution spectra and improve the equivalent resolution while ensuring sufficient sensitivity and analysis speed of the instrument. By determining the mass spectra of a linear ion trap mass spectrometer (LTQ XL) in Turbo and Normal scan modes, the same unit mass resolution as that at a scan speed of 16,667 Da/s was successfully obtained at 125,000 Da/s. Also, the experiments demonstrated that the algorithm is capable of the mass-to-charge ratio and instrument migration. SR-Unet can be migrated and applied to a miniature mass spectrometer for cruise detection of volatile organic compounds (VOCs), and the identification of VOC species in Photochemical Assessment Monitoring Stations (PAMS) was improved from 31 to 50 species with the same monitoring and analysis speed requirement. 
Further, super-unit mass resolution peptide detection was achieved on a miniature mass spectrometer with the help of the SR-Unet algorithm, which reduced the full width at half-maxima (FWHM) of bradykinin divalent ions (m/z 531) from 0.35 to 0.15 Da at a scan speed of 375 Da/s and improved the equivalent resolution to 3540. The proposed method provides a new idea to enhance the field mixture detection capability of miniature ion trap mass spectrometers.
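As a quick consistency check on the figures quoted in this abstract, and assuming the authors use the common definition of resolution as m/z divided by the peak's full width at half-maximum (the abstract does not state the definition explicitly), the reported values work out as follows:

```latex
R = \frac{m/z}{\Delta m_{\mathrm{FWHM}}}, \qquad
R_{\text{with SR-Unet}} = \frac{531}{0.15} = 3540, \qquad
R_{\text{without}} = \frac{531}{0.35} \approx 1517
```

Under that assumption, the narrowing from 0.35 to 0.15 Da corresponds to roughly a 2.3-fold gain in equivalent resolution.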
Affiliation(s)
- Jiawen Ai
- Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
- State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
| | - Weize Zhao
- Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
- State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
| | - Quan Yu
- Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
| | - Xiang Qian
- Division of Advanced Manufacturing, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
| | - Jianhua Zhou
- School of Biomedical Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen 518107, China
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510275, China
| | - Xinming Huo
- School of Biomedical Engineering, Shenzhen Campus of Sun Yat-Sen University, Shenzhen 518107, China
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Guangzhou 510275, China
| | - Fei Tang
- State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
|
8
|
Droit A, Pelletier S, Leclerq M, Roux-Dalvai F, de Geus M, Leslie S, Wang W, Lam T, Nairn A, Arnold S, Carlyle B, Precioso F. Enhancing Classification of liquid chromatography mass spectrometry data with Batch Effect Removal Neural Networks (BERNN). RESEARCH SQUARE 2023:rs.3.rs-3112514. [PMID: 37461653 PMCID: PMC10350225 DOI: 10.21203/rs.3.rs-3112514/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]
Abstract
Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful method for profiling complex biological samples. However, batch effects typically arise from differences in sample processing protocols, experimental conditions and data acquisition techniques, significantly impacting the interpretability of results. Correcting batch effects is crucial for the reproducibility of proteomics research, but current methods are not optimal for removal of batch effects without compressing the genuine biological variation under study. We propose a suite of Batch Effect Removal Neural Networks (BERNN) to remove batch effects in large LC-MS experiments, with the goal of maximizing sample classification performance between conditions. More importantly, these models must efficiently generalize in batches not seen during training. Comparison of batch effect correction methods across three diverse datasets demonstrated that BERNN models consistently showed the strongest sample classification performance. However, the model producing the greatest classification improvements did not always perform best in terms of batch effect removal. Finally, we show that overcorrection of batch effects resulted in the loss of some essential biological variability. These findings highlight the importance of balancing batch effect removal while preserving valuable biological diversity in large-scale LC-MS experiments.
Affiliation(s)
- Arnaud Droit
- Centre de Recherche du CHU de Québec - Université Laval, Axe Endocrinologie et Néphrologie, Québec, Canada
| | - Weiwei Wang
- Keck MS & Proteomics Resource, Yale School of Medicine
| | - TuKiet Lam
- Keck MS & Proteomics Resource, Yale School of Medicine
| | - Steven Arnold
- Massachusetts General Hospital Department of Neurology
| | - Becky Carlyle
- Massachusetts General Hospital Department of Neurology
|
9
|
Märtens A, Holle J, Mollenhauer B, Wegner A, Kirwan J, Hiller K. Instrumental Drift in Untargeted Metabolomics: Optimizing Data Quality with Intrastudy QC Samples. Metabolites 2023; 13:metabo13050665. [PMID: 37233706 DOI: 10.3390/metabo13050665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 05/08/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023] [Open Access]
Abstract
Untargeted metabolomics is an important tool in studying health and disease and is employed in fields such as biomarker discovery and drug development, as well as precision medicine. Although significant technical advances have been made in the field of mass spectrometry-driven metabolomics, instrumental drifts, such as fluctuations in retention time and signal intensity, remain a challenge, particularly in large untargeted metabolomics studies. Therefore, it is crucial to consider these variations during data processing to ensure high-quality data. Here, we provide recommendations for an optimal data processing workflow using intrastudy quality control (QC) samples that identifies errors resulting from instrumental drifts, such as shifts in retention time and metabolite intensities. Furthermore, we provide an in-depth comparison of the performance of three popular batch-effect correction methods of different complexity. The performance of the batch-effect correction methods was evaluated using different evaluation metrics based on QC samples and a machine learning approach based on biological samples. Here, the method TIGER demonstrated the overall best performance by reducing the relative standard deviation of the QCs and the dispersion-ratio the most, as well as demonstrating the highest area under the receiver operating characteristic curve with three different probabilistic classifiers (Logistic regression, Random Forest, and Support Vector Machine). In summary, our recommendations will help to generate high-quality data that are suitable for further downstream processing, leading to more accurate and meaningful insights into the underlying biological processes.
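One of the QC-based metrics named in this abstract, the relative standard deviation (RSD) of a feature across repeated QC injections, can be illustrated with a minimal sketch. The function name `qc_rsd` and the intensity values below are hypothetical and chosen only for illustration; they are not taken from the study, and the "corrected" values do not come from TIGER or any other real correction method.

```python
from statistics import mean, stdev


def qc_rsd(values):
    """Relative standard deviation (%) of one feature across QC injections.

    Uses the sample standard deviation (n - 1 in the denominator).
    """
    return 100.0 * stdev(values) / mean(values)


# Hypothetical intensities of a single feature in five QC injections.
qc_raw = [100.0, 110.0, 120.0, 130.0, 140.0]        # signal drifting upward
qc_corrected = [118.0, 120.0, 121.0, 119.0, 122.0]  # after an assumed drift correction

print(f"RSD before correction: {qc_rsd(qc_raw):.2f}%")
print(f"RSD after correction:  {qc_rsd(qc_corrected):.2f}%")
```

Because QC samples are nominally identical, a lower RSD after correction suggests that less technical variation remains; it says nothing by itself about whether biological variation was preserved.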
Affiliation(s)
- Andre Märtens
- Department of Bioinformatics and Biochemistry, Braunschweig Integrated Centre of Systems Biology, Technische Universität Braunschweig, 38118 Braunschweig, Germany
- Physikalisch-Technische Bundesanstalt, 38116 Braunschweig, Germany
| | - Johannes Holle
- Department of Pediatric Gastroenterology, Nephrology and Metabolic Diseases, Universitätsmedizin Berlin, 13353 Berlin, Germany
| | - Brit Mollenhauer
- Department of Neurology, University Medical Center Göttingen, 37073 Göttingen, Germany
- Paracelsus-Elena-Klinik, 34128 Kassel, Germany
| | - Andre Wegner
- Department of Bioinformatics and Biochemistry, Braunschweig Integrated Centre of Systems Biology, Technische Universität Braunschweig, 38118 Braunschweig, Germany
| | - Jennifer Kirwan
- Berlin Institute of Health at Charité, Universitätsmedizin Berlin, 10117 Berlin, Germany
| | - Karsten Hiller
- Department of Bioinformatics and Biochemistry, Braunschweig Integrated Centre of Systems Biology, Technische Universität Braunschweig, 38118 Braunschweig, Germany
|
10
|
Song G, Wang L, Tang J, Li H, Pang S, Li Y, Liu L, Hu J. Circulating metabolites as potential biomarkers for the early detection and prognosis surveillance of gastrointestinal cancers. Metabolomics 2023; 19:36. [PMID: 37014438 PMCID: PMC10073066 DOI: 10.1007/s11306-023-02002-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/22/2023] [Indexed: 04/05/2023]
Abstract
BACKGROUND AND AIMS: Two of the most lethal gastrointestinal (GI) cancers, gastric cancer (GC) and colon cancer (CC), are ranked in the top five cancers that cause deaths worldwide. Most GI cancer deaths can be reduced by earlier detection and more appropriate medical treatment. Unlike the current "gold standard" techniques, non-invasive and highly sensitive screening tests are required for GI cancer diagnosis. Here, we explored the potential of metabolomics for GI cancer detection, the classification of tissue-of-origin, and even prognosis management. METHODS: Plasma samples from 37 gastric cancer (GC), 17 colon cancer (CC), and 27 non-cancer (NC) patients were prepared for metabolomics and lipidomics analysis by three MS-based platforms. Univariate, multivariate, and clustering analyses were used for selecting significant metabolic features. ROC curve analysis was based on a series of different binary classifications as well as the true-positive rate (sensitivity) and the false-positive rate (1-specificity). RESULTS: GI cancers exhibited obvious metabolic perturbation compared with benign diseases. The differentiated metabolites of gastric cancer (GC) and colon cancer (CC) were targeted to the same pathways but with different degrees of cellular metabolism reprogramming. The cancer-specific metabolites distinguished malignant from benign cases and classified the cancer types. We also applied this test to before- and after-surgery samples, wherein surgical resection significantly altered the blood-metabolic patterns. There were 15 metabolites significantly altered in GC and CC patients who underwent surgical treatment, and these partly returned to normal conditions. CONCLUSION: Blood-based metabolomics analysis is an efficient strategy for GI cancer screening, especially for malignant and benign diagnoses. The cancer-specific metabolic patterns possess the potential for classifying tissue-of-origin in multi-cancer screening. Besides, the use of circulating metabolites for prognosis management of GI cancer is a promising area of research.
Affiliation(s)
- Guodong Song
- The Second Hospital of Tianjin Medical University, No 23. Pingjiang Road, Hexi District, 300211, Tianjin, China
| | - Li Wang
- The Second Hospital of Tianjin Medical University, No 23. Pingjiang Road, Hexi District, 300211, Tianjin, China
| | - Junlong Tang
- Metanotitia Inc, No 59. Gaoxin South 9Th Road, Yuehai Street, Nanshan District, Shenzhen, 518056, Guangdong, China
| | - Haohui Li
- Metanotitia Inc, No 59. Gaoxin South 9Th Road, Yuehai Street, Nanshan District, Shenzhen, 518056, Guangdong, China
| | - Shuyu Pang
- Metanotitia Inc, No 59. Gaoxin South 9Th Road, Yuehai Street, Nanshan District, Shenzhen, 518056, Guangdong, China
| | - Yan Li
- Metanotitia Inc, No 59. Gaoxin South 9Th Road, Yuehai Street, Nanshan District, Shenzhen, 518056, Guangdong, China
| | - Li Liu
- Metanotitia Inc, No 59. Gaoxin South 9Th Road, Yuehai Street, Nanshan District, Shenzhen, 518056, Guangdong, China.
| | - Junyuan Hu
- Metanotitia Inc, No 59. Gaoxin South 9Th Road, Yuehai Street, Nanshan District, Shenzhen, 518056, Guangdong, China.
|
11
|
Hattaway ME, Black GP, Young TM. Batch correction methods for nontarget chemical analysis data: application to a municipal wastewater collection system. Anal Bioanal Chem 2023; 415:1321-1331. [PMID: 36627378 PMCID: PMC9928919 DOI: 10.1007/s00216-023-04511-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 12/08/2022] [Accepted: 01/02/2023] [Indexed: 01/12/2023]
Abstract
Nontarget chemical analysis using high-resolution mass spectrometry has increasingly been used to discern spatial patterns and temporal trends in anthropogenic chemical abundance in natural and engineered systems. A critical experimental design consideration in such applications, especially those monitoring complex matrices over long time periods, is a choice between analyzing samples in multiple batches as they are collected, or in one batch after all samples have been processed. While datasets acquired in multiple analytical batches can include the effects of instrumental variability over time, datasets acquired in a single batch risk compound degradation during sample storage. To assess the influence of batch effects on the analysis and interpretation of nontarget data, this study examined a set of 56 samples collected from a municipal wastewater system over 7 months. Each month's samples included 6 from sites within the collection system, one combined influent, and one treated effluent sample. Samples were analyzed using liquid chromatography high-resolution mass spectrometry in positive electrospray ionization mode in multiple batches as the samples were collected and in a single batch at the conclusion of the study. Data were aligned and normalized using internal standard scaling and ComBat, an empirical Bayes method developed for estimating and removing batch effects in microarrays. As judged by multiple lines of evidence, including comparing principal variance component analysis between single and multi-batch datasets and through patterns in principal components and hierarchical clustering analyses, ComBat appeared to significantly reduce the influence of batch effects. For this reason, we recommend the use of more, small batches with an appropriate batch correction step rather than acquisition in one large batch.
Affiliation(s)
- Madison E Hattaway
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA, 95616, USA
| | - Gabrielle P Black
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA, 95616, USA
| | - Thomas M Young
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA, 95616, USA.
|
12
|
Quantitative challenges and their bioinformatic solutions in mass spectrometry-based metabolomics. Trends Analyt Chem 2023. [DOI: 10.1016/j.trac.2023.117009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
|
13
|
Juan H, Huang H. Quantitative analysis of high‐throughput biological data. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh‐Fen Juan
- Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology National Taiwan University Taipei Taiwan
- Taiwan AI Labs Taipei Taiwan
| | - Hsuan‐Cheng Huang
- Institute of Biomedical Informatics National Yang Ming Chiao Tung University Taipei Taiwan
|
14
|
Nichani K, Uhlig S, Colson B, Hettwer K, Simon K, Bönick J, Uhlig C, Kemmlein S, Stoyke M, Gowik P, Huschek G, Rawel HM. Development of Non-Targeted Mass Spectrometry Method for Distinguishing Spelt and Wheat. Foods 2022; 12:141. [PMID: 36613357 PMCID: PMC9818861 DOI: 10.3390/foods12010141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/21/2022] [Revised: 12/13/2022] [Accepted: 12/21/2022] [Indexed: 12/29/2022] [Open Access]
Abstract
Food fraud, even when not in the news, is ubiquitous and demands the development of innovative strategies to combat it. A new non-targeted method (NTM) for distinguishing spelt and wheat is described, which aids in food fraud detection and authenticity testing. A highly resolved fingerprint in the form of spectra is obtained for several cultivars of spelt and wheat using liquid chromatography coupled high-resolution mass spectrometry (LC-HRMS). Convolutional neural network (CNN) models are built using a nested cross validation (NCV) approach by appropriately training them using a calibration set comprising duplicate measurements of eleven cultivars of wheat and spelt, each. The results reveal that the CNNs automatically learn patterns and representations to best discriminate tested samples into spelt or wheat. This is further investigated using an external validation set comprising artificially mixed spectra, samples for processed goods (spelt bread and flour), eleven untypical spelt, and six old wheat cultivars. These cultivars were not part of model building. We introduce a metric called the D score to quantitatively evaluate and compare the classification decisions. Our results demonstrate that NTMs based on NCV and CNNs trained using appropriately chosen spectral data can be reliable enough to be used on a wider range of cultivars and their mixes.
Affiliation(s)
- Kapil Nichani
- QuoData GmbH, Prellerstr. 14, D-01309 Dresden, Germany
- Institute of Nutritional Science, University of Potsdam, Arthur-Scheunert-Allee 114-116, D-14558 Nuthetal, Germany
| | - Steffen Uhlig
- QuoData GmbH, Fabeckstr. 43, D-14195 Berlin, Germany
| | - Kirsten Simon
- QuoData GmbH, Prellerstr. 14, D-01309 Dresden, Germany
| | - Josephine Bönick
- Bundesinstitut für Risikobewertung, Max-Dohrn-Str. 8-10, D-10589 Berlin, Germany
| | - Carsten Uhlig
- Akees GmbH, Ansbacher Str. 11, D-10787 Berlin, Germany
| | - Sabine Kemmlein
- Bundesamt für Verbraucherschutz und Lebensmittelsicherheit, Diedersdorfer Weg. 1, D-12277 Berlin, Germany
| | - Manfred Stoyke
- Bundesamt für Verbraucherschutz und Lebensmittelsicherheit, Diedersdorfer Weg. 1, D-12277 Berlin, Germany
| | - Petra Gowik
- Bundesamt für Verbraucherschutz und Lebensmittelsicherheit, Diedersdorfer Weg. 1, D-12277 Berlin, Germany
| | - Gerd Huschek
- IGV-Institut für Getreideverarbeitung GmbH, Arthur-Scheunert-Allee 40/41, D-14558 Nuthetal, Germany
| | - Harshadrai M. Rawel
- Institute of Nutritional Science, University of Potsdam, Arthur-Scheunert-Allee 114-116, D-14558 Nuthetal, Germany
|
15
|
A Comprehensive Mass Spectrometry-Based Workflow for Clinical Metabolomics Cohort Studies. Metabolites 2022; 12:metabo12121168. [PMID: 36557207 PMCID: PMC9782571 DOI: 10.3390/metabo12121168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/20/2022] [Revised: 11/14/2022] [Accepted: 11/16/2022] [Indexed: 11/27/2022] [Open Access]
Abstract
As a comprehensive analysis of all metabolites in a biological system, metabolomics is being widely applied in various clinical/health areas for disease prediction, diagnosis, and prognosis. However, challenges remain in dealing with the metabolomic complexity, massive data, metabolite identification, intra- and inter-individual variation, and reproducibility, which largely limit its widespread implementation. This study provided a comprehensive workflow for clinical metabolomics, including sample collection and preparation, mass spectrometry (MS) data acquisition, and data processing and analysis. Sample collection from multiple clinical sites was strictly carried out with standardized operation procedures (SOP). During data acquisition, three types of quality control (QC) samples were set for the respective MS platforms (GC-MS, LC-MS polar, and LC-MS lipid) to assess the MS performance, facilitate metabolite identification, and eliminate contamination. Compound annotation and identification were implemented with commercial software and the in-house-developed PAppLine™ and UlibMS library. The batch effects were removed using a deep learning model method (NormAE). Potential biomarker identification was performed with tree-based modeling algorithms including random forest, AdaBoost, and XGBoost. The modeling performance was evaluated using the F1 score based on a 10-times repeated trial for each. Finally, a sub-cohort case study validated the reliability of the entire workflow.
|
16
|
Adamer MF, Brüningk SC, Tejada-Arranz A, Estermann F, Basler M, Borgwardt K. reComBat: batch-effect removal in large-scale multi-source gene-expression data integration. BIOINFORMATICS ADVANCES 2022; 2:vbac071. [PMID: 36699372 PMCID: PMC9710604 DOI: 10.1093/bioadv/vbac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/01/2022] [Accepted: 09/26/2022] [Indexed: 01/28/2023]
Abstract
Motivation: With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and biological variation, which can result in design matrix singularity. This problem currently cannot be solved satisfactorily by any common batch-correction algorithm. Results: We present reComBat, a regularized version of the empirical Bayes method to overcome this limitation and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch-effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. Availability and implementation: The code is available at https://github.com/BorgwardtLab/reComBat, all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Marek Basler
- Biozentrum, University of Basel, Basel 4056, Switzerland
| | - Karsten Borgwardt
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Swiss Institute for Bioinformatics (SIB), Lausanne 1015, Switzerland
|
17
|
Yan S, Bhawal R, Yin Z, Thannhauser TW, Zhang S. Recent advances in proteomics and metabolomics in plants. MOLECULAR HORTICULTURE 2022; 2:17. [PMID: 37789425 PMCID: PMC10514990 DOI: 10.1186/s43897-022-00038-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 04/24/2022] [Accepted: 06/20/2022] [Indexed: 10/05/2023]
Abstract
Over the past decade, systems biology and plant-omics have increasingly become the mainstream in plant biology research. New developments in mass spectrometry and bioinformatics tools, and methodological schema to integrate multi-omics data have leveraged recent advances in proteomics and metabolomics. This progress is driving a rapid evolution in the field of plant research, greatly facilitating our understanding of the mechanistic aspects of plant metabolism and the interactions of plants with their external environment. Here, we review the recent progress in MS-based proteomics and metabolomics tools and workflows with a special focus on their applications to plant biology research using several case studies related to mechanistic understanding of stress response, gene/protein function characterization, metabolic and signaling pathways exploration, and natural product discovery. We also present a projection concerning future perspectives in MS-based proteomics and metabolomics development including their applications to and challenges for systems biology. This review is intended to provide readers with an overview of how advanced MS technology and the integrated application of proteomics and metabolomics can be used to advance plant systems biology research.
Affiliation(s)
- Shijuan Yan
- Guangdong Key Laboratory for Crop Germplasm Resources Preservation and Utilization, Agro-biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Ruchika Bhawal
- Proteomics and Metabolomics Facility, Institute of Biotechnology, Cornell University, 139 Biotechnology Building, 526 Campus Road, Ithaca, NY, 14853, USA
| | - Zhibin Yin
- Guangdong Key Laboratory for Crop Germplasm Resources Preservation and Utilization, Agro-biological Gene Research Center, Guangdong Academy of Agricultural Sciences, Guangzhou, China
| | - Sheng Zhang
- Proteomics and Metabolomics Facility, Institute of Biotechnology, Cornell University, 139 Biotechnology Building, 526 Campus Road, Ithaca, NY, 14853, USA.
|
18
|
Niu J, Yang J, Guo Y, Qian K, Wang Q. Joint deep learning for batch effect removal and classification toward MALDI MS based metabolomics. BMC Bioinformatics 2022; 23:270. [PMID: 35818047 PMCID: PMC9275160 DOI: 10.1186/s12859-022-04758-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 05/30/2022] [Indexed: 12/02/2022] [Open Access]
Abstract
Background: Metabolomics is a primary omics topic, which occupies an important position in both clinical applications and basic research for metabolic signatures and biomarkers. Unfortunately, the relevant studies are challenged by the batch effect caused by many external factors. In the last decade, deep learning has become a dominant tool in data science, such that one may train a diagnosis network from a known batch and then generalize it to a new batch. However, the batch effect inevitably hinders such efforts, as the two batches under consideration can be highly mismatched. Results: We propose an end-to-end deep learning framework for joint batch effect removal and classification upon metabolomics data. We first validate the proposed deep learning framework on a public CyTOF dataset as a simulated experiment. We also visually compare the t-SNE distribution and demonstrate that our method effectively removes the batch effects in latent space. Then, for a private MALDI MS dataset, we have achieved the highest diagnostic accuracy, with about 5.1 ~ 7.9% increase on average over state-of-the-art methods. Conclusions: Both experiments conclude that our method performs significantly better in classification than conventional methods, benefitting from the effective removal of batch effect. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04758-z.
Affiliation(s)
- Jingyang Niu
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Jing Yang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Yuyu Guo
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Kun Qian
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, China
| | - Qian Wang
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, 201210, China.
|
19
|
Niu J, Xu W, Wei D, Qian K, Wang Q. Deep Learning Framework for Integrating Multibatch Calibration, Classification, and Pathway Activities. Anal Chem 2022; 94:8937-8946. [PMID: 35709357 DOI: 10.1021/acs.analchem.2c00601] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
The amount of available biological data has exploded since the emergence of high-throughput technologies, which is not only revolutionizing the way we recognize molecules and diseases but also bringing novel analytical challenges to bioinformatics analysis. In recent years, deep learning has become a dominant technique in data science. However, classification accuracy is plagued by domain discrepancy. Notably, in the presence of multiple batches, domain discrepancy typically arises between individual batches. Most pairwise adaptation approaches may be suboptimal because they fail to eliminate external factors across multiple batches while simultaneously taking the classification task into account. We propose a joint deep learning framework for integrating batch effect removal, classification, and downstream pathway activities upon biological data. We validate it on two MALDI MS-based metabolomics datasets and achieve the highest diagnostic accuracy (ACC), with a notable ∼10% improvement over other methods. Overall, these results indicate that our approach removes batch effects more effectively than state-of-the-art methods and yields more accurate classification as well as biomarkers for smart diagnosis.
Affiliation(s)
- JingYang Niu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Wei Xu
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- DongMing Wei
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Kun Qian
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
- Qian Wang
  - School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
|
20
|
Nan Y, Ser JD, Walsh S, Schönlieb C, Roberts M, Selby I, Howard K, Owen J, Neville J, Guiot J, Ernst B, Pastor A, Alberich-Bayarri A, Menzel MI, Walsh S, Vos W, Flerin N, Charbonnier JP, van Rikxoort E, Chatterjee A, Woodruff H, Lambin P, Cerdá-Alberich L, Martí-Bonmatí L, Herrera F, Yang G. Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2022; 82:99-122. [PMID: 35664012 PMCID: PMC8878813 DOI: 10.1016/j.inffus.2022.01.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 10/24/2021] [Revised: 12/22/2021] [Accepted: 01/07/2022] [Indexed: 05/13/2023]
Abstract
Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed and the limitations of different methods have been surveyed for future research.
Affiliation(s)
- Yang Nan
  - National Heart and Lung Institute, Imperial College London, London, UK
- Javier Del Ser
  - Department of Communications Engineering, University of the Basque Country UPV/EHU, Bilbao 48013, Spain
  - TECNALIA, Basque Research and Technology Alliance (BRTA), Derio 48160, Spain
- Simon Walsh
  - National Heart and Lung Institute, Imperial College London, London, UK
- Carola Schönlieb
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
  - Oncology R&D, AstraZeneca, Cambridge, UK
- Ian Selby
  - Department of Radiology, University of Cambridge, Cambridge, UK
- Kit Howard
  - Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- John Owen
  - Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- Jon Neville
  - Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- Julien Guiot
  - University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
  - University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
- Benoit Ernst
  - University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
  - University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
- Marion I. Menzel
  - Technische Hochschule Ingolstadt, Ingolstadt, Germany
  - GE Healthcare GmbH, Munich, Germany
- Sean Walsh
  - Radiomics (Oncoradiomics SA), Liège, Belgium
- Wim Vos
  - Radiomics (Oncoradiomics SA), Liège, Belgium
- Nina Flerin
  - Radiomics (Oncoradiomics SA), Liège, Belgium
- Avishek Chatterjee
  - Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Henry Woodruff
  - Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Philippe Lambin
  - Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Leonor Cerdá-Alberich
  - Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Luis Martí-Bonmatí
  - Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Francisco Herrera
  - Department of Computer Sciences and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
  - Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Guang Yang
  - National Heart and Lung Institute, Imperial College London, London, UK
  - Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
  - School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
|
21
|
Ding X, Yang F, Chen Y, Xu J, He J, Zhang R, Abliz Z. Norm ISWSVR: A Data Integration and Normalization Approach for Large-Scale Metabolomics. Anal Chem 2022; 94:7500-7509. [PMID: 35584098 DOI: 10.1021/acs.analchem.1c05502] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
Large-scale, long-term metabolomics studies are more susceptible to various sources of systematic error, resulting in nonreproducibility and poor data quality. A reliable and robust batch correction method removes unwanted systematic variation and improves the statistical power of metabolomics data, which makes it an important issue for quality control in metabolomics. This study proposes a novel data normalization and integration method, Norm ISWSVR. It is a two-step approach that combines the best-performing internal standard correction with support vector regression normalization, comprehensively removing systematic and random errors and matrix effects. The method was investigated on three untargeted lipidomics or metabolomics datasets, and its performance was further evaluated systematically against 11 other normalization methods. Norm ISWSVR decreased the data's median cross-validated relative standard deviation (cvRSD), increased the correlation between QCs, improved the classification accuracy of biomarkers, and was well compatible with quantitative data. More importantly, Norm ISWSVR also tolerates a low frequency of QCs, which could significantly decrease the burden of a large-scale experiment. Accordingly, Norm ISWSVR improves the data quality of large-scale metabolomics data.
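As a rough illustration of the QC-based metric mentioned in this abstract, the sketch below computes a plain relative standard deviation (RSD) per feature across QC injections and takes the median over features. It is a simplification: the cross-validation step behind cvRSD is not reproduced, and the function names and intensity values are hypothetical, not taken from the paper.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD divided by mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


def median_qc_rsd(qc_table):
    """Median RSD over features; qc_table maps feature name -> QC intensities."""
    return median(rsd_percent(v) for v in qc_table.values())


# Hypothetical QC intensities for three features, before and after some
# normalization step (illustrative numbers only).
before = {
    "f1": [100.0, 120.0, 80.0],
    "f2": [50.0, 65.0, 35.0],
    "f3": [10.0, 10.5, 9.5],
}
after = {
    "f1": [100.0, 104.0, 96.0],
    "f2": [50.0, 52.0, 48.0],
    "f3": [10.0, 10.5, 9.5],
}

print(median_qc_rsd(before), median_qc_rsd(after))
```

A lower median RSD across QC samples after correction is the kind of improvement the abstract reports; comparing the two printed values gives a quick sanity check on any normalization step.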
Affiliation(s)
- Xian Ding
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050 Beijing, China
- Fen Yang
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Center of Drug Clinical Trial, Peking University Cancer Hospital and Institute, Beijing 100142, China
- Yanhua Chen
  - Key Laboratory of Mass Spectrometry Imaging and Metabolomics, Minzu University of China, State Ethnic Affairs Commission, 100081 Beijing, China
  - Center for Imaging and Systems Biology, College of Life and Environmental Sciences, Minzu University of China, 100081 Beijing, China
  - Key Laboratory of Ethnomedicine of Ministry of Education, School of Pharmacy, Minzu University of China, Beijing 100081, China
- Jing Xu
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050 Beijing, China
- Jiuming He
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050 Beijing, China
- Ruiping Zhang
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050 Beijing, China
- Zeper Abliz
  - State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, 100050 Beijing, China
  - Key Laboratory of Mass Spectrometry Imaging and Metabolomics, Minzu University of China, State Ethnic Affairs Commission, 100081 Beijing, China
  - Center for Imaging and Systems Biology, College of Life and Environmental Sciences, Minzu University of China, 100081 Beijing, China
  - Key Laboratory of Ethnomedicine of Ministry of Education, School of Pharmacy, Minzu University of China, Beijing 100081, China
|
22
|
Han W, Li L. Evaluating and minimizing batch effects in metabolomics. MASS SPECTROMETRY REVIEWS 2022; 41:421-442. [PMID: 33238061 DOI: 10.1002/mas.21672] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/22/2020] [Revised: 10/27/2020] [Accepted: 10/29/2020] [Indexed: 06/11/2023]
Abstract
Determining metabolomic differences among samples of different phenotypes is a critical component of metabolomics research. With the rapid advances in analytical tools such as ultrahigh-resolution chromatography and mass spectrometry, an increasing number of metabolites can now be profiled with high quantification accuracy. The increased detectability and accuracy raise the level of stringency required to reduce or control any experimental artifacts that can interfere with the measurement of phenotype-related metabolome changes. One such artifact is the batch effect, which can arise from multiple sources. In this review, we discuss the origins of batch effects, approaches to detect interbatch variations, and methods to correct unwanted data variability due to batch effects. We recognize that minimizing batch effects is currently an active research area, yet a very challenging task from both experimental and data processing perspectives. Thus, we try to be critical in describing the performance of a reported method, with the hope of stimulating further studies to improve existing methods or develop new ones.
Affiliation(s)
- Wei Han
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
- Liang Li
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
|
23
|
Abstract
Available automated methods for peak detection in untargeted metabolomics suffer from poor precision. We present NeatMS, which uses machine learning based on a convolutional neural network to reduce the number and fraction of false peaks. NeatMS comes with a pre-trained model representing expert knowledge in the differentiation of true chemical signal from noise. Furthermore, it provides all necessary functions to easily train new models or improve existing ones by transfer learning. Thus, the tool improves peak curation and contributes to the robust and scalable analysis of large-scale experiments. We show how to integrate it into different liquid chromatography–mass spectrometry (LC-MS) analysis workflows, quantify its performance, and compare it to various other approaches. NeatMS software is available as open source on GitHub under the permissive MIT license and is also provided as easy-to-install PyPI and Bioconda packages.
Affiliation(s)
- Yoann Gloaguen
  - Berlin Institute of Health at Charité, Metabolomics Platform, 10178 Berlin, Germany
  - Berlin Institute of Health at Charité, Core Unit Bioinformatics, 10178 Berlin, Germany
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany
- Jennifer A Kirwan
  - Berlin Institute of Health at Charité, Metabolomics Platform, 10178 Berlin, Germany
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany
- Dieter Beule
  - Berlin Institute of Health at Charité, Core Unit Bioinformatics, 10178 Berlin, Germany
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany
|
24
|
|
25
|
Jirayupat C, Nagashima K, Hosomi T, Takahashi T, Tanaka W, Samransuksamer B, Zhang G, Liu J, Kanai M, Yanagida T. Image Processing and Machine Learning for Automated Identification of Chemo-/Biomarkers in Chromatography-Mass Spectrometry. Anal Chem 2021; 93:14708-14715. [PMID: 34704450 DOI: 10.1021/acs.analchem.1c03163] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
We present a method named NPFimg, which automatically identifies multivariate chemo-/biomarker features of analytes in chromatography-mass spectrometry (MS) data by combining image processing and machine learning. NPFimg processes a two-dimensional MS map (m/z vs retention time) to discriminate analytes and to identify and visualize the marker features. Our approach allows us to comprehensively characterize the signals in MS data without the conventional peak picking process, which suffers from false peak detections. The feasibility of marker identification is demonstrated in case studies of aroma odor and human breath on gas chromatography-mass spectrometry (GC-MS), even at the parts-per-billion level. Comparison with the widely used XCMS shows that NPFimg has lower error rates in signal acquisition and marker identification. In addition, we show the potential applicability of NPFimg to the untargeted metabolomics of human breath. Although this study covers only a limited set of applications, NPFimg is potentially applicable to data processing in diverse metabolomics/chemometrics settings using GC-MS and liquid chromatography-MS. NPFimg is available as open source on GitHub (http://github.com/poomcj/NPFimg) under the MIT license.
Affiliation(s)
- Chaiyanut Jirayupat
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Interdisciplinary Graduate School of Engineering Sciences, Kyushu University, 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
- Kazuki Nagashima
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Japan Science and Technology Agency (JST), PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
- Takuro Hosomi
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Japan Science and Technology Agency (JST), PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
- Tsunaki Takahashi
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Japan Science and Technology Agency (JST), PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
- Wataru Tanaka
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Benjarong Samransuksamer
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Guozhu Zhang
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Jiangyang Liu
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Masaki Kanai
  - Institute for Materials Chemistry and Engineering, Kyushu University, 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
- Takeshi Yanagida
  - Department of Applied Chemistry, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Interdisciplinary Graduate School of Engineering Sciences, Kyushu University, 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
  - Institute for Materials Chemistry and Engineering, Kyushu University, 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
|
26
|
Taking the leap between analytical chemistry and artificial intelligence: A tutorial review. Anal Chim Acta 2021; 1161:338403. [DOI: 10.1016/j.aca.2021.338403] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 12/04/2020] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 01/01/2023]
|
27
|
Sen P, Lamichhane S, Mathema VB, McGlinchey A, Dickens AM, Khoomrung S, Orešič M. Deep learning meets metabolomics: a methodological perspective. Brief Bioinform 2020; 22:1531-1542. [PMID: 32940335 DOI: 10.1093/bib/bbaa204] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 05/04/2020] [Revised: 08/08/2020] [Accepted: 08/10/2020] [Indexed: 12/15/2022]
Abstract
Deep learning (DL), an emerging area of investigation in the fields of machine learning and artificial intelligence, has markedly advanced over the past years. DL techniques are being applied to assist medical professionals and researchers in improving clinical diagnosis, disease prediction and drug discovery. It is expected that DL will help to provide actionable knowledge from a variety of 'big data', including metabolomics data. In this review, we discuss the applicability of DL to metabolomics, while presenting and discussing several examples from recent research. We emphasize the use of DL in tackling bottlenecks in metabolomics data acquisition, processing, metabolite identification, as well as in metabolic phenotyping and biomarker discovery. Finally, we discuss how DL is used in genome-scale metabolic modelling and in interpretation of metabolomics data. The DL-based approaches discussed here may assist computational biologists with the integration, prediction and drawing of statistical inference about biological outcomes, based on metabolomics data.
Affiliation(s)
- Partho Sen
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Santosh Lamichhane
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Vivek B Mathema
  - Metabolomics and Systems Biology, Department of Biochemistry, and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Aidan McGlinchey
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
- Alex M Dickens
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Sakda Khoomrung
  - Metabolomics and Systems Biology, Department of Biochemistry, and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
  - Center for Innovation in Chemistry (PERCH), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
- Matej Orešič
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
  - School of Medical Sciences, Örebro University, 702 81 Örebro, Sweden
|
28
|
Liu Q, Walker D, Uppal K, Liu Z, Ma C, Tran V, Li S, Jones DP, Yu T. Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Sci Rep 2020; 10:13856. [PMID: 32807888 PMCID: PMC7431853 DOI: 10.1038/s41598-020-70850-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/03/2020] [Accepted: 07/28/2020] [Indexed: 12/31/2022]
Abstract
With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. Due to technical limitations of the Liquid Chromatography–Mass Spectrometry (LC/MS) platform, samples often need to be processed in multiple batches. Across different batches, we often observe differences in data characteristics. In this work, we specifically focus on data generated in multiple batches on the same LC/MS machinery. Traditional preprocessing methods treat all samples as a single group. Such practice can result in errors in the alignment of peaks, which cannot be corrected by post hoc application of batch effect correction methods. We developed a new approach that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. When analyzing data with multiple batches, generated both from standardized quality control (QC) plasma samples and from real biological studies, the new method produced feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. The method is available as part of the apLCMS package. Download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
Affiliation(s)
- Qin Liu
  - School of Software Engineering, Tongji University, Shanghai, 201804, China
- Douglas Walker
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Karan Uppal
  - Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- Zihe Liu
  - School of Software Engineering, Tongji University, Shanghai, 201804, China
- Chunyu Ma
  - Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- ViLinh Tran
  - Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- Shuzhao Li
  - The Jackson Laboratory, Farmington, CT, 06032, USA
- Dean P Jones
  - Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA
- Tianwei Yu
  - School of Data Science, The Chinese University of Hong Kong - Shenzhen, Shenzhen, 518172, Guangdong Province, China
|
29
|
Matyushin DD, Sholokhova AY, Buryak AK. Deep Learning Driven GC-MS Library Search and Its Application for Metabolomics. Anal Chem 2020; 92:11818-11825. [PMID: 32867500 DOI: 10.1021/acs.analchem.0c02082] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 01/20/2023]
Abstract
Preliminary compound identification and peak annotation in gas chromatography-mass spectrometry are usually performed using mass spectral databases. A few algorithms enable searching for a spectrum in a large mass spectral library. In many cases, a library search procedure returns a wrong answer even when the correct compound is contained in the library. In this work, we present a deep learning driven approach to library search in order to reduce the probability of such cases. Machine learning ranking (learning to rank) is a class of machine learning and deep learning algorithms that perform a comparison (ranking) of objects. This work introduces the use of deep learning ranking for small-molecule identification using low-resolution electron ionization mass spectrometry. Instead of simple similarity measures for two spectra, such as the dot product or the Euclidean distance between the vectors that represent the spectra, a deep convolutional neural network is used. The deep learning ranking model outperforms other approaches and reduces the fraction of wrong answers (at rank-1) by 9-23%, depending on the data set used. Spectra from the Golm Metabolome Database, Human Metabolome Database, and FiehnLib were used for testing the model.
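For orientation, the sketch below shows the kind of simple similarity baseline the abstract contrasts with the deep learning ranker: spectra are treated as fixed-length intensity vectors, scored by cosine similarity (a normalized dot product) or Euclidean distance, and library entries are ranked by score. The binning, compound names, and intensities are hypothetical; this is not the authors' model or code.

```python
import math


def cosine_similarity(a, b):
    """Normalized dot product of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_library(query, library):
    """Return library entry names sorted by descending cosine similarity."""
    return sorted(
        library,
        key=lambda name: cosine_similarity(query, library[name]),
        reverse=True,
    )


# Hypothetical spectra binned onto the same four m/z channels.
library = {
    "compound_A": [0, 10, 100, 5],
    "compound_B": [100, 0, 5, 10],
    "compound_C": [0, 20, 90, 0],
}
query = [0, 12, 95, 4]

print(rank_library(query, library))
```

The first name in the returned list is the rank-1 hit; the abstract's reported gain concerns how often that rank-1 answer is wrong when a learned ranking model replaces such fixed similarity measures.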
Affiliation(s)
- Dmitriy D Matyushin
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia
- Anastasia Yu Sholokhova
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia
- Aleksey K Buryak
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia
|