1
Wang R, Lu M, An S, Wang J, Yu C. G-Aligner: a graph-based feature alignment method for untargeted LC-MS-based metabolomics. BMC Bioinformatics 2023; 24:431. [PMID: 37964228 PMCID: PMC10644574 DOI: 10.1186/s12859-023-05525-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/09/2023] [Indexed: 11/16/2023] Open
Abstract
BACKGROUND: Liquid chromatography-mass spectrometry is widely used in untargeted metabolomics for composition profiling. In multi-run analysis scenarios, features of each run are aligned into consensus features by feature alignment algorithms to observe the intensity variations across runs. However, most existing feature alignment methods focus on accurate retention time correction and underestimate the importance of feature matching. None of the existing methods can comprehensively consider feature correspondences among all runs and achieve optimal matching. RESULTS: To comprehensively analyze feature correspondences among runs, we propose G-Aligner, a graph-based feature alignment method for untargeted LC-MS data. In the feature matching stage, G-Aligner treats features and potential correspondences as nodes and edges in a multipartite graph, formulates multi-run feature matching as an unbalanced multidimensional assignment problem, and provides three combinatorial optimization algorithms to find optimal matching solutions. In comparison with the feature alignment methods in OpenMS, MZmine2 and XCMS on three public metabolomics benchmark datasets, G-Aligner achieved the best feature alignment performance on all three datasets, with increases of up to 9.8% and 26.6% in accurately aligned features and analytes, respectively. When integrated into their analysis workflows, G-Aligner also helped all of the compared software obtain more accurate results on their self-extracted features. G-Aligner is open-source and freely available at https://github.com/CSi-Studio/G-Aligner under a permissive license. Benchmark datasets, manual annotation results, evaluation methods and results are available at https://doi.org/10.5281/zenodo.8313034. CONCLUSIONS: In this study, we proposed G-Aligner to improve feature matching accuracy for untargeted metabolomics LC-MS data. G-Aligner comprehensively considers potential feature correspondences between all runs, converting the feature matching problem into a multidimensional assignment problem (MAP). In evaluations on three public metabolomics benchmark datasets, G-Aligner achieved the highest alignment accuracy both on manually annotated features and on features extracted by popular software, demonstrating the effectiveness and robustness of the algorithm.
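The assignment formulation in the abstract above can be illustrated on its simplest special case: two runs with different numbers of features. The sketch below is not G-Aligner's algorithm. It is a brute-force search over a tiny unbalanced two-run instance, and the function names, the tolerances (`mz_tol`, `rt_tol`), the cost function and the `unmatched_penalty` are all assumptions made for illustration.

```python
from itertools import permutations


def match_cost(a, b, mz_tol=0.01, rt_tol=30.0):
    """Normalized distance between two (mz, rt) features.

    Returns None when either difference exceeds its (assumed) tolerance,
    meaning the two features are not a candidate correspondence.
    """
    dmz = abs(a[0] - b[0]) / mz_tol
    drt = abs(a[1] - b[1]) / rt_tol
    if dmz > 1 or drt > 1:
        return None
    return dmz + drt


def best_matching(run_a, run_b, unmatched_penalty=2.0):
    """Exhaustively find the lowest-cost one-to-one matching of two runs.

    Feasible only for very small runs; shown to make the objective explicit.
    Returns sorted (index_in_run_a, index_in_run_b) pairs.
    """
    swapped = len(run_a) > len(run_b)
    small, large = (run_b, run_a) if swapped else (run_a, run_b)
    best_cost, best_pairs = float("inf"), []
    # Each permutation assigns every feature of the smaller run to a
    # distinct feature of the larger run (the unbalanced case).
    for perm in permutations(range(len(large)), len(small)):
        cost, pairs = 0.0, []
        for i, j in enumerate(perm):
            c = match_cost(small[i], large[j])
            if c is None:
                cost += unmatched_penalty  # feature left unmatched
            else:
                cost += c
                pairs.append((j, i) if swapped else (i, j))
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return sorted(best_pairs)


# Two close features whose elution order differs between the runs.
run_a = [(100.000, 60.0), (100.004, 70.0)]
run_b = [(100.003, 72.0), (100.001, 61.0), (250.000, 300.0)]
print(best_matching(run_a, run_b))  # [(0, 1), (1, 0)]
```

With more than two runs the same objective becomes a multidimensional assignment problem, where exhaustive search is no longer practical; this is where dedicated combinatorial optimization algorithms such as those mentioned in the abstract come in.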
Affiliation(s)
- Ruimin Wang
  - Fudan University, Shanghai, 200433, Shanghai, China
  - School of Engineering, Westlake University, Hangzhou, 310030, Zhejiang, China
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
- Miaoshan Lu
  - School of Engineering, Westlake University, Hangzhou, 310030, Zhejiang, China
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Shaowei An
  - Fudan University, Shanghai, 200433, Shanghai, China
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
  - School of Life Sciences, Westlake University, Hangzhou, 310030, Zhejiang, China
- Jinyin Wang
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - School of Life Sciences, Westlake University, Hangzhou, 310030, Zhejiang, China
- Changbin Yu
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
2
Skoraczyński G, Gambin A, Miasojedow B. Alignstein: Optimal transport for improved LC-MS retention time alignment. Gigascience 2022; 11:6795291. [PMID: 36329619 PMCID: PMC9633278 DOI: 10.1093/gigascience/giac101] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2022] [Revised: 08/24/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022] Open
Abstract
Background: Reproducibility of liquid chromatography separation is limited by retention time drift. As a result, measured signals lack correspondence over replicates of the liquid chromatography–mass spectrometry (LC-MS) experiments. Correction of these errors is named retention time alignment and needs to be performed before further quantitative analysis. Despite the availability of numerous alignment algorithms, their accuracy is limited (e.g., for retention time drift that swaps analytes’ elution order). Results: We present Alignstein, an algorithm for LC-MS retention time alignment. It correctly finds correspondence even for swapped signals. To achieve this, we implemented a generalization of the Wasserstein distance to compare multidimensional features without any reduction of the information or dimension of the analyzed data. Moreover, Alignstein by design requires neither a reference sample nor prior signal identification. We validate the algorithm on publicly available benchmark datasets, obtaining competitive results. Finally, we show that it can detect the information contained in the tandem mass spectrum by the spatial properties of chromatograms. Conclusions: We show that the use of optimal transport effectively overcomes the limitations of existing algorithms for statistical analysis of mass spectrometry datasets. The algorithm’s source code is available at https://github.com/grzsko/Alignstein.
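For readers unfamiliar with the Wasserstein distance named in the abstract above, the following minimal sketch computes it in the simplest setting: the 1-Wasserstein distance between two equally sized, equally weighted one-dimensional samples. This is only the textbook special case, not the generalized multidimensional distance the authors describe, and the function name and example values are assumptions.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size, equal-weight 1D samples.

    In one dimension the optimal transport plan pairs the sorted values,
    so the distance is the mean absolute difference of the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Retention times (seconds) of two signals; the second run is shifted by
# one second and lists the signals in the opposite order.
print(wasserstein_1d([10.0, 20.0], [21.0, 11.0]))  # 1.0
```

Because the distance depends on the values and not on the order in which they are listed, it is unaffected by how the signals are indexed within each run.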
Affiliation(s)
- Grzegorz Skoraczyński
  - Correspondence address. Grzegorz Skoraczyński, Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland. E-mail:
- Anna Gambin
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
- Błażej Miasojedow
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
3
Gupta S, Ahadi S, Zhou W, Röst H. DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics. Mol Cell Proteomics 2019; 18:806-817. [PMID: 30705124 PMCID: PMC6442363 DOI: 10.1074/mcp.tir118.001132] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/10/2018] [Revised: 12/25/2018] [Indexed: 01/30/2023] Open
Abstract
Sequential Windowed Acquisition of All Theoretical Fragment Ion Mass Spectra (SWATH-MS) is widely used for proteomics analysis given its high throughput and reproducibility, but ensuring consistent quantification of analytes across large-scale studies of heterogeneous samples such as human plasma remains challenging. Heterogeneity in large-scale studies can be caused by large time intervals between data acquisition, acquisition by different operators or instruments, and intermittent repair or replacement of parts, such as the liquid chromatography column, all of which affect retention time (RT) reproducibility and, successively, performance of SWATH-MS data analysis. Here, we present a novel algorithm for RT alignment of SWATH-MS data based on direct alignment of raw MS2 chromatograms using a hybrid dynamic programming approach. The algorithm does not impose a chronological order of elution and allows for alignment of elution-order-swapped peaks. Furthermore, allowing RT mapping in a certain window around a coarse global fit makes it robust against noise. On a manually validated dataset, this strategy outperformed the current state-of-the-art approaches. In addition, on real-world clinical data, our approach outperformed global alignment methods by mapping 98% of peaks compared with 67% cumulatively. DIAlignR reduced alignment error up to 30-fold for extremely distant runs. The robustness of technical parameters used in this pairwise alignment strategy is also demonstrated. The source code is released under the BSD license at https://github.com/Roestlab/DIAlignR.
Affiliation(s)
- Shubham Gupta
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5G 1A8, Canada; The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Sara Ahadi
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
- Wenyu Zhou
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305
- Hannes Röst
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5G 1A8, Canada; The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
4
Di Silvestre D, Bergamaschi A, Bellini E, Mauri P. Large Scale Proteomic Data and Network-Based Systems Biology Approaches to Explore the Plant World. Proteomes 2018; 6:proteomes6020027. [PMID: 29865292 PMCID: PMC6027444 DOI: 10.3390/proteomes6020027] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/24/2018] [Revised: 05/30/2018] [Accepted: 06/01/2018] [Indexed: 12/26/2022] Open
Abstract
The investigation of plant organisms by means of data-derived systems biology approaches based on network modeling is mainly characterized by genomic data, while the potential of proteomics is largely unexplored. This delay is mainly caused by the paucity of plant genomic/proteomic sequences and annotations, which are fundamental to mass-spectrometry (MS) data interpretation. However, Next Generation Sequencing (NGS) techniques are helping to fill this gap, and an increasing number of studies are focusing on plant proteome profiling and the identification of protein-protein interactions (PPIs). Interesting results were obtained by evaluating the topology of PPI networks in the context of organ-associated biological processes as well as plant-pathogen relationships. These examples foreshadow the benefits that these approaches may provide to plant research. Thus, in addition to providing an overview of the main omics technologies recently used on plant organisms, we will focus on studies that rely on the concepts of module, hub and shortest path, and on how they can contribute to the plant discovery processes. In this scenario, we will also consider gene co-expression networks, and we will mention some examples of integration with metabolomic data and genome-wide association studies (GWAS) to select candidate genes.
Affiliation(s)
- Dario Di Silvestre
  - Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy
- Andrea Bergamaschi
  - Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy
- Edoardo Bellini
  - Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy
- PierLuigi Mauri
  - Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy
5
Tutorial: Correction of shifts in single-stage LC-MS(/MS) data. Anal Chim Acta 2018; 999:37-53. [DOI: 10.1016/j.aca.2017.09.039] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/10/2016] [Revised: 09/26/2017] [Accepted: 09/27/2017] [Indexed: 11/19/2022]
6
Wu L, Amon S, Lam H. A hybrid retention time alignment algorithm for SWATH-MS data. Proteomics 2016; 16:2272-83. [DOI: 10.1002/pmic.201500511] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/16/2015] [Revised: 05/06/2016] [Accepted: 06/10/2016] [Indexed: 11/09/2022]
Affiliation(s)
- Long Wu
  - Division of Biomedical Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong P. R. China
- Sabine Amon
  - Department of Biology; Institute of Molecular Systems Biology; ETH Zurich; Zurich Switzerland
- Henry Lam
  - Division of Biomedical Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong P. R. China
  - Department of Chemical and Biomolecular Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong P. R. China
7
Wandy J, Daly R, Breitling R, Rogers S. Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets. Bioinformatics 2015; 31:1999-2006. [PMID: 25649621 PMCID: PMC4760236 DOI: 10.1093/bioinformatics/btv072] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 09/21/2014] [Accepted: 01/28/2015] [Indexed: 11/24/2022] Open
Abstract
Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that coelute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related-peak information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pair-wise peak similarity score function. The similarity score matrix produced is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result. Results: We demonstrate that related-peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools. Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment. Contact: Simon.Rogers@glasgow.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
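The last step described in the abstract above, turning a similarity score matrix into an alignment with an approximation algorithm for weighted matching, can be sketched as follows. The abstract does not say which approximation algorithm is used, so this example assumes the generic greedy heuristic (repeatedly take the highest remaining score); the function name, the `min_score` threshold and the example matrix are hypothetical.

```python
def greedy_matching(scores, min_score=0.0):
    """Greedy approximation to maximum-weight bipartite matching.

    scores[i][j] is the similarity between peak i of one run and peak j
    of another. Candidate pairs are visited from highest to lowest score
    and accepted when both peaks are still unmatched.
    """
    candidates = sorted(
        (
            (s, i, j)
            for i, row in enumerate(scores)
            for j, s in enumerate(row)
            if s > min_score
        ),
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for s, i, j in candidates:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            pairs.append((i, j))
    return sorted(pairs)


scores = [
    [0.90, 0.80],
    [0.85, 0.10],
]
print(greedy_matching(scores))  # [(0, 0), (1, 1)]
```

In this example the greedy result has total score 1.00, while the optimal matching `[(0, 1), (1, 0)]` scores 1.65. The gap shows what is traded for speed: greedy matching is guaranteed to reach at least half of the optimal total weight, but not the optimum itself.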
Affiliation(s)
- Joe Wandy
  - School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Rónán Daly
  - School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Rainer Breitling
  - School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
8
Lu L, Wang J, Xu Y, Wang K, Hu Y, Tian R, Yang B, Lai Q, Li Y, Zhang W, Shao Z, Lam H, Qian PY. A high-resolution LC-MS-based secondary metabolite fingerprint database of marine bacteria. Sci Rep 2014; 4:6537. [PMID: 25298017 PMCID: PMC5377448 DOI: 10.1038/srep06537] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/08/2014] [Accepted: 09/04/2014] [Indexed: 01/01/2023] Open
Abstract
Marine bacteria are the most widely distributed organisms in the ocean environment and produce a wide variety of secondary metabolites. However, traditional screening for bioactive natural compounds is greatly hindered by the lack of a systematic way of cataloguing the chemical profiles of bacterial strains found in nature. Here we present a chemical fingerprint database of marine bacteria based on their secondary metabolite profiles, acquired by high-resolution LC-MS. To date, 1,430 bacterial strains spanning 168 known species collected from different marine environments have been cultured and profiled. Using this database, we demonstrated that secondary metabolite profile similarity is approximately, but not always, correlated with taxonomical similarity. We also validated the ability of this database to find species-specific metabolites, as well as to discover known bioactive compounds from previously unknown sources. An online interface to this database, as well as the accompanying software, is provided freely for the community to use.
Affiliation(s)
- Liang Lu
  - [1] Environmental Science Program, School of Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China [2]
- Jijie Wang
  - [1] Division of Biomedical Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China [2]
- Ying Xu
  - Division of Life Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Kailing Wang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
- Yingwei Hu
  - Department of Chemical and Biomolecular Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Renmao Tian
  - Environmental Science Program, School of Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Bo Yang
  - Division of Life Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Qiliang Lai
  - Third Institute of Oceanography, State Oceanic Administration, Xiamen 361005, China
- Yongxin Li
  - Division of Life Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Weipeng Zhang
  - Environmental Science Program, School of Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Zongze Shao
  - Third Institute of Oceanography, State Oceanic Administration, Xiamen 361005, China
- Henry Lam
  - [1] Division of Biomedical Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China [2] Department of Chemical and Biomolecular Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China
- Pei-Yuan Qian
  - [1] Environmental Science Program, School of Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China [2] Division of Life Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China