1
Imputation of Missing Values for Multi-Biospecimen Metabolomics Studies: Bias and Effects on Statistical Validity. Metabolites 2022; 12:metabo12070671. [PMID: 35888795 PMCID: PMC9317643 DOI: 10.3390/metabo12070671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 07/07/2022] [Accepted: 07/19/2022] [Indexed: 02/05/2023]
Abstract
The analysis of high-throughput metabolomics mass spectrometry data across multiple biological sample types (biospecimens) poses challenges due to missing data. During differential abundance analysis, dropping samples with missing values can lead to severe loss of data as well as biased results in group comparisons and effect size estimates. However, the imputation of missing data (the process of replacing missing data with estimated values such as a mean) may compromise the inherent intra-subject correlation of a metabolite across multiple biospecimens from the same subject, which in turn may compromise the efficacy of the statistical analysis of differential metabolites in biomarker discovery. We investigated imputation strategies when considering multiple biospecimens from the same subject. We compared a novel but simple approach, which combines the two biospecimen data matrices (rows and columns of subjects and metabolites) and imputes them together, with an approach that imputes each biospecimen data matrix separately. We then compared the two approaches in terms of the bias in the estimation of the intra-subject multi-specimen correlation and its effects on the validity of statistical significance tests. The combined approach to multi-biospecimen studies has not been evaluated previously even though it is intuitive and easy to implement. We examined these two approaches for five imputation methods: random forest, k nearest neighbor, expectation-maximization with bootstrap, quantile regression, and half the minimum observed value. Combining the biospecimen data matrices for imputation did not greatly increase efficacy in conserving the correlation structure or improving accuracy in the statistical conclusions for most of the methods examined. Random forest tended to outperform the other methods in all performance metrics, except specificity.
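As a rough illustration of the separate versus combined strategies described in this abstract, the sketch below applies half-minimum imputation to two toy biospecimen matrices and then computes the intra-subject correlation of one metabolite across the two specimens. It is not the authors' code. The toy data, the names `plasma`, `urine`, `half_min_impute`, and `pearson`, and the choice to combine the matrices by stacking rows (so that both specimens share metabolite columns) are all assumptions made for illustration; the abstract does not state how the matrices are combined.

```python
from math import sqrt

def half_min_impute(matrix):
    """Replace None in each column with half of that column's smallest observed value.

    Assumes every column has at least one observed value (true for the toy data).
    """
    n_cols = len(matrix[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        fills.append(min(observed) / 2.0)
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def column(matrix, j):
    return [row[j] for row in matrix]

# Hypothetical toy data: rows are the same 4 subjects, columns are 2 metabolites.
plasma = [[10.0, 4.0], [12.0, None], [None, 6.0], [16.0, 8.0]]
urine = [[20.0, 2.0], [26.0, 3.0], [30.0, None], [None, 5.0]]

# Separate strategy: each biospecimen matrix is imputed on its own.
sep_plasma = half_min_impute(plasma)
sep_urine = half_min_impute(urine)

# Combined strategy (assumed here to mean row stacking): impute once, then split.
stacked = half_min_impute(plasma + urine)
comb_plasma, comb_urine = stacked[:len(plasma)], stacked[len(plasma):]

# Intra-subject correlation of metabolite 0 across the two biospecimens.
r_separate = pearson(column(sep_plasma, 0), column(sep_urine, 0))
r_combined = pearson(column(comb_plasma, 0), column(comb_urine, 0))
print("separate:", r_separate)
print("combined:", r_combined)
```

For a column-wise rule such as half-minimum, placing the two matrices side by side would give the same fills as separate imputation, which is why row stacking is used in this sketch. Methods that borrow information across columns, such as k nearest neighbor or random forest, could behave differently under either layout.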
2
Wei W, Yan H, Zhao J, Li H, Li Z, Guo H, Wang X, Zhou Y, Zhang X, Zeng J, Chen T, Zhou L. Multi-omics comparisons of p-aminosalicylic acid (PAS) resistance in folC mutated and un-mutated Mycobacterium tuberculosis strains. Emerg Microbes Infect 2019; 8:248-261. [PMID: 30866779 PMCID: PMC6455211 DOI: 10.1080/22221751.2019.1568179] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
p-Aminosalicylic acid (PAS) is an important second-line antibiotic for treating multidrug-resistant tuberculosis (MDR-TB). Due to gastrointestinal disturbance and intolerance, its potency and efficacy in the treatment of extensively drug-resistant (XDR)-TB are commonly poor. Thus, it is important to reveal the mechanism of susceptibility and resistance of Mycobacterium tuberculosis (Mtb) to this drug. Herein, we screened and established PAS-resistant (PASr) folC mutated and un-mutated Mtb strains, then utilized a multi-omics (genome, proteome, and metabolome) analysis to better characterize the mechanisms of PAS resistance in Mtb. Interestingly, we found that promotion of SAM-dependent methyltransferases and suppression of PAS uptake via inhibiting some drug transport associated membrane proteins were two key pathways for the folC mutated strain evolving into the PASr Mtb strain. However, the folC un-mutated strain was resistant to PAS via uptake of exogenous methionine, mitigating the role of inhibitors, and promoting DfrA, ThyA and FolC expression. Beyond these findings, we also found PAS resistance in Mtb might be associated with the increasing phenylalanine metabolism pathway. Collectively, our multi-omics analysis uncovered differences in the resistance mechanisms of folC mutated and un-mutated PAS-resistant Mtb strains, and targeting modulators of these pathways may be effective for the treatment of PASr Mtb strains.
Affiliation(s)
- Wenjing Wei
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
- Huimin Yan
  - c Dongguan Key Laboratory of Medical Bioactive Molecular Development and Translational Research, Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, Guangdong Medical University, Dongguan, People's Republic of China
- Jiao Zhao
  - d Jinan University, Guangzhou, People's Republic of China
- Haicheng Li
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
- Zhenyan Li
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
- Huixin Guo
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
- Xuezhi Wang
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
- Ying Zhou
  - e School of Stomatology and Medicine, Foshan University, Foshan, People's Republic of China
- Xiaoli Zhang
  - e School of Stomatology and Medicine, Foshan University, Foshan, People's Republic of China
- Jincheng Zeng
  - c Dongguan Key Laboratory of Medical Bioactive Molecular Development and Translational Research, Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, Guangdong Medical University, Dongguan, People's Republic of China
- Tao Chen
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
  - f South China Institute of Biomedicine, Guangzhou, People's Republic of China
- Lin Zhou
  - a Center for Tuberculosis Control of Guangdong Province, Guangzhou, People's Republic of China
  - b Key Laboratory of Translational Medicine of Guangdong, Guangzhou, People's Republic of China
3
Myint L, Kleensang A, Zhao L, Hartung T, Hansen KD. Joint Bounding of Peaks Across Samples Improves Differential Analysis in Mass Spectrometry-Based Metabolomics. Anal Chem 2017; 89:3517-3523. [PMID: 28221771 PMCID: PMC5362739 DOI: 10.1021/acs.analchem.6b04719] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 11/28/2016] [Accepted: 02/21/2017] [Indexed: 12/20/2022]
Abstract
As mass spectrometry-based metabolomics becomes more widely used in biomedical research, it is important to revisit existing data analysis paradigms. Existing data preprocessing efforts have largely focused on methods which start by extracting features separately from each sample, followed by a subsequent attempt to group features across samples to facilitate comparisons. We show that this preprocessing approach leads to unnecessary variability in peak quantifications that adversely impacts downstream analysis. We present a new method, bakedpi, for the preprocessing of both centroid and profile mode metabolomics data that relies on an intensity-weighted bivariate kernel density estimation on a pooling of all samples to detect peaks. This new method reduces this unnecessary quantification variability and increases power in downstream differential analysis.
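The core idea named in this abstract, an intensity-weighted bivariate kernel density estimate computed on pooled samples, can be sketched generically as follows. This is not the bakedpi implementation. The Gaussian product kernel, the bandwidths `h_mz` and `h_rt`, the grid, and the toy (m/z, retention time, intensity) points are all assumptions chosen only to show how weighting by intensity and pooling across samples lets a shared peak region stand out.

```python
from math import exp, pi, sqrt

def gaussian(u):
    """Standard normal kernel."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def weighted_kde(points, mz, rt, h_mz, h_rt):
    """Intensity-weighted bivariate KDE at (mz, rt) using a Gaussian product kernel.

    points: iterable of (mz, rt, intensity); intensity acts as the weight.
    """
    total = sum(w for _, _, w in points)
    dens = sum(w * gaussian((mz - pm) / h_mz) * gaussian((rt - pr) / h_rt)
               for pm, pr, w in points)
    return dens / (total * h_mz * h_rt)

# Hypothetical toy data: two samples, each with a strong signal near
# m/z 100.0 and retention time 50 s, plus one weak stray point.
sample_a = [(100.00, 50.0, 900.0), (100.01, 50.5, 700.0), (100.40, 58.0, 20.0)]
sample_b = [(100.01, 50.2, 800.0), (99.99, 49.8, 600.0), (100.30, 42.0, 30.0)]

# Pool all samples before estimating the density, rather than
# extracting features from each sample separately.
pooled = sample_a + sample_b

# Evaluation grid (index-based to avoid accumulating float error).
mz_grid = [99.9 + 0.01 * i for i in range(61)]
rt_grid = [40.0 + 0.5 * j for j in range(41)]

# Take the grid point with the highest weighted density as the peak location.
peak_density, peak_mz, peak_rt = max(
    (weighted_kde(pooled, mz, rt, h_mz=0.02, h_rt=1.0), mz, rt)
    for mz in mz_grid for rt in rt_grid
)
print("peak at m/z", round(peak_mz, 2), "rt", peak_rt)
```

Because the density is estimated once on the pooled points, every sample is then quantified against the same peak region, which is the property the abstract credits with reducing quantification variability.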
Affiliation(s)
- Leslie Myint
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States
- Andre Kleensang
  - Center for Alternatives to Animal Testing (CAAT), Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States
- Liang Zhao
  - Center for Alternatives to Animal Testing (CAAT), Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States
- Thomas Hartung
  - Center for Alternatives to Animal Testing (CAAT), Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States
  - University of Konstanz, CAAT-Europe, 78457 Konstanz, Germany
- Kasper D. Hansen
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, United States