1
Ma G, Kang J, Yu T. Bayesian functional analysis for untargeted metabolomics data with matching uncertainty and small sample sizes. Brief Bioinform 2024; 25:bbae141. [PMID: 38581417 PMCID: PMC10998539 DOI: 10.1093/bib/bbae141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/08/2024] Open
Abstract
Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.
Affiliation(s)
- Guoxuan Ma
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jian Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tianwei Yu
  - Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong - Shenzhen (CUHK-Shenzhen), Shenzhen, Guangdong 518172, China
2
Hall SM, Raines NH, Ramirez-Rubio O, Amador JJ, López-Pilarte D, O'Callaghan-Gordo C, Gil-Redondo R, Embade N, Millet O, Peng X, Vences S, Keogh SA, Delgado IS, Friedman DJ, Brooks DR, Leibler JH. Urinary Metabolomic Profile of Youth at Risk of Chronic Kidney Disease in Nicaragua. KIDNEY360 2023; 4:899-908. [PMID: 37068179 PMCID: PMC10371259 DOI: 10.34067/kid.0000000000000129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/22/2023] [Indexed: 04/19/2023]
Abstract
Key Points
- Urinary concentrations of glycine, a molecule associated with thermoregulation, were elevated among youth from a high-risk region for chronic kidney disease of non-traditional etiology (CKDnt).
- Urinary concentrations of pyruvate, citric acid, and inosine were lower among youth at higher risk of CKDnt, suggesting renal stress.
- Metabolomic analyses may shed light on early disease processes or profiles of risk in the context of CKDnt.
Background CKD of a nontraditional etiology (CKDnt) is responsible for high mortality in Central America, although its causes remain unclear. Evidence of kidney dysfunction has been observed among youth, suggesting that early kidney damage contributing to CKDnt may begin in childhood.
Methods Urine specimens of young Nicaraguan participants aged 12–23 years without CKDnt (n = 136) were analyzed by proton nuclear magnetic resonance spectroscopy for 50 metabolites associated with kidney dysfunction. Urinary metabolite levels were compared by regional CKDnt prevalence, sex, age, and family history of CKDnt using supervised statistical methods and pathway analysis in MetaboAnalyst. Magnitude of associations and changes over time were assessed through multivariable linear regression.
Results In adjusted analyses, glycine concentrations were higher among youth from high-risk regions (β = 0.82 [95% confidence interval, 0.16 to 1.85]; P = 0.01). Pyruvate concentrations were lower among youth with low eGFR (β = −0.36 [95% confidence interval, −0.57 to −0.04]; P = 0.03), and concentrations of other citric acid cycle metabolites differed by key risk factors. Over four years, participants with low eGFR experienced greater declines in 1-methylnicotinamide and 2-oxoglutarate and greater increases in citrate and guanidinoacetate concentrations.
Conclusion Urinary concentration of glycine, a molecule associated with thermoregulation and kidney function preservation, was higher among youth in high-risk CKDnt regions, suggestive of greater heat exposure or renal stress. Lower pyruvate concentrations were associated with low eGFR, and citric acid cycle metabolites, such as pyruvate, likely relate to mitochondrial respiration rates in the kidneys. Participants with low eGFR experienced longitudinal declines in concentrations of 1-methylnicotinamide, an anti-inflammatory metabolite associated with anti-fibrosis in tubule cells. These findings merit further consideration in research on the origins of CKDnt.
Affiliation(s)
- Samantha M. Hall
  - Department of Environmental Health, Boston University School of Public Health, Boston, Massachusetts
- Nathan H. Raines
  - Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts
- Oriana Ramirez-Rubio
  - Barcelona Institute for Global Health, ISGlobal, Barcelona, Spain
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts
- Juan José Amador
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts
- Damaris López-Pilarte
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts
- Cristina O'Callaghan-Gordo
  - Barcelona Institute for Global Health, ISGlobal, Barcelona, Spain
  - Faculty of Health Sciences, Universitat Oberta de Catalunya, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
- Rubén Gil-Redondo
  - Precision Medicine and Metabolism Laboratory, CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia, Spain
- Nieves Embade
  - Precision Medicine and Metabolism Laboratory, CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia, Spain
- Oscar Millet
  - Precision Medicine and Metabolism Laboratory, CIC bioGUNE, Basque Research and Technology Alliance (BRTA), Bizkaia, Spain
  - CIBERehd, Instituto de Salud Carlos III, Madrid, Spain
- Xiaojing Peng
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Selene Vences
  - Department of Environmental Health, Boston University School of Public Health, Boston, Massachusetts
- Sinead A. Keogh
  - Department of Environmental Health, Boston University School of Public Health, Boston, Massachusetts
- Iris S. Delgado
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts
- David J. Friedman
  - Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts
- Daniel R. Brooks
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts
- Jessica H. Leibler
  - Department of Environmental Health, Boston University School of Public Health, Boston, Massachusetts
3
Signorelli M, Tsonaka R, Aartsma-Rus A, Spitali P. Multiomic characterization of disease progression in mice lacking dystrophin. PLoS One 2023; 18:e0283869. [PMID: 37000843 PMCID: PMC10065259 DOI: 10.1371/journal.pone.0283869] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/10/2022] [Accepted: 03/19/2023] [Indexed: 04/03/2023] Open
Abstract
Duchenne muscular dystrophy (DMD) is caused by genetic mutations leading to lack of dystrophin in skeletal muscle. A better understanding of how objective biomarkers for DMD vary across subjects and over time is needed to model disease progression and response to therapy more effectively, both in pre-clinical and clinical research. We present an in-depth characterization of disease progression in 3 murine models of DMD by multiomic analysis of longitudinal trajectories between 6 and 30 weeks of age. Integration of RNA-seq, mass spectrometry-based metabolomic and lipidomic data obtained in muscle and blood samples by Multi-Omics Factor Analysis (MOFA) led to the identification of 8 latent factors that explained 78.8% of the variance in the multiomic dataset. Latent factors could discriminate between dystrophic and healthy mice, as well as between different time-points. MOFA made it possible to connect the gene expression signature in dystrophic muscles, characterized by pro-fibrotic and energy metabolism alterations, to inflammation and lipid signatures in blood. Our results show that omic observations in blood can be directly related to skeletal muscle pathology in dystrophic muscle.
Affiliation(s)
- Mirko Signorelli
  - Mathematical Institute, Leiden University, Leiden, The Netherlands
- Roula Tsonaka
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Annemieke Aartsma-Rus
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Pietro Spitali
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
4
Tian L, Li Z, Ma G, Zhang X, Tang Z, Wang S, Kang J, Liang D, Yu T. Metapone: a Bioconductor package for joint pathway testing for untargeted metabolomics data. Bioinformatics 2022; 38:3662-3664. [PMID: 35639952 PMCID: PMC9272804 DOI: 10.1093/bioinformatics/btac364] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/07/2022] [Revised: 05/07/2022] [Accepted: 05/25/2022] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Testing for pathway enrichment is an important aspect of the analysis of untargeted metabolomics data. Due to the unique characteristics of untargeted metabolomics data, some key issues have not been fully addressed in existing pathway testing algorithms: (1) matching uncertainty between data features and metabolites; (2) the lack of a method to analyze positive mode and negative mode LC/MS data simultaneously on the same set of subjects; (3) the incompleteness of pathways in individual software packages.
RESULTS We developed an innovative R/Bioconductor package: metabolic pathway testing with positive and negative mode data (metapone), which can perform two novel statistical tests that take matching uncertainty into consideration - (1) a weighted GSEA-type test, and (2) a permutation-based weighted hypergeometric test. The package is capable of combining positive and negative ion mode results in a single testing scheme. For comprehensiveness, the built-in pathways were manually curated from three sources: KEGG, Mummichog, and SMPDB.
AVAILABILITY The package is available at https://bioconductor.org/packages/devel/bioc/html/metapone.html.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
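To make the idea of a permutation-based, weighted overlap test more concrete, here is a minimal Python sketch. It is not the metapone implementation or its API: the function names, the toy data, and the weighting scheme (each feature spreads a weight of 1/k evenly over its k candidate metabolites) are all assumptions chosen for illustration.

```python
import random


def weighted_overlap(matches, significant, pathway):
    """Sum matching weights of significant features whose candidates hit the pathway."""
    total = 0.0
    for feature in significant:
        candidates = matches[feature]
        if not candidates:
            continue
        # Assumption: uniform weight over a feature's candidate metabolites.
        weight = 1.0 / len(candidates)
        total += weight * sum(1 for m in candidates if m in pathway)
    return total


def permutation_pvalue(matches, significant, pathway, n_perm=2000, seed=0):
    """Compare the observed weighted overlap with random 'significant' sets of equal size."""
    rng = random.Random(seed)
    features = sorted(matches)
    observed = weighted_overlap(matches, significant, pathway)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.sample(features, len(significant))
        if weighted_overlap(matches, shuffled, pathway) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: feature -> candidate metabolite matches.
matches = {
    "f1": ["A"],
    "f2": ["A", "B"],
    "f3": ["C"],
    "f4": ["D", "E"],
    "f5": ["F"],
    "f6": ["G"],
}
pathway = {"A", "B"}
observed, p_value = permutation_pvalue(matches, ["f1", "f2"], pathway)
print(observed, p_value)
```

Permuting which features count as significant, rather than using the closed-form hypergeometric distribution, is what lets fractional weights enter the statistic in this sketch; a real analysis would take the weights from the matching-confidence information available for each feature.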
Affiliation(s)
- Leqi Tian
  - Shenzhen Research Institute of Big Data
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen
- Zhenjiang Li
  - Gangarosa Department of Environmental Health, Emory University
- Guoxuan Ma
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen
  - Department of Biostatistics, University of Michigan
- Xiaoyue Zhang
  - Gangarosa Department of Environmental Health, Emory University
- Ziyin Tang
  - Gangarosa Department of Environmental Health, Emory University
- Siheng Wang
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen
- Jian Kang
  - Department of Biostatistics, University of Michigan
- Donghai Liang
  - Gangarosa Department of Environmental Health, Emory University
- Tianwei Yu
  - Shenzhen Research Institute of Big Data
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen
  - Warshel Institute, Shenzhen, Guangdong, China
5
Signorelli M, Spitali P, Szigyarto CAK, Tsonaka R. Penalized regression calibration: A method for the prediction of survival outcomes using complex longitudinal and high-dimensional data. Stat Med 2021; 40:6178-6196. [PMID: 34464990 PMCID: PMC9293191 DOI: 10.1002/sim.9178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/25/2021] [Revised: 08/10/2021] [Accepted: 08/10/2021] [Indexed: 11/18/2022]
Abstract
Longitudinal and high‐dimensional measurements have become increasingly common in biomedical research. However, methods to predict survival outcomes using covariates that are both longitudinal and high‐dimensional are currently missing. In this article, we propose penalized regression calibration (PRC), a method that can be employed to predict survival in such situations. PRC comprises three modeling steps: First, the trajectories described by the longitudinal predictors are flexibly modeled through the specification of multivariate mixed effects models. Second, subject‐specific summaries of the longitudinal trajectories are derived from the fitted mixed models. Third, the time to event outcome is predicted using the subject‐specific summaries as covariates in a penalized Cox model. To ensure a proper internal validation of the fitted PRC models, we furthermore develop a cluster bootstrap optimism correction procedure (CBOCP) that corrects for the optimistic bias of apparent measures of predictiveness. PRC and the CBOCP are implemented in the R package pencal, available from CRAN. After studying the behavior of PRC via simulations, we conclude by illustrating an application of PRC to data from an observational study that involved patients affected by Duchenne muscular dystrophy, where the goal is to predict time to loss of ambulation using longitudinal blood biomarkers.
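The cluster bootstrap optimism correction mentioned in this abstract can be illustrated with a short, generic Python sketch. This is not the pencal implementation: `fit` and `score` are hypothetical callables supplied by the user, the toy model (a grand mean scored by negative mean squared error) merely stands in for the mixed-model and penalized Cox steps, and the correction shown is the standard bootstrap optimism estimate applied with subject-level resampling.

```python
import random
from statistics import mean


def cluster_bootstrap_sample(data_by_subject, rng):
    """Resample whole subjects with replacement, keeping each subject's visits together."""
    ids = list(data_by_subject)
    drawn = [rng.choice(ids) for _ in ids]
    # Re-key so that a subject drawn twice counts as two separate clusters.
    return {f"{sid}#{k}": data_by_subject[sid] for k, sid in enumerate(drawn)}


def optimism_corrected(data_by_subject, fit, score, n_boot=200, seed=0):
    """Apparent performance minus the average bootstrap estimate of optimism."""
    rng = random.Random(seed)
    apparent = score(fit(data_by_subject), data_by_subject)
    optimism = []
    for _ in range(n_boot):
        boot = cluster_bootstrap_sample(data_by_subject, rng)
        model = fit(boot)
        # Optimism: performance on the bootstrap sample minus performance on the original data.
        optimism.append(score(model, boot) - score(model, data_by_subject))
    return apparent - mean(optimism)


# Hypothetical toy data: subject -> repeated measurements.
toy = {"s1": [1.0, 1.2], "s2": [2.0, 2.1, 1.9], "s3": [0.5], "s4": [1.5, 1.4]}


def fit(data):
    """Toy 'model': the grand mean of all measurements."""
    return mean(x for visits in data.values() for x in visits)


def score(model, data):
    """Negative mean squared error, so that higher is better."""
    return -mean((x - model) ** 2 for visits in data.values() for x in visits)


corrected = optimism_corrected(toy, fit, score)
print(corrected)
```

Resampling subjects rather than individual measurements keeps the within-subject correlation of repeated measurements intact in every bootstrap sample, which is the reason for using a cluster bootstrap with longitudinal data.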
Affiliation(s)
- Mirko Signorelli
  - Mathematical Institute, Leiden University, Leiden, The Netherlands
- Pietro Spitali
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Roula Tsonaka
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
6
Ebrahimpoor M, Spitali P, Goeman JJ, Tsonaka R. Pathway testing for longitudinal metabolomics. Stat Med 2021; 40:3053-3065. [PMID: 33768548 PMCID: PMC8252476 DOI: 10.1002/sim.8957] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/26/2019] [Revised: 02/19/2021] [Accepted: 03/04/2021] [Indexed: 01/12/2023]
Abstract
We propose a top‐down approach for pathway analysis of longitudinal metabolite data. We apply a score test based on a shared latent process mixed model which can identify pathways with differentially progressing metabolites. The strength of our approach is that it can handle unbalanced designs, deals with potential missing values in the longitudinal markers, and gives valid results even with small sample sizes. Contrary to bottom‐up approaches, correlations between metabolites are explicitly modeled, which yields power gains. For large pathway sizes, a computationally efficient solution is proposed based on pseudo‐likelihood methodology. We demonstrate the advantages of the proposed method in identification of differentially expressed pathways through simulation studies. Finally, longitudinal metabolite data from a mouse experiment are analyzed to demonstrate our methodology.
Affiliation(s)
- Mitra Ebrahimpoor
  - Medical Statistics, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Pietro Spitali
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jelle J Goeman
  - Medical Statistics, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Roula Tsonaka
  - Medical Statistics, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands