1. Yan Y, Sankar BS, Mirza B, Ng DCM, Pelletier AR, Huang SD, Wang W, Watson K, Wang D, Ping P. Missing Values in Longitudinal Proteome Dynamics Studies: Making a Case for Data Multiple Imputation. J Proteome Res 2024; 23:4151-4162. [PMID: 39189460 DOI: 10.1021/acs.jproteome.4c00263] [Indexed: 08/28/2024]
Abstract
Temporal proteomics data sets are often confounded by the challenges of missing values. These missing data points, in a time-series context, can lead to fluctuations in measurements or the omission of critical events, thus hindering the ability to fully comprehend the underlying biomedical processes. We introduce a Data Multiple Imputation (DMI) pipeline designed to address this challenge in temporal data set turnover rate quantifications, enabling robust downstream analysis to gain novel discoveries. To demonstrate its utility and generalizability, we applied this pipeline to two use cases: a murine cardiac temporal proteomics data set and a human plasma temporal proteomics data set, both aimed at examining protein turnover rates. This DMI pipeline significantly enhanced the detection of protein turnover rate in both data sets, and furthermore, the imputed data sets captured new representation of proteins, leading to an augmented view of biological pathways, protein complex dynamics, as well as biomarker-disease associations. Importantly, DMI exhibited superior performance in benchmark data sets compared to single imputation methods (DSI). In summary, we have demonstrated that this DMI pipeline is effective at overcoming challenges introduced by missing values in temporal proteome dynamics studies.
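For context, multiple imputation generally means producing several completed copies of a data set, analyzing each copy, and pooling the results, whereas single imputation fills each gap once. The abstract does not say how the DMI pipeline pools its estimates, so the sketch below assumes the standard Rubin's rules. The function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules (an assumption, not the paper's stated method).

    estimates: one point estimate (e.g. a turnover rate) per imputed data set
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from three imputed data sets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what a single imputation cannot provide: it carries the uncertainty due to the missing values into the final variance.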
Affiliation(s)
- Yu Yan
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- NIH BRIDGE2AI Center & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Drive South, Los Angeles, California 90095, United States
- Baradwaj Simha Sankar
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NIH BRIDGE2AI Center & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Drive South, Los Angeles, California 90095, United States
- Bilal Mirza
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- Dominic C M Ng
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- NIH BRIDGE2AI Center & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Drive South, Los Angeles, California 90095, United States
- Alexander R Pelletier
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- Department of Computer Science and Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, California 90095, United States
- Sarah D Huang
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- Wei Wang
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- Department of Computer Science and Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, California 90095, United States
- Karol Watson
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NIH BRIDGE2AI Center & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Drive South, Los Angeles, California 90095, United States
- Ding Wang
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- NIH BRIDGE2AI Center & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Drive South, Los Angeles, California 90095, United States
- Peipei Ping
- Departments of Physiology and Medicine, University of California, Los Angeles (UCLA) School of Medicine, Los Angeles, California 90095, United States
- NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Los Angeles, California 90095, United States
- NIH BRIDGE2AI Center & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA, Suite 1-609, MRL Building, 675 Charles E. Young Drive South, Los Angeles, California 90095, United States
- Department of Computer Science and Scalable Analytics Institute, UCLA School of Engineering, Los Angeles, California 90095, United States
2. Flanagan SD, Hougland JR, Zeng X, Cantrell PS, Sun M, Jones-Laughner J, Canino MC, Hughes JM, Foulis SA, Taylor KM, Walker LA, Guerriere KI, Sterczala AJ, Connaboy C, Beckner ME, Matheny RW, Nindl BC. Urinary Proteomic Biomarkers of Trabecular Bone Volume Change during Army Basic Combat Training. Med Sci Sports Exerc 2024; 56:1644-1654. [PMID: 38758530 DOI: 10.1249/mss.0000000000003464] [Indexed: 05/18/2024]
Abstract
PURPOSE: The purpose of this study is to optimize a dMS-based urinary proteomic technique and evaluate the relationship between urinary proteome content and adaptive changes in bone microarchitecture during BCT. METHODS: Urinary proteomes were analyzed with an optimized dMS technique in two groups of 13 recruits (N = 26) at the beginning (Pre) and end (Post) of BCT. Matched by age (21 ± 4 yr), sex (16 W), and baseline tibial trabecular bone volume fractions (Tb.BV/TV), these groups were distinguished by the most substantial (High) and minimal (Low) improvements in Tb.BV/TV. Differential protein expression was analyzed with mixed permutation ANOVA and false discovery proportion-based adjustment for multiple comparisons. RESULTS: Tibial Tb.BV/TV increased from pre- to post-BCT in High (3.30 ± 1.64%, P < 0.0001) but not Low (-0.35 ± 1.25%, P = 0.4707). The optimized dMS technique identified 10,431 peptides from 1368 protein groups that represented 165 integrative biological processes. Seventy-four urinary proteins changed from pre- to post-BCT (P = 0.0019), and neutrophil-mediated immunity was the most prominent ontology. Two proteins (immunoglobulin heavy constant gamma 4 and C-type lectin domain family 4 member G) differed from pre- to post-BCT in High and Low (P = 0.0006). CONCLUSIONS: The dMS technique can identify more than 1000 urinary proteins. At least 74 proteins are responsive to BCT, and other principally immune system-related proteins show differential expression patterns that coincide with adaptive bone formation.
Affiliation(s)
- Xuemei Zeng
- Biomedical Mass Spectrometry Center, University of Pittsburgh, Pittsburgh, PA
- Pamela S Cantrell
- Biomedical Mass Spectrometry Center, University of Pittsburgh, Pittsburgh, PA
- Mai Sun
- Biomedical Mass Spectrometry Center, University of Pittsburgh, Pittsburgh, PA
- Maria C Canino
- Department of Sports Medicine and Nutrition, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA
- Julie M Hughes
- Military Performance Division, United States Army Research Institute of Environmental Medicine, Natick, MA
- Stephen A Foulis
- Military Performance Division, United States Army Research Institute of Environmental Medicine, Natick, MA
- Kathryn M Taylor
- Military Performance Division, United States Army Research Institute of Environmental Medicine, Natick, MA
- Leila A Walker
- Military Performance Division, United States Army Research Institute of Environmental Medicine, Natick, MA
- Katelyn I Guerriere
- Military Performance Division, United States Army Research Institute of Environmental Medicine, Natick, MA
- Adam J Sterczala
- Department of Sports Medicine and Nutrition, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA
- Meaghan E Beckner
- Military Performance Division, United States Army Research Institute of Environmental Medicine, Natick, MA
- Ronald W Matheny
- Military Operational Medicine Research Program, Fort Detrick, MD
- Bradley C Nindl
- Department of Sports Medicine and Nutrition, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA
3. Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun 2024; 15:3922. [PMID: 38724498 PMCID: PMC11082229 DOI: 10.1038/s41467-024-47899-w] [Received: 06/19/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024] [Open access]
Abstract
Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthew's correlation coefficients surpassing 0.84. We introduce an ensemble inference to integrate results from individual top-performing workflows for expanding differential proteome coverage and resolve inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
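The abstract reports workflow quality with G-mean and the Matthews correlation coefficient. As a reminder of what those numbers measure, the sketch below computes both from confusion-matrix counts. It assumes G-mean is the geometric mean of sensitivity and specificity, which is the usual definition but is not spelled out in the abstract. The function names and the counts in the usage note are illustrative only.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity.

    Assumes at least one true positive-class and one true negative-class item.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with 8 true positives, 2 false negatives, 9 true negatives, and 1 false positive, sensitivity is 0.8 and specificity is 0.9, so `g_mean(8, 2, 9, 1)` is the square root of 0.72.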
Affiliation(s)
- Hui Peng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- He Wang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Jinyan Li
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore.
- Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore.
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK.
4. Bilekova S, Garcia-Colomer B, Cebrian-Serrano A, Schirge S, Krey K, Sterr M, Kurth T, Hauck SM, Lickert H. Inceptor facilitates acrosomal vesicle formation in spermatids and is required for male fertility. Front Cell Dev Biol 2023; 11:1240039. [PMID: 37691832 PMCID: PMC10483240 DOI: 10.3389/fcell.2023.1240039] [Received: 06/14/2023] [Accepted: 08/07/2023] [Indexed: 09/12/2023] [Open access]
Abstract
Spermatogenesis is a crucial biological process that enables the production of functional sperm, allowing for successful reproduction. Proper germ cell differentiation and maturation require tight regulation of hormonal signals, cellular signaling pathways, and cell biological processes. The acrosome is a lysosome-related organelle at the anterior of the sperm head that contains enzymes and receptors essential for egg-sperm recognition and fusion. Even though several factors crucial for acrosome biogenesis have been discovered, the precise molecular mechanism of pro-acrosomal vesicle formation and fusion is not yet known. In this study, we investigated the role of the insulin inhibitory receptor (inceptor) in acrosome formation. Inceptor is a single-pass transmembrane protein with similarities to mannose-6-phosphate receptors (M6PR). Inceptor knockout male mice are infertile due to malformations in the acrosome and defects in the nuclear shape of spermatozoa. We show that inceptor is expressed in early spermatids and mainly localizes to vesicles between the Golgi apparatus and acrosome. Here we show that inceptor is an essential factor in the intracellular transport of trans-Golgi network-derived vesicles which deliver acrosomal cargo in maturing spermatids. The absence of inceptor results in vesicle-fusion defects, acrosomal malformation, and male infertility. These findings support our hypothesis of inceptor as a universal lysosomal or lysosome-related organelle sorting receptor expressed in several secretory tissues.
Affiliation(s)
- Sara Bilekova
- Helmholtz Center Munich, German Research Center for Environmental Health GmbH, Institute of Diabetes and Regeneration Research, Neuherberg, Germany
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- School of Medicine, Technical University of Munich, Munich, Germany
- Balma Garcia-Colomer
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- Helmholtz Center Munich, Institute for Diabetes and Obesity, Neuherberg, Germany
- Alberto Cebrian-Serrano
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- Helmholtz Center Munich, Institute for Diabetes and Obesity, Neuherberg, Germany
- Silvia Schirge
- Helmholtz Center Munich, German Research Center for Environmental Health GmbH, Institute of Diabetes and Regeneration Research, Neuherberg, Germany
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- Karsten Krey
- School of Medicine, Technical University of Munich, Munich, Germany
- Institute of Virology, Technical University of Munich, Munich, Germany
- Michael Sterr
- Helmholtz Center Munich, German Research Center for Environmental Health GmbH, Institute of Diabetes and Regeneration Research, Neuherberg, Germany
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- Thomas Kurth
- Center for Molecular and Cellular Bioengineering (CMCB), Technology Platform, Core Facility Electron Microscopy and Histology, Dresden University of Technology, Dresden, Germany
- Stefanie M. Hauck
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- Metabolomics and Proteomics Core, Helmholtz Center Munich, German Research Center for Environmental Health GmbH, Munich, Germany
- Heiko Lickert
- Helmholtz Center Munich, German Research Center for Environmental Health GmbH, Institute of Diabetes and Regeneration Research, Neuherberg, Germany
- German Center for Diabetes Research (DZD), Neuherberg, Germany
- School of Medicine, Technical University of Munich, Munich, Germany
5. Fan S, Wilson CM, Fridley BL, Li Q. Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis. Methods Mol Biol 2023; 2629:247-269. [PMID: 36929081 DOI: 10.1007/978-1-0716-2986-4_12] [Indexed: 03/18/2023]
Abstract
In this chapter, we review the cutting-edge statistical and machine learning methods for missing value imputation, normalization, and downstream analyses in mass spectrometry metabolomics studies, with illustration by example datasets. The missing peak recovery includes simple imputation by zero or limit of detection, regression-based or distribution-based imputation, and prediction by random forest. The batch effect can be removed by data-driven methods, internal standard-based, and quality control sample-based normalization. We also summarize different types of statistical analysis for metabolomics and clinical outcomes, such as inference on metabolic biomarkers, clustering of metabolomic profiles, metabolite module building, and integrative analysis with transcriptome.
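The simplest approach named in this abstract, imputation by zero or by the limit of detection, can be sketched in a few lines. This is an illustration only and is not taken from the chapter: missing peaks are represented here as `None`, and the function name `impute_constant` and the example intensities are hypothetical.

```python
def impute_constant(values, strategy="zero", lod=None):
    """Replace missing entries (None) with zero or with a limit of detection."""
    if strategy == "zero":
        fill = 0.0
    elif strategy == "lod":
        if lod is None:
            raise ValueError("strategy 'lod' needs a limit-of-detection value")
        fill = float(lod)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Observed values pass through unchanged; only the gaps are filled.
    return [fill if v is None else v for v in values]


# Hypothetical peak intensities with one missing peak.
peaks = [120.0, None, 87.5]
zero_filled = impute_constant(peaks)
lod_filled = impute_constant(peaks, strategy="lod", lod=10)
```

Both strategies give every missing peak the same value, which is why the abstract lists regression-based, distribution-based, and random forest methods as alternatives.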
Affiliation(s)
- Sili Fan
- Graduate Group of Biostatistics, University of California, Davis, CA, USA
- Christopher M Wilson
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Brooke L Fridley
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Qian Li
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA.
6. Wang J, Gong X, Hu M, Zhao L. Improved GSimp: A Flexible Missing Value Imputation Method to Support Regulatory Bioequivalence Assessment. Ann Biomed Eng 2023; 51:163-173. [PMID: 36107365 DOI: 10.1007/s10439-022-03070-4] [Received: 05/28/2022] [Accepted: 08/30/2022] [Indexed: 01/13/2023]
Abstract
Missing values are not uncommon in in vivo bioequivalence (BE) studies and pose non-trivial challenges for BE assessment. Missing values typically appear as a mixture of different types, such as Missing Not at Random (MNAR) and Missing Completely at Random (MCAR), however, current data imputation methods were usually developed for a certain type of missing values (e.g., MNAR). Among them, an iterative Gibbs sampler-based left-censored missing value imputation approach (GSimp) was recently developed and showed superior performance over other methods in handling MNAR data. In this study, we introduce an improved GSimp ("Improved GSimp" thereafter) that offers flexibility in handling mixed types of missing data and better imputation accuracy to support BE assessment for studies with missing values. Simulations mimicking different missing value scenarios (e.g., mixture of different missing types and proportion of missing values) were conducted to compare performance of the Improved GSimp with other methods (e.g., original GSimp and half of minimal value). Normalized root mean square error (NRMSE) was used to evaluate imputation accuracy. Our results showed that the Improved GSimp always had the best accuracy in all simulated scenarios compared to other methods.
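The abstract names NRMSE as the accuracy metric but does not give its normalization. The sketch below assumes one common form, the root mean square error between imputed and true values divided by the standard deviation of the true values; the paper may normalize differently. The function name `nrmse` is hypothetical.

```python
import math
from statistics import pvariance


def nrmse(true_values, imputed_values):
    """RMSE of the imputed values, normalized by the population SD of the true values.

    The normalization is an assumption; other definitions divide by the
    range or the mean of the true values instead.
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / len(true_values)
    var = pvariance(true_values)
    if var == 0:
        raise ValueError("true values have zero variance")
    return math.sqrt(mse / var)
```

Under this definition, a perfect imputation scores 0 and filling every masked entry with the mean of the true values scores 1, so lower is better.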
Affiliation(s)
- Jing Wang
- Division of Quantitative Methods and Modeling, Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
- Xiajing Gong
- Division of Quantitative Methods and Modeling, Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
- Meng Hu
- Division of Quantitative Methods and Modeling, Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Ave., Bldg 75, Room 4649, Silver Spring, MD, 20993-0002, USA.
- Liang Zhao
- Division of Quantitative Methods and Modeling, Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
7. Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022; 22:e2200092. [PMID: 36349819 DOI: 10.1002/pmic.202200092] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]
Abstract
Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI such as the presence of confounders (e.g., batch effects) which can influence MVI performance. Thus, these too, should be considered during or before MVI.
Affiliation(s)
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Hui Peng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Centre for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
8. Gardner ML, Freitas MA. Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics. Int J Mol Sci 2021; 22:9650. [PMID: 34502557 PMCID: PMC8431783 DOI: 10.3390/ijms22179650] [Received: 07/20/2021] [Revised: 08/28/2021] [Accepted: 08/31/2021] [Indexed: 01/15/2023] [Open access]
Abstract
Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
Affiliation(s)
- Miranda L. Gardner
- Ohio State Biochemistry Program, Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA
- Cancer Biology and Genetics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Michael A. Freitas
- Ohio State Biochemistry Program, Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA
- Cancer Biology and Genetics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
9. Arioli A, Dagliati A, Geary B, Peek N, Kalra PA, Whetton AD, Geifman N. OptiMissP: A dashboard to assess missingness in proteomic data-independent acquisition mass spectrometry. PLoS One 2021; 16:e0249771. [PMID: 33857200 PMCID: PMC8049317 DOI: 10.1371/journal.pone.0249771] [Received: 10/09/2020] [Accepted: 03/24/2021] [Indexed: 11/24/2022] [Open access]
Abstract
Background: Missing values are a key issue in the statistical analysis of proteomic data. Defining the strategy to address missing values is a complex task in each study, potentially affecting the quality of statistical analyses. Results: We have developed OptiMissP, a dashboard to visually and qualitatively evaluate missingness and guide decision making in the handling of missing values in proteomics studies that use data-independent acquisition mass spectrometry. It provides a set of visual tools to retrieve information about missingness through protein densities and topology-based approaches, and facilitates exploration of different imputation methods and missingness thresholds. Conclusions: OptiMissP provides support for researchers’ and clinicians’ qualitative assessment of missingness in proteomic datasets in order to define study-specific strategies for the handling of missing values. OptiMissP considers biases in protein distributions related to the choice of imputation method and helps analysts to balance the information loss caused by low missingness thresholds and the noise introduced by selecting high missingness thresholds. This is complemented by topological data analysis which provides additional insight to the structure of the data and their missingness. We use an example in Chronic Kidney Disease to illustrate the main functionalities of OptiMissP.
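The trade-off around missingness thresholds can be made concrete with a small filter. The sketch below is not OptiMissP code; it assumes a threshold means the largest fraction of missing samples a protein may have and still be kept, with missing values stored as `None`. The function name `filter_by_missingness` and the toy matrix are hypothetical.

```python
def filter_by_missingness(matrix, max_missing_fraction):
    """Keep proteins whose fraction of missing samples is at or below the threshold.

    matrix: dict mapping protein name -> list of abundances (None = missing)
    """
    kept = {}
    for protein, values in matrix.items():
        if not values:
            continue  # skip proteins with no samples at all
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing_fraction:
            kept[protein] = values
    return kept


# Toy abundance matrix: three proteins measured in four samples.
toy = {
    "P1": [1.0, None, 3.0, 4.0],    # 25% missing
    "P2": [None, None, None, 2.0],  # 75% missing
    "P3": [1.0, 2.0, 3.0, 4.0],     # complete
}
strict = filter_by_missingness(toy, 0.0)
moderate = filter_by_missingness(toy, 0.25)
lenient = filter_by_missingness(toy, 1.0)
```

A strict threshold keeps only the complete protein and loses information, while a lenient one keeps a protein that is mostly imputed, which is the balance the abstract describes.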
Affiliation(s)
- Angelica Arioli
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Arianna Dagliati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Division of Informatics, Imaging, and Data Science, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
- Bethany Geary
- Division of Cancer Sciences, Stoller Biomarker Discovery Centre, Manchester, United Kingdom
- Niels Peek
- Division of Informatics, Imaging, and Data Science, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
- NIHR Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom
- Anthony D. Whetton
- Division of Cancer Sciences, Stoller Biomarker Discovery Centre, Manchester, United Kingdom
- NIHR Manchester Biomedical Research Centre, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom
- School of Medical Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom
- Nophar Geifman
- Division of Informatics, Imaging, and Data Science, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
10. Li Q, Liu X, Yang J, Erlund I, Lernmark Å, Hagopian W, Rewers M, She JX, Toppari J, Ziegler AG, Akolkar B, Krischer JP. Plasma Metabolome and Circulating Vitamins Stratified Onset Age of an Initial Islet Autoantibody and Progression to Type 1 Diabetes: The TEDDY Study. Diabetes 2021; 70:282-292. [PMID: 33106256 PMCID: PMC7876562 DOI: 10.2337/db20-0696] [Received: 07/01/2020] [Accepted: 10/20/2020] [Indexed: 12/11/2022]
Abstract
Children's plasma metabolome, especially lipidome, reflects gene regulation and dietary exposures, heralding the development of islet autoantibodies (IA) and type 1 diabetes (T1D). The Environmental Determinants of Diabetes in the Young (TEDDY) study enrolled 8,676 newborns by screening of HLA-DR-DQ genotypes at six clinical centers in four countries, profiled metabolome, and measured concentrations of ascorbic acid, 25-hydroxyvitamin D [25(OH)D], and erythrocyte membrane fatty acids following birth until IA seroconversion under a nested case-control design. We grouped children having an initial autoantibody only against insulin (IAA-first) or GAD (GADA-first) by unsupervised clustering of temporal lipidome, identifying a subgroup of children having early onset of each initial autoantibody, i.e., IAA-first by 12 months and GADA-first by 21 months, consistent with population-wide early seroconversion age. Differential analysis showed that infants having reduced plasma ascorbic acid and cholesterol experienced IAA-first earlier, while early onset of GADA-first was preceded by reduced sphingomyelins at infancy. Plasma 25(OH)D prior to either autoantibody was lower in T1D progressors compared with nonprogressors, with simultaneous lower diglycerides, lysophosphatidylcholines, triglycerides, and alanine before GADA-first. Plasma ascorbic acid and 25(OH)D at infancy were lower in HLA-DR3/DR4 children among IA case subjects but not in matched control subjects, implying gene expression dysregulation of circulating vitamins as latent signals for IA or T1D progression.
Affiliation(s)
- Qian Li
- Health Informatics Institute, University of South Florida, Tampa, FL
- Xiang Liu
- Health Informatics Institute, University of South Florida, Tampa, FL
| | - Jimin Yang
- Health Informatics Institute, University of South Florida, Tampa, FL
| | - Iris Erlund
- Department of Government Services, Finnish Institute for Health and Welfare, Helsinki, Finland
| | - Åke Lernmark
- Department of Clinical Sciences, Clinical Research Centre, Skåne University Hospital, Lund University, Malmö, Sweden
| | | | - Marian Rewers
- Barbara Davis Center for Childhood Diabetes, University of Colorado Denver, Aurora, CO
| | - Jin-Xiong She
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA
| | - Jorma Toppari
- Department of Pediatrics, Turku University Hospital, Turku, Finland
- Department of Physiology, University of Turku, Turku, Finland
| | - Anette-G Ziegler
- Institute of Diabetes Research, Helmholtz Zentrum München, Munich, Germany
- Forschergruppe Diabetes, Technical University of Munich, Klinikum Rechts der Isar, Munich, Germany
- Forschergruppe Diabetes e.V. at Helmholtz Zentrum München, Munich, Germany
| | - Beena Akolkar
- National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD
|
11
|
Wang S, Li W, Hu L, Cheng J, Yang H, Liu Y. NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses. Nucleic Acids Res 2020; 48:e83. [PMID: 32526036 PMCID: PMC7641313 DOI: 10.1093/nar/gkaa498] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 02/18/2020] [Revised: 04/20/2020] [Accepted: 06/08/2020] [Indexed: 02/05/2023]
Abstract
Mass spectrometry (MS)-based quantitative proteomics experiments frequently generate data with missing values, which may profoundly affect downstream analyses. A wide variety of imputation methods have been established to deal with the missing-value issue. To date, however, there is a scarcity of efficient, systematic, and easy-to-handle tools that are tailored for proteomics community. Herein, we developed a user-friendly and powerful stand-alone software, NAguideR, to enable implementation and evaluation of different missing value methods offered by 23 widely used missing-value imputation algorithms. NAguideR further evaluates data imputation results through classic computational criteria and, unprecedentedly, proteomic empirical criteria, such as quantitative consistency between different charge-states of the same peptide, different peptides belonging to the same proteins, and individual proteins participating protein complexes and functional interactions. We applied NAguideR into three label-free proteomic datasets featuring peptide-level, protein-level, and phosphoproteomic variables respectively, all generated by data independent acquisition mass spectrometry (DIA-MS) with substantial biological replicates. The results indicate that NAguideR is able to discriminate the optimal imputation methods that are facilitating DIA-MS experiments over those sub-optimal and low-performance algorithms. NAguideR further provides downloadable tables and figures supporting flexible data analysis and interpretation. NAguideR is freely available at http://www.omicsolution.org/wukong/NAguideR/ and the source code: https://github.com/wangshisheng/NAguideR/.
Affiliation(s)
- Shisheng Wang
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Wenxue Li
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA
- Liqiang Hu
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Jingqiu Cheng
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Hao Yang
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Yansheng Liu
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA
- Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520, USA
|
12
|
Li Q, Parikh H, Butterworth MD, Lernmark Å, Hagopian W, Rewers M, She JX, Toppari J, Ziegler AG, Akolkar B, Fiehn O, Fan S, Krischer JP. Longitudinal Metabolome-Wide Signals Prior to the Appearance of a First Islet Autoantibody in Children Participating in the TEDDY Study. Diabetes 2020; 69:465-476. [PMID: 32029481 PMCID: PMC7034190 DOI: 10.2337/db19-0756] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/07/2019] [Accepted: 12/05/2019] [Indexed: 12/19/2022]
Abstract
Children at increased genetic risk for type 1 diabetes (T1D) after environmental exposures may develop pancreatic islet autoantibodies (IA) at a very young age. Metabolic profile changes over time may imply responses to exposures and signal development of the first IA. Our present research in The Environmental Determinants of Diabetes in the Young (TEDDY) study aimed to identify metabolome-wide signals preceding the first IA against GAD (GADA-first) or against insulin (IAA-first). We profiled metabolomes by mass spectrometry from children's plasma at 3-month intervals after birth until appearance of the first IA. A trajectory analysis discovered each first IA preceded by reduced amino acid proline and branched-chain amino acids (BCAAs), respectively. With independent time point analysis following birth, we discovered dehydroascorbic acid (DHAA) contributing to the risk of each first IA, and γ-aminobutyric acid (GABAs) associated with the first autoantibody against insulin (IAA-first). Methionine and alanine, compounds produced in BCAA metabolism and fatty acids, also preceded IA at different time points. Unsaturated triglycerides and phosphatidylethanolamines decreased in abundance before appearance of either autoantibody. Our findings suggest that IAA-first and GADA-first are heralded by different patterns of DHAA, GABA, multiple amino acids, and fatty acids, which may be important to primary prevention of T1D.
Affiliation(s)
- Qian Li
- Health Informatics Institute, Morsani College of Medicine, University of South Florida, Tampa, FL
- Hemang Parikh
- Health Informatics Institute, Morsani College of Medicine, University of South Florida, Tampa, FL
- Martha D Butterworth
- Health Informatics Institute, Morsani College of Medicine, University of South Florida, Tampa, FL
- Åke Lernmark
- Department of Clinical Sciences, Lund University/CRC, Skåne University Hospital SUS, Malmo, Sweden
- Marian Rewers
- Barbara Davis Center for Childhood Diabetes, University of Colorado Denver, Aurora, CO
- Jin-Xiong She
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA
- Jorma Toppari
- Department of Pediatrics, Turku University Hospital, Turku, Finland
- Research Centre for Integrative Physiology and Pharmacology, Institute of Biomedicine, University of Turku, Turku, Finland
- Anette-G Ziegler
- Institute of Diabetes Research, Helmholtz Zentrum München, Munich, Germany
- Forschergruppe Diabetes, Technical University of Munich, Klinikum Rechts der Isar, Munich, Germany
- Forschergruppe Diabetes e.V. at Helmholtz Zentrum München, Munich, Germany
- Beena Akolkar
- National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD
- Oliver Fiehn
- Genome Center, University of California, Davis, Davis, CA
- Sili Fan
- Genome Center, University of California, Davis, Davis, CA
- Jeffrey P Krischer
- Health Informatics Institute, Morsani College of Medicine, University of South Florida, Tampa, FL
|