1
Matabuena M, Sartini J. Multilevel functional data analysis modeling of human glucose response to meal intake. arXiv 2024:arXiv:2405.14690v1. [PMID: 38827463 PMCID: PMC11142320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Glucose meal response information collected via Continuous Glucose Monitoring (CGM) is relevant to the assessment of individual metabolic status and the support of personalized diet prescriptions. However, the complexity of the data produced by CGM monitors pushes the limits of existing analytic methods. CGM data often exhibit substantial within-person variability and have a natural multilevel structure. This research is motivated by the analysis of CGM data from individuals without diabetes in the AEGIS study. The dataset includes detailed information on meal timing and nutrition for each individual over different days. The primary focus of this study is to examine CGM glucose responses following patients' meals and explore the time-dependent associations with dietary and patient characteristics. Motivated by this problem, we propose a new analytical framework based on multilevel functional models, including a new functional mixed R-square coefficient. The use of these models illustrates three key points: (i) the importance of analyzing glucose responses across the entire functional domain when making diet recommendations; (ii) the differential metabolic responses between normoglycemic and prediabetic patients, particularly with regard to lipid intake; (iii) the importance of including random, person-level effects when modelling this scientific problem.
Affiliation(s)
- Marcos Matabuena
  - Universidad de Santiago de Compostela and Department of Biostatistics, Harvard University, Boston, MA 02115, USA
- Joe Sartini
  - Department of Biostatistics, Johns Hopkins University, Francisco Gude, Universidad de Santiago de Compostela
2
Gomez-Peralta F, Chico Ballesteros A, Marco Martínez A, Pérez Corral B, Conget Donlo I, Fuentealba Melo P, Zaragozá Arnáez F, Matabuena Rodríguez M. Insulin glargine 300 U/ml versus insulin degludec 100 U/ml improves nocturnal glycaemic control and variability in type 1 diabetes under routine clinical practice: A glucodensities-based post hoc analysis of the OneCare study. Diabetes Obes Metab 2024; 26:1993-1997. [PMID: 38379106 DOI: 10.1111/dom.15496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/19/2024] [Accepted: 01/28/2024] [Indexed: 02/22/2024]
Affiliation(s)
- Ana Chico Ballesteros
  - Department of Endocrinology and Nutrition, Hospital Santa Creu i Sant Pau, Barcelona, Spain. CIBER-BBN, Instituto de Salud Carlos III, Madrid, Spain. Universitat Autònoma de Barcelona, Barcelona, Spain
- Ignacio Conget Donlo
  - Diabetes Unit, Department of Endocrinology and Nutrition, IDF Centre of Education and Excellence in Diabetes Care, ICMDM, IDIBAPS, Hospital Clínic, Barcelona, Spain
- Marcos Matabuena Rodríguez
  - CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
3
Olsen MT, Klarskov CK, Dungu AM, Hansen KB, Pedersen-Bjergaard U, Kristensen PL. Statistical Packages and Algorithms for the Analysis of Continuous Glucose Monitoring Data: A Systematic Review. J Diabetes Sci Technol 2024:19322968231221803. [PMID: 38179940 DOI: 10.1177/19322968231221803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Abstract
BACKGROUND: Continuous glucose monitoring (CGM) measures glucose levels every 1 to 15 minutes and is widely used in clinical and research contexts. Statistical packages and algorithms reduce the time-consuming and error-prone process of manually calculating CGM metrics and contribute to standardizing CGM metrics defined by international consensus. The aim of this systematic review is to summarize existing data on (1) statistical packages for retrospective CGM data analysis and (2) statistical algorithms for retrospective CGM analysis not available in these statistical packages. METHODS: A systematic literature search in PubMed and EMBASE was conducted on September 19, 2023. We also searched Google Scholar and Google Search until October 12, 2023 as sources of gray literature and performed reference checks of the included literature. Articles in English and Danish were included. This systematic review is registered with PROSPERO (CRD42022378163). RESULTS: A total of 8731 references were screened and 46 references were included. We identified 23 statistical packages for the analysis of CGM data. The statistical packages could calculate many metrics of the 2022 CGM consensus and non-consensus CGM metrics, and 22/23 (96%) statistical packages were freely available. Also, 23 statistical algorithms were identified. The statistical algorithms could be divided into three groups based on content: (1) CGM data reduction (eg, clustering of CGM data), (2) composite CGM outcomes, and (3) other CGM metrics. CONCLUSION: This systematic review provides detailed tabular and textual up-to-date descriptions of the contents of statistical packages and statistical algorithms for retrospective analysis of CGM data.
Affiliation(s)
- Mikkel Thor Olsen
  - Department of Endocrinology and Nephrology, Copenhagen University Hospital-North Zealand, Hilleroed, Denmark
- Carina Kirstine Klarskov
  - Department of Endocrinology and Nephrology, Copenhagen University Hospital-North Zealand, Hilleroed, Denmark
- Arnold Matovu Dungu
  - Department of Pulmonary and Infectious Diseases, Copenhagen University Hospital-North Zealand, Hilleroed, Denmark
- Katrine Bagge Hansen
  - Steno Diabetes Center Copenhagen, Copenhagen University Hospital-Herlev-Gentofte, Herlev, Denmark
- Ulrik Pedersen-Bjergaard
  - Department of Endocrinology and Nephrology, Copenhagen University Hospital-North Zealand, Hilleroed, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Peter Lommer Kristensen
  - Department of Endocrinology and Nephrology, Copenhagen University Hospital-North Zealand, Hilleroed, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
4
Ghosal R, Matabuena M, Zhang J. Functional proportional hazards mixture cure model with applications in cancer mortality in NHANES and post ICU recovery. Stat Methods Med Res 2023; 32:2254-2269. [PMID: 37855203 DOI: 10.1177/09622802231206472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2023]
Abstract
We develop a functional proportional hazards mixture cure model with scalar and functional covariates measured at the baseline. The mixture cure model, useful in studying populations with a cure fraction of a particular event of interest is extended to functional data. We employ the expectation-maximization algorithm and develop a semiparametric penalized spline-based approach to estimate the dynamic functional coefficients of the incidence and the latency part. The proposed method is computationally efficient and simultaneously incorporates smoothness in the estimated functional coefficients via roughness penalty. Simulation studies illustrate a satisfactory performance of the proposed method in accurately estimating the model parameters and the baseline survival function. Finally, the clinical potential of the model is demonstrated in two real data examples that incorporate rich high-dimensional biomedical signals as functional covariates measured at the baseline and constitute novel domains to apply cure survival models in contemporary medical situations. In particular, we analyze (i) minute-by-minute physical activity data from the National Health And Nutrition Examination Survey 2003-2006 to study the association between diurnal patterns of physical activity at baseline and all cancer mortality through 2019 while adjusting for other biological factors; (ii) the impact of daily functional measures of disease severity collected in the intensive care unit on post intensive care unit recovery and mortality event. Our findings provide novel epidemiological insights into the association between daily patterns of physical activity and cancer mortality. Software implementation and illustration of the proposed estimation method are provided in R.
Affiliation(s)
- Rahul Ghosal
  - Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
- Marcos Matabuena
  - Department of Biostatistics, Harvard University T. H. Chan School of Public Health, Boston, MA, USA
- Jiajia Zhang
  - Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
5
Cui EH, Goldfine AB, Quinlan M, James DA, Sverdlov O. Investigating the value of glucodensity analysis of continuous glucose monitoring data in type 1 diabetes: an exploratory analysis. Frontiers in Clinical Diabetes and Healthcare 2023; 4:1244613. [PMID: 37753312 PMCID: PMC10518413 DOI: 10.3389/fcdhc.2023.1244613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 08/14/2023] [Indexed: 09/28/2023]
Abstract
Introduction: Continuous glucose monitoring (CGM) devices capture longitudinal data on interstitial glucose levels and are increasingly used to show the dynamics of diabetes metabolism. Given the complexity of CGM data, it is crucial to extract important patterns hidden in these data through efficient visualization and statistical analysis techniques. Methods: In this paper, we adopted the concept of glucodensity, and using a subset of data from an ongoing clinical trial in pediatric individuals and young adults with new-onset type 1 diabetes, we performed a cluster analysis of glucodensities. We assessed the differences among the identified clusters using analysis of variance (ANOVA) with respect to residual pancreatic beta-cell function and some standard CGM-derived parameters such as time in range, time above range, and time below range. Results: Distinct CGM data patterns were identified using cluster analysis based on glucodensities. Statistically significant differences were shown among the clusters with respect to baseline levels of pancreatic beta-cell function surrogate (C-peptide) and with respect to time in range and time above range. Discussion: Our findings provide supportive evidence for the value of glucodensity in the analysis of CGM data. Some challenges in the modeling of CGM data include unbalanced data structure, missing observations, and many known and unknown confounders, which speaks to the importance of, and provides opportunities for, taking an approach integrating clinical, statistical, and data science expertise in the analysis of these data.
Affiliation(s)
- Elvis Han Cui
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, United States
- Allison B. Goldfine
  - Division of Translational Medicine, Cardiometabolic Disease, Novartis Institutes for Biomedical Research, Cambridge, MA, United States
- Michelle Quinlan
  - Early Development Analytics, Novartis Pharmaceuticals Corporation, East Hanover, NJ, United States
- David A. James
  - Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, NJ, United States
- Oleksandr Sverdlov
  - Early Development Analytics, Novartis Pharmaceuticals Corporation, East Hanover, NJ, United States
6
Matabuena M, Pazos-Couselo M, Alonso-Sampedro M, Fernández-Merino C, González-Quintela A, Gude F. Reproducibility of continuous glucose monitoring results under real-life conditions in an adult population: a functional data analysis. Sci Rep 2023; 13:13987. [PMID: 37634017 PMCID: PMC10460390 DOI: 10.1038/s41598-023-40949-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/14/2023] [Accepted: 08/18/2023] [Indexed: 08/28/2023] [Open access]
Abstract
Continuous glucose monitoring (CGM) systems are a very useful tool for understanding the behaviour of glucose in different situations and populations. Despite the widespread use of CGM systems in both clinical practice and research, our understanding of the reproducibility of CGM data remains limited. The present work examines the reproducibility of the results provided by a CGM system in a random sample of a free-living adult population, using a functional data analysis approach. Functional intraclass correlation coefficients (ICCs) and their 95% confidence intervals (CI) were calculated to assess the reproducibility of CGM results in 581 individuals. The 581 participants (62% women) had a mean age of 48 years (range 18-87); 12% had previously been diagnosed with diabetes. The inter-day reproducibility of the CGM results was greater for subjects with diabetes (ICC 0.46 [CI 0.39-0.55]) than for normoglycaemic subjects (ICC 0.30 [CI 0.27-0.33]); the value for prediabetic subjects was intermediate (ICC 0.37 [CI 0.31-0.42]). For normoglycaemic subjects, inter-day reproducibility was poorer among the younger (ICC 0.26 [CI 0.21-0.30]) than the older subjects (ICC 0.39 [CI 0.32-0.45]). Inter-day reproducibility was poorest among normoglycaemic subjects, especially younger normoglycaemic subjects, suggesting the need to monitor some patient groups more often than others.
Affiliation(s)
- Marcos Matabuena
  - Research Methods Group (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Marcos Pazos-Couselo
  - Research Methods Group (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Psychiatry, Radiology, Public Health, Nursing and Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS-ISCIII), Santiago de Compostela, Spain
- Manuela Alonso-Sampedro
  - Research Methods Group (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS-ISCIII), Santiago de Compostela, Spain
- Carmen Fernández-Merino
  - Research Methods Group (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Psychiatry, Radiology, Public Health, Nursing and Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS-ISCIII), Santiago de Compostela, Spain
  - A Estrada Primary Care Center, A Estrada, Spain
- Arturo González-Quintela
  - Research Methods Group (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Psychiatry, Radiology, Public Health, Nursing and Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS-ISCIII), Santiago de Compostela, Spain
  - Internal Medicine Department, University Hospital of Santiago de Compostela, Santiago de Compostela, Spain
- Francisco Gude
  - Research Methods Group (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Psychiatry, Radiology, Public Health, Nursing and Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS-ISCIII), Santiago de Compostela, Spain
  - Concepción Arenal Primary Care Center, Santiago de Compostela, Spain
7
Ghosal R, Varma VR, Volfson D, Hillel I, Urbanek J, Hausdorff JM, Watts A, Zipunnikov V. Distributional data analysis via quantile functions and its application to modeling digital biomarkers of gait in Alzheimer's Disease. Biostatistics 2023; 24:539-561. [PMID: 36519565 PMCID: PMC10544806 DOI: 10.1093/biostatistics/kxab041] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/16/2021] [Revised: 09/10/2021] [Accepted: 10/19/2021] [Indexed: 07/20/2023] [Open access]
Abstract
With the advent of continuous health monitoring with wearable devices, users now generate their unique streams of continuous data such as minute-level step counts or heartbeats. Summarizing these streams via scalar summaries often ignores the distributional nature of wearable data and almost unavoidably leads to the loss of critical information. We propose to capture the distributional nature of wearable data via user-specific quantile functions (QF) and use these QFs as predictors in scalar-on-quantile-function-regression (SOQFR). As an alternative approach, we also propose to represent QFs via user-specific L-moments, robust rank-based analogs of traditional moments, and use L-moments as predictors in SOQFR (SOQFR-L). These two approaches provide two mutually consistent interpretations: in terms of quantile levels by SOQFR and in terms of L-moments by SOQFR-L. We also demonstrate how to deal with multi-modal distributional data via Joint and Individual Variation Explained using L-moments. The proposed methods are illustrated in a study of the association of digital gait biomarkers with cognitive function in Alzheimer's disease. Our analysis shows that the proposed methods demonstrate higher predictive performance and attain much stronger associations with clinical cognitive scales compared to simple distributional summaries.
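As a rough illustration of the L-moments mentioned in this abstract, the following sketch computes the first four sample L-moments of one user's measurements from probability-weighted moments. It is a generic textbook estimator, not the authors' implementation; the function name `sample_l_moments` and the example data are hypothetical.

```python
def sample_l_moments(values):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Uses the standard unbiased probability-weighted-moment estimators
    b0..b3 computed on the sorted sample. Illustrative helper only.
    """
    x = sorted(float(v) for v in values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for four L-moments")

    b0 = b1 = b2 = b3 = 0.0
    for i, xi in enumerate(x):  # i is the 0-based rank of xi
        b0 += xi
        b1 += xi * i / (n - 1)
        b2 += xi * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += xi * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n

    l1 = b0                                   # location (the mean)
    l2 = 2 * b1 - b0                          # scale
    l3 = 6 * b2 - 6 * b1 + b0                 # asymmetry
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0     # tail weight
    return l1, l2, l3, l4


# Hypothetical example: five evenly spaced readings from one user.
l1, l2, l3, l4 = sample_l_moments([1, 2, 3, 4, 5])
```

For a symmetric sample such as this one, the third L-moment is zero, which is one way to sanity-check the estimator before using the values as regression predictors.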
Affiliation(s)
- Rahul Ghosal
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Vijay R Varma
  - National Institute on Aging (NIA), National Institutes of Health (NIH), Baltimore, MD, USA
- Dmitri Volfson
  - Neuroscience Analytics, Computational Biology, Takeda, Cambridge, MA, USA
- Inbar Hillel
  - Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Jacek Urbanek
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jeffrey M Hausdorff
  - Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Department of Physical Therapy, Sackler Faculty of Medicine, and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
  - Rush Alzheimer's Disease Center and Department of Orthopedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Amber Watts
  - Department of Psychology, University of Kansas, Lawrence, KS, USA
- Vadim Zipunnikov
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
8
Sousa PHTO, de Souza CPE, Dias R. Bayesian adaptive selection of basis functions for functional data representation. J Appl Stat 2023; 51:958-992. [PMID: 38524799 PMCID: PMC10956930 DOI: 10.1080/02664763.2023.2172143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 01/18/2023] [Indexed: 02/05/2023]
Abstract
Considering the context of functional data analysis, we developed and applied a new Bayesian approach via the Gibbs sampler to select basis functions for a finite representation of functional data. The proposed methodology uses Bernoulli latent variables to assign zero to some of the basis function coefficients with a positive probability. This procedure allows for an adaptive basis selection since it can determine the number of bases and which ones should be selected to represent functional data. Moreover, the proposed procedure measures the uncertainty of the selection process and can be applied to multiple curves simultaneously. The methodology developed can deal with observed curves that may differ due to experimental error and random individual differences between subjects, which one can observe in a real dataset application involving daily numbers of COVID-19 cases in Brazil. Simulation studies show the main properties of the proposed method, such as its accuracy in estimating the coefficients and the strength of the procedure to find the true set of basis functions. Despite having been developed in the context of functional data analysis, we also compared the proposed model via simulation with the well-established LASSO and Bayesian LASSO, which are methods developed for non-functional data.
Affiliation(s)
- Camila P. E. de Souza
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, ON, Canada
- Ronaldo Dias
  - Department of Statistics, University of Campinas, Campinas, SP, Brazil
9
Marco A, Pazos-Couselo M, Moreno-Fernandez J, Díez-Fernández A, Alonso-Sampedro M, Fernández-Merino C, Gonzalez-Quintela A, Gude F. Time above range for predicting the development of type 2 diabetes. Front Public Health 2022; 10:1005513. [PMID: 36568777 PMCID: PMC9772988 DOI: 10.3389/fpubh.2022.1005513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022] [Open access]
Abstract
Aim: To investigate the prognostic value of time range metrics, as measured by continuous glucose monitoring, with respect to the development of type 2 diabetes (T2D). Research design and methods: A total of 499 persons without diabetes from the general population were followed up for 5 years. Time range metrics were measured at the start, and medical records were checked over the study period. Results: Twenty-two subjects (8.3 per 1,000 person-years) developed T2D. After adjusting for age, gender, family history of diabetes, body mass index and glycated hemoglobin concentration, multivariate analysis revealed 'time above range' (TAR, i.e., with a plasma glucose concentration of >140 mg/dL) to be significantly associated with a greater risk (OR = 1.06, CI 1.01-1.11) of developing diabetes (AUC = 0.94, Brier = 0.035). Conclusions: Time above range provides additional information to that offered by glycated hemoglobin to identify patients at a higher risk of developing type 2 diabetes in a population-based study.
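The 'time above range' metric defined in this abstract (glucose >140 mg/dL) can be sketched as follows. This is a minimal illustration that assumes equally spaced CGM readings and reports TAR as a percentage of valid readings; the function name `time_above_range`, the handling of missing values, and the example trace are assumptions, not details taken from the study.

```python
def time_above_range(glucose_mg_dl, threshold=140.0):
    """Percentage of valid CGM readings strictly above `threshold` (mg/dL).

    Assumes readings are equally spaced in time, so the share of readings
    equals the share of monitored time. Missing readings (None) are skipped.
    """
    readings = [g for g in glucose_mg_dl if g is not None]
    if not readings:
        raise ValueError("no valid glucose readings")
    above = sum(1 for g in readings if g > threshold)
    return 100.0 * above / len(readings)


# Hypothetical trace with one missing reading; 140 itself does not count
# because the abstract uses a strict ">140 mg/dL" cut-off.
trace = [100, 120, 141, 160, 140, None, 90, 180]
tar_percent = time_above_range(trace)
```

Other cut-offs can be explored by passing a different `threshold`, although the abstract reports results only for 140 mg/dL.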
Affiliation(s)
- Alejandra Marco
  - Primary Care Center, Santiago de Compostela, Spain
  - Research Methods (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Marcos Pazos-Couselo
  - Research Methods (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Psychiatry, Radiology, Public Health, Nursing and Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Correspondence: Marcos Pazos-Couselo
- Jesús Moreno-Fernandez
  - Endocrinology and Nutrition Service, Ciudad Real General University Hospital, Ciudad Real, Spain
- Ana Díez-Fernández
  - Facultad de Enfermería de Cuenca, Universidad de Castilla-La Mancha, Cuenca, Spain
- Manuela Alonso-Sampedro
  - Research Methods (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Clinical Epidemiology, Hospital Clínico Universitario de Santiago de Compostela, Santiago de Compostela, Spain
- Carmen Fernández-Merino
  - Research Methods (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Primary Care Center, A Estrada, Spain
- Arturo Gonzalez-Quintela
  - Research Methods (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Internal Medicine, Hospital Clínico Universitario de Santiago, Santiago de Compostela, Spain
- Francisco Gude
  - Research Methods (RESMET), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
  - Department of Psychiatry, Radiology, Public Health, Nursing and Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Department of Clinical Epidemiology, Hospital Clínico Universitario de Santiago de Compostela, Santiago de Compostela, Spain
10
Ghosal R, Varma VR, Volfson D, Urbanek J, Hausdorff JM, Watts A, Zipunnikov V. Scalar on time-by-distribution regression and its application for modelling associations between daily-living physical activity and cognitive functions in Alzheimer's Disease. Sci Rep 2022; 12:11558. [PMID: 35798763 PMCID: PMC9263176 DOI: 10.1038/s41598-022-15528-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 06/24/2022] [Indexed: 11/26/2022] [Open access]
Abstract
Wearable data is a rich source of information that can provide a deeper understanding of links between human behaviors and human health. Existing modelling approaches use wearable data summarized at subject level via scalar summaries in regression, temporal (time-of-day) curves in functional data analysis (FDA), and distributions in distributional data analysis (DDA). We propose to capture temporally local distributional information in wearable data using subject-specific time-by-distribution (TD) data objects. Specifically, we develop scalar on time-by-distribution regression (SOTDR) to model associations between scalar response of interest such as health outcomes or disease status and TD predictors. Additionally, we show that TD data objects can be parsimoniously represented via a collection of time-varying L-moments that capture distributional changes over the time-of-day. The proposed method is applied to the accelerometry study of mild Alzheimer’s disease (AD). We found that mild AD is significantly associated with reduced upper quantile levels of physical activity, particularly during morning hours. In-sample cross validation demonstrated that TD predictors attain much stronger associations with clinical cognitive scales of attention, verbal memory, and executive function when compared to predictors summarized via scalar total activity counts, temporal functional curves, and quantile functions. Taken together, the present results suggest that SOTDR analysis provides novel insights into cognitive function and AD.
Affiliation(s)
- Rahul Ghosal
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Vijay R Varma
  - National Institute on Aging (NIA), National Institutes of Health (NIH), Baltimore, MD, USA
- Dmitri Volfson
  - Neuroscience Analytics, Computational Biology, Takeda, Cambridge, MA, USA
- Jacek Urbanek
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jeffrey M Hausdorff
  - Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Department of Physical Therapy, Sackler Faculty of Medicine, and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
  - Rush Alzheimer's Disease Center and Department of Orthopedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Amber Watts
  - Department of Psychology, University of Kansas, Lawrence, KS, USA
- Vadim Zipunnikov
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
11
Matabuena M, Félix P, García-Meixide C, Gude F. Kernel machine learning methods to handle missing responses with complex predictors. Application in modelling five-year glucose changes using distributional representations. Computer Methods and Programs in Biomedicine 2022; 221:106905. [PMID: 35649295 DOI: 10.1016/j.cmpb.2022.106905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Revised: 05/11/2022] [Accepted: 05/22/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVES: Missing data is a ubiquitous problem in longitudinal studies due to the number of patients lost to follow-up. Kernel methods have enriched the machine learning field by successfully managing non-vectorial predictors, such as graphs, strings, and probability distributions, and have emerged as a promising tool for the analysis of complex data stemming from modern healthcare. This paper proposes a new set of kernel methods to handle missing data in the response variables. These methods are applied to predict long-term changes in glycated haemoglobin (A1c), the primary biomarker used to diagnose and monitor the progression of diabetes mellitus, with emphasis on exploring the predictive potential of continuous glucose monitoring (CGM). METHODS: We propose a new framework of non-linear kernel methods for testing statistical independence, selecting relevant predictors, and quantifying the uncertainty of the resultant predictive models. As a novelty in the clinical analysis, we used a distributional representation of CGM as a predictor and compared its performance with that of traditional diabetes biomarkers. RESULTS: The results show that, after the incorporation of CGM information, predictive ability increases from R2=0.61 to R2=0.71. In addition, uncertainty analysis is useful for characterising some subpopulations where predictivity is worsened, and a more personalised clinical follow-up is advisable according to expected patient uncertainty in glucose values. CONCLUSIONS: The proposed methods have proven to deal effectively with missing data. They also have the potential to improve the results of predictive tasks by including new complex objects as explanatory variables and modelling arbitrary dependence relations. The application of these methods to a longitudinal study of diabetes showed that the inclusion of a distributional representation of CGM data provides greater sensitivity in predicting five-year A1c changes than classical diabetes biomarkers and traditional CGM metrics.
Affiliation(s)
- Marcos Matabuena
  - CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain
- Paulo Félix
  - CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain
- Francisco Gude
  - Unidade de Epidemioloxía Clínica, Complexo Hospitalario Universidade de Santiago (CHUS), Travesía da Choupana, Santiago de Compostela 15706, Spain