1
Gomon D, Putter H, Fiocco M, Signorelli M. Dynamic prediction of survival using multivariate functional principal component analysis: A strict landmarking approach. Stat Methods Med Res 2024; 33:256-272. [PMID: 38196243; DOI: 10.1177/09622802231224631]
Abstract
Dynamically predicting patient survival probabilities using longitudinal measurements has become increasingly important as routine data collection becomes more common. Many existing models use a multi-step landmarking approach for this problem, mostly because of its ease of use and versatility, but unfortunately most do not apply it appropriately. In this article we use multivariate functional principal component analysis to summarize the available longitudinal information, and employ a Cox proportional hazards model for prediction. Additionally, we consider a centred functional principal component analysis procedure in an attempt to remove the natural variation caused by differences in the ages of the subjects considered. We formalize the difference between a 'relaxed' landmarking approach, where only the validation data are landmarked, and a 'strict' landmarking approach, where both the training and validation data are landmarked. We show that a relaxed landmarking approach fails to use the information contained in the longitudinal outcomes effectively, thereby producing substantially worse prediction accuracy than a strict landmarking approach.
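To make the strict/relaxed distinction concrete, here is a minimal Python sketch of how a landmark data set might be built at landmark time `s` with prediction window `horizon`. The `Subject` record, the `landmark` and `prepare` helpers, and the choice to keep absolute rather than residual follow-up times are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """Hypothetical per-subject record (not from the paper)."""
    id: int
    event_time: float                                  # observed follow-up time
    event: bool                                        # True if the event occurred at event_time
    measurements: list = field(default_factory=list)   # (time, value) pairs


def landmark(subjects, s, horizon):
    """Return the landmark data set at time s.

    Keeps subjects still at risk at s, drops measurements taken after s,
    and administratively censors follow-up at s + horizon.
    """
    out = []
    for subj in subjects:
        if subj.event_time <= s:
            continue  # event or censoring before the landmark: not at risk
        history = [(t, v) for t, v in subj.measurements if t <= s]
        end = min(subj.event_time, s + horizon)
        event = subj.event and subj.event_time <= s + horizon
        out.append(Subject(subj.id, end, event, history))
    return out


def prepare(train, valid, s, horizon, strict=True):
    """Relaxed: only validation data are landmarked. Strict: both are."""
    valid_lm = landmark(valid, s, horizon)
    train_lm = landmark(train, s, horizon) if strict else train
    return train_lm, valid_lm
```

In this sketch the relaxed variant fits on un-landmarked training records. The information available when the model is fitted therefore differs from what a validation subject has at `s`. The strict variant applies the same truncation to both sets.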
Affiliation(s)
- Daniel Gomon
- Mathematical Institute, Leiden University, Leiden, the Netherlands
- Hein Putter
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Marta Fiocco
- Mathematical Institute, Leiden University, Leiden, the Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Mirko Signorelli
- Mathematical Institute, Leiden University, Leiden, the Netherlands
2
Devaux A, Helmer C, Genuer R, Proust-Lima C. Random survival forests with multivariate longitudinal endogenous covariates. Stat Methods Med Res 2023; 32:2331-2346. [PMID: 37886845; DOI: 10.1177/09622802231206477]
Abstract
Predicting the individual risk of clinical events using the complete patient history is a major challenge in personalized medicine. Analytical methods have to account for a possibly large number of time-dependent predictors, which are often characterized by irregular and error-prone measurements and are truncated early by the event. In this work, we extended competing-risk random survival forests to handle such endogenous longitudinal predictors when predicting event probabilities. The method, implemented in the R package DynForest, internally transforms the time-dependent predictors at each node of each tree into time-fixed features (using mixed models) that can then be used as splitting candidates. The final individual event probability is computed as the average of leaf-specific Aalen-Johansen estimators over the trees. Using simulations, we compared the accuracy of DynForest in predicting an event with that of (i) a joint modeling alternative when considering only two longitudinal predictors, and (ii) a regression calibration method that ignores the informative truncation by the event when dealing with a large number of longitudinal predictors. Through an application in dementia research, we also illustrated how DynForest can be used to develop a dynamic prediction tool for dementia from multimodal repeated markers and to quantify the importance of each marker.
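The final step described above averages leaf-specific Aalen-Johansen estimators over trees. The Python sketch below is a generic illustration of that step under competing risks. It is not DynForest's code or API; DynForest is an R package. The `(times, causes)` leaf representation and both function names are hypothetical.

```python
import numpy as np


def aalen_johansen_cif(times, causes, cause, t):
    """Aalen-Johansen cumulative incidence of `cause` at time t.

    times:  observed times in one leaf
    causes: 0 = censored, 1..K = competing event types
    CIF_k(t) = sum over event times u <= t of S(u-) * d_k(u) / n(u),
    with S the all-cause Kaplan-Meier survival and n(u) the number at risk.
    """
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    surv = 1.0  # all-cause survival just before the current event time
    cif = 0.0
    for u in np.unique(times[causes > 0]):  # sorted distinct event times
        if u > t:
            break
        n_risk = np.sum(times >= u)
        d_all = np.sum((times == u) & (causes > 0))
        d_k = np.sum((times == u) & (causes == cause))
        cif += surv * d_k / n_risk
        surv *= 1.0 - d_all / n_risk
    return float(cif)


def forest_cif(leaves, cause, t):
    """Average the leaf-level CIFs over trees.

    leaves: one (times, causes) pair per tree, holding the training subjects
    in the leaf that the new subject falls into (hypothetical representation).
    """
    return float(np.mean([aalen_johansen_cif(ts, cs, cause, t) for ts, cs in leaves]))
```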
Affiliation(s)
- Anthony Devaux
- Univ. Bordeaux, INSERM, BPH, U1219, Bordeaux, France
- The George Institute for Global Health, UNSW Sydney, Australia
- School of Population Health, UNSW Sydney, Australia
- Robin Genuer
- Univ. Bordeaux, INSERM, INRIA, BPH, U1219, Bordeaux, France
3
Zou Y, Yue M, Jia L, Wang Y, Chen H, Zhang A, Xia X, Liu W, Yu R, Yang S, Huang P. Accurate prediction of HCC risk after SVR in patients with hepatitis C cirrhosis based on longitudinal data. BMC Cancer 2023; 23:1147. [PMID: 38007418; PMCID: PMC10676612; DOI: 10.1186/s12885-023-11628-1] [Received: 05/28/2023] [Accepted: 11/09/2023]
Abstract
BACKGROUND Most existing predictive models of hepatocellular carcinoma (HCC) risk after sustained virologic response (SVR) are built on data collected at baseline and therefore have limited accuracy. The current study aimed to construct an accurate predictive model incorporating longitudinal data using a novel modeling strategy. The predictive performance of the longitudinal model was also compared with that of a baseline model. METHODS A total of 400 patients with HCV-related cirrhosis who achieved SVR with direct-acting antivirals (DAA) were enrolled in the study. Patients were randomly divided into a training set (70%) and a validation set (30%). Informative features were extracted from the longitudinal variables and then entered into a random survival forest (RSF) to develop the longitudinal model. A baseline model including the same variables was built for comparison. RESULTS During a median follow-up of approximately 5 years, 25 patients (8.9%) in the training set and 11 patients (9.2%) in the validation set developed HCC. The areas under the receiver operating characteristic curves (AUROCs) of the longitudinal model were 0.9507 (0.8838-0.9997), 0.8767 (0.6972-0.9918), and 0.8307 (0.6941-0.9993) for 1-, 2- and 3-year risk prediction, respectively. The Brier scores of the longitudinal model were also relatively low for 1-, 2- and 3-year risk prediction (0.0283, 0.0561, and 0.0501, respectively). In contrast, the baseline model achieved only mediocre AUROCs of around 0.6 (0.6113, 0.6213, and 0.6480, respectively). CONCLUSIONS Our longitudinal model yielded accurate predictions of HCC risk in patients with HCV-related cirrhosis, outperforming the baseline model. Our model can provide patients with valuable prognostic information and guide the intensity of surveillance in clinical practice.
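The abstract reports Brier scores at fixed horizons but does not say how censoring was handled. One common choice is inverse probability of censoring weighting (IPCW), with a Kaplan-Meier estimate of the censoring distribution. The Python sketch below illustrates that choice. The helper names are hypothetical, and the sketch is not a claim about the authors' computation.

```python
import numpy as np


def km_step(times, events):
    """Kaplan-Meier estimate of P(T > t), returned as a step function.

    Call S(t) for the value at t, or S(t, left=True) for S(t-).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    jumps = np.unique(times[events])
    values, s = [], 1.0
    for u in jumps:
        s *= 1.0 - np.sum((times == u) & events) / np.sum(times >= u)
        values.append(s)

    def S(t, left=False):
        # number of jump times <= t (or < t when left=True)
        k = np.searchsorted(jumps, t, side="left" if left else "right")
        return 1.0 if k == 0 else float(values[k - 1])

    return S


def ipcw_brier(times, events, surv_at_t, t):
    """IPCW Brier score at horizon t.

    surv_at_t: each subject's predicted probability of surviving past t.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    G = km_step(times, ~events)  # censoring survival: censorings are the "events"
    total = 0.0
    for ti, di, si in zip(times, events, np.asarray(surv_at_t, dtype=float)):
        if ti <= t and di:            # event by t: true status 0
            total += si ** 2 / G(ti, left=True)
        elif ti > t:                  # still event-free at t: true status 1
            total += (1.0 - si) ** 2 / G(t)
        # censored before t: weight 0, compensated by the 1/G weights above
    return total / len(times)
```

Without censoring, G is identically 1 and the score reduces to the ordinary mean squared error between observed status and predicted survival at t.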
Affiliation(s)
- Yanzheng Zou
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
- Ming Yue
- Department of Infectious Diseases, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Linna Jia
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
- Yifan Wang
- Department of Infectious Disease, Jurong Hospital Affiliated to Jiangsu University, Jurong, China
- Hongbo Chen
- Department of Infectious Disease, Jurong Hospital Affiliated to Jiangsu University, Jurong, China
- Amei Zhang
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Yunnan, China
- Xueshan Xia
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Yunnan, China
- Kunming Medical University, Kunming, China
- Wei Liu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
- Beijing Institute of Microbiology and Epidemiology, State Key Laboratory of Pathogen and Biosecurity, Beijing, China
- Rongbin Yu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
- Sheng Yang
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
- Peng Huang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, 211166, China
4
Cao T, Reeder HT, Foulkes AS. Functional principal component analysis and sparse-group LASSO to identify associations between biomarker trajectories and mortality among hospitalized SARS-CoV-2 infected individuals. BMC Med Res Methodol 2023; 23:254. [PMID: 37898791; PMCID: PMC10613396; DOI: 10.1186/s12874-023-02076-3] [Received: 05/22/2023] [Accepted: 10/18/2023]
Abstract
BACKGROUND A substantial body of clinical research involving individuals infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evaluated the association between in-hospital biomarkers and severe SARS-CoV-2 outcomes, including intubation and death. However, most existing studies considered each of multiple biomarkers independently and focused on baseline or peak values. METHODS We propose a two-stage analytic strategy combining functional principal component analysis (FPCA) and sparse-group LASSO (SGL) to characterize associations between biomarkers and 30-day mortality rates. Unlike prior reports, our proposed approach leverages: 1) time-varying biomarker trajectories, 2) multiple biomarkers simultaneously, and 3) the pathophysiological grouping of these biomarkers. We apply this method to a retrospective cohort of 12,941 patients hospitalized at Massachusetts General Hospital or Brigham and Women's Hospital and conduct simulation studies to assess performance. RESULTS Renal, inflammatory, and cardio-thrombotic biomarkers were associated with 30-day mortality rates among hospitalized SARS-CoV-2 patients. Sex-stratified analysis revealed that hematological biomarkers were associated with higher mortality in men, while this association was not identified in women. In simulation studies, our proposed method maintained high true positive rates and outperformed alternative approaches using only baseline or peak values with respect to false positive rates. CONCLUSIONS The proposed two-stage approach is a robust strategy for identifying biomarkers that are associated with disease severity among SARS-CoV-2-infected individuals. By leveraging information on the longitudinal trajectories of multiple, grouped biomarkers, our method offers an important first step in unraveling disease etiology and defining meaningful risk strata.
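The first stage above, FPCA feature extraction, can be sketched in Python for the simplified case of curves observed on a common, dense, equally spaced grid. Real in-hospital biomarker series are usually sparse and irregular and need a dedicated FPCA estimator, so this is an assumption-laden illustration. The sketch eigendecomposes the sample covariance, ignores grid spacing (so scores are unscaled), and stacks the scores of several biomarkers with a group label per column, as an SGL stage would need. The biomarker names, the `group_of` mapping, and both helpers are hypothetical. The SGL fit itself is not shown.

```python
import numpy as np


def fpca_scores(Y, n_components):
    """FPCA for curves on a common grid.

    Y: array (n_subjects, n_timepoints).
    Returns (mean curve, discretized eigenfunctions, per-subject scores).
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=0)
    Yc = Y - mu
    cov = Yc.T @ Yc / (Y.shape[0] - 1)             # sample covariance over time points
    evals, evecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    phi = evecs[:, order]                          # leading eigenvectors (orthonormal columns)
    scores = Yc @ phi                              # projections of centred curves
    return mu, phi, scores


def grouped_design(biomarkers, group_of, n_components):
    """Stack FPC scores of several biomarkers and label each column with a group.

    biomarkers: dict name -> (n_subjects, n_timepoints) array
    group_of:   dict name -> pathophysiological group label (hypothetical)
    """
    blocks, groups = [], []
    for name, Y in biomarkers.items():
        _, _, scores = fpca_scores(Y, n_components)
        blocks.append(scores)
        groups.extend([group_of[name]] * scores.shape[1])
    return np.hstack(blocks), groups
```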
Affiliation(s)
- Tingyi Cao
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Harrison T Reeder
- Biostatistics, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Andrea S Foulkes
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Biostatistics, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
5
Ge X, Cui K, Qin Y, Chen D, Han H, Yu H. Screening strategies and dynamic risk prediction models for Alzheimer's disease. J Psychiatr Res 2023; 166:92-99. [PMID: 37757706; DOI: 10.1016/j.jpsychires.2023.09.013] [Received: 05/17/2023] [Revised: 07/16/2023] [Accepted: 09/15/2023]
Abstract
BACKGROUND Characterizing the progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD) is essential for early AD prevention and targeted intervention. Our goal was to construct precise screening schemes for individuals with different risks of AD and to establish prognostic models for them. METHODS We constructed a retrospective cohort by reviewing individuals with a baseline diagnosis of MCI and at least one follow-up visit between November 2005 and May 2021. They were stratified into high-risk and low-risk groups based on their longitudinal cognitive trajectories. We then established a screening framework and obtained optimal screening strategies for the two risk groups. Cox and random survival forest (RSF) models were developed for dynamic prognosis prediction. RESULTS In terms of screening strategies, the combination of the Clinical Dementia Rating Sum of Boxes (CDRSB) and hippocampal volume was recommended for the high-risk MCI group, while the combination of the 13-item Alzheimer's Disease Assessment Scale-Cognitive (ADAS13) and the Functional Activities Questionnaire (FAQ) was recommended for the low-risk MCI group. After adjustment for demographic information and APOE ε4, the concordance index (C-index) of the Cox model for the high-risk group was 0.844 (95% CI: 0.815-0.873). The RSF model incorporating longitudinal ADAS13, FAQ, demographic information, and APOE ε4 performed for the low-risk group. CONCLUSION This precise screening scheme will optimize the allocation of medical resources and reduce the economic burden on individuals with low-risk MCI. Moreover, dynamic prognosis models may help with the early identification of individuals at risk and with clinical decisions, which will promote the secondary prevention of AD.
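The C-index reported above measures how often a model ranks pairs of subjects correctly. The Python sketch below shows Harrell's version for right-censored data. Several assumptions apply: higher risk scores mean earlier expected events, pairs with tied times are skipped, and tied risk scores count one half. It is an O(n²) illustration, not necessarily the estimator the authors used.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. It is concordant when risk[i] > risk[j];
    tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This is the scale on which the reported 0.844 should be read.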
Affiliation(s)
- Xiaoyan Ge
- Department of Health Statistics, School of Public Health, Jinzhou Medical University, 40 SongPo Road, Jinzhou, China
- Kai Cui
- Department of Health Statistics, School of Public Health, Jinzhou Medical University, 40 SongPo Road, Jinzhou, China
- Yao Qin
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 XinJian South Road, Taiyuan, China
- Durong Chen
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 XinJian South Road, Taiyuan, China
- Hongjuan Han
- Department of Mathematics, School of Basic Medical Sciences, Shanxi Medical University, Taiyuan, China
- Hongmei Yu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 XinJian South Road, Taiyuan, China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, 56 XinJian South Road, Taiyuan, China
6
Sun T, Ding Y. Neural network on interval-censored data with application to the prediction of Alzheimer's disease. Biometrics 2023; 79:2677-2690. [PMID: 35960189; PMCID: PMC10177011; DOI: 10.1111/biom.13734] [Received: 10/24/2021] [Accepted: 08/01/2022]
Abstract
Alzheimer's disease (AD) is a progressive and polygenic disorder that affects millions of individuals each year. Because few effective treatments for AD are yet available, it is highly desirable to develop an accurate model that predicts the full disease progression profile from an individual's genetic characteristics, for early prevention and clinical management. This work uses data from all four phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, including 1740 individuals with 8 million genetic variants. We tackle several challenges in these data, which are characterized by large-scale genetic data, an interval-censored outcome due to intermittent assessments, and left truncation in one study phase (ADNIGO). Specifically, we first develop a semiparametric transformation model for interval-censored and left-truncated data and estimate its parameters through a sieve approach. We then propose a computationally efficient generalized score test to identify variants associated with AD progression. Next, we implement a novel neural network on interval-censored data (NN-IC) to construct a prediction model using the top variants identified by the genome-wide test. Comprehensive simulation studies show that NN-IC outperforms several existing methods in terms of prediction accuracy. Finally, we apply NN-IC to the full ADNI data and successfully identify subgroups with differential progression risk profiles. Data used in the preparation of this article were obtained from the ADNI database.
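The abstract combines interval censoring (the event is only known to fall between two assessments) with left truncation (subjects enter at a delayed time). The Python sketch below shows how both enter a likelihood. It deliberately swaps in a simple parametric Weibull model in place of the paper's semiparametric transformation model and sieve estimation. Each record contributes log[S(L) - S(R)] - log S(V), where (L, R] brackets the event and V is the entry time. The record layout and function names are illustrative assumptions.

```python
import math


def weibull_surv(t, shape, scale):
    """Weibull survival S(t) = exp(-(t/scale)^shape); S(inf) = 0."""
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / scale) ** shape))


def interval_censored_loglik(data, shape, scale):
    """Log-likelihood for interval-censored, left-truncated records.

    data: list of (L, R, V) with L < R. The event lies in (L, R];
    R = inf means right-censored at L; V is the entry
    (left-truncation) time, 0 if none.
    """
    ll = 0.0
    for L, R, V in data:
        prob_in_interval = weibull_surv(L, shape, scale) - weibull_surv(R, shape, scale)
        # condition on being event-free at study entry V
        ll += math.log(prob_in_interval) - math.log(weibull_surv(V, shape, scale))
    return ll
```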
Affiliation(s)
- Tao Sun
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Ying Ding
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
7
Li C, Zhao K, Zhang D, Pang X, Pu H, Lei M, Fan B, Lv J, You D, Li Z, Zhang T. Prediction models of colorectal cancer prognosis incorporating perioperative longitudinal serum tumor markers: a retrospective longitudinal cohort study. BMC Med 2023; 21:63. [PMID: 36803500; PMCID: PMC9942392; DOI: 10.1186/s12916-023-02773-2] [Received: 07/15/2022] [Accepted: 02/08/2023]
Abstract
BACKGROUND Current prognostic prediction models of colorectal cancer (CRC) include only the preoperative measurement of tumor markers, leaving their available repeated postoperative measurements underutilized. In this study, CRC prognostic prediction models were constructed to clarify whether, and to what extent, including perioperative longitudinal measurements of CEA, CA19-9, and CA125 improves model performance, and to perform dynamic prediction. METHODS The training and validation cohorts included 1453 and 444 CRC patients, respectively, who underwent curative resection and had a preoperative measurement and two or more measurements within 12 months after surgery. Models to predict CRC overall survival were constructed with demographic and clinicopathological variables, incorporating preoperative CEA, CA19-9, and CA125 as well as their perioperative longitudinal measurements. RESULTS In internal validation, the model with preoperative CEA, CA19-9, and CA125 outperformed the model including CEA only, with a better area under the receiver operating characteristic curve (AUC: 0.774 vs 0.716), Brier score (BS: 0.057 vs 0.058), and net reclassification improvement (NRI = 33.5%, 95% CI: 12.3% to 54.8%) at 36 months after surgery. Furthermore, the prediction models incorporating longitudinal measurements of CEA, CA19-9, and CA125 within 12 months after surgery had improved prediction accuracy, with a higher AUC (0.849) and a lower BS (0.049). Compared with the preoperative models, the model incorporating longitudinal measurements of the three markers had a significant NRI (40.8%, 95% CI: 19.6% to 62.1%) at 36 months after surgery. External validation showed results similar to those of internal validation. The proposed longitudinal prediction model can provide a personalized dynamic prediction for a new patient, with the estimated survival probability updated whenever a new measurement is collected during the 12 months after surgery.
CONCLUSIONS Prediction models including longitudinal measurements of CEA, CA19-9, and CA125 have improved accuracy in predicting the prognosis of CRC patients. We recommend repeated measurements of CEA, CA19-9, and CA125 in the surveillance of CRC prognosis.
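The NRI values above compare how a new model moves individual risk estimates relative to an old one. The Python sketch below shows the category-free (continuous) NRI. It assumes that each patient's 36-month outcome is known, with no censoring before the horizon. The abstract does not say which NRI variant was used, and a censored survival setting would need a censoring-adjusted estimator, so treat this as an illustration of the idea only.

```python
import numpy as np


def continuous_nri(risk_old, risk_new, event):
    """Category-free NRI for a fully observed binary outcome.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where "up" means the new model assigns a higher risk than the old one.
    """
    risk_old = np.asarray(risk_old, dtype=float)
    risk_new = np.asarray(risk_new, dtype=float)
    event = np.asarray(event, dtype=bool)
    up = risk_new > risk_old
    down = risk_new < risk_old
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[~event].mean() - up[~event].mean()
    return float(nri_events + nri_nonevents)
```

The statistic ranges from -2 to 2. A percentage such as 33.5% corresponds to a value of 0.335.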
Affiliation(s)
- Chunxia Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhuaxi Road, PO Box 100, Jinan, 250012, Shandong, China
- Ke Zhao
- Department of Radiology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China
- Guangdong Cardiovascular Institute, Guangzhou, 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Dafu Zhang
- Department of Radiology, the Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Yunnan Cancer Center, No.519 Kunzhou Road, Xishan District, Kunming, 650118, Yunnan, China
- Xiaolin Pang
- Department of Radiotherapy, the Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510655, China
- Hongjiang Pu
- Department of Colorectal Surgery, the Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Yunnan Cancer Center, Kunming, 650118, China
- Ming Lei
- Department of Clinical Laboratory Medicine, the Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Yunnan Cancer Center, Kunming, 650118, China
- Bingbing Fan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhuaxi Road, PO Box 100, Jinan, 250012, Shandong, China
- Jiali Lv
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhuaxi Road, PO Box 100, Jinan, 250012, Shandong, China
- Dingyun You
- School of Biomedical Engineering Research, Kunming Medical University, No.1168 Chunrongxi Road, Chenggong District, Kunming, 650500, Yunnan, China
- Zhenhui Li
- Department of Radiology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, 510080, China
- Guangdong Cardiovascular Institute, Guangzhou, 510080, China
- Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Department of Radiology, the Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Yunnan Cancer Center, No.519 Kunzhou Road, Xishan District, Kunming, 650118, Yunnan, China
- Tao Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, 44 Wenhuaxi Road, PO Box 100, Jinan, 250012, Shandong, China
- Institute for Medical Dataology, Shandong University, Jinan, 250002, China
8
Raunig DL, Pennello GA, Delfino JG, Buckler AJ, Hall TJ, Guimaraes AR, Wang X, Huang EP, Barnhart HX, deSouza N, Obuchowski N. Multiparametric Quantitative Imaging Biomarker as a Multivariate Descriptor of Health: A Roadmap. Acad Radiol 2023; 30:159-182. [PMID: 36464548; PMCID: PMC9825667; DOI: 10.1016/j.acra.2022.10.026] [Received: 06/02/2022] [Revised: 10/24/2022] [Accepted: 10/29/2022]
Abstract
Multiparametric quantitative imaging biomarkers (QIBs) offer distinct advantages over single, univariate descriptors because they provide a more complete measure of complex, multidimensional biological systems. In disease, where structural and functional disturbances occur across a multitude of subsystems, multivariate QIBs are needed to measure the extent of system malfunction. This paper, the first Use Case in a series of articles on multiparameter imaging biomarkers, considers multiple QIBs as a multidimensional vector to represent all relevant disease constructs more completely. The approach proposed offers several advantages over QIBs as multiple endpoints and avoids combining them into a single composite that obscures the medical meaning of the individual measurements. We focus on establishing statistically rigorous methods to create a single, simultaneous measure from multiple QIBs that preserves the sensitivity of each univariate QIB while incorporating the correlation among QIBs. Details are provided for metrological methods to quantify the technical performance. Methods to reduce the set of QIBs, test the superiority of the mp-QIB model to any univariate QIB model, and design study strategies for generating precision and validity claims are also provided. QIBs of Alzheimer's Disease from the ADNI merge data set are used as a case study to illustrate the methods described.
Affiliation(s)
- David L Raunig
- Department of Statistical and Quantitative Sciences, Data Science Institute, Takeda Pharmaceuticals, Cambridge, Massachusetts
- Gene A Pennello
- Division of Imaging, Diagnostic and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
- Jana G Delfino
- Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
- Timothy J Hall
- Department of Medical Physics, University of Wisconsin, Madison, Wisconsin
- Alexander R Guimaraes
- Department of Diagnostic Radiology, Oregon Health & Science University, Portland, Oregon
- Xiaofeng Wang
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland, Ohio
- Erich P Huang
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Huiman X Barnhart
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Nandita deSouza
- Division of Radiotherapy and Imaging, The Institute of Cancer Research and Royal Marsden NHS Foundation Trust, London, United Kingdom
- Nancy Obuchowski
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio
9
Yan YH, Chen TB, Yang CP, Tsai IJ, Yu HL, Wu YS, Huang WJ, Tseng ST, Peng TY, Chou EP. Long-term exposure to particulate matter was associated with increased dementia risk using both traditional approaches and novel machine learning methods. Sci Rep 2022; 12:17130. [PMID: 36224306; PMCID: PMC9556552; DOI: 10.1038/s41598-022-22100-8] [Received: 05/31/2022] [Accepted: 10/10/2022]
Abstract
Air pollution exposure has been linked to various diseases, including dementia. However, novel methods for investigating the associations between air pollution exposure and disease are lacking. The objective of this study was to investigate whether long-term exposure to ambient particulate air pollution increases dementia risk, using both the traditional Cox model approach and a novel machine learning (ML) method, random forest (RF). We used health data from a national population-based cohort in Taiwan from 2000 to 2017. We collected the following ambient air pollution data from the Taiwan Environmental Protection Administration (EPA): fine particulate matter (PM2.5) and gaseous pollutants, including sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), nitrogen oxides (NOx), nitric oxide (NO), and nitrogen dioxide (NO2). Spatiotemporally estimated air quality data, calculated with a geostatistical approach (the Bayesian maximum entropy method), were collected. Each subject's residential county and township were reviewed monthly and linked to air quality data for the corresponding township and month. The Cox model approach and the ML with RF method were used. Increasing the concentration of PM2.5 by one interquartile range (IQR) increased the risk of dementia by approximately 5% (HR = 1.05, 95% CI = 1.04-1.05). A comparison of the extended Cox model approach with the RF method showed that the prediction accuracy of the RF method was approximately 0.7, but its AUC was lower than that of the Cox model approach. This national cohort study over an 18-year period provides supporting evidence that long-term exposure to particulate air pollution is associated with increased dementia risk in Taiwan. The ML with RF method appears to be an acceptable approach for exploring associations between air pollutant exposure and disease.
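The "approximately 5% per IQR" statement follows from the hazard ratio. Assume a proportional hazards model with a linear log-hazard coefficient $\beta$ for PM2.5 and other covariates held fixed; the abstract does not spell out this model form. The per-IQR hazard ratio, and its extension to a two-IQR increase, then work out as follows.

```latex
\mathrm{HR}_{\mathrm{IQR}}
  = \frac{h(t \mid x + \mathrm{IQR})}{h(t \mid x)}
  = \exp(\beta \cdot \mathrm{IQR}) = 1.05
  \quad\Longrightarrow\quad
  \beta \cdot \mathrm{IQR} = \ln 1.05 \approx 0.0488,
\qquad
  \mathrm{HR}_{2\,\mathrm{IQR}} = \exp(2\beta \cdot \mathrm{IQR}) = 1.05^{2} \approx 1.10 .
```

The percentage increase in hazard is $(\mathrm{HR}_{\mathrm{IQR}} - 1) \times 100\% \approx 5\%$.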
Affiliation(s)
- Yuan-Horng Yan
- Department of Endocrinology and Metabolism, Kuang Tien General Hospital, Taichung, Taiwan
- Department of Medical Research, Kuang Tien General Hospital, Taichung, Taiwan
- Institute of Biomedical Nutrition, Hungkuang University, Taichung, Taiwan
- Ting-Bin Chen
- Department of Neurology, Neurological Institute, Taichung Veterans General Hospital, Taichung, Taiwan
- Department of Applied Cosmetology, Hungkuang University, Taichung, Taiwan
- Chun-Pai Yang
- Department of Medical Research, Kuang Tien General Hospital, Taichung, Taiwan
- Institute of Biomedical Nutrition, Hungkuang University, Taichung, Taiwan
- Department of Neurology, Kuang Tien General Hospital, Taichung, Taiwan
- I-Ju Tsai
- Department of Medical Research, Kuang Tien General Hospital, Taichung, Taiwan
- Hwa-Lung Yu
- Department of Bioenvironmental Systems Engineering, National Taiwan University, Taipei, Taiwan
- Yuh-Shen Wu
- Department of Safety, Health, and Environmental Engineering, Hungkuang University, Taichung, Taiwan
- Winn-Jung Huang
- Department of Safety, Health, and Environmental Engineering, Hungkuang University, Taichung, Taiwan
- Shih-Ting Tseng
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Kuang Tien General Hospital, Taichung, Taiwan
- Jenteh Junior College of Medicine, Nursing and Management, Miaoli County, Taiwan
- Tzu-Yu Peng
- Department of Statistics, National Chengchi University, No. 64, Sec. 2, Zhinan Rd., Wenshan Dist., Taipei City, 116 Taiwan
- Elizabeth P. Chou
- Department of Statistics, National Chengchi University, No. 64, Sec. 2, Zhinan Rd., Wenshan Dist., Taipei City, 116 Taiwan
10
Li Y, Hsu W. A classification for complex imbalanced data in disease screening and early diagnosis. Stat Med 2022; 41:3679-3695. [PMID: 35603639; PMCID: PMC9541048; DOI: 10.1002/sim.9442] [Received: 10/05/2021] [Revised: 04/11/2022] [Accepted: 05/10/2022]
Abstract
Imbalanced classification has drawn considerable attention in the statistics and machine learning literature. Traditional classification methods often perform poorly when the class distribution is severely skewed, let alone under a high-dimensional longitudinal data structure. Given the ubiquity of big data in modern health research, imbalanced classification in disease diagnosis is expected to face an additional level of difficulty imposed by such complex data structures. In this article, we propose a nonparametric classification approach for imbalanced data in longitudinal and high-dimensional settings. Technically, functional principal component analysis is first applied for feature extraction under the longitudinal structure. The univariate exponential loss function coupled with a group LASSO penalty is then adopted in the classification procedure for high-dimensional settings. In addition to improving imbalanced classification, our approach provides meaningful feature selection for interpretation while having remarkably lower computational complexity. The proposed method is illustrated with a real-data application to the early detection of Alzheimer's disease, and its empirical finite-sample performance is extensively evaluated by simulations.
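The second stage above pairs an exponential loss with a group LASSO penalty. The Python sketch below is a generic proximal-gradient version. It uses the plain exponential loss mean(exp(-y * x'beta)) with labels coded -1/+1 and no intercept, which is an assumption; the authors' loss may include imbalance-specific adjustments. Each group is meant to hold the FPC scores of one longitudinal variable, so that a whole variable is selected or dropped together. The function names, step size, and iteration count are illustrative.

```python
import numpy as np


def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g sqrt(p_g) * ||beta_g||_2."""
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        thresh = lam * np.sqrt(idx.sum())   # scale penalty by group size p_g
        norm = np.linalg.norm(beta[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * beta[idx]  # shrink the whole group
        # otherwise the whole group is set to zero (variable dropped)
    return out


def fit_exp_loss_group_lasso(X, y, groups, lam, step=0.1, n_iter=500):
    """Proximal gradient descent on mean exponential loss + group LASSO.

    y must be coded as -1/+1; no intercept is fitted in this sketch.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = y * (X @ beta)
        grad = -(X * (y * np.exp(-margins))[:, None]).mean(axis=0)
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta
```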
Affiliation(s)
- Yiming Li
- Department of Statistics, Kansas State University, Manhattan, Kansas, USA
- Wei‐Wen Hsu
- Division of Biostatistics and Bioinformatics, Department of Environmental and Public Health Sciences, University of Cincinnati, Cincinnati, Ohio, USA
11
Lin J, Luo S. Deep learning for the dynamic prediction of multivariate longitudinal and survival data. Stat Med 2022; 41:2894-2907. [PMID: 35347750 DOI: 10.1002/sim.9392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/07/2021] [Revised: 02/18/2022] [Accepted: 03/08/2022] [Indexed: 11/10/2022]
Abstract
The joint model for longitudinal and survival data improves time-to-event predictions by including longitudinal outcome variables in addition to baseline covariates. However, in practice, joint models may be limited by parametric assumptions in both the longitudinal and survival submodels. In addition, computational difficulties arise when considering multiple longitudinal outcomes due to the large number of random effects to be integrated out in the full likelihood. In this article, we discuss several recent machine learning methods for incorporating multivariate longitudinal data for time-to-event prediction. The presented methods use functional data analysis or convolutional neural networks to model the longitudinal data, both of which scale well to multiple longitudinal outcomes. In addition, we propose a novel architecture based on the transformer neural network, named TransformerJM, which jointly models longitudinal and time-to-event data. The prognostic abilities of each model are assessed and compared through both simulation and real data analysis on Alzheimer's disease datasets. Specifically, the models were evaluated based on their ability to dynamically update predictions as new longitudinal data becomes available. We showed that TransformerJM improves upon the predictive performance of existing methods across different scenarios.
Affiliation(s)
- Jeffrey Lin
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Sheng Luo
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, USA
12
Pickett KL, Suresh K, Campbell KR, Davis S, Juarez-Colunga E. Random survival forests for dynamic predictions of a time-to-event outcome using a longitudinal biomarker. BMC Med Res Methodol 2021; 21:216. [PMID: 34657597 PMCID: PMC8520610 DOI: 10.1186/s12874-021-01375-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/17/2021] [Accepted: 08/21/2021] [Indexed: 11/24/2022]
Abstract
BACKGROUND Risk prediction models for time-to-event outcomes play a vital role in personalized decision-making. A patient's biomarker values, such as medical lab results, are often measured over time but traditional prediction models ignore their longitudinal nature, using only baseline information. Dynamic prediction incorporates longitudinal information to produce updated survival predictions during follow-up. Existing methods for dynamic prediction include joint modeling, which often suffers from computational complexity and poor performance under misspecification, and landmarking, which has a straightforward implementation but typically relies on a proportional hazards model. Random survival forests (RSF), a machine learning algorithm for time-to-event outcomes, can capture complex relationships between the predictors and survival without requiring prior specification and has been shown to have superior predictive performance. METHODS We propose an alternative approach for dynamic prediction using random survival forests in a landmarking framework. With a simulation study, we compared the predictive performance of our proposed method with Cox landmarking and joint modeling in situations where the proportional hazards assumption does not hold and the longitudinal marker(s) have a complex relationship with the survival outcome. We illustrated the use of the RSF landmark approach in two clinical applications to assess the performance of various RSF model building decisions and to demonstrate its use in obtaining dynamic predictions. RESULTS In simulation studies, RSF landmarking outperformed joint modeling and Cox landmarking when a complex relationship between the survival and longitudinal marker processes was present. It was also useful in application when there were several predictors for which the clinical relevance was unknown and multiple longitudinal biomarkers were present. 
Individualized dynamic predictions can be obtained from this method, and the variable importance metric is useful for examining how the predictive power of variables changes over time. In addition, RSF landmarking is easily implemented in standard software and, with the suggested specifications, requires less computation time than joint modeling. CONCLUSIONS RSF landmarking is a nonparametric, machine learning alternative to current methods for obtaining dynamic predictions when complex or unknown relationships are present. It requires little upfront decision-making, has comparable predictive performance, and offers favorable computational speed.
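A landmarking framework such as the one described above typically starts by building one data set per landmark time. The sketch below shows that step under simple assumptions: a single biomarker carried forward as the last observation at or before the landmark time s, and a prediction horizon w. The `Subject` and `LandmarkRow` records and the `landmark_dataset` function are hypothetical names, not taken from the paper. Subjects no longer at risk at s are dropped, and follow-up is administratively censored at s + w; fitting the RSF (or Cox) model to the resulting rows is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    sid: int
    time: float                        # observed event or censoring time
    event: int                         # 1 = event observed, 0 = censored
    visits: List[Tuple[float, float]]  # (measurement time, biomarker value)


@dataclass
class LandmarkRow:
    sid: int
    marker: float              # last biomarker value measured at or before s
    time_from_landmark: float
    event: int


def landmark_dataset(subjects: List[Subject], s: float, w: float) -> List[LandmarkRow]:
    """Build the landmark data set at time s with prediction horizon w."""
    rows: List[LandmarkRow] = []
    for subj in subjects:
        if subj.time <= s:
            continue  # event or censoring by time s: not at risk at the landmark
        history = [(t, v) for t, v in subj.visits if t <= s]
        if not history:
            continue  # no biomarker measurement available by time s
        marker = max(history, key=lambda tv: tv[0])[1]  # last observation carried forward
        if subj.time <= s + w:
            rows.append(LandmarkRow(subj.sid, marker, subj.time - s, subj.event))
        else:
            rows.append(LandmarkRow(subj.sid, marker, w, 0))  # administrative censoring at s + w
    return rows


subjects = [
    Subject(1, 2.0, 1, [(0.0, 5.0)]),
    Subject(2, 4.0, 1, [(0.0, 1.0), (1.5, 2.0), (3.0, 9.0)]),
    Subject(3, 10.0, 0, [(0.5, 3.0)]),
]
for row in landmark_dataset(subjects, s=2.0, w=3.0):
    print(row)
```

In this toy example subject 1 is dropped because its event occurs at the landmark, subject 2 keeps the value measured at time 1.5 (the visit at time 3.0 lies after s), and subject 3 is censored at the horizon. Repeating the construction over several landmark times and stacking the rows is a common way to obtain dynamic predictions from a single model.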
Affiliation(s)
- Kaci L Pickett
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
- Krithika Suresh
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
- Adult and Child Consortium for Health Outcomes and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
- Kristen R Campbell
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
- Scott Davis
- Division of Renal Diseases and Hypertension, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
- Elizabeth Juarez-Colunga
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, 80045 Colorado USA
13
Howlett J, Hill SM, Ritchie CW, Tom BDM. Disease Modelling of Cognitive Outcomes and Biomarkers in the European Prevention of Alzheimer's Dementia Longitudinal Cohort. Front Big Data 2021; 4:676168. [PMID: 34490422 PMCID: PMC8417903 DOI: 10.3389/fdata.2021.676168] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/04/2021] [Accepted: 07/30/2021] [Indexed: 12/04/2022]
Abstract
A key challenge for the secondary prevention of Alzheimer’s dementia is the need to identify individuals early in the disease process through sensitive cognitive tests and biomarkers. The European Prevention of Alzheimer’s Dementia (EPAD) consortium recruited participants into a longitudinal cohort study with the aim of building a readiness cohort for a proof-of-concept clinical trial and of generating a rich longitudinal data-set for disease modelling. Data have been collected on a wide range of measurements, including cognitive outcomes, neuroimaging, cerebrospinal fluid biomarkers, genetics, and other clinical and environmental risk factors, and are available for 1,828 eligible participants at baseline, 1,567 at 6 months, 1,188 at 1 year, 383 at 2 years, and 89 at the 3-year follow-up visit. We present a novel application of state-of-the-art longitudinal modelling and risk stratification approaches to these data in order to characterise disease progression and biological heterogeneity within the cohort. Specifically, we use longitudinal class-specific mixed effects models to characterise the different clinical disease trajectories, and a semi-supervised Bayesian clustering approach to explore whether participants can be stratified into homogeneous subgroups that have different patterns of cognitive functioning evolution, while also having subgroup-specific profiles in terms of baseline biomarkers and longitudinal rate of change in biomarkers.
Affiliation(s)
- James Howlett
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Steven M Hill
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Craig W Ritchie
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Brian D M Tom
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
14
Lu P, Colliot O. Multilevel Survival Modeling with Structured Penalties for Disease Prediction from Imaging Genetics data. IEEE J Biomed Health Inform 2021; 26:798-808. [PMID: 34329174 DOI: 10.1109/jbhi.2021.3100918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
This paper introduces a framework for disease prediction from multimodal genetic and imaging data. We propose a multilevel survival model for predicting the time of occurrence of a future disease state in patients who initially exhibit mild symptoms. This multilevel setting makes it possible to model interactions between genetic and imaging variables. This contrasts with classical additive models, which treat all modalities in the same manner and can undesirably eliminate specific modalities when their contributions are unbalanced. Moreover, using a survival model overcomes a limitation of previous classification-based approaches, which consider a fixed time frame. Furthermore, we introduce penalties that account for the structure of the different data types, such as a group lasso penalty over the genetic modality and an L2 penalty over the imaging modality. Finally, we propose a fast optimization algorithm based on a proximal gradient method. The approach was applied to the prediction of Alzheimer's disease (AD) among patients with mild cognitive impairment (MCI), based on genetic (single nucleotide polymorphisms, SNPs) and imaging (anatomical MRI measures) data from the ADNI database. The experiments demonstrate the effectiveness of the method for predicting the time of conversion to AD. The method also revealed how genetic variants and brain imaging alterations interact in the prediction of future disease status. The approach is generic and could potentially be useful for predicting other diseases.
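Combining a group lasso penalty on one block of coefficients with an L2 penalty on another fits naturally into proximal gradient methods, because each block can be handled by its own proximal operator. The sketch below shows one such update and a toy loop. It is an illustration under assumptions, not the authors' algorithm: the L2 penalty is taken to be squared (ridge), the multilevel interaction structure is omitted, `grad_gen`/`grad_img` stand in for the gradient of whatever smooth survival loss is used, and a quadratic toy loss replaces that survival loss in the demo. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def prox_group_lasso(v: Sequence[float], groups: List[List[int]], thresh: float) -> List[float]:
    """Prox of thresh * sum_g ||b_g||_2 (block soft-thresholding); groups with norm <= thresh become zero."""
    out = list(v)
    for idx in groups:
        norm = math.sqrt(sum(v[j] ** 2 for j in idx))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * v[j]
    return out


def prox_ridge(v: Sequence[float], thresh: float) -> List[float]:
    """Prox of (thresh / 2) * ||b||_2^2: uniform shrinkage v / (1 + thresh)."""
    return [x / (1.0 + thresh) for x in v]


def proximal_gradient_step(beta_gen, beta_img, grad_gen, grad_img,
                           step, lam_gen, lam_img, gen_groups) -> Tuple[List[float], List[float]]:
    """Gradient step on the smooth loss, then each block's own proximal operator."""
    z_gen = [b - step * g for b, g in zip(beta_gen, grad_gen)]
    z_img = [b - step * g for b, g in zip(beta_img, grad_img)]
    return (prox_group_lasso(z_gen, gen_groups, step * lam_gen),
            prox_ridge(z_img, step * lam_img))


# Toy smooth loss 0.5 * ||b - target||^2, a stand-in for the survival loss.
target_gen, target_img = [3.0, 4.0, 0.1, 0.1], [2.0, 4.0]
b_gen, b_img = [0.0] * 4, [0.0] * 2
for _ in range(100):
    g_gen = [b - t for b, t in zip(b_gen, target_gen)]
    g_img = [b - t for b, t in zip(b_img, target_img)]
    b_gen, b_img = proximal_gradient_step(b_gen, b_img, g_gen, g_img,
                                          step=0.5, lam_gen=1.0, lam_img=1.0,
                                          gen_groups=[[0, 1], [2, 3]])
print(b_gen, b_img)
```

With this toy loss the iterates approach [2.4, 3.2, 0, 0] and [1.0, 2.0]: the genetic group with a small norm is set exactly to zero, which is how a group lasso penalty removes whole groups of variables, while the ridge penalty only shrinks the imaging coefficients.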
15
Jiang S, Xie Y, Colditz GA. Functional ensemble survival tree: Dynamic prediction of Alzheimer’s disease progression accommodating multiple time‐varying covariates. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12449] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Shu Jiang
- Division of Public Health Sciences, Washington University School of Medicine in St. Louis, St. Louis, USA
- Yijun Xie
- Department of Statistics and Actuarial Sciences, University of Waterloo, Waterloo, Canada
- Graham A. Colditz
- Division of Public Health Sciences, Washington University School of Medicine in St. Louis, St. Louis, USA