1
|
Lindner H, Gimotty PA, Bilker WB. The diagnostic likelihood ratio function and modified test for trend: Identifying, evaluating, and validating nontraditional biomarkers in case-control studies. Stat Med 2023; 42:5313-5337. [PMID: 37735925 PMCID: PMC11073617 DOI: 10.1002/sim.9912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 06/05/2023] [Accepted: 09/05/2023] [Indexed: 09/23/2023]
Abstract
The ROC curve and its associated summary statistic, the AUC, are used to identify informative diagnostic biomarkers under the assumption that risk of disease is a monotone function of the biomarker. We refer to biomarkers that meet this assumption as traditional, and those that do not as nontraditional. Nontraditional biomarkers most often arise when both low and high biomarker values are associated with an outcome of interest, such as blood pressure with medical complications or leukocyte count with ICU prognosis. Since nontraditional biomarkers do not meet the assumptions for ROC-based analyses, we propose using the discrete diagnostic likelihood ratio (DLR) function to evaluate a wider class of informative biomarkers. We obtain the DLR function using the multinomial logistic regression (MLR) model to improve upon existing estimation techniques, and implement a likelihood ratio test to identify candidate informative traditional and nontraditional biomarkers. We propose a modification of the Cochran-Armitage test for trend that separates biomarkers deemed informative into traditional and nontraditional categories. The statistical properties of the likelihood ratio test and modified test for trend are explored under simulation. Together, these methods achieve the identification, evaluation, and validation of biomarkers from early discovery research. Finally, we show that incorporating covariates into the MLR model results in a covariate-adjusted DLR function that is useful for integrating multiple sources of information in clinical decision making. The methods are applied to gene expression data from subjects with high grade serous ovarian cancer, where stage, early stage vs late stage, is the outcome of interest.
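To make the idea of a nontraditional biomarker concrete, the sketch below computes an empirical discrete DLR, P(level | case) / P(level | control), for each biomarker category, alongside a pairwise AUC. The data, the three-level coding, and the function names are hypothetical, and the plain frequency ratio used here is not the MLR-based estimator the abstract describes; it only illustrates how a U-shaped biomarker can have an AUC of 0.5 while its DLRs are far from 1.

```python
from collections import Counter


def discrete_dlr(case_levels, control_levels):
    """Empirical DLR per category: P(level | case) / P(level | control)."""
    cases, controls = Counter(case_levels), Counter(control_levels)
    n_case, n_ctrl = len(case_levels), len(control_levels)
    # Only categories observed among controls are reported, to avoid dividing by zero.
    return {
        level: (cases[level] / n_case) / (controls[level] / n_ctrl)
        for level in sorted(controls)
    }


def pairwise_auc(case_scores, control_scores):
    """AUC as the share of case-control pairs where the case scores higher (ties count 1/2)."""
    wins = sum(
        (c > k) + 0.5 * (c == k) for c in case_scores for k in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical data: cases cluster at low and high levels, controls in the middle.
case_levels = ["low"] * 40 + ["mid"] * 20 + ["high"] * 40
control_levels = ["low"] * 20 + ["mid"] * 60 + ["high"] * 20

dlr = discrete_dlr(case_levels, control_levels)

# Ordinal coding assumed only for the AUC calculation.
rank = {"low": 0, "mid": 1, "high": 2}
auc_value = pairwise_auc(
    [rank[x] for x in case_levels], [rank[x] for x in control_levels]
)

print(dlr)
print(auc_value)
```

In this constructed example the low and high categories each have a DLR of about 2 and the middle category about 1/3, yet the AUC is 0.5, which is the situation in which a ROC-based screen would miss the biomarker.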
Affiliation(s)
- Hanna Lindner
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
| | - Phyllis A. Gimotty
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Authors contributed equally as senior co-authors
| | - Warren B. Bilker
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Authors contributed equally as senior co-authors
|
2
|
Panyard DJ, McKetney J, Deming YK, Morrow AR, Ennis GE, Jonaitis EM, Van Hulle CA, Yang C, Sung YJ, Ali M, Kollmorgen G, Suridjan I, Bayfield A, Bendlin BB, Zetterberg H, Blennow K, Cruchaga C, Carlsson CM, Johnson SC, Asthana S, Coon JJ, Engelman CD. Large-scale proteome and metabolome analysis of CSF implicates altered glucose and carbon metabolism and succinylcarnitine in Alzheimer's disease. Alzheimers Dement 2023; 19:5447-5470. [PMID: 37218097 PMCID: PMC10663389 DOI: 10.1002/alz.13130] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/10/2022] [Revised: 02/23/2023] [Accepted: 04/04/2023] [Indexed: 05/24/2023]
Abstract
INTRODUCTION A hallmark of Alzheimer's disease (AD) is the aggregation of proteins (amyloid beta [A] and hyperphosphorylated tau [T]) in the brain, making cerebrospinal fluid (CSF) proteins of particular interest. METHODS We conducted a CSF proteome-wide analysis among participants of varying AT pathology (n = 137 participants; 915 proteins) with nine CSF biomarkers of neurodegeneration and neuroinflammation. RESULTS We identified 61 proteins significantly associated with the AT category (P < 5.46 × 10⁻⁵) and 636 significant protein-biomarker associations (P < 6.07 × 10⁻⁶). Proteins from glucose and carbon metabolism pathways were enriched among amyloid- and tau-associated proteins, including malate dehydrogenase and aldolase A, whose associations with tau were replicated in an independent cohort (n = 717). CSF metabolomics identified and replicated an association of succinylcarnitine with phosphorylated tau and other biomarkers. DISCUSSION These results implicate glucose and carbon metabolic dysregulation and increased CSF succinylcarnitine levels with amyloid and tau pathology in AD. HIGHLIGHTS Cerebrospinal fluid (CSF) proteome enriched for extracellular, neuronal, immune, and protein processing. Glucose/carbon metabolic pathways enriched among amyloid/tau-associated proteins. Key glucose/carbon metabolism protein associations independently replicated. CSF proteome outperformed other omics data in predicting amyloid/tau positivity. CSF metabolomics identified and replicated a succinylcarnitine-phosphorylated tau association.
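The two significance thresholds quoted in the abstract match what a Bonferroni correction would give for 915 proteins and for 915 × 9 protein-biomarker pairs. The abstract does not name the correction method, so the family-wise level of 0.05 and the Bonferroni rule in this sketch are assumptions made only to show the arithmetic.

```python
# Assumed family-wise error rate; not stated in the abstract.
alpha = 0.05

# Counts taken from the abstract.
n_proteins = 915
n_biomarkers = 9

# Bonferroni: divide alpha by the number of tests in each analysis.
protein_threshold = alpha / n_proteins
pair_threshold = alpha / (n_proteins * n_biomarkers)

print(f"{protein_threshold:.2e}")  # 5.46e-05
print(f"{pair_threshold:.2e}")     # 6.07e-06
```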
Affiliation(s)
- Daniel J. Panyard
- Department of Population Health Sciences, University of Wisconsin-Madison; 610 Walnut Street, 707 WARF Building, Madison, WI 53726, United States of America
| | - Justin McKetney
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison; Madison, WI 53706, United States of America
- Department of Biomolecular Chemistry, University of Wisconsin-Madison; Madison, WI 53506, United States of America
| | - Yuetiva K. Deming
- Department of Population Health Sciences, University of Wisconsin-Madison; 610 Walnut Street, 707 WARF Building, Madison, WI 53726, United States of America
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Department of Medicine, University of Wisconsin-Madison; 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, Madison, WI 53705, United States of America
| | - Autumn R. Morrow
- Department of Population Health Sciences, University of Wisconsin-Madison; 610 Walnut Street, 707 WARF Building, Madison, WI 53726, United States of America
| | - Gilda E. Ennis
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
| | - Erin M. Jonaitis
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Wisconsin Alzheimer’s Institute, University of Wisconsin-Madison; 610 Walnut Street, 9th Floor, Madison, WI 53726, United States of America
| | - Carol A. Van Hulle
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Department of Medicine, University of Wisconsin-Madison; 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, Madison, WI 53705, United States of America
| | - Chengran Yang
- Department of Psychiatry, Washington University School of Medicine; St Louis, MO 63110, United States of America
- NeuroGenomics and Informatics Center, Washington University School of Medicine; St Louis, MO 63110, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine; St Louis, MO 63110, United States of America
| | - Yun Ju Sung
- Department of Psychiatry, Washington University School of Medicine; St Louis, MO 63110, United States of America
- NeuroGenomics and Informatics Center, Washington University School of Medicine; St Louis, MO 63110, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine; St Louis, MO 63110, United States of America
| | - Muhammad Ali
- Department of Psychiatry, Washington University School of Medicine; St Louis, MO 63110, United States of America
- NeuroGenomics and Informatics Center, Washington University School of Medicine; St Louis, MO 63110, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine; St Louis, MO 63110, United States of America
| | | | | | | | - Barbara B. Bendlin
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Department of Medicine, University of Wisconsin-Madison; 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, Madison, WI 53705, United States of America
- Wisconsin Alzheimer’s Institute, University of Wisconsin-Madison; 610 Walnut Street, 9th Floor, Madison, WI 53726, United States of America
- William S. Middleton Memorial Veterans Hospital; 2500 Overlook Terrace, Madison, WI 53705, United States of America
| | - Henrik Zetterberg
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg; Mölndal, Sweden
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital; Mölndal, Sweden
- Department of Neurodegenerative Disease, UCL Institute of Neurology; London, UK
- UK Dementia Research Institute at UCL; London, UK
- Hong Kong Center for Neurodegenerative Diseases; Hong Kong, China
| | - Kaj Blennow
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg; Mölndal, Sweden
- Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital; Mölndal, Sweden
| | - Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine; St Louis, MO 63110, United States of America
- NeuroGenomics and Informatics Center, Washington University School of Medicine; St Louis, MO 63110, United States of America
- Hope Center for Neurological Disorders, Washington University School of Medicine; St Louis, MO 63110, United States of America
| | - Cynthia M. Carlsson
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Department of Medicine, University of Wisconsin-Madison; 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, Madison, WI 53705, United States of America
- Wisconsin Alzheimer’s Institute, University of Wisconsin-Madison; 610 Walnut Street, 9th Floor, Madison, WI 53726, United States of America
- William S. Middleton Memorial Veterans Hospital; 2500 Overlook Terrace, Madison, WI 53705, United States of America
| | - Sterling C. Johnson
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Department of Medicine, University of Wisconsin-Madison; 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, Madison, WI 53705, United States of America
- Wisconsin Alzheimer’s Institute, University of Wisconsin-Madison; 610 Walnut Street, 9th Floor, Madison, WI 53726, United States of America
- William S. Middleton Memorial Veterans Hospital; 2500 Overlook Terrace, Madison, WI 53705, United States of America
| | - Sanjay Asthana
- Wisconsin Alzheimer’s Disease Research Center, University of Wisconsin-Madison; 600 Highland Avenue, J5/1 Mezzanine, Madison, WI 53792, United States of America
- Department of Medicine, University of Wisconsin-Madison; 1685 Highland Avenue, 5158 Medical Foundation Centennial Building, Madison, WI 53705, United States of America
- William S. Middleton Memorial Veterans Hospital; 2500 Overlook Terrace, Madison, WI 53705, United States of America
| | - Joshua J. Coon
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison; Madison, WI 53706, United States of America
- Department of Biomolecular Chemistry, University of Wisconsin-Madison; Madison, WI 53506, United States of America
- Morgridge Institute for Research; Madison, WI 53706, United States of America
- Department of Chemistry, University of Wisconsin-Madison; Madison, WI 53506, United States of America
| | - Corinne D. Engelman
- Department of Population Health Sciences, University of Wisconsin-Madison; 610 Walnut Street, 707 WARF Building, Madison, WI 53726, United States of America
|
3
|
Zhang HT, Wang WT. Prediction of the Potential Distribution of the Endangered Species Meconopsis punicea Maxim under Future Climate Change Based on Four Species Distribution Models. PLANTS (BASEL, SWITZERLAND) 2023; 12:1376. [PMID: 36987063 PMCID: PMC10056925 DOI: 10.3390/plants12061376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 03/15/2023] [Accepted: 03/15/2023] [Indexed: 06/19/2023]
Abstract
Climate change increases the extinction risk of species, and studying the impact of climate change on endangered species is of great significance to biodiversity conservation. In this study, the endangered plant Meconopsis punicea Maxim (M. punicea) was selected as the research object. Four species distribution models (SDMs), namely the generalized linear model, the generalized boosted regression tree model, random forest, and flexible discriminant analysis, were applied to predict the potential distribution of M. punicea under current and future climate scenarios. Two emission scenarios of the Shared Socioeconomic Pathways (SSPs; i.e., SSP2-4.5 and SSP5-8.5) and two global circulation models (GCMs) were considered for future climate conditions. Our results showed that temperature seasonality, mean temperature of coldest quarter, precipitation seasonality and precipitation of warmest quarter were the most important factors shaping the potential distribution of M. punicea. The predictions of the four SDMs consistently indicated that the current potential distribution area of M. punicea is concentrated between 29.02° N-39.06° N and 91.40° E-105.89° E. Under future climate change, the potential distribution of M. punicea will expand from the southeast to the northwest, and the expansion area under SSP5-8.5 would be wider than that under SSP2-4.5. In addition, there were significant differences in the potential distribution of M. punicea predicted by different SDMs, with slight differences caused by GCMs and emission scenarios. Our study suggests using agreement results from different SDMs as the basis for developing conservation strategies to improve reliability.
|
4
|
Carrington AM, Manuel DG, Fieguth PW, Ramsay T, Osmani V, Wernly B, Bennett C, Hawken S, Magwood O, Sheikh Y, McInnes M, Holzinger A. Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:329-341. [PMID: 35077357 DOI: 10.1109/tpami.2022.3145392] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Indexed: 06/14/2023]
Abstract
Optimal performance is desired for decision-making in any field with binary classifiers and diagnostic tests; however, common performance measures lack depth in information. The area under the receiver operating characteristic curve (AUC) and the area under the precision recall curve are too general because they evaluate all decision thresholds including unrealistic ones. Conversely, accuracy, sensitivity, specificity, positive predictive value and the F1 score are too specific: they are measured at a single threshold that is optimal for some instances, but not others, which is not equitable. In between both approaches, we propose deep ROC analysis to measure performance in multiple groups of predicted risk (like calibration), or groups of true positive rate or false positive rate. In each group, we measure the group AUC (properly), normalized group AUC, and averages of: sensitivity, specificity, positive and negative predictive value, and likelihood ratio positive and negative. The measurements can be compared between groups, to whole measures, to point measures and between models. We also provide a new interpretation of AUC in whole or part, as balanced average accuracy, relevant to individuals instead of pairs. We evaluate models in three case studies using our method and Python toolkit and confirm its utility.
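The point the abstract makes about single-threshold measures can be seen in a small sketch that computes sensitivity, specificity, and the positive and negative likelihood ratios (LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity) at two thresholds. The scores, labels, and function name are hypothetical, and this is not the authors' deep ROC toolkit or their group-wise averaging; it only shows how much the point measures move when the threshold changes.

```python
def point_measures(scores, labels, threshold):
    """Sensitivity, specificity, LR+ and LR- when predicting positive for score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)
    fnr = fn / (tp + fn)
    # LR+ is unbounded when there are no false positives.
    lr_pos = sensitivity / fpr if fpr > 0 else float("inf")
    # LR- is unbounded when there are no true negatives.
    lr_neg = fnr / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical predicted risks and true labels (1 = diseased).
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

at_050 = point_measures(scores, labels, threshold=0.50)
at_075 = point_measures(scores, labels, threshold=0.75)

print(at_050)
print(at_075)
```

With these made-up values, moving the threshold from 0.50 to 0.75 trades sensitivity (0.8 to 0.6) for specificity (0.6 to 1.0), so any single-threshold summary describes only one operating point.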
|
5
|
Wang J, Zhanghuang C, Jin L, Zhang Z, Tan X, Mi T, Liu J, Li M, Wu X, Tian X, He D. Development and validation of a nomogram to predict cancer-specific survival in elderly patients with papillary thyroid carcinoma: a population-based study. BMC Geriatr 2022; 22:736. [PMID: 36076163 PMCID: PMC9454205 DOI: 10.1186/s12877-022-03430-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 08/29/2022] [Indexed: 11/21/2022] Open
Abstract
Objective Thyroid carcinoma (TC) is the most common endocrine tumor in the human body. Papillary thyroid carcinoma (PTC) accounts for more than 80% of thyroid cancers. Accurate prediction of elderly PTC can help reduce the mortality of patients. We aimed to construct a nomogram predicting cancer-specific survival (CSS) in elderly patients with PTC. Methods Patient information was downloaded from the Surveillance, Epidemiology, and End Results (SEER) program. Univariate and multivariate Cox regression models were used to screen the independent risk factors for patients with PTC. The nomogram of elderly patients with PTC was constructed based on the multivariate Cox regression model. We used the concordance index (C-index), the area under the receiver operating characteristic curve (AUC) and the calibration curve to test the accuracy and discrimination of the prediction model. Decision curve analysis (DCA) was used to test the clinical value of the model. Results A total of 14,138 elderly patients with PTC were included in this study. Patients from 2004 to 2015 were randomly divided into a training set (N = 7379) and a validation set (N = 3141), and data from 2016 to 2018 were divided into an external validation set (N = 3618). Proportional sub-distribution hazard model showed that age, sex, tumor size, histological grade, TNM stage, surgery and chemotherapy were independent risk factors for prognosis. In the training set, validation set and external validation set, the C-index was 0.87 (95% CI: 0.852–0.888), 0.891 (95% CI: 0.866–0.916) and 0.931 (95% CI: 0.894–0.968), respectively, indicating that the nomogram had good discrimination. Calibration curves and AUC suggest that the prediction model has good discrimination and accuracy. Conclusions We constructed a new nomogram to predict CSS in elderly patients with PTC. Internal cross-validation and external validation indicate that the model has good discrimination and accuracy. The predictive model can help doctors and patients make clinical decisions.
Affiliation(s)
- Jinkui Wang
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Chenghao Zhanghuang
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
- Department of Urology, Kunming Children's Hospital, Yunnan Provincial Key Research Laboratory of Pediatric Major Diseases, Kunming, 650228, China
| | - Liming Jin
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Zhaoxia Zhang
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Xiaojun Tan
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Tao Mi
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Jiayan Liu
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Mujie Li
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Xin Wu
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Xiaomao Tian
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China
| | - Dawei He
- Department of Urology, Chongqing Key Laboratory of Children Urogenital Development and Tissue Engineering, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, People's Republic of China.
|
6
|
Bhattacharyay S, Milosevic I, Wilson L, Menon DK, Stevens RD, Steyerberg EW, Nelson DW, Ercole A. The leap to ordinal: Detailed functional prognosis after traumatic brain injury with a flexible modelling approach. PLoS One 2022; 17:e0270973. [PMID: 35788768 PMCID: PMC9255749 DOI: 10.1371/journal.pone.0270973] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/20/2022] [Accepted: 06/21/2022] [Indexed: 11/30/2022] Open
Abstract
When a patient is admitted to the intensive care unit (ICU) after a traumatic brain injury (TBI), an early prognosis is essential for baseline risk adjustment and shared decision making. TBI outcomes are commonly categorised by the Glasgow Outcome Scale–Extended (GOSE) into eight ordered levels of functional recovery at 6 months after injury. Existing ICU prognostic models predict binary outcomes at a certain threshold of GOSE (e.g., prediction of survival [GOSE > 1]). We aimed to develop ordinal prediction models that concurrently predict probabilities of each GOSE score. From a prospective cohort (n = 1,550, 65 centres) in the ICU stratum of the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) patient dataset, we extracted all clinical information within 24 hours of ICU admission (1,151 predictors) and 6-month GOSE scores. We analysed the effect of two design elements on ordinal model performance: (1) the baseline predictor set, ranging from a concise set of ten validated predictors to a token-embedded representation of all possible predictors, and (2) the modelling strategy, from ordinal logistic regression to multinomial deep learning. With repeated k-fold cross-validation, we found that expanding the baseline predictor set significantly improved ordinal prediction performance while increasing analytical complexity did not. Half of these gains could be achieved with the addition of eight high-impact predictors to the concise set. At best, ordinal models achieved 0.76 (95% CI: 0.74–0.77) ordinal discrimination ability (ordinal c-index) and 57% (95% CI: 54%–60%) explanation of ordinal variation in 6-month GOSE (Somers’ Dxy). Model performance and the effect of expanding the predictor set decreased at higher GOSE thresholds, indicating the difficulty of predicting better functional outcomes shortly after ICU admission. Our results motivate the search for informative predictors that improve confidence in prognosis of higher GOSE and the development of ordinal dynamic prediction models.
Affiliation(s)
- Shubhayu Bhattacharyay
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, United Kingdom
- Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, United States of America
- * E-mail:
| | - Ioan Milosevic
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
| | - Lindsay Wilson
- Division of Psychology, University of Stirling, Stirling, United Kingdom
| | - David K. Menon
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
| | - Robert D. Stevens
- Laboratory of Computational Intensive Care Medicine, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, United States of America
| | - Ewout W. Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - David W. Nelson
- Department of Physiology and Pharmacology, Section for Perioperative Medicine and Intensive Care, Karolinska Institutet, Stockholm, Sweden
| | - Ari Ercole
- Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
- Cambridge Centre for Artificial Intelligence in Medicine, Cambridge, United Kingdom
|
7
|
Sou KL, Say A, Xu H. Unity Assumption in Audiovisual Emotion Perception. Front Neurosci 2022; 16:782318. [PMID: 35310087 PMCID: PMC8931414 DOI: 10.3389/fnins.2022.782318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 02/09/2022] [Indexed: 11/29/2022] Open
Abstract
We experience various sensory stimuli every day. How does this integration occur? What are the inherent mechanisms in this integration? The “unity assumption” proposes a perceiver’s belief of unity in individual unisensory information to modulate the degree of multisensory integration. However, this has yet to be verified or quantified in the context of semantic emotion integration. In the present study, we investigate the ability of subjects to judge the intensities and degrees of similarity in faces and voices of two emotions (angry and happy). We found more similar stimulus intensities to be associated with stronger likelihoods of the face and voice being integrated. More interestingly, multisensory integration in emotion perception was observed to follow a Gaussian distribution as a function of the emotion intensity difference between the face and voice—the optimal cut-off at about 2.50 points difference on a 7-point Likert scale. This provides a quantitative estimation of the multisensory integration function in audio-visual semantic emotion perception with regards to stimulus intensity. Moreover, to investigate the variation of multisensory integration across the population, we examined the effects of personality and autistic traits of participants. Here, we found no correlation of autistic traits with unisensory processing in a nonclinical population. Our findings shed light on the current understanding of multisensory integration mechanisms.
Affiliation(s)
- Ka Lon Sou
- Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore
- Humanities, Arts and Social Sciences, Singapore University of Technology and Design, Singapore, Singapore
| | - Ashley Say
- Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore
| | - Hong Xu
- Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore
- *Correspondence: Hong Xu,
|
8
|
Abstract
The H-measure is a classifier performance measure which takes into account the context of application without requiring a rigid value of relative misclassification costs to be set. Since its introduction in 2009 it has become widely adopted. This paper answers various queries which users have raised since its introduction, including questions about its interpretation, the choice of a weighting function, whether it is strictly proper, its coherence, and relates the measure to other work.
|
9
|
Sandaruwan PD, Wannige CT. An improved deep learning model for hierarchical classification of protein families. PLoS One 2021; 16:e0258625. [PMID: 34669708 PMCID: PMC8528337 DOI: 10.1371/journal.pone.0258625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 10/01/2021] [Indexed: 12/28/2022]
Abstract
Although genes carry information, proteins play the main role in providing the functionality of a living organism. A vast number of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in their structure or function. Protein characterization to identify protein structure and function is done accurately using laboratory experiments. With the rapidly growing number of novel protein sequences, these experiments have become difficult to carry out since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods have been introduced to classify proteins and predict their functional properties. With the progress in the performance of computational techniques, deep learning plays a key role in many areas. Novel deep learning models such as DeepFam and ProtCNN have recently been presented to classify proteins into their families. However, these deep learning models have been used to carry out the non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam that classifies proteins hierarchically into different levels simultaneously with high accuracy. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved accuracies of 98.62% and 96.14% for the popular Pfam and COG datasets, respectively.
|
10
|
Ramos HM, Ollero J, Suárez-Llorens A. Two sensitivity orders applied to the comparison of ROC curves. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2019.1656744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Héctor M. Ramos
- Departamento de Estadística e Investigación Operativa, Universidad de Cádiz, Cádiz, Spain
- Jorge Ollero
- Departamento de Estadística e Investigación Operativa, Universidad de Cádiz, Cádiz, Spain
- Alfonso Suárez-Llorens
- Departamento de Estadística e Investigación Operativa, Universidad de Cádiz, Cádiz, Spain
|
11
|
Tsalatsanis A, Hozo I, Djulbegovic B. Research synthesis of information theory measures of uncertainty: Meta-analysis of entropy and mutual information of diagnostic tests. J Eval Clin Pract 2021; 27:246-255. [PMID: 32914916 DOI: 10.1111/jep.13475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2020] [Revised: 08/17/2020] [Accepted: 08/19/2020] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Assessing the performance of diagnostic tests requires evaluation of the amount of diagnostic uncertainty a test reduces. Statistical measures such as sensitivity and specificity, which currently dominate evidence-based medicine (EBM) and related fields, cannot explicitly measure this reduction in diagnostic uncertainty. Mutual information (MI), an information theory statistic, explicitly quantifies diagnostic uncertainty by measuring information gain before vs after diagnostic testing. In this paper, we propose the use of MI as a single measure to express diagnostic test performance and demonstrate how it can be used in the meta-analysis of diagnostic test studies. METHODS We use two case studies from the literature to demonstrate the applicability of MI meta-analysis in assessing diagnostic performance: meta-analyses of studies evaluating (a) ultrasonography (US) to detect endometrial cancer and (b) magnetic resonance angiography to detect arterial stenosis. RESULTS The results of MI meta-analyses are comparable to those of traditional statistical measures' meta-analyses. However, the results of MI are easier to understand as it relates directly to the extent of uncertainty a diagnostic test can reduce. For example, the US test, diagnosing endometrial cancer, is 40% specific and 94% sensitive. The combination of these values is difficult to interpret and may lead to inappropriate assessment (eg, one could favour the test due to its high sensitivity, ignoring its low specificity). In terms of MI, however, a single metric shows that the test reduces diagnostic uncertainty by 10%, which many users may consider small under most circumstances. CONCLUSIONS We have demonstrated the suitability of MI in assessing the performance of diagnostic tests, which can facilitate easier interpretation of the true utility of diagnostic tests. Similar to the guidance for interpreting the effect size of treatment interventions, we also propose guidelines for interpreting the utility of diagnostic tests based on the magnitude of reduction in diagnostic uncertainty.
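As a rough illustration of the quantity this abstract describes, the sketch below computes the mutual information between disease status and a binary test result from sensitivity, specificity, and a pretest probability, and expresses it as a fraction of the pretest entropy. The function names and the choice of bits (log base 2) are assumptions made here, and the pretest probability is a hypothetical input: the abstract does not state the one behind its 10% figure, so this sketch is not expected to reproduce that number.

```python
import math


def pretest_entropy(prevalence):
    """Shannon entropy (bits) of disease status before testing."""
    if prevalence <= 0.0 or prevalence >= 1.0:
        return 0.0
    q = 1.0 - prevalence
    return -(prevalence * math.log2(prevalence) + q * math.log2(q))


def diagnostic_mutual_information(sensitivity, specificity, prevalence):
    """Mutual information (bits) between disease status D and test result T."""
    # Joint probabilities of the 2x2 table (disease status x test result).
    joint = {
        ("D+", "T+"): prevalence * sensitivity,
        ("D+", "T-"): prevalence * (1.0 - sensitivity),
        ("D-", "T+"): (1.0 - prevalence) * (1.0 - specificity),
        ("D-", "T-"): (1.0 - prevalence) * specificity,
    }
    p_d = {"D+": prevalence, "D-": 1.0 - prevalence}
    p_t = {
        "T+": joint[("D+", "T+")] + joint[("D-", "T+")],
        "T-": joint[("D+", "T-")] + joint[("D-", "T-")],
    }
    mi = 0.0
    for (d, t), p in joint.items():
        if p > 0.0:  # cells with zero probability contribute nothing
            mi += p * math.log2(p / (p_d[d] * p_t[t]))
    return mi


def relative_uncertainty_reduction(sensitivity, specificity, prevalence):
    """MI as a fraction of the pretest entropy (0 = no gain, 1 = certainty)."""
    h = pretest_entropy(prevalence)
    if h == 0.0:
        return 0.0
    return diagnostic_mutual_information(sensitivity, specificity, prevalence) / h


if __name__ == "__main__":
    # Sensitivity and specificity from the abstract; 0.5 is an assumed prevalence.
    print(relative_uncertainty_reduction(0.94, 0.40, 0.5))
```

Because the result depends on the pretest probability, any single "percent uncertainty reduced" figure is tied to the prevalence used to compute it.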
Affiliation(s)
- Iztok Hozo
- Department of Mathematics, Indiana University Northwest, Gary, Indiana, USA
- Benjamin Djulbegovic
- Department of Supportive Care Medicine, City of Hope, Duarte, California, USA.,Department of Hematology, City of Hope, Duarte, California, USA.,Evidence-based Analytics & Program for Comparative Effectiveness Research and Evidence-based Medicine, City of Hope, Duarte, California, USA
|
12
|
Bantis LE, Tsimikas JV, Chambers G, Capello M, Hanash S, Feng Z. The length of the receiver operating characteristic curve and the two cutoff Youden index within a robust framework for discovery, evaluation, and cutoff estimation in biomarker studies involving improper receiver operating characteristic curves. Stat Med 2021; 40:1767-1789. [PMID: 33530129 PMCID: PMC9976806 DOI: 10.1002/sim.8869] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/17/2020] [Revised: 12/09/2020] [Accepted: 12/14/2020] [Indexed: 02/06/2023]
Abstract
During the early stage of biomarker discovery, high throughput technologies allow for simultaneous input of thousands of biomarkers that attempt to discriminate between healthy and diseased subjects. In such cases, proper ranking of biomarkers is highly important. Common measures, such as the area under the receiver operating characteristic (ROC) curve (AUC), as well as affordable sensitivity and specificity levels, are often taken into consideration. Strictly speaking, such measures are appropriate under a stochastic ordering assumption, which implies, without loss of generality, that higher measurements are more indicative for the disease. Such an assumption is not always plausible and may lead to rejection of extremely useful biomarkers at this early discovery stage. We explore the length of a smooth ROC curve as a measure for biomarker ranking, which is not subject to directionality. We show that the length corresponds to a ϕ divergence, is identical to the corresponding length of the optimal (likelihood ratio) ROC curve, and is an appropriate measure for ranking biomarkers. We explore the relationship between the length measure and the AUC of the optimal ROC curve. We then provide a complete framework for the evaluation of a biomarker in terms of sensitivity and specificity through a proposed ROC analogue for use in improper settings. In the absence of any clinical insight regarding the appropriate cutoffs, we estimate the sensitivity and specificity under a two-cutoff extension of the Youden index and we further take into account the implied costs. We apply our approaches on two biomarker studies that relate to pancreatic and esophageal cancer.
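The two-cutoff idea in this abstract can be sketched with a plain empirical grid search. This is only an illustration, not the estimator of the paper (which works within a robust framework with smooth ROC curves): the function name is hypothetical, and the classification rule, "positive if the value falls below the lower cutoff or above the upper cutoff", is the assumption made here.

```python
def two_cutoff_youden(healthy, diseased):
    """Empirical two-cutoff Youden index by exhaustive search.

    A subject is called positive when x < c1 or x > c2 (c1 <= c2).
    Returns (J, c1, c2) with J = sensitivity + specificity - 1.
    """
    # Candidate cutoffs are the observed values from both groups.
    candidates = sorted(set(healthy) | set(diseased))
    best = (-1.0, None, None)
    for i, c1 in enumerate(candidates):
        for c2 in candidates[i:]:
            # Diseased subjects outside [c1, c2] are true positives.
            sens = sum(x < c1 or x > c2 for x in diseased) / len(diseased)
            # Healthy subjects inside [c1, c2] are true negatives.
            spec = sum(c1 <= x <= c2 for x in healthy) / len(healthy)
            j = sens + spec - 1.0
            if j > best[0]:
                best = (j, c1, c2)
    return best


if __name__ == "__main__":
    # Toy data: diseased values sit at both extremes, healthy in the middle.
    print(two_cutoff_youden(healthy=[4, 5, 6], diseased=[1, 2, 8, 9]))
```

Setting c1 to the smallest observed value recovers the ordinary one-cutoff rule "higher is positive", and setting c2 to the largest recovers "lower is positive", so the search covers markers with either direction as special cases.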
Affiliation(s)
- Leonidas E. Bantis
- Dept. of Biostatistics and Data Science, University of Kansas Medical Center, Kansas City, U.S.A
- John V. Tsimikas
- Dept of Statistics and Actuarial-Financial Mathematics, University of the Aegean, Samos, Greece
- Michela Capello
- Dept. of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, U.S.A
- Samir Hanash
- Dept. of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, U.S.A
- Ziding Feng
- Dept. of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, U.S.A
|
13
|
Martínez-Camblor P, Pérez-Fernández S, Díaz-Coto S. The area under the generalized receiver-operating characteristic curve. Int J Biostat 2021; 18:293-306. [PMID: 33761578 DOI: 10.1515/ijb-2020-0091] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/13/2020] [Accepted: 03/01/2021] [Indexed: 12/22/2022]
Abstract
The receiver operating-characteristic (ROC) curve is a well-known graphical tool routinely used for evaluating the discriminatory ability of continuous markers with respect to a binary characteristic. The area under the curve (AUC) has been proposed as a summary accuracy index. Higher values of the marker are usually associated with higher probabilities of having the characteristic under study. However, there are other situations where both higher and lower marker scores are associated with a positive result. The generalized ROC (gROC) curve has been proposed as a proper extension of the ROC curve to fit these situations. The corresponding area under the gROC curve, gAUC, has also been introduced as a global measure of the classification capacity. In this paper, we study the properties of the gAUC in depth. The weak convergence of its empirical estimator is provided while deriving an explicit and useful expression for the asymptotic variance. We also obtain the expression for the asymptotic covariance of related gAUCs and propose a non-parametric procedure to compare them. The finite-sample behavior is studied through Monte Carlo simulations under different scenarios, and a real-world problem is presented to illustrate its practical application. The R code functions implementing the procedures are provided as Supplementary Material.
Affiliation(s)
- Pablo Martínez-Camblor
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 7 Lebanon Street, Suite 309, Hinman Box 7261, Hanover, NH 03755, USA
- Susana Díaz-Coto
- Department of Statistics, Oviedo University, Oviedo, Asturies, Spain
|
14
|
Validation and verification of predictive salivary biomarkers for oral health. Sci Rep 2021; 11:6406. [PMID: 33742017 PMCID: PMC7979790 DOI: 10.1038/s41598-021-85120-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/12/2020] [Accepted: 02/24/2021] [Indexed: 02/06/2023]
Abstract
Oral health is important not only due to the diseases emerging in the oral cavity but also due to the direct relation to systemic health. Thus, early and accurate characterization of the oral health status is of utmost importance. There are several salivary biomarkers as candidates for gingivitis and periodontitis, which are major oral health threats, affecting the gums. These need to be verified and validated for their potential use as differentiators of health, gingivitis and periodontitis status, before they are translated to chair-side for diagnostics and personalized monitoring. We aimed to measure 10 candidates using high sensitivity ELISAs in a well-controlled cohort of 127 individuals from three groups: periodontitis (60), gingivitis (31) and healthy (36). The statistical approaches included univariate statistical tests, receiver operating characteristic curves (ROC) with the corresponding Area Under the Curve (AUC) and Classification and Regression Tree (CART) analysis. The main outcomes were that the combination of multiple biomarker assays, rather than the use of single ones, can offer a predictive accuracy of > 90% for gingivitis versus health groups; and 100% for periodontitis versus health and periodontitis versus gingivitis groups. Furthermore, ratios of biomarkers MMP-8, MMP-9 and TIMP-1 were also proven to be powerful differentiating values compared to the single biomarkers.
|
15
|
Barela IA, Burger LM, Wang G, Evans KO, Meng Q, Taylor JD. Spatial transferability of expert opinion models for American beaver habitat. ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2021.101211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
16
|
Ferreira ADS, Meziat-Filho N, Ferreira APA. Double threshold receiver operating characteristic plot for three-modal continuous predictors. Comput Stat 2021. [DOI: 10.1007/s00180-021-01080-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
17
|
Li Y, Babcock SE, Stewart SL, Hirdes JP, Schwean VL. Psychometric Evaluation of the Depressive Severity Index (DSI) Among Children and Youth Using the interRAI Child and Youth Mental Health (ChYMH) Assessment Tool. CHILD & YOUTH CARE FORUM 2021. [DOI: 10.1007/s10566-020-09592-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
|
18
|
Van Hulle C, Jonaitis EM, Betthauser TJ, Batrla R, Wild N, Kollmorgen G, Andreasson U, Okonkwo O, Bendlin BB, Asthana S, Carlsson CM, Johnson SC, Zetterberg H, Blennow K. An examination of a novel multipanel of CSF biomarkers in the Alzheimer's disease clinical and pathological continuum. Alzheimers Dement 2020; 17:431-445. [PMID: 33336877 PMCID: PMC8016695 DOI: 10.1002/alz.12204] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 05/08/2020] [Revised: 07/30/2020] [Accepted: 09/02/2020] [Indexed: 01/08/2023]
Abstract
INTRODUCTION This study examines the utility of a multipanel of cerebrospinal fluid (CSF) biomarkers complementing Alzheimer's disease (AD) biomarkers in a clinical research sample. We compared biomarkers across groups defined by clinical diagnosis and pTau181/Aβ42 status (+/-) and explored their value in predicting cognition. METHODS CSF biomarkers amyloid beta (Aβ)42, pTau181, tTau, Aβ40, neurogranin, neurofilament light (NfL), α-synuclein, glial fibrillary acidic protein (GFAP), chitinase-3-like protein 1 (YKL-40), soluble triggering receptor expressed on myeloid cells 2 (sTREM2), S100 calcium binding protein B (S100B), and interleukin 6 (IL6) were measured with the NeuroToolKit (NTK) for 720 adults ages 40 to 93 years (mean age = 63.9 years, standard deviation [SD] = 9.0; 50 with dementia; 54 with mild cognitive impairment [MCI], 616 unimpaired). RESULTS Neurodegeneration and glial activation biomarkers were elevated in pTau181/Aβ42+ MCI/dementia participants relative to all pTau181/Aβ42- participants. Neurodegeneration biomarkers increased with clinical severity among pTau181/Aβ42+ participants and predicted worse cognitive performance. Glial activation biomarkers were unrelated to cognitive performance. DISCUSSION The NTK contains promising markers that improve the pathophysiological characterization of AD. Neurodegeneration biomarkers beyond tTau improved statistical prediction of cognition and disease stages.
Affiliation(s)
- Carol Van Hulle
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Erin M Jonaitis
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Wisconsin Alzheimer's Institute, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Tobey J Betthauser
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Richard Batrla
- Roche Diagnostics International AG, Rotkreuz, Switzerland
- Ulf Andreasson
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden
- Ozioma Okonkwo
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Wisconsin Alzheimer's Institute, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Barbara B Bendlin
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Wisconsin Alzheimer's Institute, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Sanjay Asthana
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Geriatric Research, Education and Clinical Center at the William S. Middleton Memorial Veterans Hospital, Madison, Wisconsin, USA
- Cynthia M Carlsson
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Wisconsin Alzheimer's Institute, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Geriatric Research, Education and Clinical Center at the William S. Middleton Memorial Veterans Hospital, Madison, Wisconsin, USA
- Sterling C Johnson
- Wisconsin Alzheimer's Disease Research Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Wisconsin Alzheimer's Institute, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin, USA.,Geriatric Research, Education and Clinical Center at the William S. Middleton Memorial Veterans Hospital, Madison, Wisconsin, USA
- Henrik Zetterberg
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden.,Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden.,Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK.,UK Dementia Research Institute at UCL, London, UK
- Kaj Blennow
- Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden.,Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden
|
19
|
A study of job involvement prediction using machine learning technique. INTERNATIONAL JOURNAL OF ORGANIZATIONAL ANALYSIS 2020. [DOI: 10.1108/ijoa-05-2020-2222] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Purpose
Job involvement can be linked with important work outcomes. One way for organizations to increase job involvement is to use machine learning technology to predict employees’ job involvement, so that leaders of human resource (HR) management can take proactive measures or plan succession for preservation. This paper aims to develop a reliable job involvement prediction model using machine learning techniques.
Design/methodology/approach
This study used a data set available from International Business Machines (IBM) Watson Analytics in the IBM community and applied a generalized linear model (GLM), including linear regression and binomial classification. The study had two primary aims. First, it sought to better understand the role of variables in job involvement prediction modeling. Second, it sought to evaluate the predictive performance of the GLM, including linear regression and binomial classification.
Findings
First, employees’ job involvement can be predicted from many individual factors. Second, each model showed outstanding predictive performance.
Practical implications
The pre-access and modeling methodology used in this paper can be viewed as a roadmap for the reader to follow the steps taken in this study and to apply procedures to identify the causes of many other HR management problems.
Originality/value
This paper is the first one to attempt to come up with the best-performing model for predicting job involvement based on a limited set of features including employees’ demographics using machine learning technique.
|
20
|
Chatzimichail T, Hatjimihail AT. A Software Tool for Exploring the Relation between Diagnostic Accuracy and Measurement Uncertainty. Diagnostics (Basel) 2020; 10:E610. [PMID: 32825135 PMCID: PMC7555914 DOI: 10.3390/diagnostics10090610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Revised: 07/26/2020] [Accepted: 08/14/2020] [Indexed: 12/16/2022]
Abstract
Screening and diagnostic tests are used to classify people with and without a disease. Diagnostic accuracy measures are used to evaluate the correctness of a classification in clinical research and practice. Although this depends on the uncertainty of measurement, there has been limited research on their relation. The objective of this work was to develop an exploratory tool for the relation between diagnostic accuracy measures and measurement uncertainty, as diagnostic accuracy is fundamental to clinical decision-making, while measurement uncertainty is critical to quality and risk management in laboratory medicine. For this reason, a freely available interactive program was developed for calculating, optimizing, plotting and comparing various diagnostic accuracy measures and the corresponding risk of diagnostic or screening tests measuring a normally distributed measurand, applied at a single point in time in non-diseased and diseased populations. This is done for differing prevalence of the disease, mean and standard deviation of the measurand, diagnostic threshold, standard measurement uncertainty of the tests and expected loss. The application of the program is illustrated with a case study of glucose measurements in diabetic and non-diabetic populations. The program is user-friendly and can be used as an educational and research tool in medical decision-making.
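The relation this abstract explores can be illustrated with a small calculation. The sketch below is not the published program; it assumes that the measurand is normally distributed in each population, that measurement error is independent and normal with standard uncertainty u (so variances add), that diseased subjects tend to have higher values, and that a result above the threshold is called positive. All names and numbers are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def accuracy_with_uncertainty(threshold, mean_nd, sd_nd, mean_d, sd_d, u):
    """Sensitivity and specificity of 'positive if observed value > threshold'.

    mean_nd, sd_nd: measurand in the non-diseased population
    mean_d, sd_d:   measurand in the diseased population
    u:              standard measurement uncertainty (assumed independent, normal)
    """
    # Observed spread combines biological spread and measurement uncertainty.
    sd_nd_obs = math.hypot(sd_nd, u)
    sd_d_obs = math.hypot(sd_d, u)
    sensitivity = 1.0 - normal_cdf(threshold, mean_d, sd_d_obs)
    specificity = normal_cdf(threshold, mean_nd, sd_nd_obs)
    return sensitivity, specificity


if __name__ == "__main__":
    # Illustrative values only: the same threshold with increasing uncertainty.
    for u in (0.0, 0.5, 1.0):
        print(u, accuracy_with_uncertainty(6.0, 5.0, 1.0, 7.0, 1.0, u))
```

Under these assumptions, a threshold lying between the two population means gives lower sensitivity and lower specificity as u grows, which is the kind of dependence the tool is meant to let users explore.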
|
21
|
Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F, Manuel DG. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC Med Inform Decis Mak 2020; 20:4. [PMID: 31906931 PMCID: PMC6945414 DOI: 10.1186/s12911-019-1014-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 06/24/2019] [Accepted: 12/20/2019] [Indexed: 12/24/2022]
Abstract
BACKGROUND In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. However, only part of the ROC curve and AUC is informative when they are used with imbalanced data. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives. METHODS We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided. RESULTS Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures. CONCLUSIONS The concordant partial area under the ROC curve was proposed and unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set but this paper focuses on imbalanced data with low prevalence. FUTURE WORK Future work with our proposed measures may: demonstrate their value for imbalanced data with high prevalence; compare them to other measures not based on areas; and combine them with other ROC measures and techniques.
Affiliation(s)
- Paul W Fieguth
- Faculty of Engineering, University of Waterloo, Waterloo, N2L 3G1, Canada
- Hammad Qazi
- School of Public Health and Health Systems, University of Waterloo, Waterloo, N2L 3G1, Canada
- Andreas Holzinger
- Holzinger Group (HCAI), Institute for Medical Informatics/Statistics, Medical University Graz, 8036, Graz, Austria.,Institute of Interactive Systems and Data Science, Graz University of Technology, 8010, Graz, Austria
- Helen H Chen
- School of Public Health and Health Systems, University of Waterloo, Waterloo, N2L 3G1, Canada
- Franz Mayr
- Universidad ORT Uruguay, 11100, Montevideo, Uruguay
- Douglas G Manuel
- Ottawa Hospital Research Institute, Ottawa, K1H 8L6, Canada.,Department of Family Medicine, University of Ottawa, Ottawa, Canada.,School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada.,Institute for Clinical Evaluative Sciences, Ottawa, Canada.,Statistics Canada, Ottawa, Canada.,C.T. Lamont Primary Health Care Research Centre and Bruỳere Research Institute, Ottawa, Canada.,Division of Clinical Public Health, Dalla Lana School of Public Health, Toronto, Canada
|
22
|
Wynants L, van Smeden M, McLernon DJ, Timmerman D, Steyerberg EW, Van Calster B. Three myths about risk thresholds for prediction models. BMC Med 2019; 17:192. [PMID: 31651317 PMCID: PMC6814132 DOI: 10.1186/s12916-019-1425-3] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 07/26/2019] [Accepted: 09/16/2019] [Indexed: 01/13/2023]
Abstract
BACKGROUND Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which requires a particularly cautious interpretation of results. MAIN TEXT We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend to present results for multiple risk thresholds when developing or validating a prediction model. CONCLUSION Bearing in mind these three considerations can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.
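The link between a risk threshold and the misclassification costs it tacitly assumes can be written out with the standard expected-cost argument. The symbols below (p for a calibrated predicted risk, C_FP and C_FN for the costs of a false positive and a false negative relative to the correct decisions, t for the threshold) are introduced here for illustration and are not notation from the abstract.

```latex
% Intervening costs (1-p) C_FP in expectation; not intervening costs p C_FN.
\text{intervene} \iff (1-p)\,C_{FP} < p\,C_{FN}
\iff p > t = \frac{C_{FP}}{C_{FP} + C_{FN}},
\qquad\text{so}\qquad
\frac{t}{1-t} = \frac{C_{FP}}{C_{FN}}.
```

Under this reading, equal costs correspond to t = 0.5, while a threshold of t = 0.1 treats a false negative as nine times as costly as a false positive.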
Affiliation(s)
- Laure Wynants
- KU Leuven Department of Development and Regeneration, Leuven, Belgium.,Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands
- Maarten van Smeden
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.,Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David J McLernon
- Medical Statistics Team, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, UK
- Dirk Timmerman
- KU Leuven Department of Development and Regeneration, Leuven, Belgium.,Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Ben Van Calster
- KU Leuven Department of Development and Regeneration, Leuven, Belgium.,Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | | |
Collapse
|
23
|
Pfeiffer RM, Gail MH. Estimating the decision curve and its precision from three study designs. Biom J 2019; 62:764-776. [PMID: 31394013 DOI: 10.1002/bimj.201800240] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/14/2018] [Revised: 06/26/2019] [Accepted: 07/09/2019] [Indexed: 01/16/2023]
Abstract
The decision curve plots the net benefit (NB) of a risk model for making decisions over a range of risk thresholds, corresponding to different ratios of misclassification costs. We discuss three methods to estimate the decision curve, together with corresponding methods of inference and methods to compare two risk models at a given risk threshold. One method uses risks (R) and a binary event indicator (Y) on the entire validation cohort. This method makes no assumptions on how well-calibrated the risk model is nor on the incidence of disease in the population and is comparatively robust to model miscalibration. If one assumes that the model is well-calibrated, one can compute a much more precise estimate of NB based on risks R alone. However, if the risk model is miscalibrated, serious bias can result. Case-control data can also be used to estimate NB if the incidence (or prevalence) of the event (Y = 1) is known. This strategy has comparable efficiency to using the full (R, Y) data, and its efficiency is only modestly less than that for the full (R, Y) data if the incidence is estimated from the mean of Y. We estimate variances using influence functions and propose a bootstrap procedure to obtain simultaneous confidence bands around the decision curve for a range of thresholds. The influence function approach to estimate variances can also be applied to cohorts derived from complex survey samples instead of simple random samples.
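As a rough illustration of the first estimation method (risks R and event indicator Y on a full validation cohort), the following sketch computes the empirical net benefit using the commonly used definition NB(t) = TP/n - (FP/n) * t/(1 - t). This is a minimal sketch and not the authors' estimator or their influence-function inference; the function names and the convention of intervening when risk >= t are assumptions.

```python
import numpy as np


def net_benefit(risk, y, threshold):
    """Empirical net benefit at one risk threshold from cohort data (R, Y)."""
    risk = np.asarray(risk, dtype=float)
    y = np.asarray(y, dtype=int)
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = y.size
    treat = risk >= threshold              # assumed convention: intervene at or above t
    tp = np.sum(treat & (y == 1))          # true positives
    fp = np.sum(treat & (y == 0))          # false positives
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def decision_curve(risk, y, thresholds):
    """Net benefit evaluated over a grid of thresholds."""
    return np.array([net_benefit(risk, y, t) for t in thresholds])


if __name__ == "__main__":
    # Toy data for illustration only.
    risk = [0.10, 0.40, 0.35, 0.80]
    y = [0, 0, 1, 1]
    print(decision_curve(risk, y, [0.1, 0.3, 0.5]))
```

Plotting `decision_curve` against the threshold grid gives the decision curve; confidence bands would need the variance estimates or bootstrap procedure the abstract describes, which this sketch does not attempt.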
Affiliation(s)
- Ruth M Pfeiffer
- Biostatistics Branch, National Cancer Institute, Bethesda, MD, USA
- Mitchell H Gail
- Biostatistics Branch, National Cancer Institute, Bethesda, MD, USA
24
Gantchoff M, Conlee L, Belant J. Conservation implications of sex-specific landscape suitability for a large generalist carnivore. Divers Distrib 2019. [DOI: 10.1111/ddi.12954] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/27/2022] Open
Affiliation(s)
- Mariela Gantchoff
- Camp Fire Program in Wildlife Conservation, College of Environmental Science and Forestry, State University of New York, Syracuse, New York, USA
- Laura Conlee
- Missouri Department of Conservation, Columbia, Missouri, USA
- Jerrold Belant
- Camp Fire Program in Wildlife Conservation, College of Environmental Science and Forestry, State University of New York, Syracuse, New York, USA
25
Wang X, Zhang Y, Hao S, Zheng L, Liao J, Ye C, Xia M, Wang O, Liu M, Weng CH, Duong SQ, Jin B, Alfreds ST, Stearns F, Kanov L, Sylvester KG, Widen E, McElhinney DB, Ling XB. Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine. J Med Internet Res 2019; 21:e13260. [PMID: 31099339 PMCID: PMC6542253 DOI: 10.2196/13260] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/31/2018] [Revised: 04/18/2019] [Accepted: 04/23/2019] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate. OBJECTIVE The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population. METHODS Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 30, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018. RESULTS The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer. CONCLUSIONS We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance.
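The two-threshold stratification and the reported incidence ratio can be reproduced with a small sketch. The thresholds (0.0045 and 0.01) and the counts come from the abstract; the function names, and the choice to assign a score exactly equal to a threshold to the higher category, are assumptions for illustration.

```python
import numpy as np


def stratify(scores, low_cut=0.0045, high_cut=0.01):
    """Assign each predicted score to a low, medium, or high risk category."""
    scores = np.asarray(scores, dtype=float)
    # np.digitize returns 0 below low_cut, 1 in [low_cut, high_cut), 2 at or above high_cut.
    idx = np.digitize(scores, [low_cut, high_cut])
    return np.array(["low", "medium", "high"])[idx]


def relative_incidence(cases_group, n_group, cases_all, n_all):
    """Incidence in a subgroup divided by incidence in the overall cohort."""
    return (cases_group / n_group) / (cases_all / n_all)


if __name__ == "__main__":
    print(stratify([0.001, 0.005, 0.02]))
    # Counts reported in the abstract: high-risk category vs overall prospective cohort.
    print(round(relative_incidence(579, 53_922, 1167, 836_659), 1))
```

With the reported counts, the ratio rounds to 7.7, in line with the figure stated in the abstract.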
Affiliation(s)
- Xiaofang Wang
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Department of Surgery, Stanford University, Stanford, CA, United States
- Yan Zhang
- Department of Oncology, The First Hospital of Shijiazhuang, Shijiazhuang, China
- Shiying Hao
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States
- Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Le Zheng
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States
- Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Jiayu Liao
- Department of Bioengineering, University of California, Riverside, CA, United States
- West China-California Multiomics Research Center, West China Hospital, Sichuan University, Chengdu, China
- Chengyin Ye
- Department of Health Management, Hangzhou Normal University, Hangzhou, China
- Minjie Xia
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Oliver Wang
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Modi Liu
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Ching Ho Weng
- Department of Surgery, Stanford University, Stanford, CA, United States
- Son Q Duong
- Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Bo Jin
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Frank Stearns
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Laura Kanov
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Karl G Sylvester
- Department of Surgery, Stanford University, Stanford, CA, United States
- Eric Widen
- Healthcare Business Intelligence Solutions Inc, Palo Alto, CA, United States
- Doff B McElhinney
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, United States
- Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
- Xuefeng B Ling
- Department of Surgery, Stanford University, Stanford, CA, United States
- Clinical and Translational Research Program, Betty Irene Moore Children's Heart Center, Lucile Packard Children's Hospital, Palo Alto, CA, United States
26
Farrell A, Wang G, Rush SA, Martin JA, Belant JL, Butler AB, Godwin D. Machine learning of large-scale spatial distributions of wild turkeys with high-dimensional environmental data. Ecol Evol 2019; 9:5938-5949. [PMID: 31161010 PMCID: PMC6540709 DOI: 10.1002/ece3.5177] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/11/2019] [Revised: 03/27/2019] [Accepted: 03/28/2019] [Indexed: 11/05/2022] Open
Abstract
Species distribution modeling often involves high-dimensional environmental data. Large amounts of data and multicollinearity among covariates impose challenges to statistical models in variable selection for reliable inferences of the effects of environmental factors on the spatial distribution of species. Few studies have evaluated and compared the performance of multiple machine learning (ML) models in handling multicollinearity. Here, we assessed the effectiveness of removal of correlated covariates and regularization to cope with multicollinearity in ML models for habitat suitability. Three machine learning algorithms, maximum entropy (MaxEnt), random forests (RFs), and support vector machines (SVMs), were applied to the original data (OD) of 27 landscape variables, reduced data (RD) with 14 highly correlated covariates removed, and 15 principal components (PC) of the OD accounting for 90% of the original variability. The performance of the three ML models was measured with the area under the curve and continuous Boyce index. We collected 663 nonduplicated presence locations of Eastern wild turkeys (Meleagris gallopavo silvestris) across the state of Mississippi, United States. Of the total locations, 453 locations separated by a distance of ≥2 km were used to train the three ML algorithms on the OD, RD, and PC data, respectively. The remaining 210 locations were used to validate the trained ML models to measure ML performance. All three ML models had excellent performance on the RD and PC data. MaxEnt and SVMs had good performance on the OD data, indicating the adequacy of regularization of the default setting for multicollinearity. Weak learning of RFs through bagging appeared to alleviate multicollinearity and resulted in excellent performance on the OD data. Regularization of ML algorithms may help exploratory studies of the effects of environmental factors on the spatial distribution and habitat suitability of wildlife.
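The step of retaining enough principal components to account for 90% of the original variability can be sketched as follows. This is a generic illustration, not the authors' workflow: standardizing the covariates before the decomposition is an assumption, and the function name is hypothetical.

```python
import numpy as np


def n_components_for_variance(X, target=0.90):
    """Smallest number of principal components whose cumulative explained variance reaches target.

    Assumption: columns are standardized first (PCA on the correlation matrix),
    and no column is constant.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    # Squared singular values are proportional to the variance along each component.
    s = np.linalg.svd(Xs, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, target) + 1)
    return k, cumulative


if __name__ == "__main__":
    # Toy covariates: the second column is an exact multiple of the first,
    # so two components carry essentially all of the variance.
    X = np.array([
        [1.0, 2.0, 1.0],
        [2.0, 4.0, -1.0],
        [3.0, 6.0, -1.0],
        [4.0, 8.0, 1.0],
    ])
    k, cumulative = n_components_for_variance(X)
    print(k, cumulative)
```

The collinear pair collapses onto a single component here, which is the sense in which principal components sidestep multicollinearity among the original covariates.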
Affiliation(s)
- Annie Farrell
- Department of Wildlife, Fisheries and Aquaculture, Mississippi State University, Mississippi State, Mississippi
- Guiming Wang
- Department of Wildlife, Fisheries and Aquaculture, Mississippi State University, Mississippi State, Mississippi
- Scott A. Rush
- Department of Wildlife, Fisheries and Aquaculture, Mississippi State University, Mississippi State, Mississippi
- James A. Martin
- Warnell School of Forestry and Natural Resources and Savannah River Ecology Laboratory, University of Georgia, Athens, Georgia
- Jerrold L. Belant
- Camp Fire Program in Wildlife Conservation, State University of New York College of Environmental Science and Forestry, Syracuse, New York
- Adam B. Butler
- The Mississippi Department of Wildlife, Fisheries, and Parks, Jackson, Mississippi
- Dave Godwin
- Mississippi Forestry Association, Jackson, Mississippi
27
Katki HA. Quantifying risk stratification provided by diagnostic tests and risk predictions: Comparison to AUC and decision curve analysis. Stat Med 2019; 38:2943-2955. [PMID: 31037749 DOI: 10.1002/sim.8163] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/04/2018] [Revised: 03/14/2019] [Accepted: 03/22/2019] [Indexed: 01/12/2023]
Abstract
A property of diagnostic tests and risk models deserving more attention is risk stratification, defined as the ability of a test or model to separate those at high absolute risk of disease from those at low absolute risk. Risk stratification fills a gap between measures of classification (i.e., area under the curve (AUC)) that do not require absolute risks and decision analysis that requires not only absolute risks but also subjective specification of costs and utilities. We introduce mean risk stratification (MRS) as the average change in risk of disease (posttest minus pretest) revealed by a diagnostic test or risk model dichotomized at a risk threshold. Mean risk stratification is particularly valuable for rare conditions, where AUC can be high but MRS can be low, identifying situations that temper overenthusiasm for screening with the new test/model. We apply MRS to the controversy over who should get testing for mutations in BRCA1/2 that cause high risks of breast and ovarian cancers. To reveal different properties of risk thresholds to refer women for BRCA1/2 testing, we propose an eclectic approach considering MRS and other metrics. The value of MRS is to interpret AUC in the context of BRCA1/2 mutation prevalence, providing a range of risk thresholds at which a risk model is "optimally informative," and to provide insight into why net benefit arrives at its conclusion.
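One way to read the definition of MRS given in the abstract is as the average absolute difference between posttest and pretest risk, weighted by how often each test result occurs. The sketch below computes that quantity from a 2x2 table. Whether it matches the article's exact estimator is an assumption, and the function name and toy counts are illustrative only.

```python
def mean_risk_stratification(tp: int, fp: int, fn: int, tn: int) -> float:
    """Average absolute change from pretest to posttest risk for a dichotomized test.

    Assumes both test-positive and test-negative groups are non-empty.
    """
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n        # pretest risk
    p_pos = (tp + fp) / n             # probability of a positive result
    p_neg = (fn + tn) / n             # probability of a negative result
    risk_if_pos = tp / (tp + fp)      # posttest risk after a positive result (PPV)
    risk_if_neg = fn / (fn + tn)      # posttest risk after a negative result (1 - NPV)
    return p_pos * abs(risk_if_pos - prevalence) + p_neg * abs(risk_if_neg - prevalence)


if __name__ == "__main__":
    # Toy rare-disease example: 2% prevalence, 50% sensitivity, about 91% specificity.
    print(mean_risk_stratification(tp=10, fp=90, fn=10, tn=890))
```

In this toy table the test changes an individual's risk by only about 1.6 percentage points on average, which illustrates how a rare condition can yield a small MRS even when a test discriminates reasonably well.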
Affiliation(s)
- Hormuzd A Katki
- US National Cancer Institute, Division of Cancer Epidemiology and Genetics, Rockville, Maryland
28
Li G, Wang X. Prediction Accuracy Measures for a Nonlinear Model and for Right-Censored Time-to-Event Data. J Am Stat Assoc 2019; 114:1815-1825. [PMID: 32863480 PMCID: PMC7454169 DOI: 10.1080/01621459.2018.1515079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/01/2016] [Revised: 08/08/2018] [Accepted: 08/19/2018] [Indexed: 01/10/2023]
Abstract
This article develops a pair of new prediction summary measures for a nonlinear prediction function with right-censored time-to-event data. The first measure, defined as the proportion of explained variance by a linearly corrected prediction function, quantifies the potential predictive power of the nonlinear prediction function. The second measure, defined as the proportion of explained prediction error by its corrected prediction function, gauges the closeness of the prediction function to its corrected version and serves as a supplementary measure to indicate (by a value less than 1) whether the correction is needed to fulfill its potential predictive power and quantify how much prediction error reduction can be realized with the correction. The two measures together provide a complete summary of the predictive accuracy of the nonlinear prediction function. We motivate these measures by first establishing a variance decomposition and a prediction error decomposition at the population level and then deriving uncensored and censored sample versions of these decompositions. We note that for the least squares prediction function under the linear model with no censoring, the first measure reduces to the classical coefficient of determination and the second measure degenerates to 1. We show that the sample measures are consistent estimators of their population counterparts and conduct extensive simulations to investigate their finite sample properties. A real data illustration is provided using the PBC data. An R package PAmeasures has been developed and made available via the CRAN R library. Supplementary materials for this article are available online.
Affiliation(s)
- Gang Li
- Departments of Biostatistics and Biomathematics, University of California, Los Angeles, CA
- Xiaoyan Wang
- Division of General Internal Medicine and Health Services Research, University of California, Los Angeles, CA
29
Pencina MJ, Parikh CR, Kimmel PL, Cook NR, Coresh J, Feldman HI, Foulkes A, Gimotty PA, Hsu CY, Lemley K, Song P, Wilkins K, Gossett DR, Xie Y, Star RA. Statistical methods for building better biomarkers of chronic kidney disease. Stat Med 2019; 38:1903-1917. [PMID: 30663113 DOI: 10.1002/sim.8091] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/12/2017] [Revised: 10/17/2018] [Accepted: 12/12/2018] [Indexed: 12/23/2022]
Abstract
The last two decades have witnessed an explosion in research focused on the development and assessment of novel biomarkers for improved prognosis of diseases. As a result, best practice standards guiding biomarker research have undergone extensive development. Currently, there is great interest in the promise of biomarkers to enhance research efforts and clinical practice in the setting of chronic kidney disease, acute kidney injury, and glomerular disease. However, some have questioned whether biomarkers currently add value to the clinical practice of nephrology. The current state of the art pertaining to statistical analyses regarding the use of such measures is critical. In December 2014, the National Institute of Diabetes and Digestive and Kidney Diseases convened a meeting, "Toward Building Better Biomarker Statistical Methodology," with the goals of summarizing the current best practice recommendations and articulating new directions for methodological research. This report summarizes its conclusions and describes areas that need attention. Suggestions are made regarding metrics that should be commonly reported. We outline the methodological issues related to traditional metrics and considerations in prognostic modeling, including discrimination and case mix, calibration, validation, and cost-benefit analysis. We highlight the approach to improved risk communication and the value of graphical displays. Finally, we address some "new frontiers" in prognostic biomarker research, including the competing risk framework, the use of longitudinal biomarkers, and analyses in distributed research networks.
Affiliation(s)
- Michael J Pencina
- Duke Clinical Research Institute, Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina
- Chirag R Parikh
- Division of Nephrology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Paul L Kimmel
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Nancy R Cook
- Division of Preventive Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Josef Coresh
- Departments of Epidemiology, Medicine and Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Harold I Feldman
- Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Andrea Foulkes
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts
- Phyllis A Gimotty
- Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Chi-Yuan Hsu
- Division of Nephrology, University of California, San Francisco, San Francisco, California
- Kevin Lemley
- Division of Nephrology, Children's Hospital Los Angeles, Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, California
- Peter Song
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Kenneth Wilkins
- Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland; Department of Preventive Medicine and Biostatistics, F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Daniel R Gossett
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Yining Xie
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Robert A Star
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
30
Janssen MF, Bonsel GJ, Luo N. Is EQ-5D-5L Better Than EQ-5D-3L? A Head-to-Head Comparison of Descriptive Systems and Value Sets from Seven Countries. Pharmacoeconomics 2018; 36:675-697. [PMID: 29470821 PMCID: PMC5954015 DOI: 10.1007/s40273-018-0623-8] [Citation(s) in RCA: 219] [Impact Index Per Article: 36.5] [Indexed: 05/16/2023]
Abstract
OBJECTIVE This study describes the first empirical head-to-head comparison of EQ-5D-3L (3L) and EQ-5D-5L (5L) value sets for multiple countries. METHODS A large multinational dataset, including 3L and 5L data for eight patient groups and a student cohort, was used to compare 3L versus 5L value sets for Canada, China, England/UK (5L/3L, respectively), Japan, The Netherlands, South Korea and Spain. We used distributional analyses and two methods exploring discriminatory power: relative efficiency as assessed by the F statistic, and an area under the curve for the receiver-operating characteristics approach. Differences in outcomes were explored by separating descriptive system effects from valuation effects, and by exploring distributional location effects. RESULTS In terms of distributional evenness, efficiency of scale use and the face validity of the resulting distributions, 5L was superior, leading to an increase in sensitivity and precision in health status measurement. When compared with 5L, 3L systematically overestimated health problems and consequently underestimated utilities. This led to bias, i.e. over- or underestimations of discriminatory power. CONCLUSION We conclude that 5L provides more precise measurement at individual and group levels, both in terms of descriptive system data and utilities. The increased sensitivity and precision of 5L is likely to be generalisable to longitudinal studies, such as in intervention designs. Hence, we recommend the use of the 5L across applications, including economic evaluation, clinical and public health studies. The evaluative framework proved to be useful in assessing preference-based instruments and might be useful for future work in the development of descriptive systems or health classifications.
Affiliation(s)
- Mathieu F Janssen
- Department of Medical Psychology and Psychotherapy, Erasmus MC, Erasmus University, PO Box 2040, 3000 CA, Rotterdam, The Netherlands
- Gouke J Bonsel
- Department of Public Health, Erasmus MC, Erasmus University, Rotterdam, The Netherlands
- Division Mother and Child, UMC Utrecht, University of Utrecht, Utrecht, The Netherlands
- Nan Luo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
31
Yi YG, Sung IY, Yuk JS. Comparison of Second and Third Editions of the Bayley Scales in Children With Suspected Developmental Delay. Ann Rehabil Med 2018; 42:313-320. [PMID: 29765885 PMCID: PMC5940608 DOI: 10.5535/arm.2018.42.2.313] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 04/28/2017] [Accepted: 07/20/2017] [Indexed: 11/05/2022] Open
Abstract
OBJECTIVE To compare the scores of the Bayley Scales of Infant Development second edition (BSID-II) and the third edition, Bayley-III, in children with suspected developmental delay and to determine the cutoff score for developmental delay in the Bayley-III. METHODS Children younger than 42 months (n=62) with suspected developmental delay who visited our department between 2014 and 2015 were assessed with both the BSID-II and Bayley-III tests. RESULTS The mean Bayley-III Cognitive Language Composite (CLC) score was 5.8 points higher than the mean BSID-II Mental Developmental Index (MDI) score, and the mean Bayley-III Motor Composite (MC) score was 7.9 points higher than the mean BSID-II Psychomotor Developmental Index (PDI) score. In receiver operating characteristic (ROC) analysis of a BSID-II MDI score <70, Bayley-III CLC scores showed a cutoff of 78.0 (96.6% sensitivity and 93.9% specificity). In ROC analysis of a BSID-II PDI score <70, the Bayley-III MC score showed a cutoff of 80. CONCLUSION There was a strong correlation between the BSID-II and Bayley-III in children with suspected developmental delay. The Bayley-III identified fewer children with developmental delay. The recommended cutoff value for developmental delay increased from a BSID-II score of 70 to a Bayley-III CLC score of 78 and Bayley-III MC score of 80.
Affiliation(s)
- You Gyoung Yi
- Department of Rehabilitation Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- In Young Sung
- Department of Physical Medicine and Rehabilitation, Division of Pediatric Rehabilitation Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Jin Sook Yuk
- Department of Physical Medicine and Rehabilitation, Division of Pediatric Rehabilitation Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
32
Heriot GS, Tong SYC, Cheng AC, Liew D. What risk of endocarditis is low enough to justify the omission of transoesophageal echocardiography in Staphylococcus aureus bacteraemia? A narrative review. Clin Microbiol Infect 2018; 24:1251-1256. [PMID: 29581048 DOI: 10.1016/j.cmi.2018.03.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/07/2018] [Revised: 03/13/2018] [Accepted: 03/15/2018] [Indexed: 02/08/2023]
Abstract
BACKGROUND Recent criteria which can identify patients with Staphylococcus aureus bacteraemia (SAB) who are at very low risk of endocarditis raise the question of whether transoesophageal echocardiography (TOE) is appropriate for these patients. AIMS To estimate the probability of occult endocarditis complicating SAB below which a TOE-guided treatment strategy no longer offers the best 180-day survival, and to examine the key uncertainties affecting this result. SOURCES Estimates of the parameters required to calculate the Pauker-Kassirer testing threshold were identified from studies published prior to 1 June 2017 using a composite search strategy that involved a systematic search for relevant controlled trials and guidelines, followed by a non-systematic iterative search of the observational literature. CONTENT Estimates of the necessary parameters were generally consistent across the literature with the exception of the procedural mortality of TOE. In our base-case scenario (TOE mortality 0.1%), the testing threshold for TOE in apparently uncomplicated SAB was a 1.1% probability of occult endocarditis. Sensitivity analyses revealed that the procedural mortality of TOE was a key uncertainty affecting estimates of the testing threshold. IMPLICATIONS None of the available clinical tools can place patients with SAB below this probability of endocarditis with 95% confidence. Future work in this area should concentrate on improving the precision of these tools and on exploring the value of alternative echocardiography strategies. In addition, a better understanding of the harms of TOE is required to ensure that recommendations regarding the role of this investigation in the management of patients with SAB are appropriate.
Affiliation(s)
- G S Heriot
- School of Public Health and Preventative Medicine, Monash University, Level 4, 553 St Kilda Rd, Melbourne, 3004, Victoria, Australia; Victorian Infectious Diseases Service, The Royal Melbourne Hospital, Grattan St, Parkville, 3052, Victoria, Australia
- S Y C Tong
- Victorian Infectious Diseases Service, The Royal Melbourne Hospital, Grattan St, Parkville, 3052, Victoria, Australia; Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Grattan St, Parkville, 3052, Victoria, Australia; Menzies School of Health Research, Royal Darwin Hospital, Rocklands Dr, Casuarina, 0810, Northern Territory, Australia
- A C Cheng
- School of Public Health and Preventative Medicine, Monash University, Level 4, 553 St Kilda Rd, Melbourne, 3004, Victoria, Australia; Department of Infectious Diseases, Alfred Health, 55 Commercial Rd, Melbourne, 3004, Victoria, Australia; Infection Prevention and Healthcare Epidemiology Unit, Alfred Health, 55 Commercial Rd, Melbourne, 3004, Victoria, Australia
- D Liew
- School of Public Health and Preventative Medicine, Monash University, Level 4, 553 St Kilda Rd, Melbourne, 3004, Victoria, Australia
33
Francis RA, Taylor JD, Dibble E, Strickland B, Petro VM, Easterwood C, Wang G. Restricted cross-scale habitat selection by American beavers. Curr Zool 2018; 63:703-710. [PMID: 29492032 PMCID: PMC5804220 DOI: 10.1093/cz/zox059] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/16/2017] [Accepted: 10/11/2017] [Indexed: 11/12/2022] Open
Abstract
Animal habitat selection, among other ecological phenomena, is spatially scale dependent. Habitat selection by American beavers Castor canadensis (hereafter, beaver) has been studied at singular spatial scales, but to date no research addresses multi-scale selection. Our objectives were to determine if beaver habitat selection was specialized to semiaquatic habitats and if variables explaining habitat selection are consistent between landscape and fine spatial scales. We built maximum entropy (MaxEnt) models to relate landscape-scale presence-only data to landscape variables, and used generalized linear mixed models to evaluate fine spatial scale habitat selection using global positioning system (GPS) relocation data. Explanatory variables between the landscape and fine spatial scale were compared for consistency. Our findings suggested that beaver habitat selection at coarse (study area) and fine (within home range) scales was congruent, and was influenced by increasing amounts of woody wetland edge density and shrub edge density, and decreasing amounts of open water edge density. Habitat suitability at the landscape scale also increased with decreasing amounts of grass frequency. As territorial, central-place foragers, beavers likely trade-off open water edge density (i.e., smaller non-forested wetlands or lodges closer to banks) for defense and shorter distances to forage and obtain construction material. Woody plants along edges and expanses of open water for predator avoidance may limit beaver fitness and subsequently determine beaver habitat selection.
Affiliation(s)
- Robert A Francis
- Department of Wildlife, Fisheries and Aquaculture, Thompson Hall, Mississippi State University, Starkville, MS, 39762, USA
- Jimmy D Taylor
- USDA, APHIS, Wildlife Services, National Wildlife Research Center, 3180 SW Jefferson Way, Corvallis, OR, 97331, USA
- Eric Dibble
- Department of Wildlife, Fisheries and Aquaculture, Thompson Hall, Mississippi State University, Starkville, MS, 39762, USA
- Bronson Strickland
- Department of Wildlife, Fisheries and Aquaculture, Thompson Hall, Mississippi State University, Starkville, MS, 39762, USA
- Vanessa M Petro
- Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR, 97331, USA
- Christine Easterwood
- Environmental Management Division, US Army Garrison - Building 4488 Martin Rd SW Redstone, Redstone Arsenal, AL, 35898, USA
- Guiming Wang
- Department of Wildlife, Fisheries and Aquaculture, Thompson Hall, Mississippi State University, Starkville, MS, 39762, USA
|
34
|
The Reproducibility of Changes in Diagnostic Figures of Merit Across Laboratory and Clinical Imaging Reader Studies. Acad Radiol 2017; 24:1436-1446. [PMID: 28666723 DOI: 10.1016/j.acra.2017.05.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2017] [Revised: 04/28/2017] [Accepted: 05/01/2017] [Indexed: 11/23/2022]
Abstract
RATIONALE AND OBJECTIVES In this paper we examine which comparisons of reading performance between diagnostic imaging systems made in controlled retrospective laboratory studies may be representative of what we observe in later clinical studies. The change in a meaningful diagnostic figure of merit between two diagnostic modalities should be qualitatively or quantitatively comparable across all kinds of studies. MATERIALS AND METHODS In this meta-study we examine the reproducibility of relative measures of sensitivity, false positive fraction (FPF), area under the receiver operating characteristic (ROC) curve, and expected utility across laboratory and observational clinical studies for several different breast imaging modalities, including screen film mammography, digital mammography, breast tomosynthesis, and ultrasound. RESULTS Across studies of all types, the changes in the FPFs yielded very small probabilities of having a common mean value. The probabilities of relative sensitivity being the same across ultrasound and tomosynthesis studies were low. No evidence was found for different mean values of relative area under the ROC curve or relative expected utility within any of the study sets. CONCLUSION The comparison demonstrates that the ratios of areas under the ROC curve and expected utilities are reproducible across laboratory and clinical studies, whereas sensitivity and FPF are not.
|
35
|
Dötsch T, Dirkmann D, Bezinover D, Hartmann M, Treckmann J, Paul A, Saner F. Assessment of standard laboratory tests and rotational thromboelastometry for the prediction of postoperative bleeding in liver transplantation. Br J Anaesth 2017; 119:402-410. [DOI: 10.1093/bja/aex122] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 08/30/2023]
|
36
|
Pencina MJ, Fine JP, D'Agostino RB. Discrimination slope and integrated discrimination improvement - properties, relationships and impact of calibration. Stat Med 2016; 36:4482-4490. [PMID: 27699818 DOI: 10.1002/sim.7139] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 06/01/2016] [Revised: 09/06/2016] [Accepted: 09/08/2016] [Indexed: 11/11/2022]
Abstract
Discrimination slope, defined as the slope of a linear regression of predicted probabilities of event derived from a prognostic model on the binary event status, has recently gained popularity as a measure of model performance. It serves as a building block for the integrated discrimination improvement that equals the difference in discrimination slopes between the two models being compared. Several authors have pointed out that it does not make sense to apply the integrated discrimination improvement and discrimination slope when working with mis-calibrated models, whereas others have raised concerns about the ability of improving discrimination slope without adding new information. In this paper, we show that under certain assumptions the discrimination slope is asymptotically related to two other R-squared measures, one of which is a rescaled version of the Brier score, known to be proper. Furthermore, we illustrate how a simple recalibration makes the slope equal to the rescaled Brier R-squared metric. We also show that the discrimination slope can be interpreted as a measure of reduction in expected regret for the Gini-Brier regret function. Using theoretical and practical examples, we illustrate how all of these metrics are affected by different levels of model mis-calibration. In particular, we demonstrate that simple recalibration ascertaining calibration in-the-large and calibration slope equal to 1 is not sufficient to correct for some forms of mis-calibration. We conclude that R-squared metrics, including the discrimination slope, offer an attractive choice for quantifying model performance as long as one accounts for their sensitivity to model calibration. Copyright © 2016 John Wiley & Sons, Ltd.
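As an illustration of the definition in this abstract, the following minimal sketch computes the discrimination slope in two equivalent ways (as a difference of mean predicted probabilities and as an ordinary least-squares slope) alongside a rescaled Brier R-squared. The function names and the toy data are assumptions made for this example and do not come from the paper; the rescaling used here, 1 - Brier / (prevalence * (1 - prevalence)), is one common convention and is likewise assumed.

```python
from statistics import mean


def discrimination_slope(probs, events):
    """Mean predicted probability among events minus that among non-events."""
    ev = [p for p, y in zip(probs, events) if y == 1]
    ne = [p for p, y in zip(probs, events) if y == 0]
    return mean(ev) - mean(ne)


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def brier_r2(probs, events):
    """Rescaled Brier score: 1 - Brier / (prevalence * (1 - prevalence))."""
    prev = mean(events)
    brier = mean([(p - y) ** 2 for p, y in zip(probs, events)])
    return 1 - brier / (prev * (1 - prev))


# Hypothetical predicted probabilities and binary event indicators.
probs = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6]
events = [1, 1, 0, 0, 0, 1]

slope = discrimination_slope(probs, events)
# Regressing predictions on event status gives the same number.
slope_via_regression = ols_slope(events, probs)
r2 = brier_r2(probs, events)
```

On this toy data the two slope computations agree, while the rescaled Brier R-squared takes a different value; the abstract states that the two coincide only after a suitable recalibration.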
Affiliation(s)
- Michael J Pencina
- Duke Clinical Research Institute, 2400 Pratt St, Rm. 7024, Durham, NC, 27710, U.S.A
- Jason P Fine
- University of North Carolina at Chapel Hill, 3103B McGavran-Greenberg Hall, CB #7420, Chapel Hill, NC, 27599, U.S.A
|
37
|
Pencina MJ, Steyerberg EW, D'Agostino RB. Net reclassification index at event rate: properties and relationships. Stat Med 2016; 36:4455-4467. [PMID: 27426413 DOI: 10.1002/sim.7041] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 04/16/2015] [Revised: 06/15/2016] [Accepted: 06/16/2016] [Indexed: 01/05/2023]
Abstract
The net reclassification improvement (NRI) is an attractively simple summary measure quantifying improvement in performance because of addition of new risk marker(s) to a prediction model. Originally proposed for settings with well-established classification thresholds, it quickly extended into applications with no thresholds in common use. Here we aim to explore properties of the NRI at event rate. We express this NRI as a difference in performance measures for the new versus old model and show that the quantity underlying this difference is related to several global as well as decision analytic measures of model performance. It maximizes the relative utility (standardized net benefit) across all classification thresholds and can be viewed as the Kolmogorov-Smirnov distance between the distributions of risk among events and non-events. It can be expressed as a special case of the continuous NRI, measuring reclassification from the 'null' model with no predictors. It is also a criterion based on the value of information and quantifies the reduction in expected regret for a given regret function, casting the NRI at event rate as a measure of incremental reduction in expected regret. More generally, we find it informative to present plots of standardized net benefit/relative utility for the new versus old model across the domain of classification thresholds. Then, these plots can be summarized with their maximum values, and the increment in model performance can be described by the NRI at event rate. We provide theoretical examples and a clinical application on the evaluation of prognostic biomarkers for atrial fibrillation. Copyright © 2016 John Wiley & Sons, Ltd.
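To make the "difference in performance measures for the new versus old model" concrete, the sketch below uses the standard identity that a two-category NRI with a single threshold equals the change in sensitivity plus the change in specificity, and places the threshold at the observed event rate. The function names and the toy risks are hypothetical, and this is only an illustration of the idea, not the authors' implementation.

```python
def youden_at_threshold(risks, events, threshold):
    """Sensitivity + specificity - 1 when 'risk > threshold' is called positive."""
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    true_pos = sum(1 for r, y in zip(risks, events) if y == 1 and r > threshold)
    true_neg = sum(1 for r, y in zip(risks, events) if y == 0 and r <= threshold)
    return true_pos / n_events + true_neg / n_nonevents - 1


def nri_at_event_rate(old_risks, new_risks, events):
    """Change in (sensitivity + specificity) with the threshold at the event rate."""
    event_rate = sum(events) / len(events)
    return (youden_at_threshold(new_risks, events, event_rate)
            - youden_at_threshold(old_risks, events, event_rate))


# Hypothetical data: 3 events among 10 subjects, so the event rate is 0.3.
events = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
old_risks = [0.5, 0.2, 0.4, 0.1, 0.35, 0.2, 0.1, 0.4, 0.2, 0.25]
new_risks = [0.6, 0.45, 0.5, 0.1, 0.2, 0.15, 0.1, 0.35, 0.2, 0.25]

nri = nri_at_event_rate(old_risks, new_risks, events)
```

Here the new model moves one event above the threshold and one non-event below it, so the NRI at event rate is 1/3 + 1/7.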
Affiliation(s)
- Michael J Pencina
- Department of Biostatistics and Bioinformatics, Duke Clinical Research Institute, Durham, NC, 27710, U.S.A
- Ewout W Steyerberg
- Department of Public Health, Erasmus MC - University Medical Center Rotterdam, 3000 CA, Rotterdam, The Netherlands
- Ralph B D'Agostino
- Department of Mathematics and Statistics, Boston University, Boston, MA, 02215, U.S.A
|
38
|
Martínez-Camblor P, Corral N, Rey C, Pascual J, Cernuda-Morollón E. Receiver operating characteristic curve generalization for non-monotone relationships. Stat Methods Med Res 2016; 26:113-123. [DOI: 10.1177/0962280214541095] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/15/2022]
Abstract
The receiver operating characteristic curve is a popular graphical method frequently used in order to study the diagnostic capacity of continuous markers. It represents in a plot true-positive rates against the false-positive ones. Both the practical and theoretical aspects of the receiver operating characteristic curve have been extensively studied. Conventionally, it is assumed that the considered marker has a monotone relationship with the studied characteristic; i.e., the upper (lower) values of the (bio)marker are associated with a higher probability of a positive result. However, there exist real situations where both the lower and the upper values of the marker are associated with higher probability of a positive result. We propose a receiver operating characteristic curve generalization, [Formula: see text], useful in this context. All pairs of possible cut-off points, one for the lower and another one for the upper marker values, are taken into account and the best of them are selected. The natural empirical estimator for the [Formula: see text] curve is considered and its uniform consistency and asymptotic distribution are derived. Finally, two real-world applications are studied.
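The two-cut-off idea described in this abstract can be sketched as follows: a subject is called positive when the marker falls at or below a lower cut-off or above an upper cut-off, every pair of cut-offs is tried, and the highest true-positive rate seen at each false-positive rate is kept. This is a simplified illustration under assumed names and toy data; it is not the estimator defined in the paper.

```python
from itertools import combinations_with_replacement


def two_sided_roc_points(marker_pos, marker_neg):
    """Best true-positive rate observed at each false-positive rate over all
    (lower, upper) cut-off pairs; positive means x <= lower or x > upper."""
    cuts = sorted(set(marker_pos) | set(marker_neg))
    cuts = [float("-inf")] + cuts + [float("inf")]
    best = {}
    # The list is sorted, so every generated pair satisfies lower <= upper.
    for lower, upper in combinations_with_replacement(cuts, 2):
        tpr = sum(1 for x in marker_pos if x <= lower or x > upper) / len(marker_pos)
        fpr = sum(1 for x in marker_neg if x <= lower or x > upper) / len(marker_neg)
        if tpr > best.get(fpr, -1.0):
            best[fpr] = tpr
    return sorted(best.items())


# Hypothetical marker values: positives sit at both extremes.
marker_pos = [1, 2, 9, 10]
marker_neg = [4, 5, 6, 7]

points = dict(two_sided_roc_points(marker_pos, marker_neg))
```

With these values the pair (2, 7) flags every positive and no negative, whereas a single upper cut-off alone cannot flag more than half of the positives without also flagging negatives.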
Affiliation(s)
- Pablo Martínez-Camblor
- Oficina de Investigación Biosanitaria de Asturias (OIB-FICYT)
- Universidad Autonoma de Chile, Chile
- Corsino Rey
- Universidad de Oviedo
- UCI Pediátrica, Departamento de Pediatría, Hospital Universitario Central de Asturias (HUCA)
- Julio Pascual
- Universidad de Oviedo
- Área de Neurociencias, Servicio de Neurología, Hospital Universitario Central de Asturias (HUCA)
- Eva Cernuda-Morollón
- Universidad de Oviedo
- Área de Neurociencias, Servicio de Neurología, Hospital Universitario Central de Asturias (HUCA)
|
39
|
Peters NCJ, Visser 't Hooft ME, Eggink AJ, Tibboel D, Ursem N, Wijnen RMH, Bonsel GJ, Cohen-Overbeek TE. Prenatal Prediction of the Type of Omphalocele Closure by Different Medical Consultants. Fetal Diagn Ther 2015; 39:40-9. [PMID: 26066620 DOI: 10.1159/000430439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2014] [Accepted: 03/27/2015] [Indexed: 11/19/2022]
Abstract
INTRODUCTION To evaluate differences between consultants of different disciplines in the prenatal prediction of the type of postnatal surgical closure of an omphalocele. MATERIAL AND METHODS Twenty-one images of prenatally detected omphaloceles prior to 24 weeks of gestation were included. A standardized form provided known prenatal information and an ultrasound image for each case. Nineteen consultants were asked to assess the probability of primary closure of an omphalocele and to state which information was the most important for their assessment. RESULTS Primary closure (13/21 images) was predicted correctly in 5/13 images. The number of correct predictions per image ranged from 63 to 89%. The type of closure was predicted correctly in 7/8 images of cases which were not closed primarily, ranging from 58 to 84% correct predictions per image. There was no significant difference between consultants of different disciplines. Individual accuracy ranged from 10 to 62%. The consultants regarded omphalocele content as the most important information (34%) for counseling. DISCUSSION The consultants did not differ in their prenatal judgment of the primary closure of an omphalocele. The consultants tended to be too negative in their assessment, since 75% assessed the probability of primary closure overall to be <60%, whereas 62% of the cases were primarily closed. Omphalocele content was the most important information for the consultants' judgment.
Affiliation(s)
- Nina C J Peters
- Division of Obstetrics and Prenatal Medicine, Department of Obstetrics and Gynecology, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
|
40
|
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015; 10:e0118432. [PMID: 25738806 PMCID: PMC4349800 DOI: 10.1371/journal.pone.0118432] [Citation(s) in RCA: 1460] [Impact Index Per Article: 162.2] [Received: 06/23/2014] [Accepted: 01/16/2015] [Indexed: 11/18/2022]
Abstract
Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
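A small worked example may help show why specificity can look reassuring on imbalanced data while precision does not. The sketch below holds sensitivity and specificity fixed at 0.9 and changes only the number of negatives; the function name and the counts are assumptions chosen for illustration.

```python
def confusion_rates(tp, fp, tn, fn):
    """Sensitivity (recall), specificity and precision (PPV) from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Balanced: 100 positives, 100 negatives.
balanced = confusion_rates(tp=90, fp=10, tn=90, fn=10)

# Imbalanced: 100 positives, 10,000 negatives, same sensitivity and specificity.
imbalanced = confusion_rates(tp=90, fp=1000, tn=9000, fn=10)
```

Both scenarios yield the same point on a ROC plot (sensitivity 0.9 at specificity 0.9), yet precision drops from 0.9 to 90/1090, roughly 0.08, in the imbalanced case.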
Affiliation(s)
- Takaya Saito
- Computational Biology Unit, Department of Informatics, University of Bergen, P. O. Box 7803, N-5020, Bergen, Norway
- Marc Rehmsmeier
- Computational Biology Unit, Department of Informatics, University of Bergen, P. O. Box 7803, N-5020, Bergen, Norway
|
41
|
Mallett S, Halligan S, Collins GS, Altman DG. Exploration of analysis methods for diagnostic imaging tests: problems with ROC AUC and confidence scores in CT colonography. PLoS One 2014; 9:e107633. [PMID: 25353643 PMCID: PMC4212964 DOI: 10.1371/journal.pone.0107633] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/27/2014] [Accepted: 08/19/2014] [Indexed: 11/18/2022]
Abstract
BACKGROUND Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. METHODS In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. RESULTS Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. CONCLUSIONS The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.
Affiliation(s)
- Susan Mallett
- Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom
- Steve Halligan
- Centre for Medical Imaging, University College London, London, United Kingdom
- Gary S. Collins
- Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Doug G. Altman
- Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
|
43
|
Assessment of Early Thromboelastometric Variables from Extrinsically Activated Assays With and Without Aprotinin for Rapid Detection of Fibrinolysis. Anesth Analg 2014; 119:533-542. [DOI: 10.1213/ane.0000000000000333] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 01/22/2023]
|
44
|
Debray TPA, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KGM. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol 2014; 68:279-89. [PMID: 25179855 DOI: 10.1016/j.jclinepi.2014.06.018] [Citation(s) in RCA: 368] [Impact Index Per Article: 36.8] [Received: 02/24/2014] [Revised: 06/18/2014] [Accepted: 06/30/2014] [Indexed: 01/01/2023]
Abstract
OBJECTIVES It is widely acknowledged that the performance of diagnostic and prognostic prediction models should be assessed in external validation studies with independent data from "different but related" samples as compared with that of the development sample. We developed a framework of methodological steps and statistical methods for analyzing and enhancing the interpretation of results from external validation studies of prediction models. STUDY DESIGN AND SETTING We propose to quantify the degree of relatedness between development and validation samples on a scale ranging from reproducibility to transportability by evaluating their corresponding case-mix differences. We subsequently assess the models' performance in the validation sample and interpret the performance in view of the case-mix differences. Finally, we may adjust the model to the validation setting. RESULTS We illustrate this three-step framework with a prediction model for diagnosing deep venous thrombosis using three validation samples with varying case mix. While one external validation sample merely assessed the model's reproducibility, two other samples rather assessed model transportability. The performance in all validation samples was adequate, and the model did not require extensive updating to correct for miscalibration or poor fit to the validation settings. CONCLUSION The proposed framework enhances the interpretation of findings at external validation of prediction models.
Affiliation(s)
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Str. 6.131, PO Box 85500, 3508GA Utrecht, The Netherlands
- Yvonne Vergouwe
- Department of Public Health, Erasmus Medical Center, Rotterdam, The Netherlands
- Hendrik Koffijberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Str. 6.131, PO Box 85500, 3508GA Utrecht, The Netherlands
- Daan Nieboer
- Department of Public Health, Erasmus Medical Center, Rotterdam, The Netherlands
- Ewout W Steyerberg
- Department of Public Health, Erasmus Medical Center, Rotterdam, The Netherlands
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Str. 6.131, PO Box 85500, 3508GA Utrecht, The Netherlands
|
45
|
Peters NCJ, Visser 't Hooft ME, Ursem NTC, Eggink AJ, Wijnen RMH, Tibboel D, Bonsel GJ, Cohen-Overbeek TE. The relation between viscero-abdominal disproportion and type of omphalocele closure. Eur J Obstet Gynecol Reprod Biol 2014; 181:294-9. [PMID: 25201609 DOI: 10.1016/j.ejogrb.2014.08.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/28/2014] [Revised: 07/28/2014] [Accepted: 08/07/2014] [Indexed: 10/24/2022]
Abstract
OBJECTIVE To investigate the relation between prenatal ultrasound measurements of viscero-abdominal disproportion and the expected type of postnatal surgical closure of an omphalocele. STUDY DESIGN Retrospectively, 24 fetuses diagnosed with an isolated omphalocele in the 2nd trimester of pregnancy were selected (period 2003-2013). An image of the axial plane of the abdomen at the level of the defect was retrieved. The ratio of omphalocele circumference to abdominal circumference (OC/AC), and the ratio of defect diameter to abdominal diameter (DD/DA) were calculated. Prognostic outcome was primary closure. Sensitivity and specificity and the corresponding area under the ROC curve of these ratios were calculated as measurements of prognostic accuracy. RESULTS Primary closure was achieved in 15/24 cases. For the OC/AC-ratio a cut-off value of 0.82 successfully predicted outcome in 23/24 cases with an area under the ROC curve of 0.99. A cut-off value of 0.61 for the DD/DA-ratio successfully predicted type of closure in 20/24 cases with an area under the ROC curve of 0.88. In all cases without eviscerated liver tissue, the defect was primarily closed. CONCLUSION In prenatal isolated omphalocele cases, the OC/AC-ratio is better at predicting postnatal surgical closure than the DD/DA-ratio and can be used as a prognostic tool for expected type of closure in the 2nd trimester of pregnancy.
Affiliation(s)
- Nina C J Peters
- Erasmus MC, University Medical Center Rotterdam, Department of Obstetrics and Gynecology, Division of Obstetrics and Prenatal Medicine, Rotterdam, The Netherlands
- Michele E Visser 't Hooft
- Erasmus MC, University Medical Center Rotterdam, Department of Obstetrics and Gynecology, Division of Obstetrics and Prenatal Medicine, Rotterdam, The Netherlands
- Nicolette T C Ursem
- Erasmus MC, University Medical Center Rotterdam, Department of Obstetrics and Gynecology, Division of Obstetrics and Prenatal Medicine, Rotterdam, The Netherlands
- Alex J Eggink
- Erasmus MC, University Medical Center Rotterdam, Department of Obstetrics and Gynecology, Division of Obstetrics and Prenatal Medicine, Rotterdam, The Netherlands
- René M H Wijnen
- Erasmus MC, University Medical Center Rotterdam, Department of Pediatric Surgery, Rotterdam, The Netherlands
- Dick Tibboel
- Erasmus MC, University Medical Center Rotterdam, Department of Pediatric Surgery, Rotterdam, The Netherlands
- Gouke J Bonsel
- Erasmus MC, University Medical Center Rotterdam, Department of Obstetrics and Gynecology, Division of Obstetrics and Prenatal Medicine, Rotterdam, The Netherlands
- Titia E Cohen-Overbeek
- Erasmus MC, University Medical Center Rotterdam, Department of Obstetrics and Gynecology, Division of Obstetrics and Prenatal Medicine, Rotterdam, The Netherlands
|
46
|
Abbey CK, Gallas BD, Boone JM, Niklason LT, Hadjiiski LM, Sahiner B, Samuelson FW. Comparative statistical properties of expected utility and area under the ROC curve for laboratory studies of observer performance in screening mammography. Acad Radiol 2014; 21:481-90. [PMID: 24594418 DOI: 10.1016/j.acra.2013.12.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/07/2013] [Revised: 12/11/2013] [Accepted: 12/11/2013] [Indexed: 11/25/2022]
Abstract
RATIONALE AND OBJECTIVES Our objective is to determine whether expected utility (EU) and the area under the receiver operator characteristic (AUC) are consistent with one another as endpoints of observer performance studies in mammography. These two measures characterize receiver operator characteristic performance somewhat differently. We compare these two study endpoints at the level of individual reader effects, statistical inference, and components of variance across readers and cases. MATERIALS AND METHODS We reanalyze three previously published laboratory observer performance studies that investigate various x-ray breast imaging modalities using EU and AUC. The EU measure is based on recent estimates of relative utility for screening mammography. RESULTS The AUC and EU measures are correlated across readers for individual modalities (r = 0.93) and differences in modalities (r = 0.94 to 0.98). Statistical inference for modality effects based on multi-reader multi-case analysis is very similar, with significant results (P < .05) in exactly the same conditions. Power analyses show mixed results across studies, with a small increase in power on average for EU that corresponds to approximately a 7% reduction in the number of readers. Despite a large number of crossing receiver operator characteristic curves (59% of readers), modality effects only rarely have opposite signs for EU and AUC (6%). CONCLUSIONS We do not find any evidence of systematic differences between EU and AUC in screening mammography observer studies. Thus, when utility approaches are viable (i.e., an appropriate value of relative utility exists), practical effects such as statistical efficiency may be used to choose study endpoints.
|
48
|
Debray TPA, Koffijberg H, Nieboer D, Vergouwe Y, Steyerberg EW, Moons KGM. Meta-analysis and aggregation of multiple published prediction models. Stat Med 2014; 33:2341-62. [PMID: 24752993 DOI: 10.1002/sim.6080] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 03/08/2013] [Revised: 11/22/2013] [Accepted: 12/05/2013] [Indexed: 12/24/2022]
Abstract
Published clinical prediction models are often ignored during the development of novel prediction models despite similarities in populations and intended usage. The plethora of prediction models that arise from this practice may still perform poorly when applied in other populations. Incorporating prior evidence might improve the accuracy of prediction models and make them potentially better generalizable. Unfortunately, aggregation of prediction models is not straightforward, and methods to combine differently specified models are currently lacking. We propose two approaches for aggregating previously published prediction models when a validation dataset is available: model averaging and stacked regressions. These approaches yield user-friendly stand-alone models that are adjusted for the new validation data. Both approaches rely on weighting to account for model performance and between-study heterogeneity but adopt a different rationale (averaging versus combination) to combine the models. We illustrate their implementation in a clinical example and compare them with established methods for prediction modeling in a series of simulation studies. Results from the clinical datasets and simulation studies demonstrate that aggregation yields prediction models with better discrimination and calibration in a vast majority of scenarios, and results in equivalent performance (compared to developing a novel model from scratch) when validation datasets are relatively large. In conclusion, model aggregation is a promising strategy when several prediction models are available from the literature and a validation dataset is at hand. The aggregation methods do not require existing models to have similar predictors and can be applied when relatively few data are at hand.
Affiliation(s)
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
|
49
|
Chen MH, Willan AR. Value of information methods for assessing a new diagnostic test. Stat Med 2014; 33:1801-15. [PMID: 24403241 DOI: 10.1002/sim.6085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/23/2013] [Revised: 12/10/2013] [Accepted: 12/11/2013] [Indexed: 11/08/2022]
Abstract
Value-of-information methods are applied to assess the evidence in support of a new diagnostic test and, where the evidence is insufficient for decision making, to determine the optimal sample size for future studies. Net benefit formulations are derived under various diagnostic and treatment scenarios. The expressions for the expected opportunity loss of adopting strategies that include the new test are given. Expressions for the expected value of information from future studies are derived. One-sample and two-sample designs, with or without known prevalence, are considered. An example is given.
Affiliation(s)
- Maggie Hong Chen
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
|