1
Xie Z, Liu Y, He HY, Li M, Zhou ZH. Weakly Supervised AUC Optimization: A Unified Partial AUC Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:4780-4795. [PMID: 38265903 DOI: 10.1109/tpami.2024.3357814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Since acquiring perfect supervision is usually difficult, real-world machine learning tasks often confront inaccurate, incomplete, or inexact supervision, collectively referred to as weak supervision. In this work, we present WSAUC, a unified framework for weakly supervised AUC optimization problems, which covers noisy label learning, positive-unlabeled learning, multi-instance learning, and semi-supervised learning scenarios. Within the WSAUC framework, we first frame the AUC optimization problems in various weakly supervised scenarios as a common formulation of minimizing the AUC risk on contaminated sets, and demonstrate that the empirical risk minimization problems are consistent with the true AUC. Then, we introduce a new type of partial AUC, specifically, the reversed partial AUC (rpAUC), which serves as a robust training objective for AUC maximization in the presence of contaminated labels. WSAUC offers a universal solution for AUC optimization in various weakly supervised scenarios by maximizing the empirical rpAUC. Theoretical and experimental results under multiple settings support the effectiveness of WSAUC on a range of weakly supervised AUC optimization tasks.
2
Glehr G, Riquelme P, Kronenberg K, Lohmayer R, López-Madrona VJ, Kapinsky M, Schlitt HJ, Geissler EK, Spang R, Haferkamp S, Hutchinson JA. Restricting datasets to classifiable samples augments discovery of immune disease biomarkers. Nat Commun 2024; 15:5417. [PMID: 38926389 PMCID: PMC11208602 DOI: 10.1038/s41467-024-49094-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 05/14/2024] [Indexed: 06/28/2024] Open
Abstract
Immunological diseases are typically heterogeneous in clinical presentation, severity and response to therapy. Biomarkers of immune diseases often reflect this variability, especially compared to their regulated behaviour in health. This leads to a common difficulty that frustrates biomarker discovery and interpretation - namely, unequal dispersion of immune disease biomarker expression between patient classes necessarily limits a biomarker's informative range. To solve this problem, we introduce dataset restriction, a procedure that splits datasets into classifiable and unclassifiable samples. Applied to synthetic flow cytometry data, restriction identifies biomarkers that are otherwise disregarded. In advanced melanoma, restriction finds biomarkers of immune-related adverse event risk after immunotherapy and enables us to build multivariate models that accurately predict immunotherapy-related hepatitis. Hence, dataset restriction augments discovery of immune disease biomarkers, increases predictive certainty for classifiable samples and improves multivariate models incorporating biomarkers with a limited informative range. This principle can be directly extended to any classification task.
Affiliation(s)
- Gunther Glehr: Department of Surgery, University Hospital Regensburg, Regensburg, Germany
- Paloma Riquelme: Department of Surgery, University Hospital Regensburg, Regensburg, Germany
- Robert Lohmayer: Algorithmic Bioinformatics Research Group, Leibniz Institute for Immunotherapy, Regensburg, Germany
- Hans J Schlitt: Department of Surgery, University Hospital Regensburg, Regensburg, Germany
- Edward K Geissler: Department of Surgery, University Hospital Regensburg, Regensburg, Germany
- Rainer Spang: Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany
- Sebastian Haferkamp: Department of Dermatology, University Hospital Regensburg, Regensburg, Germany
- James A Hutchinson: Department of Surgery, University Hospital Regensburg, Regensburg, Germany
3
Chaibub Neto E, Yadav V, Sieberts SK, Omberg L. A novel estimator for the two-way partial AUC. BMC Med Inform Decis Mak 2024; 24:57. [PMID: 38378636 PMCID: PMC10877829 DOI: 10.1186/s12911-023-02382-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 11/27/2023] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND The two-way partial AUC has been recently proposed as a way to directly quantify partial area under the ROC curve with simultaneous restrictions on the sensitivity and specificity ranges of diagnostic tests or classifiers. The metric, as originally implemented in the tpAUC R package, is estimated using a nonparametric estimator based on a trimmed Mann-Whitney U-statistic, which becomes computationally expensive in large sample sizes. (Its computational complexity is of order [Formula: see text], where [Formula: see text] and [Formula: see text] represent the number of positive and negative cases, respectively.) This is problematic since the statistical methodology for comparing estimates generated from alternative diagnostic tests/classifiers relies on bootstrap resampling and requires repeated computations of the estimator on a large number of bootstrap samples. METHODS By leveraging the graphical and probabilistic representations of the AUC, partial AUCs, and two-way partial AUC, we derive a novel estimator for the two-way partial AUC, which can be directly computed from the output of any software able to compute AUC and partial AUCs. We implemented our estimator using the computationally efficient pROC R package, which leverages a nonparametric approach using the trapezoidal rule for the computation of AUC and partial AUC scores. (Its computational complexity is of order [Formula: see text], where [Formula: see text].) We compare the empirical bias and computation time of the proposed estimator against the original estimator provided in the tpAUC package in a series of simulation studies and on two real datasets. RESULTS Our estimator tended to be less biased than the original estimator based on the trimmed Mann-Whitney U-statistic across all experiments (and showed considerably less bias in the experiments based on small sample sizes).
But, most importantly, because the computational complexity of the proposed estimator is of order [Formula: see text], rather than [Formula: see text], it is much faster to compute when sample sizes are large. CONCLUSIONS The proposed estimator provides an improvement for the computation of two-way partial AUC, and allows the comparison of diagnostic tests/machine learning classifiers in large datasets where repeated computations of the original estimator on bootstrap samples become too expensive to compute.
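The cost contrast described in this abstract can be illustrated on the plain (full) AUC. The sketch below is not the two-way partial AUC estimator from the paper and does not reproduce the tpAUC or pROC implementations; it only assumes the standard identity between the AUC and the normalized Mann-Whitney U statistic. The function names `auc_pairwise` and `auc_by_ranks` are hypothetical. The first visits every positive-negative pair, while the second needs a single sort of the pooled scores.

```python
def auc_pairwise(pos, neg):
    """AUC as the normalized Mann-Whitney U: compares every positive with every negative."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def auc_by_ranks(pos, neg):
    """Same quantity from midranks of the pooled scores (one sort instead of all pairs)."""
    pooled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    rank_sum_pos = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Find the block of tied scores pooled[i:j].
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_pos += midrank * sum(label for _, label in pooled[i:j])
        i = j
    n_pos, n_neg = len(pos), len(neg)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Illustrative scores only (made up for this sketch).
pos_scores = [0.9, 0.8, 0.4, 0.6]
neg_scores = [0.7, 0.4, 0.3, 0.1]
print(auc_pairwise(pos_scores, neg_scores), auc_by_ranks(pos_scores, neg_scores))
```

Both functions return the same value on the same data; the difference is only in how much work each does as the sample grows, which matters when the estimate is recomputed on many bootstrap samples.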
Affiliation(s)
- Vijay Yadav: Sage Bionetworks, 2901 Third Avenue, 98121, Seattle, USA
- Larsson Omberg: Sage Bionetworks, 2901 Third Avenue, 98121, Seattle, USA
4
Wechsung M, Konietschke F. Simultaneous inference for partial areas under receiver operating curves—With a view towards efficiency. J Stat Plan Inference 2023. [DOI: 10.1016/j.jspi.2023.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
5
Carrington AM, Manuel DG, Fieguth PW, Ramsay T, Osmani V, Wernly B, Bennett C, Hawken S, Magwood O, Sheikh Y, McInnes M, Holzinger A. Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:329-341. [PMID: 35077357 DOI: 10.1109/tpami.2022.3145392] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Indexed: 06/14/2023]
Abstract
Optimal performance is desired for decision-making in any field with binary classifiers and diagnostic tests; however, common performance measures lack depth in information. The area under the receiver operating characteristic curve (AUC) and the area under the precision recall curve are too general because they evaluate all decision thresholds including unrealistic ones. Conversely, accuracy, sensitivity, specificity, positive predictive value and the F1 score are too specific: they are measured at a single threshold that is optimal for some instances, but not others, which is not equitable. In between both approaches, we propose deep ROC analysis to measure performance in multiple groups of predicted risk (like calibration), or groups of true positive rate or false positive rate. In each group, we measure the group AUC (properly), normalized group AUC, and averages of: sensitivity, specificity, positive and negative predictive value, and likelihood ratio positive and negative. The measurements can be compared between groups, to whole measures, to point measures and between models. We also provide a new interpretation of AUC in whole or part, as balanced average accuracy, relevant to individuals instead of pairs. We evaluate models in three case studies using our method and Python toolkit and confirm its utility.
6
Abstract
Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that ϕ (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.
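The idea of delimiting a region of interest by a threshold on ϕ can be sketched as follows. This is not the authors' Python or R package; it only assumes the standard definition of the Matthews Correlation Coefficient, rewritten in terms of a ROC point (FPR, TPR) and the prevalence of positives. The names `phi_at` and `points_in_region`, and the example ROC points, are hypothetical.

```python
import math


def phi_at(tpr, fpr, prevalence):
    """Matthews Correlation Coefficient at one ROC point, for a given prevalence of positives."""
    # Confusion-matrix cells expressed as fractions of the whole population.
    tp = prevalence * tpr
    fn = prevalence * (1.0 - tpr)
    fp = (1.0 - prevalence) * fpr
    tn = (1.0 - prevalence) * (1.0 - fpr)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0  # convention used here for degenerate points such as (0, 0) and (1, 1)
    return (tp * tn - fp * fn) / denom


def points_in_region(roc_points, prevalence, min_phi):
    """Keep the (fpr, tpr) points whose phi is at least min_phi."""
    return [(fpr, tpr) for fpr, tpr in roc_points
            if phi_at(tpr, fpr, prevalence) >= min_phi]


# Made-up ROC points, for illustration only.
roc = [(0.0, 0.0), (0.1, 0.6), (0.2, 0.8), (0.5, 0.9), (1.0, 1.0)]
print(points_in_region(roc, prevalence=0.5, min_phi=0.5))
```

A partial area restricted to the retained points could then be computed with any trapezoidal routine; that step is left out because the abstract does not specify how the authors normalize it.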
7
Zhang X, Ding Y, Shao Y, He J, Ma J, Guo H, Keerman M, Liu J, Si H, Guo S, Ma R. Visceral Obesity-Related Indices in the Identification of Individuals with Metabolic Syndrome Among Different Ethnicities in Xinjiang, China. Diabetes Metab Syndr Obes 2021; 14:1609-1620. [PMID: 33889002 PMCID: PMC8055644 DOI: 10.2147/dmso.s306908] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/16/2021] [Accepted: 03/20/2021] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Few studies have focused on the predictive ability of visceral obesity-related indices for metabolic syndrome (MetS), especially in different ethnic groups. This study aimed to evaluate the applicability of visceral obesity-related indices for MetS screening among three major ethnic groups living in remote rural areas of Xinjiang. METHODS Based on multistage stratified cluster random sampling method, 3,192 Uyghurs, 3,054 Kazakhs, and 3,658 Hans were recruited from Xinjiang, China. The Joint Interim Statement (JIS) criteria were used to define MetS. The receiver operating characteristic curve (ROC), area under the ROC curve (AUC), and predictive value of each visceral obesity-related index were used to evaluate the predictive ability of MetS. RESULTS After adjusting for potential confounding factors, the lipid accumulation product (LAP), Chinese visceral adiposity index (CVAI), waist-to-height ratio (WHtR), and atherogenic index of plasma (AIP) were significantly correlated with MetS for each ethnic group, and the odds ratios (ORs) for MetS increased across quartiles. LAP was best able to identify MetS status in Kazakhs (AUC=0.853) and Uyghurs (AUC=0.851), with optimal cut-offs being 36.3 and 28.2, respectively. Both LAP (AUC=0.798) and CVAI (AUC=0.791) most accurately identified MetS status in Hans, with the optimal cut-offs being 27.3 and 85.0, respectively. Moreover, the AUC of the combination of these visceral obesity-related indices is higher for each ethnic group. However, compared with LAP, the improved value of combined screening was not significant. CONCLUSION LAP had the best discriminative capability for the screening of MetS among Kazakhs, Uyghurs, and Hans. The screening ability of CVAI for MetS was similar to that of LAP in Hans. Thus, LAP may be a complementary indicator for assessing MetS in various ethnic groups.
Affiliation(s)
- Xianghui Zhang: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Yusong Ding: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Yinbao Shao: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Jia He: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Jiaolong Ma: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Heng Guo: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Mulatibieke Keerman: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Jiaming Liu: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Huili Si: Department of Neurology, Shihezi People’s Hospital, Shihezi, Xinjiang, People’s Republic of China
- Shuxia Guo: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China; Department of Pathology and Key Laboratory of Xinjiang Endemic and Ethnic Diseases (Ministry of Education), Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Correspondence: Shuxia Guo, Department of Public Health, Shihezi University School of Medicine, Suite 721, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, People’s Republic of China. Tel +86 1800-9932-625; Fax +86 993-2057-153
- Rulin Ma: Department of Public Health, Shihezi University School of Medicine, Shihezi, Xinjiang, People’s Republic of China
- Correspondence: Rulin Ma, Department of Public Health, Shihezi University School of Medicine, Suite 816, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, People’s Republic of China. Tel +86 1330-9930-561; Fax +86 993-2057-153
8
Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F, Manuel DG. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC Med Inform Decis Mak 2020; 20:4. [PMID: 31906931 PMCID: PMC6945414 DOI: 10.1186/s12911-019-1014-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 06/24/2019] [Accepted: 12/20/2019] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. However, only part of the ROC curve and AUC are informative when they are used with imbalanced data. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives. METHODS We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided. RESULTS Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures. CONCLUSIONS The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed.
These measures may be used with any data set but this paper focuses on imbalanced data with low prevalence. FUTURE WORK Future work with our proposed measures may: demonstrate their value for imbalanced data with high prevalence, compare them to other measures not based on areas; and combine them with other ROC measures and techniques.
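The "equal in summation to whole measures" property can be checked numerically under one stated assumption: that the concordant partial AUC of a curve segment is the average of the vertical partial area (TPR integrated over FPR) and the horizontal partial area (1 − FPR integrated over TPR). The abstract itself does not give the formula, so this definition, the function name `concordant_partial_auc`, and the example ROC points are assumptions of the sketch rather than the authors' code.

```python
def concordant_partial_auc(points):
    """points: consecutive (fpr, tpr) ROC points for one part of the curve."""
    vertical = 0.0    # area under the segment: TPR integrated over FPR (trapezoids)
    horizontal = 0.0  # area to the right of the segment: (1 - FPR) integrated over TPR
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        vertical += (x1 - x0) * (y0 + y1) / 2.0
        horizontal += (y1 - y0) * ((1.0 - x0) + (1.0 - x1)) / 2.0
    return 0.5 * (vertical + horizontal)


# Made-up ROC curve, split into two parts that share the boundary point (0.2, 0.8).
roc = [(0.0, 0.0), (0.1, 0.6), (0.2, 0.8), (0.5, 0.9), (1.0, 1.0)]
parts = [roc[:3], roc[2:]]

# Ordinary trapezoidal AUC of the whole curve, for comparison.
full_auc = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(roc, roc[1:]))

print(sum(concordant_partial_auc(p) for p in parts), full_auc)
```

Under this assumed definition the partial values over a partition of the curve add up to the full trapezoidal AUC, whereas the vertical partial areas alone would also add up but would weight the two parts very differently.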
Affiliation(s)
- Paul W Fieguth: Faculty of Engineering, University of Waterloo, Waterloo, N2L 3G1, Canada
- Hammad Qazi: School of Public Health and Health Systems, University of Waterloo, Waterloo, N2L 3G1, Canada
- Andreas Holzinger: Holzinger Group (HCAI), Institute for Medical Informatics/Statistics, Medical University Graz, 8036, Graz, Austria; Institute of Interactive Systems and Data Science, Graz University of Technology, 8010, Graz, Austria
- Helen H Chen: School of Public Health and Health Systems, University of Waterloo, Waterloo, N2L 3G1, Canada
- Franz Mayr: Universidad ORT Uruguay, 11100, Montevideo, Uruguay
- Douglas G Manuel: Ottawa Hospital Research Institute, Ottawa, K1H 8L6, Canada; Department of Family Medicine, University of Ottawa, Ottawa, Canada; School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, Ottawa, Canada; Institute for Clinical Evaluative Sciences, Ottawa, Canada; Statistics Canada, Ottawa, Canada; C.T. Lamont Primary Health Care Research Centre and Bruyère Research Institute, Ottawa, Canada; Division of Clinical Public Health, Dalla Lana School of Public Health, Toronto, Canada
9
Statistical Significance Assessment of Phase Synchrony in the Presence of Background Couplings: An ECoG Study. Brain Topogr 2019; 32:882-896. [PMID: 31129754 DOI: 10.1007/s10548-019-00718-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/04/2018] [Accepted: 05/13/2019] [Indexed: 01/03/2023]
Abstract
Statistical significance testing is a necessary step in connectivity analysis. Several statistical test methods have been employed to assess the significance of functional connectivity, but the performance of these methods has not been thoroughly evaluated. In addition, the effects of the intrinsic brain connectivity and background couplings on performance of statistical test methods in task-based studies have not been investigated yet. The background couplings may exist independent of cognitive state and can be observed on both pre- and post-stimulus time intervals. The background couplings may be falsely detected by a statistical test as task-related connections, which can mislead interpretations of the task-related functional networks. The aim of this study was to investigate the relative performance of four commonly used non-parametric statistical test methods (surrogate, demeaned surrogate, bootstrap resampling, and Monte Carlo permutation methods) in the presence of background couplings and noise, with different signal-to-noise ratios (SNRs). Using simulated electrocorticographic (ECoG) datasets and phase locking value (PLV) as a measure of functional connectivity, we evaluated the performances of the statistical test methods utilizing sensitivity, specificity, accuracy, and receiver operating characteristic (ROC) analysis. Furthermore, we calculated optimal p values for each statistical test method using the ROC analysis, and found that the optimal p values increased as the SNR decreased. We also found that the optimal p value of the bootstrap resampling was greater than that of other methods. Our results from the simulation datasets and a real ECoG dataset, as an illustrative case report, revealed that the bootstrap resampling is the most efficient non-parametric statistical test for identifying the significant PLV of ECoG data, especially in the presence of background couplings.
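For readers unfamiliar with the connectivity measure used in this study, the phase locking value can be sketched in a few lines. The sketch assumes the usual definition, the magnitude of the mean unit phasor of the phase difference between two signals, and assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, which is not shown). The function name `phase_locking_value` and the sample phases are hypothetical; none of the four significance tests from the abstract is implemented here.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV in [0, 1] from two equal-length sequences of phases in radians."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase sequences must be non-empty and of equal length")
    # Average the unit phasors of the phase differences, then take the magnitude.
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# A constant phase lag gives a PLV of about 1 (fully locked).
locked_a = [0.1 * k for k in range(100)]
locked_b = [0.1 * k - 0.5 for k in range(100)]
print(phase_locking_value(locked_a, locked_b))

# Phase differences spread evenly around the circle give a PLV of about 0.
spread_a = [2.0 * math.pi * k / 8 for k in range(8)]
spread_b = [0.0] * 8
print(phase_locking_value(spread_a, spread_b))
```

A significance test such as the ones compared in the study would then ask whether an observed PLV is larger than what the chosen null procedure produces.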
10
Pitkänen A, Ekolle Ndode-Ekane X, Lapinlampi N, Puhakka N. Epilepsy biomarkers - Toward etiology and pathology specificity. Neurobiol Dis 2018; 123:42-58. [PMID: 29782966 DOI: 10.1016/j.nbd.2018.05.007] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 04/15/2018] [Revised: 05/13/2018] [Accepted: 05/16/2018] [Indexed: 02/07/2023] Open
Abstract
A biomarker is a characteristic that is measured as an indicator of normal biologic processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions. Biomarker modalities include molecular, histologic, radiographic, or physiologic characteristics. In 2015, the FDA-NIH Joint Leadership Council developed the BEST Resource (Biomarkers, EndpointS, and other Tools) to improve the understanding and use of biomarker terminology in biomedical research, clinical practice, and medical product development. The BEST biomarker categories include: (a) susceptibility/risk biomarkers, (b) diagnostic biomarkers, (c) monitoring biomarkers, (d) prognostic biomarkers, (e) predictive biomarkers, (f) pharmacodynamic/response biomarkers, and (g) safety biomarkers. Here we review 30 epilepsy biomarker studies that have identified (a) diagnostic biomarkers for epilepsy, epileptogenesis, epileptogenicity, drug-refractoriness, and status epilepticus - some of the epileptogenesis and epileptogenicity biomarkers can also be considered prognostic biomarkers for the development of epilepsy in subjects with a given brain insult, (b) predictive biomarkers for epilepsy surgery outcome, and (c) a response biomarker for therapy outcome. The biomarker modalities include plasma/serum/exosomal and cerebrospinal fluid molecular biomarkers, brain tissue molecular biomarkers, imaging biomarkers, electrophysiologic biomarkers, and behavioral/cognitive biomarkers. Both single and combinatory biomarkers have been described. Most of the reviewed biomarkers have an area under the curve >0.800 in receiver operating characteristics analysis, suggesting high sensitivity and specificity. As discussed in this review, we are in the early phase of the learning curve in epilepsy biomarker discovery. Many of the seven biomarker categories lack epilepsy-related biomarkers. 
There is a need for epilepsy biomarker discovery using proper, statistically powered study designs with validation cohorts, and the development and use of novel analytical methods. A strategic roadmap to discuss the research priorities in epilepsy biomarker discovery, regulatory issues, and optimization of the use of resources, similar to those devised in the cancer and Alzheimer's disease research areas, is also needed.
Affiliation(s)
- Asla Pitkänen: A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, PO Box 1627, FIN-70211 Kuopio, Finland
- Xavier Ekolle Ndode-Ekane: A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, PO Box 1627, FIN-70211 Kuopio, Finland
- Niina Lapinlampi: A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, PO Box 1627, FIN-70211 Kuopio, Finland
- Noora Puhakka: A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, PO Box 1627, FIN-70211 Kuopio, Finland