1
Wang J, Tian L. Optimal Cut-Point Selection Methods Under Binary Classification When Subclasses Are Involved. Pharm Stat 2024. [PMID: 38972714 DOI: 10.1002/pst.2413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 05/21/2024] [Accepted: 05/30/2024] [Indexed: 07/09/2024]
Abstract
In practice, we often encounter binary classification problems in which both main classes consist of multiple subclasses. For example, in an ovarian cancer study where biomarkers were evaluated for their accuracy in distinguishing noncancer cases from cancer cases, the noncancer class consists of healthy subjects and benign cases, while the cancer class consists of subjects at both early and late stages. This article aims to provide a large number of optimal cut-point selection methods for such settings. We also study confidence interval estimation of the optimal cut-points. Simulation studies are carried out to explore the performance of the proposed cut-point selection methods as well as the confidence interval estimation methods. A real ovarian cancer data set is analyzed using the proposed methods.
Affiliation(s)
- Jia Wang
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
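For orientation, the conventional baseline pools the subclasses (healthy and benign as controls, early and late stage as cases) and picks the cut-point that maximizes the Youden index J = sensitivity + specificity - 1. The Python sketch below implements only that pooled empirical baseline, assuming higher marker values indicate cancer; it is not one of the methods proposed by Wang and Tian, and the function name and marker values are hypothetical.

```python
from typing import Sequence, Tuple


def youden_cut_point(controls: Sequence[float], cases: Sequence[float]) -> Tuple[float, float]:
    """Return (cut_point, J) maximizing the empirical Youden index.

    A subject is called a case when its marker value is >= the cut-point
    (assumption: higher values indicate disease).
    """
    best_c, best_j = None, float("-inf")
    # J can only change at observed values, so those are the only candidates needed.
    for c in sorted(set(controls) | set(cases)):
        sens = sum(x >= c for x in cases) / len(cases)
        spec = sum(x < c for x in controls) / len(controls)
        j = sens + spec - 1.0
        if j > best_j:  # strict ">" keeps the smallest maximizing cut-point
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical subclass data, pooled into the two main classes.
healthy, benign = [0.1, 0.4, 0.5], [0.8, 1.1]
early, late = [1.0, 1.3], [2.0, 2.5]
cut, j = youden_cut_point(healthy + benign, early + late)
print(cut, round(j, 3))  # 1.0 0.8
```

Pooling treats a benign case and a healthy subject identically, which is exactly the simplification that subclass-aware cut-point methods are designed to revisit.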
2
Brewer BC, Bantis LE. Cutoff estimation and construction of their confidence intervals for continuous biomarkers under ternary umbrella and tree stochastic ordering settings. Stat Med 2024; 43:606-623. [PMID: 38038216 PMCID: PMC10880868 DOI: 10.1002/sim.9974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 10/30/2023] [Accepted: 11/17/2023] [Indexed: 12/02/2023]
Abstract
Tuberculosis (TB) studies often involve four different states under consideration, namely, "healthy," "latent infection," "pulmonary active disease," and "extra-pulmonary active disease." While highly accurate clinical diagnostic tests do exist, they are expensive and generally not accessible in the regions where they are most needed; thus, there is interest in assessing the accuracy of new and easily obtainable biomarkers. For some such biomarkers, the typical stochastic ordering assumption might not be justified for all disease classes under study, and the usual ROC methodologies that involve ROC surfaces and hypersurfaces are inadequate. Different types of orderings may be appropriate depending on the setting, and these may involve a number of ambiguously ordered groups that stochastically exhibit larger (or smaller) marker scores than the remaining groups. Recently, there has been scientific interest in ROC methods that can accommodate these so-called "tree" or "umbrella" orderings. However, there is limited work on the estimation of cutoffs in such settings. In this article, we discuss estimation and inference for optimized cutoffs that account for such configurations. We explore different cutoff alternatives and provide parametric, flexible parametric, and non-parametric kernel-based approaches for estimation and inference. We evaluate our approaches using simulations and illustrate them with a real data set involving TB patients.
Affiliation(s)
- Benjamin C Brewer
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Leonidas E Bantis
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
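Kernel-based cutoff estimation rests on smoothing each group's empirical CDF. A minimal Python sketch of that building block follows, shown for a simple two-group comparison rather than the ternary umbrella or tree settings of the article. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, the Youden-type criterion, and all function names are assumptions for illustration, not necessarily the authors' choices.

```python
import math
import statistics
from typing import Sequence


def silverman_bandwidth(x: Sequence[float]) -> float:
    # Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5); reasonable for roughly normal data.
    return 1.06 * statistics.stdev(x) * len(x) ** (-0.2)


def smoothed_cdf(x: Sequence[float], c: float, h: float) -> float:
    # Gaussian-kernel estimate of P(X <= c): the average of Phi((c - x_i) / h).
    return sum(0.5 * (1.0 + math.erf((c - xi) / (h * math.sqrt(2.0)))) for xi in x) / len(x)


def smoothed_youden_cutoff(controls: Sequence[float], cases: Sequence[float]) -> float:
    """Observed value maximizing smoothed sensitivity + specificity - 1.

    Assumes higher marker values indicate disease.
    """
    h0, h1 = silverman_bandwidth(controls), silverman_bandwidth(cases)

    def youden(c: float) -> float:
        spec = smoothed_cdf(controls, c, h0)     # P(control <= c)
        sens = 1.0 - smoothed_cdf(cases, c, h1)  # P(case > c)
        return sens + spec - 1.0

    # Restricting the search to observed values keeps the sketch simple.
    return max(sorted(set(controls) | set(cases)), key=youden)


cutoff = smoothed_youden_cutoff([0.2, 0.5, 0.9, 1.1, 1.4], [1.2, 1.8, 2.1, 2.6, 3.0])
print(cutoff)
```

Smoothing replaces the step-function empirical CDF with a continuous estimate, which is what makes derivative-based optimization and bootstrap inference around a cutoff better behaved; a finer search grid could replace the observed values if needed.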
3
Samawi H, Alsharman M, Keko M, Kersey J. Post-test diagnostic accuracy measures under tree ordering of disease classes. Stat Med 2023; 42:5135-5159. [PMID: 37720999 DOI: 10.1002/sim.9905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 09/19/2023]
Abstract
The medical field commonly employs post-test measures such as predictive values and likelihood ratios to assess diagnostic accuracy. Predictive values, namely the positive and negative predictive values (PPV and NPV), indicate the probability that individuals do (PPV) or do not (NPV) have a target health condition given a positive or negative test result. Likelihood ratios, namely the positive and negative likelihood ratios (LR+ and LR-, respectively), compare the probability of a particular test result between the diseased and non-diseased groups. While predictive values are useful in evaluating diagnostic test accuracy in populations with varying disease prevalence, likelihood ratios provide a direct link between pre-test and post-test probabilities in specific patients. In this study, we introduce and analyze a new approach, called generalized predictive values and likelihood ratios, using a tree ordering of disease classes. We evaluate the effectiveness of these methods through simulation studies and illustrate their use with real data on lung cancer.
Affiliation(s)
- Hani Samawi
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
- Marwan Alsharman
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
- Mario Keko
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
- Jing Kersey
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
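The generalized measures in this article extend the conventional two-class definitions. As a reference point, a minimal Python sketch of those conventional formulas follows (not the tree-ordering generalization). It also shows the odds form of Bayes' theorem that links pre-test and post-test probabilities through a likelihood ratio; the function names and input values are illustrative.

```python
def post_test_measures(sens: float, spec: float, prev: float) -> dict:
    """Conventional two-class post-test measures from sensitivity, specificity, prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    lr_pos = sens / (1 - spec)  # LR+ = P(T+ | D+) / P(T+ | D-)
    lr_neg = (1 - sens) / spec  # LR- = P(T- | D+) / P(T- | D-)
    return {"PPV": ppv, "NPV": npv, "LR+": lr_pos, "LR-": lr_neg}


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    # Bayes' theorem in odds form: post-test odds = pre-test odds * LR.
    odds = pre_test_prob / (1 - pre_test_prob) * lr
    return odds / (1 + odds)


m = post_test_measures(sens=0.9, spec=0.8, prev=0.1)
# With a positive result, updating the prevalence via LR+ reproduces the PPV.
print(round(m["PPV"], 4), round(post_test_probability(0.1, m["LR+"]), 4))  # 0.3333 0.3333
```

The predictive values change with prevalence while LR+ and LR- do not, which is why the likelihood ratios transfer more directly from one patient population to another.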
4
Nan N, Tian L. A new accuracy metric under three classes when subclasses are involved and its confidence interval estimation. Stat Med 2023; 42:5207-5228. [PMID: 37779490 DOI: 10.1002/sim.9908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 07/26/2023] [Accepted: 09/04/2023] [Indexed: 10/03/2023]
Abstract
"Compound multi-class classification" refers to the setting where three or more main classes are involved and at least one of the main classes has multiple subclasses. A common practice in evaluating biomarker performance under "compound multi-class classification" is "subclasses pooling." In this article, we first explore the downsides of accuracy metrics based on pooled data. We then propose a new accuracy measure suitable for "compound multi-class classification" with three ordinal main classes, namely the "volume under the compound ROC surface (VUS_C)." The proposed VUS_C evaluates the accuracy of a biomarker appropriately by identifying main classes without requiring specification of an ordering for marker values of subclasses within each main class. For confidence interval estimation of VUS_C, both parametric and nonparametric methods are studied, and simulation studies are carried out to assess coverage probabilities. A subset of the Alzheimer's Disease Neuroimaging Initiative study dataset is analyzed.
Affiliation(s)
- Nan Nan
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
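For comparison, the ordinary volume under the ROC surface (VUS) for three ordered classes, estimated from pooled data, is the proportion of cross-class triples that are correctly ordered; its chance value is 1/6. This is the kind of pooled-data metric whose downsides the abstract discusses, not VUS_C itself. A minimal Python sketch with hypothetical names and data, ignoring tie corrections:

```python
from itertools import product
from typing import Sequence


def empirical_vus(x1: Sequence[float], x2: Sequence[float], x3: Sequence[float]) -> float:
    """Proportion of triples (a, b, c), one from each class, with a < b < c.

    Assumes marker values increase with class order; ties count as incorrect.
    """
    hits = sum(a < b < c for a, b, c in product(x1, x2, x3))
    return hits / (len(x1) * len(x2) * len(x3))


# Hypothetical pooled data: each main class is the union of its subclasses.
print(empirical_vus([1, 3], [2], [4]))  # 0.5
```

The triple loop costs O(n1 * n2 * n3), which is fine for illustration; sorting-based counting would scale better for large samples.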
5
Feng Y, Tian L. Flexible diagnostic measures and new cut-point selection methods under multiple ordered classes. Pharm Stat 2021; 21:220-240. [PMID: 34449107 DOI: 10.1002/pst.2166] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 07/21/2021] [Accepted: 08/01/2021] [Indexed: 11/08/2022]
Abstract
Medical diagnosis is essentially a classification problem, and it usually involves multiple ordered classes. For example, a cancer diagnosis might be "non-malignant," "early stage," or "late stage." Therefore, appropriate measures are needed to assess the accuracy of diagnostic markers under multiple ordered classes. However, all existing measures fail to differentiate among some distinctly different biomarkers. This paper presents a multi-step procedure for evaluating biomarker accuracy under multiple ordered classes. This procedure leads to two new flexible overall measures as well as three new cut-point selection methods that are computationally simple. The performance of the proposed measures and cut-point selection methods is explored numerically via a simulation study. Finally, an ovarian cancer dataset from the Prostate, Lung, Colorectal, and Ovarian cancer study is analyzed. The proposed accuracy measures were estimated for markers CA125 and HE4, and cut-points were estimated for the risk of ovarian malignancy algorithm score.
Affiliation(s)
- Yingdong Feng
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
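A common reference point for cut-point selection with three ordered classes is the generalized Youden index, which chooses two cut-points c1 <= c2 to maximize (F1(c1) - F2(c1) + F2(c2) - F3(c2)) / 2, where Fk is the CDF of class k. This equals half of (the sum of the three correct classification rates minus 1). The brute-force Python sketch below uses empirical CDFs and is a conventional baseline, not one of the three new methods proposed in this paper; names and data are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations_with_replacement
from typing import Sequence, Tuple


def ecdf(sorted_x: Sequence[float], c: float) -> float:
    # Empirical P(X <= c) for an already-sorted sample.
    return bisect_right(sorted_x, c) / len(sorted_x)


def generalized_youden(x1, x2, x3) -> Tuple[float, float, float]:
    """Return (c1, c2, J) for three ordered classes (class 1 lowest).

    Rule: class 1 if x <= c1, class 2 if c1 < x <= c2, class 3 if x > c2.
    """
    s1, s2, s3 = sorted(x1), sorted(x2), sorted(x3)
    grid = sorted(set(x1) | set(x2) | set(x3))
    best = (grid[0], grid[0], float("-inf"))
    # combinations_with_replacement over a sorted grid yields pairs with c1 <= c2.
    for c1, c2 in combinations_with_replacement(grid, 2):
        j = (ecdf(s1, c1) - ecdf(s2, c1) + ecdf(s2, c2) - ecdf(s3, c2)) / 2
        if j > best[2]:
            best = (c1, c2, j)
    return best


print(generalized_youden([1, 2], [3, 4], [5, 6]))  # (2, 4, 1.0)
```

The search visits O(n^2) cut-point pairs, which is acceptable for moderate samples; the computational burden of such joint searches is one motivation for cut-point methods that are cheaper to compute.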
6
Gao Y, Tian L. Confidence interval estimation for sensitivity and difference between two sensitivities at a given specificity under tree ordering. Stat Med 2021; 40:3695-3723. [PMID: 33906262 DOI: 10.1002/sim.8993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2020] [Revised: 03/24/2021] [Accepted: 04/01/2021] [Indexed: 11/07/2022]
Abstract
This article considers a setting in diagnostic (or biomarker) studies that involves a healthy class and a diseased class, where the latter consists of several subclasses. The problem of interest is to evaluate the accuracy of a biomarker (or a diagnostic test), measured on a continuous scale, in correctly distinguishing healthy subjects from diseased subjects without requiring specification of an ordering in terms of marker values for the subclasses relative to each other within the diseased class. Such a setting is quite common in practice, and it falls within the framework of tree ordering or umbrella ordering. This article explores several parametric and nonparametric approaches for estimating confidence intervals for the sensitivity of a single biomarker and for the difference between the sensitivities of two correlated biomarkers under tree ordering at a given specificity. The performance of all the methods is evaluated and compared in a comprehensive simulation study. A published microarray data set is analyzed using the proposed methods.
Affiliation(s)
- Yi Gao
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
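At a fixed specificity p, the empirical threshold is the p-th quantile of the healthy marker values, and sensitivity is the share of diseased subjects above it. The Python sketch below computes this separately for each diseased subclass. It is a simplified illustration of these ingredients, not the tree-ordering sensitivity or the confidence interval methods studied in the article; the function names and data are hypothetical.

```python
import math
from typing import Dict, Sequence


def threshold_at_specificity(healthy: Sequence[float], spec: float) -> float:
    # Smallest observed healthy value c with empirical P(healthy <= c) >= spec.
    s = sorted(healthy)
    k = max(math.ceil(spec * len(s)) - 1, 0)
    return s[k]


def subclass_sensitivities(healthy: Sequence[float],
                           subclasses: Dict[str, Sequence[float]],
                           spec: float) -> Dict[str, float]:
    # Higher marker values are assumed to indicate disease.
    c = threshold_at_specificity(healthy, spec)
    return {name: sum(x > c for x in xs) / len(xs) for name, xs in subclasses.items()}


print(subclass_sensitivities([1, 2, 3, 4], {"A": [2.5, 3.5, 5, 6], "B": [4, 5]}, spec=0.75))
# {'A': 0.75, 'B': 1.0}
```

Because the threshold is fixed by the healthy class alone, no ordering among the diseased subclasses is needed to compute these quantities, which mirrors the tree-ordering setting described above.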
7
Feng Y, Tian L. Issues and solutions in biomarker evaluation when subclasses are involved under binary classification. Stat Methods Med Res 2020; 30:87-98. [PMID: 32726186 DOI: 10.1177/0962280220938077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
In practice, it is common to evaluate biomarkers in binary classification settings (e.g. non-cancer vs. cancer) where one or both main classes involve multiple subclasses. For example, the non-cancer class might consist of healthy subjects and benign cases, while the cancer class might consist of subjects at early and late stages. The standard practice is pooling within each main class, i.e. all non-cancer subclasses are pooled together to create a control group, and all cancer subclasses are pooled together to create a case group. Based on the pooled data, the area under the ROC curve (AUC) and other characteristics are estimated under binary classification for the purpose of biomarker evaluation. Despite the popularity of this pooling strategy in practice, its validity and implications in biomarker evaluation have never been carefully inspected. This paper aims to demonstrate that the pooling strategy can be seriously misleading in biomarker evaluation. Furthermore, we present a new diagnostic framework as well as new accuracy measures appropriate for biomarker evaluation under such settings. Finally, an ovarian cancer data set is analyzed.
Affiliation(s)
- Yingdong Feng
- Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
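A small numeric illustration of how pooling can hide a problem: the empirical (Mann-Whitney) AUC on pooled data is a size-weighted average of the subclass-pair AUCs, so one poorly separated pair can be masked. The Python sketch below uses hypothetical data in which early-stage cases sit entirely below benign controls; it illustrates the general point and is not the framework or measures proposed in the paper.

```python
from itertools import product
from typing import Sequence


def empirical_auc(controls: Sequence[float], cases: Sequence[float]) -> float:
    # Mann-Whitney estimate of P(case > control), counting ties as 1/2.
    score = sum(1.0 if y > x else 0.5 if y == x else 0.0
                for x, y in product(controls, cases))
    return score / (len(controls) * len(cases))


healthy, benign = [0, 1, 2], [5, 6, 7]  # non-cancer subclasses
early, late = [3, 4], [8, 9]            # cancer subclasses

print(empirical_auc(healthy + benign, early + late))  # 0.75
print(empirical_auc(benign, early))                   # 0.0
```

A pooled AUC of 0.75 looks respectable, yet the marker completely fails to separate early-stage cancer from benign cases, arguably the comparison that matters most clinically.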
8
Wang D, Feng Y, Attwood K, Tian L. Optimal threshold selection methods under tree or umbrella ordering. J Biopharm Stat 2018; 29:98-114. [DOI: 10.1080/10543406.2018.1489410] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/28/2022]
Affiliation(s)
- Dan Wang
- TTx/Biomarker Statistics, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN, USA
- Yingdong Feng
- Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
- Kristopher Attwood
- Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
9
Feng Y, Tian L. Measuring diagnostic accuracy for biomarkers under tree-ordering. Stat Methods Med Res 2018; 28:1328-1346. [DOI: 10.1177/0962280218755810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
In the field of diagnostic studies for tree or umbrella ordering, under which the marker measurement for one class is lower or higher than those for the remaining unordered classes, there exist a few diagnostic measures such as the naive AUC (NAUC), the umbrella volume (UV), and the recently proposed TAUC, i.e. the area under a ROC curve for tree or umbrella ordering (TROC). However, an important characteristic of tree or umbrella ordering has been neglected. This paper mainly focuses on promoting the use of the integrated false negative rate under tree ordering (ITFNR) as an additional diagnostic measure besides TAUC, and proposes using the pair (TAUC, ITFNR) instead of TAUC alone to evaluate the diagnostic accuracy of a biomarker under tree or umbrella ordering. Parametric and non-parametric approaches for constructing a joint confidence region of (TAUC, ITFNR) are proposed. Simulation studies under a variety of settings are carried out to assess and compare the performance of these methods. Finally, a published microarray data set is analyzed.
Affiliation(s)
- Yingdong Feng
- Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
10
Fan J, Liang RZ. Stochastic learning of multi-instance dictionary for earth mover’s distance-based histogram comparison. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2603-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/21/2022]