1. Wang J, Tian L. Optimal Cut-Point Selection Methods Under Binary Classification When Subclasses Are Involved. Pharm Stat 2024. [PMID: 38972714 DOI: 10.1002/pst.2413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 05/21/2024] [Accepted: 05/30/2024] [Indexed: 07/09/2024]
Abstract
In practice, we often encounter binary classification problems where both main classes consist of multiple subclasses. For example, in an ovarian cancer study where biomarkers were evaluated for their accuracy in distinguishing noncancer cases from cancer cases, the noncancer class consists of healthy subjects and benign cases, while the cancer class consists of subjects at both early and late stages. This article aims to provide a large number of optimal cut-point selection methods for such settings. Furthermore, we also study confidence interval estimation of the optimal cut-points. Simulation studies are carried out to explore the performance of the proposed cut-point selection methods as well as the confidence interval estimation methods. A real ovarian cancer data set is analyzed using the proposed methods.
Affiliation(s)
- Jia Wang
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA

2. Wang J, Yin J, Tian L. Evaluating joint confidence region of hypervolume under ROC manifold and generalized Youden index. Stat Med 2024; 43:869-889. [PMID: 38115806 DOI: 10.1002/sim.9998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 10/25/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
In biomarker evaluation/diagnostic studies, the hypervolume under the receiver operating characteristic manifold ($\mathrm{HUM}_K$) and the generalized Youden index ($J_K$) are the most popular measures for assessing classification accuracy under multiple classes. While $\mathrm{HUM}_K$ is frequently used to evaluate the overall accuracy, $J_K$ provides a direct measure of accuracy at the optimal cut-points. Simultaneous evaluation of $\mathrm{HUM}_K$ and $J_K$ provides a comprehensive picture of the classification accuracy of the biomarker/diagnostic test under consideration. This article studies both parametric and non-parametric approaches for estimating the confidence region of $\mathrm{HUM}_K$ and $J_K$ for a single biomarker. The performance of the proposed methods is investigated in an extensive simulation study, and the methods are applied to a real data set from the Alzheimer's Disease Neuroimaging Initiative.
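As a rough illustration of the two quantities named in this abstract, the sketch below computes empirical point estimates for three ordered classes ($K = 3$). It assumes that $\mathrm{HUM}_3$ is the probability that one randomly drawn measurement from each class falls in the correct order, and it uses one common three-class form of the Youden index, half of (the sum of the three correct-classification rates minus one), maximized over cut-points $c_1 < c_2$; the article's own normalization of $J_K$ may differ. The function names and toy data are hypothetical, and the sketch does not construct the joint confidence region that the article studies.

```python
from itertools import product


def empirical_hum3(x1, x2, x3):
    """Share of cross-class triples (a, b, c) that are ordered a < b < c."""
    hits = sum(1 for a, b, c in product(x1, x2, x3) if a < b < c)
    return hits / (len(x1) * len(x2) * len(x3))


def empirical_j3(x1, x2, x3):
    """Grid search over observed values; returns (J3, c1, c2) with c1 < c2."""
    grid = sorted(set(x1) | set(x2) | set(x3))
    best = (float("-inf"), None, None)
    for i, c1 in enumerate(grid):
        for c2 in grid[i + 1:]:
            p1 = sum(v <= c1 for v in x1) / len(x1)       # class 1 classified correctly
            p2 = sum(c1 < v <= c2 for v in x2) / len(x2)  # class 2 classified correctly
            p3 = sum(v > c2 for v in x3) / len(x3)        # class 3 classified correctly
            j = 0.5 * (p1 + p2 + p3 - 1.0)
            if j > best[0]:
                best = (j, c1, c2)
    return best


# Hypothetical toy marker values, lowest class first (not data from the article).
class1 = [1.0, 2.0, 3.0]
class2 = [2.5, 4.0, 5.0]
class3 = [4.5, 6.0, 7.0]

hum3 = empirical_hum3(class1, class2, class3)
j3, c1, c2 = empirical_j3(class1, class2, class3)
print(f"HUM_3 = {hum3:.3f}, J_3 = {j3:.3f} at cut-points ({c1}, {c2})")
```

Several cut-point pairs can tie for the maximum on small samples; the grid search keeps the first pair it encounters.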
Affiliation(s)
- Jia Wang
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Jingjing Yin
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
- Lili Tian
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA

3. Kersey J, Samawi H, Yin J, Rochani H, Zhang X. On diagnostic accuracy measure with cut-points criterion for ordinal disease classification based on concordance and discordance. J Appl Stat 2022. [DOI: 10.1080/02664763.2022.2041567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Jing Kersey
  - Department of Biostatistics, Epidemiology and Environmental Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro
- Hani Samawi
  - Department of Biostatistics, Epidemiology and Environmental Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro
- Jingjing Yin
  - Department of Biostatistics, Epidemiology and Environmental Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro
- Haresh Rochani
  - Department of Biostatistics, Epidemiology and Environmental Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro
- Xinyan Zhang
  - Department of Biostatistics, Epidemiology and Environmental Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro

4. Yin J, Samawi H, Tian L. Joint inference about the AUC and Youden index for paired biomarkers. Stat Med 2022; 41:37-64. [PMID: 34964512 DOI: 10.1002/sim.9222] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/07/2020] [Revised: 09/22/2021] [Accepted: 09/27/2021] [Indexed: 11/05/2022]
Abstract
It is common to compare biomarkers' diagnostic or prognostic performance using summary ROC measures such as the area under the ROC curve (AUC) or the Youden index. We propose to compare two paired biomarkers using both the AUC and the Youden index, since the two indices describe different aspects of the ROC curve. This comparison can be made by estimating the joint confidence region (an elliptical area) of the differences of the paired AUCs and the Youden indices. Furthermore, for deciding if one marker is better than the other in terms of both the AUC and the Youden index ($J$), we can test $H_0: \mathrm{AUC}_a \le \mathrm{AUC}_b$ or $J_a \le J_b$ against $H_a: \mathrm{AUC}_a > \mathrm{AUC}_b$ and $J_a > J_b$ using the paired differences. The construction of such a joint hypothesis is an example of the multivariate order-restricted hypotheses. For such a hypothesis, we propose and compare three testing procedures: (1) the intersection-union test (IUT); (2) the conditional test; and (3) the joint test. The performance of the proposed inference methods was evaluated and compared through simulations. The simulation results demonstrate that the proposed joint confidence region maintains the desired confidence level, and all three tests maintain the type I error under the null. Furthermore, among the three proposed testing methods, the conditional test is the preferred approach, with consistently and markedly larger power than the other two competing methods.
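To make the quantities in this abstract concrete, the following sketch computes empirical point estimates of the AUC and the Youden index for two markers and the paired differences between them, and shows the decision rule of an intersection-union test given two one-sided z-statistics. It is a minimal sketch under stated assumptions: higher marker values are taken to indicate disease, the function names and toy data are hypothetical, and the z-statistics are supplied by the caller because the variance estimation, the joint confidence region, and the conditional and joint tests proposed in the article are not reproduced here.

```python
from itertools import product
from statistics import NormalDist


def empirical_auc(controls, cases):
    """Mann-Whitney estimate of P(case > control), counting ties as 1/2."""
    total = 0.0
    for x, y in product(controls, cases):
        if y > x:
            total += 1.0
        elif y == x:
            total += 0.5
    return total / (len(controls) * len(cases))


def empirical_youden(controls, cases):
    """Return (J, cut-point) maximizing sensitivity + specificity - 1.

    A subject is called positive when its marker value exceeds the cut-point.
    """
    best_j, best_c = float("-inf"), None
    for c in sorted(set(controls) | set(cases)):
        specificity = sum(x <= c for x in controls) / len(controls)
        sensitivity = sum(y > c for y in cases) / len(cases)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c


def iut_rejects(z_auc, z_j, alpha=0.05):
    """Intersection-union rule: reject H0 only if both one-sided z-tests reject."""
    critical = NormalDist().inv_cdf(1.0 - alpha)
    return z_auc > critical and z_j > critical


# Hypothetical toy values for markers a and b (not data from the article).
controls_a, cases_a = [1.0, 2.0, 3.0, 4.0], [3.5, 5.0, 6.0, 7.0]
controls_b, cases_b = [1.0, 2.0, 3.0, 4.0], [2.5, 3.5, 5.0, 6.0]

auc_a, auc_b = empirical_auc(controls_a, cases_a), empirical_auc(controls_b, cases_b)
(j_a, cut_a), (j_b, cut_b) = empirical_youden(controls_a, cases_a), empirical_youden(controls_b, cases_b)
delta_auc, delta_j = auc_a - auc_b, j_a - j_b
print(f"AUC difference = {delta_auc:.4f}, Youden difference = {delta_j:.4f}")
```

The intersection-union rule needs no multiplicity adjustment: because the null hypothesis is a union of the two component nulls, running each one-sided test at level alpha keeps the overall type I error at or below alpha.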
Affiliation(s)
- Jingjing Yin
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Georgia Southern University, Statesboro, Georgia, USA
- Hani Samawi
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Georgia Southern University, Statesboro, Georgia, USA
- Lili Tian
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA

5. Feng Y, Tian L. Flexible diagnostic measures and new cut-point selection methods under multiple ordered classes. Pharm Stat 2021; 21:220-240. [PMID: 34449107 DOI: 10.1002/pst.2166] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 07/21/2021] [Accepted: 08/01/2021] [Indexed: 11/08/2022]
Abstract
Medical diagnosis is essentially a classification problem, and it usually involves multiple ordered classes. For example, a cancer diagnosis might be "non-malignant," "early stage," or "late stage." Therefore, appropriate measures are needed to assess the accuracy of diagnostic markers under multiple ordered classes. However, all existing measures fail to differentiate among some distinctly different biomarkers. This paper presents a multi-step procedure for evaluating biomarker accuracy under multiple ordered classes. This procedure leads to two new flexible overall measures as well as three new cut-point selection methods with great computational ease. The performance of the proposed measures and cut-point selection methods is numerically explored via a simulation study. Finally, an ovarian cancer dataset from the Prostate, Lung, Colorectal, and Ovarian cancer study is analyzed. The proposed accuracy measures were estimated for markers CA125 and HE4, and cut-points were estimated for the risk of ovarian malignancy algorithm score.
Affiliation(s)
- Yingdong Feng
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA
- Lili Tian
  - Department of Biostatistics, University at Buffalo, Buffalo, New York, USA

6. Samawi H, Yin J, Rochani H, Mo C, Kersey J. Post-Test Diagnostic Accuracy Measures of a Continuous Test With a Disease of Ordinal Multistages. Stat Biopharm Res 2021. [DOI: 10.1080/19466315.2021.1873841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Hani Samawi
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA
- Jingjing Yin
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA
- Haresh Rochani
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA
- Chen Mo
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA
- Jing Kersey
  - Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA

7. Hua J, Tian L. A comprehensive and comparative review of optimal cut-points selection methods for diseases with multiple ordinal stages. J Biopharm Stat 2019; 30:46-68. [PMID: 31250693 DOI: 10.1080/10543406.2019.1632876] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/22/2023]
Abstract
Cut-points selection is a key topic in the field of diagnostic studies. For binary classification, there exist several well-developed methods, some of which have been extended to three-class settings and beyond. This paper focuses on optimal cut-points selection methods for diseases with multiple ordinal stages. The purpose of this paper is two-fold: 1) to propose three new cut-points selection methods; and 2) to present a comprehensive simulation study to assess and compare the performance of all the available methods. Two real data sets, one from ovarian cancer and the other from pancreatic cancer, are analyzed.
Affiliation(s)
- Jia Hua
  - Department of Biostatistics, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY, USA
- Lili Tian
  - Department of Biostatistics, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY, USA

8. Qiu Z, Peng L, Manatunga A, Guo Y. A Smooth Nonparametric Approach to Determining Cut-Points of A Continuous Scale. Comput Stat Data Anal 2018; 134:86-210. [PMID: 31467457 DOI: 10.1016/j.csda.2018.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
The problem of determining cut-points of a continuous scale according to an established categorical scale is often encountered in practice, for purposes such as making a diagnosis or treatment recommendation, determining study eligibility, or facilitating interpretation. A general analytic framework was recently proposed for assessing optimal cut-points defined by pre-specified criteria. However, implementing the existing nonparametric estimators under this framework, together with the associated inference, can be computationally intensive when more than a few cut-points need to be determined. To address this issue, a smoothing-based modification of the current method is proposed and is found to substantially improve the computational speed as well as the asymptotic convergence rate. Moreover, a plug-in type variance estimation procedure is developed to further facilitate the computation. Extensive simulation studies confirm the theoretical results and demonstrate the computational benefits of the proposed method. The practical utility of the new approach is illustrated by an application to a mental health study.
Affiliation(s)
- Zhiping Qiu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.; School of Mathematical Sciences, Huaqiao University, Quanzhou, China
- Limin Peng
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.
- Amita Manatunga
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.
- Ying Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, U.S.A.

9. Feng Y, Tian L. Measuring diagnostic accuracy for biomarkers under tree-ordering. Stat Methods Med Res 2018; 28:1328-1346. [DOI: 10.1177/0962280218755810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
In the field of diagnostic studies for tree or umbrella ordering, under which the marker measurement for one class is lower or higher than those for the remaining unordered classes, there exist a few diagnostic measures such as the naive AUC (NAUC), the umbrella volume (UV), and the recently proposed TAUC, i.e., the area under a ROC curve for tree or umbrella ordering (TROC). However, an important characteristic of tree or umbrella ordering has been neglected. This paper mainly focuses on promoting the use of the integrated false negative rate under tree ordering (ITFNR) as an additional diagnostic measure besides TAUC, and on proposing the use of (TAUC, ITFNR) instead of TAUC alone to evaluate the diagnostic accuracy of a biomarker under tree or umbrella ordering. Parametric and non-parametric approaches for constructing a joint confidence region of (TAUC, ITFNR) are proposed. Simulation studies under a variety of settings are carried out to assess and compare the performance of these methods. Finally, a published microarray data set is analyzed.
Affiliation(s)
- Yingdong Feng
  - Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
- Lili Tian
  - Department of Biostatistics, University at Buffalo, Buffalo, NY, USA