1
Sun J, Tang C, Xie W, Zhou XH. Nonparametric receiver operating characteristic curve analysis with an imperfect gold standard. Biometrics 2024; 80:ujae063. [PMID: 38994641 DOI: 10.1093/biomtc/ujae063] [Received: 11/26/2023] [Revised: 03/30/2024] [Accepted: 06/25/2024]
Abstract
This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUCs) in the presence of an imperfect gold standard, a common issue in diagnostic accuracy studies. We study the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach relies on the known or estimable accuracy of this imperfect reference standard and on the conditional independence assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. When the accuracy of the imperfect reference standard is unknown, we establish that although ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Unlike existing methods, the proposed methods are nonparametric and therefore do not rely on parametric model assumptions. In addition, they are applicable both to ROC/AUC analysis of continuous biomarkers and to AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate with applications to two real-world diagnostic studies.
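The identification argument for the known-accuracy case can be illustrated with a short sketch. Assuming the reference standard's sensitivity `se` and specificity `sp` are known (with `se + sp > 1`) and that the biomarker and the reference are conditionally independent given true disease status, the biomarker distribution within each reference group is a known two-component mixture of the true diseased and non-diseased distributions. Inverting that 2-by-2 mixing matrix turns each observation into signed weights, from which an empirical (Mann-Whitney type) AUC follows. This is a minimal Python sketch of that argument, not the authors' estimator; the names `corrected_auc` and `simulate` and the binormal simulation settings are illustrative assumptions.

```python
import random
from statistics import NormalDist


def corrected_auc(t, r, se, sp):
    """AUC of biomarker t for the latent true disease status D.

    Sketch only. Assumes: only an imperfect reference r is observed
    (1 = reference positive); its sensitivity se and specificity sp are
    known with se + sp > 1; and t is independent of r given D.
    """
    n = len(t)
    n1 = sum(r)
    n0 = n - n1
    if se + sp <= 1 or n1 == 0 or n0 == 0:
        raise ValueError("need se + sp > 1 and both reference groups non-empty")
    p1, p0 = n1 / n, n0 / n                    # P(R = 1), P(R = 0)
    prev = (p1 + sp - 1) / (se + sp - 1)       # implied prevalence P(D = 1)
    if not 0 < prev < 1:
        raise ValueError("implied prevalence is outside (0, 1)")
    # Mixing matrix M with [G1, G0] = M [F1, F0], where G_r is the biomarker
    # CDF in reference group r and F_d is the CDF in true disease group d.
    a = se * prev / p1
    b = (1 - sp) * (1 - prev) / p1
    c = (1 - se) * prev / p0
    d = sp * (1 - prev) / p0
    det = a * d - b * c
    # Inverting M gives each observation a signed mass for F1 and for F0.
    w1 = [d / (det * n1) if ri else -b / (det * n0) for ri in r]
    w0 = [-c / (det * n1) if ri else a / (det * n0) for ri in r]
    # AUC = sum_i sum_j w1_i * w0_j * [1(t_i > t_j) + 0.5 * 1(t_i == t_j)],
    # accumulated in a single pass over the sorted biomarker values.
    order = sorted(range(n), key=lambda i: t[i])
    auc, below0, k = 0.0, 0.0, 0
    while k < n:
        m = k
        while m < n and t[order[m]] == t[order[k]]:
            m += 1
        tie1 = sum(w1[i] for i in order[k:m])
        tie0 = sum(w0[i] for i in order[k:m])
        auc += tie1 * (below0 + 0.5 * tie0)
        below0 += tie0
        k = m
    return auc


def simulate(n, se, sp, prev, seed=0):
    """Hypothetical binormal example: N(0, 1) vs N(1, 1) plus a noisy reference."""
    rng = random.Random(seed)
    t, r = [], []
    for _ in range(n):
        diseased = rng.random() < prev
        t.append(rng.gauss(1.0 if diseased else 0.0, 1.0))
        # Reference errors depend on D only, so t and r are independent given D.
        r.append(1 if rng.random() < (se if diseased else 1 - sp) else 0)
    return t, r


if __name__ == "__main__":
    t, r = simulate(20000, se=0.9, sp=0.85, prev=0.4, seed=1)
    print("naive AUC (reference taken as truth):", corrected_auc(t, r, 1.0, 1.0))
    print("corrected AUC:", corrected_auc(t, r, 0.9, 0.85))
    print("true AUC:", NormalDist().cdf(2 ** -0.5))
```

Setting `se = sp = 1` reduces the function to the ordinary Mann-Whitney AUC, so the same call shows how much an error-prone reference pulls the naive AUC below the true value in the simulated setting.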
Affiliation(s)
- Jiarui Sun
  - Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China
- Chao Tang
  - Beijing Airdoc Technology Co., Ltd., Beijing, 100089, China
- Wuxiang Xie
  - Heart and Vascular Health Research Center, Peking University Clinical Research Institute, Peking University First Hospital, Beijing, 100034, China
  - Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, 100083, China
- Xiao-Hua Zhou
  - Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China
  - Department of Biostatistics and People's Hospital, Peking University, Beijing, 100191, China
2
Wong KY, Zeng D, Lin DY. Semiparametric latent-class models for multivariate longitudinal and survival data. Ann Stat 2022; 50:487-510. [PMID: 35813218 PMCID: PMC9269993 DOI: 10.1214/21-aos2117]
Abstract
In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal and survival outcomes. We combine nonparametric maximum likelihood estimation with sieve estimation and devise an efficient EM algorithm to implement the proposed approach. We establish the asymptotic properties of the proposed estimators through novel use of modern empirical process theory, sieve estimation theory, and semiparametric efficiency theory. Finally, we demonstrate the advantages of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities study.
Affiliation(s)
- Kin Yau Wong
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, USA
- D. Y. Lin
  - Department of Biostatistics, University of North Carolina at Chapel Hill, USA
3
Liu S, Yu T. Kernel density estimation in mixture models with known mixture proportions. Stat Med 2021; 40:6360-6372. [PMID: 34474504 DOI: 10.1002/sim.9187] [Received: 07/08/2020] [Revised: 06/18/2021] [Accepted: 08/17/2021]
Abstract
In this article, we consider density estimation for data with a mixture structure, where the component densities are unknown but, for each observation, the probabilities of membership in each subpopulation are known or can be estimated from other sources. Such data arise in practice and have wide applications. Motivated by the classical kernel density estimation method for a single population, we propose a weighted kernel density estimation method to estimate the component density functions nonparametrically. Within the framework of the EM algorithm, we derive an algorithm that computes the proposed estimates effectively. Extensive simulation studies demonstrate that our methods outperform existing methods in most settings. We further compare our methods with existing methods using real data examples.
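A minimal sketch of a weighted kernel density estimator of this kind follows, assuming a Gaussian kernel, a single fixed bandwidth `h`, and an EM-style loop: the E-step multiplies each observation's known membership probabilities by the current component density estimates and renormalizes, and the M-step recomputes each component's KDE with those posterior weights. The name `component_kdes`, the simulation helper, and this exact update scheme are illustrative assumptions rather than the published algorithm, and bandwidth selection is not addressed.

```python
import math
import random


def _gauss_kernel(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def component_kdes(x, p, h, n_iter=20):
    """EM-style weighted KDEs for the K component densities of a mixture.

    Sketch only. x: n observations; p: n x K known membership probabilities
    (rows sum to 1); h: one fixed bandwidth shared by all components (assumed).
    Returns the final posterior weights and K density functions.
    """
    n, k_comp = len(x), len(p[0])
    # Kernel matrix K_h(x_i - x_j), computed once and reused every iteration.
    kmat = [[_gauss_kernel((xi - xj) / h) / h for xj in x] for xi in x]
    w = [list(row) for row in p]  # start from the known probabilities
    for _ in range(n_iter):
        totals = [sum(w[j][k] for j in range(n)) for k in range(k_comp)]
        new_w = []
        for i in range(n):
            # Current weighted-KDE estimate of each component density at x_i.
            dens = [sum(w[j][k] * kmat[i][j] for j in range(n)) / totals[k]
                    for k in range(k_comp)]
            # E-step: known probability times density, renormalized per observation.
            num = [p[i][k] * dens[k] for k in range(k_comp)]
            s = sum(num)
            new_w.append([v / s for v in num])
        w = new_w  # M-step: these weights define the next weighted KDEs

    def density(k):
        total = sum(w[j][k] for j in range(n))
        return lambda t: sum(w[j][k] * _gauss_kernel((t - x[j]) / h)
                             for j in range(n)) / (total * h)

    return w, [density(k) for k in range(k_comp)]


def simulate_mixture(n, seed=0):
    """Hypothetical example: components N(-2, 1) and N(2, 1); each subject's
    known probability of belonging to component 0 is drawn uniformly."""
    rng = random.Random(seed)
    x, p = [], []
    for _ in range(n):
        q = rng.random()
        comp = 0 if rng.random() < q else 1
        x.append(rng.gauss(-2.0 if comp == 0 else 2.0, 1.0))
        p.append([q, 1.0 - q])
    return x, p


if __name__ == "__main__":
    x, p = simulate_mixture(150, seed=0)
    _, (f0, f1) = component_kdes(x, p, h=0.5)
    print(f0(-2.0), f0(2.0), f1(-2.0), f1(2.0))
```

When each row of `p` is a 0/1 indicator, the loop leaves the weights unchanged and each component estimate reduces to an ordinary KDE of its own group.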
Affiliation(s)
- Siyun Liu
  - Department of Statistics and Data Science, National University of Singapore, Singapore
- Tao Yu
  - Department of Statistics and Data Science, National University of Singapore, Singapore
4
Yu T, Li P, Qin J. Maximum smoothed likelihood component density estimation in mixture models with known mixing proportions. Electron J Stat 2019. [DOI: 10.1214/19-ejs1620]
5
Xu K, Ma Y, Wang Y. Nonparametric distribution estimation in the presence of familial correlation and censoring. Electron J Stat 2017. [DOI: 10.1214/17-ejs1274]
6
Wang Y, Liang B, Tong X, Marder K, Bressman S, Orr-Urtreger A, Giladi N, Zeng D. Efficient estimation of nonparametric genetic risk function with censored data. Biometrika 2015; 102:515-532. [PMID: 26412864 PMCID: PMC4581539 DOI: 10.1093/biomet/asv030]
Abstract
With an increasing number of causal genes discovered for complex human disorders, it is crucial to assess the genetic risk of disease onset for individuals who carry these causal mutations and to compare their age-at-onset distribution with that of non-carriers. In many genetic epidemiological studies that aim to estimate the effect of a causal gene on disease, age at onset is subject to censoring. In addition, some individuals' carrier or non-carrier status can be unknown because of the high cost of in-person ascertainment to collect DNA samples or because older individuals have died. Instead, the probability of these individuals' mutation status can be obtained from various sources. When mutation status is missing, the available data take the form of censored mixture data. Various methods have recently been proposed for risk estimation from such data, but none is efficient for estimating a nonparametric distribution. We propose a fully efficient sieve maximum likelihood estimation method, in which we estimate the logarithm of the hazard ratio between genetic mutation groups using B-splines, while applying nonparametric maximum likelihood estimation to the reference baseline hazard function. Our estimator can be computed via an expectation-maximization algorithm that is much faster than existing methods. We show that our estimator is consistent and semiparametrically efficient and establish its asymptotic distribution. Simulation studies demonstrate the superior performance of the proposed method, which is applied to estimate the age-at-onset distribution of Parkinson's disease for carriers of mutations in the leucine-rich repeat kinase 2 gene.
Affiliation(s)
- Yuanjia Wang
  - Department of Biostatistics, Mailman School of Public Health, 722 W 168th Street, New York 10032, U.S.A.
- Baosheng Liang
  - School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China
- Xingwei Tong
  - School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China
- Karen Marder
  - Department of Neurology and Psychiatry, College of Physicians and Surgeons, Columbia University, New York 10032, U.S.A.
- Susan Bressman
  - The Alan and Barbara Mirken Department of Neurology, Beth Israel Medical Center, New York, 10003, U.S.A.
- Avi Orr-Urtreger
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir Giladi
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Donglin Zeng
  - Department of Biostatistics, CB #7420, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7420, U.S.A.
7
Qin J, Garcia TP, Ma Y, Tang MX, Marder K, Wang Y. Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. Ann Appl Stat 2014; 8:1182-1208. [PMID: 25404955 DOI: 10.1214/14-aoas730]
Abstract
In certain genetic studies, clinicians and genetic counselors are interested in estimating the cumulative risk of a disease for individuals with and without a rare deleterious mutation. Estimating the cumulative risk is difficult, however, when the estimates are based on family history data. Often, the genetic mutation status of many family members is unknown; instead, only estimated probabilities of a patient having a certain mutation status are available. In addition, ages at disease onset are subject to right censoring. Existing methods for estimating the cumulative risk from such family-based data provide estimates only at individual time points, and these estimates are not guaranteed to be either monotonic or non-negative. In this paper, we develop a novel method that combines the Expectation-Maximization algorithm with isotonic regression to estimate the cumulative risk across the entire support. Our estimator is monotonic, satisfies self-consistent estimating equations, and has high power to detect differences between the cumulative risks of different populations. Applying our estimator to a Parkinson's disease (PD) study yields the age-at-onset distribution of PD in PARK2 mutation carriers and non-carriers, and reveals a significant difference between compound heterozygous carriers and non-carriers, but not between heterozygous carriers and non-carriers.
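Only the isotonic-regression ingredient is sketched here, as an illustration. The weighted pool-adjacent-violators algorithm (PAVA) takes pointwise cumulative-risk estimates on an increasing time grid, which may dip or leave [0, 1], and returns the non-decreasing sequence closest to them in weighted least squares. How the paper interleaves this step with its EM updates is not reproduced; the helper `monotone_cumulative_risk` and its final clipping to [0, 1] are assumptions of this sketch.

```python
def pava(y, w=None):
    """Weighted pool-adjacent-violators algorithm.

    Returns the non-decreasing sequence that minimizes
    sum_i w_i * (y_i - fit_i) ** 2.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def monotone_cumulative_risk(pointwise, w=None):
    """Hypothetical helper: make pointwise cumulative-risk estimates on an
    increasing time grid non-decreasing, then clip them to [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in pava(pointwise, w)]


if __name__ == "__main__":
    # Made-up pointwise estimates that dip and overshoot.
    raw = [-0.02, 0.10, 0.08, 0.25, 0.22, 0.40, 1.03]
    print(monotone_cumulative_risk(raw))
```

Clipping a non-decreasing sequence with a non-decreasing map keeps it non-decreasing, so the output is a valid set of cumulative-risk values on the grid.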
Affiliation(s)
- Jing Qin
  - Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, 6700B Rockledge Drive, MSC 7609, Bethesda, MD 20892-7609
- Tanya P Garcia
  - Department of Epidemiology and Biostatistics, Texas A&M University Health Science Center, TAMU 1266, College Station, TX 77843-1266
- Yanyuan Ma
  - Department of Statistics, Texas A&M University, TAMU 3143, College Station, TX 77843-3143
- Ming-Xin Tang
  - Department of Biostatistics, Columbia University, 630 West 168th Street, New York, New York 10032
- Karen Marder
  - Department of Biostatistics, Columbia University, 630 West 168th Street, New York, New York 10032
- Yuanjia Wang
  - Department of Biostatistics, Columbia University, 630 West 168th Street, New York, New York 10032
8
Ma Y, Wang Y. Estimating disease onset distribution functions in mutation carriers with censored mixture data. J R Stat Soc Ser C Appl Stat 2013. [DOI: 10.1111/rssc.12025]
Affiliation(s)
- Yanyuan Ma
  - Texas A&M University, College Station, USA
9
Wang Y, Garcia TP, Ma Y. Nonparametric estimation for censored mixture data with application to the Cooperative Huntington's Observational Research Trial. J Am Stat Assoc 2012; 107:1324-1338. [PMID: 24489419 DOI: 10.1080/01621459.2012.699353]
Abstract
This work presents methods for estimating genotype-specific distributions from genetic epidemiology studies in which event times are subject to right censoring, genotypes are not directly observed, and the data arise from a mixture of scientifically meaningful subpopulations. Examples of such studies include kin-cohort studies and quantitative trait locus (QTL) studies. Current methods for analyzing censored mixture data include two types of nonparametric maximum likelihood estimators (NPMLEs) that make no parametric assumptions about the genotype-specific density functions. Although both NPMLEs are commonly used, we show that one is inefficient and the other is inconsistent. To overcome these deficiencies, we propose three classes of consistent nonparametric estimators that do not assume parametric density models and are easy to implement. They are based on inverse probability weighting (IPW), augmented IPW (AIPW), and nonparametric imputation (IMP). The AIPW estimator achieves the efficiency bound without additional modeling assumptions. Extensive simulation experiments demonstrate satisfactory performance of these estimators even when the data are heavily censored. We apply these estimators to the Cooperative Huntington's Observational Research Trial (COHORT) and provide age-specific estimates of the effect of mutation in the Huntington gene on mortality using a sample of family members. The close agreement between the estimated non-carrier survival rates and those of the U.S. population indicates small ascertainment bias in the COHORT family sample. Our analyses underscore an elevated risk of death in Huntington gene mutation carriers compared with non-carriers over a wide age range, and suggest that the mutation affects survival rates equally in both genders.
The estimated survival rates are useful in genetic counseling, both for providing guidelines on interpreting the risk of death associated with a positive genetic test and for helping at-risk individuals make informed decisions about whether to undergo genetic mutation testing.
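The inverse-probability-weighting idea can be sketched for the simplest case of two genotype groups (carriers and non-carriers) with known carrier probabilities `p_i`. Assuming censoring is independent of onset time and genotype, each observed event is up-weighted by a Kaplan-Meier estimate of the censoring survivor function, which makes its weighted indicator unbiased for 1(T_i ≤ t); the mixture is then inverted pointwise by least squares on (p_i, 1 − p_i). This is a simplified illustration under those assumptions, not the published IPW, AIPW, or IMP estimators; all function names and simulation settings are hypothetical.

```python
import math
import random


def censoring_survival_left(y, delta):
    """Kaplan-Meier estimate of G(y_i-) = P(C >= y_i) for each subject,
    treating censorings (delta == 0) as the 'events'."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    g_left = [0.0] * n
    g, at_risk, k = 1.0, n, 0
    while k < n:
        m = k
        while m < n and y[order[m]] == y[order[k]]:
            m += 1
        for i in order[k:m]:
            g_left[i] = g  # product over censoring times strictly before y_i
        censored = sum(1 for i in order[k:m] if delta[i] == 0)
        g *= 1.0 - censored / at_risk
        at_risk -= m - k
        k = m
    return g_left


def ipw_mixture_cdfs(y, delta, p, grid):
    """Pointwise (F1(t), F0(t)) for carriers / non-carriers at each t in grid.

    Sketch only. y: observed times min(T, C); delta: 1 if T observed;
    p: known P(carrier). Assumes E[1(T_i <= t)] = p_i F1(t) + (1 - p_i) F0(t)
    and censoring independent of T and genotype.
    """
    g_left = censoring_survival_left(y, delta)
    # Normal equations for regressing z_i(t) on (p_i, 1 - p_i), no intercept.
    s11 = sum(q * q for q in p)
    s10 = sum(q * (1 - q) for q in p)
    s00 = sum((1 - q) ** 2 for q in p)
    det = s11 * s00 - s10 * s10
    if det <= 0:
        raise ValueError("carrier probabilities must vary across subjects")
    estimates = []
    for t in grid:
        # IPCW indicator: an observed event by time t, weighted by 1 / G(y_i-).
        z = [1.0 / g if d == 1 and yi <= t else 0.0
             for yi, d, g in zip(y, delta, g_left)]
        b1 = sum(q * zi for q, zi in zip(p, z))
        b0 = sum((1 - q) * zi for q, zi in zip(p, z))
        estimates.append(((s00 * b1 - s10 * b0) / det, (s11 * b0 - s10 * b1) / det))
    return estimates


def simulate(n, seed=0):
    """Hypothetical example: exponential onset (rate 2 for carriers, 0.5
    otherwise) with independent exponential censoring (rate 0.3)."""
    rng = random.Random(seed)
    y, delta, p = [], [], []
    for _ in range(n):
        q = rng.random()
        carrier = rng.random() < q
        t = rng.expovariate(2.0 if carrier else 0.5)
        c = rng.expovariate(0.3)
        y.append(min(t, c))
        delta.append(1 if t <= c else 0)
        p.append(q)
    return y, delta, p


if __name__ == "__main__":
    y, delta, p = simulate(20000, seed=1)
    f1, f0 = ipw_mixture_cdfs(y, delta, p, grid=[1.0])[0]
    print("carriers:", f1, "truth:", 1 - math.exp(-2.0))
    print("non-carriers:", f0, "truth:", 1 - math.exp(-0.5))
```

Because the estimates are computed separately at each grid point, they need not be monotone in t or confined to [0, 1].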
Affiliation(s)
- Yuanjia Wang
  - Department of Biostatistics, Columbia University, New York, NY 10032
- Tanya P Garcia
  - Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143
- Yanyuan Ma
  - Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143