1. Li S, Hu T, Wang L, McMahan CS, Tebbs JM. Regression analysis of group-tested current status data. Biometrika 2024; 111:1047-1061. [PMID: 39691693 PMCID: PMC11648127 DOI: 10.1093/biomet/asae006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/09/2022] [Indexed: 12/19/2024]
Abstract
Group testing is an effective way to reduce the time and cost associated with conducting large-scale screening for infectious diseases. Benefits are realized through testing pools formed by combining specimens, such as blood or urine, from different individuals. In some studies, individuals are assessed only once and a time-to-event endpoint is recorded, for example, the time until infection. Combining group testing with this type of endpoint results in group-tested current status data (Petito & Jewell, 2016). To analyse these complex data, we propose methods that estimate a proportional hazard regression model based on test outcomes from measuring the pools. A sieve maximum likelihood estimation approach is developed that approximates the cumulative baseline hazard function with a piecewise constant function. To identify the sieve estimator, a computationally efficient expectation-maximization algorithm is derived by using data augmentation. Asymptotic properties of both the parametric and nonparametric components of the sieve estimator are then established by applying modern empirical process theory. Numerical results from simulation studies show that our proposed method performs nominally and has advantages over the corresponding estimation method based on individual testing results. We illustrate our work by analysing a chlamydia dataset collected by the State Hygienic Laboratory at the University of Iowa.
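To make the data structure concrete, the following Python sketch computes the probability that a pool contains at least one individual whose event has already occurred by that individual's observation time, under a proportional hazards model with a step-function cumulative baseline hazard. It is only an illustration and not the authors' sieve estimator or EM algorithm: the function names, the knots and heights, the independence of pool members, and a perfectly accurate assay are all assumptions made here.

```python
import math
from bisect import bisect_right

def step_cum_hazard(knots, heights):
    """Return a step-function cumulative baseline hazard (hypothetical knots/heights).

    heights must be nondecreasing; the function is 0 before the first knot.
    """
    def cum_hazard(t):
        idx = bisect_right(knots, t)  # number of knots at or before t
        return heights[idx - 1] if idx > 0 else 0.0
    return cum_hazard

def survival(t, x, beta, cum_hazard):
    """S(t | x) = exp(-Lambda0(t) * exp(x'beta)) under proportional hazards."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-cum_hazard(t) * math.exp(linear_predictor))

def pool_positive_prob(times, covariates, beta, cum_hazard):
    """P(at least one pool member is event-positive), assuming a perfect assay
    and independent individuals (both are simplifying assumptions)."""
    prob_all_negative = 1.0
    for t, x in zip(times, covariates):
        prob_all_negative *= survival(t, x, beta, cum_hazard)
    return 1.0 - prob_all_negative

# Usage: a pool of two individuals observed at times 1.5 and 2.5.
cum_hazard = step_cum_hazard([1.0, 2.0], [0.1, 0.3])
print(pool_positive_prob([1.5, 2.5], [[0.0], [1.0]], [0.7], cum_hazard))
```

A likelihood for pooled outcomes could be built from such pool-level probabilities, with imperfect sensitivity and specificity entering as additional parameters.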
Affiliation(s)
- Shuwei Li
  - School of Economics and Statistics, Guangzhou University, Daxuecheng Road 230, Guangzhou, Guangdong 510006, China
- Tao Hu
  - School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
- Lianming Wang
  - Department of Statistics, University of South Carolina, 209A LeConte College, Columbia, South Carolina 29208, USA
- Christopher S McMahan
  - School of Mathematical and Statistical Sciences, Clemson University, Martin Hall, Clemson, South Carolina 29634, USA
- Joshua M Tebbs
  - Department of Statistics, University of South Carolina, 217 LeConte College, Columbia, South Carolina 29208, USA
2. Huang R, McLain AC, Herrin BH, Nolan M, Cai B, Self S. Bayesian group testing regression models for spatial data. Spat Spatiotemporal Epidemiol 2024; 50:100677. [PMID: 39181610 PMCID: PMC11347770 DOI: 10.1016/j.sste.2024.100677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 07/12/2024] [Accepted: 07/15/2024] [Indexed: 08/27/2024]
Abstract
Spatial patterns are common in infectious disease epidemiology. Disease mapping is essential to infectious disease surveillance. Under a group testing protocol, biomaterial from multiple individuals is physically combined into a pooled specimen, which is then tested for infection. If the pool tests negative, all contributing individuals are generally assumed to be uninfected. If the pool tests positive, the individuals are usually retested to determine who is infected. When the prevalence of infection is low, group testing provides significant cost savings over traditional individual testing by reducing the number of tests required. However, the lack of statistical methods capable of producing maps from group testing data has limited the use of group testing in disease mapping. We develop a Bayesian methodology that can simultaneously map disease prevalence using group testing data and identify risk factors for infection. We illustrate its real-world utility using two datasets from vector-borne disease surveillance.
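The cost argument for the two-stage protocol described above (test the pool, then retest individuals only if the pool is positive) can be sketched with a short calculation. The Python function below is an illustrative assumption, not part of the cited methodology: it assumes a perfectly accurate assay, independent infection statuses, and a common prevalence for all individuals.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected number of tests per individual under two-stage pooling.

    One pooled test is shared by pool_size individuals; if the pool is
    positive, every member is retested individually.
    """
    if pool_size == 1:
        return 1.0  # individual testing: exactly one test each
    prob_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - prob_pool_negative)

# Usage: compare low and high prevalence for pools of size 10.
for p in (0.01, 0.5):
    print(p, expected_tests_per_person(p, 10))
```

Values below 1 indicate savings over individual testing. Under these assumptions the savings appear at low prevalence and vanish when prevalence is high, which matches the qualitative claim in the abstract.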
Affiliation(s)
- Rongjie Huang
  - Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Alexander C McLain
  - Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Brian H Herrin
  - College of Veterinary Medicine, Kansas State University, 1700 Denison Ave, Manhattan, 66502, KS, USA
- Melissa Nolan
  - Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Bo Cai
  - Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Stella Self
  - Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
3. Juwara L, Yang YA, Velly AM, Saha-Chaudhuri P. Privacy-preserving analysis of time-to-event data under nested case-control sampling. Stat Methods Med Res 2024; 33:96-111. [PMID: 38093410 PMCID: PMC10863373 DOI: 10.1177/09622802231215804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
Analyses of distributed data networks of rare diseases are constrained by legitimate privacy and ethical concerns. Analytical centers (e.g. research institutions) are thus confronted with the challenging task of obtaining data from recruiting sites that are often unable or unwilling to share personal records of participants. For time-to-event data, recently popularized disclosure techniques with privacy guarantees (e.g., Differentially Private Generative Adversarial Networks) are generally computationally expensive or inaccessible to applied researchers. To perform the widely used Cox proportional hazards regression, we propose an easy-to-implement privacy-preserving data analysis technique by pooling (i.e. aggregating) individual records of covariates at recruiting sites under the nested case-control sampling framework before sharing the pooled nested case-control subcohort. We show that the pooled hazard ratio estimators, under the pooled nested case-control subsamples from the contributing sites, are maximum likelihood estimators and provide consistent estimates of the individual level full cohort HRs. Furthermore, a sampling technique for generating pseudo-event times for individual subjects that constitute the pooled nested case-control subsamples is proposed. Our method is demonstrated using extensive simulations and analysis of the National Lung Screening Trial data. The utility of our proposed approach is compared to the gold standard (full cohort) and synthetic data generated using classification and regression trees. The proposed pooling technique performs to near-optimal levels comparable to full cohort analysis or synthetic data; the efficiency improves in rare event settings when more controls are matched on during nested case-control subcohort sampling.
Affiliation(s)
- Lamin Juwara
  - Quantitative Life Sciences, McGill University, Montreal, Canada
  - Lady Davis Institute for Medical Research, Montreal, Quebec, Canada
- Yi Archer Yang
  - Quantitative Life Sciences, McGill University, Montreal, Canada
  - Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada
- Ana M Velly
  - Lady Davis Institute for Medical Research, Montreal, Quebec, Canada
  - Department of Dentistry, McGill University, Montreal, Quebec, Canada
4. Roy S, Adhya S, Rana S. Estimation of odds ratio from group testing data with misclassified exposure. Biom J 2024; 66:e2200254. [PMID: 38285402 DOI: 10.1002/bimj.202200254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 08/21/2023] [Accepted: 09/17/2023] [Indexed: 01/30/2024]
Abstract
For low prevalence disease, we consider estimation of the odds ratio for two specified groups of individuals using group testing data. Broadly the two groups may be classified as "the exposed" and "the unexposed." Often in observational studies, the exposure status is not correctly recorded. In addition, diagnostic tests are rarely completely accurate. The proposed model accounts for imperfect sensitivity and specificity of diagnostic tests along with the misclassification in the exposure status. For model identifiability, we make use of internal validation data, where a subsample of reasonably small size is selected from the original sample by simple random sampling without replacement. Pseudo-maximum likelihood method is employed for the estimation of the model parameters. The performance of group testing methodology is compared with individual testing for different parametric configurations. A limited data study related to COVID-19 prevalence is performed to illustrate the methodology.
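As a rough illustration of how imperfect sensitivity and specificity enter such an analysis, the Python sketch below inverts the pool-positive probability to recover a prevalence estimate for each group and then forms an odds ratio. This is a deliberately simplified stand-in, not the pseudo-maximum likelihood method of the paper: it assumes known sensitivity and specificity, equal pool sizes, independent individuals, and correctly recorded exposure status (that is, it ignores the exposure misclassification the paper addresses). All function and parameter names are hypothetical.

```python
def prevalence_from_pools(n_positive_pools, n_pools, pool_size,
                          sensitivity=1.0, specificity=1.0):
    """Estimate individual-level prevalence from pooled test results.

    Uses P(pool tests positive) = Se * (1 - q) + (1 - Sp) * q,
    where q = (1 - p)^k is the probability that a pool of size k is truly negative.
    """
    pool_positive_rate = n_positive_pools / n_pools
    q = (sensitivity - pool_positive_rate) / (sensitivity + specificity - 1.0)
    q = min(1.0, max(0.0, q))  # keep the estimate inside [0, 1]
    return 1.0 - q ** (1.0 / pool_size)

def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing the exposed group with the unexposed group."""
    return (p_exposed / (1.0 - p_exposed)) / (p_unexposed / (1.0 - p_unexposed))

# Usage with made-up counts: 100 pools of size 5 in each group.
p_exp = prevalence_from_pools(30, 100, 5, sensitivity=0.95, specificity=0.98)
p_unexp = prevalence_from_pools(15, 100, 5, sensitivity=0.95, specificity=0.98)
print(p_exp, p_unexp, odds_ratio(p_exp, p_unexp))
```

The counts in the usage lines are invented for demonstration only and do not come from the cited study.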
Affiliation(s)
- Surupa Roy
  - Department of Statistics, St Xavier's College (Autonomous), Kolkata, West Bengal, India
- Sumanta Adhya
  - Department of Statistics, West Bengal State University, Kolkata, West Bengal, India
- Subrata Rana
  - Department of Statistics, Krishnagar Government College, Kolkata, West Bengal, India
5. Delaigle A, Tan R. Group testing regression analysis with covariates and specimens subject to missingness. Stat Med 2023; 42:731-744. [PMID: 36646446 DOI: 10.1002/sim.9640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 09/06/2022] [Accepted: 12/16/2022] [Indexed: 01/18/2023]
Abstract
We develop parametric estimators of a conditional prevalence in the group testing context. Group testing is applied when a binary outcome variable, often a disease indicator, is assessed by testing a specimen for the presence of the disease. Instead of testing all individual specimens separately, the specimens are pooled in groups and the grouped specimens are tested for the disease, which can significantly reduce the number of tests to be performed. Various techniques have been developed in the literature for estimating a conditional prevalence from group testing data, but most of them are not valid when the data are subject to missingness. We consider this problem in the case where the specimen and the covariates are subject to nonmonotone missingness. We propose parametric estimators of the conditional prevalence, establish identifiability conditions for a logistic missing not at random model, and introduce an ignorable missing at random model. In theory, our estimators could be applied with multiple covariates missing, but in practice, they face numerical challenges when more than one covariate is missing for given individuals. We illustrate the method on simulated data and on a dataset from the Demographics and Health Survey.
Affiliation(s)
- Aurore Delaigle
  - School of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3010, Australia
- Ruoxu Tan
  - School of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3010, Australia
  - Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
6. Wang D, Mou X, Liu Y. Varying-coefficient regression analysis for pooled biomonitoring. Biometrics 2022; 78:1328-1341. [PMID: 34190334 PMCID: PMC8716640 DOI: 10.1111/biom.13516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2020] [Revised: 03/16/2021] [Indexed: 12/30/2022]
Abstract
Human biomonitoring involves measuring the accumulation of contaminants in biological specimens (such as blood or urine) to assess individuals' exposure to environmental contamination. Due to the expensive cost of a single assay, the method of pooling has become increasingly common in environmental studies. The implementation of pooling starts by physically mixing specimens into pools, and then measures pooled specimens for the concentration of contaminants. An important task is to reconstruct individual-level statistical characteristics based on pooled measurements. In this article, we propose to use the varying-coefficient regression model for individual-level biomonitoring and provide methods to estimate the varying coefficients based on different types of pooled data. Asymptotic properties of the estimators are presented. We illustrate our methodology via simulation and with application to pooled biomonitoring of a brominated flame retardant provided by the National Health and Nutrition Examination Survey (NHANES).
Affiliation(s)
- Dewei Wang
  - Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A.
- Xichen Mou
  - Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, U.S.A.
- Yan Liu
  - School of Community Health Sciences, University of Nevada, Reno, NV 89557, U.S.A.
7. Self S, McMahan C, Mokalled S. Capturing the pool dilution effect in group testing regression: A Bayesian approach. Stat Med 2022; 41:4682-4696. [PMID: 35879887 PMCID: PMC9489666 DOI: 10.1002/sim.9532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 01/07/2023]
Abstract
Group (pooled) testing is becoming a popular strategy for screening large populations for infectious diseases. This popularity is owed to the cost savings that can be realized through implementing group testing methods. These methods involve physically combining biomaterial (eg, saliva, blood, urine) collected on individuals into pooled specimens which are tested for an infection of interest. Through testing these pooled specimens, group testing methods reduce the cost of diagnosing all individuals under study by reducing the number of tests performed. Even though group testing offers substantial cost reductions, some practitioners are hesitant to adopt group testing methods due to the so-called dilution effect. The dilution effect describes the phenomenon in which biomaterial from negative individuals dilutes the contributions from positive individuals to such a degree that a pool is incorrectly classified. Ignoring the dilution effect can reduce classification accuracy and lead to bias in parameter estimates and inaccurate inference. To circumvent these issues, we propose a Bayesian regression methodology which directly acknowledges the dilution effect while accommodating data that arise from any group testing protocol. As a part of our estimation strategy, we are able to identify pool-specific optimal classification thresholds which are aimed at maximizing the classification accuracy of the group testing protocol being implemented. These two features working in concert effectively alleviate the primary concerns raised by practitioners regarding group testing. The performance of our methodology is illustrated via an extensive simulation study and by being applied to Hepatitis B data collected on Irish prisoners.
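The dilution effect described above can be illustrated with a toy calculation. The Python sketch below assumes, purely for illustration, that specimens are mixed in equal volumes so that the pooled biomarker level is the average of the individual levels, and that a pool is called positive when this level reaches a fixed threshold. The numbers and function names are hypothetical and are not taken from the cited paper.

```python
def pooled_level(individual_levels):
    """Biomarker level of a pool, assuming equal-volume mixing (an assumption)."""
    return sum(individual_levels) / len(individual_levels)

def pool_tests_positive(individual_levels, threshold):
    """Classify the pool as positive if its level reaches the threshold."""
    return pooled_level(individual_levels) >= threshold

# One positive specimen (level 12.0) pooled with negatives (level 1.0 each).
threshold = 5.0
small_pool = [12.0, 1.0]
large_pool = [12.0, 1.0, 1.0, 1.0, 1.0]
print(pooled_level(small_pool), pool_tests_positive(small_pool, threshold))
print(pooled_level(large_pool), pool_tests_positive(large_pool, threshold))
```

In this toy setting the same positive specimen is detected in the pool of two but missed in the pool of five, which is the kind of misclassification that motivates thresholds chosen separately for each pool.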
Affiliation(s)
- Stella Self
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
- Christopher McMahan
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Stefani Mokalled
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
8. Warasi MS, Hungerford LL, Lahmers K. Optimizing Pooled Testing for Estimating the Prevalence of Multiple Diseases. J Agric Biol Environ Stat 2022; 27:713-727. [PMID: 35975123 PMCID: PMC9373899 DOI: 10.1007/s13253-022-00511-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 05/27/2022] [Accepted: 07/27/2022] [Indexed: 11/25/2022]
Abstract
Pooled testing can enhance the efficiency of diagnosing individuals with diseases of low prevalence. Often, pooling is implemented using standard groupings (2, 5, 10, etc.). On the other hand, optimization theory can provide specific guidelines in finding the ideal pool size and pooling strategy. This article focuses on optimizing the precision of disease prevalence estimators calculated from multiplex pooled testing data. In the context of a surveillance application of animal diseases, we study the estimation efficiency (i.e., precision) and cost efficiency of the estimators with adjustments for the number of expended tests. This enables us to determine the pooling strategies that offer the highest benefits when jointly estimating the prevalence of multiple diseases, such as theileriosis and anaplasmosis. The outcomes of our work can be used in designing pooled testing protocols, not only in simple pooling scenarios but also in more complex scenarios where individual retesting is performed in order to identify positive cases. A software application using the shiny package in R is provided with this article to facilitate implementation of our methods. Supplementary materials accompanying this paper appear online.
Affiliation(s)
- Md S. Warasi
  - Department of Mathematics and Statistics, Radford University, Whitt Hall 224, Radford, VA 24142 USA
- Laura L. Hungerford
  - Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA 24061 USA
- Kevin Lahmers
  - Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA 24061 USA
9. Barlaam A, Sannella A, Ferrari N, Temesgen T, Rinaldi L, Normanno G, Cacciò S, Robertson L, Giangaspero A. Ready-to-eat salads and berry fruits purchased in Italy contaminated by Cryptosporidium spp., Giardia duodenalis, and Entamoeba histolytica. Int J Food Microbiol 2022; 370:109634. [DOI: 10.1016/j.ijfoodmicro.2022.109634] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/01/2021] [Revised: 02/01/2022] [Accepted: 03/13/2022] [Indexed: 01/11/2023]
10. Warasi MS. groupTesting: an R package for group testing estimation. Commun Stat Simul Comput 2021. [DOI: 10.1080/03610918.2021.2009867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Md S. Warasi
  - Department of Mathematics and Statistics, Radford University, Radford, VA, USA
11. Yu J, Huang Y, Shen ZJ. Optimizing and evaluating PCR-based pooled screening during COVID-19 pandemics. Sci Rep 2021; 11:21460. [PMID: 34728759 PMCID: PMC8564549 DOI: 10.1038/s41598-021-01065-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/19/2020] [Accepted: 10/19/2021] [Indexed: 12/13/2022]
Abstract
Population screening played a substantial role in safely reopening the economy and avoiding new outbreaks of COVID-19. PCR-based pooled screening makes it possible to test the population with limited resources by pooling multiple individual samples. Our study compared different population-wide screening methods as transmission-mitigating interventions, including pooled PCR, individual PCR, and antigen screening. Incorporating testing-isolation process and individual-level viral load trajectories into an epidemic model, we further studied the impacts of testing-isolation on test sensitivities. Results show that the testing-isolation process could maintain a stable test sensitivity during the outbreak by removing most infected individuals, especially during the epidemic decline. Moreover, we compared the efficiency, accuracy, and cost of different screening methods during the pandemic. Our results show that PCR-based pooled screening is cost-effective in reversing the pandemic at low prevalence. When the prevalence is high, PCR-based pooled screening may not stop the outbreak. In contrast, antigen screening with sufficient frequency could reverse the epidemic, despite the high cost and the large numbers of false positives in the screening process.
Affiliation(s)
- Jiali Yu
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua University, Shenzhen, China
- Yiduo Huang
  - Department of Civil and Environmental Engineering, University of California Berkeley, Berkeley, CA, USA
- Zuo-Jun Shen
  - College of Engineering, University of California Berkeley, Berkeley, CA, USA
  - Faculty of Engineering and Faculty of Business and Economics, University of Hong Kong, Hong Kong, China
12. Liu Y, McMahan CS, Tebbs JM, Gallagher CM, Bilder CR. Generalized additive regression for group testing data. Biostatistics 2021; 22:873-889. [PMID: 32061081 PMCID: PMC8511943 DOI: 10.1093/biostatistics/kxaa003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2019] [Revised: 01/04/2020] [Accepted: 01/13/2020] [Indexed: 11/13/2022]
Abstract
In screening applications involving low-prevalence diseases, pooling specimens (e.g., urine, blood, swabs, etc.) through group testing can be far more cost effective than testing specimens individually. Estimation is a common goal in such applications and typically involves modeling the probability of disease as a function of available covariates. In recent years, several authors have developed regression methods to accommodate the complex structure of group testing data but often under the assumption that covariate effects are linear. Although linearity is a reasonable assumption in some applications, it can lead to model misspecification and biased inference in others. To offer a more flexible framework, we propose a Bayesian generalized additive regression approach to model the individual-level probability of disease with potentially misclassified group testing data. Our approach can be used to analyze data arising from any group testing protocol with the goal of estimating multiple unknown smooth functions of covariates, standard linear effects for other covariates, and assay classification accuracy probabilities. We illustrate the methods in this article using group testing data on chlamydia infection in Iowa.
Affiliation(s)
- Yan Liu
  - School of Community Health Sciences, University of Nevada, Reno, 1664 N. Virginia St, Reno, NV 89557, USA
- Christopher S McMahan
  - School of Mathematical and Statistical Sciences, Clemson University, O-110 Martin Hall, Box 340975, Clemson, SC 29634, USA
- Joshua M Tebbs
  - Department of Statistics, University of South Carolina, 1523 Greene St, Columbia, SC 29208, USA
- Colin M Gallagher
  - School of Mathematical and Statistical Sciences, Clemson University, O-110 Martin Hall, Box 340975, Clemson, SC 29634, USA
- Christopher R Bilder
  - Department of Statistics, University of Nebraska-Lincoln, 340 Hardin Hall North, Lincoln, NE 68583, USA
13. Mokalled SC, McMahan CS, Tebbs JM, Andrew Brown D, Bilder CR. Incorporating the dilution effect in group testing regression. Stat Med 2021; 40:2540-2555. [PMID: 33598950 DOI: 10.1002/sim.8916] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/06/2020] [Revised: 11/25/2020] [Accepted: 02/03/2021] [Indexed: 11/10/2022]
Abstract
When screening for infectious diseases, group testing has proven to be a cost efficient alternative to individual level testing. Cost savings are realized by testing pools of individual specimens (eg, blood, urine, saliva, and so on) rather than by testing the specimens separately. However, a common concern that arises in group testing is the so-called "dilution effect." This occurs if the signal from a positive individual's specimen is diluted past an assay's threshold of detection when it is pooled with multiple negative specimens. In this article, we propose a new statistical framework for group testing data that merges estimation and case identification, which are often treated separately in the literature. Our approach considers analyzing continuous biomarker levels (eg, antibody levels, antigen concentrations, and so on) from pooled samples to estimate both a binary regression model for the probability of disease and the biomarker distributions for cases and controls. To increase case identification accuracy, we then show how estimates of the biomarker distributions can be used to select diagnostic thresholds on a pool-by-pool basis. Our proposals are evaluated through numerical studies and are illustrated using hepatitis B virus data collected on a prison population in Ireland.
Affiliation(s)
- Stefani C Mokalled
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Christopher S McMahan
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Joshua M Tebbs
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Derek Andrew Brown
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Christopher R Bilder
  - Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
14. Wang D, Mou X, Li X, Huang X. Local polynomial regression for pooled response data. J Nonparametr Stat 2020; 32:814-837. [PMID: 33762800 PMCID: PMC7986571 DOI: 10.1080/10485252.2020.1834104] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/05/2020] [Accepted: 10/03/2020] [Indexed: 10/23/2022]
Abstract
We propose local polynomial estimators for the conditional mean of a continuous response when only pooled response data are collected under different pooling designs. Asymptotic properties of these estimators are investigated and compared. Extensive simulation studies are carried out to compare finite sample performance of the proposed estimators under various model settings and pooling strategies. We apply the proposed local polynomial regression methods to two real-life applications to illustrate practical implementation and performance of the estimators for the mean function.
Affiliation(s)
- Dewei Wang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, U.S.A.
- Xichen Mou
  - Division of Epidemiology, Biostatistics, and Environmental Health, University of Memphis, Memphis, Tennessee, U.S.A.
- Xiang Li
  - JPMorgan Chase, Jersey City, New Jersey 07310, U.S.A.
- Xianzheng Huang
  - Department of Statistics, University of South Carolina, Columbia, South Carolina, U.S.A.
15. Joyner CN, McMahan CS, Tebbs JM, Bilder CR. From mixed effects modeling to spike and slab variable selection: A Bayesian regression model for group testing data. Biometrics 2020; 76:913-923. [PMID: 31729015 PMCID: PMC7944974 DOI: 10.1111/biom.13176] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/29/2019] [Revised: 10/22/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022]
Abstract
Due to reductions in both time and cost, group testing is a popular alternative to individual-level testing for disease screening. These reductions are obtained by testing pooled biospecimens (eg, blood, urine, swabs, etc.) for the presence of an infectious agent. However, these reductions come at the expense of data complexity, making the task of conducting disease surveillance more tenuous when compared to using individual-level data. This is because an individual's disease status may be obscured by a group testing protocol and the effect of imperfect testing. Furthermore, unlike individual-level testing, a given participant could be involved in multiple testing outcomes and/or may never be tested individually. To circumvent these complexities and to incorporate all available information, we propose a Bayesian generalized linear mixed model that accommodates data arising from any group testing protocol, estimates unknown assay accuracy probabilities and accounts for potential heterogeneity in the covariate effects across population subgroups (eg, clinic sites, etc.); this latter feature is of key interest to practitioners tasked with conducting disease surveillance. To achieve model selection, our proposal uses spike and slab priors for both fixed and random effects. The methodology is illustrated through numerical studies and is applied to chlamydia surveillance data collected in Iowa.
Affiliation(s)
- Chase N. Joyner
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, U.S.A.
- Christopher S. McMahan
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, U.S.A.
- Joshua M. Tebbs
  - Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A.
16. Delaigle A, Huang W, Lei S. Estimation of Conditional Prevalence From Group Testing Data With Missing Covariates. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2019.1566071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Aurore Delaigle
  - School of Mathematics and Statistics and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, Australia
- Wei Huang
  - School of Mathematics and Statistics and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, Australia
- Shaoke Lei
  - Health Services, Murdoch Children’s Research Institute and Health Services Research Unit, The Royal Children’s Hospital, Melbourne, Australia
17
|
|
18. Nguyen NT, Aprahamian H, Bish EK, Bish DR. A methodology for deriving the sensitivity of pooled testing, based on viral load progression and pooling dilution. J Transl Med 2019; 17:252. [PMID: 31387586 PMCID: PMC6683472 DOI: 10.1186/s12967-019-1992-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 01/21/2019] [Accepted: 07/17/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Pooled testing, in which biological specimens from multiple subjects are combined into a testing pool and tested via a single test, is a common testing method for both surveillance and screening activities. The sensitivity of pooled testing for various pool sizes is an essential input for surveillance and screening optimization, including testing pool design. However, clinical data on test sensitivity values for different pool sizes are limited, and do not provide a functional relationship between test sensitivity and pool size. We develop a novel methodology to accurately compute the sensitivity of pooled testing, while accounting for viral load progression and pooling dilution. We demonstrate our methodology on the nucleic acid amplification testing (NAT) technology for the human immunodeficiency virus (HIV).
METHODS Our methodology integrates mathematical models of viral load progression and pooling dilution to derive test sensitivity values for various pool sizes. This methodology derives the conditional test sensitivity, conditioned on the number of infected specimens in a pool, and uses the law of total probability, along with higher dimensional integrals, to derive pooled test sensitivity values. We also develop a highly accurate and easy-to-compute approximation function for pooled test sensitivity of the HIV ULTRIO Plus NAT Assay. We calibrate model parameters using published efficacy data for the HIV ULTRIO Plus NAT Assay, and clinical data on viral RNA load progression in HIV-infected patients, and use this methodology to derive and validate the sensitivity of the HIV ULTRIO Plus Assay for various pool sizes.
RESULTS We demonstrate the value of this methodology through optimal testing pool design for HIV prevalence estimation in Sub-Saharan Africa. This case study indicates that the optimal testing pool design is highly efficient, and outperforms a benchmark pool design.
CONCLUSIONS The proposed methodology accounts for both viral load progression and pooling dilution, and is computationally tractable. We calibrate this model for the HIV ULTRIO Plus NAT Assay, show that it provides highly accurate sensitivity estimates for various pool sizes, and, thus, yields efficient testing pool design for HIV prevalence estimation. Our model is generic, and can be calibrated for other infections.
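The law-of-total-probability step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the viral load progression component is replaced by a hypothetical parametric dilution curve, and the names `conditional_sensitivity`, `pooled_sensitivity`, `se_individual`, and `dilution` are assumptions introduced only for illustration. The number of infected specimens in a pool is assumed to be binomial.

```python
from math import comb


def conditional_sensitivity(j, k, se_individual=0.99, dilution=0.3):
    """Hypothetical sensitivity given j infected specimens in a pool of size k.

    Assumed form (not taken from the paper): sensitivity decays as the
    infected fraction j / k shrinks, i.e. as dilution increases.
    """
    return se_individual * (j / k) ** dilution


def pooled_sensitivity(k, prevalence, **kwargs):
    """P(pool tests positive | pool holds at least one infected specimen).

    Law of total probability over J, the number of infected specimens,
    with J ~ Binomial(k, prevalence) conditioned on J >= 1.
    """
    p_at_least_one = 1.0 - (1.0 - prevalence) ** k
    total = 0.0
    for j in range(1, k + 1):
        p_j = comb(k, j) * prevalence**j * (1.0 - prevalence) ** (k - j)
        total += conditional_sensitivity(j, k, **kwargs) * p_j / p_at_least_one
    return total


if __name__ == "__main__":
    for pool_size in (1, 5, 10):
        print(pool_size, round(pooled_sensitivity(pool_size, 0.01), 4))
```

Under these assumed parameters the pooled sensitivity falls as the pool size grows, which is the qualitative dilution effect the abstract refers to.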
Affiliation(s)
- Ngoc T Nguyen
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, 24061, USA.
| | - Hrayer Aprahamian
- Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, 77843, USA
| | - Ebru K Bish
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, 24061, USA
| | - Douglas R Bish
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, 24061, USA
|
19
|
Lin J, Wang D, Zheng Q. Regression analysis and variable selection for two-stage multiple-infection group testing data. Stat Med 2019; 38:4519-4533. [PMID: 31297869 DOI: 10.1002/sim.8311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2018] [Revised: 03/03/2019] [Accepted: 06/14/2019] [Indexed: 12/17/2022]
Abstract
Group testing, as a cost-effective strategy, has been widely used to perform large-scale screening for rare infections. Recently, the use of multiplex assays has transformed the goal of group testing from detecting a single disease to diagnosing multiple infections simultaneously. Existing research on multiple-infection group testing data either excludes individual covariate information or ignores possible retests on suspicious individuals. To incorporate both, we propose a new regression model. This new model allows us to perform a regression analysis for each infection using multiple-infection group testing data. Furthermore, we introduce an efficient variable selection method to reveal truly relevant risk factors for each disease. Our methodology also allows for the estimation of the assay sensitivity and specificity when they are unknown. We examine the finite sample performance of our method through extensive simulation studies and apply it to a chlamydia and gonorrhea screening data set to illustrate its practical usefulness.
Affiliation(s)
- Juexin Lin
- Department of Statistics, University of South Carolina, South Carolina
| | - Dewei Wang
- Department of Statistics, University of South Carolina, South Carolina
| | - Qi Zheng
- Department of Bioinformatics and Biostatistics, University of Louisville, Kentucky
|
20
|
Determination of Varying Group Sizes for Pooling Procedure. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019; 2019:4381084. [PMID: 31065292 PMCID: PMC6466917 DOI: 10.1155/2019/4381084] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Revised: 01/17/2019] [Accepted: 02/05/2019] [Indexed: 11/17/2022]
Abstract
Pooling is an attractive strategy in screening infected specimens, especially for rare diseases. An essential step of performing the pooled test is to determine the group size. Sometimes, equal group size is not appropriate due to population heterogeneity. In this case, varying group sizes are preferred and can be determined when individual information is available. In this study, we propose a sequential procedure to determine varying group sizes by fully utilizing available information. This procedure is data driven. Simulations show that it has good performance in estimating parameters.
|
21
|
Gregory KB, Wang D, McMahan CS. Adaptive elastic net for group testing. Biometrics 2019; 75:13-23. [PMID: 30267535 PMCID: PMC7938860 DOI: 10.1111/biom.12973] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2017] [Accepted: 09/14/2018] [Indexed: 11/28/2022]
Abstract
For disease screening, group (pooled) testing can be a cost-saving alternative to one-at-a-time testing, with savings realized through assaying pooled biospecimens (e.g., urine, blood, saliva). In many group testing settings, practitioners are faced with the task of conducting disease surveillance. That is, it is often of interest to relate individuals' true disease statuses to covariate information via binary regression. Several authors have developed regression methods for group testing data, which is challenging due to the effects of imperfect testing. That is, all testing outcomes (on pools and individuals) are subject to misclassification, and individuals' true statuses are never observed. To further complicate matters, individuals may be involved in several testing outcomes. For analyzing such data, we provide a novel regression methodology which generalizes and extends the aforementioned regression techniques and which incorporates regularization. Specifically, for model fitting and variable selection, we propose an adaptive elastic net estimator under the logistic regression model which can be used to analyze data from any group testing strategy. We provide an efficient algorithm for computing the estimator along with guidance on tuning parameter selection. Moreover, we establish the asymptotic properties of the proposed estimator and show that it possesses "oracle" properties. We evaluate the performance of the estimator through Monte Carlo studies and illustrate the methodology on a chlamydia data set from the State Hygienic Laboratory in Iowa City.
Affiliation(s)
- Karl B. Gregory
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
| | - Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
22
|
Roy S, Banerjee T. Estimation of log-odds ratio from group testing data using Firth correction. Biom J 2019; 61:714-728. [PMID: 30645765 DOI: 10.1002/bimj.201800125] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2018] [Revised: 11/07/2018] [Accepted: 11/09/2018] [Indexed: 11/10/2022]
Abstract
We consider the estimation of the prevalence of a rare disease, and the log-odds ratio for two specified groups of individuals, from group testing data. For a low-prevalence disease, the maximum likelihood estimate of the log-odds ratio is severely biased. However, a Firth correction to the score function leads to a considerable improvement of the estimator. Also, for a low-prevalence disease, if the diagnostic test is imperfect, group testing is found to yield a more precise estimate of the log-odds ratio than individual testing.
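For orientation, the uncorrected prevalence estimate that such work starts from can be sketched in a few lines of Python. This is a minimal illustration and not the Firth-corrected estimator of the paper. It assumes equal pool sizes and known assay sensitivity and specificity, and the function name `prevalence_mle` and its parameters are hypothetical. A pool tests positive with probability Se - (Se + Sp - 1)(1 - p)^k, and the estimate inverts that relation.

```python
def prevalence_mle(n_positive_pools, n_pools, k, se=1.0, sp=1.0):
    """Plain maximum likelihood estimate of prevalence p from pooled tests.

    Assumes n_pools pools of equal size k and known sensitivity (se) and
    specificity (sp). No bias correction is applied.
    """
    pi_hat = n_positive_pools / n_pools          # observed pool positivity
    ratio = (se - pi_hat) / (se + sp - 1.0)      # estimate of (1 - p) ** k
    ratio = min(max(ratio, 0.0), 1.0)            # keep inside [0, 1]
    return 1.0 - ratio ** (1.0 / k)


if __name__ == "__main__":
    # 19 positive pools out of 100 pools of size 2, perfect assay.
    print(prevalence_mle(19, 100, 2))
    # Same data with an assumed imperfect assay.
    print(prevalence_mle(19, 100, 2, se=0.95, sp=0.98))
```

Group-specific estimates of this kind could then be combined into a log-odds ratio, which is where the bias discussed in the abstract arises for low prevalence.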
Affiliation(s)
- Surupa Roy
- Department of Statistics, St Xavier's College, Kolkata, India
|
23
|
Enders LS, Hefley TJ, Girvin JJ, Whitworth RJ, Smith CM. Spatiotemporal Distribution and Environmental Drivers of Barley yellow dwarf virus and Vector Abundance in Kansas. PHYTOPATHOLOGY 2018; 108:1196-1205. [PMID: 29750593 DOI: 10.1094/phyto-10-17-0340-r] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Several aphid species transmit barley yellow dwarf, a globally destructive disease caused by viruses that infect cereal grain crops. Data from >400 samples collected across Kansas wheat fields in 2014 and 2015 were used to develop spatiotemporal models predicting the extent to which landcover, temperature and precipitation affect spring aphid vector abundance and presence of individuals carrying Barley yellow dwarf virus (BYDV). The distribution of Rhopalosiphum padi abundance was not correlated with climate or landcover, but Sitobion avenae abundance was positively correlated with fall temperature and negatively correlated to spring temperature and precipitation. The abundance of Schizaphis graminum was negatively correlated with fall precipitation and winter temperature. The incidence of viruliferous (+BYDV) R. padi was positively correlated with fall precipitation but negatively correlated with winter precipitation. In contrast, the probability of +BYDV S. avenae was unaffected by precipitation but was positively correlated with fall temperatures and distance to forest or shrubland. R. padi and S. avenae were more prevalent at eastern sample sites where ground cover is more grassland than cropland, suggesting that grassland may provide over-summering sites for vectors and pose a risk as potential BYDV reservoirs. Nevertheless, land cover patterns were not strongly associated with differences in abundance or the probability that viruliferous aphids were present.
Affiliation(s)
- L S Enders, T J Hefley, J J Girvin, R J Whitworth, C M Smith
- First author: Department of Entomology, Purdue University, West Lafayette, IN; first, third, fourth, and fifth authors: Department of Entomology, Kansas State University, Manhattan; second author: Department of Statistics, Kansas State University, Manhattan; and third author: USDA-APHIS-PPQ, Federal Way, WA
|
24
|
Affiliation(s)
- Juexin Lin
- Department of Statistics, University of South Carolina, Columbia, SC, USA
| | - Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC, USA
|
25
|
Nguyen NT, Bish EK, Aprahamian H. Sequential prevalence estimation with pooling and continuous test outcomes. Stat Med 2018; 37:2391-2426. [PMID: 29687473 DOI: 10.1002/sim.7657] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2017] [Revised: 01/17/2018] [Accepted: 02/15/2018] [Indexed: 01/02/2023]
Abstract
Prevalence estimation is crucial for controlling the spread of infections and diseases and for planning of health care services. Prevalence estimation is typically conducted via pooled, or group, testing due to limited testing budgets. We study a sequential estimation procedure that uses continuous pool readings and considers the dilution effect of pooling so as to efficiently estimate an unknown prevalence rate. Embedded into the sequential estimation procedure is an optimization model that determines the optimal pooling design (number of pools and pool sizes) under a limited testing budget, considering the trade-off between testing cost and estimation accuracy. Our numerical study indicates that the proposed sequential estimation procedure outperforms single-stage procedures, or procedures that use binary test outcomes. Further, the sequential procedure provides robust prevalence estimates in cases where the initial estimate of the unknown prevalence rate is poor, or the assumed distribution of the biomarker load in infected subjects is inaccurate. Thus, when limited and unreliable information is available about the current status of, or biomarker dynamics related to, an infection, the sequential procedure becomes an attractive estimation strategy, due to its ability to mitigate the initial bias.
Affiliation(s)
- Ngoc T Nguyen
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
| | - Ebru K Bish
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
| | - Hrayer Aprahamian
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
|
26
|
McMahan CS, Tebbs JM, Hanson TE, Bilder CR. Bayesian regression for group testing data. Biometrics 2017; 73:1443-1452. [PMID: 28405965 PMCID: PMC5638690 DOI: 10.1111/biom.12704] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2016] [Revised: 03/01/2017] [Accepted: 03/01/2017] [Indexed: 01/10/2023]
Abstract
Group testing involves pooling individual specimens (e.g., blood, urine, swabs, etc.) and testing the pools for the presence of a disease. When individual covariate information is available (e.g., age, gender, number of sexual partners, etc.), a common goal is to relate an individual's true disease status to the covariates in a regression model. Estimating this relationship is a nonstandard problem in group testing because true individual statuses are not observed and all testing responses (on pools and on individuals) are subject to misclassification arising from assay error. Previous regression methods for group testing data can be inefficient because they are restricted to using only initial pool responses and/or they make potentially unrealistic assumptions regarding the assay accuracy probabilities. To overcome these limitations, we propose a general Bayesian regression framework for modeling group testing data. The novelty of our approach is that it can be easily implemented with data from any group testing protocol. Furthermore, our approach will simultaneously estimate assay accuracy probabilities (along with the covariate effects) and can even be applied in screening situations where multiple assays are used. We apply our methods to group testing data collected in Iowa as part of statewide screening efforts for chlamydia, and we make user-friendly R code available to practitioners.
Affiliation(s)
| | - Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
| | - Timothy E. Hanson
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
27
|
Caradonna T, Marangi M, Del Chierico F, Ferrari N, Reddel S, Bracaglia G, Normanno G, Putignani L, Giangaspero A. Detection and prevalence of protozoan parasites in ready-to-eat packaged salads on sale in Italy. Food Microbiol 2017. [DOI: 10.1016/j.fm.2017.06.006] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
|
28
|
Warasi MS, McMahan CS, Tebbs JM, Bilder CR. Group testing regression models with dilution submodels. Stat Med 2017; 36:4860-4872. [PMID: 28856774 DOI: 10.1002/sim.7455] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2017] [Revised: 05/27/2017] [Accepted: 08/11/2017] [Indexed: 12/21/2022]
Abstract
Group testing, where specimens are tested initially in pools, is widely used to screen individuals for sexually transmitted diseases. However, a common problem encountered in practice is that group testing can increase the number of false negative test results. This occurs primarily when positive individual specimens within a pool are diluted by negative ones, resulting in positive pools testing negatively. If the goal is to estimate a population-level regression model relating individual disease status to observed covariates, severe bias can result if an adjustment for dilution is not made. Recognizing this as a critical issue, recent binary regression approaches in group testing have utilized continuous biomarker information to acknowledge the effect of dilution. In this paper, we have the same overall goal but take a different approach. We augment existing group testing regression models (that assume no dilution) with a parametric dilution submodel for pool-level sensitivity and estimate all parameters using maximum likelihood. An advantage of our approach is that it does not rely on external biomarker test data, which may not be available in surveillance studies. Furthermore, unlike previous approaches, our framework allows one to formally test whether dilution is present based on the observed group testing data. We use simulation to illustrate the performance of our estimation and inference methods, and we apply these methods to 2 infectious disease data sets.
Affiliation(s)
- Md S Warasi
- Department of Mathematics and Statistics, Radford University, Radford, VA 24142, USA
| | | | - Joshua M Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
| | - Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
|
29
|
Liu Y, McMahan C, Gallagher C. A general framework for the regression analysis of pooled biomarker assessments. Stat Med 2017; 36:2363-2377. [PMID: 28349583 PMCID: PMC5484591 DOI: 10.1002/sim.7291] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2016] [Revised: 02/17/2017] [Accepted: 03/06/2017] [Indexed: 11/11/2022]
Abstract
As a cost-efficient data collection mechanism, the process of assaying pooled biospecimens is becoming increasingly common in epidemiological research; for example, pooling has been proposed for the purpose of evaluating the diagnostic efficacy of biological markers (biomarkers). To this end, several authors have proposed techniques that allow for the analysis of continuous pooled biomarker assessments. Regrettably, most of these techniques proceed under restrictive assumptions, are unable to account for the effects of measurement error, and fail to control for confounding variables. These limitations are understandably attributable to the complex structure that is inherent to measurements taken on pooled specimens. Consequently, in order to provide practitioners with the tools necessary to accurately and efficiently analyze pooled biomarker assessments, herein, a general Monte Carlo maximum likelihood-based procedure is presented. The proposed approach allows for the regression analysis of pooled data under practically all parametric models and can be used to directly account for the effects of measurement error. Through simulation, it is shown that the proposed approach can accurately and efficiently estimate all unknown parameters and is more computationally efficient than existing techniques. This new methodology is further illustrated using monocyte chemotactic protein-1 data collected by the Collaborative Perinatal Project in an effort to assess the relationship between this chemokine and the risk of miscarriage. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Yan Liu
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
| | - Christopher McMahan
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
| | - Colin Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
|
30
|
Huang X, Sarker Warasi MS. Maximum Likelihood Estimators in Regression Models for Error‐prone Group Testing Data. Scand Stat Theory Appl 2017. [DOI: 10.1111/sjos.12282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Xianzheng Huang
- Department of Statistics, College of Arts & Sciences, University of South Carolina
|
31
|
Mitchell EM, Plowden TC, Schisterman EF. Estimating relative risk of a log-transformed exposure measured in pools. Stat Med 2016; 35:5477-5494. [PMID: 27530506 DOI: 10.1002/sim.7075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2015] [Revised: 07/08/2016] [Accepted: 07/22/2016] [Indexed: 11/07/2022]
Abstract
Pooling biospecimens prior to performing laboratory assays is a useful tool to reduce costs, achieve minimum volume requirements and mitigate assay measurement error. When estimating the risk of a continuous, pooled exposure on a binary outcome, specialized statistical techniques are required. Current methods include a regression calibration approach, where the expectation of the individual-level exposure is calculated by adjusting the observed pooled measurement with additional covariate data. While this method employs a linear regression calibration model, we propose an alternative model that can accommodate log-linear relationships between the exposure and predictive covariates. The proposed model permits direct estimation of the relative risk associated with a log-transformation of an exposure measured in pools. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Emily M Mitchell
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
| | - Torie C Plowden
- Program in Reproductive and Adult Endocrinology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
| | - Enrique F Schisterman
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
|
32
|
Abstract
Group testing, introduced by Dorfman (1943), has been used to reduce costs when estimating the prevalence of a binary characteristic based on a screening test of $k$ groups that include $n$ independent individuals in total. If the unknown prevalence is low and the screening test suffers from misclassification, it is also possible to obtain more precise prevalence estimates than those obtained from testing all $n$ samples separately (Tu et al., 1994). In some applications, the individual binary response corresponds to whether an underlying time-to-event variable $T$ is less than an observed screening time $C$, a data structure known as current status data. Given sufficient variation in the observed $C$ values, it is possible to estimate the distribution function $F$ of $T$ nonparametrically, at least at some points in its support, using the pool-adjacent-violators algorithm (Ayer et al., 1955). Here, we consider nonparametric estimation of $F$ based on group-tested current status data for groups of size $k$ where the group tests positive if and only if any individual’s unobserved $T$ is less than the corresponding observed $C$. We investigate the performance of the group-based estimator as compared to the individual test nonparametric maximum likelihood estimator, and show that the former can be more precise in the presence of misclassification for low values of $F(t)$. Potential applications include testing for the presence of various diseases in pooled samples where interest focuses on the age-at-incidence distribution rather than overall prevalence. We apply this estimator to the age-at-incidence curve for hepatitis C infection in a sample of U.S. women who gave birth to a child in 2014, where group assignment is done at random and based on maternal age. We discuss connections to other work in the literature, as well as potential extensions.
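The idea in this abstract can be sketched in Python under simplifying assumptions that are not stated in the source: every member of a group shares one screening time $C$, screening times are distinct across groups, and there is no misclassification. Under those assumptions a group of size $k$ tests positive with probability $1 - (1 - F(C))^k$, which is non-decreasing in $C$, so the pool-adjacent-violators algorithm can be applied to the group outcomes and the fit transformed back to $F$. The function names `pava` and `estimate_F` are hypothetical.

```python
def pava(values, weights=None):
    """Weighted non-decreasing isotonic regression (pool-adjacent-violators)."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w_tot = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w_tot, w_tot, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def estimate_F(screen_times, group_positive, k):
    """Estimate F at each group's screening time from 0/1 group test results.

    Assumes one shared, distinct screening time per group and a perfect test.
    """
    order = sorted(range(len(screen_times)), key=lambda i: screen_times[i])
    times = [screen_times[i] for i in order]
    # Monotone estimate of P(group positive at time c) = 1 - (1 - F(c)) ** k.
    g_hat = pava([group_positive[i] for i in order])
    # Invert the group-level probability back to the individual level.
    f_hat = [1.0 - (1.0 - g) ** (1.0 / k) for g in g_hat]
    return times, f_hat


if __name__ == "__main__":
    times, f_hat = estimate_F([3, 1, 2, 4], [0, 0, 1, 1], k=2)
    for t, f in zip(times, f_hat):
        print(t, round(f, 4))
```

Handling misclassification or within-group variation in $C$ would require changing the likelihood, which this sketch does not attempt.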
Affiliation(s)
- L C Petito
- Division of Biostatistics, School of Public Health, 101 Haviland Hall, University of California, Berkeley, California 94720,
| | - N P Jewell
- Division of Biostatistics, School of Public Health, 101 Haviland Hall, University of California, Berkeley, California 94720,
|
33
|
Warasi MS, Tebbs JM, McMahan CS, Bilder CR. Estimating the prevalence of multiple diseases from two-stage hierarchical pooling. Stat Med 2016; 35:3851-64. [PMID: 27090057 PMCID: PMC4965323 DOI: 10.1002/sim.6964] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2015] [Revised: 12/31/2015] [Accepted: 03/17/2016] [Indexed: 11/08/2022]
Abstract
Testing protocols in large-scale sexually transmitted disease screening applications often involve pooling biospecimens (e.g., blood, urine, and swabs) to lower costs and to increase the number of individuals who can be tested. With the recent development of assays that detect multiple diseases, it is now common to test biospecimen pools for multiple infections simultaneously. Recent work has developed an expectation-maximization algorithm to estimate the prevalence of two infections using a two-stage, Dorfman-type testing algorithm motivated by current screening practices for chlamydia and gonorrhea in the USA. In this article, we have the same goal but instead take a more flexible Bayesian approach. Doing so allows us to incorporate information about assay uncertainty during the testing process, which involves testing both pools and individuals, and also to update information as individuals are tested. Overall, our approach provides reliable inference for disease probabilities and accurately estimates assay sensitivity and specificity even when little or no information is provided in the prior distributions. We illustrate the performance of our estimation methods using simulation and by applying them to chlamydia and gonorrhea data collected in Nebraska. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Md S Warasi
- Department of Statistics, University of South Carolina, Columbia, 29208, SC, U.S.A
| | - Joshua M Tebbs
- Department of Statistics, University of South Carolina, Columbia, 29208, SC, U.S.A
| | | | - Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, 68583, NE, U.S.A
|
34
|
McMahan CS, McLain AC, Gallagher CM, Schisterman EF. Estimating covariate-adjusted measures of diagnostic accuracy based on pooled biomarker assessments. Biom J 2016; 58:944-61. [PMID: 26927583 DOI: 10.1002/bimj.201500195] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Revised: 12/31/2015] [Accepted: 01/06/2016] [Indexed: 11/10/2022]
Abstract
There is a need for epidemiological and medical researchers to identify new biomarkers (biological markers) that are useful in determining exposure levels and/or for the purposes of disease detection. Often this process is stunted by high testing costs associated with evaluating new biomarkers. Traditionally, biomarker assessments are individually tested within a target population. Pooling has been proposed to help alleviate the testing costs, where pools are formed by combining several individual specimens. Methods for using pooled biomarker assessments to estimate discriminatory ability have been developed. However, all these procedures have failed to acknowledge confounding factors. In this paper, we propose a regression methodology based on pooled biomarker measurements that allows the assessment of the discriminatory ability of a biomarker of interest. In particular, we develop covariate-adjusted estimators of the receiver-operating characteristic curve, the area under the curve, and Youden's index. We establish the asymptotic properties of these estimators and develop inferential techniques that allow one to assess whether a biomarker is a good discriminator between cases and controls, while controlling for confounders. The finite sample performance of the proposed methodology is illustrated through simulation. We apply our methods to analyze myocardial infarction (MI) data, with the goal of determining whether the pro-inflammatory cytokine interleukin-6 is a good predictor of MI after controlling for the subjects' cholesterol levels.
Affiliation(s)
- Alexander C McLain
- Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, USA
- Colin M Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
- Enrique F Schisterman
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA
|
35
|
Delaigle A, Zhou WX. Nonparametric and Parametric Estimators of Prevalence From Group Testing Data With Aggregated Covariates. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1054491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
|
36
|
Delaigle A, Hall P. Nonparametric methods for group testing data, taking dilution into account. Biometrika 2015. [DOI: 10.1093/biomet/asv049] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
|
37
|
Wang D, McMahan CS, Gallagher CM. A general regression framework for group testing data, which incorporates pool dilution effects. Stat Med 2015; 34:3606-21. [PMID: 26173957 DOI: 10.1002/sim.6578] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 09/24/2014] [Revised: 04/21/2015] [Accepted: 06/15/2015] [Indexed: 01/01/2023]
Abstract
Group testing, through the use of pooling, has been widely implemented as a more efficient means to screen individuals for infectious diseases. Typically, in these settings, practitioners are tasked with the complementary goals of both case identification and estimation. For these purposes, many group testing strategies have been proposed, which address issues such as preserving anonymity in estimation studies, quality control, and classification. In general, these strategies require that a significant number of the individuals be retested, either in pools or individually. In order to provide practitioners with a general methodology that can be used to accurately and precisely analyze data of this form, herein, we propose a binary regression framework that can incorporate data arising from any group testing strategy. Further, we relax previously made assumptions regarding testing error rates by relating the diagnostic testing results to the latent biological marker levels of the individuals being tested. We investigate the finite sample performance of our proposed methodology through simulation and by applying our techniques to hepatitis B data collected as part of a study involving Irish prisoners.
Affiliation(s)
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Colin M Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, U.S.A
|
38
|
Mitchell EM, Lyles RH, Schisterman EF. Positing, fitting, and selecting regression models for pooled biomarker data. Stat Med 2015; 34:2544-58. [PMID: 25846980 DOI: 10.1002/sim.6496] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 09/29/2014] [Revised: 02/18/2015] [Accepted: 03/13/2015] [Indexed: 01/31/2023]
Abstract
Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. In this paper, we develop and compare parametric regression models for skewed outcome data subject to pooling, including a novel parameterization of the gamma distribution that takes full advantage of the gamma summation property. We also develop a Monte Carlo approximation of Akaike's Information Criterion applied to pooled data in order to guide model selection. Simulation studies and analysis of motivating data from the Collaborative Perinatal Project suggest that using Akaike's Information Criterion to select the best parametric model can help ensure valid inference and promote estimate precision.
Affiliation(s)
- Emily M Mitchell
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, MD, U.S.A
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, 30322, GA, U.S.A
- Enrique F Schisterman
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, MD, U.S.A
|
39
|
Mitchell EM, Lyles RH, Manatunga AK, Schisterman EF. Semiparametric regression models for a right-skewed outcome subject to pooling. Am J Epidemiol 2015; 181:541-8. [PMID: 25737248 DOI: 10.1093/aje/kwu301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/24/2022] Open
Abstract
Pooling specimens prior to performing laboratory assays has various benefits. Pooling can help to reduce cost, preserve irreplaceable specimens, meet minimal volume requirements for certain lab tests, and even reduce information loss when a limit of detection is present. Regardless of the motivation for pooling, appropriate analytical techniques must be applied in order to obtain valid inference from composite specimens. When biomarkers are treated as the outcome in a regression model, techniques applicable to individually measured specimens may not be valid when measurements are taken from pooled specimens, particularly when the biomarker is positive and right skewed. In this paper, we propose a novel semiparametric estimation method based on an adaptation of the quasi-likelihood approach that can be applied to a right-skewed outcome subject to pooling. We use simulation studies to compare this method with an existing estimation technique that provides valid estimates only when pools are formed from specimens with identical predictor values. Simulation results and analysis of a motivating example demonstrate that, when appropriate estimation techniques are applied to strategically formed pools, valid and efficient estimation of the regression coefficients can be achieved.
|
40
|
Mitchell EM, Lyles RH, Manatunga AK, Perkins NJ, Schisterman EF. A highly efficient design strategy for regression with outcome pooling. Stat Med 2014; 33:5028-40. [PMID: 25220822 DOI: 10.1002/sim.6305] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/05/2013] [Revised: 05/19/2014] [Accepted: 08/25/2014] [Indexed: 11/06/2022]
Abstract
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting.
Affiliation(s)
- Emily M Mitchell
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
|
41
|
Delaigle A, Hall P, Wishart JR. New approaches to nonparametric and semiparametric regression for univariate and multivariate group testing data. Biometrika 2014. [DOI: 10.1093/biomet/asu025] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
|
42
|
Zhang Z, Liu C, Kim S, Liu A. Prevalence estimation subject to misclassification: the mis-substitution bias and some remedies. Stat Med 2014; 33:4482-500. [PMID: 25043925 DOI: 10.1002/sim.6268] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/28/2013] [Revised: 06/24/2014] [Accepted: 06/30/2014] [Indexed: 11/07/2022]
Abstract
We consider the problem of estimating the prevalence of a disease under a group testing framework. Because assays are usually imperfect, misclassification of disease status is a major challenge in prevalence estimation. To account for possible misclassification, it is usually assumed that the sensitivity and specificity of the assay are known and independent of the group size. This assumption is often questionable, and substitution of incorrect values of an assay's sensitivity and specificity can result in a large bias in the prevalence estimate, which we refer to as the mis-substitution bias. In this article, we propose simple designs and methods for prevalence estimation that do not require known values of assay sensitivity and specificity. If a gold standard test is available, it can be applied to a validation subsample to yield information on the imperfect assay's sensitivity and specificity. When a gold standard is unavailable, it is possible to estimate assay sensitivity and specificity, either as unknown constants or as specified functions of the group size, from group testing data with varying group size. We develop methods for estimating parameters and for finding or approximating optimal designs, and perform extensive simulation experiments to evaluate and compare the different designs. An example concerning human immunodeficiency virus infection is used to illustrate the validation subsample design.
Affiliation(s)
- Zhiwei Zhang
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, U.S.A
|
43
|
Wang D, McMahan CS, Gallagher CM, Kulasekera KB. Semiparametric group testing regression models. Biometrika 2014. [DOI: 10.1093/biomet/asu007] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
|
44
|
Mitchell EM, Lyles RH, Manatunga AK, Danaher M, Perkins NJ, Schisterman EF. Regression for skewed biomarker outcomes subject to pooling. Biometrics 2014; 70:202-11. [PMID: 24521420 DOI: 10.1111/biom.12134] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 05/01/2013] [Revised: 11/01/2013] [Accepted: 11/01/2013] [Indexed: 11/26/2022]
Abstract
Epidemiological studies involving biomarkers are often hindered by prohibitively expensive laboratory tests. Strategically pooling specimens prior to performing these lab assays has been shown to effectively reduce cost with minimal information loss in a logistic regression setting. When the goal is to perform regression with a continuous biomarker as the outcome, regression analysis of pooled specimens may not be straightforward, particularly if the outcome is right-skewed. In such cases, we demonstrate that a slight modification of a standard multiple linear regression model for poolwise data can provide valid and precise coefficient estimates when pools are formed by combining biospecimens from subjects with identical covariate values. When these x-homogeneous pools cannot be formed, we propose a Monte Carlo expectation maximization (MCEM) algorithm to compute maximum likelihood estimates (MLEs). Simulation studies demonstrate that these analytical methods provide essentially unbiased estimates of coefficient parameters as well as their standard errors when appropriate assumptions are met. Furthermore, we show how one can utilize the fully observed covariate data to inform the pooling strategy, yielding a high level of statistical efficiency at a fraction of the total lab cost.
Affiliation(s)
- Emily M Mitchell
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, U.S.A
|
45
|
Birkner T, Aban IB, Katholi CR. Evaluation of a Frequentist Hierarchical Model to Estimate Prevalence when sampling from a large geographic area using Pool Screening. COMMUN STAT-THEOR M 2013; 42. [PMID: 24347808 DOI: 10.1080/03610926.2011.633732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
We present a frequentist Bernoulli-Beta hierarchical model to relax the constant prevalence assumption underlying the traditional prevalence estimation approach based on pooled data. This assumption is called into question when sampling from a large geographic area. Pool screening is a method that combines individual items into pools. Each pool will either test positive (at least one of the items is positive) or negative (all items are negative). Pool screening is commonly applied to the study of tropical diseases where pools consist of vectors (e.g. black flies) that can transmit the disease. The goal is to estimate the proportion of infected vectors. Intermediate estimators (model parameters) and estimators of ultimate interest (pertaining to prevalence) are evaluated by standard measures of merit, such as bias, variance and mean squared error, making extensive use of expansions. Using the hierarchical model, an investigator can determine the probability of the prevalence being below a prespecified threshold value, a value at which no reemergence of the disease is expected. An investigation into the least biased choice of the α parameter in the Beta (α, β) prevalence distribution leads to the choice of α = 1.
Affiliation(s)
- Thomas Birkner
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
- Inmaculada B Aban
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
- Charles R Katholi
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
|
46
|
Zhang B, Bilder CR, Tebbs JM. Regression analysis for multiple-disease group testing data. Stat Med 2013; 32:4954-66. [PMID: 23703944 PMCID: PMC4301740 DOI: 10.1002/sim.5858] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/13/2012] [Accepted: 04/29/2013] [Indexed: 11/06/2022]
Abstract
Group testing, where individual specimens are composited into groups to test for the presence of a disease (or other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Group testing data are unique in that only group responses may be available, but inferences are needed at the individual level. A further methodological challenge arises when individuals are tested in groups for multiple diseases simultaneously, because unobserved individual disease statuses are likely correlated. In this paper, we propose new regression techniques for multiple-disease group testing data. We develop an expectation-solution based algorithm that provides consistent parameter estimates and natural large-sample inference procedures. We apply our proposed methodology to chlamydia and gonorrhea screening data collected in Nebraska as part of the Infertility Prevention Project and to prenatal infectious disease screening data from Kenya.
Affiliation(s)
- Boan Zhang
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
|
47
|
Tebbs JM, McMahan CS, Bilder CR. Two-stage hierarchical group testing for multiple infections with application to the infertility prevention project. Biometrics 2013; 69:1064-73. [PMID: 24117173 PMCID: PMC4371872 DOI: 10.1111/biom.12080] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 02/01/2013] [Revised: 06/01/2013] [Accepted: 06/01/2013] [Indexed: 11/30/2022]
Abstract
Screening for sexually transmitted diseases (STDs) has benefited greatly from the use of group testing (pooled testing) to lower costs. With the development of assays that detect multiple infections, screening practices now involve testing pools of individuals for multiple infections simultaneously. Building on the research for single infection group testing procedures, we examine the performance of group testing for multiple infections. Our work is motivated by chlamydia and gonorrhea testing for the infertility prevention project (IPP), a national program in the United States. We consider a two-stage pooling algorithm currently used to perform testing for the IPP. We first derive the operating characteristics of this algorithm for classification purposes (e.g., expected number of tests, misclassification probabilities, etc.) and identify pool sizes that minimize the expected number of tests. We then develop an expectation-maximization (EM) algorithm to estimate probabilities of infection using both group and individual retest responses. Our research shows that group testing can offer large cost savings when classifying individuals for multiple infections and can provide prevalence estimates that are actually more efficient than those from individual testing.
Affiliation(s)
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
48
|
McMahan CS, Tebbs JM, Bilder CR. Regression models for group testing data with pool dilution effects. Biostatistics 2013; 14:284-98. [PMID: 23197382 PMCID: PMC3590921 DOI: 10.1093/biostatistics/kxs045] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 05/30/2012] [Revised: 10/17/2012] [Accepted: 10/19/2012] [Indexed: 11/13/2022] Open
Abstract
Group testing is widely used to reduce the cost of screening individuals for infectious diseases. There is an extensive literature on group testing, most of which traditionally has focused on estimating the probability of infection in a homogeneous population. More recently, this research area has shifted towards estimating individual-specific probabilities in a regression context. However, existing regression approaches have assumed that the sensitivity and specificity of pooled biospecimens are constant and do not depend on the pool sizes. For those applications, where this assumption may not be realistic, these existing approaches can lead to inaccurate inference, especially when pool sizes are large. Our new approach, which exploits the information readily available from underlying continuous biomarker distributions, provides reliable inference in settings where pooling would be most beneficial and does so even for larger pool sizes. We illustrate our methodology using hepatitis B data from a study involving Irish prisoners.
|
49
|
Hund L, Pagano M. Estimating HIV prevalence from surveys with low individual consent rates: annealing individual and pooled samples. Emerg Themes Epidemiol 2013; 10:2. [PMID: 23446064 PMCID: PMC3649931 DOI: 10.1186/1742-7622-10-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 06/13/2012] [Accepted: 02/20/2013] [Indexed: 11/30/2022] Open
Abstract
Many HIV prevalence surveys are plagued by the problem that a sizeable number of surveyed individuals do not consent to contribute blood samples for testing. One can ignore this problem, as is often done, but the resultant bias can be of sufficient magnitude to invalidate the results of the survey, especially if the number of non-responders is high and the reason for refusing to participate is related to the individual’s HIV status. One reason for refusing to participate may be privacy. For those individuals, we suggest offering the option of being tested in a pool. This form of testing is less certain than individual testing, but, if it convinces more people to submit to testing, it should reduce the potential for bias and give a cleaner answer to the question of prevalence. This paper explores the logistics of implementing a combined individual and pooled testing approach and evaluates the analytical advantages of such a combined testing strategy. We quantify improvements in a prevalence estimator based on this combined testing strategy, relative to an individual testing only approach and a pooled testing only approach. Minimizing non-response is key for reducing bias, and, if pooled testing assuages privacy concerns, offering a pooled testing strategy has the potential to substantially improve HIV prevalence estimates.
Affiliation(s)
- Lauren Hund
- Department of Family and Community Medicine, University of New Mexico, 2400 Tucker NE, Albuquerque, NM 87106, USA.
|
50
|
Zhang B, Bilder CR, Tebbs JM. Group testing regression model estimation when case identification is a goal. Biom J 2013; 55:173-89. [PMID: 23401252 DOI: 10.1002/bimj.201200168] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 08/20/2012] [Revised: 11/16/2012] [Accepted: 12/22/2012] [Indexed: 11/10/2022]
Abstract
Group testing is frequently used to reduce the costs of screening a large number of individuals for infectious diseases or other binary characteristics in small prevalence situations. In many applications, the goals include both identifying individuals as positive or negative and estimating the probability of positivity. The identification aspect leads to additional tests being performed, known as "retests", beyond those performed for initial groups of individuals. In this paper, we investigate how regression models can be fit to estimate the probability of positivity while also incorporating the extra information from these retests. We present simulation evidence showing that significant gains in efficiency occur by incorporating retesting information, and we further examine which testing protocols are the most efficient to use. Our investigations also demonstrate that some group testing protocols can actually lead to more efficient estimates than individual testing when diagnostic tests are imperfect. The proposed methods are applied retrospectively to chlamydia screening data from the Infertility Prevention Project. We demonstrate that significant cost savings could occur through the use of particular group testing protocols.
Affiliation(s)
- Boan Zhang
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
|