1
|
Roy S, Adhya S, Rana S. Estimation of odds ratio from group testing data with misclassified exposure. Biom J 2024; 66:e2200254. [PMID: 38285402 DOI: 10.1002/bimj.202200254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 08/21/2023] [Accepted: 09/17/2023] [Indexed: 01/30/2024]
Abstract
For low prevalence disease, we consider estimation of the odds ratio for two specified groups of individuals using group testing data. Broadly the two groups may be classified as "the exposed" and "the unexposed." Often in observational studies, the exposure status is not correctly recorded. In addition, diagnostic tests are rarely completely accurate. The proposed model accounts for imperfect sensitivity and specificity of diagnostic tests along with the misclassification in the exposure status. For model identifiability, we make use of internal validation data, where a subsample of reasonably small size is selected from the original sample by simple random sampling without replacement. Pseudo-maximum likelihood method is employed for the estimation of the model parameters. The performance of group testing methodology is compared with individual testing for different parametric configurations. A limited data study related to COVID-19 prevalence is performed to illustrate the methodology.
Affiliation(s)
- Surupa Roy
- Department of Statistics, St Xavier's College (Autonomous), Kolkata, West Bengal, India
- Sumanta Adhya
- Department of Statistics, West Bengal State University, Kolkata, West Bengal, India
- Subrata Rana
- Department of Statistics, Krishnagar Government College, Kolkata, West Bengal, India
|
2
|
Islam A, Islam S, Islam M, Hossain ME, Munro S, Samad MA, Rahman MK, Shirin T, Flora MS, Hassan MM, Rahman MZ, Epstein JH. Prevalence and risk factors for avian influenza virus (H5 and H9) contamination in peri-urban and rural live bird markets in Bangladesh. Front Public Health 2023; 11:1148994. [PMID: 37151580 PMCID: PMC10158979 DOI: 10.3389/fpubh.2023.1148994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 03/27/2023] [Indexed: 05/09/2023] Open
Abstract
Avian influenza viruses (AIV) have been frequently detected in live bird markets (LBMs) around the world, primarily in urban areas, and have the ability to spillover to other species, including humans. Despite frequent detection of AIV in urban LBMs, the contamination of AIV on environmental surfaces in rural and peri-urban LBMs in Bangladesh is poorly documented. Therefore, we conducted this study to determine the prevalence of AIV subtypes within a subset of peri-urban and rural LBMs in Bangladesh and to further identify associated risk factors. Between 2017 and 2018, we collected faecal and offal samples from 200 stalls in 63 LBMs across four sub-districts. We tested the samples for the AIV matrix gene (M-gene) followed by H5, H7, and H9 subtypes using real-time reverse transcriptase-polymerase chain reaction (rRT-PCR). We performed a descriptive analysis of market cleanliness and sanitation practices in order to further elucidate the relationship between LBM biosecurity and AIV subtypes by species, sample types, and landscape. Subsequently, we conducted a univariate analysis and a generalized linear mixed model (GLMM) to determine the risk factors associated with AIV contamination at individual stalls within LBMs. Our findings indicate that practices related to hygiene and the circulation of AIV significantly differed between rural and peri-urban live bird markets. 42.5% (95% CI: 35.56-49.67) of stalls were positive for AIV. A/H5, A/H9, and A HA/Untyped were detected in 10.5% (95% CI: 6.62-15.60), 9% (95% CI: 5.42-13.85), and 24.0% (95% CI: 18.26-30.53) of stalls respectively, with no detection of A/H7. Significantly higher levels of AIV were found in the Sonali chicken strain compared to the exotic broiler, and in offal samples compared to fecal samples. In the GLMM analysis, we identified several significant risk factors associated with AIV contamination in LBMs at the stall level. 
These include: landscape (AOR: 3.02; 95% CI: 1.18-7.72), the number of chicken breeds present (AOR: 2.4; 95% CI: 1.01-5.67), source of birds (AOR: 2.35; 95% CI: 1.0-5.53), separation of sick birds (AOR: 3.04; 95% CI: 1.34-6.92), disposal of waste/dead birds (AOR: 3.16; 95% CI: 1.41-7.05), cleaning agent (AOR: 5.99; 95% CI: 2.26-15.82), access of dogs (AOR: 2.52; 95% CI: 1.12-5.7), wild birds observed on site (AOR: 2.31; 95% CI: 1.01-5.3). The study further revealed a substantial prevalence of AIV with H5 and H9 subtypes in peri-urban and rural LBMs. The inadequate biosecurity measures at poultry stalls in Bangladesh increase the risk of AIV transmission from poultry to humans. To prevent the spread of AIV to humans and wild birds, we suggest implementing regular surveillance at live bird markets and enhancing biosecurity practices in peri-urban and rural areas in Bangladesh.
Affiliation(s)
- Ariful Islam
- EcoHealth Alliance, New York, NY, United States
- Centre for Integrative Ecology, School of Life and Environmental Science, Deakin University, Geelong Waurn Ponds, VIC, Australia
- *Correspondence: Ariful Islam,
- Shariful Islam
- Institute of Epidemiology, Disease Control and Research (IEDCR), Dhaka, Bangladesh
- Monjurul Islam
- Institute of Epidemiology, Disease Control and Research (IEDCR), Dhaka, Bangladesh
- Mohammad Enayet Hossain
- One Health Laboratory, International Centre for Diarrheal Diseases Research, Bangladesh (icddr,b), Dhaka, Bangladesh
- Sarah Munro
- EcoHealth Alliance, New York, NY, United States
- Mohammed Abdus Samad
- National Reference Laboratory for Avian Influenza, Bangladesh Livestock Research Institute (BLRI), Savar, Bangladesh
- Md. Kaisar Rahman
- Institute of Epidemiology, Disease Control and Research (IEDCR), Dhaka, Bangladesh
- Tahmina Shirin
- Institute of Epidemiology, Disease Control and Research (IEDCR), Dhaka, Bangladesh
- Mohammad Mahmudul Hassan
- Queensland Alliance for One Health Sciences, School of Veterinary Science, University of Queensland, Brisbane, QLD, Australia
- Mohammed Ziaur Rahman
- One Health Laboratory, International Centre for Diarrheal Diseases Research, Bangladesh (icddr,b), Dhaka, Bangladesh
|
3
|
Warasi MS, Hungerford LL, Lahmers K. Optimizing Pooled Testing for Estimating the Prevalence of Multiple Diseases. Journal of Agricultural, Biological and Environmental Statistics 2022; 27:713-727. [PMID: 35975123 PMCID: PMC9373899 DOI: 10.1007/s13253-022-00511-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 05/27/2022] [Accepted: 07/27/2022] [Indexed: 11/25/2022]
Abstract
Pooled testing can enhance the efficiency of diagnosing individuals with diseases of low prevalence. Often, pooling is implemented using standard groupings (2, 5, 10, etc.). On the other hand, optimization theory can provide specific guidelines in finding the ideal pool size and pooling strategy. This article focuses on optimizing the precision of disease prevalence estimators calculated from multiplex pooled testing data. In the context of a surveillance application of animal diseases, we study the estimation efficiency (i.e., precision) and cost efficiency of the estimators with adjustments for the number of expended tests. This enables us to determine the pooling strategies that offer the highest benefits when jointly estimating the prevalence of multiple diseases, such as theileriosis and anaplasmosis. The outcomes of our work can be used in designing pooled testing protocols, not only in simple pooling scenarios but also in more complex scenarios where individual retesting is performed in order to identify positive cases. A software application using the shiny package in R is provided with this article to facilitate implementation of our methods. Supplementary materials accompanying this paper appear online.
Affiliation(s)
- Md S. Warasi
- Department of Mathematics and Statistics, Radford University, Whitt Hall 224, Radford, VA 24142 USA
- Laura L. Hungerford
- Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA 24061 USA
- Kevin Lahmers
- Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA 24061 USA
|
4
|
Warasi MS. groupTesting: an R package for group testing estimation. Commun Stat Simul Comput 2021. [DOI: 10.1080/03610918.2021.2009867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Md S. Warasi
- Department of Mathematics and Statistics, Radford University, Radford, VA, USA
|
5
|
Liu Y, McMahan CS, Tebbs JM, Gallagher CM, Bilder CR. Generalized additive regression for group testing data. Biostatistics 2021; 22:873-889. [PMID: 32061081 PMCID: PMC8511943 DOI: 10.1093/biostatistics/kxaa003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/10/2019] [Revised: 01/04/2020] [Accepted: 01/13/2020] [Indexed: 11/13/2022] Open
Abstract
In screening applications involving low-prevalence diseases, pooling specimens (e.g., urine, blood, swabs, etc.) through group testing can be far more cost effective than testing specimens individually. Estimation is a common goal in such applications and typically involves modeling the probability of disease as a function of available covariates. In recent years, several authors have developed regression methods to accommodate the complex structure of group testing data but often under the assumption that covariate effects are linear. Although linearity is a reasonable assumption in some applications, it can lead to model misspecification and biased inference in others. To offer a more flexible framework, we propose a Bayesian generalized additive regression approach to model the individual-level probability of disease with potentially misclassified group testing data. Our approach can be used to analyze data arising from any group testing protocol with the goal of estimating multiple unknown smooth functions of covariates, standard linear effects for other covariates, and assay classification accuracy probabilities. We illustrate the methods in this article using group testing data on chlamydia infection in Iowa.
Affiliation(s)
- Yan Liu
- School of Community Health Sciences, University of Nevada, Reno, 1664 N. Virginia St, Reno, NV 89557, USA
- Christopher S McMahan
- School of Mathematical and Statistical Sciences, Clemson University, O-110 Martin Hall, Box 340975, Clemson, SC 29634, USA
- Joshua M Tebbs
- Department of Statistics, University of South Carolina, 1523 Greene St, Columbia, SC 29208, USA
- Colin M Gallagher
- School of Mathematical and Statistical Sciences, Clemson University, O-110 Martin Hall, Box 340975, Clemson, SC 29634, USA
- Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, 340 Hardin Hall North, Lincoln, NE 68583, USA
|
6
|
Hoegh A, Peel AJ, Madden W, Ruiz Aravena M, Morris A, Washburne A, Plowright RK. Estimating viral prevalence with data fusion for adaptive two-phase pooled sampling. Ecol Evol 2021; 11:14012-14023. [PMID: 34707835 PMCID: PMC8525136 DOI: 10.1002/ece3.8107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2021] [Revised: 06/09/2021] [Accepted: 06/18/2021] [Indexed: 11/16/2022] Open
Abstract
The COVID-19 pandemic has highlighted the importance of efficient sampling strategies and statistical methods for monitoring infection prevalence, both in humans and in reservoir hosts. Pooled testing can be an efficient tool for learning pathogen prevalence in a population. Typically, pooled testing requires a second-phase retesting procedure to identify infected individuals, but when the goal is solely to learn prevalence in a population, such as a reservoir host, there are more efficient methods for allocating the second-phase samples. To estimate pathogen prevalence in a population, this manuscript presents an approach for data fusion with two-phased testing of pooled samples that allows more efficient estimation of prevalence with less samples than traditional methods. The first phase uses pooled samples to estimate the population prevalence and inform efficient strategies for the second phase. To combine information from both phases, we introduce a Bayesian data fusion procedure that combines pooled samples with individual samples for joint inferences about the population prevalence. Data fusion procedures result in more efficient estimation of prevalence than traditional procedures that only use individual samples or a single phase of pooled sampling. The manuscript presents guidance on implementing the first-phase and second-phase sampling plans using data fusion. Such methods can be used to assess the risk of pathogen spillover from reservoir hosts to humans, or to track pathogens such as SARS-CoV-2 in populations.
Affiliation(s)
- Andrew Hoegh
- Department of Mathematical Sciences, Montana State University, Bozeman, MT, USA
- Alison J. Peel
- Centre for Planetary Health and Food Security, Griffith University, Nathan, QLD, Australia
- Wyatt Madden
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, USA
- Manuel Ruiz Aravena
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, USA
- Aaron Morris
- Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Raina K. Plowright
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, USA
|
7
|
Yuan A, Piao J, Ning J, Qin J. Semiparametric isotonic regression modelling and estimation for group testing data. Can J Stat 2021; 49:659-677. [PMID: 34690407 PMCID: PMC8528191 DOI: 10.1002/cjs.11581] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/30/2018] [Accepted: 05/04/2020] [Indexed: 11/11/2022]
Abstract
In the group testing procedure, several individual samples are grouped and the pooled samples, instead of each individual sample, are tested for outcome status (e.g., infectious disease status). Although this cost-effectiveness strategy in data collection is both labor and time efficient, it poses statistical challenges to derive statistically and computationally efficient estimators under semiparametric models. We consider semiparametric isotonic regression models for the simultaneous estimation of the conditional probability curve and covariate effects, in which a parametric form for combining the covariate information is assumed and the monotonic link function is left unspecified. We develop an expectation-maximization algorithm to overcome the computational challenge and embed the pool-adjacent violators algorithm in the M-step to facilitate the computation. We establish the large sample behavior of the proposed estimators and examine their finite sample performance in simulation studies. We apply the proposed method to data from the National Health and Nutrition Examination Survey for illustration.
Affiliation(s)
- Ao Yuan
- Department of Biostatistics, Bioinformatics & Biomathematics, Georgetown University, Washington, DC USA
- Jin Piao
- Department of Preventive Medicine, The University of Southern California, Los Angeles, CA USA
- Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Jing Qin
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Rockville, MD USA
|
8
|
Zhang W, Liu A, Li Q, Albert PS. Nonparametric estimation of distributions and diagnostic accuracy based on group-tested results with differential misclassification. Biometrics 2020; 76:1147-1156. [PMID: 32083733 PMCID: PMC8581970 DOI: 10.1111/biom.13236] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/29/2019] [Revised: 12/06/2019] [Accepted: 01/27/2020] [Indexed: 11/30/2022]
Abstract
This article concerns the problem of estimating a continuous distribution in a diseased or nondiseased population when only group-based test results on the disease status are available. The problem is challenging in that individual disease statuses are not observed and testing results are often subject to misclassification, with further complication that the misclassification may be differential as the group size and the number of the diseased individuals in the group vary. We propose a method to construct nonparametric estimation of the distribution and obtain its asymptotic properties. The performance of the distribution estimator is evaluated under various design considerations concerning group sizes and classification errors. The method is exemplified with data from the National Health and Nutrition Examination Survey study to estimate the distribution and diagnostic accuracy of C-reactive protein in blood samples in predicting chlamydia incidence.
Affiliation(s)
- Wei Zhang
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Aiyi Liu
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Qizhai Li
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Paul S. Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
|
9
|
Joyner CN, McMahan CS, Tebbs JM, Bilder CR. From mixed effects modeling to spike and slab variable selection: A Bayesian regression model for group testing data. Biometrics 2020; 76:913-923. [PMID: 31729015 PMCID: PMC7944974 DOI: 10.1111/biom.13176] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/29/2019] [Revised: 10/22/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022]
Abstract
Due to reductions in both time and cost, group testing is a popular alternative to individual-level testing for disease screening. These reductions are obtained by testing pooled biospecimens (eg, blood, urine, swabs, etc.) for the presence of an infectious agent. However, these reductions come at the expense of data complexity, making the task of conducting disease surveillance more tenuous when compared to using individual-level data. This is because an individual's disease status may be obscured by a group testing protocol and the effect of imperfect testing. Furthermore, unlike individual-level testing, a given participant could be involved in multiple testing outcomes and/or may never be tested individually. To circumvent these complexities and to incorporate all available information, we propose a Bayesian generalized linear mixed model that accommodates data arising from any group testing protocol, estimates unknown assay accuracy probabilities and accounts for potential heterogeneity in the covariate effects across population subgroups (eg, clinic sites, etc.); this latter feature is of key interest to practitioners tasked with conducting disease surveillance. To achieve model selection, our proposal uses spike and slab priors for both fixed and random effects. The methodology is illustrated through numerical studies and is applied to chlamydia surveillance data collected in Iowa.
Affiliation(s)
- Chase N. Joyner
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, U.S.A
- Christopher S. McMahan
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, U.S.A
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
10
|
|
11
|
Zhang W, Liu A, Li Q, Albert PS. Incorporating retesting outcomes for estimation of disease prevalence. Stat Med 2019; 39:687-697. [PMID: 31758594 DOI: 10.1002/sim.8439] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Revised: 10/31/2019] [Accepted: 11/03/2019] [Indexed: 11/12/2022]
Abstract
Group testing has been widely used as a cost-effective strategy to screen for and estimate the prevalence of a rare disease. While it is well-recognized that retesting is necessary for identifying infected subjects, it is not required for estimating the prevalence. For a test without misclassification, gains in statistical efficiency are expected from incorporating retesting results in the estimation of the prevalence. However, when the test is subject to misclassification, it is not clear how much gain should be expected. There are a number of theoretical challenges in addressing this issue, including (1) enumerating the potential test results from retesting individual subjects in a group, (2) the dependence among these test results and the test result from testing at the group level, and (3) differential misclassification due to pooling of biospecimens. Overcoming some of these challenges, we show that retesting subjects in either positive or negative groups can substantially improve the efficiency of the estimation and that retesting positive groups yields higher efficiency than retesting a same number or proportion of negative groups.
Affiliation(s)
- Wei Zhang
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Aiyi Liu
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Qizhai Li
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Paul S Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Institutes of Health, Rockville, Maryland
|
12
|
Gregory KB, Wang D, McMahan CS. Adaptive elastic net for group testing. Biometrics 2019; 75:13-23. [PMID: 30267535 PMCID: PMC7938860 DOI: 10.1111/biom.12973] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/22/2017] [Accepted: 09/14/2018] [Indexed: 11/28/2022]
Abstract
For disease screening, group (pooled) testing can be a cost-saving alternative to one-at-a-time testing, with savings realized through assaying pooled biospecimen (eg, urine, blood, saliva). In many group testing settings, practitioners are faced with the task of conducting disease surveillance. That is, it is often of interest to relate individuals' true disease statuses to covariate information via binary regression. Several authors have developed regression methods for group testing data, which is challenging due to the effects of imperfect testing. That is, all testing outcomes (on pools and individuals) are subject to misclassification, and individuals' true statuses are never observed. To further complicate matters, individuals may be involved in several testing outcomes. For analyzing such data, we provide a novel regression methodology which generalizes and extends the aforementioned regression techniques and which incorporates regularization. Specifically, for model fitting and variable selection, we propose an adaptive elastic net estimator under the logistic regression model which can be used to analyze data from any group testing strategy. We provide an efficient algorithm for computing the estimator along with guidance on tuning parameter selection. Moreover, we establish the asymptotic properties of the proposed estimator and show that it possesses "oracle" properties. We evaluate the performance of the estimator through Monte Carlo studies and illustrate the methodology on a chlamydia data set from the State Hygienic Laboratory in Iowa City.
Affiliation(s)
- Karl B. Gregory
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
13
|
McMahan CS, Tebbs JM, Hanson TE, Bilder CR. Bayesian regression for group testing data. Biometrics 2017; 73:1443-1452. [PMID: 28405965 PMCID: PMC5638690 DOI: 10.1111/biom.12704] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/01/2016] [Revised: 03/01/2017] [Accepted: 03/01/2017] [Indexed: 01/10/2023]
Abstract
Group testing involves pooling individual specimens (e.g., blood, urine, swabs, etc.) and testing the pools for the presence of a disease. When individual covariate information is available (e.g., age, gender, number of sexual partners, etc.), a common goal is to relate an individual's true disease status to the covariates in a regression model. Estimating this relationship is a nonstandard problem in group testing because true individual statuses are not observed and all testing responses (on pools and on individuals) are subject to misclassification arising from assay error. Previous regression methods for group testing data can be inefficient because they are restricted to using only initial pool responses and/or they make potentially unrealistic assumptions regarding the assay accuracy probabilities. To overcome these limitations, we propose a general Bayesian regression framework for modeling group testing data. The novelty of our approach is that it can be easily implemented with data from any group testing protocol. Furthermore, our approach will simultaneously estimate assay accuracy probabilities (along with the covariate effects) and can even be applied in screening situations where multiple assays are used. We apply our methods to group testing data collected in Iowa as part of statewide screening efforts for chlamydia, and we make user-friendly R code available to practitioners.
Affiliation(s)
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Timothy E. Hanson
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
14
|
Warasi MS, McMahan CS, Tebbs JM, Bilder CR. Group testing regression models with dilution submodels. Stat Med 2017; 36:4860-4872. [PMID: 28856774 DOI: 10.1002/sim.7455] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/05/2017] [Revised: 05/27/2017] [Accepted: 08/11/2017] [Indexed: 12/21/2022]
Abstract
Group testing, where specimens are tested initially in pools, is widely used to screen individuals for sexually transmitted diseases. However, a common problem encountered in practice is that group testing can increase the number of false negative test results. This occurs primarily when positive individual specimens within a pool are diluted by negative ones, resulting in positive pools testing negatively. If the goal is to estimate a population-level regression model relating individual disease status to observed covariates, severe bias can result if an adjustment for dilution is not made. Recognizing this as a critical issue, recent binary regression approaches in group testing have utilized continuous biomarker information to acknowledge the effect of dilution. In this paper, we have the same overall goal but take a different approach. We augment existing group testing regression models (that assume no dilution) with a parametric dilution submodel for pool-level sensitivity and estimate all parameters using maximum likelihood. An advantage of our approach is that it does not rely on external biomarker test data, which may not be available in surveillance studies. Furthermore, unlike previous approaches, our framework allows one to formally test whether dilution is present based on the observed group testing data. We use simulation to illustrate the performance of our estimation and inference methods, and we apply these methods to 2 infectious disease data sets.
Affiliation(s)
- Md S Warasi
- Department of Mathematics and Statistics, Radford University, Radford, VA 24142, USA
- Joshua M Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
|
15
|
Liu Y, McMahan C, Gallagher C. A general framework for the regression analysis of pooled biomarker assessments. Stat Med 2017; 36:2363-2377. [PMID: 28349583 PMCID: PMC5484591 DOI: 10.1002/sim.7291] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/28/2016] [Revised: 02/17/2017] [Accepted: 03/06/2017] [Indexed: 11/11/2022]
Abstract
As a cost-efficient data collection mechanism, the process of assaying pooled biospecimens is becoming increasingly common in epidemiological research; for example, pooling has been proposed for the purpose of evaluating the diagnostic efficacy of biological markers (biomarkers). To this end, several authors have proposed techniques that allow for the analysis of continuous pooled biomarker assessments. Regretfully, most of these techniques proceed under restrictive assumptions, are unable to account for the effects of measurement error, and fail to control for confounding variables. These limitations are understandably attributable to the complex structure that is inherent to measurements taken on pooled specimens. Consequently, in order to provide practitioners with the tools necessary to accurately and efficiently analyze pooled biomarker assessments, herein, a general Monte Carlo maximum likelihood-based procedure is presented. The proposed approach allows for the regression analysis of pooled data under practically all parametric models and can be used to directly account for the effects of measurement error. Through simulation, it is shown that the proposed approach can accurately and efficiently estimate all unknown parameters and is more computational efficient than existing techniques. This new methodology is further illustrated using monocyte chemotactic protein-1 data collected by the Collaborative Perinatal Project in an effort to assess the relationship between this chemokine and the risk of miscarriage. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Yan Liu
  - Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Christopher McMahan
  - Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Colin Gallagher
  - Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
|
16
|
Warasi MS, Tebbs JM, McMahan CS, Bilder CR. Estimating the prevalence of multiple diseases from two-stage hierarchical pooling. Stat Med 2016; 35:3851-64. [PMID: 27090057 PMCID: PMC4965323 DOI: 10.1002/sim.6964] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/04/2015] [Revised: 12/31/2015] [Accepted: 03/17/2016] [Indexed: 11/08/2022]
Abstract
Testing protocols in large-scale sexually transmitted disease screening applications often involve pooling biospecimens (e.g., blood, urine, and swabs) to lower costs and to increase the number of individuals who can be tested. With the recent development of assays that detect multiple diseases, it is now common to test biospecimen pools for multiple infections simultaneously. Recent work has developed an expectation-maximization algorithm to estimate the prevalence of two infections using a two-stage, Dorfman-type testing algorithm motivated by current screening practices for chlamydia and gonorrhea in the USA. In this article, we have the same goal but instead take a more flexible Bayesian approach. Doing so allows us to incorporate information about assay uncertainty during the testing process, which involves testing both pools and individuals, and also to update information as individuals are tested. Overall, our approach provides reliable inference for disease probabilities and accurately estimates assay sensitivity and specificity even when little or no information is provided in the prior distributions. We illustrate the performance of our estimation methods using simulation and by applying them to chlamydia and gonorrhea data collected in Nebraska. Copyright © 2016 John Wiley & Sons, Ltd.
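As a rough illustration of why two-stage, Dorfman-type pooling lowers testing costs, the sketch below computes the expected number of tests per individual for a single infection. It is not the Bayesian estimation procedure described in the abstract. The function name `dorfman_tests_per_person`, the constant pool-level sensitivity `se` and specificity `sp`, and the assumption of independent individuals with a common prevalence `p` are all illustrative choices.

```python
def dorfman_tests_per_person(p, k, se=1.0, sp=1.0):
    """Expected tests per individual under two-stage Dorfman pooling.

    Assumptions (hypothetical, for illustration only): one infection,
    independent individuals with common prevalence p, pools of size k,
    and a pool test with constant sensitivity se and specificity sp.
    """
    if k < 2:
        raise ValueError("pool size k must be at least 2")
    if not 0.0 <= p <= 1.0:
        raise ValueError("prevalence p must lie in [0, 1]")

    # Probability that a pool contains no infected individual.
    prob_pool_truly_negative = (1.0 - p) ** k

    # Probability that the pool test reads positive (true or false positive).
    prob_pool_tests_positive = (
        se * (1.0 - prob_pool_truly_negative)
        + (1.0 - sp) * prob_pool_truly_negative
    )

    # One pool test shared by k people, plus one retest per person
    # whenever the pool reads positive.
    return 1.0 / k + prob_pool_tests_positive


if __name__ == "__main__":
    for prevalence in (0.01, 0.05, 0.20):
        cost = dorfman_tests_per_person(prevalence, k=5)
        print(f"p={prevalence:.2f}: {cost:.3f} tests per person")
```

Values below 1 indicate a saving relative to individual testing; under these assumptions the saving shrinks as prevalence rises.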
Affiliation(s)
- Md S Warasi
  - Department of Statistics, University of South Carolina, Columbia, 29208, SC, U.S.A
- Joshua M Tebbs
  - Department of Statistics, University of South Carolina, Columbia, 29208, SC, U.S.A
- Christopher R Bilder
  - Department of Statistics, University of Nebraska-Lincoln, Lincoln, 68583, NE, U.S.A
|
17
|
McMahan CS, McLain AC, Gallagher CM, Schisterman EF. Estimating covariate-adjusted measures of diagnostic accuracy based on pooled biomarker assessments. Biom J 2016; 58:944-61. [PMID: 26927583 DOI: 10.1002/bimj.201500195] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/18/2015] [Revised: 12/31/2015] [Accepted: 01/06/2016] [Indexed: 11/10/2022]
Abstract
There is a need for epidemiological and medical researchers to identify new biomarkers (biological markers) that are useful in determining exposure levels and/or for the purposes of disease detection. Often this process is stunted by high testing costs associated with evaluating new biomarkers. Traditionally, biomarker assessments are individually tested within a target population. Pooling has been proposed to help alleviate the testing costs, where pools are formed by combining several individual specimens. Methods for using pooled biomarker assessments to estimate discriminatory ability have been developed. However, all these procedures have failed to acknowledge confounding factors. In this paper, we propose a regression methodology based on pooled biomarker measurements that allows the assessment of the discriminatory ability of a biomarker of interest. In particular, we develop covariate-adjusted estimators of the receiver-operating characteristic curve, the area under the curve, and Youden's index. We establish the asymptotic properties of these estimators and develop inferential techniques that allow one to assess whether a biomarker is a good discriminator between cases and controls, while controlling for confounders. The finite sample performance of the proposed methodology is illustrated through simulation. We apply our methods to analyze myocardial infarction (MI) data, with the goal of determining whether the pro-inflammatory cytokine interleukin-6 is a good predictor of MI after controlling for the subjects' cholesterol levels.
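For readers unfamiliar with the accuracy measures named in this abstract, the sketch below computes the plain empirical area under the ROC curve and Youden's index from individually measured biomarker values. It deliberately ignores pooling and covariate adjustment, so it is not the estimator proposed in the paper. The function names `empirical_auc` and `youden_index`, and the convention that larger values indicate disease, are assumptions made for illustration.

```python
def empirical_auc(cases, controls):
    """Empirical AUC: the proportion of (case, control) pairs in which the
    case has the larger biomarker value, counting ties as one half."""
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def youden_index(cases, controls):
    """Youden's index J = max over cutoffs c of sensitivity(c) + specificity(c) - 1,
    classifying a subject as positive when the biomarker value is >= c.
    Returns (J, cutoff); the smallest maximizing cutoff is reported."""
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    best_j, best_cutoff = -1.0, None
    for c in sorted(set(cases) | set(controls)):
        sensitivity = sum(x >= c for x in cases) / len(cases)
        specificity = sum(y < c for y in controls) / len(controls)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_cutoff = j, c
    return best_j, best_cutoff


if __name__ == "__main__":
    # Made-up toy values, not data from any study.
    toy_cases = [3, 4, 5]
    toy_controls = [1, 2, 3]
    print("AUC:", empirical_auc(toy_cases, toy_controls))
    print("Youden (J, cutoff):", youden_index(toy_cases, toy_controls))
```

An AUC near 0.5 and a Youden's index near 0 both suggest the biomarker discriminates poorly; values near 1 suggest strong discrimination.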
Affiliation(s)
- Alexander C McLain
  - Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, USA
- Colin M Gallagher
  - Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
- Enrique F Schisterman
  - Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA
|
18
|
Wang D, McMahan CS, Gallagher CM. A general regression framework for group testing data, which incorporates pool dilution effects. Stat Med 2015; 34:3606-21. [PMID: 26173957 DOI: 10.1002/sim.6578] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/24/2014] [Revised: 04/21/2015] [Accepted: 06/15/2015] [Indexed: 01/01/2023]
Abstract
Group testing, through the use of pooling, has been widely implemented as a more efficient means to screen individuals for infectious diseases. Typically, in these settings, practitioners are tasked with the complementary goals of both case identification and estimation. For these purposes, many group testing strategies have been proposed, which address issues such as preserving anonymity in estimation studies, quality control, and classification. In general, these strategies require that a significant number of the individuals be retested, either in pools or individually. In order to provide practitioners with a general methodology that can be used to accurately and precisely analyze data of this form, herein, we propose a binary regression framework that can incorporate data arising from any group testing strategy. Further, we relax previously made assumptions regarding testing error rates by relating the diagnostic testing results to the latent biological marker levels of the individuals being tested. We investigate the finite sample performance of our proposed methodology through simulation and by applying our techniques to hepatitis B data collected as part of a study involving Irish prisoners.
Affiliation(s)
- Dewei Wang
  - Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Colin M Gallagher
  - Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, U.S.A
|