1
Huang R, McLain AC, Herrin BH, Nolan M, Cai B, Self S. Bayesian group testing regression models for spatial data. Spat Spatiotemporal Epidemiol 2024; 50:100677. [PMID: 39181610 PMCID: PMC11347770 DOI: 10.1016/j.sste.2024.100677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 07/12/2024] [Accepted: 07/15/2024] [Indexed: 08/27/2024]
Abstract
Spatial patterns are common in infectious disease epidemiology. Disease mapping is essential to infectious disease surveillance. Under a group testing protocol, biomaterial from multiple individuals is physically combined into a pooled specimen, which is then tested for infection. If the pool tests negative, all contributing individuals are generally assumed to be uninfected. If the pool tests positive, the individuals are usually retested to determine who is infected. When the prevalence of infection is low, group testing provides significant cost savings over traditional individual testing by reducing the number of tests required. However, the lack of statistical methods capable of producing maps from group testing data has limited the use of group testing in disease mapping. We develop a Bayesian methodology that can simultaneously map disease prevalence using group testing data and identify risk factors for infection. We illustrate its real-world utility using two datasets from vector-borne disease surveillance.
Affiliation(s)
- Rongjie Huang
- Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Alexander C McLain
- Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Brian H Herrin
- College of Veterinary Medicine, Kansas State University, 1700 Denison Ave, Manhattan, 66502, KS, USA
- Melissa Nolan
- Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Bo Cai
- Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA
- Stella Self
- Department of Epidemiology and Biostatistics, University of South Carolina, 915 Greene Street, Columbia, 29208, SC, USA.
2
Roy S, Adhya S, Rana S. Estimation of odds ratio from group testing data with misclassified exposure. Biom J 2024; 66:e2200254. [PMID: 38285402 DOI: 10.1002/bimj.202200254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 08/21/2023] [Accepted: 09/17/2023] [Indexed: 01/30/2024]
Abstract
For low prevalence disease, we consider estimation of the odds ratio for two specified groups of individuals using group testing data. Broadly the two groups may be classified as "the exposed" and "the unexposed." Often in observational studies, the exposure status is not correctly recorded. In addition, diagnostic tests are rarely completely accurate. The proposed model accounts for imperfect sensitivity and specificity of diagnostic tests along with the misclassification in the exposure status. For model identifiability, we make use of internal validation data, where a subsample of reasonably small size is selected from the original sample by simple random sampling without replacement. Pseudo-maximum likelihood method is employed for the estimation of the model parameters. The performance of group testing methodology is compared with individual testing for different parametric configurations. A limited data study related to COVID-19 prevalence is performed to illustrate the methodology.
Affiliation(s)
- Surupa Roy
- Department of Statistics, St Xavier's College (Autonomous), Kolkata, West Bengal, India
- Sumanta Adhya
- Department of Statistics, West Bengal State University, Kolkata, West Bengal, India
- Subrata Rana
- Department of Statistics, Krishnagar Government College, Kolkata, West Bengal, India
3
Delaigle A, Tan R. Group testing regression analysis with covariates and specimens subject to missingness. Stat Med 2023; 42:731-744. [PMID: 36646446 DOI: 10.1002/sim.9640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 09/06/2022] [Accepted: 12/16/2022] [Indexed: 01/18/2023]
Abstract
We develop parametric estimators of a conditional prevalence in the group testing context. Group testing is applied when a binary outcome variable, often a disease indicator, is assessed by testing a specimen for the presence of the disease. Instead of testing all individual specimens separately, these are pooled in groups and the grouped specimens are tested for the disease, which permits to significantly reduce the number of tests to be performed. Various techniques have been developed in the literature for estimating a conditional prevalence from group testing data, but most of them are not valid when the data are subject to missingness. We consider this problem in the case where the specimen and the covariates are subject to nonmonotone missingness. We propose parametric estimators of the conditional prevalence, establish identifiability conditions for a logistic missing not at random model, and introduce an ignorable missing at random model. In theory, our estimators could be applied with multiple covariates missing, but in practice, they face numerical challenges when more than one covariate is missing for given individuals. We illustrate the method on simulated data and on a dataset from the Demographics and Health Survey.
Affiliation(s)
- Aurore Delaigle
- School of Mathematics and Statistics, University of Melbourne, 3010, Victoria, Parkville, Australia
- Ruoxu Tan
- School of Mathematics and Statistics, University of Melbourne, 3010, Victoria, Parkville, Australia
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
4
Best AF, Malinovsky Y, Albert PS. The efficient design of Nested Group Testing algorithms for disease identification in clustered data. J Appl Stat 2022; 50:2228-2245. [PMID: 37434628 PMCID: PMC10332225 DOI: 10.1080/02664763.2022.2071419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 04/23/2022] [Indexed: 10/18/2022]
Abstract
Group testing study designs have been used since the 1940s to reduce screening costs for uncommon diseases; for rare diseases, all cases are identifiable with substantially fewer tests than the population size. Substantial research has identified efficient designs under this paradigm. However, little work has focused on the important problem of disease screening among clustered data, such as geographic heterogeneity in HIV prevalence. We evaluated designs where we first estimate disease prevalence and then apply efficient group testing algorithms using these estimates. Specifically, we evaluate prevalence using individual testing on a fixed-size subset of each cluster and use these prevalence estimates to choose group sizes that minimize the corresponding estimated average number of tests per subject. We compare designs where we estimate cluster-specific prevalences as well as a common prevalence across clusters, use different group testing algorithms, construct groups from individuals within and in different clusters, and consider misclassification. For diseases with low prevalence, our results suggest that accounting for clustering is unnecessary. However, for diseases with higher prevalence and sizeable between-cluster heterogeneity, accounting for clustering in study design and implementation improves efficiency. We consider the practical aspects of our design recommendations with two examples with strong clustering effects: (1) Identification of HIV carriers in the US population and (2) Laboratory screening of anti-cancer compounds using cell lines.
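The group-size selection step described in this abstract can be illustrated with a minimal sketch. It assumes the classical two-stage (Dorfman) procedure with a perfect assay, which is only one of the algorithms the authors compare. The function names, the `max_size` cap, and the cluster prevalence values are hypothetical and are not taken from the paper.

```python
def expected_tests_per_subject(p: float, k: int) -> float:
    """Expected tests per subject under two-stage (Dorfman) testing.

    Assumes a perfect assay: one pooled test per group of size k, plus
    k individual retests whenever the pool is positive.
    """
    if k == 1:
        return 1.0  # a group of one is simply individual testing
    prob_pool_positive = 1.0 - (1.0 - p) ** k
    return 1.0 / k + prob_pool_positive


def best_group_size(p: float, max_size: int = 50) -> int:
    """Group size in 1..max_size minimizing expected tests per subject."""
    return min(range(1, max_size + 1),
               key=lambda k: expected_tests_per_subject(p, k))


# Hypothetical cluster-specific prevalence estimates (illustrative only).
cluster_prevalence = {"cluster_a": 0.01, "cluster_b": 0.10}
sizes = {name: best_group_size(p) for name, p in cluster_prevalence.items()}
print(sizes)
```

With these illustrative inputs the low-prevalence cluster is assigned a much larger group size than the high-prevalence one, which is the kind of difference that makes cluster-specific designs matter when between-cluster heterogeneity is large.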
Affiliation(s)
- Ana F. Best
- Biostatistics Branch, Biometrics Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Yaakov Malinovsky
- Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, MD, USA
- Paul S. Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
5
Comess S, Wang H, Holmes S, Donnat C. Statistical Modeling for Practical Pooled Testing During the COVID-19 Pandemic. Stat Sci 2022. [DOI: 10.1214/22-sts857] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Saskia Comess
- Saskia Comess is a PhD student, Emmett Interdisciplinary Program in Environment and Resources, Stanford University, Stanford, California
- Hannah Wang
- Hannah Wang is a resident physician, Department of Anatomic and Clinical Pathology, Stanford University School of Medicine, Stanford, California
- Susan Holmes
- Susan Holmes is a Professor, Department of Statistics, Stanford University, Stanford, California
- Claire Donnat
- Claire Donnat is an Assistant Professor, Department of Statistics, The University of Chicago, Chicago, Illinois
6
Warasi MS. groupTesting: an R package for group testing estimation. Commun Stat Simul Comput 2021. [DOI: 10.1080/03610918.2021.2009867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Md S. Warasi
- Department of Mathematics and Statistics, Radford University, Radford, VA, USA
7
Liu Y, McMahan CS, Tebbs JM, Gallagher CM, Bilder CR. Generalized additive regression for group testing data. Biostatistics 2021; 22:873-889. [PMID: 32061081 PMCID: PMC8511943 DOI: 10.1093/biostatistics/kxaa003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2019] [Revised: 01/04/2020] [Accepted: 01/13/2020] [Indexed: 11/13/2022]
Abstract
In screening applications involving low-prevalence diseases, pooling specimens (e.g., urine, blood, swabs, etc.) through group testing can be far more cost effective than testing specimens individually. Estimation is a common goal in such applications and typically involves modeling the probability of disease as a function of available covariates. In recent years, several authors have developed regression methods to accommodate the complex structure of group testing data but often under the assumption that covariate effects are linear. Although linearity is a reasonable assumption in some applications, it can lead to model misspecification and biased inference in others. To offer a more flexible framework, we propose a Bayesian generalized additive regression approach to model the individual-level probability of disease with potentially misclassified group testing data. Our approach can be used to analyze data arising from any group testing protocol with the goal of estimating multiple unknown smooth functions of covariates, standard linear effects for other covariates, and assay classification accuracy probabilities. We illustrate the methods in this article using group testing data on chlamydia infection in Iowa.
Affiliation(s)
- Yan Liu
- School of Community Health Sciences, University of Nevada, Reno, 1664 N. Virginia St, Reno, NV 89557, USA
- Christopher S McMahan
- School of Mathematical and Statistical Sciences, Clemson University, O-110 Martin Hall, Box 340975, Clemson, SC 29634, USA
- Joshua M Tebbs
- Department of Statistics, University of South Carolina, 1523 Greene St, Columbia, SC 29208, USA
- Colin M Gallagher
- School of Mathematical and Statistical Sciences, Clemson University, O-110 Martin Hall, Box 340975, Clemson, SC 29634, USA
- Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, 340 Hardin Hall North, Lincoln, NE 68583, USA
8
Hoegh A, Peel AJ, Madden W, Ruiz Aravena M, Morris A, Washburne A, Plowright RK. Estimating viral prevalence with data fusion for adaptive two-phase pooled sampling. Ecol Evol 2021; 11:14012-14023. [PMID: 34707835 PMCID: PMC8525136 DOI: 10.1002/ece3.8107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2021] [Revised: 06/09/2021] [Accepted: 06/18/2021] [Indexed: 11/16/2022]
Abstract
The COVID-19 pandemic has highlighted the importance of efficient sampling strategies and statistical methods for monitoring infection prevalence, both in humans and in reservoir hosts. Pooled testing can be an efficient tool for learning pathogen prevalence in a population. Typically, pooled testing requires a second-phase retesting procedure to identify infected individuals, but when the goal is solely to learn prevalence in a population, such as a reservoir host, there are more efficient methods for allocating the second-phase samples. To estimate pathogen prevalence in a population, this manuscript presents an approach for data fusion with two-phased testing of pooled samples that allows more efficient estimation of prevalence with less samples than traditional methods. The first phase uses pooled samples to estimate the population prevalence and inform efficient strategies for the second phase. To combine information from both phases, we introduce a Bayesian data fusion procedure that combines pooled samples with individual samples for joint inferences about the population prevalence. Data fusion procedures result in more efficient estimation of prevalence than traditional procedures that only use individual samples or a single phase of pooled sampling. The manuscript presents guidance on implementing the first-phase and second-phase sampling plans using data fusion. Such methods can be used to assess the risk of pathogen spillover from reservoir hosts to humans, or to track pathogens such as SARS-CoV-2 in populations.
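As a point of reference for the first phase described above, a single-phase prevalence estimate from equal-sized pools can be sketched as follows. This is the standard frequentist maximum likelihood estimator, not the Bayesian data fusion procedure the abstract proposes. The function name, argument names, and the example counts are hypothetical.

```python
def pooled_prevalence_mle(n_positive: int, n_pools: int, pool_size: int,
                          sensitivity: float = 1.0,
                          specificity: float = 1.0) -> float:
    """Maximum likelihood estimate of individual-level prevalence from pools.

    Assumes equal pool sizes, independent individuals, and assay accuracy
    that does not depend on pool size (no dilution effect).
    """
    if sensitivity + specificity <= 1.0:
        raise ValueError("requires sensitivity + specificity > 1")
    observed = n_positive / n_pools
    # P(pool tests positive) = Se * theta + (1 - Sp) * (1 - theta),
    # where theta = 1 - (1 - p)^k is the chance a pool is truly positive.
    theta = (observed - (1.0 - specificity)) / (sensitivity + specificity - 1.0)
    theta = min(max(theta, 0.0), 1.0)  # keep the estimate inside [0, 1]
    return 1.0 - (1.0 - theta) ** (1.0 / pool_size)


# Hypothetical data: 10 positive pools out of 100 pools of size 5.
estimate_perfect = pooled_prevalence_mle(10, 100, 5)
estimate_imperfect = pooled_prevalence_mle(10, 100, 5,
                                           sensitivity=0.95, specificity=0.98)
print(estimate_perfect, estimate_imperfect)
```

An estimate of this kind could then feed the choice of second-phase pool sizes. The fusion of pooled and individual results described in the abstract requires a joint model that this sketch does not attempt.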
Affiliation(s)
- Andrew Hoegh
- Department of Mathematical Sciences, Montana State University, Bozeman, MT, USA
- Alison J. Peel
- Centre for Planetary Health and Food Security, Griffith University, Nathan, QLD, Australia
- Wyatt Madden
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, USA
- Manuel Ruiz Aravena
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, USA
- Aaron Morris
- Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Raina K. Plowright
- Department of Microbiology and Immunology, Montana State University, Bozeman, MT, USA
9
Molenberghs G, Buyse M, Abrams S, Hens N, Beutels P, Faes C, Verbeke G, Van Damme P, Goossens H, Neyens T, Herzog S, Theeten H, Pepermans K, Abad AA, Van Keilegom I, Speybroeck N, Legrand C, De Buyser S, Hulstaert F. Infectious diseases epidemiology, quantitative methodology, and clinical research in the midst of the COVID-19 pandemic: Perspective from a European country. Contemp Clin Trials 2020; 99:106189. [PMID: 33132155 PMCID: PMC7581408 DOI: 10.1016/j.cct.2020.106189] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/08/2020] [Revised: 10/04/2020] [Accepted: 10/16/2020] [Indexed: 01/08/2023]
Abstract
Starting from historic reflections, the current SARS-CoV-2 induced COVID-19 pandemic is examined from various perspectives, in terms of what it implies for the implementation of non-pharmaceutical interventions, the modeling and monitoring of the epidemic, the development of early-warning systems, the study of mortality, prevalence estimation, diagnostic and serological testing, vaccine development, and ultimately clinical trials. Emphasis is placed on how the pandemic had led to unprecedented speed in methodological and clinical development, the pitfalls thereof, but also the opportunities that it engenders for national and international collaboration, and how it has simplified and sped up procedures. We also study the impact of the pandemic on clinical trials in other indications. We note that it has placed biostatistics, epidemiology, virology, infectiology, and vaccinology, and related fields in the spotlight in an unprecedented way, implying great opportunities, but also the need to communicate effectively, often amidst controversy.
Affiliation(s)
- Geert Molenberghs
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium; Interuniversity Institute for Biostatistics and statistical Bioinformatics, KU Leuven, Belgium
- Marc Buyse
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium; International Drug Development Institute, Belgium; CluePoints, Belgium.
- Steven Abrams
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium; Global Health Institute, Department of Epidemiology and Social Medicine, University of Antwerp, Belgium
- Niel Hens
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium; Centre for Health Economics Research and Modelling of Infectious Diseases, University of Antwerp, Belgium; Vaccine & Infectious Disease Institute, University of Antwerp, Belgium
- Philippe Beutels
- Centre for Health Economics Research and Modelling of Infectious Diseases, University of Antwerp, Belgium; Vaccine & Infectious Disease Institute, University of Antwerp, Belgium
- Christel Faes
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium
- Geert Verbeke
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium; Interuniversity Institute for Biostatistics and statistical Bioinformatics, KU Leuven, Belgium
- Pierre Van Damme
- Centre for Health Economics Research and Modelling of Infectious Diseases, University of Antwerp, Belgium; Vaccine & Infectious Disease Institute, University of Antwerp, Belgium
- Thomas Neyens
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Data Science Institute, Hasselt University, Belgium; Interuniversity Institute for Biostatistics and statistical Bioinformatics, KU Leuven, Belgium
- Sereina Herzog
- Centre for Health Economics Research and Modelling of Infectious Diseases, University of Antwerp, Belgium; Vaccine & Infectious Disease Institute, University of Antwerp, Belgium
- Heidi Theeten
- Centre for Health Economics Research and Modelling of Infectious Diseases, University of Antwerp, Belgium; Vaccine & Infectious Disease Institute, University of Antwerp, Belgium
- Koen Pepermans
- Centre for Health Economics Research and Modelling of Infectious Diseases, University of Antwerp, Belgium; Vaccine & Infectious Disease Institute, University of Antwerp, Belgium
- Ariel Alonso Abad
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, KU Leuven, Belgium
- Catherine Legrand
- Institute of Statistics, Biostatistics and Actuarial Sciences, UC Louvain, Belgium
10
Joyner CN, McMahan CS, Tebbs JM, Bilder CR. From mixed effects modeling to spike and slab variable selection: A Bayesian regression model for group testing data. Biometrics 2020; 76:913-923. [PMID: 31729015 PMCID: PMC7944974 DOI: 10.1111/biom.13176] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/29/2019] [Revised: 10/22/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022]
Abstract
Due to reductions in both time and cost, group testing is a popular alternative to individual-level testing for disease screening. These reductions are obtained by testing pooled biospecimens (eg, blood, urine, swabs, etc.) for the presence of an infectious agent. However, these reductions come at the expense of data complexity, making the task of conducting disease surveillance more tenuous when compared to using individual-level data. This is because an individual's disease status may be obscured by a group testing protocol and the effect of imperfect testing. Furthermore, unlike individual-level testing, a given participant could be involved in multiple testing outcomes and/or may never be tested individually. To circumvent these complexities and to incorporate all available information, we propose a Bayesian generalized linear mixed model that accommodates data arising from any group testing protocol, estimates unknown assay accuracy probabilities and accounts for potential heterogeneity in the covariate effects across population subgroups (eg, clinic sites, etc.); this latter feature is of key interest to practitioners tasked with conducting disease surveillance. To achieve model selection, our proposal uses spike and slab priors for both fixed and random effects. The methodology is illustrated through numerical studies and is applied to chlamydia surveillance data collected in Iowa.
Affiliation(s)
- Chase N. Joyner
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, U.S.A
- Christopher S. McMahan
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, U.S.A
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
11
Delaigle A, Huang W, Lei S. Estimation of Conditional Prevalence From Group Testing Data With Missing Covariates. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2019.1566071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Aurore Delaigle
- School of Mathematics and Statistics and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, Australia
- Wei Huang
- School of Mathematics and Statistics and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), University of Melbourne, Parkville, Australia
- Shaoke Lei
- Health Services, Murdoch Children’s Research Institute and Health Services Research Unit, The Royal Children’s Hospital, Melbourne, Australia
12
13
Zhang W, Liu A, Li Q, Albert PS. Incorporating retesting outcomes for estimation of disease prevalence. Stat Med 2019; 39:687-697. [PMID: 31758594 DOI: 10.1002/sim.8439] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/29/2019] [Revised: 10/31/2019] [Accepted: 11/03/2019] [Indexed: 11/12/2022]
Abstract
Group testing has been widely used as a cost-effective strategy to screen for and estimate the prevalence of a rare disease. While it is well-recognized that retesting is necessary for identifying infected subjects, it is not required for estimating the prevalence. For a test without misclassification, gains in statistical efficiency are expected from incorporating retesting results in the estimation of the prevalence. However, when the test is subject to misclassification, it is not clear how much gain should be expected. There are a number of theoretical challenges in addressing this issue, including (1) enumerating the potential test results from retesting individual subjects in a group, (2) the dependence among these test results and the test result from testing at the group level, and (3) differential misclassification due to pooling of biospecimens. Overcoming some of these challenges, we show that retesting subjects in either positive or negative groups can substantially improve the efficiency of the estimation and that retesting positive groups yields higher efficiency than retesting a same number or proportion of negative groups.
Affiliation(s)
- Wei Zhang
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Aiyi Liu
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Qizhai Li
- LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Paul S Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Institutes of Health, Rockville, Maryland
14
Lin J, Wang D, Zheng Q. Regression analysis and variable selection for two-stage multiple-infection group testing data. Stat Med 2019; 38:4519-4533. [PMID: 31297869 DOI: 10.1002/sim.8311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/03/2018] [Revised: 03/03/2019] [Accepted: 06/14/2019] [Indexed: 12/17/2022]
Abstract
Group testing, as a cost-effective strategy, has been widely used to perform large-scale screening for rare infections. Recently, the use of multiplex assays has transformed the goal of group testing from detecting a single disease to diagnosing multiple infections simultaneously. Existing research on multiple-infection group testing data either exclude individual covariate information or ignore possible retests on suspicious individuals. To incorporate both, we propose a new regression model. This new model allows us to perform a regression analysis for each infection using multiple-infection group testing data. Furthermore, we introduce an efficient variable selection method to reveal truly relevant risk factors for each disease. Our methodology also allows for the estimation of the assay sensitivity and specificity when they are unknown. We examine the finite sample performance of our method through extensive simulation studies and apply it to a chlamydia and gonorrhea screening data set to illustrate its practical usefulness.
Affiliation(s)
- Juexin Lin
- Department of Statistics, University of South Carolina, South Carolina
- Dewei Wang
- Department of Statistics, University of South Carolina, South Carolina
- Qi Zheng
- Department of Bioinformatics and Biostatistics, University of Louisville, Kentucky
15
Determination of Varying Group Sizes for Pooling Procedure. Comput Math Methods Med 2019; 2019:4381084. [PMID: 31065292 PMCID: PMC6466917 DOI: 10.1155/2019/4381084] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/15/2018] [Revised: 01/17/2019] [Accepted: 02/05/2019] [Indexed: 11/17/2022]
Abstract
Pooling is an attractive strategy in screening infected specimens, especially for rare diseases. An essential step of performing the pooled test is to determine the group size. Sometimes, equal group size is not appropriate due to population heterogeneity. In this case, varying group sizes are preferred and could be determined while individual information is available. In this study, we propose a sequential procedure to determine varying group sizes through fully utilizing available information. This procedure is data driven. Simulations show that it has good performance in estimating parameters.
16
Gregory KB, Wang D, McMahan CS. Adaptive elastic net for group testing. Biometrics 2019; 75:13-23. [PMID: 30267535 PMCID: PMC7938860 DOI: 10.1111/biom.12973] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/22/2017] [Accepted: 09/14/2018] [Indexed: 11/28/2022]
Abstract
For disease screening, group (pooled) testing can be a cost-saving alternative to one-at-a-time testing, with savings realized through assaying pooled biospecimen (eg, urine, blood, saliva). In many group testing settings, practitioners are faced with the task of conducting disease surveillance. That is, it is often of interest to relate individuals' true disease statuses to covariate information via binary regression. Several authors have developed regression methods for group testing data, which is challenging due to the effects of imperfect testing. That is, all testing outcomes (on pools and individuals) are subject to misclassification, and individuals' true statuses are never observed. To further complicate matters, individuals may be involved in several testing outcomes. For analyzing such data, we provide a novel regression methodology which generalizes and extends the aforementioned regression techniques and which incorporates regularization. Specifically, for model fitting and variable selection, we propose an adaptive elastic net estimator under the logistic regression model which can be used to analyze data from any group testing strategy. We provide an efficient algorithm for computing the estimator along with guidance on tuning parameter selection. Moreover, we establish the asymptotic properties of the proposed estimator and show that it possesses "oracle" properties. We evaluate the performance of the estimator through Monte Carlo studies and illustrate the methodology on a chlamydia data set from the State Hygienic Laboratory in Iowa City.
Affiliation(s)
- Karl B. Gregory
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
17
Roy S, Banerjee T. Estimation of log-odds ratio from group testing data using Firth correction. Biom J 2019; 61:714-728. [PMID: 30645765 DOI: 10.1002/bimj.201800125] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/13/2018] [Revised: 11/07/2018] [Accepted: 11/09/2018] [Indexed: 11/10/2022]
Abstract
We consider the estimation of the prevalence of a rare disease, and the log-odds ratio for two specified groups of individuals from group testing data. For a low-prevalence disease, the maximum likelihood estimate of the log-odds ratio is severely biased. However, Firth correction to the score function leads to a considerable improvement of the estimator. Also, for a low-prevalence disease, if the diagnostic test is imperfect, the group testing is found to yield more precise estimate of the log-odds ratio than the individual testing.
Affiliation(s)
- Surupa Roy
- Department of Statistics, St Xavier's College, Kolkata, India
|
18
|
Van Domelen DR, Mitchell EM, Perkins NJ, Schisterman EF, Manatunga AK, Huang Y, Lyles RH. Logistic regression with a continuous exposure measured in pools and subject to errors. Stat Med 2018; 37:4007-4021. [PMID: 30022497 DOI: 10.1002/sim.7891] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/20/2018] [Revised: 05/23/2018] [Accepted: 06/08/2018] [Indexed: 11/07/2022]
Abstract
In a multivariable logistic regression setting where measuring a continuous exposure requires an expensive assay, a design in which the biomarker is measured in pooled samples from multiple subjects can be very cost effective. A logistic regression model for poolwise data is available, but validity requires that the assay yields the precise mean exposure for members of each pool. To account for errors, we assume the assay returns the true mean exposure plus a measurement error (ME) and/or a processing error (PE). We pursue likelihood-based inference for a binary health-related outcome modeled by logistic regression coupled with a normal linear model relating individual-level exposure to covariates and assuming that the ME and PE components are independent and normally distributed regardless of pool size. We compare this approach with a discriminant function-based alternative, and we demonstrate the potential value of incorporating replicates into the study design. Applied to a reproductive health dataset with pools of size 2 along with individual samples and replicates, the model fit with both ME and PE had a lower AIC than a model accounting for ME only. Relative to ignoring errors, this model suggested a somewhat higher (though still nonsignificant) adjusted log-odds ratio associating the cytokine MCP-1 with risk of spontaneous abortion. Simulations modeled after these data confirm validity of the methods, demonstrate how ME and particularly PE can reduce the efficiency advantage of a pooling design, and highlight the value of replicates in improving stability when both errors are present.
Affiliation(s)
- Dane R Van Domelen
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Emily M Mitchell
- The Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality, Rockville, Maryland
- Neil J Perkins
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, Epidemiology Branch, Division of Intramural Population Health Research, Bethesda, Maryland
- Enrique F Schisterman
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, Epidemiology Branch, Division of Intramural Population Health Research, Bethesda, Maryland
- Amita K Manatunga
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Yijian Huang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
|
19
|
Nguyen NT, Bish EK, Aprahamian H. Sequential prevalence estimation with pooling and continuous test outcomes. Stat Med 2018; 37:2391-2426. [PMID: 29687473 DOI: 10.1002/sim.7657] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 03/18/2017] [Revised: 01/17/2018] [Accepted: 02/15/2018] [Indexed: 01/02/2023]
Abstract
Prevalence estimation is crucial for controlling the spread of infections and diseases and for planning of health care services. Prevalence estimation is typically conducted via pooled, or group, testing due to limited testing budgets. We study a sequential estimation procedure that uses continuous pool readings and considers the dilution effect of pooling so as to efficiently estimate an unknown prevalence rate. Embedded into the sequential estimation procedure is an optimization model that determines the optimal pooling design (number of pools and pool sizes) under a limited testing budget, considering the trade-off between testing cost and estimation accuracy. Our numerical study indicates that the proposed sequential estimation procedure outperforms single-stage procedures, or procedures that use binary test outcomes. Further, the sequential procedure provides robust prevalence estimates in cases where the initial estimate of the unknown prevalence rate is poor, or the assumed distribution of the biomarker load in infected subjects is inaccurate. Thus, when limited and unreliable information is available about the current status of, or biomarker dynamics related to, an infection, the sequential procedure becomes an attractive estimation strategy, due to its ability to mitigate the initial bias.
Affiliation(s)
- Ngoc T Nguyen
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
- Ebru K Bish
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
- Hrayer Aprahamian
- Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
|
20
|
Affiliation(s)
- Gregory Haber
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, USA
- Yaakov Malinovsky
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, USA
- Paul S. Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
|
21
|
McMahan CS, Tebbs JM, Hanson TE, Bilder CR. Bayesian regression for group testing data. Biometrics 2017; 73:1443-1452. [PMID: 28405965 PMCID: PMC5638690 DOI: 10.1111/biom.12704] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/01/2016] [Revised: 03/01/2017] [Accepted: 03/01/2017] [Indexed: 01/10/2023]
Abstract
Group testing involves pooling individual specimens (e.g., blood, urine, swabs, etc.) and testing the pools for the presence of a disease. When individual covariate information is available (e.g., age, gender, number of sexual partners, etc.), a common goal is to relate an individual's true disease status to the covariates in a regression model. Estimating this relationship is a nonstandard problem in group testing because true individual statuses are not observed and all testing responses (on pools and on individuals) are subject to misclassification arising from assay error. Previous regression methods for group testing data can be inefficient because they are restricted to using only initial pool responses and/or they make potentially unrealistic assumptions regarding the assay accuracy probabilities. To overcome these limitations, we propose a general Bayesian regression framework for modeling group testing data. The novelty of our approach is that it can be easily implemented with data from any group testing protocol. Furthermore, our approach will simultaneously estimate assay accuracy probabilities (along with the covariate effects) and can even be applied in screening situations where multiple assays are used. We apply our methods to group testing data collected in Iowa as part of statewide screening efforts for chlamydia, and we make user-friendly R code available to practitioners.
Affiliation(s)
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Timothy E. Hanson
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
|
22
|
Warasi MS, McMahan CS, Tebbs JM, Bilder CR. Group testing regression models with dilution submodels. Stat Med 2017; 36:4860-4872. [PMID: 28856774 DOI: 10.1002/sim.7455] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/05/2017] [Revised: 05/27/2017] [Accepted: 08/11/2017] [Indexed: 12/21/2022]
Abstract
Group testing, where specimens are tested initially in pools, is widely used to screen individuals for sexually transmitted diseases. However, a common problem encountered in practice is that group testing can increase the number of false negative test results. This occurs primarily when positive individual specimens within a pool are diluted by negative ones, resulting in positive pools testing negatively. If the goal is to estimate a population-level regression model relating individual disease status to observed covariates, severe bias can result if an adjustment for dilution is not made. Recognizing this as a critical issue, recent binary regression approaches in group testing have utilized continuous biomarker information to acknowledge the effect of dilution. In this paper, we have the same overall goal but take a different approach. We augment existing group testing regression models (that assume no dilution) with a parametric dilution submodel for pool-level sensitivity and estimate all parameters using maximum likelihood. An advantage of our approach is that it does not rely on external biomarker test data, which may not be available in surveillance studies. Furthermore, unlike previous approaches, our framework allows one to formally test whether dilution is present based on the observed group testing data. We use simulation to illustrate the performance of our estimation and inference methods, and we apply these methods to 2 infectious disease data sets.
Affiliation(s)
- Md S Warasi
- Department of Mathematics and Statistics, Radford University, Radford, VA 24142, USA
- Joshua M Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
|
23
|
Liu Y, McMahan C, Gallagher C. A general framework for the regression analysis of pooled biomarker assessments. Stat Med 2017; 36:2363-2377. [PMID: 28349583 PMCID: PMC5484591 DOI: 10.1002/sim.7291] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/28/2016] [Revised: 02/17/2017] [Accepted: 03/06/2017] [Indexed: 11/11/2022]
Abstract
As a cost-efficient data collection mechanism, the process of assaying pooled biospecimens is becoming increasingly common in epidemiological research; for example, pooling has been proposed for the purpose of evaluating the diagnostic efficacy of biological markers (biomarkers). To this end, several authors have proposed techniques that allow for the analysis of continuous pooled biomarker assessments. Regretfully, most of these techniques proceed under restrictive assumptions, are unable to account for the effects of measurement error, and fail to control for confounding variables. These limitations are understandably attributable to the complex structure that is inherent to measurements taken on pooled specimens. Consequently, in order to provide practitioners with the tools necessary to accurately and efficiently analyze pooled biomarker assessments, herein, a general Monte Carlo maximum likelihood-based procedure is presented. The proposed approach allows for the regression analysis of pooled data under practically all parametric models and can be used to directly account for the effects of measurement error. Through simulation, it is shown that the proposed approach can accurately and efficiently estimate all unknown parameters and is more computationally efficient than existing techniques. This new methodology is further illustrated using monocyte chemotactic protein-1 data collected by the Collaborative Perinatal Project in an effort to assess the relationship between this chemokine and the risk of miscarriage. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Yan Liu
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Christopher McMahan
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Colin Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
|
24
|
Huang X, Sarker Warasi MS. Maximum Likelihood Estimators in Regression Models for Error‐prone Group Testing Data. Scand Stat Theory Appl 2017. [DOI: 10.1111/sjos.12282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Affiliation(s)
- Xianzheng Huang
- Department of Statistics, College of Arts & Sciences, University of South Carolina
|
25
|
Montesinos-López OA, Montesinos-López A, Eskridge K, Crossa J. Inverse sampling regression for pooled data. Stat Methods Med Res 2017; 26:1093-1109. [PMID: 25601742 DOI: 10.1177/0962280214568047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract
Because pools are tested instead of individuals in group testing, this technique is helpful for estimating prevalence in a population or for classifying a large number of individuals into two groups at a low cost. For this reason, group testing is a well-known means of saving costs and producing precise estimates. In this paper, we developed a mixed-effect group testing regression that is useful when the data-collecting process is performed using inverse sampling. This model allows including covariate information at the individual level to incorporate heterogeneity among individuals and identify which covariates are associated with positive individuals. We present an approach to fit this model using maximum likelihood and we performed a simulation study to evaluate the quality of the estimates. Based on the simulation study, we found that the proposed regression method for inverse sampling with group testing produces parameter estimates with low bias when the pre-specified number of positive pools (r) to stop the sampling process is at least 10 and the number of clusters in the sample is also at least 10. We performed an application with real data and we provide an NLMIXED code that researchers can use to implement this method.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Guanajuato, México
- Kent Eskridge
- Statistics Department, University of Nebraska, Lincoln, Nebraska, USA
- José Crossa
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico
|
26
|
Abstract
Group testing, introduced by Dorfman (1943), has been used to reduce costs when estimating the prevalence of a binary characteristic based on a screening test of $k$ groups that include $n$ independent individuals in total. If the unknown prevalence is low and the screening test suffers from misclassification, it is also possible to obtain more precise prevalence estimates than those obtained from testing all $n$ samples separately (Tu et al., 1994). In some applications, the individual binary response corresponds to whether an underlying time-to-event variable $T$ is less than an observed screening time $C$, a data structure known as current status data. Given sufficient variation in the observed $C$ values, it is possible to estimate the distribution function $F$ of $T$ nonparametrically, at least at some points in its support, using the pool-adjacent-violators algorithm (Ayer et al., 1955). Here, we consider nonparametric estimation of $F$ based on group-tested current status data for groups of size $k$ where the group tests positive if and only if any individual’s unobserved $T$ is less than the corresponding observed $C$. We investigate the performance of the group-based estimator as compared to the individual test nonparametric maximum likelihood estimator, and show that the former can be more precise in the presence of misclassification for low values of $F(t)$. Potential applications include testing for the presence of various diseases in pooled samples where interest focuses on the age-at-incidence distribution rather than overall prevalence. We apply this estimator to the age-at-incidence curve for hepatitis C infection in a sample of U.S. women who gave birth to a child in 2014, where group assignment is done at random and based on maternal age. We discuss connections to other work in the literature, as well as potential extensions.
Affiliation(s)
- L C Petito
- Division of Biostatistics, School of Public Health, 101 Haviland Hall, University of California, Berkeley, California 94720
- N P Jewell
- Division of Biostatistics, School of Public Health, 101 Haviland Hall, University of California, Berkeley, California 94720
|
27
|
Warasi MS, Tebbs JM, McMahan CS, Bilder CR. Estimating the prevalence of multiple diseases from two-stage hierarchical pooling. Stat Med 2016; 35:3851-64. [PMID: 27090057 PMCID: PMC4965323 DOI: 10.1002/sim.6964] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/04/2015] [Revised: 12/31/2015] [Accepted: 03/17/2016] [Indexed: 11/08/2022]
Abstract
Testing protocols in large-scale sexually transmitted disease screening applications often involve pooling biospecimens (e.g., blood, urine, and swabs) to lower costs and to increase the number of individuals who can be tested. With the recent development of assays that detect multiple diseases, it is now common to test biospecimen pools for multiple infections simultaneously. Recent work has developed an expectation-maximization algorithm to estimate the prevalence of two infections using a two-stage, Dorfman-type testing algorithm motivated by current screening practices for chlamydia and gonorrhea in the USA. In this article, we have the same goal but instead take a more flexible Bayesian approach. Doing so allows us to incorporate information about assay uncertainty during the testing process, which involves testing both pools and individuals, and also to update information as individuals are tested. Overall, our approach provides reliable inference for disease probabilities and accurately estimates assay sensitivity and specificity even when little or no information is provided in the prior distributions. We illustrate the performance of our estimation methods using simulation and by applying them to chlamydia and gonorrhea data collected in Nebraska. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Md S Warasi
- Department of Statistics, University of South Carolina, Columbia, 29208, SC, U.S.A
- Joshua M Tebbs
- Department of Statistics, University of South Carolina, Columbia, 29208, SC, U.S.A
- Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, 68583, NE, U.S.A
|
28
|
Delaigle A, Zhou WX. Nonparametric and Parametric Estimators of Prevalence From Group Testing Data With Aggregated Covariates. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1054491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
|
29
|
Delaigle A, Hall P. Nonparametric methods for group testing data, taking dilution into account. Biometrika 2015. [DOI: 10.1093/biomet/asv049] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
|
30
|
Wang D, McMahan CS, Gallagher CM. A general regression framework for group testing data, which incorporates pool dilution effects. Stat Med 2015; 34:3606-21. [PMID: 26173957 DOI: 10.1002/sim.6578] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 09/24/2014] [Revised: 04/21/2015] [Accepted: 06/15/2015] [Indexed: 01/01/2023]
Abstract
Group testing, through the use of pooling, has been widely implemented as a more efficient means to screen individuals for infectious diseases. Typically, in these settings, practitioners are tasked with the complementary goals of both case identification and estimation. For these purposes, many group testing strategies have been proposed, which address issues such as preserving anonymity in estimation studies, quality control, and classification. In general, these strategies require that a significant number of the individuals be retested, either in pools or individually. In order to provide practitioners with a general methodology that can be used to accurately and precisely analyze data of this form, herein, we propose a binary regression framework that can incorporate data arising from any group testing strategy. Further, we relax previously made assumptions regarding testing error rates by relating the diagnostic testing results to the latent biological marker levels of the individuals being tested. We investigate the finite sample performance of our proposed methodology through simulation and by applying our techniques to hepatitis B data collected as part of a study involving Irish prisoners.
Affiliation(s)
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Colin M Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, U.S.A
|
31
|
Mitchell EM, Lyles RH, Manatunga AK, Perkins NJ, Schisterman EF. A highly efficient design strategy for regression with outcome pooling. Stat Med 2014; 33:5028-40. [PMID: 25220822 DOI: 10.1002/sim.6305] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/05/2013] [Revised: 05/19/2014] [Accepted: 08/25/2014] [Indexed: 11/06/2022]
Abstract
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting.
Affiliation(s)
- Emily M Mitchell
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
|
32
|
Delaigle A, Hall P, Wishart JR. New approaches to nonparametric and semiparametric regression for univariate and multivariate group testing data. Biometrika 2014. [DOI: 10.1093/biomet/asu025] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
|
33
|
Zhang Z, Liu C, Kim S, Liu A. Prevalence estimation subject to misclassification: the mis-substitution bias and some remedies. Stat Med 2014; 33:4482-500. [PMID: 25043925 DOI: 10.1002/sim.6268] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/28/2013] [Revised: 06/24/2014] [Accepted: 06/30/2014] [Indexed: 11/07/2022]
Abstract
We consider the problem of estimating the prevalence of a disease under a group testing framework. Because assays are usually imperfect, misclassification of disease status is a major challenge in prevalence estimation. To account for possible misclassification, it is usually assumed that the sensitivity and specificity of the assay are known and independent of the group size. This assumption is often questionable, and substitution of incorrect values of an assay's sensitivity and specificity can result in a large bias in the prevalence estimate, which we refer to as the mis-substitution bias. In this article, we propose simple designs and methods for prevalence estimation that do not require known values of assay sensitivity and specificity. If a gold standard test is available, it can be applied to a validation subsample to yield information on the imperfect assay's sensitivity and specificity. When a gold standard is unavailable, it is possible to estimate assay sensitivity and specificity, either as unknown constants or as specified functions of the group size, from group testing data with varying group size. We develop methods for estimating parameters and for finding or approximating optimal designs, and perform extensive simulation experiments to evaluate and compare the different designs. An example concerning human immunodeficiency virus infection is used to illustrate the validation subsample design.
Affiliation(s)
- Zhiwei Zhang
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, U.S.A
|
34
|
Wang D, McMahan CS, Gallagher CM, Kulasekera KB. Semiparametric group testing regression models. Biometrika 2014. [DOI: 10.1093/biomet/asu007] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
|
35
|
Birkner T, Aban IB, Katholi CR. Evaluation of a Frequentist Hierarchical Model to Estimate Prevalence when sampling from a large geographic area using Pool Screening. COMMUN STAT-THEOR M 2013; 42. [PMID: 24347808 DOI: 10.1080/03610926.2011.633732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
We present a frequentist Bernoulli-Beta hierarchical model to relax the constant prevalence assumption underlying the traditional prevalence estimation approach based on pooled data. This assumption is called into question when sampling from a large geographic area. Pool screening is a method that combines individual items into pools. Each pool will either test positive (at least one of the items is positive) or negative (all items are negative). Pool screening is commonly applied to the study of tropical diseases where pools consist of vectors (e.g. black flies) that can transmit the disease. The goal is to estimate the proportion of infected vectors. Intermediate estimators (model parameters) and estimators of ultimate interest (pertaining to prevalence) are evaluated by standard measures of merit, such as bias, variance and mean squared error making extensive use of expansions. Using the hierarchical model an investigator can determine the probability of the prevalence being below a prespecified threshold value, a value at which no reemergence of the disease is expected. An investigation into the least biased choice of the α parameter in the Beta (α, β) prevalence distribution leads to the choice of α = 1.
Affiliation(s)
- Thomas Birkner
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
- Inmaculada B Aban
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
- Charles R Katholi
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama
|
36
|
Zhang B, Bilder CR, Tebbs JM. Regression analysis for multiple-disease group testing data. Stat Med 2013; 32:4954-66. [PMID: 23703944 PMCID: PMC4301740 DOI: 10.1002/sim.5858] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/13/2012] [Accepted: 04/29/2013] [Indexed: 11/06/2022]
Abstract
Group testing, where individual specimens are composited into groups to test for the presence of a disease (or other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Group testing data are unique in that only group responses may be available, but inferences are needed at the individual level. A further methodological challenge arises when individuals are tested in groups for multiple diseases simultaneously, because unobserved individual disease statuses are likely correlated. In this paper, we propose new regression techniques for multiple-disease group testing data. We develop an expectation-solution based algorithm that provides consistent parameter estimates and natural large-sample inference procedures. We apply our proposed methodology to chlamydia and gonorrhea screening data collected in Nebraska as part of the Infertility Prevention Project and to prenatal infectious disease screening data from Kenya.
Affiliation(s)
- Boan Zhang
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
|
37
|
McMahan CS, Tebbs JM, Bilder CR. Regression models for group testing data with pool dilution effects. Biostatistics 2013; 14:284-98. [PMID: 23197382 PMCID: PMC3590921 DOI: 10.1093/biostatistics/kxs045] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 05/30/2012] [Revised: 10/17/2012] [Accepted: 10/19/2012] [Indexed: 11/13/2022]
Abstract
Group testing is widely used to reduce the cost of screening individuals for infectious diseases. There is an extensive literature on group testing, most of which traditionally has focused on estimating the probability of infection in a homogeneous population. More recently, this research area has shifted towards estimating individual-specific probabilities in a regression context. However, existing regression approaches have assumed that the sensitivity and specificity of pooled biospecimens are constant and do not depend on the pool sizes. For those applications, where this assumption may not be realistic, these existing approaches can lead to inaccurate inference, especially when pool sizes are large. Our new approach, which exploits the information readily available from underlying continuous biomarker distributions, provides reliable inference in settings where pooling would be most beneficial and does so even for larger pool sizes. We illustrate our methodology using hepatitis B data from a study involving Irish prisoners.
|
38
|
Wang D, Zhou H, Kulasekera KB. A semi-local likelihood regression estimator of the proportion based on group testing data. J Nonparametr Stat 2013. [DOI: 10.1080/10485252.2012.750726] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
|
39
|
Hund L, Pagano M. Estimating HIV prevalence from surveys with low individual consent rates: annealing individual and pooled samples. Emerg Themes Epidemiol 2013; 10:2. [PMID: 23446064 PMCID: PMC3649931 DOI: 10.1186/1742-7622-10-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 06/13/2012] [Accepted: 02/20/2013] [Indexed: 11/30/2022]
Abstract
Many HIV prevalence surveys are plagued by the problem that a sizeable number of surveyed individuals do not consent to contribute blood samples for testing. One can ignore this problem, as is often done, but the resultant bias can be of sufficient magnitude to invalidate the results of the survey, especially if the number of non-responders is high and the reason for refusing to participate is related to the individual’s HIV status. One reason for refusing to participate may be for reasons of privacy. For those individuals, we suggest offering the option of being tested in a pool. This form of testing is less certain than individual testing, but, if it convinces more people to submit to testing, it should reduce the potential for bias and give a cleaner answer to the question of prevalence. This paper explores the logistics of implementing a combined individual and pooled testing approach and evaluates the analytical advantages to such a combined testing strategy. We quantify improvements in a prevalence estimator based on this combined testing strategy, relative to an individual testing only approach and a pooled testing only approach. Minimizing non-response is key for reducing bias, and, if pooled testing assuages privacy concerns, offering a pooled testing strategy has the potential to substantially improve HIV prevalence estimates.
Affiliation(s)
- Lauren Hund
- Department of Family and Community Medicine, University of New Mexico, 2400 Tucker NE, Albuquerque, NM 87106, USA.
|
40
|
Lyles RH, Tang L, Lin J, Zhang Z, Mukherjee B. Likelihood-based methods for regression analysis with binary exposure status assessed by pooling. Stat Med 2012; 31:2485-97. [PMID: 22415630 DOI: 10.1002/sim.4426] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/14/2011] [Accepted: 09/05/2011] [Indexed: 01/05/2023]
Abstract
The need for resource-intensive laboratory assays to assess exposures in many epidemiologic studies provides ample motivation to consider study designs that incorporate pooled samples. In this paper, we consider the case in which specimens are combined for the purpose of determining the presence or absence of a pool-wise exposure, in lieu of assessing the actual binary exposure status for each member of the pool. We presume a primary logistic regression model for an observed binary outcome, together with a secondary regression model for exposure. We facilitate maximum likelihood analysis by complete enumeration of the possible implications of a positive pool, and we discuss the applicability of this approach under both cross-sectional and case-control sampling. We also provide a maximum likelihood approach for longitudinal or repeated measures studies where the binary outcome and exposure are assessed on multiple occasions and within-subject pooling is conducted for exposure assessment. Simulation studies illustrate the performance of the proposed approaches along with their computational feasibility using widely available software. We apply the methods to investigate gene-disease association in a population-based case-control study of colorectal cancer.
Affiliation(s)
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, The Rollins School of Public Health, Emory University, 1518 Clifton Rd. N.E., Atlanta, GA 30322, USA.
|
41
|
|
42
|
Malinovsky Y, Albert PS, Schisterman EF. Pooling designs for outcomes under a Gaussian random effects model. Biometrics 2011; 68:45-52. [PMID: 21981372 DOI: 10.1111/j.1541-0420.2011.01673.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]
Abstract
Due to the rising cost of laboratory assays, it has become increasingly common in epidemiological studies to pool biospecimens. This is particularly true in longitudinal studies, where the cost of performing multiple assays over time can be prohibitive. In this article, we consider the problem of estimating the parameters of a Gaussian random effects model when the repeated outcome is subject to pooling. We consider different pooling designs for the efficient maximum likelihood estimation of variance components, with particular attention to estimating the intraclass correlation coefficient. We evaluate the efficiencies of different pooling design strategies using analytic and simulation study results. We examine the robustness of the designs to skewed distributions and consider unbalanced designs. The design methodology is illustrated with a longitudinal study of premenopausal women focusing on assessing the reproducibility of F2-isoprostane, a biomarker of oxidative stress, over the menstrual cycle.
Affiliation(s)
- Yaakov Malinovsky
- Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20892, USA
|
43
|
Zhang Z, Liu A, Lyles RH, Mukherjee B. Logistic regression analysis of biomarker data subject to pooling and dichotomization. Stat Med 2011; 31:2473-84. [PMID: 21953741 DOI: 10.1002/sim.4367] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/13/2011] [Revised: 07/11/2011] [Accepted: 07/26/2011] [Indexed: 11/07/2022]
Abstract
There is growing interest in pooling specimens across subjects in epidemiologic studies, especially those involving biomarkers. This paper is concerned with regression analysis of epidemiologic data where a binary exposure is subject to pooling and the pooled measurement is dichotomized to indicate either that no subjects in the pool are exposed or that some are exposed, without revealing further information about the exposed subjects in the latter case. The pooling process may be stratified on the disease status (a binary outcome) and possibly other variables but is otherwise assumed random. We propose methods for estimating parameters in a prospective logistic regression model and illustrate these with data from a population-based case-control study of colorectal cancer. Simulation results show that the proposed methods perform reasonably well in realistic settings and that pooling can lead to sizable gains in cost efficiency. We make recommendations with regard to the choice of design for pooled epidemiologic studies.
Affiliation(s)
- Z Zhang
- Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-7510, USA.
|
44
|
|
45
|
Pritchard NA, Tebbs JM. Estimating Disease Prevalence Using Inverse Binomial Pooled Testing. JOURNAL OF AGRICULTURAL, BIOLOGICAL, AND ENVIRONMENTAL STATISTICS 2011; 16:70-87. [PMID: 21743789 PMCID: PMC3131210 DOI: 10.1007/s13253-010-0036-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
Monitoring populations of hosts as well as insect vectors is an important part of agricultural and public health risk assessment. In applications where pathogen prevalence is likely low, it is common to test pools of subjects for the presence of infection, rather than to test subjects individually. This technique is known as pooled (group) testing. In this paper, we revisit the problem of estimating the population prevalence p from pooled testing, but we consider applications where inverse binomial sampling is used. Our work is unlike previous research in pooled testing, which has largely assumed a binomial model. Inverse sampling is natural to implement when there is a need to report estimates early on in the data collection process and has been used in individual testing applications when disease incidence is low. We consider point and interval estimation procedures for p in this new pooled testing setting, and we use example data sets from the literature to describe and to illustrate our methods.
Affiliation(s)
- Nicholas A. Pritchard
- Department of Mathematics and Statistics, Coastal Carolina University, Conway, SC 29528, USA
- Joshua M. Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
|
46
|
Chen P, Tebbs JM, Bilder CR. Global goodness-of-fit tests for group testing regression models. Stat Med 2009; 28:2912-28. [PMID: 19610130 DOI: 10.1002/sim.3678] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
In a variety of biomedical applications, particularly those involving screening for infectious diseases, testing individuals (e.g. blood/urine samples, etc.) in pools has become a standard method of data collection. This experimental design, known as group testing (or pooled testing), can provide a large reduction in testing costs and can offer nearly the same precision as individual testing. To account for covariate information on individual subjects, regression models for group testing data have been proposed recently. However, there are currently no tools available to check the adequacy of these models. In this paper, we present various global goodness-of-fit tests for regression models with group testing data. We use simulation to examine the small-sample size and power properties of the tests for different pool composition strategies. We illustrate our methods using two infectious disease data sets, one from an HIV study in Kenya and one from the Infertility Prevention Project.
Affiliation(s)
- Peng Chen
- Takeda Global Research and Development Center, Inc., 675 North Field Drive, Lake Forest, IL 60045, USA
|