1
Yang J, Zhang W, Albert PS, Liu A, Chen Z. Combining Biomarkers to Improve Diagnostic Accuracy in Detecting Diseases With Group-Tested Data. Stat Med 2024. [PMID: 39375883 DOI: 10.1002/sim.10230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 08/22/2024] [Accepted: 09/11/2024] [Indexed: 10/09/2024]
Abstract
We consider the problem of combining multiple biomarkers to improve the diagnostic accuracy of detecting a disease when only group-tested data on the disease status are available. There are several challenges in addressing this problem, including unavailable individual disease statuses, differential misclassification depending on group size and number of diseased individuals in the group, and extensive computation due to a large number of possible combinations of multiple biomarkers. To tackle these issues, we propose a pairwise model fitting approach to estimating the distribution of the optimal linear combination of biomarkers and its diagnostic accuracy under the assumption of a multivariate normal distribution. The approach is evaluated in simulation studies and applied to data on chlamydia detection and COVID-19 diagnosis.
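As a rough illustration of what an "optimal linear combination" means under multivariate normality, the sketch below uses a standard result for normally distributed markers: with mean difference d between diseased and non-diseased groups and covariance matrices S0 and S1, the weights (S0 + S1)^{-1} d maximize the AUC, which then equals Phi(sqrt(d' (S0 + S1)^{-1} d)). It assumes the individual-level means and covariances are already known and handles only two biomarkers; estimating those parameters from group-tested data, which is the contribution described in the abstract, is not shown. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def best_linear_combination(mu0, mu1, cov0, cov1):
    """Return (weights, auc) for two normally distributed biomarkers.

    mu0, mu1: mean vectors (length 2) for non-diseased and diseased groups.
    cov0, cov1: 2x2 covariance matrices given as nested lists.
    All inputs are assumed known here, which is an illustrative simplification.
    """
    # Summed covariance S = cov0 + cov1 and mean difference d = mu1 - mu0.
    s = [[cov0[i][j] + cov1[i][j] for j in range(2)] for i in range(2)]
    d = [mu1[0] - mu0[0], mu1[1] - mu0[1]]

    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det <= 0:
        raise ValueError("cov0 + cov1 must be positive definite")

    # Weights w = S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    w = [
        (s[1][1] * d[0] - s[0][1] * d[1]) / det,
        (-s[1][0] * d[0] + s[0][0] * d[1]) / det,
    ]

    # AUC of the combined score: Phi(sqrt(d' S^{-1} d)).
    quad_form = d[0] * w[0] + d[1] * w[1]
    return w, normal_cdf(math.sqrt(quad_form))


identity = [[1.0, 0.0], [0.0, 1.0]]
weights, auc = best_linear_combination([0.0, 0.0], [1.0, 1.0], identity, identity)
print(weights, round(auc, 4))
```

In this toy setting each marker alone has an AUC of Phi(1/sqrt(2)), about 0.76, so combining two independent, equally informative markers raises the AUC to about 0.84.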
Affiliation(s)
- Jin Yang
  - Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Wei Zhang
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Paul S Albert
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Aiyi Liu
  - Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Zhen Chen
  - Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
2
Alt EM, Chang X, Jiang X, Liu Q, Mo M, Xia HA, Ibrahim JG. LEAP: the latent exchangeability prior for borrowing information from historical data. Biometrics 2024; 80:ujae083. [PMID: 39329230 DOI: 10.1093/biomtc/ujae083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 01/08/2024] [Accepted: 08/09/2024] [Indexed: 09/28/2024]
Abstract
It is becoming increasingly popular to elicit informative priors on the basis of historical data. Popular existing priors, including the power prior, commensurate prior, and robust meta-analytic predictive prior, provide blanket discounting. Thus, if only a subset of participants in the historical data are exchangeable with the current data, these priors may not be appropriate. In order to combat this issue, propensity score approaches have been proposed. However, these approaches are only concerned with the covariate distribution, whereas exchangeability is typically assessed with parameters pertaining to the outcome. In this paper, we introduce the latent exchangeability prior (LEAP), where observations in the historical data are classified into exchangeable and non-exchangeable groups. The LEAP discounts the historical data by identifying the most relevant subjects from the historical data. We compare our proposed approach against alternative approaches in simulations and present a case study using our proposed prior to augment a control arm in a phase 3 clinical trial in plaque psoriasis with an unbalanced randomization scheme.
Affiliation(s)
- Ethan M Alt
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, United States
- Xiuya Chang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, United States
- Xun Jiang
  - Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA
- Qing Liu
  - Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA
- May Mo
  - Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA
- Hong Amy Xia
  - Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA
- Joseph G Ibrahim
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, United States
3
Warren JL, Wang Q, Ciarleglio MM. A scaled kernel density estimation prior for dynamic borrowing of historical information with application to clinical trial design. Stat Med 2024; 43:1615-1626. [PMID: 38345148 PMCID: PMC11483151 DOI: 10.1002/sim.10032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/07/2023] [Accepted: 01/23/2024] [Indexed: 03/16/2024]
Abstract
Incorporating historical data into a current data analysis can improve estimation of parameters shared across both datasets and increase the power to detect associations of interest while reducing the time and cost of new data collection. Several methods for prior distribution elicitation have been introduced to allow for the data-driven borrowing of historical information within a Bayesian analysis of the current data. We propose scaled Gaussian kernel density estimation (SGKDE) prior distributions as potentially more flexible alternatives. SGKDE priors directly use posterior samples collected from a historical data analysis to approximate probability density functions, whose variances depend on the degree of similarity between the historical and current datasets, which are used as prior distributions in the current data analysis. We compare the performances of the SGKDE priors with some existing approaches using a simulation study. Data from a recently completed phase III clinical trial of a maternal vaccine for respiratory syncytial virus are used to further explore the properties of SGKDE priors when designing a new clinical trial while incorporating historical data. Overall, both studies suggest that the new approach results in improved parameter estimation and power in the current data analysis compared to the considered existing methods.
Affiliation(s)
- Joshua L Warren
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Qi Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Maria M Ciarleglio
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
4
Warasi S, Tebbs JM, McMahan CS, Bilder CR. Estimating the prevalence of two or more diseases using outcomes from multiplex group testing. Biom J 2023; 65:e2200270. [PMID: 37192524 PMCID: PMC11099910 DOI: 10.1002/bimj.202200270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 02/26/2023] [Accepted: 03/01/2023] [Indexed: 05/18/2023]
Abstract
When screening a population for infectious diseases, pooling individual specimens (e.g., blood, swabs, urine, etc.) can provide enormous cost savings when compared to testing specimens individually. In the biostatistics literature, testing pools of specimens is commonly known as group testing or pooled testing. Although estimating a population-level prevalence with group testing data has received a large amount of attention, most of this work has focused on applications involving a single disease, such as human immunodeficiency virus. Modern methods of screening now involve testing pools and individuals for multiple diseases simultaneously through the use of multiplex assays. Hou et al. (2017, Biometrics, 73, 656-665) and Hou et al. (2020, Biostatistics, 21, 417-431) recently proposed group testing protocols for multiplex assays and derived relevant case identification characteristics, including the expected number of tests and those which quantify classification accuracy. In this article, we describe Bayesian methods to estimate population-level disease probabilities from implementing these protocols or any other multiplex group testing protocol which might be carried out in practice. Our estimation methods can be used with multiplex assays for two or more diseases while incorporating the possibility of test misclassification for each disease. We use chlamydia and gonorrhea testing data collected at the State Hygienic Laboratory at the University of Iowa to illustrate our work. We also provide an online R resource practitioners can use to implement the methods in this article.
Affiliation(s)
- S. Warasi
  - Department of Mathematics and Statistics, Radford University, Radford, VA 24142, USA
- Joshua M. Tebbs
  - Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Christopher S. McMahan
  - School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, USA
5
Warasi MS, Hungerford LL, Lahmers K. Optimizing Pooled Testing for Estimating the Prevalence of Multiple Diseases. J Agric Biol Environ Stat 2022; 27:713-727. [PMID: 35975123 PMCID: PMC9373899 DOI: 10.1007/s13253-022-00511-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 05/27/2022] [Accepted: 07/27/2022] [Indexed: 11/25/2022]
Abstract
Pooled testing can enhance the efficiency of diagnosing individuals with diseases of low prevalence. Often, pooling is implemented using standard groupings (2, 5, 10, etc.). On the other hand, optimization theory can provide specific guidelines in finding the ideal pool size and pooling strategy. This article focuses on optimizing the precision of disease prevalence estimators calculated from multiplex pooled testing data. In the context of a surveillance application of animal diseases, we study the estimation efficiency (i.e., precision) and cost efficiency of the estimators with adjustments for the number of expended tests. This enables us to determine the pooling strategies that offer the highest benefits when jointly estimating the prevalence of multiple diseases, such as theileriosis and anaplasmosis. The outcomes of our work can be used in designing pooled testing protocols, not only in simple pooling scenarios but also in more complex scenarios where individual retesting is performed in order to identify positive cases. A software application using the shiny package in R is provided with this article to facilitate implementation of our methods. Supplementary materials accompanying this paper appear online.
Affiliation(s)
- Md S. Warasi
  - Department of Mathematics and Statistics, Radford University, Whitt Hall 224, Radford, VA 24142 USA
- Laura L. Hungerford
  - Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA 24061 USA
- Kevin Lahmers
  - Virginia-Maryland College of Veterinary Medicine, Virginia Tech, Blacksburg, VA 24061 USA
6
Sobczyk J, Pyne MT, Barker A, Mayer J, Hanson KE, Samore MH, Noriega R. Efficient and effective single-step screening of individual samples for SARS-CoV-2 RNA using multi-dimensional pooling and Bayesian inference. J R Soc Interface 2021; 18:20210155. [PMID: 34129787 PMCID: PMC8205536 DOI: 10.1098/rsif.2021.0155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Rapid and widespread implementation of infectious disease surveillance is a critical component in the response to novel health threats. Molecular assays are the preferred method to detect a broad range of viral pathogens with high sensitivity and specificity. The implementation of molecular assay testing in a rapidly evolving public health emergency, such as the ongoing COVID-19 pandemic, can be hindered by resource availability or technical constraints. We present a screening strategy that is easily scaled up to support a sustained large volume of testing over long periods of time. This non-adaptive pooled-sample screening protocol employs Bayesian inference to yield a reportable outcome for each individual sample in a single testing step (no confirmation of positive results required). The proposed method is validated using clinical specimens tested using a real-time reverse transcription polymerase chain reaction test for SARS-CoV-2. This screening protocol has substantial advantages for its implementation, including higher sample throughput, faster time to results, no need to retrieve previously screened samples from storage to undergo retesting, and excellent performance of the algorithm's sensitivity and specificity compared with the individual test's metrics.
Affiliation(s)
- Juliana Sobczyk
  - Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Michael T Pyne
  - ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
- Adam Barker
  - ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
- Jeanmarie Mayer
  - Division of Epidemiology, University of Utah Health Sciences Center, Salt Lake City, UT, USA; Division of Infectious Diseases, University of Utah Health Sciences Center, Salt Lake City, UT, USA
- Kimberly E Hanson
  - ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA; Division of Infectious Disease, University of Utah School of Medicine, Salt Lake City, UT, USA
- Matthew H Samore
  - Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA; Informatics, Decision Enhancement, and Analytic Science (IDEAS) Center of Innovation, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, USA
- Rodrigo Noriega
  - Department of Chemistry, University of Utah, Salt Lake City, UT, USA
7
Tan JG, Omar A, Lee WBY, Wong MS. Considerations for Group Testing: A Practical Approach for the Clinical Laboratory. Clin Biochem Rev 2020; 41:79-92. [PMID: 33343043 PMCID: PMC7731934 DOI: 10.33176/aacb-20-00007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Abstract
Group testing, also known as pooled sample testing, was first proposed by Robert Dorfman in 1943. While sample pooling has been widely practiced in blood-banking, it is traditionally seen as anathema for clinical laboratories. However, the ongoing COVID-19 pandemic has re-ignited interest for group testing among clinical laboratories to mitigate supply shortages. We propose five criteria to assess the suitability of an analyte for pooled sample testing in general and outline a practical approach that a clinical laboratory may use to implement pooled testing for SARS-CoV-2 PCR testing. The five criteria we propose are: (1) the analyte concentrations in the diseased persons should be at least one order of magnitude (10 times) higher than in healthy persons; (2) sample dilution should not overly reduce clinical sensitivity; (3) the current prevalence must be sufficiently low for the number of samples pooled for the specific protocol; (4) there is no requirement for a fast turnaround time; and (5) there is an imperative need for resource rationing to maximise public health outcomes. The five key steps we suggest for a successful implementation are: (1) determination of when pooling takes place (pre-pre analytical, pre-analytical, analytical); (2) validation of the pooling protocol; (3) ensuring an adequate infrastructure and archival system; (4) configuration of the laboratory information system; and (5) staff training. While pool testing is not a panacea to overcome reagent shortage, it may allow broader access to testing but at the cost of reduction in sensitivity and increased turnaround time.
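Criterion (3), that prevalence must be low enough for the chosen pool size, can be made concrete with the textbook expected-cost formula for two-stage Dorfman pooling. The sketch below assumes a perfect assay, independent individuals, and retesting of every member of a positive pool; dilution and turnaround time, which the abstract also weighs, are ignored. The function names and the search limit of 32 are illustrative choices, not taken from the article.

```python
def dorfman_tests_per_person(pool_size, prevalence):
    """Expected number of tests per individual under two-stage Dorfman pooling.

    Assumes a perfect assay and independent individuals (illustrative assumptions).
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    # One pooled test is shared by the whole group; each member is retested
    # individually only when the pool contains at least one positive sample.
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + prob_pool_positive


def best_pool_size(prevalence, max_pool_size=32):
    """Pool size in 2..max_pool_size with the fewest expected tests per person."""
    candidates = range(2, max_pool_size + 1)
    return min(candidates, key=lambda k: dorfman_tests_per_person(k, prevalence))


for p in (0.01, 0.05, 0.35):
    k = best_pool_size(p)
    print(p, k, round(dorfman_tests_per_person(k, p), 3))
```

A value above 1 means pooling costs more tests than individual testing. Under these assumptions that is the case at a prevalence of 0.35 for every pool size, whereas at a prevalence of 0.01 pools of 11 need roughly 0.2 tests per person.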
Affiliation(s)
- Jun G Tan
  - Department of Laboratory Medicine, Khoo Teck Puat Hospital, Singapore
- Aznan Omar
  - Department of Laboratory Medicine, Khoo Teck Puat Hospital, Singapore
- Wendy BY Lee
  - Department of Laboratory Medicine, Khoo Teck Puat Hospital, Singapore
- Moh S Wong
  - Department of Laboratory Medicine, Khoo Teck Puat Hospital, Singapore
8
Lin J, Wang D, Zheng Q. Regression analysis and variable selection for two-stage multiple-infection group testing data. Stat Med 2019; 38:4519-4533. [PMID: 31297869 DOI: 10.1002/sim.8311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/03/2018] [Revised: 03/03/2019] [Accepted: 06/14/2019] [Indexed: 12/17/2022]
Abstract
Group testing, as a cost-effective strategy, has been widely used to perform large-scale screening for rare infections. Recently, the use of multiplex assays has transformed the goal of group testing from detecting a single disease to diagnosing multiple infections simultaneously. Existing research on multiple-infection group testing data either exclude individual covariate information or ignore possible retests on suspicious individuals. To incorporate both, we propose a new regression model. This new model allows us to perform a regression analysis for each infection using multiple-infection group testing data. Furthermore, we introduce an efficient variable selection method to reveal truly relevant risk factors for each disease. Our methodology also allows for the estimation of the assay sensitivity and specificity when they are unknown. We examine the finite sample performance of our method through extensive simulation studies and apply it to a chlamydia and gonorrhea screening data set to illustrate its practical usefulness.
Affiliation(s)
- Juexin Lin
  - Department of Statistics, University of South Carolina, South Carolina
- Dewei Wang
  - Department of Statistics, University of South Carolina, South Carolina
- Qi Zheng
  - Department of Bioinformatics and Biostatistics, University of Louisville, Kentucky
9
Haber G, Malinovsky Y. Efficient methods for the estimation of the multinomial parameter for the two-trait group testing model. Electron J Stat 2019; 13:2624-2657. [PMID: 34267856 DOI: 10.1214/19-ejs1583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Estimation of a single Bernoulli parameter using pooled sampling is among the oldest problems in the group testing literature. To carry out such estimation, an array of efficient estimators have been introduced covering a wide range of situations routinely encountered in applications. More recently, there has been growing interest in using group testing to simultaneously estimate the joint probabilities of two correlated traits using a multinomial model. Unfortunately, basic estimation results, such as the maximum likelihood estimator (MLE), have not been adequately addressed in the literature for such cases. In this paper, we show that finding the MLE for this problem is equivalent to maximizing a multinomial likelihood with a restricted parameter space. A solution using the EM algorithm is presented which is guaranteed to converge to the global maximizer, even on the boundary of the parameter space. Two additional closed form estimators are presented with the goal of minimizing the bias and/or mean square error. The methods are illustrated by considering an application to the joint estimation of transmission prevalence for two strains of the Potato virus Y by the aphid Myzus persicae.
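For orientation, the single-trait problem that the abstract calls one of the oldest in the literature has a closed-form MLE when all pools have the same size and the assay is perfect: if x of n pools of size k test positive, the estimate is 1 - (1 - x/n)^(1/k). The sketch below covers only that simple case, under those assumptions; the two-trait multinomial estimators and the EM algorithm developed in the paper are not reproduced. The function name is hypothetical.

```python
def pooled_prevalence_mle(num_positive_pools, num_pools, pool_size):
    """MLE of individual-level prevalence from equal-sized pooled tests.

    Assumes a perfect assay and independent individuals (illustrative assumptions).
    """
    if num_pools <= 0 or pool_size <= 0:
        raise ValueError("num_pools and pool_size must be positive")
    if not 0 <= num_positive_pools <= num_pools:
        raise ValueError("num_positive_pools must lie between 0 and num_pools")
    # A pool is negative only if all of its members are negative, so
    # P(pool negative) = (1 - p) ** pool_size. Invert that relation at the
    # observed fraction of negative pools.
    frac_negative = 1.0 - num_positive_pools / num_pools
    return 1.0 - frac_negative ** (1.0 / pool_size)


# 10 positive pools out of 100 pools of size 5.
print(round(pooled_prevalence_mle(10, 100, 5), 5))
```

When every pool tests positive the estimate sits on the boundary at 1, and for pool sizes above one the estimator is generally biased, which relates to the abstract's interest in alternative estimators that reduce bias or mean square error.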
Affiliation(s)
- Gregory Haber
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA
- Yaakov Malinovsky
  - Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
10
Malinovsky Y, Albert PS. Revisiting Nested Group Testing Procedures: New Results, Comparisons, and Robustness. Am Stat 2018; 73:117-125. [PMID: 31814627 DOI: 10.1080/00031305.2017.1366367] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
Abstract
Group testing has its origin in the identification of syphilis in the U.S. army during World War II. Much of the theoretical framework of group testing was developed starting in the late 1950s, with continued work into the 1990s. Recently, with the advent of new laboratory and genetic technologies, there has been an increasing interest in group testing designs for cost saving purposes. In this article, we compare different nested designs, including Dorfman, Sterrett and an optimal nested procedure obtained through dynamic programming. To elucidate these comparisons, we develop closed-form expressions for the optimal Sterrett procedure and provide a concise review of the prior literature for other commonly used procedures. We consider designs where the prevalence of disease is known as well as investigate the robustness of these procedures, when it is incorrectly assumed. This article provides a technical presentation that will be of interest to researchers as well as from a pedagogical perspective. Supplementary material for this article available online.
Affiliation(s)
- Yaakov Malinovsky
  - Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD
- Paul S Albert
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD
11
Nguyen NT, Bish EK, Aprahamian H. Sequential prevalence estimation with pooling and continuous test outcomes. Stat Med 2018; 37:2391-2426. [PMID: 29687473 DOI: 10.1002/sim.7657] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/18/2017] [Revised: 01/17/2018] [Accepted: 02/15/2018] [Indexed: 01/02/2023]
Abstract
Prevalence estimation is crucial for controlling the spread of infections and diseases and for planning of health care services. Prevalence estimation is typically conducted via pooled, or group, testing due to limited testing budgets. We study a sequential estimation procedure that uses continuous pool readings and considers the dilution effect of pooling so as to efficiently estimate an unknown prevalence rate. Embedded into the sequential estimation procedure is an optimization model that determines the optimal pooling design (number of pools and pool sizes) under a limited testing budget, considering the trade-off between testing cost and estimation accuracy. Our numerical study indicates that the proposed sequential estimation procedure outperforms single-stage procedures, or procedures that use binary test outcomes. Further, the sequential procedure provides robust prevalence estimates in cases where the initial estimate of the unknown prevalence rate is poor, or the assumed distribution of the biomarker load in infected subjects is inaccurate. Thus, when limited and unreliable information is available about the current status of, or biomarker dynamics related to, an infection, the sequential procedure becomes an attractive estimation strategy, due to its ability to mitigate the initial bias.
Affiliation(s)
- Ngoc T Nguyen
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
- Ebru K Bish
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
- Hrayer Aprahamian
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia, 24061, USA
12
Affiliation(s)
- Gregory Haber
  - Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, USA
- Yaakov Malinovsky
  - Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, USA
- Paul S. Albert
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA