1
Mou X, Wang D. Additive partially linear model for pooled biomonitoring data. Comput Stat Data Anal 2024; 190:107862. [PMID: 38187953 PMCID: PMC10769007 DOI: 10.1016/j.csda.2023.107862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Human biomonitoring involves monitoring human health by measuring the accumulation of harmful chemicals, typically in specimens like blood samples. The high cost of chemical analysis has led researchers to adopt a cost-effective approach. This approach physically combines specimens and subsequently analyzes the concentration of toxic substances within the merged pools. Consequently, there arises a need for innovative regression techniques to effectively interpret these aggregated measurements. To address this need, a new regression framework is proposed by extending the additive partially linear model (APLM) to accommodate the pooling context. The APLM is well-known for its versatility in capturing the complex association between outcomes and covariates, which is particularly valuable in assessing the complex interplay between chemical bioaccumulation and potential risk factors. Consistent estimators of the APLM are obtained through an iterative process that disaggregates information from the pooled observations. The performance is evaluated through simulations and an environmental health study focused on brominated flame retardants using data from the National Health and Nutrition Examination Survey.
Affiliation(s)
- Xichen Mou
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, U.S.A
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
2
Liu Y, Wang D, Li L, Li D. Assessing disparities in Americans' exposure to PCBs and PBDEs based on NHANES pooled biomonitoring data. J Am Stat Assoc 2023; 118:1538-1550. [PMID: 38046816 PMCID: PMC10691854 DOI: 10.1080/01621459.2023.2195546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Accepted: 03/16/2023] [Indexed: 03/30/2023]
Abstract
The National Health and Nutrition Examination Survey (NHANES) has been continuously biomonitoring Americans' exposure to two families of harmful environmental chemicals: polychlorinated biphenyls (PCBs) and polybrominated diphenyl ethers (PBDEs). However, biomonitoring these chemicals is expensive. To save cost, in 2005, NHANES resorted to pooled biomonitoring; i.e., amalgamating individual specimens to form a pool and measuring chemical levels from pools. Despite being publicly available, these pooled data gain limited applications in health studies. Among the few studies using these data, racial/age disparities were detected, but there is no control for confounding effects. These disadvantages are due to the complexity of pooled measurements and a dearth of statistical tools. Herein, we developed a regression-based method to unzip pooled measurements, which facilitated a comprehensive assessment of disparities in exposure to these chemicals. We found increasing dependence of PCBs on age and income, whereas PBDEs were the highest among adolescents and seniors and were elevated among the low-income population. In addition, Hispanics had the lowest PCBs and PBDEs among all demographic groups after controlling for potential confounders. These findings can guide the development of population-specific interventions to promote environmental justice. Moreover, both chemical levels declined throughout the period, indicating the effectiveness of existing regulatory policies.
Affiliation(s)
- Yan Liu
- School of Public Health, University of Nevada, Reno, NV 89557, USA
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Li Li
- School of Public Health, University of Nevada, Reno, NV 89557, USA
- Dingsheng Li
- School of Public Health, University of Nevada, Reno, NV 89557, USA
3
Wang D, Mou X, Liu Y. Varying-coefficient regression analysis for pooled biomonitoring. Biometrics 2022; 78:1328-1341. [PMID: 34190334 PMCID: PMC8716640 DOI: 10.1111/biom.13516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2020] [Revised: 03/16/2021] [Indexed: 12/30/2022]
Abstract
Human biomonitoring involves measuring the accumulation of contaminants in biological specimens (such as blood or urine) to assess individuals' exposure to environmental contamination. Due to the expensive cost of a single assay, the method of pooling has become increasingly common in environmental studies. The implementation of pooling starts by physically mixing specimens into pools, and then measures pooled specimens for the concentration of contaminants. An important task is to reconstruct individual-level statistical characteristics based on pooled measurements. In this article, we propose to use the varying-coefficient regression model for individual-level biomonitoring and provide methods to estimate the varying coefficients based on different types of pooled data. Asymptotic properties of the estimators are presented. We illustrate our methodology via simulation and with application to pooled biomonitoring of a brominated flame retardant provided by the National Health and Nutrition Examination Survey (NHANES).
Affiliation(s)
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A
- Xichen Mou
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, U.S.A
- Yan Liu
- School of Community Health Sciences, University of Nevada, Reno, NV 89557, U.S.A
4
Zhong Y, Xu P, Zhong S, Ding J. A sequential decoding procedure for pooled quantitative measure. Seq Anal 2022. [DOI: 10.1080/07474946.2022.2043049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Yunning Zhong
- School of Mathematics and Statistics, Fujian Normal University, Fuzhou, Fujian, China
- Ping Xu
- School of Mathematics and Statistics, Guangxi Normal University, Guilin, Guangxi, China
- Siming Zhong
- School of Mathematics and Statistics, Guangxi Normal University, Guilin, Guangxi, China
- Juan Ding
- School of Mathematics and Statistics, Guangxi Normal University, Guilin, Guangxi, China
5
Brand A, May S, Hughes JP, Nakigozi G, Reynolds SJ, Gabriel EE. Prediction-driven pooled testing methods: Application to HIV treatment monitoring in Rakai, Uganda. Stat Med 2021; 40:4185-4199. [PMID: 34046930 PMCID: PMC8487918 DOI: 10.1002/sim.9022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2020] [Revised: 04/19/2021] [Accepted: 04/21/2021] [Indexed: 11/11/2022]
Abstract
Chronic medical conditions often necessitate regular testing for proper treatment. Regular testing of all afflicted individuals may not be feasible due to limited resources, as is true with HIV monitoring in resource-limited settings. Pooled testing methods have been developed in order to allow regular testing for all while reducing resource burden. However, the most commonly used methods do not make use of covariate information predictive of treatment failure, which could improve performance. We propose and evaluate four prediction-driven pooled testing methods that incorporate covariate information to improve pooled testing performance. We then compare these methods in the HIV treatment management setting to current methods with respect to testing efficiency, sensitivity, and number of testing rounds using simulated data and data collected in Rakai, Uganda. Results show that the prediction-driven methods increase efficiency by up to 20% compared with current methods while maintaining equivalent sensitivity and reducing number of testing rounds by up to 70%. When predictions were incorrect, the performance of prediction-based matrix methods remained robust. The best performing method using our motivating data from Rakai was a prediction-driven hybrid method, maintaining sensitivity over 96% and efficiency over 75% in likely scenarios. If these methods perform similarly in the field, they may contribute to improving mortality and reducing transmission in resource-limited settings. Although we evaluate our proposed pooling methods in the HIV treatment setting, they can be applied to any setting that necessitates testing of a quantitative biomarker for a threshold-based decision.
Affiliation(s)
- Adam Brand
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Sweden
- Susanne May
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- James P. Hughes
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Steven J. Reynolds
- Johns Hopkins University, School of Medicine, Baltimore, MD, USA
- Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Erin E. Gabriel
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Sweden
6
Wang D, Mou X, Li X, Huang X. Local polynomial regression for pooled response data. J Nonparametr Stat 2020; 32:814-837. [PMID: 33762800 PMCID: PMC7986571 DOI: 10.1080/10485252.2020.1834104] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/05/2020] [Accepted: 10/03/2020] [Indexed: 10/23/2022]
Abstract
We propose local polynomial estimators for the conditional mean of a continuous response when only pooled response data are collected under different pooling designs. Asymptotic properties of these estimators are investigated and compared. Extensive simulation studies are carried out to compare finite sample performance of the proposed estimators under various model settings and pooling strategies. We apply the proposed local polynomial regression methods to two real-life applications to illustrate practical implementation and performance of the estimators for the mean function.
Affiliation(s)
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, South Carolina, U.S.A
- Xichen Mou
- Division of Epidemiology, Biostatistics, and Environmental Health, University of Memphis, Memphis, Tennessee, U.S.A
- Xiang Li
- JPMorgan Chase, Jersey City, New Jersey 07310, U.S.A
- Xianzheng Huang
- Department of Statistics, University of South Carolina, Columbia, South Carolina, U.S.A
7
Cheng C, Wang M. Statistical methods for analysis of combined categorical biomarker data from multiple studies. Ann Appl Stat 2020; 14:1146-1163. [PMID: 33633815 PMCID: PMC7903924 DOI: 10.1214/20-aoas1337] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
In the analysis of pooled data from multiple studies involving a biomarker exposure, the biomarker measurements can vary across laboratories and usually require calibration to a reference assay prior to pooling. Previous research considers the measurements from a reference laboratory as the gold standard, even though measurements in the reference laboratory are not necessarily closer to the underlying truth in reality. In this paper we do not treat any laboratory measurements as the gold standard, and we develop two statistical methods, the exact calibration and cut-off calibration methods, for the analysis of aggregated categorical biomarker data. We compare the performance of both methods for estimating the biomarker-disease relationship under a random sample or controls-only calibration design. Our findings include: (1) the exact calibration method provides significantly less biased estimates and more accurate confidence intervals than the other method; (2) the cut-off calibration method could yield estimates with minimal bias and valid confidence intervals under small measurement errors and/or small exposure effects; (3) controls-only calibration design can result in additional bias, but the bias is minimal if the exposure effects and/or disease prevalences are small. Finally, we illustrate the methods in an application evaluating the relationship between circulating vitamin D levels and colorectal cancer risk in a pooling project.
Affiliation(s)
- Chao Cheng
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Department of Mathematical Sciences, Tsinghua University
- Molin Wang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
8
Formulas and Web Application for Designing a Biospecimen Pooling Study to Compare Group Means. Epidemiology 2019; 31:98-102. [PMID: 31567748 DOI: 10.1097/ede.0000000000001104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
BACKGROUND: When research focuses on biomarker assessment in settings where per-assay costs are high relative to per-subject costs, a biospecimen pooling study design can be extremely cost-effective. However, designing a study to maximize cost savings is complicated by the fact that pooled measurements are typically subject to processing error, inducing additional variability caused by combining biospecimens, and may also be affected by assay-related measurement error. METHODS: We provide formulas and an interactive web application (hereafter called app) for designing a pooling study to compare group means. Power and sample size formulas are justified by Central Limit Theorem arguments that make no distributional assumptions on the biomarker. Errors can be assumed mean-0 additive or mean-1 multiplicative, the latter being well-suited for skewed biomarkers. RESULTS: User inputs for the app include usual power parameters as well as per-assay and per-subject costs and information about the errors: which are present, whether they are additive or multiplicative, and their variances. The app generates plots revealing the optimal pool size, required number of assays, cost savings, and sensitivity to the hard-to-predict processing error variance. CONCLUSIONS: These tools should aid in the design and deployment of pooling studies powered to detect group mean differences while minimizing total study costs.
9
Van Domelen DR, Mitchell EM, Perkins NJ, Schisterman EF, Manatunga AK, Huang Y, Lyles RH. Gamma models for estimating the odds ratio for a skewed biomarker measured in pools and subject to errors. Biostatistics 2019; 22:250-265. [PMID: 31373355 DOI: 10.1093/biostatistics/kxz028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/09/2018] [Revised: 04/05/2019] [Accepted: 06/23/2019] [Indexed: 11/14/2022]
Abstract
Measuring a biomarker in pooled samples from multiple cases or controls can lead to cost-effective estimation of a covariate-adjusted odds ratio, particularly for expensive assays. But pooled measurements may be affected by assay-related measurement error (ME) and/or pooling-related processing error (PE), which can induce bias if ignored. Building on recently developed methods for a normal biomarker subject to additive errors, we present two related estimators for a right-skewed biomarker subject to multiplicative errors: one based on logistic regression and the other based on a Gamma discriminant function model. Applied to a reproductive health dataset with a right-skewed cytokine measured in pools of size 1 and 2, both methods suggest no association with spontaneous abortion. The fitted models indicate little ME but fairly severe PE, the latter of which is much too large to ignore. Simulations mimicking these data with a non-unity odds ratio confirm validity of the estimators and illustrate how PE can detract from pooling-related gains in statistical efficiency. These methods address a key issue associated with the homogeneous pools study design and should facilitate valid odds ratio estimation at a lower cost in a wide range of scenarios.
Affiliation(s)
- Dane R Van Domelen
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Rd., Atlanta, GA, USA
- Emily M Mitchell
- Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality, 5600 Fishers Lane, Rockville, MD, USA
- Neil J Perkins
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, Epidemiology Branch, Division of Intramural Population Health Research, 6710B Rockledge Drive, Bethesda, MD, USA
- Enrique F Schisterman
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, Epidemiology Branch, Division of Intramural Population Health Research, 6710B Rockledge Drive, Bethesda, MD, USA
- Amita K Manatunga
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Rd., Atlanta, GA 30322, USA
- Yijian Huang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Rd., Atlanta, GA 30322, USA
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Rd., Atlanta, GA 30322, USA
10
Van Domelen DR, Mitchell EM, Perkins NJ, Schisterman EF, Manatunga AK, Huang Y, Lyles RH. Logistic regression with a continuous exposure measured in pools and subject to errors. Stat Med 2018; 37:4007-4021. [PMID: 30022497 DOI: 10.1002/sim.7891] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/20/2018] [Revised: 05/23/2018] [Accepted: 06/08/2018] [Indexed: 11/07/2022]
Abstract
In a multivariable logistic regression setting where measuring a continuous exposure requires an expensive assay, a design in which the biomarker is measured in pooled samples from multiple subjects can be very cost effective. A logistic regression model for poolwise data is available, but validity requires that the assay yields the precise mean exposure for members of each pool. To account for errors, we assume the assay returns the true mean exposure plus a measurement error (ME) and/or a processing error (PE). We pursue likelihood-based inference for a binary health-related outcome modeled by logistic regression coupled with a normal linear model relating individual-level exposure to covariates and assuming that the ME and PE components are independent and normally distributed regardless of pool size. We compare this approach with a discriminant function-based alternative, and we demonstrate the potential value of incorporating replicates into the study design. Applied to a reproductive health dataset with pools of size 2 along with individual samples and replicates, the model fit with both ME and PE had a lower AIC than a model accounting for ME only. Relative to ignoring errors, this model suggested a somewhat higher (though still nonsignificant) adjusted log-odds ratio associating the cytokine MCP-1 with risk of spontaneous abortion. Simulations modeled after these data confirm validity of the methods, demonstrate how ME and particularly PE can reduce the efficiency advantage of a pooling design, and highlight the value of replicates in improving stability when both errors are present.
Affiliation(s)
- Dane R Van Domelen
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Emily M Mitchell
- The Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality, Rockville, Maryland
- Neil J Perkins
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, Epidemiology Branch, Division of Intramural Population Health Research, Bethesda, Maryland
- Enrique F Schisterman
- Eunice Kennedy Shriver National Institute of Child Health and Human Development, Epidemiology Branch, Division of Intramural Population Health Research, Bethesda, Maryland
- Amita K Manatunga
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Yijian Huang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
11
Hyun N, Gastwirth JL, Graubard BI. Grouping methods for estimating the prevalences of rare traits from complex survey data that preserve confidentiality of respondents. Stat Med 2018; 37:2174-2186. [PMID: 29579785 DOI: 10.1002/sim.7648] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/19/2017] [Revised: 01/01/2018] [Accepted: 02/07/2018] [Indexed: 11/06/2022]
Abstract
Originally, 2-stage group testing was developed for efficiently screening individuals for a disease. In response to the HIV/AIDS epidemic, 1-stage group testing was adopted for estimating prevalences of a single or multiple traits from testing groups of size q, so individuals were not tested. This paper extends the methodology of 1-stage group testing to surveys with sample weighted complex multistage-cluster designs. Sample weighted-generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests. Two difficulties arise when using group testing in complex samples: (1) How does one weight the results of the test on each group as the sample weights will differ among observations in the same group. Furthermore, if the sample weights are related to positivity of the diagnostic test, then group-level weighting is needed to reduce bias in the prevalence estimation; (2) How does one form groups that will allow accurate estimation of the standard errors of prevalence estimates under multistage-cluster sampling allowing for intracluster correlation of the test results. We study 5 different grouping methods to address the weighting and cluster sampling aspects of complex designed samples. Finite sample properties of the estimators of prevalences, variances, and confidence interval coverage for these grouping methods are studied using simulations. National Health and Nutrition Examination Survey data are used to illustrate the methods.
Affiliation(s)
- Noorie Hyun
- Division of Biostatistics, Institute of Health and Equity, Medical College of Wisconsin, Milwaukee, WI, USA
- Joseph L Gastwirth
- Department of Statistics, George Washington University, Washington, DC, USA
- Barry I Graubard
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, U.S.A
12
Affiliation(s)
- Juexin Lin
- Department of Statistics, University of South Carolina, Columbia, SC, USA
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC, USA
13
Wang D, McMahan CS, Tebbs JM, Bilder CR. Group testing case identification with biomarker information. Comput Stat Data Anal 2018; 122:156-166. [PMID: 29977101 DOI: 10.1016/j.csda.2018.01.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/01/2022]
Abstract
Screening procedures for infectious diseases, such as HIV, often involve pooling individual specimens together and testing the pools. For diseases with low prevalence, group testing (or pooled testing) can be used to classify individuals as diseased or not while providing considerable cost savings when compared to testing specimens individually. The pooling literature is replete with group testing case identification algorithms including Dorfman testing, higher-stage hierarchical procedures, and array testing. Although these algorithms are usually evaluated on the basis of the expected number of tests and classification accuracy, most evaluations in the literature do not account for the continuous nature of the testing responses and thus invoke potentially restrictive assumptions to characterize an algorithm's performance. Commonly used case identification algorithms in group testing are considered and are evaluated by taking a different approach. Instead of treating testing responses as binary random variables (i.e., diseased/not), evaluations are made by exploiting an assay's underlying continuous biomarker distributions for positive and negative individuals. In doing so, a general framework to describe the operating characteristics of group testing case identification algorithms is provided when these distributions are known. The methodology is illustrated using two HIV testing examples taken from the pooling literature.
Affiliation(s)
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Joshua M Tebbs
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Christopher R Bilder
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
14
Liu Y, McMahan C, Gallagher C. A general framework for the regression analysis of pooled biomarker assessments. Stat Med 2017; 36:2363-2377. [PMID: 28349583 PMCID: PMC5484591 DOI: 10.1002/sim.7291] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/28/2016] [Revised: 02/17/2017] [Accepted: 03/06/2017] [Indexed: 11/11/2022]
Abstract
As a cost-efficient data collection mechanism, the process of assaying pooled biospecimens is becoming increasingly common in epidemiological research; for example, pooling has been proposed for the purpose of evaluating the diagnostic efficacy of biological markers (biomarkers). To this end, several authors have proposed techniques that allow for the analysis of continuous pooled biomarker assessments. Regretfully, most of these techniques proceed under restrictive assumptions, are unable to account for the effects of measurement error, and fail to control for confounding variables. These limitations are understandably attributable to the complex structure that is inherent to measurements taken on pooled specimens. Consequently, in order to provide practitioners with the tools necessary to accurately and efficiently analyze pooled biomarker assessments, herein, a general Monte Carlo maximum likelihood-based procedure is presented. The proposed approach allows for the regression analysis of pooled data under practically all parametric models and can be used to directly account for the effects of measurement error. Through simulation, it is shown that the proposed approach can accurately and efficiently estimate all unknown parameters and is more computationally efficient than existing techniques. This new methodology is further illustrated using monocyte chemotactic protein-1 data collected by the Collaborative Perinatal Project in an effort to assess the relationship between this chemokine and the risk of miscarriage. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Yan Liu
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Christopher McMahan
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Colin Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
15
McMahan CS, McLain AC, Gallagher CM, Schisterman EF. Estimating covariate-adjusted measures of diagnostic accuracy based on pooled biomarker assessments. Biom J 2016; 58:944-61. [PMID: 26927583 DOI: 10.1002/bimj.201500195] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/18/2015] [Revised: 12/31/2015] [Accepted: 01/06/2016] [Indexed: 11/10/2022]
Abstract
There is a need for epidemiological and medical researchers to identify new biomarkers (biological markers) that are useful in determining exposure levels and/or for the purposes of disease detection. Often this process is stunted by high testing costs associated with evaluating new biomarkers. Traditionally, biomarker assessments are individually tested within a target population. Pooling has been proposed to help alleviate the testing costs, where pools are formed by combining several individual specimens. Methods for using pooled biomarker assessments to estimate discriminatory ability have been developed. However, all these procedures have failed to acknowledge confounding factors. In this paper, we propose a regression methodology based on pooled biomarker measurements that allow the assessment of the discriminatory ability of a biomarker of interest. In particular, we develop covariate-adjusted estimators of the receiver-operating characteristic curve, the area under the curve, and Youden's index. We establish the asymptotic properties of these estimators and develop inferential techniques that allow one to assess whether a biomarker is a good discriminator between cases and controls, while controlling for confounders. The finite sample performance of the proposed methodology is illustrated through simulation. We apply our methods to analyze myocardial infarction (MI) data, with the goal of determining whether the pro-inflammatory cytokine interleukin-6 is a good predictor of MI after controlling for the subjects' cholesterol levels.
Affiliation(s)
- Alexander C McLain
- Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, USA
- Colin M Gallagher
- Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA
- Enrique F Schisterman
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA
16
|
Delaigle A, Zhou WX. Nonparametric and Parametric Estimators of Prevalence From Group Testing Data With Aggregated Covariates. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1054491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
17
Heavner K, Newschaffer C, Hertz-Picciotto I, Bennett D, Burstyn I. Pooling Bio-Specimens in the Presence of Measurement Error and Non-Linearity in Dose-Response: Simulation Study in the Context of a Birth Cohort Investigating Risk Factors for Autism Spectrum Disorders. Int J Environ Res Public Health 2015; 12:14780-99. [PMID: 26610532 PMCID: PMC4661679 DOI: 10.3390/ijerph121114780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2015] [Revised: 11/04/2015] [Accepted: 11/06/2015] [Indexed: 11/16/2022]
Abstract
We sought to determine the potential effects of pooling on power, false positive rate (FPR), and bias of the estimated associations between hypothetical environmental exposures and dichotomous autism spectrum disorders (ASD) status. Simulated birth cohorts in which ASD outcome was assumed to have been ascertained with uncertainty were created. We investigated the impact on the power of the analysis (using logistic regression) to detect true associations with exposure (X1) and the FPR for a non-causal correlate of exposure (X2, r = 0.7) for a dichotomized ASD measure when the pool size, sample size, degree of measurement error variance in exposure, strength of the true association, and shape of the exposure-response curve varied. We found that there was minimal change (bias) in the measures of association for the main effect (X1). There is some loss of power but there is less chance of detecting a false positive result for pooled compared to individual level models. The number of pools had more effect on the power and FPR than the overall sample size. This study supports the use of pooling to reduce laboratory costs while maintaining statistical efficiency in scenarios similar to the simulated prospective risk-enriched ASD cohort.
Affiliation(s)
- Karyn Heavner
  - Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA.
- Craig Newschaffer
  - A.J. Drexel Autism Institute, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA.
- Irva Hertz-Picciotto
  - Department of Public Health Sciences, University of California at Davis, Davis, CA 95616, USA.
- Deborah Bennett
  - Department of Public Health Sciences, University of California at Davis, Davis, CA 95616, USA.
- Igor Burstyn
  - Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA.
  - A.J. Drexel Autism Institute, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA.
  - Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, USA.
18
Lyles RH, Van Domelen D, Mitchell EM, Schisterman EF. A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples. Int J Environ Res Public Health 2015; 12:14723-40. [PMID: 26593934 PMCID: PMC4661676 DOI: 10.3390/ijerph121114723] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/03/2015] [Revised: 10/15/2015] [Accepted: 11/06/2015] [Indexed: 11/26/2022]
Abstract
Pooling biological specimens prior to performing expensive laboratory assays has been shown to be a cost effective approach for estimating parameters of interest. In addition to requiring specialized statistical techniques, however, the pooling of samples can introduce assay errors due to processing, possibly in addition to measurement error that may be present when the assay is applied to individual samples. Failure to account for these sources of error can result in biased parameter estimates and ultimately faulty inference. Prior research addressing biomarker mean and variance estimation advocates hybrid designs consisting of individual as well as pooled samples to account for measurement and processing (or pooling) error. We consider adapting this approach to the problem of estimating a covariate-adjusted odds ratio (OR) relating a binary outcome to a continuous exposure or biomarker level assessed in pools. In particular, we explore the applicability of a discriminant function-based analysis that assumes normal residual, processing, and measurement errors. A potential advantage of this method is that maximum likelihood estimation of the desired adjusted log OR is straightforward and computationally convenient. Moreover, in the absence of measurement and processing error, the method yields an efficient unbiased estimator for the parameter of interest assuming normal residual errors. We illustrate the approach using real data from an ancillary study of the Collaborative Perinatal Project, and we use simulations to demonstrate the ability of the proposed estimators to alleviate bias due to measurement and processing error.
Affiliation(s)
- Robert H Lyles
  - Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Mailstop 1518-002-3AA, Atlanta, GA 30322, USA.
- Dane Van Domelen
  - Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Mailstop 1518-002-3AA, Atlanta, GA 30322, USA.
- Emily M Mitchell
  - Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA.
- Enrique F Schisterman
  - Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA.
19
Mitchell EM, Lyles RH, Schisterman EF. Positing, fitting, and selecting regression models for pooled biomarker data. Stat Med 2015; 34:2544-58. [PMID: 25846980 DOI: 10.1002/sim.6496] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/29/2014] [Revised: 02/18/2015] [Accepted: 03/13/2015] [Indexed: 01/31/2023]
Abstract
Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. In this paper, we develop and compare parametric regression models for skewed outcome data subject to pooling, including a novel parameterization of the gamma distribution that takes full advantage of the gamma summation property. We also develop a Monte Carlo approximation of Akaike's Information Criterion applied to pooled data in order to guide model selection. Simulation studies and analysis of motivating data from the Collaborative Perinatal Project suggest that using Akaike's Information Criterion to select the best parametric model can help ensure valid inference and promote estimate precision.
Affiliation(s)
- Emily M Mitchell
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, U.S.A
- Robert H Lyles
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
- Enrique F Schisterman
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, U.S.A
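The gamma summation property mentioned in the abstract above can be stated briefly. The notation is an assumption introduced here for illustration: $Y_{ij}$ is the biomarker level of specimen $j$ in pool $i$, $k_i$ is the pool size, $\alpha_{ij}$ is a shape parameter that may depend on covariates, and $\beta$ is a common scale. This is a sketch of the general property, not necessarily the exact parameterization the authors adopt.

```latex
\begin{align*}
Y_{ij} &\overset{\text{ind.}}{\sim} \mathrm{Gamma}(\alpha_{ij}, \beta)
\;\Longrightarrow\;
S_i = \sum_{j=1}^{k_i} Y_{ij} \sim \mathrm{Gamma}\!\Bigl(\sum_{j=1}^{k_i} \alpha_{ij},\; \beta\Bigr), \\
\bar{Y}_i &= \frac{S_i}{k_i} \sim \mathrm{Gamma}\!\Bigl(\sum_{j=1}^{k_i} \alpha_{ij},\; \frac{\beta}{k_i}\Bigr),
\qquad
E\bigl[\bar{Y}_i\bigr] = \frac{\beta}{k_i} \sum_{j=1}^{k_i} \alpha_{ij}.
\end{align*}
```

Because the pooled measurement (taken here to be the average of equal-volume specimens) stays in the gamma family when the scale is shared, the likelihood for pooled data has a closed form, which is what makes a shape-based regression convenient in this setting.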
20
Mitchell EM, Lyles RH, Manatunga AK, Schisterman EF. Semiparametric regression models for a right-skewed outcome subject to pooling. Am J Epidemiol 2015; 181:541-8. [PMID: 25737248 DOI: 10.1093/aje/kwu301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
Pooling specimens prior to performing laboratory assays has various benefits. Pooling can help to reduce cost, preserve irreplaceable specimens, meet minimal volume requirements for certain lab tests, and even reduce information loss when a limit of detection is present. Regardless of the motivation for pooling, appropriate analytical techniques must be applied in order to obtain valid inference from composite specimens. When biomarkers are treated as the outcome in a regression model, techniques applicable to individually measured specimens may not be valid when measurements are taken from pooled specimens, particularly when the biomarker is positive and right skewed. In this paper, we propose a novel semiparametric estimation method based on an adaptation of the quasi-likelihood approach that can be applied to a right-skewed outcome subject to pooling. We use simulation studies to compare this method with an existing estimation technique that provides valid estimates only when pools are formed from specimens with identical predictor values. Simulation results and analysis of a motivating example demonstrate that, when appropriate estimation techniques are applied to strategically formed pools, valid and efficient estimation of the regression coefficients can be achieved.
21
Mitchell EM, Lyles RH, Manatunga AK, Perkins NJ, Schisterman EF. A highly efficient design strategy for regression with outcome pooling. Stat Med 2014; 33:5028-40. [PMID: 25220822 DOI: 10.1002/sim.6305] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/05/2013] [Revised: 05/19/2014] [Accepted: 08/25/2014] [Indexed: 11/06/2022]
Abstract
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting.
Affiliation(s)
- Emily M Mitchell
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
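The idea in the abstract above, forming pools from subjects with similar predictor values and then regressing pooled outcomes on pooled predictors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: a single predictor, a basic Lloyd's k-means written from scratch, a pooled assay that returns the average of the individual outcomes, and weighted least squares with weights equal to pool size. The function names `kmeans_pools` and `pooled_wls` are hypothetical.

```python
import random


def kmeans_pools(x, k, iters=50, seed=0):
    """Assign each subject to one of k pools using a basic 1-D Lloyd's k-means."""
    rng = random.Random(seed)
    centers = rng.sample(x, k)  # initial centers are k observed predictor values
    labels = [0] * len(x)
    for _ in range(iters):
        # Assignment step: each subject joins the pool with the nearest center.
        labels = [min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            if members:  # keep the old center if a pool happens to be empty
                centers[j] = sum(members) / len(members)
    return labels


def pooled_wls(x, y, labels):
    """Fit y = b0 + b1 * x from pool means, weighting each pool by its size."""
    pools = {}
    for xi, yi, lab in zip(x, y, labels):
        pools.setdefault(lab, []).append((xi, yi))
    sizes = [len(m) for m in pools.values()]
    xbar = [sum(p[0] for p in m) / len(m) for m in pools.values()]
    # In practice ybar is what the lab assay reports for each pool; here it is
    # simulated by averaging the individual outcomes (assumed equal volumes).
    ybar = [sum(p[1] for p in m) / len(m) for m in pools.values()]
    total = sum(sizes)
    mx = sum(w * a for w, a in zip(sizes, xbar)) / total
    my = sum(w * b for w, b in zip(sizes, ybar)) / total
    sxx = sum(w * (a - mx) ** 2 for w, a in zip(sizes, xbar))
    sxy = sum(w * (a - mx) * (b - my) for w, a, b in zip(sizes, xbar, ybar))
    slope = sxy / sxx
    return my - slope * mx, slope


# Simulated example: 200 subjects, 20 pools, true intercept 1 and slope 2.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
labels = kmeans_pools(x, k=20)
intercept, slope = pooled_wls(x, y, labels)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Grouping subjects with similar predictor values keeps the pool means spread out along the predictor axis, which is why this kind of pooling tends to lose less precision than random pooling with the same number of assays.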