1
Mitchell EM, Plowden TC, Schisterman EF. Estimating relative risk of a log-transformed exposure measured in pools. Stat Med 2016; 35:5477-5494. [PMID: 27530506 DOI: 10.1002/sim.7075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/08/2015] [Revised: 07/08/2016] [Accepted: 07/22/2016] [Indexed: 11/07/2022]
Abstract
Pooling biospecimens prior to performing laboratory assays is a useful tool to reduce costs, achieve minimum volume requirements and mitigate assay measurement error. When estimating the risk of a continuous, pooled exposure on a binary outcome, specialized statistical techniques are required. Current methods include a regression calibration approach, where the expectation of the individual-level exposure is calculated by adjusting the observed pooled measurement with additional covariate data. While this method employs a linear regression calibration model, we propose an alternative model that can accommodate log-linear relationships between the exposure and predictive covariates. The proposed model permits direct estimation of the relative risk associated with a log-transformation of an exposure measured in pools. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Emily M Mitchell
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20892, U.S.A.
- Torie C Plowden
  - Program in Reproductive and Adult Endocrinology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20892, U.S.A.
- Enrique F Schisterman
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20892, U.S.A.
2
Lyles RH, Mitchell EM, Weinberg CR, Umbach DM, Schisterman EF. An efficient design strategy for logistic regression using outcome- and covariate-dependent pooling of biospecimens prior to assay. Biometrics 2016; 72:965-75. [PMID: 26964741 DOI: 10.1111/biom.12489] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/01/2015] [Revised: 10/01/2015] [Accepted: 12/01/2015] [Indexed: 11/30/2022]
Abstract
Potential reductions in laboratory assay costs afforded by pooling equal aliquots of biospecimens have long been recognized in disease surveillance and epidemiological research and, more recently, have motivated design and analytic developments in regression settings. For example, Weinberg and Umbach (1999, Biometrics 55, 718-726) provided methods for fitting set-based logistic regression models to case-control data when a continuous exposure variable (e.g., a biomarker) is assayed on pooled specimens. We focus on improving estimation efficiency by utilizing available subject-specific information at the pool allocation stage. We find that a strategy that we call "(y,c)-pooling," which forms pooling sets of individuals within strata defined jointly by the outcome and other covariates, provides more precise estimation of the risk parameters associated with those covariates than does pooling within strata defined only by the outcome. We review the approach to set-based analysis through offsets developed by Weinberg and Umbach in a recent correction to their original paper. We propose a method for variance estimation under this design and use simulations and a real-data example to illustrate the precision benefits of (y,c)-pooling relative to y-pooling. We also note and illustrate that set-based models permit estimation of covariate interactions with exposure.
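The pool-allocation idea in this abstract can be sketched in a few lines. The following is only an illustration of forming pools within strata defined jointly by the outcome and a covariate; the function name `yc_pools`, the record keys `id`, `y`, and `c`, and the random allocation within each stratum are assumptions made for the example, not details taken from the paper.

```python
import random
from collections import defaultdict

def yc_pools(subjects, pool_size, seed=0):
    """Form pools within strata defined jointly by outcome y and covariate c.

    subjects: list of dicts with hypothetical keys 'id', 'y', 'c'.
    Returns a list of pools; each pool is a list of subject ids drawn from
    a single (y, c) stratum. Leftover subjects in a stratum form a smaller
    final pool.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[(s["y"], s["c"])].append(s["id"])

    pools = []
    for key in sorted(strata):
        ids = strata[key][:]
        rng.shuffle(ids)  # random allocation within the stratum (an assumption)
        for start in range(0, len(ids), pool_size):
            pools.append(ids[start:start + pool_size])
    return pools

# Toy data: 12 subjects, binary outcome y and binary covariate c.
subjects = [{"id": i, "y": i % 2, "c": (i // 2) % 2} for i in range(12)]
pools = yc_pools(subjects, pool_size=2)
```

Dropping `c` from the stratum key would give the analogue of pooling within strata defined only by the outcome, which is the comparison the abstract describes.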
Affiliation(s)
- Robert H Lyles
  - Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Mailstop 1518-002-3AA, Atlanta, Georgia 30322, U.S.A.
- Emily M Mitchell
  - Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, U.S.A.
- Clarice R Weinberg
  - Biostatistics and Computational Biology Branch, MD A303, National Institute of Environmental Health Sciences, National Institutes of Health, P.O. Box 12233, Research Triangle Park, North Carolina 27709, U.S.A.
- David M Umbach
  - Biostatistics and Computational Biology Branch, MD A303, National Institute of Environmental Health Sciences, National Institutes of Health, P.O. Box 12233, Research Triangle Park, North Carolina 27709, U.S.A.
- Enrique F Schisterman
  - Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, U.S.A.
3
Lyles RH, Van Domelen D, Mitchell EM, Schisterman EF. A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples. Int J Environ Res Public Health 2015; 12:14723-40. [PMID: 26593934 PMCID: PMC4661676 DOI: 10.3390/ijerph121114723] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/03/2015] [Revised: 10/15/2015] [Accepted: 11/06/2015] [Indexed: 11/26/2022]
Abstract
Pooling biological specimens prior to performing expensive laboratory assays has been shown to be a cost-effective approach for estimating parameters of interest. In addition to requiring specialized statistical techniques, however, the pooling of samples can introduce assay errors due to processing, possibly in addition to measurement error that may be present when the assay is applied to individual samples. Failure to account for these sources of error can result in biased parameter estimates and ultimately faulty inference. Prior research addressing biomarker mean and variance estimation advocates hybrid designs consisting of individual as well as pooled samples to account for measurement and processing (or pooling) error. We consider adapting this approach to the problem of estimating a covariate-adjusted odds ratio (OR) relating a binary outcome to a continuous exposure or biomarker level assessed in pools. In particular, we explore the applicability of a discriminant function-based analysis that assumes normal residual, processing, and measurement errors. A potential advantage of this method is that maximum likelihood estimation of the desired adjusted log OR is straightforward and computationally convenient. Moreover, in the absence of measurement and processing error, the method yields an efficient unbiased estimator for the parameter of interest assuming normal residual errors. We illustrate the approach using real data from an ancillary study of the Collaborative Perinatal Project, and we use simulations to demonstrate the ability of the proposed estimators to alleviate bias due to measurement and processing error.
Affiliation(s)
- Robert H Lyles
  - Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Mailstop 1518-002-3AA, Atlanta, GA 30322, USA.
- Dane Van Domelen
  - Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Mailstop 1518-002-3AA, Atlanta, GA 30322, USA.
- Emily M Mitchell
  - Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA.
- Enrique F Schisterman
  - Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892, USA.
4
Mitchell EM, Lyles RH, Manatunga AK, Schisterman EF. Semiparametric regression models for a right-skewed outcome subject to pooling. Am J Epidemiol 2015; 181:541-8. [PMID: 25737248 DOI: 10.1093/aje/kwu301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
Pooling specimens prior to performing laboratory assays has various benefits. Pooling can help to reduce cost, preserve irreplaceable specimens, meet minimal volume requirements for certain lab tests, and even reduce information loss when a limit of detection is present. Regardless of the motivation for pooling, appropriate analytical techniques must be applied in order to obtain valid inference from composite specimens. When biomarkers are treated as the outcome in a regression model, techniques applicable to individually measured specimens may not be valid when measurements are taken from pooled specimens, particularly when the biomarker is positive and right skewed. In this paper, we propose a novel semiparametric estimation method based on an adaptation of the quasi-likelihood approach that can be applied to a right-skewed outcome subject to pooling. We use simulation studies to compare this method with an existing estimation technique that provides valid estimates only when pools are formed from specimens with identical predictor values. Simulation results and analysis of a motivating example demonstrate that, when appropriate estimation techniques are applied to strategically formed pools, valid and efficient estimation of the regression coefficients can be achieved.
5
Mitchell EM, Lyles RH, Manatunga AK, Perkins NJ, Schisterman EF. A highly efficient design strategy for regression with outcome pooling. Stat Med 2014; 33:5028-40. [PMID: 25220822 DOI: 10.1002/sim.6305] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/05/2013] [Revised: 05/19/2014] [Accepted: 08/25/2014] [Indexed: 11/06/2022]
Abstract
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting.
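As a rough illustration of the k-means pooling idea in this abstract, the sketch below clusters subjects on a single predictor, treats each cluster as one pool, and fits a line to the pool-level data. Several things here are assumptions made for the example rather than details from the paper: the helper names, the one-dimensional Lloyd-style clustering, the premise that an assay on an equal-volume pool returns the arithmetic mean of the individual outcomes, and the use of least squares weighted by pool size.

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd-style k-means on a single predictor; returns cluster labels."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # initialise centers at k distinct data points
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each subject to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def pool_means(xs, ys, labels):
    """Return (pool size, mean predictor, mean outcome) for each pool.

    The mean outcome stands in for the pooled assay result, under the
    assumption that the assay returns the average of the pooled specimens.
    """
    groups = {}
    for x, y, lab in zip(xs, ys, labels):
        groups.setdefault(lab, []).append((x, y))
    out = []
    for lab in sorted(groups):
        pts = groups[lab]
        n = len(pts)
        out.append((n, sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n))
    return out

def wls_line(pools):
    """Least squares line through pool means, weighted by pool size."""
    total = sum(n for n, _, _ in pools)
    xb = sum(n * x for n, x, _ in pools) / total
    yb = sum(n * y for n, _, y in pools) / total
    sxx = sum(n * (x - xb) ** 2 for n, x, _ in pools)
    sxy = sum(n * (x - xb) * (y - yb) for n, x, y in pools)
    slope = sxy / sxx
    return yb - slope * xb, slope

# Noise-free toy data so the recovered line can be checked by eye.
xs = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
ys = [1.0 + 2.0 * x for x in xs]
labels = kmeans_1d(xs, k=3)
pools = pool_means(xs, ys, labels)
intercept, slope = wls_line(pools)
```

Weighting by pool size reflects that, under a linear model with independent errors of equal variance, the mean of n outcomes has variance proportional to 1/n. Clustering on the predictor keeps the pool means spread out along the predictor axis, which is one intuition for the precision gain the abstract reports.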
Affiliation(s)
- Emily M Mitchell
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A.
6
Mitchell EM, Lyles RH, Manatunga AK, Danaher M, Perkins NJ, Schisterman EF. Regression for skewed biomarker outcomes subject to pooling. Biometrics 2014; 70:202-11. [PMID: 24521420 DOI: 10.1111/biom.12134] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/01/2013] [Revised: 11/01/2013] [Accepted: 11/01/2013] [Indexed: 11/26/2022]
Abstract
Epidemiological studies involving biomarkers are often hindered by prohibitively expensive laboratory tests. Strategically pooling specimens prior to performing these lab assays has been shown to effectively reduce cost with minimal information loss in a logistic regression setting. When the goal is to perform regression with a continuous biomarker as the outcome, regression analysis of pooled specimens may not be straightforward, particularly if the outcome is right-skewed. In such cases, we demonstrate that a slight modification of a standard multiple linear regression model for poolwise data can provide valid and precise coefficient estimates when pools are formed by combining biospecimens from subjects with identical covariate values. When these x-homogeneous pools cannot be formed, we propose a Monte Carlo expectation maximization (MCEM) algorithm to compute maximum likelihood estimates (MLEs). Simulation studies demonstrate that these analytical methods provide essentially unbiased estimates of coefficient parameters as well as their standard errors when appropriate assumptions are met. Furthermore, we show how one can utilize the fully observed covariate data to inform the pooling strategy, yielding a high level of statistical efficiency at a fraction of the total lab cost.
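One way to see why a right-skewed outcome complicates poolwise regression is sketched below. It assumes, purely for illustration, that the outcome would be log-transformed at the individual level and that an assay on an equal-volume pool returns the arithmetic mean of the individual values; the function name and the biomarker values are hypothetical.

```python
import math

def pooled_measurement(values):
    """Assumed pooling model: the assay returns the arithmetic mean of the pool."""
    return sum(values) / len(values)

# Hypothetical right-skewed biomarker values for three subjects in one pool.
individual = [1.0, 2.0, 50.0]

# What can be computed from the pool: the log of the pooled measurement.
log_of_pool = math.log(pooled_measurement(individual))

# What an individual-level model for the log outcome would average over.
mean_of_logs = sum(math.log(v) for v in individual) / len(individual)

# By Jensen's inequality the gap is non-negative, and it is zero only when
# every value in the pool is identical.
gap = log_of_pool - mean_of_logs
```

Because the log of a mean is not the mean of the logs, a model written for individual log-scale outcomes does not carry over directly to pooled measurements, which is consistent with the abstract's point that a modified model or a likelihood-based method is needed.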
Affiliation(s)
- Emily M Mitchell
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, U.S.A.
7
Abstract
Many statisticians have contributed to studies of the HIV epidemic and progression to AIDS. They have developed new statistical methodology, where needed, to address HIV-related issues. The transfer of methods from one area to another often involves a substantial delay. This paper points to methods that were developed in the HIV context and have either already found applications in other areas of medical research or have the potential for such applications, with the hope that this will promote a speedier transfer of the research methods. Among the new tools that HIV studies have placed firmly into the pool of statistical methods for medical research are the methods of back-calculation, methods for the analysis of retrospective ascertainment data and methods of analysis for the combined data from clinical trials and associated longitudinal studies. Notions that have been stimulated substantially are use of surrogate endpoints in clinical trials and screening blood products by the use of pooled serum samples. Research activity in many other areas has been boosted substantially through contributions motivated by HIV/AIDS studies. Noteworthy examples are analyses for doubly-censored lifetime data and methods for assessing vaccines for transmissible diseases.
Affiliation(s)
- N G Becker
  - National Centre for Epidemiology and Population Health, Australian National University, Canberra, ACT 0200, Australia.