1
Wang J, Zhao Y, Tang LL, Mueller C, Li Q. A resample-replace lasso procedure for combining high-dimensional markers with limit of detection. J Appl Stat 2021; 49:4278-4293. [DOI: 10.1080/02664763.2021.1977785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Jinjuan Wang
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, People's Republic of China
- Yunpeng Zhao
- School of Mathematical and Natural Sciences, Arizona State University, Tempe, AZ, USA
- Larry L. Tang
- Department of Statistics and National Center for Forensic Science, University of Central Florida, Orlando, FL, USA
- Department of Statistics, Rehabilitation Medicine Department, NIH Clinical Center, Bethesda, MD, USA
- Qizhai Li
- LSC Academy of Mathematics and Systems Science, Chinese Academy of Sciences and University of Chinese Academy of Sciences, Beijing, People's Republic of China
2
Curley B. A nonlinear measurement error model and its application to describing the dependency of health outcomes on dietary intake. J Appl Stat 2021; 49:1485-1518. [DOI: 10.1080/02664763.2020.1870671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- B. Curley
- Moravian College, Bethlehem, PA, USA
3
4
Zhang W, Liu A, Albert PS, Ashmead RD, Schisterman EF, Mills JL. A pooling strategy to effectively use genotype data in quantitative traits genome-wide association studies. Stat Med 2018; 37:4083-4095. [PMID: 30003569 DOI: 10.1002/sim.7898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/07/2018] [Revised: 04/17/2018] [Accepted: 06/01/2018] [Indexed: 11/11/2022]
Abstract
The goal of quantitative traits genome-wide association studies is to identify associations between a phenotypic variable, such as a vitamin level, and genetic variants, often single-nucleotide polymorphisms. When funding limits the number of assays that can be performed to measure the level of the phenotypic variable, a subgroup of subjects is often randomly selected from the genotype database and the level of the phenotypic variable is then measured for each subject. Because only a proportion of the genotype data can be used, such a simple random sampling method may suffer from substantial loss of efficiency, especially when the number of assays is relatively small and the frequency of the less common variant (minor allele frequency) is low. We propose a pooling strategy in which subjects in a randomly selected reference subgroup are aligned with randomly selected subjects from the remaining study subjects to form independent pools; blood samples from subjects in each pool are mixed; and the level of the phenotypic variable is measured for each pool. We demonstrate that the proposed pooling approach produces considerable gains in efficiency over the simple random sampling method for inference concerning the phenotype-genotype association, resulting in higher precision and power. The methods are illustrated using genotypic and phenotypic data from the Trinity Students Study, a quantitative genome-wide association study.
Affiliation(s)
- Wei Zhang
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Aiyi Liu
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Paul S Albert
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Robert D Ashmead
- Center for Statistical Research and Methodology, US Census Bureau, Washington, District of Columbia
- Enrique F Schisterman
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- James L Mills
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
5
Affiliation(s)
- Juexin Lin
- Department of Statistics, University of South Carolina, Columbia, SC, USA
- Dewei Wang
- Department of Statistics, University of South Carolina, Columbia, SC, USA
6
Mitchell EM, Plowden TC, Schisterman EF. Estimating relative risk of a log-transformed exposure measured in pools. Stat Med 2016; 35:5477-5494. [PMID: 27530506 DOI: 10.1002/sim.7075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/08/2015] [Revised: 07/08/2016] [Accepted: 07/22/2016] [Indexed: 11/07/2022]
Abstract
Pooling biospecimens prior to performing laboratory assays is a useful tool to reduce costs, achieve minimum volume requirements and mitigate assay measurement error. When estimating the risk of a continuous, pooled exposure on a binary outcome, specialized statistical techniques are required. Current methods include a regression calibration approach, where the expectation of the individual-level exposure is calculated by adjusting the observed pooled measurement with additional covariate data. While this method employs a linear regression calibration model, we propose an alternative model that can accommodate log-linear relationships between the exposure and predictive covariates. The proposed model permits direct estimation of the relative risk associated with a log-transformation of an exposure measured in pools. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Emily M Mitchell
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
- Torie C Plowden
- Program in Reproductive and Adult Endocrinology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
- Enrique F Schisterman
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, Maryland, U.S.A
7
Perkins NJ, Mitchell EM, Lyles RH, Schisterman EF. Case-control data analysis for randomly pooled biomarkers. Biom J 2016; 58:1007-20. [PMID: 26824757 DOI: 10.1002/bimj.201500010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/12/2015] [Revised: 09/11/2015] [Accepted: 10/07/2015] [Indexed: 11/09/2022]
Abstract
Pooled study designs, where individual biospecimens are combined prior to measurement via a laboratory assay, can reduce lab costs while maintaining statistical efficiency. Analysis of the resulting pooled measurements, however, often requires specialized techniques. Existing methods can effectively estimate the relation between a binary outcome and a continuous pooled exposure when pools are matched on disease status. When pools are of mixed disease status, however, the existing methods may not be applicable. By exploiting characteristics of the gamma distribution, we propose a flexible method for estimating odds ratios from pooled measurements of mixed and matched status. We use simulation studies to compare consistency and efficiency of risk effect estimates from our proposed methods to existing methods. We then demonstrate the efficacy of our method applied to an analysis of pregnancy outcomes and pooled cytokine concentrations. Our proposed approach contributes to the toolkit of available methods for analyzing odds ratios of a pooled exposure, without restricting pools to be matched on a specific outcome.
Affiliation(s)
- Neil J Perkins
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, Bethesda, MD, 20892, USA
- Emily M Mitchell
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, Bethesda, MD, 20892, USA
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, USA
- Enrique F Schisterman
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, Bethesda, MD, 20892, USA
8
Mitchell EM, Lyles RH, Schisterman EF. Positing, fitting, and selecting regression models for pooled biomarker data. Stat Med 2015; 34:2544-58. [PMID: 25846980 DOI: 10.1002/sim.6496] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/29/2014] [Revised: 02/18/2015] [Accepted: 03/13/2015] [Indexed: 01/31/2023]
Abstract
Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. In this paper, we develop and compare parametric regression models for skewed outcome data subject to pooling, including a novel parameterization of the gamma distribution that takes full advantage of the gamma summation property. We also develop a Monte Carlo approximation of Akaike's Information Criterion applied to pooled data in order to guide model selection. Simulation studies and analysis of motivating data from the Collaborative Perinatal Project suggest that using Akaike's Information Criterion to select the best parametric model can help ensure valid inference and promote estimate precision.
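For reference, the gamma summation property named in this abstract is the standard result that independent gamma variables sharing a scale parameter add up to another gamma variable. In the notation below (ours, not the paper's), \alpha_i is a shape parameter and \beta is the common scale. How the authors tie covariates to these parameters is not stated in the abstract, so no regression link is assumed here.

```latex
X_i \sim \mathrm{Gamma}(\alpha_i, \beta)\ \text{independent},\quad i = 1, \dots, k
\quad\Longrightarrow\quad
\sum_{i=1}^{k} X_i \sim \mathrm{Gamma}\!\Big(\sum_{i=1}^{k} \alpha_i,\ \beta\Big),
\qquad
\mathbb{E}\Big[\sum_{i=1}^{k} X_i\Big] = \beta \sum_{i=1}^{k} \alpha_i .
```

This closure under summation is what makes the distribution of a pool-level total tractable when the individual-level measurements are modeled as gamma.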
Affiliation(s)
- Emily M Mitchell
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, MD, U.S.A
- Robert H Lyles
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, 30322, GA, U.S.A
- Enrique F Schisterman
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, 20892, MD, U.S.A
9
Mitchell EM, Lyles RH, Manatunga AK, Schisterman EF. Semiparametric regression models for a right-skewed outcome subject to pooling. Am J Epidemiol 2015; 181:541-8. [PMID: 25737248 DOI: 10.1093/aje/kwu301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
Pooling specimens prior to performing laboratory assays has various benefits. Pooling can help to reduce cost, preserve irreplaceable specimens, meet minimal volume requirements for certain lab tests, and even reduce information loss when a limit of detection is present. Regardless of the motivation for pooling, appropriate analytical techniques must be applied in order to obtain valid inference from composite specimens. When biomarkers are treated as the outcome in a regression model, techniques applicable to individually measured specimens may not be valid when measurements are taken from pooled specimens, particularly when the biomarker is positive and right skewed. In this paper, we propose a novel semiparametric estimation method based on an adaptation of the quasi-likelihood approach that can be applied to a right-skewed outcome subject to pooling. We use simulation studies to compare this method with an existing estimation technique that provides valid estimates only when pools are formed from specimens with identical predictor values. Simulation results and analysis of a motivating example demonstrate that, when appropriate estimation techniques are applied to strategically formed pools, valid and efficient estimation of the regression coefficients can be achieved.
10
Vexler A, Tao G, Chen X. A toolkit for clinical statisticians to fix problems based on biomarker measurements subject to instrumental limitations: from repeated measurement techniques to a hybrid pooled-unpooled design. Methods Mol Biol 2015; 1208:439-60. [PMID: 25323525 DOI: 10.1007/978-1-4939-1441-8_31] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
The aim of this chapter is to review and examine different methods in order to present correct and efficient statistical techniques based on complete/incomplete data subject to different sorts of measurement error (ME) problems. Instrument inaccuracies, biological variations, and/or errors in questionnaire-based self-report data can lead to significant MEs in various clinical experiments. Ignoring MEs can cause bias or inconsistency of statistical inferences. The biostatistical literature addresses two categories of MEs well: errors related to additive models and errors caused by the limit of detection (LOD). Several statistical approaches have been developed to analyze data affected by MEs, including the parametric/nonparametric likelihood methodologies, Bayesian methods, the single and multiple imputation techniques, and the repeated measurement design of experiment. We present a novel hybrid pooled-unpooled design as one of the strategies to provide correct statistical inferences when data are subject to MEs. This hybrid design and the classical techniques are compared to show the advantages and disadvantages of the considered methods.
Affiliation(s)
- Albert Vexler
- Department of Biostatistics, New York State University at Buffalo, 715 Kimball Tower, 3435 Main Street, Buffalo, NY, 14214, USA
11
Mitchell EM, Lyles RH, Manatunga AK, Perkins NJ, Schisterman EF. A highly efficient design strategy for regression with outcome pooling. Stat Med 2014; 33:5028-40. [PMID: 25220822 DOI: 10.1002/sim.6305] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/05/2013] [Revised: 05/19/2014] [Accepted: 08/25/2014] [Indexed: 11/06/2022]
Abstract
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting.
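The k-means pooling idea in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a single predictor, assumes equal-volume pooling so that the pooled assay equals the arithmetic mean of the individual outcomes, assumes homoscedastic errors so that weighting each pool by its size is appropriate, and uses hypothetical function names (`kmeans_1d`, `pooled_wls`).

```python
import random


def kmeans_1d(x, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on one predictor; returns a pool label per subject."""
    rng = random.Random(seed)
    centers = rng.sample(x, k)  # start from k observed predictor values
    labels = [0] * len(x)
    for _ in range(iters):
        # Assign each subject to the nearest center.
        labels = [min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = sum(members) / len(members)
    return labels


def pooled_wls(x, y, labels):
    """Weighted least squares of pool-mean outcome on pool-mean predictor.

    Assumption: the assay on a pool returns the mean of its members' outcomes,
    so the pool-level error variance is sigma^2 / n_pool and the weight is n_pool.
    Returns (intercept, slope).
    """
    pools = {}
    for xi, yi, lab in zip(x, y, labels):
        pools.setdefault(lab, []).append((xi, yi))
    n = [len(p) for p in pools.values()]
    xb = [sum(a for a, _ in p) / len(p) for p in pools.values()]
    yb = [sum(b for _, b in p) / len(p) for p in pools.values()]
    w = sum(n)
    mx = sum(ni * a for ni, a in zip(n, xb)) / w
    my = sum(ni * b for ni, b in zip(n, yb)) / w
    sxy = sum(ni * (a - mx) * (b - my) for ni, a, b in zip(n, xb, yb))
    sxx = sum(ni * (a - mx) ** 2 for ni, a in zip(n, xb))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Simulated data (made up for illustration): y = 1 + 2x + noise.
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(200)]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]
    labels = kmeans_1d(x, k=20)  # 20 assays instead of 200
    print(pooled_wls(x, y, labels))
```

Grouping subjects with similar predictor values keeps the pool means spread out along the predictor axis, which is the intuition for why such pools retain more information about the slope than randomly formed pools.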
Affiliation(s)
- Emily M Mitchell
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, U.S.A
12
Mitchell EM, Lyles RH, Manatunga AK, Danaher M, Perkins NJ, Schisterman EF. Regression for skewed biomarker outcomes subject to pooling. Biometrics 2014; 70:202-11. [PMID: 24521420 DOI: 10.1111/biom.12134] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/01/2013] [Revised: 11/01/2013] [Accepted: 11/01/2013] [Indexed: 11/26/2022]
Abstract
Epidemiological studies involving biomarkers are often hindered by prohibitively expensive laboratory tests. Strategically pooling specimens prior to performing these lab assays has been shown to effectively reduce cost with minimal information loss in a logistic regression setting. When the goal is to perform regression with a continuous biomarker as the outcome, regression analysis of pooled specimens may not be straightforward, particularly if the outcome is right-skewed. In such cases, we demonstrate that a slight modification of a standard multiple linear regression model for poolwise data can provide valid and precise coefficient estimates when pools are formed by combining biospecimens from subjects with identical covariate values. When these x-homogeneous pools cannot be formed, we propose a Monte Carlo expectation maximization (MCEM) algorithm to compute maximum likelihood estimates (MLEs). Simulation studies demonstrate that these analytical methods provide essentially unbiased estimates of coefficient parameters as well as their standard errors when appropriate assumptions are met. Furthermore, we show how one can utilize the fully observed covariate data to inform the pooling strategy, yielding a high level of statistical efficiency at a fraction of the total lab cost.
Affiliation(s)
- Emily M Mitchell
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, U.S.A