1
Lotspeich SC, Amorim GGC, Shaw PA, Tao R, Shepherd BE. Optimal multiwave validation of secondary use data with outcome and exposure misclassification. Can J Stat 2024; 52:532-554. [PMID: 39629097 PMCID: PMC11610482 DOI: 10.1002/cjs.11772] [Received: 03/17/2022] [Accepted: 12/23/2022] [Indexed: 04/03/2023]
Abstract
Observational databases provide unprecedented opportunities for secondary use in biomedical research. However, these data can be error-prone and must be validated before use. It is usually unrealistic to validate the whole database because of resource constraints. A cost-effective alternative is a two-phase design that validates a subset of records enriched for information about a particular research question. We consider odds ratio estimation under differential outcome and exposure misclassification and propose optimal designs that minimize the variance of the maximum likelihood estimator. Our adaptive grid search algorithm can locate the optimal design in a computationally feasible manner. Because the optimal design relies on unknown parameters, we introduce a multiwave strategy to approximate the optimal design. We demonstrate the proposed design's efficiency gains through simulations and two large observational studies.
Affiliation(s)
- Sarah C. Lotspeich
  - Department of Statistical Sciences, Wake Forest University, Winston-Salem, 27109, North Carolina, U.S.A.
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, 37203, Tennessee, U.S.A.
- Gustavo G. C. Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, 37203, Tennessee, U.S.A.
- Pamela A. Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, 19104, Pennsylvania, U.S.A.
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, 98101, Washington, U.S.A.
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, 37203, Tennessee, U.S.A.
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, 37232, Tennessee, U.S.A.
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, 37203, Tennessee, U.S.A.
2
Sneed NM, Heerman WJ, Shaw PA, Han K, Chen T, Bian A, Pugh S, Duda S, Lumley T, Shepherd BE. Associations Between Gestational Weight Gain, Gestational Diabetes, and Childhood Obesity Incidence. Matern Child Health J 2024; 28:372-381. [PMID: 37966561 PMCID: PMC10922599 DOI: 10.1007/s10995-023-03853-8] [Accepted: 10/31/2023] [Indexed: 11/16/2023]
Abstract
INTRODUCTION Excessive maternal gestational weight gain (GWG) is strongly correlated with childhood obesity, yet how excess maternal weight gain and gestational diabetes mellitus (GDM) interact to affect early childhood obesity is poorly understood. The purpose of this study was to investigate whether overall and trimester-specific maternal GWG and GDM were associated with obesity in offspring by age 6 years. METHODS A cohort of 10,335 maternal-child dyads was established from electronic health records. Maternal weights at conception and delivery were estimated from weight trajectory fits using functional principal components analysis. Kaplan-Meier curves and Cox regression, together with generalized raking, examined time-to-childhood-obesity. RESULTS Obesity diagnosed prior to age 6 years was estimated at 19.7% (95% CI: 18.3, 21.1). Maternal weight gain during pregnancy was a strong predictor of early childhood obesity (p < 0.0001). The occurrence of early childhood obesity was lower among mothers with GDM compared with those without diabetes (adjusted hazard ratio = 0.58, p = 0.014). There was no interaction between maternal weight gain and GDM (p = 0.55). Higher weight gain during the first trimester was associated with lower risk of early childhood obesity (p = 0.0002) whereas higher weight gain during the second and third trimesters was associated with higher risk (p < 0.0001). DISCUSSION Results indicated total and trimester-specific maternal weight gain was a strong predictor of early childhood obesity, though obesity risk by age 6 was lower for children of mothers with GDM. Additional research is needed to elucidate underlying mechanisms directly related to trimester-specific weight gain and GDM that impede or protect against obesity prevalence during early childhood.
Affiliation(s)
- Nadia M Sneed
  - Department of Pediatrics, Vanderbilt University Medical Center, 2146 Belcourt Ave., Nashville, TN, 37212, USA
  - Center for Research Development and Scholarship, Vanderbilt University School of Nursing, 319E Godchaux Hall, Nashville, TN, 37240, USA
- William J Heerman
  - Department of Pediatrics, Vanderbilt University Medical Center, 2146 Belcourt Ave., Nashville, TN, 37212, USA
- Pamela A Shaw
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave Suite 1600, Seattle, WA, 98101, USA
- Kyunghee Han
  - Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 851 S Morgan St, 503 Science and Engineering Offices, Chicago, IL, 60607, USA
- Tong Chen
  - Department of Statistics, University of Auckland, 38 Princes St., Auckland, 1010, New Zealand
- Aihua Bian
  - Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave., Room/Suite 11124, Nashville, TN, 37203, USA
- Shannon Pugh
  - Department of Emergency Medicine, Vanderbilt University Medical Center, 1211 Medical Center Drive, Nashville, TN, 37232, USA
- Stephany Duda
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave., Nashville, TN, 37203, USA
- Thomas Lumley
  - Department of Statistics, University of Auckland, 38 Princes St., Auckland, 1010, New Zealand
- Bryan E Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave., Room/Suite 11124, Nashville, TN, 37203, USA
3
Amorim G, Tao R, Lotspeich S, Shaw PA, Lumley T, Patel RC, Shepherd BE. Three-phase generalized raking and multiple imputation estimators to address error-prone data. Stat Med 2024; 43:379-394. [PMID: 37987515 PMCID: PMC10842111 DOI: 10.1002/sim.9967] [Received: 01/12/2023] [Revised: 09/23/2023] [Accepted: 11/09/2023] [Indexed: 11/22/2023]
Abstract
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 83 671 women living with HIV, whose data were originally extracted from electronic medical records, of whom 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.
Affiliation(s)
- Gustavo Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Sarah Lotspeich
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
- Pamela A. Shaw
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Thomas Lumley
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Rena C. Patel
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
4
Sauer S, Hedt-Gauthier B, Haneuse S. Practical strategies for operationalizing optimal allocation in stratified cluster-based outcome-dependent sampling designs. Stat Med 2023; 42:917-935. [PMID: 36650619 PMCID: PMC10006324 DOI: 10.1002/sim.9650] [Received: 10/18/2021] [Revised: 11/08/2022] [Accepted: 12/22/2022] [Indexed: 01/19/2023]
Abstract
Cluster-based outcome-dependent sampling (ODS) has the potential to yield efficiency gains when the outcome of interest is relatively rare, and resource constraints allow only a certain number of clusters to be visited for data collection. Previous research has shown that when the intended analysis is inverse-probability weighted generalized estimating equations, and the number of clusters that can be sampled is fixed, optimal allocation of the (cluster-level) sample size across strata defined by auxiliary variables readily available at the design stage has the potential to increase efficiency in the estimation of the parameter(s) of interest. In such a setting, the optimal allocation formulae depend on quantities that are unknown in practice, currently making such designs difficult to implement. In this paper, we consider a two-wave adaptive sampling approach, in which data is collected from a first wave sample, and subsequently used to compute the optimal second wave stratum-specific sample sizes. We consider two strategies for estimating the necessary components using the first wave data: an inverse-probability weighting (IPW) approach and a multiple imputation (MI) approach. In a comprehensive simulation study, we show that the adaptive sampling approach performs well, and that the MI approach yields designs that are very near-optimal, regardless of the covariate type. The IPW approach, on the other hand, has mixed results. Finally, we illustrate the proposed adaptive sampling procedures with data on maternal characteristics and birth outcomes among women enrolled in the Safer Deliveries program in Zanzibar, Tanzania.
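The two-wave idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: classical Neyman allocation (sample sizes proportional to stratum size times stratum standard deviation) stands in for the paper's optimal allocation formulae for inverse-probability weighted generalized estimating equations, and the function names, stratum sizes, and wave-1 values below are all hypothetical.

```python
import math
import statistics


def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Split n_total across strata in proportion to N_h * S_h (Neyman allocation)."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    if sum(weights) == 0:
        # No estimated variability anywhere: fall back to proportional allocation.
        weights = list(stratum_sizes)
    total = sum(weights)
    raw = [n_total * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    # Hand out the units lost to rounding, largest fractional remainder first.
    leftover = n_total - sum(alloc)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def second_wave_allocation(stratum_sizes, wave1_samples, n_wave2):
    """Estimate each stratum's SD from wave-1 data, then allocate wave 2."""
    # Assumption: every stratum has at least two wave-1 observations,
    # which statistics.stdev requires.
    sds = [statistics.stdev(sample) for sample in wave1_samples]
    return neyman_allocation(stratum_sizes, sds, n_wave2)


# Hypothetical example: two strata of 100 units, two wave-1 observations each.
print(second_wave_allocation([100, 100], [[0, 2], [0, 6]], 40))
```

The sketch does not cap an allocation at the number of units remaining in a stratum. A practical design would need that constraint, along with the cluster-level structure the paper addresses.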
Affiliation(s)
- Sara Sauer
  - Department of Global Health and Social Medicine, Harvard Medical School, MA, USA
- Bethany Hedt-Gauthier
  - Department of Global Health and Social Medicine, Harvard Medical School, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, MA, USA
- Sebastien Haneuse
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, MA, USA