1. On a simple estimation of the proportional odds model under right truncation. Lifetime Data Analysis 2023; 29:537-554. PMID: 36602639. DOI: 10.1007/s10985-022-09584-2.
Abstract
Retrospective sampling can be useful in epidemiological research because of its convenience for exploring an etiological association. One particular retrospective sampling scheme collects disease outcomes of the time-to-event type subject to right truncation, along with other covariates of interest. For regression analysis of right-truncated time-to-event data, the so-called proportional reverse-time hazards model has been proposed, but the interpretation of its regression parameters tends to be cumbersome, which has greatly hampered its application in practice. In this paper, we instead consider the proportional odds model, an appealing alternative to the popular proportional hazards model. Under the proportional odds model, there is an embedded relationship between the reverse-time hazard function and the usual hazard function. Building on this relationship, we provide a simple procedure to estimate the regression parameters of the proportional odds model from right-truncated data. Weighted estimation is also studied.
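For intuition, the proportional odds model specifies that a covariate multiplies the odds of failure by a constant factor exp(xβ) at every time point. The following is a minimal numeric sketch of that defining property; the baseline survival function and coefficient are invented for illustration, and this is not the authors' truncation-adjusted estimator:

```python
import math

def po_survival(t, x, beta, S0):
    """Survival under the proportional odds model:
    the odds of failure by time t are exp(x * beta) times the baseline odds."""
    odds0 = (1.0 - S0(t)) / S0(t)       # baseline failure odds
    odds = math.exp(x * beta) * odds0   # covariate-scaled failure odds
    return 1.0 / (1.0 + odds)

# Illustrative baseline: exponential survival (an assumption for this sketch).
S0 = lambda t: math.exp(-0.1 * t)
beta = 0.7

# The failure-odds ratio between x=1 and x=0 is constant in t.
for t in (1.0, 5.0, 20.0):
    s1 = po_survival(t, 1.0, beta, S0)
    s0 = po_survival(t, 0.0, beta, S0)
    ratio = ((1 - s1) / s1) / ((1 - s0) / s0)
    print(round(ratio, 6))  # → 2.013753 = exp(beta), at every t
```

This constant-odds-ratio property is what gives the regression parameters their simple interpretation, in contrast to the reverse-time hazards parameters the abstract describes as cumbersome.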
2. Nonparametric bounds for the survivor function under general dependent truncation. Scand Stat Theory Appl 2023; 50:327-357. PMID: 37179756; PMCID: PMC10181817. DOI: 10.1111/sjos.12582.
Abstract
Truncation occurs in cohort studies with complex sampling schemes. When truncation is ignored or incorrectly assumed to be independent of the event time in the observable region, bias can result. We derive completely nonparametric bounds for the survivor function under truncation and censoring; these extend prior nonparametric bounds derived in the absence of truncation. We also define a hazard ratio function that links the unobservable region in which event time is less than truncation time, to the observable region in which event time is greater than truncation time, under dependent truncation. When this function can be bounded, and the probability of truncation is known approximately, it yields narrower bounds than the purely nonparametric bounds. Importantly, our approach targets the true marginal survivor function over its entire support, and is not restricted to the observable region, unlike alternative estimators. We evaluate the methods in simulations and in clinical applications.
3. A numerically stable algorithm for integrating Bayesian models using Markov melding. Statistics and Computing 2022; 32:24. PMID: 35310545; PMCID: PMC8924096. DOI: 10.1007/s11222-022-10086-2.
Abstract
When statistical analyses consider multiple data sources, Markov melding provides a method for combining the source-specific Bayesian models. Markov melding joins together submodels that have a common quantity. One challenge is that the prior for this quantity can be implicit, and its prior density must be estimated. We show that error in this density estimate makes the two-stage Markov chain Monte Carlo sampler employed by Markov melding unstable and unreliable. We propose a robust two-stage algorithm that estimates the required prior marginal self-density ratios using weighted samples, dramatically improving accuracy in the tails of the distribution. The stabilised version of the algorithm is pragmatic and provides reliable inference. We demonstrate our approach using an evidence synthesis for inferring HIV prevalence, and an evidence synthesis of A/H1N1 influenza.
4. Umbrella Sampling-Based Method to Compute Ligand-Binding Affinity. Methods in Molecular Biology (Clifton, N.J.) 2022; 2385:313-323. PMID: 34888726. DOI: 10.1007/978-1-0716-1767-0_14.
Abstract
Many proteins have a solvent-exposed binding cleft, which permits their inhibitors to bind and unbind without significant conformational changes of the protein. The binding/unbinding pathways of these protein-inhibitor complexes can be sampled rather straightforwardly using umbrella sampling (US) simulation methods. During a US simulation, the Cα atoms of the protein are restrained via a harmonic force. The potential of mean force (PMF) along the binding pathway can be estimated using the weighted histogram analysis method (WHAM). The binding affinity is then computed as the difference in PMF between the bound and unbound states.
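The WHAM step mentioned above amounts to a self-consistent iteration between the unbiased bin probabilities and the per-window free energies. Below is a minimal pure-Python sketch run on idealized synthetic histograms (the grid, umbrella centers, force constant, and sample counts are all invented for the example; this is not the authors' production workflow):

```python
import math

def wham(counts, bias, n_samples, beta=1.0, tol=1e-10, max_iter=20000):
    """Self-consistent WHAM iteration.
    counts[k][i]: histogram count of window k in bin i
    bias[k][i]:   umbrella bias energy U_k at bin center i (in kT when beta=1)
    n_samples[k]: total number of samples in window k
    Returns the unbiased bin probabilities p[i]."""
    K, B = len(counts), len(counts[0])
    f = [0.0] * K  # per-window free energies, defined up to an additive shift
    num = [sum(counts[k][i] for k in range(K)) for i in range(B)]
    for _ in range(max_iter):
        p = []
        for i in range(B):
            denom = sum(n_samples[k] * math.exp(beta * (f[k] - bias[k][i]))
                        for k in range(K))
            p.append(num[i] / denom)
        total = sum(p)
        p = [v / total for v in p]
        f_new = [-math.log(sum(p[i] * math.exp(-beta * bias[k][i])
                               for i in range(B))) / beta for k in range(K)]
        f_new = [v - f_new[0] for v in f_new]   # pin the arbitrary constant
        if max(abs(a - b) for a, b in zip(f_new, f)) < tol:
            f = f_new
            break
        f = f_new
    return p

# Synthetic self-check: build ideal histograms from a known distribution and
# harmonic umbrella biases, then verify that WHAM recovers the distribution.
B = 50
xs = [i / (B - 1) for i in range(B)]
p_true = [math.exp(-((x - 0.5) ** 2) / 0.02) for x in xs]
z = sum(p_true)
p_true = [v / z for v in p_true]
centers = [0.1, 0.3, 0.5, 0.7, 0.9]
bias = [[20.0 * (x - c) ** 2 for x in xs] for c in centers]  # U_k(x), in kT
n_samples = [10000.0] * len(centers)
counts = []
for k in range(len(centers)):
    w = [p_true[i] * math.exp(-bias[k][i]) for i in range(B)]
    zk = sum(w)
    counts.append([n_samples[k] * v / zk for v in w])

p_est = wham(counts, bias, n_samples)
pmf = [-math.log(v) for v in p_est]  # potential of mean force, in units of kT
```

The binding affinity estimate described in the abstract would then be the difference between `pmf` values at the bound and unbound ends of the reaction coordinate.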
5. Modeling the effect of age on quantiles of the incubation period distribution of COVID-19. BMC Public Health 2021; 21:1762. PMID: 34579681; PMCID: PMC8474900. DOI: 10.1186/s12889-021-11761-1.
Abstract
BACKGROUND The novel coronavirus SARS-CoV-2 (coronavirus disease 2019, COVID-19) has caused serious consequences for many aspects of social life throughout the world since the first case of pneumonia of unknown etiology was identified in Wuhan, Hubei province, China, in December 2019. The incubation period distribution is key to the prevention and control of COVID-19. This study aimed to investigate the conditional distribution of the incubation period of COVID-19 given the age of infected cases and to estimate its quantiles using information on 2172 confirmed cases from 29 provinces outside Hubei in China. METHODS We collected data on the infection dates, onset dates, and ages of the confirmed cases through February 16th, 2020. All the data were downloaded from the official websites of the health commissions. As the epidemic was still ongoing at the time we collected the data, the observations were subject to biased sampling. To address this issue, we developed a new maximum likelihood method, which enables us to comprehensively study the effect of age on the incubation period. RESULTS Based on the collected data, we found that the conditional quantiles of the incubation period distribution of COVID-19 vary by age. Specifically, the upper conditional quantiles for people in the middle age group are shorter than those for others, while the lower quantiles do not show the same differences. We estimated that the 0.95-th quantile for people in the age group 23-55 is less than 15 days. CONCLUSIONS Since the conditional quantiles vary with age, more precise measures may be taken for people of different ages. For example, an age-dependent quarantine duration could be used in practice, rather than a uniform 14-day quarantine period. Notably, the current quarantine duration may need to be extended for people aged 0-22 and over 55, because the corresponding 0.95-th quantiles are much greater than 14 days.
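The biased-sampling issue described above arises because, during an ongoing epidemic, only cases whose symptoms have already appeared can be observed, so long incubation periods are under-represented (right truncation). A small simulation illustrates the direction of the bias; the lognormal parameters, epidemic window, and cutoff are invented for illustration, and the naive estimator here is a plain sample mean rather than the authors' maximum likelihood method:

```python
import math
import random
import statistics

random.seed(0)

# True incubation-period model: an illustrative lognormal.
mu, sigma = 1.6, 0.5
true_mean = math.exp(mu + sigma ** 2 / 2)  # population mean incubation, days

# Infections occur uniformly over a 30-day window; data are collected at day
# 30, so a case enters the sample only if symptom onset has already occurred.
observed = []
for _ in range(200_000):
    infection_day = random.uniform(0, 30)
    incubation = random.lognormvariate(mu, sigma)
    if infection_day + incubation <= 30:  # onset before the collection date
        observed.append(incubation)

naive_mean = statistics.mean(observed)
# The naive mean underestimates the true mean because long incubation
# periods are preferentially truncated out of the observed sample.
print(round(naive_mean, 2), round(true_mean, 2))
```

A likelihood that conditions on being observed (as the authors develop) corrects exactly this kind of shortfall.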
6. Potential environmental impact resulting from biased fish sampling in intensive aquaculture operations. The Science of the Total Environment 2020; 707:135630. PMID: 31784173. DOI: 10.1016/j.scitotenv.2019.135630.
Abstract
Aquaculture contributes to global food security, producing over 70 million tons of fish and aquatic products annually. Protein-rich fish feeds, together with labor, are the most expensive cost components in aquaculture. Feed rations are set as a percentage of fish weight; therefore, reliable biomass assessment is essential for profitable and environmentally sound aquaculture. Fish biomass estimates are typically based on sampling <2% of the fish population. The goals of this research were to estimate potential biases associated with fish sampling in recirculating aquaculture systems (RAS), and the potential economic and environmental implications of such biased estimations. The bias of sampling-based estimates of fish biomass in two cultured species was shown to be larger than the confidence interval suggests, even after >20% of the population was sampled. Such biases, if indeed common, will most likely result in over- or underfeeding, both of which entail negative economic and environmental consequences. We advocate conducting similar studies with major cultured fish species to generate "bias correction tables" for adjusting fish feeding rates to bias-corrected biomass. These would help reduce the potential economic losses and negative environmental impacts of aquaculture practice.
7. How well do hoarding research samples represent cases that rise to community attention? Behav Res Ther 2020; 126:103555. PMID: 32044474; PMCID: PMC10636773. DOI: 10.1016/j.brat.2020.103555.
Abstract
This study used archival data from three different research groups and case file data from three independent community organizations to explore how well research samples reflect cases of hoarding that come to community attention. Using data from 824 individuals with hoarding, we found that research volunteers differ from community clients in several ways: community clients are older, more likely to be male and less likely to be partnered; they have lower socio-economic status and are less likely to demonstrate good or fair insight regarding hoarding severity and consequences. The homes of community clients had greater clutter volume and were more likely to have problematic conditions in the home, including squalor and fire hazards or fire safety concerns. Clutter volume was a strong predictor of these conditions in the home, but demographic variables were not. Even after accounting for the influence of clutter volume, the homes of community-based clients were more likely to have squalor. These findings suggest limitations on the generalizability of research samples to hoarding as it is encountered by community agencies.
8. Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data. Lifetime Data Analysis 2020; 26:85-108. PMID: 30617753; PMCID: PMC6612481. DOI: 10.1007/s10985-019-09461-5.
Abstract
We propose a two-stage outcome-dependent sampling design and inference procedure for studies that concern interval-censored failure time outcomes. This design enhances the study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the first-stage observed interval-censored failure time outcomes. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator for its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure perform well in practical situations and are more efficient than the existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.
9. Regression analysis of longitudinal data with outcome-dependent sampling and informative censoring. Scand Stat Theory Appl 2019; 46:831-847. PMID: 32066989; PMCID: PMC7025472. DOI: 10.1111/sjos.12373.
Abstract
We consider regression analysis of longitudinal data in the presence of outcome-dependent observation times and informative censoring. Existing approaches commonly require correct specification of the joint distribution of the longitudinal measurements, observation time process and informative censoring time under the joint modeling framework, and can be computationally cumbersome due to the complex form of the likelihood function. In view of these issues, we propose a semi-parametric joint regression model and construct a composite likelihood function based on a conditional order statistics argument. As a major feature of our proposed methods, the aforementioned joint distribution is not required to be specified and the random effect in the proposed joint model is treated as a nuisance parameter. Consequently, the derived composite likelihood bypasses the need to integrate over the random effect and offers the advantage of easy computation. We show that the resulting estimators are consistent and asymptotically normal. We use simulation studies to evaluate the finite-sample performance of the proposed method, and apply it to a study of weight loss data that motivated our investigation.
10. Biased sampling activity: an investigation to promote discussion. Teaching Statistics 2018; 41:8-13. PMID: 30906081; PMCID: PMC6407882. DOI: 10.1111/test.12165.
Abstract
The statistical concept of sampling is often given little direct attention, typically being reduced to the mantra "take a random sample". This low-resource, adaptable activity demonstrates sampling and explores issues that arise from biased sampling.
11. Semiparametric model and inference for spontaneous abortion data with a cured proportion and biased sampling. Biostatistics 2018; 19:54-70. PMID: 28525542. DOI: 10.1093/biostatistics/kxx024.
Abstract
Evaluating and understanding the risk and safety of using medications for autoimmune disease in a woman during her pregnancy will help both clinicians and pregnant women to make better treatment decisions. However, utilizing spontaneous abortion (SAB) data collected in observational studies of pregnancy to derive valid inference poses two major challenges. First, the data from the observational cohort are not random samples of the target population due to the sampling mechanism. Pregnant women with early SAB are more likely to be excluded from the cohort, and there may be substantial differences between the observed SAB times and those in the target population. Second, the observed data are heterogeneous and contain a "cured" proportion. In this article, we consider semiparametric models to simultaneously estimate the probability of being cured and the distribution of time to SAB for the uncured subgroup. To derive the maximum likelihood estimators, we appropriately adjust the sampling bias in the likelihood function and develop an expectation-maximization algorithm to overcome the computational challenge. We apply the empirical process theory to prove the consistency and asymptotic normality of the estimators. We examine the finite sample performance of the proposed estimators in simulation studies and illustrate the proposed method through an application to SAB data from pregnant women.
12. Outcome-dependent sampling with interval-censored failure time data. Biometrics 2017; 74:58-67. PMID: 28771664. DOI: 10.1111/biom.12744.
Abstract
Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that suffers from interval-censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required to yield reliable precision on the exposure-failure-time relationship. However, large cohort studies with simple random sampling could be prohibitive for investigators with a limited budget, especially when the exposure variables are expensive to obtain. Alternative cost-effective sampling designs and inference procedures are therefore desirable. We propose an outcome-dependent sampling (ODS) design with interval-censored failure time data, where we enrich the observed sample by selectively including certain more informative failure subjects. We develop a novel sieve semiparametric maximum empirical likelihood approach for fitting the proportional hazards model to data from the proposed interval-censoring ODS design. This approach employs the empirical likelihood and sieve methods to deal with the infinite-dimensional nuisance parameters, which greatly reduces the dimensionality of the estimation problem and eases the computation difficulty. The consistency and asymptotic normality of the resulting regression parameter estimator are established. The results from our extensive simulation study show that the proposed design and method work well for practical situations and are more efficient than the alternative designs and competing approaches. An example from the Atherosclerosis Risk in Communities (ARIC) study is provided for illustration.
13. Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies. Biometrics 2017; 73:1210-1220. PMID: 28346824. DOI: 10.1111/biom.12680.
Abstract
We propose a weighted pseudolikelihood method for analyzing the association of a SNP set (e.g., SNPs in a gene, genetic pathway, or network) with multiple secondary phenotypes in case-control genetic association studies. To boost analysis power, we assume that the SNP-specific effects are shared across all secondary phenotypes using a scaled mean model. We estimate regression parameters using inverse probability weighted (IPW) estimating equations obtained from the weighted pseudolikelihood, which accounts for case-control sampling to prevent potential ascertainment bias. To test the effect of a SNP set, we propose a weighted variance component pseudo-score test. We also propose a penalized IPW pseudolikelihood method for selecting a subset of SNPs that are associated with the multiple secondary phenotypes. We show that the proposed variable selection procedure has the oracle properties and is robust to misspecification of the correlation structure among secondary phenotypes. We select the tuning parameter using a weighted Bayesian Information-like Criterion (wBIC). We evaluate the finite-sample performance of the proposed methods via simulations, and illustrate the methods with an analysis of multiple secondary smoking behavior outcomes in a lung cancer case-control genetic association study.
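The core of the IPW idea above is to weight each subject by the inverse of its case-control sampling probability so that estimating equations recover population-level parameters for the secondary phenotype. Below is a deliberately simplified single-SNP, single-secondary-outcome sketch (the data-generating model, disease rule, and weights are invented for illustration; this is not the authors' pseudo-score machinery):

```python
import random

random.seed(1)

# Population: secondary phenotype y depends on genotype g with slope 1.0;
# disease status depends strongly on y, so cases over-represent high y.
N = 200_000
pop = []
for _ in range(N):
    g = random.randint(0, 1)
    y = 1.0 * g + random.gauss(0.0, 1.0)
    d = 1 if y > 2.0 else 0  # illustrative deterministic disease rule
    pop.append((g, y, d))

# Case-control design: take all cases, subsample an equal number of controls.
cases = [r for r in pop if r[2] == 1]
controls = random.sample([r for r in pop if r[2] == 0], len(cases))
sample = cases + controls

def wls_slope(rows, weights):
    """Weighted least-squares slope of y on g."""
    sw = sum(weights)
    gbar = sum(w * g for (g, y, d), w in zip(rows, weights)) / sw
    ybar = sum(w * y for (g, y, d), w in zip(rows, weights)) / sw
    num = sum(w * (g - gbar) * (y - ybar) for (g, y, d), w in zip(rows, weights))
    den = sum(w * (g - gbar) ** 2 for (g, y, d), w in zip(rows, weights))
    return num / den

# Naive analysis ignores that cases were oversampled relative to the population.
naive = wls_slope(sample, [1.0] * len(sample))

# IPW analysis: weight = 1 / P(selected | case-control status).
w_control = (N - len(cases)) / len(controls)  # controls were subsampled
ipw = wls_slope(sample, [1.0 if d == 1 else w_control for (g, y, d) in sample])
```

With this design the naive slope is noticeably biased away from the true value 1.0, while the IPW slope recovers it, mirroring the ascertainment-bias correction the abstract describes.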
14. Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies. J Am Stat Assoc 2017; 112:1468-1476. PMID: 29479125. DOI: 10.1080/01621459.2017.1295864.
Abstract
In modern epidemiological and clinical studies, the covariates of interest may involve genome sequencing, biomarker assay, or medical imaging and thus are prohibitively expensive to measure on a large number of subjects. A cost-effective solution is the two-phase design, under which the outcome and inexpensive covariates are observed for all subjects during the first phase and that information is used to select subjects for measurements of expensive covariates during the second phase. For example, subjects with extreme values of quantitative traits were selected for whole-exome sequencing in the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP). Herein, we consider general two-phase designs, where the outcome can be continuous or discrete, and inexpensive covariates can be continuous and correlated with expensive covariates. We propose a semiparametric approach to regression analysis by approximating the conditional density functions of expensive covariates given inexpensive covariates with B-spline sieves. We devise a computationally efficient and numerically stable EM-algorithm to maximize the sieve likelihood. In addition, we establish the consistency, asymptotic normality, and asymptotic efficiency of the estimators. Furthermore, we demonstrate the superiority of the proposed methods over existing ones through extensive simulation studies. Finally, we present applications to the aforementioned NHLBI ESP.
15.
Abstract
In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including sample survey weights into Bayesian analyses, existing methods require complex models or ignore the stratified design underlying the survey weights. We propose a simple approach based on modeling the distribution of the selected sample as a mixture, with the mixture weights appropriately adjusted, while accounting for uncertainty in the adjustment. We focus for simplicity on Dirichlet process mixtures but the proposed approach can be applied more broadly. We sketch a simple Markov chain Monte Carlo algorithm for computation, and assess the approach via simulations and an application.
16. Evaluating and comparing biomarkers with respect to the area under the receiver operating characteristics curve in two-phase case-control studies. Biostatistics 2016; 17:499-522. PMID: 26883772; PMCID: PMC4915610. DOI: 10.1093/biostatistics/kxw003.
Abstract
Two-phase sampling designs, where biomarkers are subsampled from a phase-one cohort sample representative of the target population, have become the gold standard in biomarker evaluation. Many two-phase case-control studies involve biased sampling of cases and/or controls in the second phase. For example, controls are often frequency-matched to cases with respect to other covariates. Ignoring biased sampling of cases and/or controls can lead to biased inference regarding biomarkers' classification accuracy. For the problems of estimating and comparing the area under the receiver operating characteristic curve (AUC) for a binary disease outcome, the impact of biased sampling of cases and/or controls on inference, and strategies to efficiently account for the sampling scheme, have not been well studied. In this project, we investigate the inverse-probability-weighted method to adjust for biased sampling in estimating and comparing the AUC. Asymptotic properties of the estimator and its inference procedure are developed for both Bernoulli sampling and finite-population stratified sampling. In simulation studies, the weighted estimators provide valid inference for estimation and hypothesis testing, while the standard empirical estimators can generate invalid inference. We demonstrate the use of the analytical variance formula for optimizing sampling schemes in biomarker study design and the application of the proposed AUC estimators to examples in HIV vaccine research and prostate cancer research.
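An inverse-probability-weighted AUC estimator of the kind investigated above amounts to a weighted Mann-Whitney statistic. A minimal sketch, with marker values and sampling probabilities invented for illustration:

```python
def ipw_auc(case_markers, case_probs, control_markers, control_probs):
    """Weighted Mann-Whitney estimate of the AUC, with each subject weighted
    by the inverse of its second-phase sampling probability."""
    cw = [1.0 / p for p in case_probs]
    dw = [1.0 / p for p in control_probs]
    num = 0.0
    for yc, wc in zip(case_markers, cw):
        for yd, wd in zip(control_markers, dw):
            if yc > yd:
                num += wc * wd          # concordant case-control pair
            elif yc == yd:
                num += 0.5 * wc * wd    # tie counts half
    return num / (sum(cw) * sum(dw))

cases = [2.1, 3.4, 1.9, 4.0]
controls = [1.0, 2.5, 0.7]

# With equal sampling probabilities this reduces to the empirical AUC.
auc_equal = ipw_auc(cases, [0.5] * 4, controls, [0.5] * 3)
print(round(auc_equal, 4))  # → 0.8333 (10 of 12 pairs concordant)

# Controls sampled at unequal rates (e.g., frequency matching) get re-weighted.
auc_ipw = ipw_auc(cases, [1.0] * 4, controls, [0.2, 0.8, 0.2])
```

The standard empirical AUC is the special case where every weight equals one; the weighting is what restores validity under biased second-phase sampling.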
17. A cautionary note on using secondary phenotypes in neuroimaging genetic studies. Neuroimage 2015; 121:136-145. PMID: 26220747; PMCID: PMC4604049. DOI: 10.1016/j.neuroimage.2015.07.058.
Abstract
Almost all genome-wide association studies (GWASs), including the Alzheimer's Disease Neuroimaging Initiative (ADNI), are based on the case-control study design, implying that the resulting case-control data are likely a biased, not random, sample of the target population. Although association analysis of the disease (e.g., Alzheimer's disease in the ADNI) can be conducted using standard logistic regression while ignoring the biased case-control sampling, a standard linear regression analysis of a secondary phenotype (e.g., any neuroimaging phenotype in the ADNI) may in general lead to biased inference, including biased parameter estimates, inflated Type I errors, and reduced power for association testing. Despite this well-known result in genetic epidemiology, to our surprise, all the published studies on secondary phenotypes with the ADNI data have ignored this potential problem. Here we aim to answer whether such a standard analysis of a secondary phenotype is valid or problematic with the ADNI data. Through both real data analyses and simulation studies, we found that, strikingly, such an analysis was generally valid (with only small biases or slightly inflated Type I errors) for the ADNI data, though caution must be taken when analyzing other data. We also illustrate applications and possible problems of two methods specifically developed for valid analysis of secondary phenotypes.
18. Semiparametric likelihood inference for left-truncated and right-censored data. Biostatistics 2015; 16:785-798. PMID: 25796430. DOI: 10.1093/biostatistics/kxv012.
Abstract
This paper proposes a new estimation procedure for the survival time distribution with left-truncated and right-censored data, where the distribution of the truncation time is known up to a finite-dimensional parameter vector. The paper expands on Vardi's multiplicative censoring model (Vardi, 1989. Multiplicative censoring, renewal processes, deconvolution and decreasing density: non-parametric estimation. Biometrika 76, 751-761), establishes the connection between the likelihood under a generalized multiplicative censoring model and that for left-truncated and right-censored survival time data, and derives an Expectation-Maximization algorithm for model estimation. A formal test for checking the truncation time distribution is constructed based on the semiparametric likelihood ratio test statistic. In particular, testing the stationarity assumption that the underlying truncation time is uniformly distributed is performed by embedding the null uniform truncation time distribution in a smooth alternative (Neyman, 1937. Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift 20, 150-199). Asymptotic properties of the proposed estimator are established. Simulations are performed to evaluate the finite-sample performance of the proposed methods. The methods and theories are illustrated by analyzing the Canadian Study of Health and Aging and the Channing House data, where the stationarity assumption with respect to disease incidence holds for the former but not the latter.
19.
Abstract
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as for a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples.
20. Bayesian semiparametric analysis for two-phase studies of gene-environment interaction. Ann Appl Stat 2013; 7:543-569. PMID: 24587840. DOI: 10.1214/12-aoas599.
Abstract
The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected sub-sample. It is natural to apply such a strategy for collecting genetic data in a sub-sample enriched for exposure to environmental factors for gene-environment interaction (G × E) analysis. In this paper, we consider two-phase studies of G × E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phase I and phase II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions. We employ a Bayesian variable selection algorithm to reduce the dimensionality of this potentially high-dimensional model; (ii) we use the assumption of gene-gene and gene-environment independence to trade-off between bias and efficiency for estimating the interaction parameters through use of hierarchical priors reflecting this assumption; (iii) we posit a flexible model for the joint distribution of the phase I categorical variables using the non-parametric Bayes construction of Dunson and Xing (2009). We carry out a small-scale simulation study to compare the proposed Bayesian method with weighted likelihood and pseudo likelihood methods that are standard choices for analyzing two-phase data. The motivating example originates from an ongoing case-control study of colorectal cancer, where the goal is to explore the interaction between the use of statins (a drug used for lowering lipid levels) and 294 genetic markers in the lipid metabolism/cholesterol synthesis pathway. The sub-sample of cases and controls on which these genetic markers were measured is enriched in terms of statin users. 
The example and simulation results illustrate that the proposed Bayesian approach has a number of advantages for characterizing joint effects of genotype and exposure over existing alternatives and makes efficient use of all available data in both phases.
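The phase II design described above — stratified sampling of genotypes conditional on phase I disease and exposure status — can be sketched as follows. This is an illustrative simulation only: the sampling fractions, the exposure prevalence and the logistic disease model are invented for the example, not taken from the paper, and the inverse-probability weights shown correspond to the weighted-likelihood comparator rather than the authors' Bayesian retrospective-likelihood method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Phase I: binary exposure E and disease status D observed on everyone.
# (Exposure prevalence and disease model are illustrative assumptions.)
E = rng.binomial(1, 0.3, size=n)
D = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 0.8 * E))), size=n)

# Phase II: genotype a sub-sample, with probabilities chosen to enrich
# for cases and for exposed subjects (values here are hypothetical).
pi = np.select(
    [(D == 1) & (E == 1), (D == 1), (E == 1)],
    [1.0, 0.8, 0.6],
    default=0.2,
)
sampled = rng.binomial(1, pi).astype(bool)

# Horvitz-Thompson (inverse-probability) weights, as used by the
# weighted-likelihood analysis that the paper compares against.
weights = 1.0 / pi[sampled]
print(sampled.sum(), weights.sum())  # weighted n should be near 10,000
```

The weighted sample reconstructs the phase I population in expectation, which is the sense in which the weighted-likelihood comparator "uses" the phase I data; the Bayesian approach in the paper instead models both phases jointly.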
Collapse
|
21
|
Abstract
Determination of protein structure on mineral surfaces is necessary to understand biomineralization processes toward better treatment of biomineralization diseases and design of novel protein-synthesized materials. To date, limited atomic-resolution data have hindered experimental structure determination for proteins on mineral surfaces. Molecular simulation represents a complementary approach. In this chapter, we review RosettaSurface, a computational structure prediction-based algorithm designed to broadly sample conformational space to identify low-energy structures. We summarize the computational approaches, the published applications, and the new releases of the code in the Rosetta 3 framework. In addition, we provide a protocol capture to demonstrate the practical steps to employ RosettaSurface. As an example, we provide input files and output data analysis for a previously unstudied mineralization protein, osteocalcin. Finally, we summarize ongoing challenges in energy function optimization and conformational searching and suggest that the fusion between experiment and calculation is the best route forward.
Collapse
|
22
|
Abstract
To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcomes, however, such a sampling strategy may introduce length bias. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409-10, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology.
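The length-bias mechanism in the abstract can be made concrete with a small simulation. This sketch only illustrates the sampling phenomenon, not the authors' mean residual life estimator; the exponential duration distribution and its scale are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Under a stationary disease process, cross-sectional sampling observes
# durations T with density proportional to t * f(t): longer episodes are
# more likely to be in progress at the sampling date.
T = rng.exponential(scale=2.0, size=200_000)      # underlying durations
keep = rng.uniform(0, T.max(), size=T.size) < T   # accept w.p. proportional to T
T_lb = T[keep]                                    # length-biased sample

# Each observed duration splits at the sampling date into a backward
# recurrence time A and a forward (residual) time V. Censoring that acts
# on V is correlated with T through the shared A -- the "induced
# informative censoring" described in the abstract.
A = rng.uniform(0, T_lb)   # backward time, uniform on (0, T) given T
V = T_lb - A               # forward (residual) time

print(T.mean(), T_lb.mean())  # length bias inflates the mean (~2 vs ~4)
```

For an exponential with mean 2, the length-biased mean is E[T²]/E[T] = 4, which the simulation reproduces; ignoring the bias would roughly double the estimated mean duration.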
Collapse
|
23
|
Empirical Likelihood-Based Estimation of the Treatment Effect in a Pretest-Posttest Study. J Am Stat Assoc 2012; 103:1270-1280. [PMID: 23729942 DOI: 10.1198/016214508000000625] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
The pretest-posttest study design is commonly used in medical and social science research to assess the effect of a treatment or an intervention. Recently, interest has been rising in developing inference procedures that improve efficiency while relaxing assumptions used in the pretest-posttest data analysis, especially when the posttest measurement might be missing. In this article we propose a semiparametric estimation procedure based on empirical likelihood (EL) that incorporates the common baseline covariate information to improve efficiency. The proposed method also yields an asymptotically unbiased estimate of the response distribution. Thus functions of the response distribution, such as the median, can be estimated straightforwardly, and the EL method can provide a more appealing estimate of the treatment effect for skewed data. We show that, compared with existing methods, the proposed EL estimator has appealing theoretical properties, especially when the working model for the underlying relationship between the pretest and posttest measurements is misspecified. A series of simulation studies demonstrates that the EL-based estimator outperforms its competitors when the working model is misspecified and the data are missing at random. We illustrate the methods by analyzing data from an AIDS clinical trial (ACTG 175).
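The setting above — a posttest missing at random, with baseline covariates used to recover efficiency — can be sketched with a simple covariate-adjusted comparator. This is not the paper's empirical-likelihood estimator; it is a basic regression-imputation (g-computation) analysis under an assumed linear working model, with all data-generating values invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Baseline (pretest) Y0, randomized treatment Z, and a posttest Y1 that
# is missing at random with probability depending only on Y0.
Y0 = rng.normal(0, 1, n)
Z = rng.binomial(1, 0.5, n)
Y1 = 0.7 * Y0 + 1.5 * Z + rng.normal(0, 1, n)   # true treatment effect 1.5
observed = rng.uniform(size=n) < 1 / (1 + np.exp(-(1.0 + 0.5 * Y0)))

def arm_mean(z):
    """Fit a linear working model on complete cases in arm z, then
    average its predictions over everyone's baseline Y0."""
    m = observed & (Z == z)
    slope, intercept = np.polyfit(Y0[m], Y1[m], 1)
    return (intercept + slope * Y0).mean()

tau_hat = arm_mean(1) - arm_mean(0)
print(round(tau_hat, 2))  # close to 1.5
```

Marginalizing the fitted working model over all subjects' baselines handles the MAR missingness here; the EL approach in the paper goes further, remaining well behaved when the working model is misspecified and delivering a full estimate of the response distribution rather than only its mean.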
Collapse
|