1
|
Shi X, Pan Z, Miao W. Data Integration in Causal Inference. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS 2023; 15:e1581. [PMID: 36713955 PMCID: PMC9880960 DOI: 10.1002/wics.1581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 02/24/2022] [Accepted: 03/01/2022] [Indexed: 04/12/2023]
Abstract
Integrating data from multiple heterogeneous sources has become increasingly popular as a way to achieve a large sample size and a diverse study population. This paper reviews developments in causal inference methods that combine multiple datasets collected under potentially different designs from potentially heterogeneous populations. We summarize recent advances in combining randomized clinical trials with external information from observational studies or historical controls; combining samples when no single sample has all relevant variables, with application to two-sample Mendelian randomization; distributed data settings under privacy concerns for comparative effectiveness and safety research using real-world data; Bayesian causal inference; and causal discovery methods.
Affiliation(s)
- Xu Shi
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Ziyang Pan
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Wang Miao
  - Department of Probability and Statistics, Peking University, Beijing, China
|
2
|
Lin C, Peng J, Qin Y, Li Y, Yang Y. Optimal integrating learning for split questionnaire design type data. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2118753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Cunjie Lin
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
- Jingfu Peng
  - School of Statistics, Renmin University of China, Beijing, China
- Yichen Qin
  - Department of Operations, Business Analytics, and Information Systems, University of Cincinnati, OH, USA
- Yang Li
  - Center for Applied Statistics, Renmin University of China, Beijing, China
  - School of Statistics, Renmin University of China, Beijing, China
- Yuhong Yang
  - School of Statistics, University of Minnesota, MN, USA
|
3
|
Zhao Y. Diagnostic checking of multiple imputation models. ASTA ADVANCES IN STATISTICAL ANALYSIS 2022. [DOI: 10.1007/s10182-021-00429-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
|
4
|
Zvavitch P, Rendall MS, Hurtado C, Shattuck RM. Contraceptive Consistency and Poverty after Birth. POPULATION RESEARCH AND POLICY REVIEW 2021; 40:1277-1311. [PMID: 34857977 PMCID: PMC8629354 DOI: 10.1007/s11113-020-09623-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2019] [Accepted: 10/18/2020] [Indexed: 10/23/2022]
Abstract
Unplanned pregnancies in the U.S. disproportionately occur among poor, less educated, and minority women, but it is unclear whether poverty following a birth is itself an outcome of this pregnancy planning status. Using the National Longitudinal Survey of Youth 1997 (n=2,101) and National Survey of Family Growth (n=778), we constructed two-year sequences of contraceptive use before a birth that signal an unplanned versus a planned birth. We regressed poverty in the year of the birth both on this contraceptive-sequence variable and on sociodemographic indicators including previous employment and poverty status in the year before the birth, race/ethnicity, education, partnership status, birth order, and family background. Compared to sequences indicating a planned birth, sequences of inconsistent use and non-use of contraception were associated with a higher likelihood of poverty following a birth, both before and after controlling for sociodemographic variables, and before and after additionally controlling for poverty status before the birth. In pooled-survey estimates with all controls included, not having used contraception consistently is associated with 42% higher odds of poverty after birth. The positive association of poverty after birth with contraceptive inconsistency or non-use, however, is limited to women with low to medium educational attainment. These findings encourage further exploration into relationships between contraceptive access and behavior and subsequent adverse outcomes for the mother and her children.
Affiliation(s)
- Polina Zvavitch
  - Department of Sociology and Maryland Population Center, University of Maryland, College Park, Maryland, USA
- Michael S. Rendall
  - Department of Sociology and Maryland Population Center, University of Maryland, College Park, Maryland, USA
- Constanza Hurtado
  - Department of Sociology and Maryland Population Center, University of Maryland, College Park, Maryland, USA
- Rachel M. Shattuck
  - Maryland Population Research Center, University of Maryland, College Park, Maryland, USA
|
5
|
Carreras G, Lachi A, Cortini B, Gallus S, López MJ, López-Nicolás Á, Lugo A, Pastor MT, Soriano JB, Fernandez E, Gorini G, Castellano Y, Fu M, Ballbè M, Amalia B, Tigova O, López MJ, Continente X, Arechavala T, Henderson E, Gallus S, Lugo A, Liu X, Borroni E, Colombo P, Semple S, O’Donnell R, Dobson R, Clancy L, Keogan S, Byrne H, Behrakis P, Tzortzi A, Vardavas C, Vyzikidou VK, Bakelas G, Mattiampa G, Boffi R, Ruprecht A, De Marco C, Borgini A, Veronese C, Bertoldi M, Tittarelli A, Gorini G, Carreras G, Cortini B, Verdi S, Lachi A, Chellini E, López-Nicolás Á, Trapero-Bertran M, Guerrero DC, Radu-Loghin C, Nguyen D, Starchenko P, Soriano JB, Ancochea J, Alonso T, Pastor MT, Erro M, Roca A, Pérez P, García-Castillo E. Burden of disease from exposure to secondhand smoke in children in Europe. Pediatr Res 2021; 90:216-222. [PMID: 33149260 DOI: 10.1038/s41390-020-01223-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/18/2020] [Revised: 10/09/2020] [Accepted: 10/13/2020] [Indexed: 11/09/2022]
Abstract
BACKGROUND Secondhand smoke (SHS) exposure at home and fetal SHS exposure during pregnancy are a major cause of disease among children. The aim of this study is to quantify the burden of disease due to SHS exposure in children and in pregnancy in 2006-2017 for the 28 European Union (EU) countries. METHODS Exposure to SHS was estimated using a multiple imputation procedure based on the Eurobarometer surveys, and SHS exposure burden was estimated with the comparative risk assessment method using meta-analytical relative risks. Data on deaths and disability-adjusted life years (DALYs) were collected from national statistics and from the Global Burden of Disease Study. RESULTS Exposure to SHS and its attributable burden stalled in 2006-2017; in pregnant women, SHS exposure was 19.8% in 2006, 19.1% in 2010, and 21.0% in 2017; in children it was 10.1% in 2006, 9.6% in 2010, and 12.1% in 2017. In 2017, 35,633 DALYs among children were attributable to SHS exposure in the EU, mainly due to low birth weight. CONCLUSIONS Comprehensive smoking bans up to 2010 contributed to reducing SHS exposure and its burden in children immediately after their implementation; however, SHS exposure still occurs, and in 2017, its burden in children was still relevant. IMPACT Exposure to secondhand smoke at home and in pregnancy is a major cause of disease among children. Smoking legislation produced the adoption of voluntary smoking bans in homes; however, secondhand smoke exposure at home still occurs and its burden is substantial. In 2017, the numbers of deaths and disability-adjusted life years in children attributable to exposure to secondhand smoke in the European Union countries were, respectively, 335 and 35,633. Low birth weight caused by secondhand smoke exposure in pregnancy showed the largest burden. Eastern European Union countries showed the highest burden.
Affiliation(s)
- Giulia Carreras
  - Oncologic Network, Prevention and Research Institute (ISPRO), Florence, Italy
- Alessio Lachi
  - Oncologic Network, Prevention and Research Institute (ISPRO), Florence, Italy
- Barbara Cortini
  - Oncologic Network, Prevention and Research Institute (ISPRO), Florence, Italy
- Silvano Gallus
  - Istituto di Ricerche Farmacologiche Mario Negri IRCCS (IRFMN), Milan, Italy
- Maria José López
  - Public Health Agency of Barcelona (ASPB), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - IIB Sant Pau, Barcelona, Spain
- Alessandra Lugo
  - Istituto di Ricerche Farmacologiche Mario Negri IRCCS (IRFMN), Milan, Italy
- Joan B Soriano
  - Hospital Universitario La Princesa (IISP), Madrid, Spain
  - Consortium for Biomedical Research in Respiratory Diseases (CIBER en Enfermedades Respiratorias, CIBERES), Madrid, Spain
- Esteve Fernandez
  - Consortium for Biomedical Research in Respiratory Diseases (CIBER en Enfermedades Respiratorias, CIBERES), Madrid, Spain
  - Catalan Institute of Oncology (ICO), L'Hopitalet de Llobregat, Barcelona, Spain
  - Bellvitge Biomedical Research Institute (IDIBELL), L'Hopitalet de Llobregat, Barcelona, Spain
  - University of Barcelona, Barcelona, Spain
- Giuseppe Gorini
  - Oncologic Network, Prevention and Research Institute (ISPRO), Florence, Italy
|
6
|
Hwang WH, Heinze D, Stoklosa J. A weighted partial likelihood approach for zero-truncated models. Biom J 2019; 61:1073-1087. [PMID: 31090104 DOI: 10.1002/bimj.201800328] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/23/2018] [Revised: 04/07/2019] [Accepted: 04/10/2019] [Indexed: 11/07/2022]
Abstract
Zero-truncated data arises in various disciplines where counts are observed but the zero count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, due to its nonstandard form it cannot be easily implemented using well-known software packages, and additional programming is often required. Motivated by the Rao-Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, and allows for applying readily available software. We evaluate the efficiency for this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
Affiliation(s)
- Wen-Han Hwang
  - Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
- Dean Heinze
  - Research Centre of Applied Alpine Ecology, La Trobe University, Victoria, Australia
- Jakub Stoklosa
  - School of Mathematics and Statistics and Evolution & Ecology Research Centre, The University of New South Wales, Sydney, Australia
|
7
|
Inequality Perceptions, Preferences Conducive to Redistribution, and the Conditioning Role of Social Position. SOCIETIES 2018. [DOI: 10.3390/soc8040099] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
Abstract
Inequality poses one of the biggest challenges of our time. It is not self-correcting in the sense that citizens demand more redistributive measures in light of rising inequality, which recent studies suggest may be due to the fact that citizens’ perceptions of inequality diverge from objective levels. Moreover, it is not the latter, but the former, which are related to preferences conducive to redistribution. However, the nascent literature on inequality perceptions has, so far, not accounted for the role of subjective position in society. The paper advances the argument that the relationship between inequality perceptions and preferences towards redistribution is conditional on the subjective position of respondents. To that end, I analyze comprehensive survey data on inequality perceptions from the social inequality module of the International Social Survey Programme (1992, 1999, and 2009). Results show that inequality perceptions are associated with preferences conducive to redistribution particularly among those who perceive themselves to be at the top of the social ladder. Gaining a better understanding of inequality perceptions contributes to comprehending the absence of self-correcting inequality.
|
8
|
Kamgar S, Meinfelder F, Münnich R, Navvabpour H. Estimation within the new integrated system of household surveys in Germany. Stat Pap (Berl) 2018. [DOI: 10.1007/s00362-018-1023-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
|
9
|
Antonelli J, Zigler C, Dominici F. Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research. Biostatistics 2018; 18:553-568. [PMID: 28334230 DOI: 10.1093/biostatistics/kxx003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/21/2016] [Accepted: 01/06/2017] [Indexed: 11/12/2022] [Open Access]
Abstract
In comparative effectiveness research, we are often interested in the estimation of an average causal effect from large observational data (the main study). Often these data do not measure all the necessary confounders. On many occasions, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches for missing data imputation might not be adequate due to the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and propagates the uncertainty due to the missing data imputation and confounder selection when estimating the average causal effect (ACE) in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10 396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.
Affiliation(s)
- Joseph Antonelli
  - Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
- Corwin Zigler
  - Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
- Francesca Dominici
  - Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA
|
10
|
Siddique J, de Chavez PJ, Howe G, Cruden G, Brown CH. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis. PREVENTION SCIENCE : THE OFFICIAL JOURNAL OF THE SOCIETY FOR PREVENTION RESEARCH 2018; 19:95-108. [PMID: 28243827 PMCID: PMC5572105 DOI: 10.1007/s11121-017-0760-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 10/20/2022]
Abstract
Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.
Affiliation(s)
- Juned Siddique
  - Department of Preventive Medicine, Northwestern University, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
- Peter J de Chavez
  - Department of Preventive Medicine, Northwestern University, 680 N. Lake Shore Dr., Suite 1400, Chicago, IL, 60611, USA
- George Howe
  - Department of Psychology, George Washington University, Washington, DC, USA
- Gracelyn Cruden
  - Department of Psychiatry and Behavioral Sciences, Northwestern University, Chicago, IL, USA
- C Hendricks Brown
  - Department of Psychiatry and Behavioral Sciences, Northwestern University, Chicago, IL, USA
|
11
|
Pink S. Anticipated (Grand-)Parental Childcare Support and the Decision to Become a Parent. EUROPEAN JOURNAL OF POPULATION = REVUE EUROPEENNE DE DEMOGRAPHIE 2017; 34:691-720. [PMID: 30976258 DOI: 10.1007/s10680-017-9447-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/18/2016] [Accepted: 09/23/2017] [Indexed: 11/27/2022]
Abstract
Based on a cost-reduction argument, this study explored whether anticipated childcare support from their mothers influenced adult daughters' decisions to have their first child. Using six waves of the German Family Panel (pairfam), discrete-time hazard models (N = 3155 women) were estimated for the transition to the decision to have the first child. Anticipated childcare support from the women's mothers was approximated by the travelling distance between adult daughters and their mothers, a measure whose suitability was tested empirically. The results indicated that women in a position to anticipate having access to childcare support in the future decided to make the transition to parenthood earlier. This finding highlights both the strength of social interaction effects on fertility decision-making and the importance of intergenerational relationships for individual fertility histories already at their very beginning.
|
12
|
Chipperfield JO, Barr ML, Steel DG. Split Questionnaire Designs: collecting only the data that you need through MCAR and MAR designs. J Appl Stat 2017. [DOI: 10.1080/02664763.2017.1375085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
Affiliation(s)
- Margo L. Barr
  - Centre for Epidemiology and Evidence, New South Wales Ministry of Health, Australia
- David G. Steel
  - National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, Australia
|
13
|
Nguyen CD, Carlin JB, Lee KJ. Model checking in multiple imputation: an overview and case study. Emerg Themes Epidemiol 2017; 14:8. [PMID: 28852415 PMCID: PMC5569512 DOI: 10.1186/s12982-017-0062-6] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 02/10/2017] [Accepted: 08/07/2017] [Indexed: 11/20/2022] [Open Access]
Abstract
Background Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models.
Analysis In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children.
Conclusions As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.
Affiliation(s)
- Cattram D Nguyen
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, VIC 3052 Australia
  - Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, The Royal Children's Hospital, University of Melbourne, Flemington Road, Parkville, VIC 3052 Australia
- John B Carlin
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, VIC 3052 Australia
  - Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, The Royal Children's Hospital, University of Melbourne, Flemington Road, Parkville, VIC 3052 Australia
- Katherine J Lee
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road, Parkville, VIC 3052 Australia
  - Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, The Royal Children's Hospital, University of Melbourne, Flemington Road, Parkville, VIC 3052 Australia
|
14
|
|
15
|
Gauthier GR, Hill PW, McQuillan J, Spiegel AN, Diamond J. The potential scientist's dilemma: How the Masculinization of Science Shapes Friendships and Science Job Preferences. SOCIAL SCIENCES (BASEL, SWITZERLAND) 2017; 6:14. [PMID: 28491465 PMCID: PMC5421378 DOI: 10.3390/socsci6010014] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
In the United States, girls and boys have similar science achievement, yet fewer girls aspire to science careers than boys. This paradox emerges in middle school, when peers begin to play a stronger role in shaping adolescent identities. We use complete network data from a single middle school and theories of gender, identity, and social distance to explore how friendship patterns might influence this gender and science paradox. Three patterns highlight the social dimensions of gendered science persistence: (1) boys and girls do not differ in self-perceived science potential and science career aspirations; (2) consistent with gender-based norms, both middle school boys and girls report that the majority of their female friends are not science kinds of people; and (3) youth with gender-inconsistent science aspirations are more likely to be friends with each other than youth with gender normative science aspirations. Together, this evidence suggests that friendship dynamics contribute to gendered patterns in science career aspirations.
Affiliation(s)
- G. Robin Gauthier
  - Department of Sociology, University of Nebraska-Lincoln, Nebraska, USA
  - Research, Evaluation and Analysis for Community Health, Department of Sociology, University of Nebraska-Lincoln, Nebraska, USA
- Julia McQuillan
  - Department of Sociology, University of Nebraska-Lincoln, Nebraska, USA
- Amy N. Spiegel
  - Center for Instructional Innovation, 215 Teachers College Hall, University of Nebraska-Lincoln, Nebraska, USA
- Judy Diamond
  - University of Nebraska State Museum, University of Nebraska-Lincoln, Nebraska, USA
|
16
|
Keister LA, Aronson B. Immigrants in the one percent: The national origin of top wealth owners. PLoS One 2017; 12:e0172876. [PMID: 28231335 PMCID: PMC5322981 DOI: 10.1371/journal.pone.0172876] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/31/2016] [Accepted: 02/11/2017] [Indexed: 11/27/2022] [Open Access]
Abstract
BACKGROUND Economic inequality in the United States is extreme, but little is known about the national origin of affluent households. Households in the top one percent by total wealth own vastly disproportionate quantities of household assets and have correspondingly high levels of economic, social, and political influence. The overrepresentation of white natives (i.e., those born in the U.S.) among high-wealth households is well-documented, but changing migration dynamics suggest that a growing portion of top households may be immigrants. METHODS Because no single survey dataset contains top wealth holders and data about country of origin, this paper uses two publicly-available data sets: the Survey of Consumer Finances (SCF) and the Survey of Income and Program Participation (SIPP). Multiple imputation is used to impute country of birth from the SIPP into the SCF. Descriptive statistics are used to demonstrate reliability of the method, to estimate the prevalence of immigrants among top wealth holders, and to document patterns of asset ownership among affluent immigrants. RESULTS Significant numbers of top wealth holders who are usually classified as white natives may be immigrants. Many top wealth holders appear to be European and Canadian immigrants, and increasing numbers of top wealth holders are likely from Asia and Latin America as well. Results suggest that of those in the top one percent of wealth holders, approximately 3% are European and Canadian immigrants, 0.5% are from Mexico or Cuba, and 1.7% are from Asia (especially Hong Kong, Taiwan, Mainland China, and India). Ownership of key assets varies considerably across affluent immigrant groups. CONCLUSION Although the percentage of top wealth holders who are immigrants is relatively small, these percentages represent large numbers of households with considerable resources and corresponding social and political influence. Evidence that the propensity to allocate wealth to real and financial assets varies across immigrant groups suggests that wealth ownership is more global than previous research suggests and that immigrant groups are likely to become more prevalent in top wealth positions in the U.S. As the representation of immigrants in top wealth positions grows, their economic, social, and political influence is likely to increase as well.
Affiliation(s)
- Lisa A. Keister
  - Duke University, Department of Sociology, Durham, North Carolina, United States of America
- Brian Aronson
  - Duke University, Department of Sociology, Durham, North Carolina, United States of America
|
17
|
Bobbitt-Zeher D, Downey DB, Merry J. Number of Siblings During Childhood and the Likelihood of Divorce in Adulthood. JOURNAL OF FAMILY ISSUES 2016; 37:2075-2094. [PMID: 27833216 PMCID: PMC5098899 DOI: 10.1177/0192513x14560641] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Despite fertility decline across economically developed countries, relatively little is known about the social consequences of children being raised with fewer siblings. Much research suggests that growing up with fewer siblings is probably positive, as children tend to do better in school when sibship size is small. Less scholarship, however, has explored how growing up with few siblings influences children's ability to get along with peers and develop long-term meaningful relationships. If siblings serve as important social practice partners during childhood, individuals with few or no siblings may struggle to develop successful social lives later in adulthood. With data from the General Social Surveys 1972-2012, we explore this possibility by testing whether sibship size during childhood predicts the probability of divorce in adulthood. We find that, among those who ever marry, each additional sibling is associated with a three percent decline in the likelihood of divorce, net of covariates.
|
18
|
Chen G, Åstebro T. How to Deal with Missing Categorical Data: Test of a Simple Bayesian Method. ORGANIZATIONAL RESEARCH METHODS 2016. [DOI: 10.1177/1094428103254672] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
The authors analyze the efficiency of six missing data techniques for categorical item nonresponse under the assumption that data are missing at random or missing completely at random. By efficiency, the authors mean a procedure that produces an unbiased estimate of true sample properties that is also easy to implement. The investigated techniques include listwise deletion, mode substitution, random imputation, two regression imputations, and a Bayesian model-based procedure. The authors analyze efficiency under six experimental conditions for a survey-based data set. They find that listwise deletion is efficient for the data analyzed. If data loss due to listwise deletion is an issue, the analysis points to the Bayesian method. Regression imputation is also efficient, but the result is conditioned on the specific data structure and may not hold in general. Additional problems arise when using regression imputation, making it less appropriate.
|
19
|
Lee SM, Hwang WH, de Dieu Tapsoba J. Estimation in closed capture-recapture models when covariates are missing at random. Biometrics 2016; 72:1294-1304. [PMID: 26909877 DOI: 10.1111/biom.12498] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/01/2015] [Revised: 01/01/2016] [Accepted: 01/01/2016] [Indexed: 11/29/2022]
Abstract
Individual covariates are commonly used in capture-recapture models as they can provide important information for population size estimation. However, in practice, one or more covariates may be missing at random for some individuals, which can lead to unreliable inference if records with missing data are treated as missing completely at random. We show that, in general, such a naive complete-case analysis in closed capture-recapture models with some covariates missing at random underestimates the population size. We develop methods for estimating regression parameters and population size using regression calibration, inverse probability weighting, and multiple imputation without any distributional assumptions about the covariates. We show that the inverse probability weighting and multiple imputation approaches are asymptotically equivalent. We present a simulation study to investigate the effects of missing covariates and to evaluate the performance of the proposed methods. We also illustrate an analysis using data on the bird species yellow-bellied prinia collected in Hong Kong.
Affiliation(s)
- Shen-Ming Lee: Department of Statistics, Feng Chia University, Taichung City, Taiwan
- Wen-Han Hwang: Institute of Statistics, National Chung Hsing University, Taichung City, Taiwan
- Jean de Dieu Tapsoba: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A.
20
Siddique J, Reiter JP, Brincks A, Gibbons RD, Crespi CM, Brown CH. Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis. Stat Med 2015; 34:3399-414. [PMID: 26095855 PMCID: PMC4596762 DOI: 10.1002/sim.6562] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/01/2014] [Revised: 02/24/2015] [Accepted: 05/26/2015] [Indexed: 11/05/2022]
Abstract
There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials and use multiple imputation to fill in missing measurements. We apply our method to five longitudinal adolescent depression trials where four studies used one depression measure and the fifth study used a different depression measure. None of the five studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigate whether external information is appropriately incorporated into the imputed values.
Affiliation(s)
- Juned Siddique: Department of Preventive Medicine, Northwestern University, Chicago, IL
- Ahnalee Brincks: Department of Public Health Science, University of Miami, Miami, FL
- Robert D. Gibbons: Departments of Medicine and Health Studies, University of Chicago, Chicago, IL
- Catherine M. Crespi: Department of Biostatistics, University of California Los Angeles, Los Angeles, CA
- C. Hendricks Brown: Department of Psychiatry and Behavioral Sciences, Northwestern University, Chicago, IL
21
Miles A, Vaisey S. Morality and politics: Comparing alternate theories. SOCIAL SCIENCE RESEARCH 2015; 53:252-269. [PMID: 26188452 DOI: 10.1016/j.ssresearch.2015.06.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/18/2014] [Revised: 03/25/2015] [Accepted: 06/04/2015] [Indexed: 06/04/2023]
Abstract
Debates about the American "culture wars" have led scholars to develop several theories relating morality to political attitudes and behaviors. However, researchers have not adequately compared these theories, nor have they examined the overall contribution of morality to explaining political variation. This study uses nationally representative data to compare the utility of 19 moral constructs from four research traditions - associated with the work of Hunter, Lakoff, Haidt, and Schwartz - for predicting political orientation (liberalism/conservatism). Results indicate that morality explains a third of the variation in political orientation - more than basic demographic and religious predictors - but that no one theory provides a fully adequate explanation of this phenomenon. Instead, political orientation is best predicted by selected moral constructs that are unique to each of the four traditions, and by two moral constructs that crosscut them. Future work should investigate how these moral constructs can be synthesized to create a more comprehensive theory of morality and politics.
22
Morisot A, Bessaoud F, Landais P, Rébillard X, Trétarre B, Daurès JP. Prostate cancer: net survival and cause-specific survival rates after multiple imputation. BMC Med Res Methodol 2015. [PMID: 26216355 PMCID: PMC4517373 DOI: 10.1186/s12874-015-0048-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022]
Abstract
Background: Estimations of survival rates are diverse and the choice of the appropriate method depends on the context. Given the increasing interest in multiple imputation methods, we explored the value of a multiple imputation approach in the estimation of cause-specific survival, when a subset of causes of death was observed. Methods: Using the European Randomized Study of Screening for Prostate Cancer (ERSPC), 20 multiply imputed datasets were created and analyzed with a Multivariate Imputation by Chained Equation (MICE) algorithm. Then, cause-specific survival was estimated on each dataset with two methods: Kaplan-Meier and competing risks. The two pooled cause-specific survival estimates and their confidence intervals were obtained using Rubin’s rules after complementary log-log transformation. Net survival was estimated using Pohar-Perme’s estimator and was compared to pooled cause-specific survival. Finally, a sensitivity analysis was performed to test the robustness of our constructed multiple imputation model. Results: Cause-specific survival performed better than net survival, since the latter exceeded 100% for almost the first 2 years of follow-up and after 9 years, whereas the cause-specific survival decreased slowly and then stabilized at around 94% at 9 years. Sensitivity analysis results were satisfactory. Conclusions: Based on our prostate cancer data, the results obtained by cause-specific survival after multiple imputation appeared to be better and more realistic than those obtained using net survival.
Affiliation(s)
- Adeline Morisot: University of Montpellier, Laboratory of Biostatistics, Epidemiology and Public Health (EA2415), 641, avenue du doyen Gaston Giraud, Montpellier Cedex 5, 34093, France
- Faïza Bessaoud: Hérault Cancer Registry, 208, rue des Apothicaires, Montpellier Cedex 5, 34298, France
- Paul Landais: University of Montpellier, Laboratory of Biostatistics, Epidemiology and Public Health (EA2415), 641, avenue du doyen Gaston Giraud, Montpellier Cedex 5, 34093, France
- Xavier Rébillard: Department of Urology - BeauSoleil Clinic, 119 avenue de Lodève, Montpellier, 34070, France
- Brigitte Trétarre: Hérault Cancer Registry, 208, rue des Apothicaires, Montpellier Cedex 5, 34298, France
- Jean-Pierre Daurès: University of Montpellier, Laboratory of Biostatistics, Epidemiology and Public Health (EA2415), 641, avenue du doyen Gaston Giraud, Montpellier Cedex 5, 34093, France
23
Crameri A, von Wyl A, Koemeda M, Schulthess P, Tschuschke V. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy. Front Psychol 2015; 6:1042. [PMID: 26283989 PMCID: PMC4515885 DOI: 10.3389/fpsyg.2015.01042] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/23/2015] [Accepted: 07/08/2015] [Indexed: 11/13/2022]
Abstract
The importance of preventing and treating incomplete data in effectiveness studies is now widely emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption that data are missing at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by data missing not at random (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's versions of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy.
Affiliation(s)
- Aureliano Crameri: School of Applied Psychology, Zurich University of Applied Sciences, Zurich, Switzerland
- Agnes von Wyl: School of Applied Psychology, Zurich University of Applied Sciences, Zurich, Switzerland
- Volker Tschuschke: Division of Medical Psychology, University Hospital of Cologne, Cologne, Germany; Faculty of Psychotherapy Sciences, Sigmund Freud University, Berlin, Germany
24
Nguyen CD, Lee KJ, Carlin JB. Posterior predictive checking of multiple imputation models. Biom J 2015; 57:676-94. [PMID: 25939490 DOI: 10.1002/bimj.201400034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/13/2014] [Revised: 11/13/2014] [Accepted: 12/05/2014] [Indexed: 11/09/2022]
Abstract
Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution.
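As a rough illustration of the diagnostic described in this abstract, the sketch below estimates a posterior predictive p-value as the share of paired draws in which the test quantity computed on replicated data is at least as large as the one computed on completed data. The function name `posterior_predictive_p_value` and the toy numbers are assumptions made for illustration; they are not taken from the paper, and the sketch takes the draws of the test quantity as given rather than producing them from an imputation model.

```python
def posterior_predictive_p_value(completed_stats, replicated_stats):
    """Estimate Pr(T(replicated) >= T(completed)) from paired posterior draws.

    completed_stats[i] and replicated_stats[i] are the test quantity computed
    on the i-th completed dataset and on its replicate (hypothetical inputs).
    """
    if not completed_stats or len(completed_stats) != len(replicated_stats):
        raise ValueError("need at least one draw and equally many paired draws")
    # Count draws where the replicate is at least as extreme as the completed data.
    exceed = sum(
        1
        for t_completed, t_replicated in zip(completed_stats, replicated_stats)
        if t_replicated >= t_completed
    )
    return exceed / len(completed_stats)


# Toy paired draws of a test quantity (made-up numbers, for illustration only).
completed = [1.0, 2.0, 3.0, 4.0]
replicated = [2.0, 1.0, 4.0, 3.0]
p_value = posterior_predictive_p_value(completed, replicated)
```

A value near 0 or 1 would point to possible misfit; as the abstract notes, conventional thresholds for classical p-values should not be assumed to carry over.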
Affiliation(s)
- Cattram D Nguyen: Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road Parkville, Victoria, 3052, Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road Parkville, Victoria, 3052, Australia
- Katherine J Lee: Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road Parkville, Victoria, 3052, Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road Parkville, Victoria, 3052, Australia
- John B Carlin: Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road Parkville, Victoria, 3052, Australia; Department of Paediatrics (RCH Academic Centre), Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, The Royal Children's Hospital, Flemington Road Parkville, Victoria, 3052, Australia
25
Liu W, Li S. A multiple imputation approach to nonlinear mixed-effects models with covariate measurement errors and missing values. J Appl Stat 2014. [DOI: 10.1080/02664763.2014.960372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
26
Huang H, Ma X, Waagepetersen R, Holford TR, Wang R, Risch H, Mueller L, Guan Y. A new estimation approach for combining epidemiological data from multiple sources. J Am Stat Assoc 2014; 109:11-23. [PMID: 24683281 PMCID: PMC3964681 DOI: 10.1080/01621459.2013.870904] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Abstract
We propose a novel two-step procedure to combine epidemiological data obtained from diverse sources, with the aim of quantifying risk factors affecting the probability that an individual develops a certain disease such as cancer. In the first step we derive all possible unbiased estimating functions based on a group of cases and a group of controls each time. In the second step, we combine these estimating functions efficiently in order to make full use of the information contained in the data. Our approach is computationally simple and flexible. We illustrate its efficacy through simulation and apply it to investigate pancreatic cancer risks based on data obtained from the Connecticut Tumor Registry, a population-based case-control study, and the Behavioral Risk Factor Surveillance System, which is a state-based system of health surveys.
Affiliation(s)
- Hui Huang: Department of Management Science, University of Miami, Coral Gables, FL 33124
- Xiaomei Ma: Yale School of Public Health, New Haven, CT 06520
- Rasmus Waagepetersen: Department of Mathematical Sciences, Aalborg University, Fredrik Bajersvej 7G, DK-9220 Aalborg, Denmark
- Rong Wang: Yale School of Public Health, New Haven, CT 06520
- Harvey Risch: Yale School of Public Health, New Haven, CT 06520
- Lloyd Mueller: Connecticut Department of Public Health, 410 Capitol Avenue, MS# 11HCQ, Hartford, CT 06134
- Yongtao Guan: Department of Management Science, University of Miami, Coral Gables, FL 33124
27
Nguyen CD, Carlin JB, Lee KJ. Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study. BMC Med Res Methodol 2013; 13:144. [PMID: 24252653 PMCID: PMC3840572 DOI: 10.1186/1471-2288-13-144] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/09/2013] [Accepted: 11/12/2013] [Indexed: 11/20/2022]
Abstract
Background Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic. Methods Using simulation, we examined whether the KS test could reliably identify departures from assumptions made in the imputation model. To do this we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of KS test in diagnosing issues with the imputation models under these different scenarios. Results The KS test was able to flag differences between the observations and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random dependency, the KS p-values were very small, regardless of whether or not the MI estimates were biased; so that the KS test was not able to discriminate between imputed variables that required further investigation, and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, adding to the challenge of interpreting the results from the KS test. Conclusions Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.
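To make the comparison in this abstract concrete, the following sketch computes the two-sample Kolmogorov-Smirnov statistic, that is, the largest gap between the empirical distribution functions of observed and imputed values. It uses only the standard library and stops at the statistic; it does not compute a p-value. The function name `ks_statistic` and the toy data are assumptions for illustration, not the authors' code.

```python
from bisect import bisect_right


def ks_statistic(observed, imputed):
    """Return the maximum absolute difference between the two empirical CDFs."""
    if not observed or not imputed:
        raise ValueError("both samples must be non-empty")
    obs = sorted(observed)
    imp = sorted(imputed)
    largest_gap = 0.0
    # The empirical CDFs only change at data points, so checking those suffices.
    for value in set(obs) | set(imp):
        cdf_obs = bisect_right(obs, value) / len(obs)
        cdf_imp = bisect_right(imp, value) / len(imp)
        largest_gap = max(largest_gap, abs(cdf_obs - cdf_imp))
    return largest_gap


# Toy values (made up): the imputed values are shifted upward by 2.
observed_values = [1, 2, 3, 4]
imputed_values = [3, 4, 5, 6]
gap = ks_statistic(observed_values, imputed_values)
```

As the abstract cautions, a large gap need not signal a problem: under a strong missing-at-random dependency the imputed and observed distributions can legitimately differ.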
Affiliation(s)
- Cattram D Nguyen: Clinical Epidemiology & Biostatistics Unit, Murdoch Childrens Research Institute, The Royal Children's Hospital, Flemington Road Parkville, Melbourne, Victoria 3052, Australia
28
Rendall MS, Ghosh-Dastidar B, Weden MM, Baker EH, Nazarov Z. Multiple Imputation For Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys. SOCIOLOGICAL METHODS & RESEARCH 2013; 42:10.1177/0049124113502947. [PMID: 24223447 PMCID: PMC3820019 DOI: 10.1177/0049124113502947] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 06/02/2023]
Abstract
Within-survey multiple imputation (MI) methods are adapted to pooled-survey regression estimation where one survey has more regressors, but typically fewer observations, than the other. This adaptation is achieved through: (1) larger numbers of imputations to compensate for the higher fraction of missing values; (2) model-fit statistics to check the assumption that the two surveys sample from a common universe; and (3) specifying the analysis model completely from variables present in the survey with the larger set of regressors, thereby excluding variables never jointly observed. In contrast to the typical within-survey MI context, cross-survey missingness is monotonic and easily satisfies the Missing At Random (MAR) assumption needed for unbiased MI. Large efficiency gains and substantial reduction in omitted variable bias are demonstrated in an application to sociodemographic differences in the risk of child obesity estimated from two nationally representative cohort surveys.
29
Hardt J, Herke M, Brian T, Laubach W. Multiple Imputation of Missing Data: A Simulation Study on a Binary Response. Open Journal of Statistics 2013. [DOI: 10.4236/ojs.2013.35043] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/20/2022]
30
Weden MM, Brownell P, Rendall MS. Prenatal, perinatal, early life, and sociodemographic factors underlying racial differences in the likelihood of high body mass index in early childhood. Am J Public Health 2012; 102:2057-67. [PMID: 22994179 PMCID: PMC3477944 DOI: 10.2105/ajph.2012.300686] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Accepted: 12/27/2011] [Indexed: 11/04/2022]
Abstract
OBJECTIVES We investigated early childhood disparities in high body mass index (BMI) between Black and White US children. METHODS We compared differences in Black and White children's prevalence of sociodemographic, prenatal, perinatal, and early life risk and protective factors; fit logistic regression models predicting high BMI (≥ 95th percentile) at age 4 to 5 years to 2 nationally representative samples followed from birth; and performed separate and pooled-survey estimations of these models. RESULTS After adjustment for sample design-related variables, models predicting high BMI in the 2 samples were statistically indistinguishable. In the pooled-survey models, Black children's odds of high BMI were 59% higher than White children's (odds ratio [OR] = 1.59; 95% confidence interval [CI]= 1.32, 1.92). Sociodemographic predictors reduced the racial disparity to 46% (OR = 1.46; 95% CI = 1.17, 1.81). Prenatal, perinatal, and early life predictors reduced the disparity to nonsignificance (OR = 1.18; 95% CI = 0.93, 1.49). Maternal prepregnancy obesity and short-duration or no breastfeeding were among predictors for which racial differences in children's exposures most disadvantaged Black children. CONCLUSIONS Racial disparities in early childhood high BMI were largely explained by potentially modifiable risk and protective factors.
31
He Y, Zaslavsky AM. Diagnosing imputation models by applying target analyses to posterior replicates of completed data. Stat Med 2011; 31:1-18. [PMID: 22139814 DOI: 10.1002/sim.4413] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 06/08/2010] [Accepted: 08/02/2011] [Indexed: 01/10/2023]
Abstract
Multiple imputation fills in missing data with posterior predictive draws from imputation models. To assess the adequacy of imputation models, we can compare completed data with their replicates simulated under the imputation model. We apply analyses of substantive interest to both datasets and use posterior predictive checks of the differences of these estimates to quantify the evidence of model inadequacy. We can further integrate out the imputed missing data and their replicates over the completed-data analyses to reduce variance in the comparison. In many cases, the checking procedure can be easily implemented using standard imputation software by treating re-imputations under the model as posterior predictive replicates. Thus, it can be applied for non-Bayesian imputation methods. We also sketch several strategies for applying the method in the context of practical imputation analyses. We illustrate the method using two real data applications and study its properties using a simulation.
Affiliation(s)
- Yulei He: Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, USA
33
Research note: imputing large group averages for missing data, using rural-urban continuum codes for density driven industry sectors. JOURNAL OF POPULATION RESEARCH 2009. [DOI: 10.1007/s12546-009-9018-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
34
He Y, Zaslavsky AM, Landrum MB, Harrington DP, Catalano P. Multiple imputation in a large-scale complex survey: a practical guide. Stat Methods Med Res 2009; 19:653-70. [PMID: 19654173 DOI: 10.1177/0962280208101273] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.3] [Indexed: 11/15/2022]
Abstract
The Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium is a multisite, multimode, multiwave study of the quality and patterns of care delivered to population-based cohorts of newly diagnosed patients with lung and colorectal cancer. As is typical in observational studies, missing data are a serious concern for CanCORS, following complicated patterns that impose severe challenges to the consortium investigators. Despite the popularity of multiple imputation of missing data, its acceptance and application still lag in large-scale studies with complicated data sets such as CanCORS. We use sequential regression multiple imputation, implemented in publicly available software, to deal with non-response in the CanCORS surveys and construct a centralised completed database that can be easily used by investigators from multiple sites. Our work illustrates the feasibility of multiple imputation in a large-scale multiobjective survey, showing its capacity to handle complex missing data. We present the implementation process in detail as an example for practitioners and discuss some of the challenging issues which need further research.
Affiliation(s)
- Y He: Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave., Boston, MA 02115, USA
35
Jackson CH, Best NG, Richardson S. Bayesian graphical models for regression on multiple data sets with different variables. Biostatistics 2008; 10:335-51. [PMID: 19039032 PMCID: PMC2648903 DOI: 10.1093/biostatistics/kxn041] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
Routinely collected administrative data sets, such as national registers, aim to collect information on a limited number of variables for the whole population. In contrast, survey and cohort studies contain more detailed data from a sample of the population. This paper describes Bayesian graphical models for fitting a common regression model to a combination of data sets with different sets of covariates. The methods are applied to a study of low birth weight and air pollution in England and Wales using a combination of register, survey, and small-area aggregate data. We discuss issues such as multiple imputation of confounding variables missing in one data set, survey selection bias, and appropriate propagation of information between model components. From the register data, there appears to be an association between low birth weight and environmental exposure to NO₂, but after adjusting for confounding by ethnicity and maternal smoking by combining the register and survey data under our models, we find there is no significant association. However, NO₂ was associated with a small but significant reduction in birth weight, modeled as a continuous variable.
Affiliation(s)
- C H Jackson: MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
36
Wood AM, White IR, Royston P. How should variable selection be performed with multiply imputed data? Stat Med 2008; 27:3227-46. [PMID: 18203127 DOI: 10.1002/sim.3177] [Citation(s) in RCA: 273] [Impact Index Per Article: 17.1] [Indexed: 11/10/2022]
Abstract
Multiple imputation is a popular technique for analysing incomplete data. Given the imputed data and a particular model, Rubin's rules (RR) for estimating parameters and standard errors are well established. However, there are currently no guidelines for variable selection in multiply imputed data sets. The usual practice is to perform variable selection amongst the complete cases, a simple but inefficient and potentially biased procedure. Alternatively, variable selection can be performed by repeated use of RR, which is more computationally demanding. An approximation can be obtained by a simple 'stacked' method that combines the multiply imputed data sets into one and uses a weighting scheme to account for the fraction of missing data in each covariate. We compare these and other approaches using simulations based around a trial in community psychiatry. Most methods improve on the naïve complete-case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on RR, which is our recommended approach.
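For readers unfamiliar with Rubin's rules (RR), the sketch below shows the standard pooling step for a scalar parameter: the pooled estimate is the mean of the m completed-data estimates, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the example numbers are assumptions for illustration; the sketch covers pooling only, not the variable selection procedures compared in the paper.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and variances using Rubin's rules.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the completed-data variances.
    within = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total, math.sqrt(total)


# Toy estimates from m = 3 imputed datasets (made-up numbers).
pooled_estimate, total_variance, pooled_se = pool_rubin(
    [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
)
```

With these toy inputs the between-imputation variance is 1.0 and the within-imputation variance is 0.5, so the total variance is 0.5 + (4/3) × 1.0.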
Affiliation(s)
- Angela M Wood: Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Worts Causeway, Cambridge CB2 8RN, UK
38
Jenkinson C, Heffernan C, Doll H, Fitzpatrick R. The Parkinson's Disease Questionnaire (PDQ-39): evidence for a method of imputing missing data. Age Ageing 2006; 35:497-502. [PMID: 16772362 DOI: 10.1093/ageing/afl055] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]
Abstract
BACKGROUND The Parkinson's Disease Questionnaire (PDQ-39) is the most widely used Parkinson's specific measure of health status. It is increasingly used in treatment trials, sometimes as a primary end-point, where any missing data can potentially cause difficulties in analyses. OBJECTIVES The purpose of this article is to evaluate the Expectation Maximisation (EM) algorithm for the imputation of missing dimension scores on the 39-item PDQ-39. METHODS A postal survey of patients diagnosed with Parkinson's disease (PD). A total of 1,372 patients were surveyed and 839 (61.15%) questionnaires returned completed or partially completed. Of these, complete PDQ data were available in 715 (85.22%) cases. Data were deleted from this complete dataset and a sub-set of 200 respondents from this dataset and then imputed using the EM algorithm; results were then compared to the dataset before data deletion. RESULTS Results gained from imputation of data closely mirrored that of the complete dataset in each case. Descriptive statistics, mean scores and spread of scores were almost identical between original and imputed datasets. Furthermore, original and imputed datasets were highly correlated [intra-class correlation coefficient (ICC) = 0.93 or greater], and mean differences were small (+/-1.00). CONCLUSIONS The results suggest that the use of EM for the PDQ-39 provides data that closely mirrors the original when this has been deliberately removed. Consequently, EM is likely to be appropriate for trials using the PDQ that contains missing data points.
Affiliation(s)
- Crispin Jenkinson
- University of Oxford, Department of Public Health, Old Road Campus, Headington, Oxford OX3 7LF, UK.
|
39
|
Gelman A, Van Mechelen I, Verbeke G, Heitjan DF, Meulders M. Multiple Imputation for Model Checking: Completed-Data Plots with Missing and Latent Data. Biometrics 2005; 61:74-85. [PMID: 15737080 DOI: 10.1111/j.0006-341x.2005.031010.x] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Indexed: 11/28/2022]
Abstract
In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset (the observed data together with the imputed unobserved data) using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require modeling the missingness or inclusion mechanism; this is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.
Affiliation(s)
- Andrew Gelman
- Department of Statistics, Columbia University, New York 10027, USA.
|
40
|
Wu H, Wu L. A multiple imputation method for missing covariates in non-linear mixed-effects models with application to HIV dynamics. Stat Med 2001; 20:1755-69. [PMID: 11406839 DOI: 10.1002/sim.816] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
We propose a three-step multiple imputation method, implemented with a Gibbs sampler, for estimating parameters in non-linear mixed-effects models with missing covariates. Estimates obtained by the proposed multiple imputation method are compared with those obtained by the mean-value imputation method and the complete-case method through simulations. We find that the proposed multiple imputation method offers smaller biases and smaller mean-squared errors for the estimates of covariate coefficients than the other two methods. We apply the three missing-data methods to modelling HIV viral dynamics from an AIDS clinical trial. We believe that the results from the proposed multiple imputation method are more reliable than those from the other two commonly used methods.
Affiliation(s)
- H Wu
- Statistical and Data Analysis Center, Harvard School of Public Health, Frontier Science & Technology Research Foundation, Inc., 1244 Boylston Street, Suite 303, Chestnut Hill, Massachusetts 02467, USA.
|
41
|
|