1
Wissler A, Blevins KE, Buikstra JE. Missing data in bioarchaeology II: A test of ordinal and continuous data imputation. American Journal of Biological Anthropology 2022; 179:349-364. [PMID: 36790608 PMCID: PMC9825894 DOI: 10.1002/ajpa.24614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Revised: 07/22/2022] [Accepted: 08/17/2022] [Indexed: 11/11/2022]
Abstract
OBJECTIVES Previous research has shown that while missing data are common in bioarchaeological studies, they are seldom handled using statistically rigorous methods. The primary objective of this article is to evaluate the ability of imputation to manage missing data and encourage the use of advanced statistical methods in bioarchaeology and paleopathology. An overview of missing data management in biological anthropology is provided, followed by a test of imputation and deletion methods for handling missing data. MATERIALS AND METHODS Missing data were simulated on complete datasets of ordinal (n = 287) and continuous (n = 369) bioarchaeological data. Missing values were imputed using five imputation methods (mean, predictive mean matching, random forest, expectation maximization, and stochastic regression) and the success of each at obtaining the parameters of the original dataset compared with pairwise and listwise deletion. RESULTS In all instances, listwise deletion was least successful at approximating the original parameters. Imputation of continuous data was more effective than ordinal data. Overall, no one method performed best and the amount of missing data proved a stronger predictor of imputation success. DISCUSSION These findings support the use of imputation methods over deletion for handling missing bioarchaeological and paleopathology data, especially when the data are continuous. Whereas deletion methods reduce sample size, imputation maintains sample size, improving statistical power and preventing bias from being introduced into the dataset.
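The contrast between deletion and imputation described in this abstract can be illustrated with a minimal sketch. It covers only listwise deletion and mean imputation, the simplest of the five imputation methods named above; the data values and function names are hypothetical and are not taken from the study.

```python
from statistics import mean


def listwise_delete(rows):
    """Keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_cols))
        for r in rows
    ]


# Hypothetical continuous measurements, two variables per individual.
data = [(10.0, 2.0), (12.0, None), (None, 3.0), (14.0, 4.0), (16.0, None)]

deleted = listwise_delete(data)  # only fully observed individuals remain
imputed = mean_impute(data)      # every individual is retained

print(len(deleted), len(imputed))
```

Listwise deletion discards every individual with any missing value, while imputation keeps the full sample. Mean imputation is shown only because it is short; it shrinks the variance of the imputed variable, which is one reason more elaborate methods exist.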
Affiliation(s)
- Amanda Wissler
- Department of Anthropology, University of South Carolina, Columbia, South Carolina, USA
- Jane E. Buikstra
- Center for Bioarchaeological Research, School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
2
The Contribution of JAK2 46/1 Haplotype in the Predisposition to Myeloproliferative Neoplasms. Int J Mol Sci 2022; 23:ijms232012582. [PMID: 36293440 PMCID: PMC9604447 DOI: 10.3390/ijms232012582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 10/13/2022] [Accepted: 10/15/2022] [Indexed: 11/17/2022]
Abstract
Haplotype 46/1 (GGCC) consists of a set of genetic variations distributed along chromosome 9p24.1, which extend from the Janus Kinase 2 gene to Insulin like 4. Marked by four jointly inherited variants (rs3780367, rs10974944, rs12343867, and rs1159782), this haplotype has a strong association with the development of BCR-ABL1-negative myeloproliferative neoplasms (MPNs) because it precedes the acquisition of the JAK2V617F variant, a common genetic alteration in individuals with these hematological malignancies. It is also described as one of the factors that increases the risk of familial MPNs by more than five times. In addition, 46/1 is associated with events related to inflammatory dysregulation, splenomegaly, splanchnic vein thrombosis, and Budd–Chiari syndrome; with increases in RBC count, platelets, leukocytes, hematocrit, and hemoglobin, which are characteristic of MPNs; and with other findings that are still being elucidated and are of great interest for the etiopathological understanding of these hematological neoplasms. Considering these factors, the present review describes the main findings and discussions involving the 46/1 haplotype and highlights its molecular and immunological aspects and their relevance as a tool for clinical practice and the investigation of familial cases.
3
Zhang W, Jin X, Wang Y, Chen C, Zhu B. Genetic structure analyses and ancestral information inference of Chinese Kyrgyz group via a panel of 39 AIM-DIPs. Genomics 2021; 113:2056-2064. [PMID: 33711452 DOI: 10.1016/j.ygeno.2021.03.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/19/2020] [Revised: 10/18/2020] [Accepted: 03/05/2021] [Indexed: 11/29/2022]
Abstract
Ancestry informative markers are widely used to infer ancestral origins and to estimate the ancestral genetic components of admixed populations. Given the extensive cultural exchange and admixed genetic structure of the Kyrgyz group, it is essential to enrich the genetic data available for this group. In this study, we used a self-developed ancestry informative marker-deletion/insertion polymorphism (AIM-DIP) panel to explore the ancestral components of the Chinese Kyrgyz group and the population genetic relationships between the Kyrgyz group and reference populations. Results showed that all AIM-DIP loci conformed to Hardy-Weinberg equilibrium. There were 36 AIM-DIP loci that contributed significantly to genetic information inference. Multiple statistical analyses revealed that the Chinese Kyrgyz group had a closer genetic relationship with the Chinese Uyghur group. The ancestral components of the Kyrgyz group, being mostly composed of genetic components of European and East Asian populations, were more similar to the ancestral components of the Chinese Uyghur group.
Affiliation(s)
- Wenqing Zhang
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China
- Xiaoye Jin
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; College of Forensic Medicine, Xi'an Jiaotong University Health Science Center, Xi'an 710061, China
- Yijie Wang
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China
- Chong Chen
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; College of Forensic Medicine, Xi'an Jiaotong University Health Science Center, Xi'an 710061, China
- Bofeng Zhu
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, Xi'an 710004, China; Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China.
4
Stern JA, White SN, Meurs KM. Extent of linkage disequilibrium in large-breed dogs: chromosomal and breed variation. Mamm Genome 2013; 24:409-15. [PMID: 24062056 DOI: 10.1007/s00335-013-9474-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 05/15/2013] [Accepted: 08/01/2013] [Indexed: 11/24/2022]
Abstract
The aim of this study was to better define the extent of linkage disequilibrium (LD) in populations of large-breed dogs and its variation by breed and chromosomal region. Understanding the extent of LD is a crucial component for successful utilization of genome-wide association studies and allows researchers to better define regions of interest and target candidate genes. Twenty-four Golden Retriever dogs, 28 Rottweiler dogs, and 24 Newfoundland dogs were genotyped for single-nucleotide polymorphism (SNP) data using a high-density SNP array. LD was calculated for all autosomes using Haploview. Decay of the squared correlation coefficient (r²) was plotted on a per-breed and per-chromosome basis as well as in a genome-wide fashion. The point of 50% decay of r² was used to estimate the difference in extent of LD between breeds. Extent of LD was significantly shorter for Newfoundland dogs based upon 50% decay of r² data at a mean of 344 kb compared to Golden Retriever and Rottweiler dogs at 715 and 834 kb, respectively (P < 0.0001). Notable differences in LD by chromosome were present within each breed and not strictly related to the length of the corresponding chromosome. Extent of LD is breed and chromosome dependent. To our knowledge, this is the first report of SNP-based LD for Newfoundland dogs, the first report based on genome-wide SNPs for Rottweilers, and an almost tenfold improvement in marker density over previous genome-wide studies of LD in Golden Retrievers.
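The squared correlation coefficient and its 50% decay point mentioned in this abstract can be sketched as follows. This is an illustration only: it assumes phased haplotypes coded as 0/1 alleles (the study itself used Haploview on genotype data), and it assumes "50% decay" means the first distance at which r² falls to half its value at the shortest distance, which the abstract does not spell out. All names and numbers are hypothetical.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1.
    Uses D = p_AB - p_A * p_B and r^2 = D^2 / (p_A q_A p_B q_B).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def half_decay_distance(curve):
    """First distance at which r^2 is at most half its starting value.

    curve: list of (distance_kb, mean_r2) pairs sorted by distance.
    Returns None if the curve never decays that far.
    """
    threshold = curve[0][1] / 2
    for distance_kb, r2 in curve:
        if r2 <= threshold:
            return distance_kb
    return None


# Hypothetical haplotypes: complete LD, then no LD.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

# Hypothetical decay curve (distance in kb, mean r^2 in that distance bin).
curve = [(10, 0.8), (100, 0.6), (400, 0.3), (800, 0.2)]

print(r_squared(linked), r_squared(unlinked), half_decay_distance(curve))
```

Comparing the half-decay distance across breeds or chromosomes, as the abstract describes, would amount to building one such curve per group.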
Affiliation(s)
- Joshua A Stern
- Department of Clinical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, 27607, USA
5
O'Seaghdha CM, Fox CS. Genome-wide association studies of chronic kidney disease: what have we learned? Nat Rev Nephrol 2011; 8:89-99. [PMID: 22143329 DOI: 10.1038/nrneph.2011.189] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 12/25/2022]
Abstract
The past 3 years have witnessed a dramatic expansion in our knowledge of the genetic determinants of estimated glomerular filtration rate (eGFR) and chronic kidney disease (CKD). However, heritability estimates of eGFR indicate that we have only identified a small proportion of the total heritable contribution to the phenotypic variation. The majority of associations reported from genome-wide association studies identify genomic regions of interest and further work will be required to identify the causal variants responsible for a specific phenotype. Progress in this area is likely to stem from the identification of novel risk genotypes, which will offer insight into the pathogenesis of disease and potential novel therapeutic targets. Follow-up studies stimulated by findings from genome-wide association studies of kidney disease are already yielding promising results, such as the identification of an association between urinary uromodulin levels and incident CKD. Although this work is at an early stage, prospects for progress in our understanding of CKD and its treatment look more promising now than at any point in the past.
Affiliation(s)
- Conall M O'Seaghdha
- National Heart, Lung and Blood Institute's Framingham Heart Study and the Center for Population Studies, 73 Mount Wayte Avenue, Suite 2, Framingham, MA 01702, USA
6
Smith B, Chen Z, Reimers L, van Doorslaer K, Schiffman M, DeSalle R, Herrero R, Yu K, Wacholder S, Wang T, Burk RD. Sequence imputation of HPV16 genomes for genetic association studies. PLoS One 2011; 6:e21375. [PMID: 21731721 PMCID: PMC3121793 DOI: 10.1371/journal.pone.0021375] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 01/21/2011] [Accepted: 05/30/2011] [Indexed: 12/24/2022]
Abstract
BACKGROUND Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. METHODS A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. RESULTS HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. CONCLUSIONS Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. 
The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16 pathogenicity.
Affiliation(s)
- Benjamin Smith
- Department of Pediatrics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Zigui Chen
- Department of Pediatrics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Laura Reimers
- Department of Obstetrics, Gynecology and Women's Health, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Koenraad van Doorslaer
- Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Mark Schiffman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Rob DeSalle
- Sackler Institute of Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Rolando Herrero
- Proyecto Epidemiológico Guanacaste, Fundación INCIENSA, San José, Costa Rica
- Kai Yu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Sholom Wacholder
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Tao Wang
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Robert D. Burk
- Department of Pediatrics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Obstetrics, Gynecology and Women's Health, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, United States of America
7
de Bakker PIW, Neale BM, Daly MJ. Meta-analysis of genome-wide association studies. Cold Spring Harb Protoc 2010; 2010:pdb.top81. [PMID: 20516189 DOI: 10.1101/pdb.top81] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
Individual genome-wide association studies have only limited power to find novel loci underlying complex traits and common diseases. With relatively modest sample and effect sizes, a true association between genotype and phenotype may never meet genome-wide statistical significance (P < 5 × 10⁻⁸) in a single study. Through meta-analysis, novel susceptibility loci can be discovered by effectively summing the statistical evidence of individually underpowered studies. Most genetic discoveries for complex traits are now made through meta-analysis collaborations, which so far have been restricted to single-locus analyses, testing for main effects at a single polymorphism at a time. A key benefit of this approach is that individual-level genotype (and phenotype) data do not need to be exchanged between research groups. In this article, we focus on meta-analysis at individual single-nucleotide polymorphisms (SNPs), paying particular attention to how imputation uncertainty can be incorporated into the association analysis and subsequent meta-analysis. Probably the most important aspect of genome-wide association meta-analysis is harmonization of the study results. As studies differ in design, sample collection, genotyping platforms, and association analysis methods, it is important that the association results (per SNP) of each study can be formatted, exchanged, and analyzed in such a way that the statistical evidence can be combined appropriately and that no valuable information is lost. Without minimizing the importance of having a clear phenotype definition (and corresponding measurements), we will assume that investigators representing the various studies have made sensible agreements about phenotype definitions, necessary sample exclusions, and appropriate covariate modeling.
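One common way to combine per-SNP evidence across studies is fixed-effect inverse-variance weighting. The abstract does not say which scheme the article uses, so the sketch below is an assumption for illustration; the effect sizes and standard errors are invented, and the function names are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # significance threshold quoted in the abstract


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (beta, standard_error) pairs for one SNP.

    Each study is weighted by 1 / se^2; returns (beta, se, p) for the pooled
    estimate.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, two_sided_p(beta / se)


# Hypothetical per-study results for a single SNP: (effect size, standard error).
studies = [(0.10, 0.03), (0.12, 0.03), (0.09, 0.03), (0.11, 0.03)]

single_study_p = [two_sided_p(b / se) for b, se in studies]
pooled_beta, pooled_se, pooled_p = inverse_variance_meta(studies)

print(single_study_p, pooled_beta, pooled_se, pooled_p)
```

In this invented example none of the four studies reaches the genome-wide threshold on its own, while the pooled estimate does, which is the situation the abstract describes. Only summary statistics are exchanged, not individual-level data.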