1
Liao SY, Tan YD. Sister haplotypes and recombination disequilibrium: a new approach to identify associations of haplotypes with complex diseases. Front Genet 2024; 14:1295327. [PMID: 38292437 PMCID: PMC10825010 DOI: 10.3389/fgene.2023.1295327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 12/13/2023] [Indexed: 02/01/2024] Open
Abstract
Haplotype-based association analysis has several advantages over single-SNP association analysis. However, no haplotype-disease association study to date has excluded recombination interference among multiple loci, so some results might be confounded by it. Here, an approach to testing the association of sister haplotypes with a complex disease, based on recombination disequilibrium (RD), is presented. Sister haplotypes can be determined by translating the notation of DNA base haplotypes into the notation of genetic genotypes, and they provide the haplotype pairs used for haplotype-disease association analysis. After RD tests are performed in the control and case cohorts, a two-by-two contingency table can be constructed from a sister haplotype pair and the case-control pair. With this standard two-by-two table, one can perform a classical chi-square test for a statistical haplotype-disease association. Applying this method to a haplotype dataset for Alzheimer disease (AD) identified an association of sister haplotypes containing ApoE3/4 with risk for AD in the absence of RD. Haplotypes within the IL-13 gene were not associated with risk for breast cancer in the absence of RD, and no association of haplotypes in the IL-17A gene with risk for coronary artery disease was detected without RD. The previously reported associations of haplotypes within these genes with risk for these diseases might be due to strong RD and/or inappropriate haplotype pairs.
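The two-by-two test mentioned in this abstract can be sketched as follows. This is a minimal illustration of a classical Pearson chi-square test on a table of two sister haplotypes by case/control status, not the authors' implementation; the function name `chi_square_2x2` and the counts in the usage lines are hypothetical, and the RD tests that precede this step in the abstract are not shown.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows are the two sister haplotypes, columns are cases and controls.
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: haplotype H1 in 10 cases / 20 controls, H2 in 20 / 10.
chi2, p = chi_square_2x2(10, 20, 20, 10)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With small expected cell counts an exact test would normally be preferred over this large-sample approximation.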
Affiliation(s)
- Shun-Yao Liao
- Institute of Gerontology, Center for Genetics, Sichuan Academy & Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Yuan-De Tan
- Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, United States
2
Khvorykh GV, Sapozhnikov NA, Limborska SA, Khrunin AV. Evaluation of Density-Based Spatial Clustering for Identifying Genomic Loci Associated with Ischemic Stroke in Genome-Wide Data. Int J Mol Sci 2023; 24:15355. [PMID: 37895035 PMCID: PMC10607504 DOI: 10.3390/ijms242015355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 09/19/2023] [Accepted: 09/28/2023] [Indexed: 10/29/2023] Open
Abstract
The genetic architecture of ischemic stroke (IS), which is one of the leading causes of death worldwide, is complex and underexplored. The traditional approach for associative gene mapping is genome-wide association studies (GWASs), testing individual single-nucleotide polymorphisms (SNPs) across the genomes of case and control groups. The purpose of this research is to develop an alternative approach in which groups of SNPs are examined rather than individual ones. We proposed, validated and applied to real data a new workflow consisting of three key stages: grouping SNPs in clusters, inferring the haplotypes in the clusters and testing haplotypes for the association with phenotype. To group SNPs, we applied the clustering algorithms DBSCAN and HDBSCAN to linkage disequilibrium (LD) matrices, representing pairwise r² values between all genotyped SNPs. These clustering algorithms have never before been applied to genotype data as part of the workflow of associative studies. In total, 883,908 SNPs and insertion/deletion polymorphisms from people of European ancestry (4929 cases and 652 controls) were processed. The subsequent testing for frequencies of haplotypes restored in the clusters of SNPs revealed dozens of genes associated with IS and suggested the complex role that protocadherin molecules play in IS. The developed workflow was validated with the use of a simulated dataset of similar ancestry and the same sample sizes. The results of classic GWASs are also provided and discussed. The considered clustering algorithms can be applied to genotypic data to identify the genomic loci associated with different qualitative traits, using the workflow presented in this research.
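As a rough illustration of the LD matrix described in this abstract, the sketch below computes pairwise r² as the squared Pearson correlation of genotype dosages (0, 1, 2), which is one common estimator for unphased data. The abstract does not say how the authors computed r², so the estimator, the function names `r_squared` and `ld_matrix`, and the toy genotypes are all assumptions. A matrix of 1 − r² could then serve as a precomputed distance for a density-based clustering routine; that step is not shown.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two SNP dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0 here
    return sxy * sxy / (sxx * syy)


def ld_matrix(genotypes):
    """Pairwise r^2 matrix; genotypes[i] holds the dosages of SNP i across samples."""
    m = len(genotypes)
    return [[r_squared(genotypes[i], genotypes[j]) for j in range(m)]
            for i in range(m)]


# Toy data: three SNPs typed in four samples (hypothetical values).
snps = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],  # identical to the first SNP, so r^2 = 1
    [0, 2, 0, 2],  # uncorrelated with the first SNP, so r^2 = 0
]
ld = ld_matrix(snps)
distance = [[1.0 - v for v in row] for row in ld]
```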
Affiliation(s)
- Andrey V. Khrunin
- National Research Centre “Kurchatov Institute”, Kurchatov Sq. 2, Moscow 123182, Russia; (G.V.K.); (N.A.S.); (S.A.L.)
3
Qureshi S, Hardy JJ, Pombar C, Berman AJ, Malcher A, Gingrich T, Hvasta R, Kuong J, Munyoki S, Hwang K, Orwig KE, Ahmed J, Olszewska M, Kurpisz M, Conrad DF, Jaseem Khan M, Yatsenko AN. Genomic study of TEX15 variants: prevalence and allelic heterogeneity in men with spermatogenic failure. Front Genet 2023; 14:1134849. [PMID: 37234866 PMCID: PMC10206016 DOI: 10.3389/fgene.2023.1134849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 04/12/2023] [Indexed: 05/28/2023] Open
Abstract
Introduction: Human spermatogenesis is a highly intricate process that requires the input of thousands of testis-specific genes. Defects in any of them at any stage of the process can have detrimental effects on sperm production and/or viability. In particular, the function of many meiotic proteins encoded by germ cell specific genes is critical for maturation of haploid spermatids and viable spermatozoa, necessary for fertilization, and is also extremely sensitive to even the slightest change in coding DNA. Methods: Here, using whole exome and genome approaches, we identified and reported novel, clinically significant variants in testis-expressed gene 15 (TEX15), in unrelated men with spermatogenic failure (SPGF). Results: TEX15 mediates double strand break repair during meiosis. Recessive loss-of-function (LOF) TEX15 mutations are associated with SPGF in humans and knockout male mice are infertile. We expand earlier reports documenting heterogeneous allelic pathogenic TEX15 variants that cause a range of SPGF phenotypes from oligozoospermia (low sperm) to nonobstructive azoospermia (no sperm) with meiotic arrest and report the prevalence of 0.6% of TEX15 variants in our patient cohort. Among identified possible LOF variants, one homozygous missense substitution c.6835G>A (p.Ala2279Thr) co-segregated with cryptozoospermia in a family with SPGF. Additionally, we observed numerous cases of inferred in trans compound heterozygous variants in TEX15 among unrelated individuals with varying degrees of SPGF. Variants included splice site, insertions/deletions (indels), and missense substitutions, many of which resulted in LOF effects (i.e., frameshift, premature stop, alternative splicing, or potentially altered posttranslational modification sites). Conclusion: In conclusion, we performed an extensive genomic study of familial and sporadic SPGF and identified potentially damaging TEX15 variants in 7 of 1097 individuals of our combined cohorts. 
We hypothesize that SPGF phenotype severity is dictated by individual TEX15 variant's impact on structure and function. Resultant LOFs likely have deleterious effects on crossover/recombination in meiosis. Our findings support the notion of increased gene variant frequency in SPGF and its genetic and allelic heterogeneity as it relates to complex disease such as male infertility.
Affiliation(s)
- Sidra Qureshi
- Department of Molecular Biology and Genetics, Institute of Basic Medical Sciences, Khyber Medical University, Peshawar, Pakistan
- Jimmaline J. Hardy
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Christopher Pombar
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Andrea J. Berman
- Department of Biological Sciences, Dietrich School of Arts and Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Agnieszka Malcher
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Tara Gingrich
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Rachel Hvasta
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Jannah Kuong
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Sarah Munyoki
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Kathleen Hwang
- Department of Urology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Kyle E. Orwig
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Jawad Ahmed
- Department of Molecular Biology and Genetics, Institute of Basic Medical Sciences, Khyber Medical University, Peshawar, Pakistan
- Marta Olszewska
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Maciej Kurpisz
- Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Donald F. Conrad
- Department of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR, United States
- Muhammad Jaseem Khan
- Department of Molecular Biology and Genetics, Institute of Basic Medical Sciences, Khyber Medical University, Peshawar, Pakistan
- Alexander N. Yatsenko
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Magee-Women’s Research Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
4
Baldrighi GN, Nova A, Bernardinelli L, Fazia T. A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. Life (Basel) 2022; 12:2030. [PMID: 36556394 PMCID: PMC9781110 DOI: 10.3390/life12122030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/09/2022]
Abstract
Genotype imputation has become an essential prerequisite when performing association analysis. It is a computational technique that allows us to infer genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which consequently has a crucial impact on the identification of causal variants. Many features need to be considered when choosing the proper algorithm for imputation, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems could arise when dealing with a target sample made up of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when related subjects belong to trios. By discussing the advantages and limitations of the current algorithms, we identified LD-based methods as being the most suitable for reconstruction of haplotypes in this specific context, and we proposed a feasible pipeline that can be used for imputing genotypes in both phased and unphased human data.
5
Que TN, Khanh NB, Khanh BQ, Van Son C, Van Anh NT, Anh TTT, Tung PD, Thang ND. Allele and Haplotype Frequencies of HLA-A, -B, -C, and -DRB1 Genes in 3,750 Cord Blood Units From a Kinh Vietnamese Population. Front Immunol 2022; 13:875283. [PMID: 35844516 PMCID: PMC9277059 DOI: 10.3389/fimmu.2022.875283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/15/2022] [Accepted: 05/30/2022] [Indexed: 11/13/2022] Open
Abstract
The frequencies and diversities of human leukocyte antigen (HLA) alleles and haplotypes are representative of ethnicities. Matching HLA alleles is essential for many clinical applications, including blood transfusion, stem cell transplantation, and tissue/organ transplantation. To date, information about the frequencies and distributions of HLA alleles and haplotypes among the Kinh Vietnamese population is limited because of small sample sizes. In this study, more than 3,750 cord blood units from individuals belonging to the Kinh Vietnamese population were genotyped using PCR sequence-specific oligonucleotide (PCR-SSO) for HLA testing. The results of the study demonstrated that the most frequently occurring HLA-A, -B, -C, and -DRB1 alleles were A*11:01 (25%), A*24:02 (12.3%), A*02:01 (11.2%), A*03:03 (8.95%), A*02:03 (7.81%), A*29:01 (7.03%); B*15:02 (15.1%), B*46:01 (10.7%), B*58:01 (7.65%), B*38:02 (7.29%); C*08:01 (17.2%), C*07:02 (16.2%), C*01:02 (15.2%), C*03:02 (8.3%), C*15:05 (6.13%); DRB1*12:02 (31.0%), DRB1*09:01 (10.47%), DRB1*15:02 (7.54%), DRB1*07:01 (6.68%), DRB1*10:01 (6.63%), respectively, with the highest allele diversity level observed in locus B (93 alleles). The most frequent two-locus haplotypes for HLA-A–B, HLA-A–C, HLA-A–DRB1, HLA-B–C, HLA-B–DRB1, and HLA-C–DRB1 were A*11:01–B*15:02 (7.63%), A*11:01–C*08:01 (7.98%), A*11:01–DRB1*12:02 (10.56%), B*15:02–C*08:01 (14.0%), B*15:02–DRB1*12:02 (10.47%), and C*08:01–DRB1*12:02 (11.38%), respectively. In addition, the most frequent haplotypes of three- and four-locus sets of HLA-A–B–C, HLA-A–B–DRB1, HLA-A–C–DRB1, HLA-B–C–DRB1, and HLA-A–B–C–DRB1 were A*11:01–B*15:02–C*08:01 (7.57%), A*11:01–B*15:02–DRB1*12:02 (5.39%), A*11:01–C*08:01–DRB1*12:02 (5.54%), B*15:02–C*08:01–DRB1*12:02 (10.21%), and A*11:01–B*15:02–C*08:01–DRB1*12:02 (5.45%), respectively.
This study provides critical information on the frequencies and distributions of HLA alleles and haplotypes in the Kinh Vietnamese population, which accounts for more than 85% of Vietnamese citizens. It paves the way to establish an umbilical cord blood bank for cord blood transplantation programs in Vietnam.
Affiliation(s)
- Tran Ngoc Que
- Stem Cell Bank, National Institute of Hematology and Blood Transfusion, Pham Van Bach, Cau Giay, Hanoi, Vietnam
- Department of Hematology, Hanoi Medical University, 1 Ton That Tung, Dong Da, Hanoi, Vietnam
- Nguyen Ba Khanh
- Stem Cell Bank, National Institute of Hematology and Blood Transfusion, Pham Van Bach, Cau Giay, Hanoi, Vietnam
- Department of Hematology, Hanoi Medical University, 1 Ton That Tung, Dong Da, Hanoi, Vietnam
- Bach Quoc Khanh
- Stem Cell Bank, National Institute of Hematology and Blood Transfusion, Pham Van Bach, Cau Giay, Hanoi, Vietnam
- Department of Hematology, Hanoi Medical University, 1 Ton That Tung, Dong Da, Hanoi, Vietnam
- Chu Van Son
- Key Laboratory of Enzyme and Protein Technology, VNU University of Science, Vietnam National University-Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
- Nguyen Thi Van Anh
- Key Laboratory of Enzyme and Protein Technology, VNU University of Science, Vietnam National University-Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
- Tran Thi Thuy Anh
- Faculty of Biology, VNU University of Science, Vietnam National University-Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
- Pham Dinh Tung
- Department of Probability and Statistics, Faculty of Mathematics–Mechanics–Informatics, VNU University of Science, Vietnam National University, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
- Nguyen Dinh Thang
- Faculty of Biology, VNU University of Science, Vietnam National University-Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
- *Correspondence: Nguyen Dinh Thang,
6
Islam MR, Naveed SA, Zhang Y, Li Z, Zhao X, Fiaz S, Zhang F, Wu Z, Hu Z, Fu B, Shi Y, Shah SM, Xu J, Wang W. Identification of Candidate Genes for Salinity and Anaerobic Tolerance at the Germination Stage in Rice by Genome-Wide Association Analyses. Front Genet 2022; 13:822516. [PMID: 35281797 PMCID: PMC8905349 DOI: 10.3389/fgene.2022.822516] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/25/2021] [Accepted: 01/03/2022] [Indexed: 11/29/2022] Open
Abstract
Multiple stress tolerance at the seed germination stage is crucial for better crop establishment in the direct-seeded rice ecosystem. Therefore, identifying rice genes/quantitative trait loci (QTLs) associated with salinity and anaerobic tolerance at the germination stage is a prerequisite for adaptive breeding. Here, we studied 498 highly diverse rice accessions of Xian (Indica) and Geng (Japonica) and measured six traits that are highly associated with salinity and anaerobic tolerance at the germination stage. A high-density 2.8M single nucleotide polymorphism (SNP) genotype map generated from the 3,000 Rice Genomes Project (3KRGP) was used for mapping through a genome-wide association study. In total, 99 loci harboring 117 QTLs were detected in different populations, of which 54, 21, and 42 were associated with anaerobic, salinity, and combined (anaerobic and salinity) stress tolerance, respectively. Nineteen QTLs were close to the reported loci for abiotic stress tolerance, whereas two regions on chromosome 4 (qSGr4a/qCL4c/qRI4d and qAGr4/qSGr4b) and one region on chromosome 10 (qRI10/qCL10/qSGr10b/qBM10) were associated with anaerobic and salinity related traits. Further haplotype analysis detected 25 promising candidate genes significantly associated with the target traits. Two known genes (OsMT2B and OsTPP7) significantly associated with grain yield and its related traits under saline and anaerobic stress conditions were identified. In this study, we identified genes involved in auxin efflux (Os09g0491740) and transportation (Os01g0976100), as well as the multistress response gene OsMT2B (Os01g0974200) and a major gene, OsTPP7 (Os09g0369400), involved in anaerobic germination and coleoptile elongation on chromosome 9. These promising candidates provide valuable resources for validating potential salt and anaerobic tolerance genes and will facilitate direct-seeded rice breeding for salt and anaerobic tolerance through marker-assisted selection or gene editing.
Affiliation(s)
- Mohammad Rafiqul Islam
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Shahzad Amir Naveed
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Yue Zhang
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhikang Li
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China; College of Agronomy, Anhui Agricultural University, Hefei, China; Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Xiuqin Zhao
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Sajid Fiaz
- Department of Plant Breeding and Genetics, The University of Haripur, Haripur, Pakistan
- Fan Zhang
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China; College of Agronomy, Anhui Agricultural University, Hefei, China
- Zhichao Wu
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhiqing Hu
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Binying Fu
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China
- Yingyao Shi
- College of Agronomy, Anhui Agricultural University, Hefei, China
- Shahid Masood Shah
- Department of Biotechnology, COMSATS University Islamabad-Abbottabad Campus, Abbottabad, Pakistan
- Jianlong Xu
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China; Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Wensheng Wang
- Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing, China; College of Agronomy, Anhui Agricultural University, Hefei, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, China
7
Swart C, Meldau S, Centner CM, Marais AD, Omar F. Validation of PHASE for deriving N-acetyltransferase 2 haplotypes in the Western Cape mixed ancestry population. Afr J Lab Med 2020; 9:988. [PMID: 33392048 PMCID: PMC7756977 DOI: 10.4102/ajlm.v9i1.988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2019] [Accepted: 09/14/2020] [Indexed: 11/17/2022] Open
Abstract
Background: There is a shortage of data on the accuracy of statistical methods for the prediction of N-acetyltransferase 2 (NAT2) haplotypes in the mixed ancestry population of the Western Cape. Objective: This study aimed to identify the NAT2 haplotypes and assess the accuracy of PHASE version 2.1.1 in assigning NAT2 haplotypes to a mixed ancestry population from the Western Cape. Methods: This study was conducted between 2013 and 2016. The NAT2 gene was amplified and sequenced from the DNA of 100 self-identified mixed ancestry participants. Haplotyping was performed by molecular and computational techniques. Agreement was assessed between the two techniques. Results: Haplotypes were assigned to 93 samples, of which 67 (72%) were ambiguous. Haplotype prediction by PHASE demonstrated 94.6% agreement (kappa 0.94, p < 0.001) with those assigned using molecular techniques. Five haplotype combinations (from 10 chromosomes) were incorrectly predicted, four of which were flagged as uncertain by the PHASE software. Only one resulted in the assignment of an incorrect acetylation phenotype (intermediate to slow), although the software flagged this for further analysis. The most common haplotypes were NAT2*4 (28%) followed by NAT2*5B (27.4%), NAT2*6A (21.5%) and NAT2*12A (7.5%). Four rare single nucleotide variants (c.589C>T, c.622T>C, c.809T>C and c.387C>T) were detected. Conclusion: PHASE accurately predicted the phenotype in 92 of 93 samples (99%) from genotypic data in our mixed ancestry sample population, and is therefore a suitable alternative to molecular methods to individualise isoniazid therapy in this high burden tuberculosis setting.
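The agreement statistic reported in this abstract can be illustrated with a short sketch. It assumes the kappa is Cohen's kappa between the molecular and computational haplotype calls, which the abstract does not state explicitly; the function name `cohens_kappa` and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of categorical calls."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples with identical calls.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal frequency of each label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods gave one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example: molecular versus computational calls for four samples.
molecular = ["NAT2*4", "NAT2*4", "NAT2*5B", "NAT2*5B"]
computed = ["NAT2*4", "NAT2*4", "NAT2*5B", "NAT2*4"]
kappa = cohens_kappa(molecular, computed)
```

Kappa discounts the agreement expected by chance, which is why it can differ from the raw percentage agreement.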
Affiliation(s)
- Celeste Swart
- Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa; National Health Laboratory Service (NHLS), Groote Schuur Hospital, Cape Town, South Africa
- Surita Meldau
- National Health Laboratory Service (NHLS), Groote Schuur Hospital, Cape Town, South Africa; Division of Chemical Pathology, University of Cape Town, Cape Town, South Africa
- Chad M Centner
- Division of Medical Microbiology, University of Cape Town, Cape Town, South Africa; National Health Laboratory Service (NHLS), Medical Microbiology, Groote Schuur Hospital, Cape Town, South Africa
- Adrian D Marais
- Division of Chemical Pathology, University of Cape Town, Cape Town, South Africa
- Fierdoz Omar
- Division of Chemical Pathology, University of Cape Town, Cape Town, South Africa
8
Jawdat D, Uyar FA, Alaskar A, Müller CR, Hajeer A. HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 Allele and Haplotype Frequencies of 28,927 Saudi Stem Cell Donors Typed by Next-Generation Sequencing. Front Immunol 2020; 11:544768. [PMID: 33193311 PMCID: PMC7643328 DOI: 10.3389/fimmu.2020.544768] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/22/2020] [Accepted: 08/18/2020] [Indexed: 11/21/2022] Open
Abstract
Human leukocyte antigen (HLA) allele and haplotype frequency distribution varies widely between different ethnicities and geographical areas. Matching for HLA alleles is essential for successful related and unrelated stem cell transplantation. Among the Saudi population, data on HLA alleles and haplotypes are limited. A cross-sectional study was performed on 28,927 bone marrow donors. The most frequent HLA alleles were HLA-A*02:01:01G (20.2%), A*24:02:01G (7.5%); B*51:01:01G (19.0%), B*50:01:01G (12.3%); C*06:02:01G (16.7%), C*07:02:01G (12.2%); DRB1*07:01:01 (15.7%), DRB1*03:01:01G (13.3%); DQB1*02:01:01G (29.9%), DQB1*03:02:01G (13.2%); and DPB1*04:01:01G (35.2%), DPB1*02:01:02G (21.8%). The most frequent HLA-A~C~B~DRB1~DQB1 haplotypes were A*02:01:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G (1.9%) and A*02:05:01G~C*06:02:01G~B*50:01:01G~DRB1*07:01:01G~DQB1*02:01:01G (1.6%). The most frequent HLA-A~C~B~DRB1~DQB1~DPB1 haplotypes were A*02:01:01G~C*15:02:01G~B*51:01:01G~DRB1*04:02~DQB1*03:02:01G~DPB1*04:01:01G (1%) and A*02:01:01G~C*07:02:01G~B*07:02:01G~DRB1*15:01:01G~DQB1*06:02:01G~DPB1*04:01:01G (0.9%). Based on these haplotype frequencies, we provide forecasts for the fraction of patients with fully matched and single-mismatched donors for 3 to 6 loci depending on the registry size. With one million donors, about 50% of the patients would find an 8/8 match and 90% a 7/8 match. These data are essential for registry planning, finding unrelated stem cell donors, population genetic studies, and HLA disease associations.
Affiliation(s)
- Dunia Jawdat
- Saudi Stem Cells Donor Registry, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
- F. Aytül Uyar
- Department of Physiology, Istanbul Medical Faculty, Istanbul University, Istanbul, Turkey
- Ahmed Alaskar
- Department of Oncology, King Abdulaziz Medical City - Ministry of National Guard Health Affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Carlheinz R. Müller
- ZKRD Zentrales Knochenmarkspender–Register für die Bundesrepublik Deutschland, Ulm, Germany
- Ali Hajeer
- Department of Pathology and Laboratory Medicine, King Abdulaziz Medical City - Ministry of National Guard Health Affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
9
Abstract
The human dopamine transporter gene SLC6A3 is involved in substance use disorders (SUDs), among many other common neuropsychiatric illnesses, but allelic association results, including those with its classic genetic markers 3'VNTR or Int8VNTR, remain mixed and unexplained. To better understand the genetics behind reproducible association signals, we report the presence of recombination hotspots based on sequencing of the entire 5' promoter regions in two small SUD cohorts, 30 African Americans (AAs) and 30 European Americans (EAs). The recombination rate was highest near the transcription start site (TSS) in both cohorts. In addition, each cohort carried 57 different promoter haplotypes out of 60, and no haplotypes were shared between the two ethnicities. A quarter of the haplotypes evolved in an ethnicity-specific manner. Finally, analysis of five hundred subjects of European ancestry from the 1000 Genomes Project confirmed the promoter recombination hotspots and also revealed several additional ones, in non-coding regions only. These findings provide an explanation for the mixed results as well as guidance for selection of effective markers to be used in next generation association validation (NGAV), facilitating the delineation of pathogenic variation in this critical neuropsychiatric gene.
10
Mocci E, Debeljak M, Klein AP, Eshleman JR. A New Fast Phasing Method Based On Haplotype Subtraction. J Mol Diagn 2019; 21:427-436. [PMID: 30872187 PMCID: PMC6504677 DOI: 10.1016/j.jmoldx.2018.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2018] [Revised: 10/26/2018] [Accepted: 12/31/2018] [Indexed: 11/16/2022] Open
Abstract
We developed a novel phasing approach, based solely on molecules and genotype frequency, that does not rely on inference of new alleles. We initiated the project because of errors that were detected in the phased 1000 Genomes Project data. The algorithm first combined identical genotypes into clusters and ranked them by descending frequency. Using alleles defined in homozygotes, it combined them to produce expected genotypes, which were dismissed, and subtracted the known alleles from the remaining genotypes to define additional putative alleles. Putative alleles had to be confirmed by identifying them in independent genotypes, and the process was iterated until all alleles were identified. The new approach was validated using single-molecule sequencing of eight loci; 145 alleles (8 to 35 per locus) were identified, and an average of 98.2% (range, 95.0% to 99.9%) of 1000 Genomes individuals at these loci were explained. The accuracy of the new method and of PHASE and SHAPEIT2 was assessed against the experimentally determined genotypes based on single-molecule sequencing. Our method was comparable to PHASE and SHAPEIT2 in accuracy but was, on average, 14.6- and 10.8-fold faster, respectively.
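The subtraction loop outlined in this abstract can be sketched loosely as follows. This is a simplified reading, not the published algorithm: it assumes biallelic sites coded as alternate-allele counts (0, 1, 2), seeds the known haplotypes from homozygous genotypes, and accepts a putative haplotype only when it is obtained by subtraction from at least `min_support` distinct genotypes. The function name, the `min_support` parameter, and the toy data are hypothetical, and the frequency ranking of clusters is omitted.

```python
from collections import Counter


def phase_by_subtraction(genotypes, min_support=2):
    """Return (known haplotypes, genotypes left unexplained)."""
    clusters = Counter(genotypes)  # pool identical genotypes
    # Seed: a genotype with only 0s and 2s carries two copies of one haplotype.
    known = {tuple(c // 2 for c in g)
             for g in clusters if all(c in (0, 2) for c in g)}
    while True:
        # Genotypes that a pair of known haplotypes already accounts for.
        explained = {tuple(a + b for a, b in zip(h1, h2))
                     for h1 in known for h2 in known}
        support = {}
        for g in clusters:
            if g in explained:
                continue
            for h in known:
                rest = tuple(c - a for c, a in zip(g, h))
                if all(c in (0, 1) for c in rest):  # a valid haplotype remains
                    support.setdefault(rest, set()).add(g)
        # Keep only candidates seen in enough independent genotypes.
        confirmed = {h for h, gs in support.items() if len(gs) >= min_support}
        if not confirmed - known:
            break
        known |= confirmed
    unexplained = [g for g in clusters if g not in explained]
    return known, unexplained


# Toy data: two sites, six individuals (hypothetical).
genos = [(0, 0), (0, 0), (2, 2), (1, 1), (1, 0), (2, 1)]
haps, leftover = phase_by_subtraction(genos)
```

In the toy data, haplotype (1, 0) is never seen in a homozygote but is recovered by subtracting (0, 0) from (1, 0) and (1, 1) from (2, 1).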
Affiliation(s)
- Evelina Mocci
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center
| | - Marija Debeljak
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Alison P Klein
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center; Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - James R Eshleman
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center; Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, Maryland.
| |
Collapse
|
11
|
Dimou NL, Pantavou KG, Braliou GG, Bagos PG. Multivariate Methods for Meta-Analysis of Genetic Association Studies. Methods Mol Biol 2019; 1793:157-182. [PMID: 29876897 DOI: 10.1007/978-1-4939-7868-7_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/02/2023]
Abstract
Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention because it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we briefly present univariate methods for meta-analysis and then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
Affiliation(s)
- Niki L Dimou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
- Katerina G Pantavou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Georgia G Braliou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
|
12
|
A meta-analysis of associations of LEPR Q223R and K109R polymorphisms with Type 2 diabetes risk. PLoS One 2018; 13:e0189366. [PMID: 29293570 PMCID: PMC5749718 DOI: 10.1371/journal.pone.0189366] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/14/2016] [Accepted: 11/26/2017] [Indexed: 12/17/2022] Open
Abstract
Background Leptin receptor (LEPR) plays a pivotal role in the control of body weight, energy metabolism, and insulin sensitivity. Various genetic association studies were performed to evaluate associations of LEPR genetic variants with type 2 diabetes (T2D) susceptibility. Methods A comprehensive search was conducted to identify all eligible case-control studies for examining the associations of LEPR single nucleotide polymorphisms (SNPs) Q223R (rs1137101) and K109R (rs1137100) with T2D risk. Odds ratios (OR) and corresponding 95% confidence intervals (CIs) were used to measure the magnitudes of association. Results For Q223R, 13 studies (11 articles) consisting of a total of 4030 cases and 2844 controls, and for K109R 7 studies (7 articles) consisting of 3319 cases and 2465 controls were available. Under an allele model, Q223R was not significantly associated with T2D risk (OR = 1.09, 95% CI: 0.80–1.48, P-value = 0.5989), which was consistent with results obtained under four genotypic models (ranges: ORs 1.08–1.20, 95% CIs: 0.58–2.02 to 0.64–2.26; P-values, 0.3650–0.8177, which all exceeded multiplicity-adjusted α = 0.05/5 = 0.01). In addition, no significant association was found between K109R and T2D risk based on either an allele model (OR = 0.93, 95% CI: 0.85–1.03, P-value = 0.1868) or four genotypic models (ranges: ORs 0.81–0.99, 95% CIs: 0.67–0.86 to 0.97–1.26, P-values, 0.0207–0.8804 which all exceeded multiplicity-adjusted α of 0.01). The magnitudes of association for these two SNPs were not dramatically changed in subgroup analyses by ethnicity or sensitivity analyses. Funnel plot inspections as well as Begg and Mazumdar adjusted rank correlation test and Egger linear regression test did not reveal significant publication biases in main and subgroup analyses. Bioinformatics analysis predicted that both missense SNPs were functionally neutral and benign. 
Conclusions The present meta-analysis did not detect significant genetic associations between LEPR Q223R and K109R polymorphisms and T2D risk.
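The multiplicity adjustment quoted in this abstract (α = 0.05/5 = 0.01) is a Bonferroni-style threshold, and the arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the P-value lists hold only the values printed in the abstract (the allele-model value and the two ends of each genotypic-model range), since the intermediate values are not reported.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after dividing alpha by the test count."""
    return alpha / n_tests


def any_significant(p_values, alpha, n_tests):
    """True if any P-value falls below the multiplicity-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, n_tests)
    return any(p < threshold for p in p_values)


# One allele model plus four genotypic models gives five tests per SNP.
alpha_adj = bonferroni_alpha(0.05, 5)

# Only the P-values printed in the abstract; range endpoints, not all values.
q223r_reported = [0.5989, 0.3650, 0.8177]
k109r_reported = [0.1868, 0.0207, 0.8804]

q223r_hit = any_significant(q223r_reported, 0.05, 5)
k109r_hit = any_significant(k109r_reported, 0.05, 5)
```

The smallest K109R value, 0.0207, would pass an unadjusted 0.05 cut-off but not the adjusted 0.01 threshold, which is why the abstract reports no significant association.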
|
13
|
Amara A, Mrad M, Sayeh A, Haggui A, Lahideb D, Fekih-Mrissa N, Haouala H, Nsiri B. Association of FV G1691A Polymorphism but not A4070G With Coronary Artery Disease. Clin Appl Thromb Hemost 2017; 24:330-337. [PMID: 29179580 PMCID: PMC6714679 DOI: 10.1177/1076029617744320] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022] Open
Abstract
Coronary artery disease (CAD) is one of the chief causes of death in the world. Several hypotheses have been proposed for the origin of the disease, among which are genetic predispositions and/or environmental factors. The aim of this study was to determine the effect of factor V (FV) gene polymorphisms (Leiden, G1691A [FVL] and HR2 A4070G) and to analyze their association with traditional risk factors in assessing the risk of CAD. Our study population included 200 Tunisian patients with symptomatic CAD and a control group of 300 participants matched for age and sex. All participants were genotyped for the FVL and HR2 polymorphisms. Multivariate logistic regression was applied to analyze independent factors associated with the risk of CAD. Our analysis showed that the FVL A allele frequency (P < 10⁻³, odds ratio [OR] = 2.81, 95% confidence interval [CI] = 1.6-4.9) and GA genotype (P < 10⁻³, OR = 4.03, 95% CI = 2.1-7.6) are significantly more prevalent among patients with CAD compared with controls and may predispose to CAD. We further found that the FVL mutation is an independent risk factor whose effect is not modified by other factors (smoking, diabetes, hypertension, dyslipidemia, and a family history of CAD) in increasing the risk of the disease. However, analysis of FV HR2 variation does not show any statistically significant association with CAD. The FVL polymorphism may be an independent risk factor for CAD. However, further investigations on these polymorphisms and their possible synergisms with traditional risk factors for CAD could help to ascertain better predictability for CAD susceptibility.
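Odds ratios with 95% confidence intervals of the kind reported in this abstract are commonly computed from a two-by-two table of carrier counts. The Python sketch below uses the standard log-odds (Woolf) interval. The function name and the counts are hypothetical and are not the study's data, and the abstract does not state which interval method the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: cases carrying the variant      b: cases not carrying it
    c: controls carrying the variant   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to illustrate the calculation.
or_est, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

With these made-up counts the point estimate is 2.25, but the interval still spans 1, so an elevated odds ratio alone does not establish an association.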
Affiliation(s)
- Ahmed Amara
- Hôpital Militaire de Tunis, Service d'Hématologie, Laboratoire de Biologie Moléculaire, Montfleury, Tunisie; Université Tunis el Manar, Faculté des Sciences de Tunis, Tunisie
- Meriem Mrad
- Hôpital Militaire de Tunis, Service d'Hématologie, Laboratoire de Biologie Moléculaire, Montfleury, Tunisie; Université Tunis el Manar, Faculté des Sciences de Tunis, Tunisie
- Aicha Sayeh
- Hôpital Militaire de Tunis, Service d'Hématologie, Laboratoire de Biologie Moléculaire, Montfleury, Tunisie; Université Tunis el Manar, Faculté des Sciences de Tunis, Tunisie
- Abdeddayem Haggui
- Hôpital Militaire de Tunis, Service de Cardiologie, Montfleury, Tunisie; Université de Tunis El Manar, Faculté de Médecine de Tunis, Tunisie
- Dhaker Lahideb
- Hôpital Militaire de Tunis, Service de Cardiologie, Montfleury, Tunisie; Université de Tunis El Manar, Faculté de Médecine de Tunis, Tunisie
- Najiba Fekih-Mrissa
- Hôpital Militaire de Tunis, Service d'Hématologie, Laboratoire de Biologie Moléculaire, Montfleury, Tunisie; Académie Militaire Fondouk Jédid, Nabeul, Tunisie
- Habib Haouala
- Hôpital Militaire de Tunis, Service de Cardiologie, Montfleury, Tunisie; Université de Tunis El Manar, Faculté de Médecine de Tunis, Tunisie
- Brahim Nsiri
- Hôpital Militaire de Tunis, Service d'Hématologie, Laboratoire de Biologie Moléculaire, Montfleury, Tunisie; Université de Monastir, Faculté de Pharmacie, Monastir, Tunisie
|
14
|
Louzoun Y, Alter I, Gragert L, Albrecht M, Maiers M. Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection. Immunogenetics 2017; 70:279-292. [DOI: 10.1007/s00251-017-1040-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/11/2017] [Accepted: 10/23/2017] [Indexed: 11/24/2022]
|
15
|
Song J, Yang Y, Mauvais-Jarvis F, Wang YP, Niu T. KCNJ11, ABCC8 and TCF7L2 polymorphisms and the response to sulfonylurea treatment in patients with type 2 diabetes: a bioinformatics assessment. BMC Medical Genetics 2017; 18:64. [PMID: 28587604 PMCID: PMC5461698 DOI: 10.1186/s12881-017-0422-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/16/2016] [Accepted: 05/11/2017] [Indexed: 12/14/2022]
Abstract
BACKGROUND Type 2 diabetes (T2D) is a worldwide epidemic with considerable health and economic consequences. Sulfonylureas are widely used drugs for the treatment of patients with T2D. KCNJ11 and ABCC8 encode the Kir6.2 (pore-forming subunit) and SUR1 (regulatory subunit that binds to sulfonylurea) of pancreatic β cell KATP channel respectively with a critical role in insulin secretion and glucose homeostasis. TCF7L2 encodes a transcription factor expressed in pancreatic β cells that regulates insulin production and processing. Because mutations of these genes could affect insulin secretion stimulated by sulfonylureas, the aim of this study is to assess associations between molecular variants of KCNJ11, ABCC8 and TCF7L2 genes and response to sulfonylurea treatment and to predict their potential functional effects. METHODS Based on a comprehensive literature search, we found 13 pharmacogenetic studies showing that single nucleotide polymorphisms (SNPs) located in KCNJ11: rs5219 (E23K), ABCC8: rs757110 (A1369S), rs1799854 (intron 15, exon 16 -3C/T), rs1799859 (R1273R), and TCF7L2: rs7903146 (intron 4) were significantly associated with responses to sulfonylureas. For in silico bioinformatics analysis, SIFT, PolyPhen-2, PANTHER, MutPred, and SNPs3D were applied for functional predictions of 36 coding (KCNJ11: 10, ABCC8: 24, and TCF7L2: 2; all are missense), and HaploReg v4.1, RegulomeDB, and Ensembl's VEP were used to predict functions of 7 non-coding (KCNJ11: 1, ABCC8: 1, and TCF7L2: 5) SNPs, respectively. RESULTS Based on various in silico tools, 8 KCNJ11 missense SNPs, 23 ABCC8 missense SNPs, and 2 TCF7L2 missense SNPs could affect protein functions. Of them, previous studies showed that mutant alleles of 4 KCNJ11 missense SNPs and 5 ABCC8 missense SNPs can be successfully rescued by sulfonylurea treatments. 
Further, 3 TCF7L2 non-coding SNPs (rs7903146, rs11196205 and rs12255372), can change motif(s) based on HaploReg v4.1 and are predicted as risk factors by Ensembl's VEP. CONCLUSIONS Our study indicates that a personalized medicine approach by tailoring sulfonylurea therapy of T2D patients according to their genotypes of KCNJ11, ABCC8, and TCF7L2 could attain an optimal treatment efficacy.
Affiliation(s)
- Jingwen Song
- Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70112 USA
- Yunzhong Yang
- Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70112 USA
- Franck Mauvais-Jarvis
- Division of Endocrinology and Metabolism, Department of Medicine, Tulane University Health Sciences Center, New Orleans, LA 70112 USA
- Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University School of Science and Engineering, New Orleans, LA 70118 USA
- Tianhua Niu
- Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70112 USA
|
16
|
Walsh SJ. IL-10 Gene Polymorphisms and Self-Medication With Over-the-Counter Nonsteroidal Anti-Inflammatory Drugs. Biol Res Nurs 2017; 19:329-338. [DOI: 10.1177/1099800417690253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: Genetic influences on self-medication with over-the-counter (OTC) drugs merit investigation. For example, patients frequently use OTC nonsteroidal anti-inflammatory drugs (NSAIDs) to treat inflammation, but the inflammatory response is also affected by endogenous cytokines whose production varies across polymorphisms of their encoding genes. In the case of interleukin 10 (IL-10), literature suggests that a single nucleotide polymorphism (SNP) in the promoter region of the cytokine’s gene (-1082 A > G [rs1800896]) influences production with higher levels associated with G variant alleles. Objective: To demonstrate the feasibility of researching the role of genetics in self-medication by using existing national survey data to evaluate a possible association between OTC NSAID use and genotype at the -1082 SNP of the IL-10 gene. Methods: Statistical analyses were performed using data from 6,309 participants in the Third National Health and Nutrition Examination Survey (NHANES III). Results: OTC NSAID use (aspirin or ibuprofen) in the previous month was significantly more common among persons with AG or GG genotypes at the -1082 SNP. Odds of use consistently increased relative to the number of G alleles. This trend remained statistically significant (odds ratio = 1.14 per additional G allele, p = .02, 95% confidence interval [1.02, 1.27]) after adjustment for confounding. Conclusions: Analysis of population-based genetic data suggests an association between a common self-medication behavior and a specific genetic polymorphism. These findings broadly demonstrate that NHANES data provide opportunities to investigate such associations and specifically imply that potential interrelationships among OTC NSAID use, IL-10 genotype, and IL-10 cytokine levels deserve further study.
|
17
|
Liao B, Wang X, Zhu W, Li X, Cai L, Chen H. New multilocus linkage disequilibrium measure for tag SNP selection. J Bioinform Comput Biol 2017; 15:1750001. [DOI: 10.1142/s0219720017500019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Numerous approaches have been proposed for selecting an optimal tag single-nucleotide polymorphism (SNP) set. Most of these approaches are based on linkage disequilibrium (LD). Classical LD measures, such as D′ and r², are frequently used to quantify pairwise linkage disequilibrium between two markers. Despite their successful use in many applications, these measures cannot be used to measure the LD among multiple markers. They also need information about the frequencies of alleles collected from a haplotype dataset. In this study, a clustering algorithm is proposed to cluster SNPs according to a multilocus LD measure that is based on information theory. After that, tag SNPs are selected in each cluster, optimized by criteria such as the number of tag SNPs and prediction accuracy. The experimental results show that this new LD measure can be directly applied to genotype datasets collected from the HapMap project, so that it saves the cost of haplotyping. More importantly, the proposed method significantly improves the efficiency and prediction accuracy of tag SNP selection.
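For reference, the classical pairwise measures named in this abstract can be computed from the four two-locus haplotype frequencies using the standard textbook definitions. The Python sketch below shows those definitions only, not the multilocus information-theoretic measure proposed in the paper. The function name `pairwise_ld` and the example frequencies are illustrative assumptions.

```python
def pairwise_ld(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) for two biallelic loci.

    Inputs are the frequencies of the four haplotypes AB, Ab, aB and ab,
    which must sum to 1. Both loci must be polymorphic.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # The largest |D| attainable for these allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Illustrative frequencies, not data from the paper.
complete_ld = pairwise_ld(0.5, 0.0, 0.0, 0.5)
no_ld = pairwise_ld(0.25, 0.25, 0.25, 0.25)
partial_ld = pairwise_ld(0.4, 0.1, 0.1, 0.4)
```

Because the inputs are haplotype frequencies, the sketch also illustrates the limitation the abstract points out: these pairwise measures presuppose phased (haplotype) data.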
Affiliation(s)
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Xiangjun Wang
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Xiong Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Haowen Chen
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
|
18
|
Luo G, Zhou Y, Yi W, Yi H. Expression levels of JNK associated with polymorphic lactotransferrin haplotypes in human nasopharyngeal carcinoma. Oncol Lett 2016; 12:1085-1094. [PMID: 27446399 DOI: 10.3892/ol.2016.4723] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/05/2015] [Accepted: 02/12/2016] [Indexed: 12/19/2022] Open
Abstract
Lactotransferrin (LTF), a member of the transferrin family, serves a role in the innate immune response and is involved in anti-inflammatory, anti-microbial and anti-tumor activity. Alterations in the LTF gene are associated with an increased incidence of cancer. The LTF gene is polymorphic, and several common alleles may be observed in the general population. Our previous study identified a lower rate of occurrence of the 'A-G-G-T' haplotype (constructed with rs1126477, rs1126478, rs2073495 and rs9110) in nasopharyngeal carcinoma (NPC) patients compared with controls. In the present study, in order to elucidate a possible mechanism of LTF-mediated anti-tumor activity in NPC, the protein profiles of NPC and non-tumorous nasopharyngeal epithelium tissues with/without the 'A-G-G-T' haplotype were constructed using LTQ Orbitrap technology. The results revealed that c-Jun N-terminal kinase 2 (JNK2) was highly expressed in NPC tissues and non-tumor nasopharyngeal epithelium tissues without the 'A-G-G-T' haplotype. These results were confirmed by western blot analysis. Furthermore, microRNA (miRNA) microarray analysis was conducted to investigate the differential miRNA profiles of NPC and non-tumor nasopharyngeal epithelium tissues with/without the 'A-G-G-T' haplotype. It was observed that hsa-miR-1256 and hsa-miR-659, which are potentially targeted to the JNK2 gene, were downregulated in NPC tissues without the 'A-G-G-T' haplotype. Hsa-miR-298, another miRNA potentially targeted to the JNK2 gene, was downregulated in non-tumor nasopharyngeal epithelium tissues without the 'A-G-G-T' haplotype. In summary, these results suggested that the expression levels of JNK2 may be associated with polymorphic LTF haplotypes in human NPC.
Affiliation(s)
- Gengqiu Luo
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, P.R. China
- Yanhong Zhou
- Molecular Genetics Laboratory, Cancer Research Institute, Central South University, Changsha, Hunan 410078, P.R. China
- Wei Yi
- Molecular Genetics Laboratory, Cancer Research Institute, Central South University, Changsha, Hunan 410078, P.R. China
- Hong Yi
- Research Center of Carcinogenesis and Targeted Therapy, Xiangya Hospital, Central South University, Changsha, Hunan 410008, P.R. China
|
19
|
Cao CC, Sun X. Ehapp2: Estimate haplotype frequencies from pooled sequencing data with prior database information. J Bioinform Comput Biol 2016; 14:1650017. [PMID: 27216711 DOI: 10.1142/s0219720016500177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
To reduce the cost of large-scale re-sequencing, multiple individuals can be pooled together and sequenced, an approach called pooled sequencing. Pooled sequencing could provide a cost-effective alternative to sequencing individuals separately. To facilitate the application of pooled sequencing in haplotype-based disease association analysis, the critical procedure is to accurately estimate haplotype frequencies from pooled samples. Here we present Ehapp2 for estimating haplotype frequencies from pooled sequencing data by utilizing a database which provides prior information on known haplotypes. We first translate the problem of estimating the frequency of each haplotype into finding a sparse solution for a system of linear equations, where the NNREG algorithm is employed to achieve the solution. Simulation experiments reveal that Ehapp2 is robust to sequencing errors and able to estimate the frequencies of haplotypes with less than 3% average relative difference for pooled sequencing of a mixture of real Drosophila haplotypes with 50× total coverage, even when the sequencing error rate is as high as 0.05. Owing to the strategy that proportions for local haplotypes spanning multiple SNPs are accurately calculated first, Ehapp2 retains excellent estimation for recombinant haplotypes resulting from chromosomal crossover. Comparisons with present methods reveal that Ehapp2 is state-of-the-art for many sequencing study designs and more suitable for current massively parallel sequencing.
Affiliation(s)
- Chang-Chang Cao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
|
20
|
|
21
|
Rhee JK, Li H, Joung JG, Hwang KB, Zhang BT, Shin SY. Survey of computational haplotype determination methods for single individual. Genes Genomics 2015. [DOI: 10.1007/s13258-015-0342-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/22/2022]
|
22
|
Comparison of high-resolution human leukocyte antigen haplotype frequencies in different ethnic groups: Consequences of sampling fluctuation and haplotype frequency distribution tail truncation. Hum Immunol 2015; 76:374-80. [PMID: 25637668 DOI: 10.1016/j.humimm.2015.01.029] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/06/2014] [Revised: 01/16/2015] [Accepted: 01/21/2015] [Indexed: 11/23/2022]
Abstract
High-resolution haplotype frequency estimations and descriptive metrics are becoming increasingly popular for accurately describing human leukocyte antigen diversity. In this study, we compared sample sets of publicly available haplotype frequencies from different populations to characterize the consequences of unequal sample size on haplotype frequency estimation. We found that for low sample sizes (a few thousand), haplotype frequencies were overestimated, affecting all descriptive metrics of the underlying distribution, such as the most frequent haplotype, the number of haplotypes, and the mean/median frequency. This overestimation was a result of random sample fluctuation and truncation of the tail end of the frequency distribution that comprises the least frequent haplotypes. Finally, we simulated balanced datasets through resampling and contrasted the disparities of descriptive metrics among equal and unequal datasets. This simulation resulted in the global description of the most frequent human leukocyte antigen haplotypes worldwide.
|
23
|
Wu Y, Fan H, Wang Y, Zhang L, Gao X, Chen Y, Li J, Ren H, Gao H. Genome-wide association studies using haplotypes and individual SNPs in Simmental cattle. PLoS One 2014; 9:e109330. [PMID: 25330174 PMCID: PMC4203724 DOI: 10.1371/journal.pone.0109330] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 04/25/2014] [Accepted: 09/10/2014] [Indexed: 01/05/2023] Open
Abstract
Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components.
Affiliation(s)
- Yang Wu
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Huizhong Fan
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Yanhui Wang
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Lupei Zhang
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Xue Gao
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Yan Chen
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- Junya Li
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- HongYan Ren
- Department of Life Sciences, National Natural Science Foundation of China, Beijing, China
- * E-mail: (HG); (HR)
- Huijiang Gao
- Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, China
- * E-mail: (HG); (HR)
|
24
|
Cao CC, Sun X. Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing. Bioinformatics 2014; 31:515-22. [PMID: 25304780 DOI: 10.1093/bioinformatics/btu670] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION A variety of hypotheses have been proposed for finding the missing heritability of complex diseases in genome-wide association studies. Studies have focused on the value of haplotype to improve the power of detecting associations with disease. To facilitate haplotype-based association analysis, it is necessary to accurately estimate haplotype frequencies of pooled samples. RESULTS Taking advantage of databases that contain prior haplotypes, we present Ehapp based on the algorithm for solving the system of linear equations to estimate the frequencies of haplotypes from pooled sequencing data. Effects of various factors in sequencing on the performance are evaluated using simulated data. Our method could estimate the frequencies of haplotypes with only about 3% average relative difference for pooled sequencing of the mixture of 10 haplotypes with total coverage of 50×. When unknown haplotypes exist, our method maintains excellent performance for haplotypes with actual frequencies >0.05. Comparisons with present method on simulated data in conjunction with publicly available Illumina sequencing data indicate that our method is state of the art for many sequencing study designs. We also demonstrate the feasibility of applying overlapping pool sequencing to identify rare haplotype carriers cost-effectively. AVAILABILITY AND IMPLEMENTATION Ehapp (in Perl) for the Linux platforms is available online (http://bioinfo.seu.edu.cn/Ehapp/). CONTACT xsun@seu.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chang-Chang Cao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
|
25
|
Yun Y, Ma C, Ma X. The SNP rs1883832 in CD40 gene and risk of atherosclerosis in Chinese population: a meta-analysis. PLoS One 2014; 9:e97289. [PMID: 24828072 PMCID: PMC4020827 DOI: 10.1371/journal.pone.0097289] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 09/10/2013] [Accepted: 04/18/2014] [Indexed: 01/11/2023] Open
Abstract
Background The complications of atherosclerosis, such as coronary and cerebrovascular disease, are the most prevalent causes of mortality and morbidity worldwide. A single nucleotide polymorphism (SNP) rs1883832 (-1C/T) in CD40 gene has been recently suggested to contribute to the susceptibility to atherosclerosis in Chinese population; however, previous genetic association studies yielded inconsistent results. Methods A meta-analysis of eligible studies reporting the association between rs1883832 and atherosclerosis in Chinese population was carried out. Results Pooling 7 eligible case-control studies involving 2129 patients and 1895 controls demonstrated a significant association between rs1883832 and atherosclerosis under dominant model (odds ratio [OR] = 1.631, 95% confidence interval [CI] [1.176, 2.260]) in Chinese population with evident heterogeneity. Meta-regression analysis indicated that the heterogeneity could be completely explained by disease category. In subgroup analysis, rs1883832 conferred ORs of 2.866 (C/C versus T/T, 95% CI [2.203, 3.729]) and 1.680 (C/T versus T/T, 95% CI [1.352, 2.086]) for coronary artery disease (CAD) under co-dominant model without heterogeneity. Similar results were obtained for acute coronary syndrome (ACS) (C/C versus T/T, 3.674, 95% CI [2.638, 5.116]; C/T versus T/T, 1.981, 95% CI [1.483, 2.646]). The other genetic models, including dominant, recessive and additive models, yielded consistent results without heterogeneity for CAD and ACS, respectively. However, a protective role was found for C allele in ischemic stroke (IS) under recessive model (0.582, 95% CI [0.393, 0.864]) and additive model (0.785, 95% CI [0.679, 0.909]) with reduced heterogeneity. Conclusions This meta-analysis provided evidence of association of rs1883832 C allele with an overall increased risk of atherosclerosis but distinct effects of the C allele on CAD (including ACS) and IS in Chinese population.
Affiliation(s)
- Yan Yun
- School of Medicine, Shandong University, Ji’nan, Shandong, People’s Republic of China
- Chi Ma
- School of Medicine, Shandong University, Ji’nan, Shandong, People’s Republic of China
- XiaoChun Ma
- School of Medicine, Shandong University, Ji’nan, Shandong, People’s Republic of China
|
26
|
Yang T, Deng HW, Niu T. Critical assessment of coalescent simulators in modeling recombination hotspots in genomic sequences. BMC Bioinformatics 2014; 15:3. [PMID: 24387001 PMCID: PMC3890628 DOI: 10.1186/1471-2105-15-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/09/2013] [Accepted: 12/30/2013] [Indexed: 12/04/2022] Open
Abstract
Background: Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging. Results: We extensively compared the performance of five widely used coalescent simulators (Hudson’s ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide considering three crucial factors: 1) speed, 2) scalability and 3) accuracy of recombination hotspot position and intensity. Although ms is a popular standard coalescent simulator, it lacks the ability to simulate sequences with recombination hotspots. An extended program, msHOT, compensates for this deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent, are much more efficient whilst keeping salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are more advantageous than the other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance on both real and simulated data. Conclusions: While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating cross-over hotspots.
Affiliation(s)
- Tianhua Niu
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, Suite 2001, New Orleans, LA 70112, USA.
|
27
|
Zhou Y, Michelhaugh SK, Schmidt CJ, Liu JS, Bannon MJ, Lin Z. Ventral midbrain correlation between genetic variation and expression of the dopamine transporter gene in cocaine-abusing versus non-abusing subjects. Addict Biol 2014; 19:122-31. [PMID: 22026501 DOI: 10.1111/j.1369-1600.2011.00391.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/21/2023]
Abstract
Altered activity of the human dopamine transporter gene (hDAT) is associated with several common and severe brain disorders, including cocaine abuse. However, there is little a priori information on whether such alterations are due to nature (genetic variation) or nurture (human behaviors such as cocaine abuse). This study investigated the correlation between seven markers throughout hDAT and its mRNA levels in postmortem ventral midbrain tissues from 18 cocaine abusers and 18 strictly matched drug-free controls in the African-American population. Here, we show that one major haplotype with the same frequency in cocaine abusers versus drug-free controls displays a 37.1% reduction of expression levels in cocaine abusers compared with matched controls (P = 0.0057). The most studied genetic marker, variable number tandem repeats (VNTR) located in Exon 15 (3'VNTR), is not correlated with hDAT mRNA levels. A 5' upstream VNTR (rs70957367) has repeat numbers that are positively correlated with expression levels in controls (r² = 0.9536, P = 0.0235), but this positive correlation disappears in cocaine abusers. The findings suggest that varying hDAT activity is attributable to both genetics and cocaine abuse.
Affiliation(s)
- Yanhong Zhou
- Department of Psychiatry, Harvard Medical School and Division of Alcohol and Drug Abuse, McLean Hospital, Belmont, MA, USA; Department of Pharmacology, Wayne State University School of Medicine, Detroit, MI, USA; Department of Pathology, Wayne State University School of Medicine, Detroit, MI, USA; Department of Statistics, Harvard University, Cambridge, MA, USA
|
28
|
Gourraud PA, Balère ML, Faucher C, Loiseau P, Dormoy A, Marry E, Garnier F. HLA phenotypes of candidates for HSCT: comparing transplanted versus non-transplanted candidates, resulting in the predictive estimation of the probability to find a 10/10 HLA matched donor. Tissue Antigens 2013; 83:17-26. [DOI: 10.1111/tan.12263] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/30/2013] [Revised: 10/21/2013] [Accepted: 11/09/2013] [Indexed: 11/29/2022]
Affiliation(s)
- P. A. Gourraud
- Agence de la Biomédecine; Registre France Greffe de Moelle; Paris France
- M. L. Balère
- Agence de la Biomédecine; Registre France Greffe de Moelle; Paris France
- C. Faucher
- Agence de la Biomédecine; Registre France Greffe de Moelle; Paris France
- P. Loiseau
- Laboratoire d'Histocompatibilité et d'immunologie; Hopital Saint Louis; Paris France
- A. Dormoy
- Laboratoire d'Immunogénétique; Etablissement Français de Sang Bourgogne-Franche-Comté; Besançon France
- E. Marry
- Agence de la Biomédecine; Registre France Greffe de Moelle; Paris France
- F. Garnier
- Agence de la Biomédecine; Registre France Greffe de Moelle; Paris France
|
29
|
Rao X, De Boer RJ, van Baarle D, Maiers M, Kesmir C. Complementarity of Binding Motifs is a General Property of HLA-A and HLA-B Molecules and Does Not Seem to Effect HLA Haplotype Composition. Front Immunol 2013; 4:374. [PMID: 24294213 PMCID: PMC3827838 DOI: 10.3389/fimmu.2013.00374] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/23/2013] [Accepted: 10/31/2013] [Indexed: 11/13/2022] Open
Abstract
Different human leukocyte antigen (HLA) haplotypes (i.e., the specific combinations of HLA-A, -B, -DR alleles inherited together from one parent) are observed at different frequencies in human populations. Some haplotypes, like HLA-A1-B8, are very frequent, reaching up to 10% in the Caucasian population, while others are very rare. Numerous studies have identified associations between HLA haplotypes and diseases, and differences in haplotype frequencies can in part be explained by these associations: the stronger the association with a severe (autoimmune) disease, the lower the expected HLA haplotype frequency. The peptide repertoires of the HLA molecules composing a haplotype can also influence the frequency of a haplotype. For example, it would seem advantageous to have HLA molecules with non-overlapping binding specificities within a haplotype, as individuals expressing such a haplotype would present a diverse set of peptides from viruses and pathogenic bacteria on the cell surface. To test this hypothesis, we collect the proteome data from a set of common viruses, and estimate the total ligand repertoire of HLA class I haplotypes (HLA-A-B) using in silico predictions. We compare the size of these repertoires to the HLA haplotype frequencies reported in the National Marrow Donor Program (NMDP). We find that most HLA-A and HLA-B pairs have fairly distinct binding motifs, and that the observed haplotypes do not contain HLA-A and -B molecules with more distinct binding motifs than random HLA-A and HLA-B pairs. In addition, the population frequency of a haplotype is not correlated with the distinctness of its HLA-A and HLA-B peptide binding motifs. These results suggest that there is not a strong selection pressure at the haplotype level favoring haplotypes having HLA molecules with distinct binding motifs, which would result in the largest possible presented peptide repertoires in the context of infectious diseases.
Affiliation(s)
- Xiangyu Rao
- Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
|
30
|
Kuk AYC, Li X, Xu J. An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data. BMC Genet 2013; 14:82. [PMID: 24034507 PMCID: PMC3847674 DOI: 10.1186/1471-2156-14-82] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/30/2013] [Accepted: 08/28/2013] [Indexed: 12/19/2022] Open
Abstract
Background: Pooling is a cost-effective way to collect data for genetic association studies, particularly for rare genetic variants. It is of interest to estimate the haplotype frequencies, which contain more information than single-locus statistics. By viewing the pooled genotype data as incomplete data, the expectation-maximization (EM) algorithm is the natural algorithm to use, but it is computationally intensive. A recent proposal to reduce the computational burden is to make use of database information to form a list of frequently occurring haplotypes, and to restrict the haplotypes considered by the EM algorithm to this list. There is, however, the danger of using an incorrect list, and there may not be enough database information to form a list externally in some applications. Results: We investigate the possibility of creating an internal list from the data at hand. One way to form such a list is to collapse the observed total minor allele frequencies to “zero” or “at least one”, which is shown to have the desirable effect of amplifying the haplotype frequencies. To improve coverage, we propose ways to add and remove haplotypes from the list, and a benchmarking method to determine the frequency threshold for removing haplotypes. Simulation results show that the EM estimates based on a suitably augmented and trimmed collapsed data list (ATCDL) perform satisfactorily. In two scenarios involving 25 and 32 loci respectively, the EM-ATCDL estimates outperform the EM estimates based on other lists as well as the collapsed data maximum likelihood estimates. Conclusions: The proposed augmented and trimmed CD list is a useful list on which to base the EM algorithm when estimating the haplotype distributions of rare variants. It can handle more markers and larger pool sizes than existing methods, and the resulting EM-ATCDL estimates are more efficient than the EM estimates based on other lists.
Affiliation(s)
- Anthony Y C Kuk
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546, Singapore.
|
31
|
Abstract
Haplotypes contain genealogical information and play a prominent part in population genetic and evolutionary studies. However, haplotype inference is a complex statistical problem, showing considerable internal algorithm variability and among-algorithm discordance. Thus, haplotypes inferred by statistical algorithms often contain hidden uncertainties, which may complicate and even mislead downstream analysis. Consensus strategy is one of the effective means to increase the confidence of inferred haplotypes. Here, we present a consensus tool, the CVhaplot package, to automate consensus techniques for haplotype inference. It generates consensus haplotypes from inferrals of competing algorithms to increase the confidence of haplotype inference results, while improving the performance of individual algorithms by considering their internal variability. It can effectively identify uncertain haplotypes potentially associated with inference errors. In addition, this tool allows file format conversion for several popular algorithms and extends the applicability of some algorithms to complex data containing triallelic polymorphic sites. CVhaplot is written in PERL and freely available at http://www.ioz.ac.cn/department/agripest/group/zhangdx/CVhaplot.htm.
Affiliation(s)
- Zu-Shi Huang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Center for Computational and Evolutionary Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
|
32
|
Askar M, Daghstani J, Thomas D, Leahy N, Dunn P, Claas F, Doran S, Saji H, Kanangat S, Karoichane M, Tambur A, Monos D, El-Khalifa M, Turner V, Kamoun M, Mustafa M, Ramon D, Gandhi M, Vernaza A, Gorodezky C, Wagenknecht D, Gautreaux M, Hajeer A, Kashi Z, Fernandez-Vina M. 16th IHIW: global distribution of extended HLA haplotypes. Int J Immunogenet 2013; 40:31-8. [PMID: 23302097 DOI: 10.1111/iji.12029] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/22/2012] [Revised: 11/09/2012] [Accepted: 11/12/2012] [Indexed: 01/02/2023]
Abstract
This report describes the project to identify the global distribution of extended HLA haplotypes, a component of 16th International HLA and Immunogenetics Workshop (IHIW), and summarizes the initial analyses of data collected. The project aims to investigate extended HLA haplotypes, compare their distribution among different populations, assess their frequency in hematopoietic stem cell unrelated donor registries and initiate an international family studies database and DNA repository to be made publicly available. HLA haplotypes compiled in immunogenetics laboratories during the evaluation of transplant candidates and related potential donors were analysed. Haplotypes were determined using the pedigree analysis tool publicly available from the National Marrow Donor Program (NMDP) website. Nineteen laboratories from 10 countries (11 laboratories from North America, five from Asia, two from Latin America and one from Australia) contributed data on a total of 1719 families comprised of 7474 individuals. We identified 10393 HLA haplotypes, of which 1682 haplotypes included high-resolution typing at HLA-A, B, C, DRB1 and DQB1 loci. We also present haplotypes containing MICA and other HLA loci and haplotypes containing rare alleles seen in these families. The project will be extended through the 17th IHIW, and investigators interested in joining the project may communicate with the first author.
Affiliation(s)
- M Askar
- Allogen Laboratories, Cleveland Clinic, Cleveland, OH 44195, USA.
|
33
|
Kessner D, Turner TL, Novembre J. Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data. Mol Biol Evol 2013; 30:1145-58. [PMID: 23364324 PMCID: PMC3670732 DOI: 10.1093/molbev/mst016] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 12/25/2022] Open
Abstract
DNA samples are often pooled, either by experimental design or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g., bacterial species comprising a microbiome or pathogen strains in a blood sample). We present an expectation–maximization algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different species within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool.
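The expectation-maximization idea summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' software: the function name `em_haplotype_frequencies` and the input layout (one row per read, one column per known haplotype, each entry the probability of that read given that haplotype) are assumptions made for illustration, and every read is assumed to be compatible with at least one haplotype.

```python
def em_haplotype_frequencies(read_likelihoods, n_iter=500, tol=1e-12):
    """Estimate mixture proportions of known haplotypes by EM.

    read_likelihoods[i][k] is an assumed input: P(read i | haplotype k).
    Each row must contain at least one non-zero entry.
    """
    n_reads = len(read_likelihoods)
    n_haps = len(read_likelihoods[0])
    freqs = [1.0 / n_haps] * n_haps  # start from uniform frequencies

    for _ in range(n_iter):
        totals = [0.0] * n_haps
        # E-step: posterior probability that each read came from each haplotype
        for row in read_likelihoods:
            weighted = [f * lik for f, lik in zip(freqs, row)]
            norm = sum(weighted)
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        # M-step: new frequency = average posterior responsibility
        new_freqs = [t / n_reads for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Hypothetical toy input: four reads, two known haplotypes.
toy = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
print(em_haplotype_frequencies(toy))
```

Reads that fit several haplotypes equally well contribute fractional counts to each, which is what lets the estimate use read-level information rather than single-site allele frequencies.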
Affiliation(s)
- Darren Kessner
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, USA
|
34
|
Sabaa H, Cai Z, Wang Y, Goebel R, Moore S, Lin G. Whole genome identity-by-descent determination. J Bioinform Comput Biol 2013; 11:1350002. [PMID: 23600820 DOI: 10.1142/s0219720013500029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
High-throughput single nucleotide polymorphism genotyping assays conveniently produce genotype data for genome-wide genetic linkage and association studies. For pedigree datasets, the unphased genotype data is used to infer the haplotypes for individuals, according to Mendelian inheritance rules. Linkage studies can then locate putative chromosomal regions based on the haplotype allele sharing among the pedigree members and their disease status. Most existing haplotyping programs require rather strict pedigree structures and return a single inferred solution for downstream analysis. In this research, we relax the pedigree structure to contain ungenotyped founders and present a cubic time whole genome haplotyping algorithm to minimize the number of zero-recombination haplotype blocks. With or without explicitly enumerating all the haplotyping solutions, the algorithm determines all distinct haplotype allele identity-by-descent (IBD) sharings among the pedigree members, in linear time in the total number of haplotyping solutions. Our algorithm is implemented as a computer program iBDD. Extensive simulation experiments using 2 sets of 16 pedigree structures from previous studies showed that, in general, there are trillions of haplotyping solutions, but only up to a few thousand distinct haplotype allele IBD sharings. iBDD is able to return all these sharings for downstream genome-wide linkage and association studies.
Affiliation(s)
- Hadi Sabaa
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
|
35
|
Kuk AY, Li X, Xu J. A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants. Stat Med 2012; 32:1343-60. [DOI: 10.1002/sim.5540] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/07/2012] [Accepted: 06/11/2012] [Indexed: 12/31/2022]
Affiliation(s)
- Anthony Y.C. Kuk
- Department of Statistics and Applied Probability; National University of Singapore; Singapore; Singapore
- Xiang Li
- Department of Statistics and Applied Probability; National University of Singapore; Singapore; Singapore
- Jinfeng Xu
- Department of Statistics and Applied Probability; National University of Singapore; Singapore; Singapore
|
36
|
Gourraud PA, Gilson L, Girard M, Peschanski M. The role of human leukocyte antigen matching in the development of multiethnic "haplobank" of induced pluripotent stem cell lines. Stem Cells 2012; 30:180-6. [PMID: 22045598 DOI: 10.1002/stem.772] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.1] [Indexed: 12/18/2022]
Abstract
Among the tools of regenerative medicine, induced pluripotent stem cells (iPSCs) are interesting because the donor genotype can be selected. The construction of banks of iPSC lines selected from human leukocyte antigen (HLA) homozygous donors has been proposed as an effective way to match a maximal number of patients receiving cell therapy from iPSC lines. However, the effort required to constitute such a bank for worldwide application has remained unexplored. We developed a probabilistic model to compute the number of donors to screen for constituting banks of best-chosen iPSC lines with homozygous HLA haplotypes (haplobanks) in four ancestry backgrounds. We estimated what percentage of the patients would be provided with single HLA haplotype matched cell lines. Genetic diversity leads to different outcomes for the four sets on all measures. A bank comprising iPSC lines representing the 20 most frequent haplotypes in each population would require screening quite different numbers of donors, between 26,000 for European Americans and 110,000 for African Americans. It would also match different fractions of the recipient population, namely, more than 50% of the European Americans and 22% of African Americans. Conversely, a bank comprising the 100 iPSC lines with the most frequent HLA haplotypes in each population would leave out only 22% of the European Americans, but 37% of the Asians, 48% of the Hispanics, and 55% of the African Americans. The constitution of a haplobank of iPSC lines is achievable through a large-scale concerted worldwide collaboration.
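The kind of calculation behind such figures can be sketched under simplifying assumptions that are not taken from the paper: haplotypes pair at random (Hardy-Weinberg proportions), a patient counts as matched when at least one of their two haplotypes is in the bank, and a donor is homozygous for a haplotype of frequency f with probability f². The function names and the frequencies in the usage line are hypothetical.

```python
def matched_fraction(bank_haplotype_freqs):
    """Fraction of patients carrying at least one banked haplotype.

    Assumes random pairing of haplotypes: a patient is unmatched only if
    both of their haplotypes fall outside the bank.
    """
    covered = sum(bank_haplotype_freqs)
    return 1.0 - (1.0 - covered) ** 2


def expected_donors_for_homozygote(freq):
    """Expected number of donors screened to find one homozygote.

    Assumes each donor is homozygous for the haplotype with probability
    freq ** 2, so the waiting time is geometric with mean 1 / freq ** 2.
    """
    return 1.0 / freq ** 2


# Hypothetical frequencies, for illustration only.
print(matched_fraction([0.10, 0.05, 0.05]))
print(expected_donors_for_homozygote(0.01))
```

Because the screening effort grows with 1/f², populations whose common haplotypes are individually rarer need far more donors for the same bank size, which is consistent with the direction of the differences reported in the abstract.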
Affiliation(s)
- Pierre-Antoine Gourraud
- Department of Neurology, University of California San Francisco School of Medicine, San Francisco, California, USA.
|
37
|
Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet 2012; 8:e1002453. [PMID: 22291602 PMCID: PMC3266881 DOI: 10.1371/journal.pgen.1002453] [Citation(s) in RCA: 706] [Impact Index Per Article: 58.8] [Received: 07/13/2011] [Accepted: 11/21/2011] [Indexed: 12/12/2022] Open
Abstract
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
Affiliation(s)
- Daniel John Lawson
- Department of Mathematics, University of Bristol, Bristol, United Kingdom
- Simon Myers
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Daniel Falush
- Environmental Research Institute, University College Cork, Cork, Ireland
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
|
38
|
Mack SJ, Gourraud PA, Single RM, Thomson G, Hollenbach JA. Analytical methods for immunogenetic population data. Methods Mol Biol 2012; 882:215-44. [PMID: 22665237 PMCID: PMC4209087 DOI: 10.1007/978-1-61779-842-9_13] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
In this chapter, we describe analyses commonly applied to immunogenetic population data, along with software tools that are currently available to perform those analyses. Where possible, we focus on tools that have been developed specifically for the analysis of highly polymorphic immunogenetic data. These analytical methods serve both as a means to examine the appropriateness of a dataset for testing a specific hypothesis, as well as a means of testing hypotheses. Rather than treat this chapter as a protocol for analyzing any population dataset, each researcher and analyst should first consider their data, the possible analyses, and any available tools in light of the hypothesis being tested. The extent to which the data and analyses are appropriate to each other should be determined before any analyses are performed.
Affiliation(s)
- Steven J Mack
- Center for Genetics, Children's Hospital and Research Center Oakland, Oakland, CA, USA.
|
39
|
Abstract
The information carried by combinations of alleles on the same chromosome, called haplotypes, is of crucial interest in several fields of modern genetics, such as population genetics and association studies. However, this information is usually lost during sequencing and therefore needs to be recovered by inference. In this chapter, we give a brief overview of the methods able to tackle this problem and of some practical concerns when applying them to real data.
|
40
|
Feng R, Wu Y, Jang GH, Ordovas JM, Arnett D. A powerful test of parent-of-origin effects for quantitative traits using haplotypes. PLoS One 2011; 6:e28909. [PMID: 22174922 PMCID: PMC3236760 DOI: 10.1371/journal.pone.0028909] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 05/12/2011] [Accepted: 11/17/2011] [Indexed: 01/08/2023] Open
Abstract
Imprinting is an epigenetic phenomenon where the same alleles have unequal transcriptions and thus contribute differently to a trait depending on their parent of origin. This mechanism has been found to affect a variety of human disorders. Although various methods for testing parent-of-origin effects have been proposed in linkage analysis settings, only a few are available for association analysis and they are usually restricted to small families and particular study designs. In this study, we develop a powerful maximum likelihood test to evaluate the parent-of-origin effects of SNPs on quantitative phenotypes in general family studies. Our method incorporates haplotype distribution to take advantage of inter-marker LD information in genome-wide association studies (GWAS). Our method also accommodates missing genotypes that often occur in genetic studies. Our simulation studies with various minor allele frequencies, LD structures, family sizes, and missing schemes have uniformly shown that using the new method significantly improves the power of detecting imprinted genes compared with the method using the SNP at the testing locus only. Our simulations suggest that the most efficient strategy to investigate parent-of-origin effects is to recruit one parent and as many offspring as possible under practical constraints. As a demonstration, we applied our method to a dataset from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) to test the parent-of-origin effects of the SNPs within the PPARGC1A, MTP and FABP2 genes on diabetes-related phenotypes, and found that several SNPs in the MTP gene show parent-of-origin effects on insulin and glucose levels.
Affiliation(s)
- Rui Feng
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
|
41
|
Cao L, Zhou Y, Li X, Yi H. The relationship of haplotype in lactotransferrin and its expression levels in Chinese Han ovarian cancer. Acta Biochim Biophys Sin (Shanghai) 2011; 43:884-90. [PMID: 21937479 DOI: 10.1093/abbs/gmr089] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/31/2022] Open
Abstract
Chromosomal DNA sequence polymorphisms may contribute to individuality, confer risk for diseases, and are most commonly used as genetic markers in association studies. The iron-binding protein lactoferrin inhibits bacterial growth by sequestering essential iron and also exhibits antitumor, anti-inflammatory, and immunoregulatory activities. The gene coding for lactotransferrin (LTF) is polymorphic, with several common alleles occurring in the general population. This genetically determined variation can affect LTF functions. In this study, we determined the distribution of LTF gene polymorphisms (rs1126477, rs1126478, rs2073495, and rs9110) in the Chinese Han population and investigated whether these polymorphisms were associated with increased risk of ovarian carcinoma in the Chinese population. It was found that rs1126477 was significantly correlated with ovarian cancer. The frequency of the A allele of rs1126477 was significantly higher in 700 ovarian cancer patients than in a control group of 700 individuals (P < 0.01, χ² = 6.79). The frequency of the AA genotype was significantly higher in ovarian cancer patients than in the control group (P < 0.05, χ² = 6.49). The AA genotype is a risk factor for ovarian cancer: the odds ratio (OR) was 2.24 and the 95% confidence interval (CI) was 1.08-4.59. The 'A-G-C-C' haplotype constructed with rs1126477, rs1126478, rs2073495, and rs9110 was a risk factor for ovarian cancer. The expression of the LTF gene was lower in individuals with the 'A-G-C-C' haplotype than in individuals without it. These findings suggest that rs1126477 could play important roles in ovarian carcinoma physiological processes in the Chinese population.
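The abstract reports χ² statistics and an odds ratio with a 95% confidence interval but not the underlying counts. The sketch below shows how such quantities are commonly computed from a two-by-two table, using Pearson's χ² without continuity correction and a Woolf (log-based) interval; whether the authors used exactly these variants is an assumption. The function name and the counts in the usage line are hypothetical.

```python
import math


def two_by_two_association(a, b, c, d, z=1.96):
    """Chi-square test and odds ratio for a 2x2 table.

    Assumed layout (all counts must be non-zero):
                   exposed   unexposed
        cases         a          b
        controls      c          d
    """
    n = a + b + c + d
    # Pearson chi-square (1 degree of freedom, no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail p-value of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    # Woolf interval: normal approximation on the log odds ratio
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return chi2, p_value, odds_ratio, (lower, upper)


# Hypothetical counts, not taken from the study.
print(two_by_two_association(20, 10, 10, 20))
```

A confidence interval whose lower bound lies above 1, as in the reported 1.08-4.59, indicates that the excess odds are statistically significant at the corresponding level.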
Affiliation(s)
- Lanqin Cao
- Department of Gynecology and Obstetrics, Xiangya Hospital, Central South University, Changsha 410078, China
|
42
|
Zhou Y, Wang W, Zheng D, Peng S, Xiong W, Ma J, Zeng Z, Wu M, Zhou M, Xiang J, Xiang B, Li X, Li X, Li G. Risk of nasopharyngeal carcinoma associated with polymorphic lactotransferrin haplotypes. Med Oncol 2011; 29:1456-62. [DOI: 10.1007/s12032-011-0079-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 09/21/2011] [Accepted: 09/25/2011] [Indexed: 12/29/2022]
|
43
|
Abstract
The focus of this review is software for the genotyping of microarray single nucleotide polymorphisms, in particular software for Affymetrix and Illumina arrays. Different statistical principles and ideas have been applied to the construction of genotyping algorithms -- for example, likelihood versus Bayesian modelling, and whether to genotype one or all arrays at a time. The release of new arrays is generally followed by new, or updated, algorithms.
|
44
|
Roach J, Glusman G, Hubley R, Montsaroff S, Holloway A, Mauldin D, Srivastava D, Garg V, Pollard K, Galas D, Hood L, Smit A. Chromosomal haplotypes by genetic phasing of human families. Am J Hum Genet 2011; 89:382-97. [PMID: 21855840 DOI: 10.1016/j.ajhg.2011.07.023] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 05/07/2011] [Revised: 07/23/2011] [Accepted: 07/30/2011] [Indexed: 01/06/2023]
Abstract
Assignment of alleles to haplotypes for nearly all the variants on all chromosomes can be performed by genetic analysis of a nuclear family with three or more children. Whole-genome sequence data enable deterministic phasing of nearly all sequenced alleles by permitting assignment of recombinations to precise chromosomal positions and specific meioses. We demonstrate this process of genetic phasing on two families each with four children. We generate haplotypes for all of the children and their parents; these haplotypes span all genotyped positions, including rare variants. Misassignments of phase between variants (switch errors) are nearly absent. Our algorithm can also produce multimegabase haplotypes for nuclear families with just two children and can handle families with missing individuals. We implement our algorithm in a suite of software scripts (Haploscribe). Haplotypes and family genome sequences will become increasingly important for personalized medicine and for fundamental biology.
|
45
|
Does high-resolution donor typing of HLA-C or other loci upon registration confer advantages to patients? Hum Immunol 2011; 72:1033-8. [PMID: 21871938 DOI: 10.1016/j.humimm.2011.08.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/03/2010] [Revised: 08/01/2011] [Accepted: 08/04/2011] [Indexed: 11/20/2022]
Abstract
Our study compared all requests for confirmatory typing (CT requests) received in our center between May 2007 and December 2009 (n = 134) for donors issued from 3 groups defined by different human leukocyte antigen (HLA) loci typed at different levels of resolution. We observed a significant advantage for volunteers when HLA-C 2-digit typing was available or with HLA-A, -B, -C, -DRB1 4-digit typing compared with generic HLA-A, -B, -DRB1, -DQB1 DNA typing: increased percentage of CT requests (p < 0.001), increased rate of donor selection for donation (p < 0.001), and decreased time frame for donor search (p = 0.025). The time frame for a successful search (donation) is similar among the 3 groups, indicating that the search might be concluded more rapidly when the pathology is clinically active or when the patient is at a high risk of relapse (76% of our cases) or for pediatric patients (24% of our cases), regardless of HLA typing resolution. Improvement of HLA typing for volunteers could be a great advantage for first selection in the absence of emergency or high-risk disease. Knowledge of HLA-C should be used to prioritize the selection of donors for further testing and could allow a better donor selection process, reducing search duration and increasing efficiency. In most cases, 2-digit typing for HLA-C associated with specific tools to estimate the probability of finding a matched donor could be sufficient.
|
46
|
Powell JE, Kranis A, Floyd J, Dekkers JCM, Knott S, Haley CS. Optimal use of regression models in genome-wide association studies. Anim Genet 2011; 43:133-43. [PMID: 22404349 DOI: 10.1111/j.1365-2052.2011.02234.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
The performance of linear regression models in genome-wide association studies is influenced by how marker information is parameterized in the model. Considering the impact of parameterization is especially important when using information from multiple markers to test for association. Properties of the population, such as linkage disequilibrium (LD) and allele frequencies, will also affect the ability of a model to provide statistical support for an underlying quantitative trait locus (QTL). Thus, for a given location in the genome, the relationship between population properties and model parameterization is expected to influence the performance of the model in providing evidence for the position of a QTL. As LD and allele frequencies vary throughout the genome and between populations, understanding the relationship between these properties and model parameterization is of considerable importance in order to make optimal use of available genomic data. Here, we evaluate the performance of regression-based association models using genotype and haplotype information across the full spectrum of allele frequency and LD scenarios. Genetic marker data from 200 broiler chickens were used to simulate genomic conditions by selecting individual markers to act as surrogate QTL (sQTL) and then investigating the ability of surrounding markers to estimate sQTL genotypes and provide statistical support for their location. The LD and allele frequencies of markers and sQTL are shown to have a strong effect on the performance of models relative to one another. Our results provide an indication of the best choice of model parameterization given certain scenarios of marker and QTL LD and allele frequencies. We demonstrate a clear advantage of haplotype-based models, which account for phase uncertainty, over the other models tested, particularly for QTL with low minor allele frequencies. We show that the greatest advantage of haplotype models over single-marker models occurs when LD between markers and the causal locus is low. In these situations, haplotype models predict the location of the QTL more accurately than the other models tested.
Affiliation(s)
- J E Powell
- Department of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Roslin, UK.
|
47
|
Inferring haplotypes of copy number variations from high-throughput data with uncertainty. G3-GENES GENOMES GENETICS 2011; 1:35-42. [PMID: 22384316 PMCID: PMC3276117 DOI: 10.1534/g3.111.000174] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/31/2010] [Accepted: 03/14/2011] [Indexed: 11/18/2022]
Abstract
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1-2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12-18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
|
48
|
Aragonès G, Guardiola M, Barreda M, Marsillach J, Beltrán-Debón R, Rull A, Mackness B, Mackness M, Joven J, Simó JM, Camps J. Measurement of serum PON-3 concentration: method evaluation, reference values, and influence of genotypes in a population-based study. J Lipid Res 2011; 52:1055-61. [PMID: 21335322 DOI: 10.1194/jlr.d014134] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Abstract
Experimental studies showed that paraoxonase-3 (PON3) retards lipoprotein oxidation. Our objective was to describe a new assay to measure serum PON3 concentrations and report their reference values in a population-based study. The influence of PON3 promoter polymorphisms and their relationships with PON1 and lipid profile were also studied. We generated an anti-PON3 antibody by inoculating rabbits with a synthetic peptide specific to mature PON3. This antibody was used to develop an ELISA. The average regression line of standard plots (n = 8) was y = 0.9587 (0.3392) log₁₀(x) + 1.9466 (0.0861) [r² = 0.924 (0.0131); P < 0.001]. There was no cross reaction with PON1. Detection limit was 0.24 mg/l. Imprecision was ≤ 13.2%. Reference interval (n = 356) was 1.00-2.47 mg/l. PON3 was observed in HDL particles containing apolipoprotein (apo)A-I and PON1, but not apoA-II or apoE. Serum PON3 concentrations showed a moderate influence (about 10% variation) by PON3 promoter polymorphisms. Our study describes for the first time a method to measure serum PON3 concentrations. This method offers new opportunities in the investigation of the properties and role of PON3 in cardiovascular disease, with possible implications in clinical practice.
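A log-linear standard curve of the form quoted above is normally used in reverse: a measured signal y is converted back to a concentration x by inverting the regression line. The sketch below illustrates only that arithmetic, using the average slope and intercept from the abstract. It assumes that x is the PON3 concentration in mg/l and y is the assay signal, which the abstract does not state explicitly, and the function names are hypothetical.

```python
import math

# Average regression coefficients quoted in the abstract.
SLOPE = 0.9587
INTERCEPT = 1.9466


def signal_from_concentration(x_mg_per_l):
    """Forward standard curve: y = SLOPE * log10(x) + INTERCEPT."""
    return SLOPE * math.log10(x_mg_per_l) + INTERCEPT


def concentration_from_signal(y):
    """Inverse standard curve: x = 10 ** ((y - INTERCEPT) / SLOPE)."""
    return 10 ** ((y - INTERCEPT) / SLOPE)


# Round trip over the limits of the reported reference interval (mg/l).
for conc in (1.00, 2.47):
    y = signal_from_concentration(conc)
    print(f"x = {conc:.2f} mg/l -> y = {y:.4f} -> x = {concentration_from_signal(y):.2f} mg/l")
```

In practice each plate would be read against its own fitted curve. The averaged coefficients are used here only because they are the values the abstract provides.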
Affiliation(s)
- Gerard Aragonès
- Centre de Recerca Biomèdica, Hospital Universitari de Sant Joan, Institut d'Investigació Sanitària Pere Virgili, Universitat Rovira i Virgili, Catalonia, Spain
|
49
|
Bagos PG. Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature. BMC Genet 2011; 12:8. [PMID: 21247440 PMCID: PMC3087509 DOI: 10.1186/1471-2156-12-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/01/2010] [Accepted: 01/19/2011] [Indexed: 01/05/2023]
Abstract
Background: Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework along with an empirical evaluation of the literature. Results: We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that can result from the traditional approach of comparing one haplotype against all the remaining ones, and they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can be easily extended to allow for uncertainty in the haplotypes by weighting each haplotype by its probability. Conclusions: An empirical evaluation of the published literature and a comparison against meta-analyses that use single nucleotide polymorphisms suggest that meta-analyses of haplotypes include approximately half as many studies and produce significant results twice as often. We show that this excess of statistically significant results stems from the sub-optimal method of analysis used and that, in approximately half of the cases, the statistical significance is refuted if the data are properly re-analyzed. Illustrative examples of code are given in Stata, and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, Greece.
|
50
|
Abstract
In this chapter, mutation (specifically single-nucleotide polymorphisms, SNPs) and recombination will be covered in more detail, and the concepts of genotype and haplotype will be reviewed. Linkage disequilibrium (LD) describes the strength of a relationship between alleles at different loci. The definition for LD, its visual representation, and the calculation of statistics that measure LD will be presented. The power of genetic association studies to identify disease susceptibility alleles fundamentally relies on the genetic variants studied. A standard approach is to determine a set of tagging-SNPs (tSNPs) that capture the majority of genomic variation in regions of interest by exploiting local correlation structures. The concept of LD and how it is used to select tSNPs will be addressed, as well as specific procedures and algorithms that are practiced by researchers to determine these variants.
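The LD statistics mentioned in this abstract can be illustrated for the simplest case of two biallelic loci. The sketch below computes the three standard pairwise measures, D, D' and r², from the frequency of one haplotype and the two allele frequencies. These are the textbook definitions rather than anything specific to the chapter, and the function name and example frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    Returns (D, D_prime, r_squared).
    """
    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could reach given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical frequencies for illustration only.
d, d_prime, r2 = ld_stats(p_ab=0.40, p_a=0.50, p_b=0.60)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

Tagging-SNP selection typically works on r²: a candidate SNP is taken to be captured by a tag SNP when their pairwise r² exceeds a chosen threshold.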
Affiliation(s)
- Karen Curtin
- Genetic Epidemiology Division, Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
|