1
Magalhães Borges V, Horimoto ARVR, Wijsman EM, Kimura L, Nunes K, Nato AQ, Mingroni-Netto RC. Genomic Exploration of Essential Hypertension in African-Brazilian Quilombo Populations: A Comprehensive Approach with Pedigree Analysis and Family-Based Association Studies. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.06.26.24309531. [PMID: 38978678 PMCID: PMC11230341 DOI: 10.1101/2024.06.26.24309531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Essential Hypertension (EH) is a major global health concern, causing about 9.4 million deaths annually. Its prevalence varies across regions, affecting 17% of the population in the Americas, 19.2% in the Western Pacific, 23.2% in Europe, 25.1% in Southeast Asia, 26.3% in the Eastern Mediterranean, and 27.2% in Africa. EH is a multifactorial disease influenced by both genetic and environmental factors. While genetic factors contribute 30-60% to blood pressure variation, the genetic complexity of EH remains largely unexplained due to limited knowledge of candidate genes and population-specific differences. Various methods, including candidate gene studies, genome-wide linkage analysis (GWLA), and genome-wide association studies (GWAS), have been employed to identify genetic factors, yet much of the heritability of EH is still unknown. This study aimed to investigate the genetic basis of EH by mapping regions of interest (ROIs) and identifying candidate genes and variants influencing EH in African-derived individuals from partially isolated populations of quilombo remnants in Vale do Ribeira, São Paulo, Brazil. Samples from 431 individuals (167 affected, 261 unaffected, 3 with unknown phenotype) from eight quilombo remnant populations were genotyped using a 650k SNP array. The global ancestry proportions were estimated at 47% African, 36% European, and 16% Native American. Genealogical information from 673 individuals was used to construct six pedigrees comprising 1104 individuals. The mapping strategy consisted of a multi-level computational approach. We constructed pedigrees based on interviews and kinship coefficients, pruned the dataset to obtain three non-overlapping marker subpanels, phased the haplotypes, and performed local ancestry inference to account for admixture.
We performed GWLA and dense linkage analyses using the marker subpanels and performed fine-mapping using family-based association studies (FBAS) based on population- and pedigree-imputed data, investigating EH-related genes and variants. The linkage analysis identified 22 ROIs with LOD scores of 1.45-3.03, containing markers co-segregating with the phenotype. These ROIs encompassed 2363 genes. Fine-mapping identified 60 EH-related candidate genes and 118 suggestive or significant variants (FBAS). Among these, 14 genes, including PHGDH, S100A10, MFN2, and RYR2, were highlighted with strong evidence of association with hypertension. These genes, harboring 29 SNPs, were implicated in regulating blood pressure, sodium and potassium levels, and the aldosterone pathway. Through a complementary approach combining admixture-adjusted genome-wide linkage analysis based on Markov chain Monte Carlo (MCMC) methods, association studies on imputed data, and in silico investigations, this study revealed genetic regions, variants, and candidate genes that shed light on the genetic basis of essential hypertension, with significant potential to explain the genetic etiology in quilombo remnant populations.
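For readers unfamiliar with the LOD scores quoted above, the following is a minimal sketch of the textbook two-point definition: the base-10 log of the likelihood at recombination fraction theta divided by the likelihood under free recombination (theta = 0.5). It assumes phase-known, fully informative meioses. The function name and the counts are illustrative assumptions, and this is not the MCMC-based multipoint analysis the study describes.

```python
import math

def two_point_lod(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD = log10[L(theta) / L(0.5)] for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + non_recombinants
    log_l_theta = 0.0
    if recombinants:
        if theta == 0.0:
            # Any observed recombinant is impossible under complete linkage.
            return float("-inf")
        log_l_theta += recombinants * math.log10(theta)
    if non_recombinants:
        log_l_theta += non_recombinants * math.log10(1.0 - theta)
    # Under no linkage every meiosis has probability 0.5 either way.
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical example: 10 informative meioses, none recombinant.
print(round(two_point_lod(recombinants=0, non_recombinants=10, theta=0.0), 2))
```

In this idealized setting each non-recombinant meiosis adds log10(2), about 0.30, to the score, which gives a feel for how much co-segregation a LOD near 3 represents.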
Affiliation(s)
- Vinícius Magalhães Borges
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Andrea R V R Horimoto
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105 USA
- Ellen Marie Wijsman
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105 USA
- Lilian Kimura
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
- Kelly Nunes
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
- Alejandro Q Nato
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Regina Célia Mingroni-Netto
  - Centro de Estudos sobre o Genoma Humano e Células Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
2
Qiao Y, Jewett EM, McManus KF, Freyman WA, Curran JE, Williams-Blangero S, Blangero J, Williams AL. Reconstructing parent genomes using siblings and other relatives. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.10.593578. [PMID: 38798596 PMCID: PMC11118276 DOI: 10.1101/2024.05.10.593578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Reconstructing the DNA of ancestors from their descendants has the potential to empower phenotypic analyses (including association and genetic nurture studies), improve pedigree reconstruction, and shed light on the ancestral population and phenotypes of ancestors. We developed HAPI-RECAP, a method that reconstructs the DNA of parents from full siblings and their relatives. This tool leverages the output of HAPI2, a new phasing approach that applies to siblings (and optionally one or both parents) and reliably infers parent haplotypes but does not link the ungenotyped parents' DNA across chromosomes or between segments flanking ambiguities. By combining IBD between the reconstructed parents and the relatives, HAPI-RECAP resolves the source parent of these segments. Moreover, the method exploits crossovers the children inherited and sex-specific genetic maps to infer the reconstructed parents' sexes. We validated these methods on research participants from both 23andMe, Inc. and the San Antonio Mexican American Family Studies. Given data for one parent, HAPI2 reconstructs large fractions of the missing parent's DNA, between 77.6% and 99.97% among all families, and 90.3% on average in three- and four-child families. When reconstructing both parents, HAPI-RECAP inferred between 33.2% and 96.6% of the parents' genotypes, averaging 70.6% in four-child families. Reconstructed genotypes have average error rates < 10⁻³, comparable to those from direct genotyping. HAPI-RECAP inferred the parent sexes 100% correctly given IBD-linked segments and can also reconstruct parents without any IBD. As datasets grow in size, more families will be implicitly collected; HAPI-RECAP holds promise to enable high quality parent genotype reconstruction.
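As background for why reconstruction improves with the number of children, here is a small sketch of the idealized Mendelian expectation: at any locus each child independently receives one of a parent's two haplotypes with probability 1/2. The function names are illustrative assumptions, and the calculation ignores everything HAPI2 and HAPI-RECAP actually model (phasing ambiguity, crossovers, genotyping error, relatives), so it is not meant to reproduce the percentages reported above.

```python
def expected_haplotype_coverage(n_children: int) -> float:
    """Expected fraction of a parent's two haplotypes transmitted to >= 1 child."""
    if n_children < 0:
        raise ValueError("n_children must be non-negative")
    # A given haplotype is missed only if every child received the other one.
    return 1.0 - 0.5 ** n_children

def prob_both_haplotypes_seen(n_children: int) -> float:
    """Probability that both parental haplotypes at a locus appear among the children."""
    if n_children < 1:
        return 0.0
    # Subtract the two ways all children can inherit the same haplotype.
    return 1.0 - 0.5 ** (n_children - 1)

for n in (2, 3, 4):
    print(n, expected_haplotype_coverage(n), prob_both_haplotypes_seen(n))
```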
Affiliation(s)
- Ying Qiao
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Joanne E. Curran
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Sarah Williams-Blangero
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- John Blangero
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Amy L. Williams
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - 23andMe, Inc., Sunnyvale, CA 94086, USA
3
Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:113-138. [PMID: 35451774 DOI: 10.1007/978-1-0716-2205-6_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Imputation has become a standard practice in modern genetic research to increase genome coverage and improve the accuracy of genomic selection and genome-wide association studies, as a large number of samples can be genotyped at lower density (and lower cost) and imputed up to denser marker panels or to sequence level using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software packages were originally developed for human genetics and, more recently, for animal and plant genetics, taking into account pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures of farmed species and their limited effective sizes allow accurate imputation of high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, imputation accuracy, measured by the correct imputation rate or the correlation between true and imputed genotypes, increases with the relatedness of the individual to be imputed to its more densely genotyped ancestors and with its own genotype density. Increasing imputation accuracy increases genomic selection accuracy, whatever the genomic evaluation method. Given the marker densities, the most important factors affecting imputation accuracy are clearly the size of the reference population and the relationship between individuals in the reference and target populations.
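The two accuracy measures named in the abstract, the correct imputation rate and the correlation between true and imputed genotypes, can be sketched as follows. This is a minimal illustration assuming genotypes coded as 0, 1, or 2 copies of one allele; the function names and the example vectors are made up for illustration and do not come from the chapter.

```python
import math
from typing import Sequence

def concordance_rate(true_g: Sequence[int], imputed_g: Sequence[int]) -> float:
    """Fraction of genotypes imputed exactly right (correct imputation rate)."""
    if len(true_g) != len(imputed_g) or len(true_g) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == i for t, i in zip(true_g, imputed_g)) / len(true_g)

def genotype_correlation(true_g: Sequence[float], imputed_g: Sequence[float]) -> float:
    """Pearson correlation between true and imputed genotypes (or dosages)."""
    if len(true_g) != len(imputed_g) or len(true_g) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_g)
    mean_t = sum(true_g) / n
    mean_i = sum(imputed_g) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_g, imputed_g))
    var_t = sum((t - mean_t) ** 2 for t in true_g)
    var_i = sum((i - mean_i) ** 2 for i in imputed_g)
    if var_t == 0 or var_i == 0:
        raise ValueError("correlation is undefined when either vector is constant")
    return cov / math.sqrt(var_t * var_i)

# Hypothetical genotypes for one animal across eight masked markers.
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_genotypes = [0, 1, 2, 2, 0, 2, 1, 0]
print(concordance_rate(true_genotypes, imputed_genotypes))
print(round(genotype_correlation(true_genotypes, imputed_genotypes), 3))
```

The correlation is often preferred when minor allele frequencies are low, because a high concordance can then be reached simply by always predicting the common homozygote.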
4
Mdyogolo S, MacNeil MD, Neser FWC, Scholtz MM, Makgahlela ML. Assessing accuracy of genotype imputation in the Afrikaner and Brahman cattle breeds of South Africa. Trop Anim Health Prod 2022; 54:90. [PMID: 35133512 DOI: 10.1007/s11250-022-03102-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Accepted: 02/01/2022] [Indexed: 11/26/2022]
Abstract
Imputation may be used to rescue genomic data from animals that would otherwise be eliminated due to a lower than desired call rate. The aim of this study was to compare the accuracy of genotype imputation for Afrikaner, Brahman, and Brangus cattle of South Africa using within- and multiple-breed reference populations. A total of 373, 309, and 101 Afrikaner, Brahman, and Brangus cattle, respectively, were genotyped using the GeneSeek Genomic Profiler 150 K panel that contained 141,746 markers. Markers with MAF ≤ 0.02 and call rates ≤ 0.95, or that deviated from Hardy-Weinberg equilibrium with a probability of ≤ 0.0001, were excluded from the data, as were animals with a call rate ≤ 0.90. The remaining data included 99,086 SNPs and 360 animals for Afrikaner, 75,291 SNPs and 288 animals for Brahman, and 97,897 SNPs and 99 animals for Brangus. A total of 7986, 7002, and 7000 SNPs from 50 Afrikaner and Brahman and 30 Brangus cattle, respectively, were masked and then imputed using BEAGLE v3 and FImpute v2. The within-breed imputation yielded accuracies ranging from 89.9 to 96.6% for the three breeds. The multiple-breed imputation yielded corresponding accuracies from 69.21 to 88.35%. The results showed that population homogeneity and numerical representation for within- and across-breed strategies, respectively, are crucial components for improving imputation accuracies.
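The marker quality-control thresholds quoted above can be illustrated with a short sketch. Several things here are assumptions rather than details from the paper: genotypes are coded 0/1/2 with `None` for a missing call, a marker is dropped if it fails any one of the three criteria, and Hardy-Weinberg deviation is tested with a 1-df chi-square goodness-of-fit test (the abstract does not name the test used). All function names are illustrative.

```python
import math
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of one allele; None = missing call

def call_rate(genos: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls (usable per marker or per animal)."""
    return sum(g is not None for g in genos) / len(genos)

def minor_allele_frequency(genos: Sequence[Genotype]) -> float:
    called: List[int] = [g for g in genos if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)

def hwe_chi2_pvalue(genos: Sequence[Genotype]) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg proportions."""
    called: List[int] = [g for g in genos if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        return 1.0  # monomorphic marker: nothing to test
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def marker_passes(genos: Sequence[Genotype], min_maf: float = 0.02,
                  min_call: float = 0.95, min_hwe_p: float = 1e-4) -> bool:
    """Keep a marker only if it exceeds every threshold (assumed any-fail rule)."""
    return (minor_allele_frequency(genos) > min_maf
            and call_rate(genos) > min_call
            and hwe_chi2_pvalue(genos) > min_hwe_p)

balanced = [0] * 25 + [1] * 50 + [2] * 25   # exact Hardy-Weinberg proportions
all_hets = [1] * 100                        # extreme heterozygote excess
print(marker_passes(balanced), marker_passes(all_hets))
```

The same `call_rate` helper applied to one animal's genotypes across markers gives the per-animal filter (call rate ≤ 0.90 excluded).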
Affiliation(s)
- S Mdyogolo
  - Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
  - Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
- M D MacNeil
  - Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
  - Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
  - Delta G, Miles City, MT, USA
- F W C Neser
  - Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
- M M Scholtz
  - Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
  - Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
- M L Makgahlela
  - Department of Animal Breeding and Genetics, Agricultural Research Council, Irene, South Africa
  - Department of Animal, Wildlife and Grassland Sciences, University of the Free State, Bloemfontein, South Africa
5
Lee D, Kim Y, Chung Y, Lee D, Seo D, Choi TJ, Lim D, Yoon D, Lee SH. Accuracy of genotype imputation based on reference population size and marker density in Hanwoo cattle. JOURNAL OF ANIMAL SCIENCE AND TECHNOLOGY 2021; 63:1232-1246. [PMID: 34957440 PMCID: PMC8672260 DOI: 10.5187/jast.2021.e117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 11/20/2022]
Abstract
The cattle genome sequence has recently been completed, and commercial single nucleotide polymorphism (SNP) chip panels have since been developed in the animal genome industry. To increase statistical power for detecting quantitative trait loci (QTL), a large number of animals should be genotyped. However, using a high-density chip for many animals increases the genotyping cost. Statistical genotype imputation (from a low-density to a high-density chip) is therefore useful in the animal industry. The purpose of this study was to investigate the effect of reference population size and marker density on imputation accuracy and to suggest an appropriate reference population size for imputation in Hanwoo cattle. A total of 3,821 Hanwoo cattle were divided into reference and validation populations. The reference sets consisted of 50k (38,916) marker data and different population sizes (500, 1,000, 1,500, 2,000, and 3,600). The validation sets consisted of four sets (889 in total) with different marker densities (5k [5,000], 10k [10,000], and 15k [15,000]). Imputation accuracy was calculated by direct comparison of the true and imputed genotypes. When the lowest marker density (5k) was used in the validation set, imputation accuracy ranged from 0.793 to 0.929, depending on the reference population size. When the highest marker density (15k) was used, imputation accuracy ranged from 0.904 to 0.967, again depending on the reference population size. Moreover, the reference population size should be more than 1,000 to obtain at least 88% imputation accuracy in Hanwoo cattle.
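A validation design of this kind, in which a denser panel is thinned to a lower density and the hidden genotypes are later compared with their imputed values, can be sketched as below. The abstract does not say how the 5k, 10k, and 15k marker subsets were chosen, so the evenly spaced selection here is purely an assumption, as are the function names.

```python
from typing import List, Optional, Sequence

def evenly_spaced_indices(n_markers: int, n_keep: int) -> List[int]:
    """Pick n_keep marker indices spread evenly along the marker order."""
    if not 0 < n_keep <= n_markers:
        raise ValueError("need 0 < n_keep <= n_markers")
    # Take the midpoint of each of n_keep equal-width bins.
    return [int((k + 0.5) * n_markers / n_keep) for k in range(n_keep)]

def mask_to_low_density(genotypes: Sequence[int],
                        keep: Sequence[int]) -> List[Optional[int]]:
    """Hide every genotype that is not on the low-density panel."""
    keep_set = set(keep)
    return [g if j in keep_set else None for j, g in enumerate(genotypes)]

# Emulate a 5k panel from the 38,916 markers mentioned in the abstract.
low_density = evenly_spaced_indices(38_916, 5_000)
print(len(low_density), low_density[:3])

# Tiny illustration with six markers thinned to two.
print(mask_to_low_density([0, 1, 2, 1, 0, 2], evenly_spaced_indices(6, 2)))
```

The masked positions (`None`) are the ones an imputation program would fill in; comparing them with the original genotypes gives the accuracy figures reported above.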
Affiliation(s)
- DooHo Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yeongkuk Kim
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Yoonji Chung
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongjae Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongwon Seo
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Tae Jeong Choi
  - National Institute of Animal Science, Cheonan 31000, Korea
- Dajeong Lim
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Wanju 55365, Korea
- Duhak Yoon
  - Department of Animal Science & Biotechnology, Kyungpook National University, Sangju 37224, Korea
- Seung Hwan Lee
  - Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
6
Whole-genome sequencing identifies functional noncoding variation in SEMA3C that cosegregates with dyslexia in a multigenerational family. Hum Genet 2021; 140:1183-1200. [PMID: 34076780 PMCID: PMC8263547 DOI: 10.1007/s00439-021-02289-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/16/2020] [Accepted: 04/27/2021] [Indexed: 12/11/2022]
Abstract
Dyslexia is a common heritable developmental disorder involving impaired reading abilities. Its genetic underpinnings are thought to be complex and heterogeneous, involving common and rare genetic variation. Multigenerational families segregating apparent monogenic forms of language-related disorders can provide useful entrypoints into biological pathways. In the present study, we performed a genome-wide linkage scan in a three-generational family in which dyslexia affects 14 of its 30 members and seems to be transmitted with an autosomal dominant pattern of inheritance. We identified a locus on chromosome 7q21.11 which cosegregated with dyslexia status, with the exception of two cases of phenocopy (LOD = 2.83). Whole-genome sequencing of key individuals enabled the assessment of coding and noncoding variation in the family. Two rare single-nucleotide variants (rs144517871 and rs143835534) within the first intron of the SEMA3C gene cosegregated with the 7q21.11 risk haplotype. In silico characterization of these two variants predicted effects on gene regulation, which we functionally validated for rs144517871 in human cell lines using luciferase reporter assays. SEMA3C encodes a secreted protein that acts as a guidance cue in several processes, including cortical neuronal migration and cellular polarization. We hypothesize that these intronic variants could have a cis-regulatory effect on SEMA3C expression, making a contribution to dyslexia susceptibility in this family.
7
Abstract
Genotype imputation infers missing genotypes in silico using haplotype information from reference samples with genotypes from denser genotyping arrays or sequencing. This approach can confer a number of improvements on genome-wide association studies: it can improve statistical power to detect associations by reducing the number of missing genotypes; it can simplify data harmonization for meta-analyses by improving overlap of genomic variants between differently-genotyped sample sets; and it can increase the overall number and density of genomic variants available for association testing. This article reviews the general concepts behind imputation, describes imputation approaches and methods for various types of genotype data, including family-based data, and identifies web-based resources that can be used in different steps of the imputation process. For practical application, it provides a step-by-step guide to implementation of a two-step imputation process consisting of phasing of the study genotypes and the imputation of reference panel genotypes into the study haplotypes. In addition, this review describes recently developed haplotype reference panel resources and online imputation servers that are capable of remotely and securely implementing an imputation workflow on uploaded genotype array data. © 2019 by John Wiley & Sons, Inc.
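To make the second step of the two-step process concrete (imputing reference-panel alleles into an already phased study haplotype), here is a deliberately simplified toy. It copies missing alleles from the single reference haplotype that best matches the typed positions. Production imputation tools model each haplotype as a mosaic of reference haplotypes with a hidden Markov model, so this sketch only conveys the idea. The function names and the tiny panel are invented for illustration.

```python
from typing import List, Optional, Sequence

def mismatches_on_typed(study: Sequence[Optional[int]], ref: Sequence[int]) -> int:
    """Count allele mismatches at positions where the study haplotype is typed."""
    return sum(1 for s, r in zip(study, ref) if s is not None and s != r)

def impute_from_best_match(study_hap: Sequence[Optional[int]],
                           reference_panel: Sequence[Sequence[int]]) -> List[int]:
    """Fill untyped positions (None) from the closest reference haplotype."""
    if not reference_panel:
        raise ValueError("reference panel is empty")
    best = min(reference_panel, key=lambda ref: mismatches_on_typed(study_hap, ref))
    # Keep observed alleles; borrow the rest from the best-matching haplotype.
    return [r if s is None else s for s, r in zip(study_hap, best)]

# Hypothetical phased reference haplotypes over five markers (0/1 alleles).
panel = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]
# Study haplotype typed at markers 0, 2 and 3 only.
study = [1, None, 0, 1, None]
print(impute_from_best_match(study, panel))
```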
Affiliation(s)
- Adam C Naj
  - Department of Biostatistics, Epidemiology, and Informatics and Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
8
Sul JH, Service SK, Huang AY, Ramensky V, Hwang SG, Teshiba TM, Park Y, Ori APS, Zhang Z, Mullins N, Olde Loohuis LM, Fears SC, Araya C, Araya X, Spesny M, Bejarano J, Ramirez M, Castrillón G, Gomez-Makhinson J, Lopez MC, Montoya G, Montoya CP, Aldana I, Escobar JI, Ospina-Duque J, Kremeyer B, Bedoya G, Ruiz-Linares A, Cantor RM, Molina J, Coppola G, Ophoff RA, Macaya G, Lopez-Jaramillo C, Reus V, Bearden CE, Sabatti C, Freimer NB. Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates. Transl Psychiatry 2020; 10:74. [PMID: 32094344 PMCID: PMC7039961 DOI: 10.1038/s41398-020-0758-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/05/2019] [Revised: 09/24/2019] [Accepted: 11/04/2019] [Indexed: 12/13/2022]
Abstract
Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To identify the roles of rare and common variants in BP, we conducted genetic analyses in 26 pedigrees from Colombia and Costa Rica ascertained for bipolar disorder 1 (BP1), the most severe and heritable form of BP. In these pedigrees, we performed microarray SNP genotyping of 838 individuals and high-coverage whole-genome sequencing of 449 individuals. We compared polygenic risk scores (PRS), estimated using the latest BP1 genome-wide association study (GWAS) summary statistics, between BP1 individuals and related controls. We also evaluated whether BP1 individuals had a higher burden of rare deleterious single-nucleotide variants (SNVs) and rare copy number variants (CNVs) in a set of genes related to BP1. We found that, compared with unaffected relatives, BP1 individuals had higher PRS estimated from BP1 GWAS statistics (P = 0.001 to 0.007) and displayed a modest increase in burdens of rare deleterious SNVs (P = 0.047) and rare CNVs (P = 0.002 to 0.033) in genes related to BP1. We did not observe rare variants segregating in the pedigrees. These results suggest that small-to-moderate effect rare and common variants are more likely to contribute to BP1 risk in these extended pedigrees than a few large-effect rare variants.
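A polygenic risk score of the kind compared above is, in its basic form, a weighted sum of risk-allele dosages with weights taken from GWAS summary statistics. The sketch below shows only that core calculation. The variant identifiers, weights, and dosages are hypothetical, and the study's actual PRS pipeline (variant selection, LD handling, adjustment for relatedness) is not described in the abstract and is not reproduced here.

```python
from typing import Mapping

def polygenic_risk_score(dosages: Mapping[str, float],
                         effect_sizes: Mapping[str, float]) -> float:
    """Sum of (GWAS effect size x risk-allele dosage) over shared variants."""
    shared = sorted(dosages.keys() & effect_sizes.keys())
    return sum(effect_sizes[v] * dosages[v] for v in shared)

# Hypothetical per-variant weights (e.g. log odds ratios) from summary statistics.
weights = {"var1": 0.10, "var2": -0.05, "var3": 0.20}

# Hypothetical dosages (0-2 copies of the effect allele) for two relatives.
affected = {"var1": 2, "var2": 1, "var3": 1}
unaffected = {"var1": 1, "var2": 1, "var3": 0}

print(round(polygenic_risk_score(affected, weights), 3))
print(round(polygenic_risk_score(unaffected, weights), 3))
```

Variants missing from either input are skipped rather than treated as zero dosage; that is a design choice of this sketch, and real pipelines usually impute or mean-fill missing dosages instead.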
Affiliation(s)
- Jae Hoon Sul
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Susan K. Service
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
- Alden Y. Huang
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Vasily Ramensky
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
  - Federal State Institution “National Medical Research Center for Preventive Medicine” of the Ministry of Healthcare of the Russian Federation, Petroverigskiy lane 10, Moscow, 101990 Russia
- Sun-Goo Hwang
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Terri M. Teshiba
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
- YoungJun Park
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Anil P. S. Ori
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
- Zhongyang Zhang
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Niamh Mullins
  - King’s College London, Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, De Crespigny Park, Denmark Hill, London, SE5 8AF UK
  - Pamela Sklar Division of Psychiatric Genomics, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Loes M. Olde Loohuis
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
- Scott C. Fears
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Carmen Araya
  - Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, 11501 Costa Rica
- Xinia Araya
  - Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, 11501 Costa Rica
- Mitzi Spesny
  - Division of Pediatric Pulmonology, Hospital Nacional de Niños, San Jose, Costa Rica
- Julio Bejarano
  - Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, 11501 Costa Rica
- Margarita Ramirez
  - Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, 11501 Costa Rica
- Gabriel Castrillón
  - Instituto de Alta Tecnologia Medica, Medellín, Antioquia, Colombia
  - Department of Neuroradiology, Klinikum rechts der Isar, TUM, Munich, Germany
- Juliana Gomez-Makhinson
  - Grupo de Investigación en Psiquiatría (Research Group in Psychiatry; GIPSI), Departamento de Psiquiatría Facultad de Medicina, Universidad de Antioquia, Medellín, 050011 Colombia
- Maria C. Lopez
  - Grupo de Investigación en Psiquiatría (Research Group in Psychiatry; GIPSI), Departamento de Psiquiatría Facultad de Medicina, Universidad de Antioquia, Medellín, 050011 Colombia
- Gabriel Montoya
  - Grupo de Investigación en Psiquiatría (Research Group in Psychiatry; GIPSI), Departamento de Psiquiatría Facultad de Medicina, Universidad de Antioquia, Medellín, 050011 Colombia
- Claudia P. Montoya
  - Grupo de Investigación en Psiquiatría (Research Group in Psychiatry; GIPSI), Departamento de Psiquiatría Facultad de Medicina, Universidad de Antioquia, Medellín, 050011 Colombia
- Ileana Aldana
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Javier I. Escobar
  - Department of Psychiatry and Family Medicine, Rutgers-Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ 08901 USA
- Jorge Ospina-Duque
  - Grupo de Investigación en Psiquiatría (Research Group in Psychiatry; GIPSI), Departamento de Psiquiatría Facultad de Medicina, Universidad de Antioquia, Medellín, 050011 Colombia
- Barbara Kremeyer
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT UK
- Gabriel Bedoya
  - Laboratory of Molecular Genetics, Institute of Biology, University of Antioquia, Medellín, 050010 Colombia
- Andres Ruiz-Linares
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai, 200438 China
  - Aix Marseille Univ, CNRS, EFS, ADES, Marseille, France
- Rita M. Cantor
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095 USA
- Giovanni Coppola
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Roel A. Ophoff
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095 USA
  - Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, Netherlands
- Gabriel Macaya
  - Cell and Molecular Biology Research Center, Universidad de Costa Rica, San Pedro de Montes de Oca, San José, 11501 Costa Rica
- Carlos Lopez-Jaramillo
  - Grupo de Investigación en Psiquiatría (Research Group in Psychiatry; GIPSI), Departamento de Psiquiatría Facultad de Medicina, Universidad de Antioquia, Medellín, 050011 Colombia
  - Mood Disorders Program, Hospital San Vicente Fundacion, Medellín, 050011 Colombia
- Victor Reus
  - Department of Psychiatry and UCSF Weill Institute for Neurosciences, University of California, San Francisco, CA 94143 USA
- Carrie E. Bearden
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
  - Department of Psychology, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Chiara Sabatti
  - Department of Health Research and Policy, Division of Biostatistics, Stanford University, Stanford, CA 94305 USA
- Nelson B. Freimer
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095 USA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA USA
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA 90095 USA
| |
|
9
|
Abney M, ElSherbiny A. Kinpute: using identity by descent to improve genotype imputation. Bioinformatics 2019; 35:4321-4326. [PMID: 30918937 PMCID: PMC6821425 DOI: 10.1093/bioinformatics/btz221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/23/2018] [Revised: 02/21/2019] [Accepted: 03/26/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Genotype imputation, though generally accurate, often results in many genotypes being poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), it is possible to use this information in combination with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information (due to recent familial relatedness or to distant, unknown ancestors) in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree, and results in substantially improved imputation quality. RESULTS: Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute is designed to use as input both this SSRP and the genotype probabilities output from other LD-based imputation software, and uses a new method to combine the LD imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate where 98 individuals have been sequenced. In half of this sample, whose sequence data was masked, we used Impute2 to perform LD-based imputation and Kinpute was used to obtain higher accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty. AVAILABILITY AND IMPLEMENTATION: Kinpute is an open-source and freely available C++ software package that can be downloaded from. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark Abney
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Aisha ElSherbiny
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| |
|
10
|
Naj AC, Lin H, Vardarajan BN, White S, Lancour D, Ma Y, Schmidt M, Sun F, Butkiewicz M, Bush WS, Kunkle BW, Malamon J, Amin N, Choi SH, Hamilton-Nelson KL, van der Lee SJ, Gupta N, Koboldt DC, Saad M, Wang B, Nato AQ, Sohi HK, Kuzma A, Wang LS, Cupples LA, van Duijn C, Seshadri S, Schellenberg GD, Boerwinkle E, Bis JC, Dupuis J, Salerno WJ, Wijsman EM, Martin ER, DeStefano AL. Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project. Genomics 2019; 111:808-818. [PMID: 29857119 PMCID: PMC6397097 DOI: 10.1016/j.ygeno.2018.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/29/2017] [Revised: 04/03/2018] [Accepted: 05/06/2018] [Indexed: 12/30/2022]
Abstract
The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively, and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
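To illustrate the Mendelian inconsistency check mentioned in this abstract, the following is a minimal sketch, not the ADSP pipeline itself. It assumes bi-allelic genotypes coded as alternate-allele counts (0, 1, 2) and uses the fact that, with only one parent observed, the only detectable inconsistency is a pair of opposite homozygotes. The function names and the `None` coding for missing calls are illustrative assumptions.

```python
from typing import Iterable, Optional, Tuple

Genotype = Optional[int]  # 0, 1, 2 alternate-allele count; None = missing call


def is_mendelian_inconsistent(parent: int, child: int) -> bool:
    """Return True when a parent-offspring pair cannot share an allele.

    For a bi-allelic variant this only happens when one individual is
    homozygous reference (0) and the other homozygous alternate (2).
    """
    return {parent, child} == {0, 2}


def mi_rate(pairs: Iterable[Tuple[Genotype, Genotype]]) -> float:
    """Fraction of fully called parent-offspring pairs that are inconsistent."""
    called = [(p, c) for p, c in pairs if p is not None and c is not None]
    if not called:
        return 0.0
    n_bad = sum(is_mendelian_inconsistent(p, c) for p, c in called)
    return n_bad / len(called)


# Hypothetical genotypes at one variant: (parent, child)
example_pairs = [(0, 2), (0, 1), (2, 2), (None, 1)]
example_rate = mi_rate(example_pairs)  # one inconsistent pair out of three called
```

A per-variant or per-genotype-class rate computed this way could then be compared against a threshold when deciding on filters; the threshold itself is a study-specific choice that the abstract does not state.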
Affiliation(s)
- Adam C Naj
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| | - Honghuang Lin
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Badri N Vardarajan
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
| | - Simon White
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Daniel Lancour
- Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
| | - Yiyi Ma
- Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
| | - Michael Schmidt
- John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Fangui Sun
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Mariusz Butkiewicz
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - William S Bush
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - Brian W Kunkle
- John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - John Malamon
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Najaf Amin
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
| | - Seung Hoan Choi
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Kara L Hamilton-Nelson
- John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Sven J van der Lee
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
| | - Namrata Gupta
- Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA
| | - Daniel C Koboldt
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
| | - Mohamad Saad
- Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Bowen Wang
- Department of Statistics, University of Washington, Seattle, WA, USA
| | - Alejandro Q Nato
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Harkirat K Sohi
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Amanda Kuzma
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
| | - Cornelia van Duijn
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
| | - Sudha Seshadri
- The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Human Genetics Center, University of Texas Health Science Center, Houston, TX, USA
| | - Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
| | - William J Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Ellen M Wijsman
- Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
| | - Eden R Martin
- John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
| | - Anita L DeStefano
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| |
|
11
|
Kunji K, Ullah E, Nato AQ, Wijsman EM, Saad M. GIGI-Quick: a fast approach to impute missing genotypes in genome-wide association family data. Bioinformatics 2019; 34:1591-1593. [PMID: 29267877 DOI: 10.1093/bioinformatics/btx782] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/11/2017] [Accepted: 12/15/2017] [Indexed: 11/12/2022]
Abstract
Summary: Genome-wide association studies have become common over the last ten years, with a shift towards targeting rare variants, especially in pedigree data. Despite lower costs, sequencing for rare variants remains expensive. To have a relatively large sample with acceptable cost, imputation approaches may be used, such as GIGI for pedigree data. GIGI is an imputation method that handles large pedigrees and is particularly good for rare variant imputation. GIGI requires a subset of individuals in a pedigree to be fully sequenced, while other individuals are sequenced only at relevant markers. The imputation will infer the missing genotypes at untyped markers. Running GIGI on large pedigrees for large numbers of markers can be very time consuming. We present GIGI-Quick as a method to efficiently split GIGI's input, run GIGI in parallel and efficiently merge the output to reduce the runtime with the number of cores. This allows obtaining imputation results faster, and therefore all subsequent association analyses. Availability and implementation: GIGI-Quick is open source and publicly available via: https://cse-git.qcri.org/Imputation/GIGI-Quick. Contact: msaad@hbku.edu.qa. Supplementary information: Supplementary data are available at Bioinformatics online.
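The split, run in parallel, and merge pattern described in this abstract can be sketched generically as follows. This is not GIGI-Quick's code and does not call GIGI: `impute_chunk` is a hypothetical placeholder standing in for one imputation run on a subset of markers, and the marker names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_markers(markers: List[str], n_chunks: int) -> List[List[str]]:
    """Split the marker list into at most n_chunks contiguous chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not markers:
        return []
    size = -(-len(markers) // n_chunks)  # ceiling division
    return [markers[i:i + size] for i in range(0, len(markers), size)]


def impute_chunk(chunk: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for one imputation run on a marker subset."""
    return [(marker, "imputed") for marker in chunk]


def run_parallel(markers: List[str], n_workers: int) -> List[Tuple[str, str]]:
    """Run the placeholder on each chunk concurrently and merge in marker order."""
    chunks = split_markers(markers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in submission order, so the merge is ordered.
        parts = list(pool.map(impute_chunk, chunks))
    return [row for part in parts for row in part]


merged = run_parallel(["m1", "m2", "m3", "m4", "m5"], n_workers=2)
```

Threads are used here only to keep the sketch simple; they would suffice if each chunk launched an external process, whereas CPU-bound Python work would need a process pool instead.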
Affiliation(s)
- Khalid Kunji
- Data Analytics Department, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Ehsan Ullah
- Data Analytics Department, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Alejandro Q Nato
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195-7720, USA
| | - Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195-7720, USA
| | - Mohamad Saad
- Data Analytics Department, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| |
|
12
|
Rediscovering the value of families for psychiatric genetics research. Mol Psychiatry 2019; 24:523-535. [PMID: 29955165 PMCID: PMC7028329 DOI: 10.1038/s41380-018-0073-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 11/21/2017] [Revised: 01/11/2018] [Accepted: 03/26/2018] [Indexed: 01/09/2023]
Abstract
As it is likely that both common and rare genetic variation are important for complex disease risk, studies that examine the full range of the allelic frequency distribution should be utilized to dissect the genetic influences on mental illness. The rate limiting factor for inferring an association between a variant and a phenotype is inevitably the total number of copies of the minor allele captured in the studied sample. For rare variation, with minor allele frequencies of 0.5% or less, very large samples of unrelated individuals are necessary to unambiguously associate a locus with an illness. Unfortunately, such large samples are often cost prohibitive. However, by using alternative analytic strategies and studying related individuals, particularly those from large multiplex families, it is possible to reduce the required sample size while maintaining statistical power. We contend that using whole genome sequence (WGS) in extended pedigrees provides a cost-effective strategy for psychiatric gene mapping that complements common variant approaches and WGS in unrelated individuals. This was our impetus for forming the "Pedigree-Based Whole Genome Sequencing of Affective and Psychotic Disorders" consortium. In this review, we provide a rationale for the use of WGS with pedigrees in modern psychiatric genetics research. We begin with a focused review of the current literature, followed by a short history of family-based research in psychiatry. Next, we describe several advantages of pedigrees for WGS research, including power estimates, methods for studying the environment, and endophenotypes. We conclude with a brief description of our consortium and its goals.
|
13
|
Revisit Population-based and Family-based Genotype Imputation. Sci Rep 2019; 9:1800. [PMID: 30755687 PMCID: PMC6372660 DOI: 10.1038/s41598-018-38469-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/29/2018] [Accepted: 12/27/2018] [Indexed: 11/12/2022]
Abstract
Genome-Wide Association (GWA) with population-based imputation (PBI) has been successful in identifying common variants associated with complex diseases; however, much heritability remains to be explained and low frequency variants (LFV) may contribute. To identify LFV, a study of unrelated individuals may no longer be as efficient as a family study, where rare population variants can be frequent in families. Family-based imputation (FBI) provides an opportunity to evaluate LFV. To compare the performance of PBI and FBI, we conducted extensive simulations, generating genotypes using SeqSIMLA from various reference panels for families. We masked genotype information for variants unavailable in Framingham 550 K GWA genotype data in less informative subjects selected by GIGI-Pick. We implemented IMPUTE2 with duoHMM in SHAPEIT (Impute2_duoHMM) for PBI, MERLIN and GIGI for FBI and PedBLIMP for a hybrid approach. In general, FBI in both MERLIN and GIGI outperformed other approaches with imputation accuracy greater than 0.99 for the squared correlation and imputation quality scores (IQS) especially for LFV, although imputation accuracy from MERLIN depends on pedigree splitting for larger families. PBI performed worst with the exception of good imputation accuracy for common variants when a closely ancestry matched reference is used. In summary, linkage disequilibrium (LD) information from large available genotype resources provides good imputation for common variants with well-selected reference panels without requiring densely sequenced data in family members, while imputation of LFV with FBI benefits more from information on inheritance patterns within families yielding better imputation.
|
14
|
Whalen A, Ros-Freixedes R, Wilson DL, Gorjanc G, Hickey JM. Hybrid peeling for fast and accurate calling, phasing, and imputation with sequence data of any coverage in pedigrees. Genet Sel Evol 2018; 50:67. [PMID: 30563452 PMCID: PMC6299538 DOI: 10.1186/s12711-018-0438-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 04/03/2018] [Accepted: 12/11/2018] [Indexed: 12/31/2022]
Abstract
BACKGROUND: In this paper, we extend multi-locus iterative peeling to provide a computationally efficient method for calling, phasing, and imputing sequence data of any coverage in small or large pedigrees. Our method, called hybrid peeling, uses multi-locus iterative peeling to estimate shared chromosome segments between parents and their offspring at a subset of loci, and then uses single-locus iterative peeling to aggregate genomic information across multiple generations at the remaining loci. RESULTS: Using a synthetic dataset, we first analysed the performance of hybrid peeling for calling and phasing genotypes in disconnected families, which contained only a focal individual and its parents and grandparents. Second, we analysed the performance of hybrid peeling for calling and phasing genotypes in the context of a full general pedigree. Third, we analysed the performance of hybrid peeling for imputing whole-genome sequence data to non-sequenced individuals in the population. We found that hybrid peeling substantially increased the number of called and phased genotypes by leveraging sequence information on related individuals. The calling rate and accuracy increased when the full pedigree was used compared to a reduced pedigree of just parents and grandparents. Finally, hybrid peeling accurately imputed whole-genome sequence to non-sequenced individuals. CONCLUSIONS: We believe that this algorithm will enable the generation of low cost and high accuracy whole-genome sequence data in many pedigreed populations. We make this algorithm available as a standalone program called AlphaPeel.
Affiliation(s)
- Andrew Whalen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
| | - Roger Ros-Freixedes
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
| | - David L. Wilson
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
| | - Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
| | - John M. Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, Scotland, UK
| |
|
15
|
Nelson D, Moreau C, de Vriendt M, Zeng Y, Preuss C, Vézina H, Milot E, Andelfinger G, Labuda D, Gravel S. Inferring Transmission Histories of Rare Alleles in Population-Scale Genealogies. Am J Hum Genet 2018; 103:893-906. [PMID: 30526866 DOI: 10.1016/j.ajhg.2018.10.017] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/08/2018] [Accepted: 10/22/2018] [Indexed: 01/06/2023]
Abstract
Learning the transmission history of alleles through a family or population plays an important role in evolutionary, demographic, and medical genetic studies. Most classical models of population genetics have attempted to do so under the assumption that the genealogy of a population is unavailable and that its idiosyncrasies can be described by a small number of parameters describing population size and mate choice dynamics. Large genetic samples have increased sensitivity to such modeling assumptions, and large-scale genealogical datasets become a useful tool to investigate realistic genealogies. However, analyses in such large datasets are often intractable using conventional methods. We present an efficient method to infer transmission paths of rare alleles through population-scale genealogies. Based on backward-time Monte Carlo simulations of genetic inheritance, we use an importance sampling scheme to dramatically speed up convergence. The approach can take advantage of available genotypes of subsets of individuals in the genealogy including haplotype structure as well as information about the mode of inheritance and general prevalence of a mutation or disease in the population. Using a high-quality genealogical dataset of more than three million married individuals in the Quebec founder population, we apply the method to reconstruct the transmission history of chronic atrial and intestinal dysrhythmia (CAID), a rare recessive disease. We identify the most likely early carriers of the mutation and geographically map the expected carrier rate in the present-day French-Canadian population of Quebec.
|
16
|
Ullah E, Mall R, Abbas MM, Kunji K, Nato AQ, Bensmail H, Wijsman EM, Saad M. Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees. Genome Res 2018; 29:125-134. [PMID: 30514702 PMCID: PMC6314157 DOI: 10.1101/gr.236315.118] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/26/2018] [Accepted: 11/30/2018] [Indexed: 01/19/2023]
Abstract
Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
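The two accuracy metrics compared in this abstract can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: genotypes are coded 0, 1, 2, the Imputation Quality Score is taken in its usual chance-corrected form IQS = (Po − Pc) / (1 − Pc), with Po the observed concordance and Pc the concordance expected by chance, and the example genotype vectors are made up.

```python
from collections import Counter
from typing import Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between true and imputed genotypes or dosages."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector is constant
    return (sxy * sxy) / (sxx * syy)


def iqs(true_gt: Sequence[int], imputed_gt: Sequence[int]) -> float:
    """Chance-corrected concordance between true and best-guess genotypes."""
    n = len(true_gt)
    p_obs = sum(t == i for t, i in zip(true_gt, imputed_gt)) / n
    true_counts = Counter(true_gt)
    imp_counts = Counter(imputed_gt)
    p_chance = sum(true_counts[g] * imp_counts[g] for g in (0, 1, 2)) / (n * n)
    if p_chance == 1:
        return float("nan")  # both vectors are the same single genotype
    return (p_obs - p_chance) / (1 - p_chance)


# Hypothetical rare variant: imputing everyone as homozygous reference
truth = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
all_ref = [0] * 10
raw_concordance = sum(t == i for t, i in zip(truth, all_ref)) / len(truth)
corrected = iqs(truth, all_ref)
```

In this made-up example the raw concordance is high even though no carrier was imputed correctly, while the chance-corrected score is zero, which is why concordance alone can be misleading for rare variants.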
Affiliation(s)
- Ehsan Ullah
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Mostafa M Abbas
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Alejandro Q Nato
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195-9460, USA; Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, West Virginia 25755, USA
| | - Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington 98195-9460, USA; Department of Biostatistics, University of Washington, Seattle, Washington 98195-9460, USA
| | - Mohamad Saad
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| |
|
17
|
Nafikov RA, Nato AQ, Sohi H, Wang B, Brown L, Horimoto AR, Vardarajan BN, Barral SM, Tosto G, Mayeux RP, Thornton TA, Blue E, Wijsman EM. Analysis of pedigree data in populations with multiple ancestries: Strategies for dealing with admixture in Caribbean Hispanic families from the ADSP. Genet Epidemiol 2018; 42:500-515. [PMID: 29862559 PMCID: PMC6160322 DOI: 10.1002/gepi.22133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/29/2017] [Revised: 05/04/2018] [Accepted: 05/14/2018] [Indexed: 11/12/2022]
Abstract
Multipoint linkage analysis is an important approach for localizing disease-associated loci in pedigrees. Linkage analysis, however, is sensitive to misspecification of marker allele frequencies. Pedigrees from recently admixed populations are particularly susceptible to this problem because of the challenge of accurately accounting for population structure. Therefore, increasing emphasis on use of multiethnic samples in genetic studies requires reevaluation of best practices, given data currently available. Typical strategies have been to compute allele frequencies from the sample, or to use marker allele frequencies determined by admixture proportions averaged over the entire sample. However, admixture proportions vary among pedigrees and throughout the genome in a family-specific manner. Here, we evaluate several approaches to model admixture in linkage analysis, providing different levels of detail about ancestral origin. To perform our evaluations, for specification of marker allele frequencies, we used data on 67 Caribbean Hispanic admixed families from the Alzheimer's Disease Sequencing Project. Our results show that choice of admixture model has an effect on the linkage analysis results. Variant-specific admixture proportions, computed for individual families, provide the most detailed regional admixture estimates, and, as such, are the most appropriate allele frequencies for linkage analysis. This likely decreases the number of false-positive results, and is straightforward to implement.
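One common way to turn admixture proportions into marker allele frequencies, which may or may not match the exact procedure of this study, is to weight ancestral-population frequencies by the admixture proportions: f = Σ_k a_k · f_k. The sketch below assumes that form; the ancestry labels and all numbers are made up for illustration.

```python
from typing import Dict


def admixed_allele_frequency(
    proportions: Dict[str, float],
    ancestral_freqs: Dict[str, float],
) -> float:
    """Weight ancestral allele frequencies by admixture proportions.

    proportions: ancestry label -> admixture proportion (must sum to 1)
    ancestral_freqs: ancestry label -> allele frequency in that ancestry
    """
    if set(proportions) != set(ancestral_freqs):
        raise ValueError("ancestry labels must match")
    if abs(sum(proportions.values()) - 1.0) > 1e-6:
        raise ValueError("admixture proportions must sum to 1")
    return sum(proportions[k] * ancestral_freqs[k] for k in proportions)


# Hypothetical family-specific proportions at one variant
family_props = {"AFR": 0.2, "EUR": 0.5, "NAM": 0.3}
# Hypothetical ancestral frequencies of the same allele
ancestral = {"AFR": 0.10, "EUR": 0.40, "NAM": 0.20}
family_freq = admixed_allele_frequency(family_props, ancestral)
```

Calling the function with sample-wide average proportions versus family- and variant-specific proportions gives the two ends of the range of models that the abstract contrasts.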
Affiliation(s)
- Rafael A Nafikov
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
| | - Alejandro Q Nato
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
| | - Harkirat Sohi
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
| | - Bowen Wang
- Department of Statistics, University of Washington, Seattle, Washington
| | - Lisa Brown
- Department of Biostatistics, University of Washington, Seattle, Washington
| | - Andrea R Horimoto
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
| | | | - Sandra M Barral
- Department of Neurology, Columbia University, New York, New York
| | - Giuseppe Tosto
- Department of Neurology, Columbia University, New York, New York
| | - Richard P Mayeux
- Department of Neurology, Columbia University, New York, New York
| | - Timothy A Thornton
- Department of Biostatistics, University of Washington, Seattle, Washington
| | - Elizabeth Blue
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington
| | - Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington; Department of Biostatistics, University of Washington, Seattle, Washington
| |
|
18
|
Zheng C, Boer MP, van Eeuwijk FA. Accurate Genotype Imputation in Multiparental Populations from Low-Coverage Sequence. Genetics 2018; 210:71-82. [PMID: 30045858 PMCID: PMC6116951 DOI: 10.1534/genetics.118.300885] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/06/2018] [Accepted: 07/21/2018] [Indexed: 11/18/2022]
Abstract
Many different types of multiparental populations have recently been produced to increase genetic diversity and resolution in QTL mapping. Low-coverage, genotyping-by-sequencing (GBS) technology has become a cost-effective tool in these populations, despite large amounts of missing data in offspring and founders. In this work, we present a general statistical framework for genotype imputation in such experimental crosses from low-coverage GBS data. Generalizing a previously developed hidden Markov model for calculating ancestral origins of offspring DNA, we present an imputation algorithm that does not require parental data and that is applicable to bi- and multiparental populations. Our imputation algorithm allows heterozygosity of parents and offspring as well as error correction in observed genotypes. Further, our approach can combine imputation and genotype calling from sequencing reads, and it also applies to called genotypes from SNP array data. We evaluate our imputation algorithm by simulated and real data sets in four different types of populations: the F2, the advanced intercross recombinant inbred lines, the multiparent advanced generation intercross, and the cross-pollinated population. Because our approach uses marker data and population design information efficiently, the comparisons with previous approaches show that our imputation is accurate at even very low ([Formula: see text]) sequencing depth, in addition to having accurate genotype phasing and error detection.
Affiliation(s)
- Chaozhi Zheng
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
| | - Martin P Boer
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
| | - Fred A van Eeuwijk
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
| |
|
19
|
Miranda AM, Herman M, Cheng R, Nahmani E, Barrett G, Micevska E, Fontaine G, Potier MC, Head E, Schmitt FA, Lott IT, Jiménez-Velázquez IZ, Antonarakis SE, Di Paolo G, Lee JH, Hussaini SA, Marquer C. Excess Synaptojanin 1 Contributes to Place Cell Dysfunction and Memory Deficits in the Aging Hippocampus in Three Types of Alzheimer's Disease. Cell Rep 2018; 23:2967-2975. [PMID: 29874583 PMCID: PMC6040810 DOI: 10.1016/j.celrep.2018.05.011] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 12/18/2017] [Revised: 03/01/2018] [Accepted: 05/02/2018] [Indexed: 12/11/2022]
Abstract
The phosphoinositide phosphatase synaptojanin 1 (SYNJ1) is a key regulator of synaptic function. We first tested whether SYNJ1 contributes to phenotypic variations in familial Alzheimer's disease (FAD) and show that SYNJ1 polymorphisms are associated with age of onset in both early- and late-onset human FAD cohorts. We then interrogated whether SYNJ1 levels could directly affect memory. We show that increased SYNJ1 levels in autopsy brains from adults with Down syndrome (DS/AD) are inversely correlated with synaptophysin levels, a direct readout of synaptic integrity. We further report age-dependent cognitive decline in a mouse model overexpressing murine Synj1 to the levels observed in human sporadic AD, triggered through hippocampal hyperexcitability and defects in the spatial reproducibility of place fields. Taken together, our findings suggest that SYNJ1 contributes to memory deficits in the aging hippocampus in all forms of AD.
Affiliation(s)
- Andre M Miranda
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA; Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, 4710-057 Braga, Portugal; ICVS/3B's, PT Government Associate Laboratory, 4806-909 Braga/Guimarães, Portugal
- Mathieu Herman
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
- Rong Cheng
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; G. H. Sergievsky Center, Columbia University Medical Center, New York, NY 10032, USA
- Eden Nahmani
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
- Geoffrey Barrett
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
- Elizabeta Micevska
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
- Gaelle Fontaine
- Sorbonne Universités, UPMC Univ Paris 06, Inserm U1127, CNRS UMR7225, ICM, 75013 Paris, France
- Marie-Claude Potier
- Sorbonne Universités, UPMC Univ Paris 06, Inserm U1127, CNRS UMR7225, ICM, 75013 Paris, France
- Elizabeth Head
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40536-0230, USA; Department of Pharmacology & Nutritional Sciences, University of Kentucky, Lexington, KY 40506, USA
- Frederick A Schmitt
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY 40536-0230, USA; Department of Neurology, University of Kentucky, Lexington, KY 40506, USA
- Ira T Lott
- Department of Physiology, University of Kentucky, Lexington, KY 40506, USA; Department of Pediatrics and Neurology, School of Medicine, University of California, Irvine (UCI), Orange, CA 92668, USA
- Stylianos E Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, 1211 Geneva, Switzerland
- Gilbert Di Paolo
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
- Joseph H Lee
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; G. H. Sergievsky Center, Columbia University Medical Center, New York, NY 10032, USA; Departments of Epidemiology and Neurology, Columbia University Medical Center, New York, NY 10032, USA
- S Abid Hussaini
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
- Catherine Marquer
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032, USA
|
20
|
Torkamaneh D, Boyle B, Belzile F. Efficient genome-wide genotyping strategies and data integration in crop plants. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2018; 131:499-511. [PMID: 29352324 DOI: 10.1007/s00122-018-3056-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 10/04/2017] [Accepted: 01/12/2018] [Indexed: 05/21/2023]
Abstract
Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications from genome-wide analysis to routine screening with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time and at low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenge facing breeders and geneticists today is how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used both to fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.
Affiliation(s)
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, Canada
- Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, Canada
- François Belzile
- Département de Phytologie, Université Laval, Québec City, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, Canada
|
21
|
Herzig AF, Nutile T, Babron MC, Ciullo M, Bellenguez C, Leutenegger AL. Strategies for phasing and imputation in a population isolate. Genet Epidemiol 2018; 42:201-213. [PMID: 29319195 DOI: 10.1002/gepi.22109] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/26/2017] [Revised: 11/16/2017] [Accepted: 11/16/2017] [Indexed: 11/05/2022]
Abstract
In the search for genetic associations with complex traits, population isolates offer the advantage of reduced genetic and environmental heterogeneity. In addition, cost-efficient next-generation association approaches have been proposed in these populations where only a subsample of representative individuals is sequenced and then genotypes are imputed into the rest of the population. Gene mapping in such populations thus requires high-quality genetic imputation and preliminary phasing. To identify an effective study design, we compare by simulation a range of phasing and imputation software and strategies. We simulated 1,115,604 variants on chromosome 10 for 477 members of the large complex pedigree of Campora, a village within the established isolate of Cilento in southern Italy. We assessed the phasing performance of the identity-by-descent-based software ALPHAPHASE and SLRP, the LD-based software SHAPEIT2, SHAPEIT3, and BEAGLE, and the new software EAGLE, which combines both methodologies. For imputation we compared IMPUTE2, IMPUTE4, MINIMAC3, BEAGLE, and the new software PBWT. Genotyping errors and missing genotypes were simulated to observe their effects on the performance of each software. Highly accurate phased data were achieved by all software, with SHAPEIT2, SHAPEIT3, and EAGLE2 providing the most accurate results. MINIMAC3, IMPUTE4, and IMPUTE2 all performed strongly as imputation software, and our study highlights the considerable gain in imputation accuracy provided by a genome-sequenced reference panel specific to the population isolate.
Affiliation(s)
- Anthony Francis Herzig
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France; Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Teresa Nutile
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- Marie-Claude Babron
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France; Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Marina Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
- Céline Bellenguez
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France; Institut Pasteur de Lille, Lille, France; Université de Lille, U1167-Excellence Laboratory LabEx DISTALZ, Lille, France
- Anne-Louise Leutenegger
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France; Inserm, U946, Genetic Variation and Human Diseases, Paris, France
|
22
|
Tissier R, Tsonaka R, Mooijaart SP, Slagboom E, Houwing-Duistermaat JJ. Secondary phenotype analysis in ascertained family designs: application to the Leiden longevity study. Stat Med 2017; 36:2288-2301. [PMID: 28303589 PMCID: PMC5485037 DOI: 10.1002/sim.7281] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/29/2016] [Revised: 02/17/2017] [Accepted: 02/20/2017] [Indexed: 01/14/2023]
Abstract
The case-control design is often used to test associations between the case-control status and genetic variants. In addition to this primary phenotype, a number of additional traits, known as secondary phenotypes, are routinely recorded, and typically, associations between genetic factors and these secondary traits are studied too. Analysing secondary phenotypes in case-control studies may lead to biased genetic effect estimates, especially when the marker tested is associated with the primary phenotype and when the primary and secondary phenotypes tested are correlated. Several methods have been proposed in the literature to overcome the problem, but they are limited to case-control studies and not directly applicable to more complex designs, such as multiple-cases family studies. A proper secondary phenotype analysis, in this case, is complicated by the within-family correlations on top of the biased sampling design. We propose a novel approach to accommodate the ascertainment process while explicitly modelling the familial relationships. Our approach pairs existing methods for mixed-effects models with the retrospective likelihood framework and uses a multivariate probit model to capture the association between the mixed-type primary and secondary phenotypes. To examine the efficiency and bias of the estimates, we performed simulations under several scenarios for the association between the primary phenotype, secondary phenotype and genetic markers. We illustrate the method by analysing the association between triglyceride levels and glucose (secondary phenotypes) and genetic markers from the Leiden Longevity Study, a multiple-cases family study that investigates longevity. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Affiliation(s)
- Renaud Tissier
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, Leiden, The Netherlands
- Roula Tsonaka
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, Leiden, The Netherlands
- Simon P Mooijaart
- Department of Gerontology and Geriatrics, Leiden University Medical Centre, Leiden, The Netherlands
- Eline Slagboom
- Department of Molecular Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands
- Jeanine J Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, Leiden, The Netherlands; Department of Statistics, University of Leeds, U.K
|
24
|
Woodbury-Smith M, Bilder DA, Morgan J, Jerominski L, Darlington T, Dyer T, Paterson AD, Coon H. Combined genome-wide linkage and targeted association analysis of head circumference in autism spectrum disorder families. J Neurodev Disord 2017; 9:5. [PMID: 28289475 PMCID: PMC5304400 DOI: 10.1186/s11689-017-9187-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/26/2016] [Accepted: 01/20/2017] [Indexed: 11/24/2022]
Abstract
Background: It has long been recognized that there is an association between enlarged head circumference (HC) and autism spectrum disorder (ASD), but the genetics of HC in ASD is not well understood. In order to investigate the genetic underpinning of HC in ASD, we undertook a genome-wide linkage study of HC, followed by association analysis targeted at the linkage signal, in a sample of 67 extended pedigrees with ASD. Methods: HC measurements on members of 67 multiplex ASD extended pedigrees were used as a quantitative trait in a genome-wide linkage analysis. The Illumina 6K SNP linkage panel was used, and analyses were carried out using the variance components model implemented in SOLAR. Loci identified in this way formed the target for subsequent association analysis using the Illumina OmniExpress chip and imputed genotypes. A modification of the qTDT was used as implemented in SOLAR. Results: We identified a linkage signal spanning 6p21.31 to 6p22.2 (maximum LOD = 3.4). Although targeted association did not find evidence of association with any SNP overall, in one family with the strongest evidence of linkage, there was evidence for association (rs17586672, p = 1.72E−07). Conclusions: Although this region does not overlap with ASD linkage signals in these same samples, it has been associated with other psychiatric risk, including ADHD, developmental dyslexia, schizophrenia, specific language impairment, and juvenile bipolar disorder. The genome-wide significant linkage signal represents the first reported observation of a potential quantitative trait locus for HC in ASD and may be relevant in the context of complex multivariate risk likely leading to ASD.
Affiliation(s)
- M Woodbury-Smith
- Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, ON Canada; Program in Genetics and Genome Biology, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON Canada; St Joseph's Healthcare, West 5th Campus, 100 West 5th Street, Hamilton, ON Canada
- D A Bilder
- Department of Psychiatry, University of Utah, Salt Lake City, UT USA
- J Morgan
- Department of Psychiatry, University of Utah, Salt Lake City, UT USA
- L Jerominski
- Department of Psychiatry, University of Utah, Salt Lake City, UT USA
- T Darlington
- Department of Psychiatry, University of Utah, Salt Lake City, UT USA
- T Dyer
- University of Texas Rio Grande Valley School of Medicine and South Texas Diabetes and Obesity Institute, Harlingen, TX USA
- A D Paterson
- Program in Genetics and Genome Biology, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON Canada; Division of Epidemiology and Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON Canada
- H Coon
- Department of Psychiatry, University of Utah, Salt Lake City, UT USA
|
25
|
Saad M, Nato AQ, Grimson FL, Lewis SM, Brown LA, Blue EM, Thornton TA, Thompson EA, Wijsman EM. Identity-by-descent estimation with population- and pedigree-based imputation in admixed family data. BMC Proc 2016; 10:295-301. [PMID: 27980652 PMCID: PMC5133511 DOI: 10.1186/s12919-016-0046-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 05/29/2023]
Abstract
Background: In the past few years, imputation approaches have been mainly used in population-based designs of genome-wide association studies, although both family- and population-based imputation methods have been proposed. With the recent surge of family-based designs, family-based imputation has become more important. Imputation methods for both designs are based on identity-by-descent (IBD) information. Apart from imputation, the use of IBD information is also common for several types of genetic analysis, including pedigree-based linkage analysis. Methods: We compared the performance of several family- and population-based imputation methods in large pedigrees provided by Genetic Analysis Workshop 19 (GAW19). We also evaluated the performance of a new IBD mapping approach that we propose, which combines IBD information from known pedigrees with information from unrelated individuals. Results: Different combinations of the imputation methods have varied imputation accuracies. Moreover, we showed gains from the use of both known pedigrees and unrelated individuals with our IBD mapping approach over the use of known pedigrees only. Conclusions: Our results represent accuracies of different combinations of imputation methods that may be useful for data sets similar to the GAW19 pedigree data. Our IBD mapping approach, which uses both known pedigree and unrelated individuals, performed better than classical linkage analysis.
Affiliation(s)
- Mohamad Saad
- Department of Biostatistics, University of Washington, Seattle, WA USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Alejandro Q Nato
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Fiona L Grimson
- Department of Statistics, University of Washington, Seattle, WA USA
- Steven M Lewis
- Department of Statistics, University of Washington, Seattle, WA USA
- Lisa A Brown
- Department of Biostatistics, University of Washington, Seattle, WA USA
- Elizabeth M Blue
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Ellen M Wijsman
- Department of Biostatistics, University of Washington, Seattle, WA USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA; Department of Genome Sciences, University of Washington, Seattle, WA USA
|
26
|
Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. Am J Hum Genet 2016; 99:846-859. [PMID: 27666371 DOI: 10.1016/j.ajhg.2016.08.015] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/13/2016] [Accepted: 08/17/2016] [Indexed: 11/24/2022]
Abstract
Recently, multiple studies have performed whole-exome or whole-genome sequencing to identify groups of rare variants associated with complex traits and diseases. They have primarily utilized case-control study designs that often require thousands of individuals to reach acceptable statistical power. Family-based studies can be more powerful because a rare variant can be enriched in an extended pedigree and segregate with the phenotype. Although many methods have been proposed for using family data to discover rare variants involved in a disease, a majority of them focus on a specific pedigree structure and are designed to analyze either binary or continuously measured outcomes. In this article, we propose RareIBD, a general and powerful approach to identifying rare variants involved in disease susceptibility. Our method can be applied to large extended families of arbitrary structure, including pedigrees with only affected individuals. The method accommodates both binary and quantitative traits. A series of simulation experiments suggest that RareIBD is a powerful test that outperforms existing approaches. In addition, our method accounts for individuals in top generations, which are not usually genotyped in extended families. In contrast to available statistical tests, RareIBD generates accurate p values even when genetic data from these individuals are missing. We applied RareIBD, as well as other methods, to two extended family datasets generated by different genotyping technologies and representing different ethnicities. The analysis of real data confirmed that RareIBD is the only method that properly controls type I error.
|
27
|
Ristov S, Brajkovic V, Cubric-Curik V, Michieli I, Curik I. MaGelLAn 1.0: a software to facilitate quantitative and population genetic analysis of maternal inheritance by combination of molecular and pedigree information. Genet Sel Evol 2016; 48:65. [PMID: 27613390 PMCID: PMC5018160 DOI: 10.1186/s12711-016-0242-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/04/2016] [Accepted: 08/29/2016] [Indexed: 11/23/2022]
Abstract
Background: Identification of genes or even nucleotides that are responsible for quantitative and adaptive trait variation is a difficult task due to the complex interdependence between a large number of genetic and environmental factors. The polymorphism of the mitogenome is one of the factors that can contribute to quantitative trait variation. However, the effects of the mitogenome have not been comprehensively studied, since large numbers of mitogenome sequences and recorded phenotypes are required to reach adequate power of analysis. Current research in our group focuses on acquiring the necessary mitochondrial sequence information and analysing its influence on the phenotype of a quantitative trait. To facilitate these tasks we have produced software for processing pedigrees that is optimised for maternal lineage analysis. Results: We present MaGelLAn 1.0 (maternal genealogy lineage analyser), a suite of four Python scripts (modules) that is designed to facilitate the analysis of the impact of mitogenome polymorphism on quantitative trait variation by combining molecular and pedigree information. MaGelLAn 1.0 is primarily used to: (1) optimise the sampling strategy for molecular analyses; (2) identify and correct pedigree inconsistencies; and (3) identify maternal lineages and assign the corresponding mitogenome sequences to all individuals in the pedigree, this information being used as input to any of the standard software for quantitative genetic (association) analysis. In addition, MaGelLAn 1.0 allows computing the mitogenome (maternal) effective population sizes and probability of mitogenome (maternal) identity that are useful for conservation management of small populations. Conclusions: MaGelLAn is the first tool for pedigree analysis that focuses on quantitative genetic analyses of mitogenome data. It is designed to significantly reduce the effort in handling and preparing large pedigrees for processing the information linked to maternal lines. The software source code, along with the manual and the example files, can be downloaded at http://lissp.irb.hr/software/magellan-1-0/ and https://github.com/sristov/magellan.
Affiliation(s)
- Strahil Ristov
- Ruđer Bošković Institute, Bijenička cesta 54, 10000, Zagreb, Croatia
- Vladimir Brajkovic
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000, Zagreb, Croatia
- Vlatka Cubric-Curik
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000, Zagreb, Croatia
- Ivan Michieli
- Ruđer Bošković Institute, Bijenička cesta 54, 10000, Zagreb, Croatia
- Ino Curik
- Faculty of Agriculture, University of Zagreb, Svetošimunska cesta 25, 10000, Zagreb, Croatia
|
28
|
Bimber BN, Raboin MJ, Letaw J, Nevonen KA, Spindel JE, McCouch SR, Cervera-Juanes R, Spindel E, Carbone L, Ferguson B, Vinson A. Whole-genome characterization in pedigreed non-human primates using genotyping-by-sequencing (GBS) and imputation. BMC Genomics 2016; 17:676. [PMID: 27558348 PMCID: PMC4997765 DOI: 10.1186/s12864-016-2966-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/01/2016] [Accepted: 07/22/2016] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still in its infancy. Whole-genome sequence (WGS) data in large pedigreed macaque colonies could provide substantial experimental power for genetic discovery, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for 30X WGS, followed by low-cost genotyping-by-sequencing (GBS) at 30X on the remaining macaques in order to generate sparse genotype data at high accuracy. Dense variants from the selected macaques with WGS data are then imputed into macaques having only sparse GBS data, resulting in dense genome-wide genotypes throughout the pedigree. RESULTS: We developed GBS for the macaque genome using a digestion with PstI, followed by sequencing of size-selected fragments at 30X coverage. From GBS sequence data collected on all individuals in a 16-member pedigree, we characterized high-confidence genotypes at 22,455 single nucleotide variant (SNV) sites that were suitable for guiding imputation of dense sequence data from WGS. To characterize dense markers for imputation, we performed WGS at 30X coverage on nine of the 16 individuals, yielding 10,193,425 high-confidence SNVs. To validate the use of GBS data for facilitating imputation, we initially focused on chromosome 19 as a test case, using an optimized panel of 833 sparse, evenly-spaced markers from GBS and 5,010 dense markers from WGS. Using the method of "Genotype Imputation Given Inheritance" (GIGI), we evaluated the effects on imputation accuracy of 3 different strategies for selecting individuals for WGS, including 1) using "GIGI-Pick" to select the most informative individuals, 2) using the most recent generation, or 3) using founders only. We also evaluated the effects on imputation accuracy of using from 1 to 9 WGS individuals for imputation. We found that the GIGI-Pick algorithm for selection of WGS individuals outperformed common heuristic approaches, and that genotype numbers and accuracy improved very little when using >5 WGS individuals for imputation. Informed by our findings, we used 4 macaques with WGS data to impute variants at up to 7,655,491 sites spanning all 20 autosomes in the 12 remaining macaques, based on their GBS genotypes at only 17,158 loci. Using a strict confidence threshold, we imputed an average of 3,680,238 variants per individual at >99% accuracy, or an average of 4,458,883 variants per individual at a more relaxed threshold, yielding >97% accuracy. CONCLUSIONS: We conclude that an optimal tradeoff between genotype accuracy, number of imputed genotypes, and overall cost exists at the ratio of one individual selected for WGS using the GIGI-Pick algorithm, per 3-5 relatives selected for GBS. This approach makes feasible the collection of accurate, dense genome-wide sequence data in large pedigreed macaque cohorts without the need for more expensive WGS data on all individuals.
Affiliation(s)
- Benjamin N Bimber
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Michael J Raboin
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- John Letaw
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Kimberly A Nevonen
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Jennifer E Spindel
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, USA
- Susan R McCouch
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, USA
- Rita Cervera-Juanes
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Eliot Spindel
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Lucia Carbone
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Betsy Ferguson
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
- Amanda Vinson
- Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA; Oregon Health & Science University, Portland, OR, USA
|
29
|
Chung RH, Tsai WY, Kang CY, Yao PJ, Tsai HJ, Chen CH. FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies. PLoS Comput Biol 2016; 12:e1004980. [PMID: 27272119 PMCID: PMC4894624 DOI: 10.1371/journal.pcbi.1004980] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/21/2015] [Accepted: 05/12/2016] [Indexed: 11/18/2022]
Abstract
In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.
Affiliation(s)
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Wei-Yun Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Chen-Yu Kang
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Po-Ju Yao
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Hui-Ju Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Department of Public Health, China Medical University, Taichung, Taiwan
- Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Chia-Hsiang Chen
- Department of Psychiatry, Chang Gung Memorial Hospital-Linkou, Gueishan, Taoyuan, Taiwan
- Department and Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
|
30
|
Abstract
Participants in the family-based analysis group at Genetic Analysis Workshop 19 addressed diverse topics, all of which used the family data. Topics addressed included questions of study design and data quality control (QC), genotype imputation to augment available sequence data, and linkage and/or association analyses. Results show that pedigree-based tests that are sensitive to genotype error may be useful for QC. Imputation quality improved with inclusion of small amounts of pedigree information used to phase the data in evaluation of 5 commonly used approaches for imputation in samples of (typically) unrelated subjects. It improved still further when pedigree-based imputation using larger pedigrees was also added. An important distinction was made between methods that do versus do not make use of Mendelian transmission in pedigrees, because this serves as a key difference between underlying models and assumptions. Methods that model relatedness generally had higher power in association testing than did analyses that carry out testing in the presence of a transmission model, but this may reflect details of implementation and/or ability of more general methods to jointly include data from larger pedigrees. In either case, for single nucleotide polymorphism-set approaches, weights that incorporate information on functional effects may be more useful than those that are based only on allele frequencies. The overall results demonstrate that family data continue to provide important information in the search for trait loci.
Affiliation(s)
- Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98195, USA.
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA.
|
31
|
Nato AQ, Chapman NH, Sohi HK, Nguyen HD, Brkanac Z, Wijsman EM. PBAP: a pipeline for file processing and quality control of pedigree data with dense genetic markers. Bioinformatics 2015; 31:3790-8. [PMID: 26231429 PMCID: PMC4668752 DOI: 10.1093/bioinformatics/btv444] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/02/2015] [Revised: 07/07/2015] [Accepted: 07/25/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Huge genetic datasets with dense marker panels are now common. With the availability of sequence data and recognition of importance of rare variants, smaller studies based on pedigrees are again also common. Pedigree-based samples often start with a dense marker panel, a subset of which may be used for linkage analysis to reduce computational burden and to limit linkage disequilibrium between single-nucleotide polymorphisms (SNPs). Programs attempting to select markers for linkage panels exist but lack flexibility.
RESULTS: We developed a pedigree-based analysis pipeline (PBAP) suite of programs geared towards SNPs and sequence data. PBAP performs quality control, marker selection and file preparation. PBAP sets up files for MORGAN, which can handle analyses for small and large pedigrees, typically human, and results can be used with other programs and for downstream analyses. We evaluate and illustrate its features with two real datasets.
AVAILABILITY AND IMPLEMENTATION: PBAP scripts may be downloaded from http://faculty.washington.edu/wijsman/software.shtml.
CONTACT: wijsman@uw.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hiep D Nguyen
- Division of Medical Genetics, Department of Medicine
- Ellen M Wijsman
- Division of Medical Genetics, Department of Medicine, Department of Biostatistics and Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
32
|
Chapman NH, Nato AQ, Bernier R, Ankenman K, Sohi H, Munson J, Patowary A, Archer M, Blue EM, Webb SJ, Coon H, Raskind WH, Brkanac Z, Wijsman EM. Whole exome sequencing in extended families with autism spectrum disorder implicates four candidate genes. Hum Genet 2015; 134:1055-68. [PMID: 26204995 PMCID: PMC4578871 DOI: 10.1007/s00439-015-1585-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/12/2015] [Accepted: 07/11/2015] [Indexed: 12/26/2022]
Abstract
Autism spectrum disorders (ASDs) are a group of neurodevelopmental disorders, characterized by impairment in communication and social interactions, and by repetitive behaviors. ASDs are highly heritable, and estimates of the number of risk loci range from hundreds to >1000. We considered 7 extended families (size 12-47 individuals), each with ≥3 individuals affected by ASD. All individuals were genotyped with dense SNP panels. A small subset of each family was typed with whole exome sequence (WES). We used a 3-step approach for variant identification. First, we used family-specific parametric linkage analysis of the SNP data to identify regions of interest. Second, we filtered variants in these regions based on frequency and function, obtaining exactly 200 candidates. Third, we compared two approaches to narrowing this list further. We used information from the SNP data to impute exome variant dosages into those without WES. We regressed affected status on variant allele dosage, using pedigree-based kinship matrices to account for relationships. The p value for the test of the null hypothesis that variant allele dosage is unrelated to phenotype was used to indicate strength of evidence supporting the variant. A cutoff of p = 0.05 gave 28 variants. As an alternative third filter, we required Mendelian inheritance in those with WES, resulting in 70 variants. The imputation- and association-based approach was effective. We identified four strong candidate genes for ASD (SEZ6L, HISPPD1, FEZF1, SAMD11), all of which have been previously implicated in other studies, or have a strong biological argument for their relevance.
Affiliation(s)
- Nicola H Chapman
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Raphael Bernier
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Katy Ankenman
- Department of Psychiatry, University of California, San Francisco, CA, USA
- Harkirat Sohi
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Jeff Munson
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Center on Child Development and Disability, University of Washington, Seattle, WA, USA
- Ashok Patowary
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Marilyn Archer
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Elizabeth M Blue
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Sara Jane Webb
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Center on Child Development and Disability, University of Washington, Seattle, WA, USA
- Hilary Coon
- Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
- Department of Psychiatry, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Wendy H Raskind
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Zoran Brkanac
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Ellen M Wijsman
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- University of Washington, University of Washington Tower, T15, 4333 Brooklyn Ave, NE, BOX 359460, Seattle, WA, 98195-9460, USA
|
33
|
Gribble MO, Voruganti VS, Cole SA, Haack K, Balakrishnan P, Laston SL, Tellez-Plaza M, Francesconi KA, Goessler W, Umans JG, Thomas DC, Gilliland F, North KE, Franceschini N, Navas-Acien A. Linkage Analysis of Urine Arsenic Species Patterns in the Strong Heart Family Study. Toxicol Sci 2015. [PMID: 26209557 DOI: 10.1093/toxsci/kfv164] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
Arsenic toxicokinetics are important for disease risks in exposed populations, but genetic determinants are not fully understood. We examined urine arsenic species patterns measured by HPLC-ICPMS among 2189 Strong Heart Study participants 18 years of age and older with data on ~400 genome-wide microsatellite markers spaced ~10 cM and arsenic speciation (683 participants from Arizona, 684 from Oklahoma, and 822 from North and South Dakota). We logit-transformed % arsenic species (% inorganic arsenic, %MMA, and %DMA) and also conducted principal component analyses of the logit % arsenic species. We used inverse-normalized residuals from multivariable-adjusted polygenic heritability analysis for multipoint variance components linkage analysis. We also examined the contribution of polymorphisms in the arsenic metabolism gene AS3MT via conditional linkage analysis. We localized a quantitative trait locus (QTL) on chromosome 10 (LOD 4.12 for %MMA, 4.65 for %DMA, and 4.84 for the first principal component of logit % arsenic species). This peak was partially but not fully explained by measured AS3MT variants. We also localized a QTL for the second principal component of logit % arsenic species on chromosome 5 (LOD 4.21) that was not evident from considering % arsenic species individually. Some other loci were suggestive or significant for 1 geographical area but not overall across all areas, indicating possible locus heterogeneity. This genome-wide linkage scan suggests genetic determinants of arsenic toxicokinetics to be identified by future fine-mapping, and illustrates the utility of principal component analysis as a novel approach that considers % arsenic species jointly.
Affiliation(s)
- Matthew O Gribble
- Department of Preventive Medicine, University of Southern California, Los Angeles, California
- Venkata Saroja Voruganti
- Department of Nutrition, University of North Carolina, Chapel Hill, North Carolina; UNC Nutrition Research Institute, University of North Carolina at Chapel Hill, Kannapolis, North Carolina
- Shelley A Cole
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas
- Karin Haack
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas
- Poojitha Balakrishnan
- Department of Environmental Health Sciences, Johns Hopkins University, Baltimore, Maryland; Department of Epidemiology, Johns Hopkins Medical Institutions, Baltimore, Maryland
- Sandra L Laston
- South Texas Diabetes and Obesity Institute, University of Texas Health Science Center, San Antonio-Regional Academic Health Center, Brownsville, Texas
- Maria Tellez-Plaza
- Department of Environmental Health Sciences, Johns Hopkins University, Baltimore, Maryland; Biomedical Research Institute, Hospital Clinic de Valencia-INCLIVA, Valencia, Spain
- Kevin A Francesconi
- Institute of Chemistry-Analytical Chemistry, University of Graz, Graz, Austria
- Walter Goessler
- Institute of Chemistry-Analytical Chemistry, University of Graz, Graz, Austria
- Jason G Umans
- Georgetown-Howard Universities Center for Clinical and Translational Science, Washington, District of Columbia; MedStar Health Research Institute, Hyattsville, Maryland
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, California
- Frank Gilliland
- Department of Preventive Medicine, University of Southern California, Los Angeles, California
- Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina
- Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina
- Ana Navas-Acien
- Department of Environmental Health Sciences, Johns Hopkins University, Baltimore, Maryland; Department of Epidemiology, Johns Hopkins Medical Institutions, Baltimore, Maryland; Welch Center for Prevention, Epidemiology and Clinical Research, Johns Hopkins Medical Institutions, Baltimore, Maryland; Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, Maryland
|
34
|
Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 2015; 11:e1005271. [PMID: 26043085 PMCID: PMC4456389 DOI: 10.1371/journal.pgen.1005271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/28/2014] [Accepted: 05/12/2015] [Indexed: 12/23/2022]
Abstract
Sequencing family DNA samples provides an attractive alternative to population based designs to identify rare variants associated with human disease due to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.
To identify disease variants that occur less frequently in population, sequencing families in which multiple individuals are affected is more powerful due to the enrichment of causal variants. An important step in such studies is to infer individual genotypes from sequencing data. Existing methods do not utilize full familial transmission information and therefore result in reduced accuracy of inferred genotypes. In this study we describe a new method that infers shared genetic materials among family members and then incorporate the shared genomic information in a novel algorithm that can accurately infer genotypes. Our method is particularly advantageous when inferring low frequency variants with fewer sequence data, making it effective in analyzing genome-wide sequence data. We implemented the algorithm in a computationally efficient tool to facilitate cost-effective sequencing in families for identifying disease genetic variants.
|
35
|
Kember RL, Georgi B, Bailey-Wilson JE, Stambolian D, Paul SM, Bućan M. Copy number variants encompassing Mendelian disease genes in a large multigenerational family segregating bipolar disorder. BMC Genet 2015; 16:27. [PMID: 25887117 PMCID: PMC4382929 DOI: 10.1186/s12863-015-0184-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/05/2014] [Accepted: 02/19/2015] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Bipolar affective disorder (BP) is a common, highly heritable psychiatric disorder characterized by periods of depression and mania. Using dense SNP genotype data, we characterized CNVs in 388 members of an Old Order Amish Pedigree with bipolar disorder. We identified CNV regions arising from common ancestral mutations by utilizing the pedigree information. By combining this analysis with whole genome sequence data in the same individuals, we also explored the role of compound heterozygosity.
RESULTS: Here we describe 541 inherited CNV regions, of which 268 are rare in a control population of European origin but present in a large number of Amish individuals. In addition, we highlight a set of CNVs found at higher frequencies in BP individuals, and within genes known to play a role in human development and disease. As in prior reports, we find no evidence for an increased burden of CNVs in BP individuals, but we report a trend towards a higher burden of CNVs in known Mendelian disease loci in bipolar individuals (BPI and BPII, p = 0.06).
CONCLUSIONS: We conclude that CNVs may be contributing factors in the phenotypic presentation of mood disorders and co-morbid medical conditions in this family. These results reinforce the hypothesis of a complex genetic architecture underlying BP disorder, and suggest that the role of CNVs should continue to be investigated in BP data sets.
Affiliation(s)
- Rachel L Kember
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Benjamin Georgi
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, USA
- Dwight Stambolian
- Department of Ophthalmology, University of Pennsylvania, Philadelphia, PA, USA
- Steven M Paul
- Appel Alzheimer's Disease Research Institute, Mind and Brain Institute, Weill Cornell Medical College, New York, NY, USA
- Maja Bućan
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
36
|
Livne OE, Han L, Alkorta-Aranburu G, Wentworth-Sheilds W, Abney M, Ober C, Nicolae DL. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population. PLoS Comput Biol 2015; 11:e1004139. [PMID: 25735005 PMCID: PMC4348507 DOI: 10.1371/journal.pcbi.1004139] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 05/29/2014] [Accepted: 01/19/2015] [Indexed: 12/31/2022]
Abstract
Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.
Affiliation(s)
- Oren E. Livne
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Lide Han
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Gorka Alkorta-Aranburu
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- William Wentworth-Sheilds
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Carole Ober
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Dan L. Nicolae
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Departments of Medicine, and Statistics, The University of Chicago, Chicago, Illinois, United States of America
|
37
|
Saad M, Wijsman EM. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet Epidemiol 2014; 38:579-90. [PMID: 25132070 PMCID: PMC4190076 DOI: 10.1002/gepi.21844] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 02/28/2014] [Revised: 05/24/2014] [Accepted: 06/27/2014] [Indexed: 12/27/2022]
Abstract
In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.
Affiliation(s)
- Mohamad Saad
- Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Ellen M. Wijsman
- Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
|
38
|
Blue EM, Sun L, Tintle NL, Wijsman EM. Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond. Genet Epidemiol 2014; 38 Suppl 1:S21-8. [PMID: 25112184 DOI: 10.1002/gepi.21821] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
Abstract
When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.
Affiliation(s)
- Elizabeth M Blue
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America
|
39
|
Chen W, Schaid DJ. PedBLIMP: extending linear predictors to impute genotypes in pedigrees. Genet Epidemiol 2014; 38:531-41. [PMID: 25044249 DOI: 10.1002/gepi.21838] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/09/2014] [Revised: 05/15/2014] [Accepted: 05/19/2014] [Indexed: 12/13/2022]
Abstract
Recently, Wen and Stephens (Wen and Stephens [2010] Ann Appl Stat 4(3):1158-1182) proposed a linear predictor, called BLIMP, that uses conditional multivariate normal moments to impute genotypes with accuracy similar to current state-of-the-art methods. One novelty is that it regularized the estimated covariance matrix based on a model from population genetics. We extended multivariate moments to impute genotypes in pedigrees. Our proposed method, PedBLIMP, utilizes both the linkage-disequilibrium (LD) information estimated from external panel data and the pedigree structure or identity-by-descent (IBD) information. The proposed method was evaluated on a pedigree design where some individuals were genotyped with dense markers and the rest with sparse markers. We found that incorporating the pedigree/IBD information can improve imputation accuracy compared to BLIMP. Because rare variants usually have low LD with other single-nucleotide polymorphisms (SNPs), incorporating pedigree/IBD information largely improved imputation accuracy for rare variants. We also compared PedBLIMP with IMPUTE2 and GIGI. Results show that when sparse markers are in a certain density range, our method can outperform both IMPUTE2 and GIGI.
Affiliation(s)
- Wenan Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
|
40
|
Identity-by-descent graphs offer a flexible framework for imputation and both linkage and association analyses. BMC Proc 2014; 8:S19. [PMID: 25519371 PMCID: PMC4143703 DOI: 10.1186/1753-6561-8-s1-s19] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Abstract
We demonstrate the flexibility of identity-by-descent (IBD) graphs for genotype imputation and testing relationships between genotype and phenotype. We analyzed chromosome 3 and the first replicate of simulated diastolic blood pressure. IBD graphs were obtained from complete pedigrees and full multipoint marker analysis, facilitating subsequent linkage and other analyses. For rare alleles, pedigree-based imputation using these IBD graphs had a higher call rate than did population-based imputation. Combining the two approaches improved call rates for common alleles. We found it advantageous to incorporate known, rather than estimated, pedigree relationships when testing for association. Replacing missing data with imputed alleles improved association signals as well. Analyses were performed with knowledge of the underlying model.
|
41
|
Rubenstein K, Raskind WH, Berninger VW, Matsushita MM, Wijsman EM. Genome scan for cognitive trait loci of dyslexia: Rapid naming and rapid switching of letters, numbers, and colors. Am J Med Genet B Neuropsychiatr Genet 2014; 165B:345-56. [PMID: 24807833 PMCID: PMC4053475 DOI: 10.1002/ajmg.b.32237] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/06/2013] [Accepted: 04/14/2014] [Indexed: 12/14/2022]
Abstract
Dyslexia, or specific reading disability, is a common developmental disorder that affects 5-12% of school-aged children. Dyslexia and its component phenotypes, assessed categorically or quantitatively, have complex genetic bases. The ability to rapidly name letters, numbers, and colors from rows presented visually correlates strongly with reading in multiple languages and is a valid predictor of reading and spelling impairment. Performance on measures of rapid naming and switching, RAN and RAS, is stable throughout elementary school years, with slowed performance persisting in adults who still manifest dyslexia. Targeted analyses of dyslexia candidate regions have included RAN measures, but only one other genome-wide linkage study has been reported. As part of a broad effort to identify genetic contributors to dyslexia, we performed combined oligogenic segregation and linkage analyses of measures of RAN and RAS in a family-based cohort ascertained through probands with dyslexia. We obtained strong evidence for linkage of RAN letters to the DYX3 locus on chromosome 2p and RAN colors to chromosome 10q, but were unable to confirm the chromosome 6p21 linkage detected for a composite measure of RAN colors and objects in the previous genome-wide study.
Affiliation(s)
- Kevin Rubenstein
- Department of Biostatistics, University of Washington, Seattle, WA
- Wendy H. Raskind
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
- Mark M. Matsushita
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
- Ellen M. Wijsman
- Department of Biostatistics, University of Washington, Seattle, WA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
|
42
|
Cheung CYK, Thompson EA, Wijsman EM. Detection of Mendelian consistent genotyping errors in pedigrees. Genet Epidemiol 2014; 38:291-9. [PMID: 24718985 DOI: 10.1002/gepi.21806] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/24/2013] [Revised: 03/03/2014] [Accepted: 03/04/2014] [Indexed: 11/12/2022]
Abstract
Detection of genotyping errors is a necessary step to minimize false results in genetic analysis. This is especially important when the rate of genotyping errors is high, as has been reported for high-throughput sequence data. To detect genotyping errors in pedigrees, Mendelian inconsistent (MI) error checks exist, as do multi-point methods that flag Mendelian consistent (MC) errors for sparse multi-allelic markers. However, few methods exist for detecting MC genotyping errors, particularly for dense variants on large pedigrees. Here, we introduce an efficient method to detect MC errors even for very dense variants (e.g., SNPs and sequencing data) on pedigrees that may be large. Our method first samples inheritance vectors (IVs) using a moderately sparse but informative set of markers using a Markov chain Monte Carlo-based sampler. Using sampled IVs, we considered two test statistics to detect MC genotyping errors: the percentage of IVs inconsistent with observed genotypes (A1) or the posterior probability of error configurations (A2). Using simulations, we show that this method, even with the simpler A1 statistic, is effective for detecting MC genotyping errors in dense variants, with sensitivity almost as high as the theoretical best sensitivity possible. We also evaluate the effectiveness of this method as a function of parameters, when including the observed pattern for genotype, density of framework markers, error rate, allele frequencies, and number of sampled inheritance vectors. Our approach provides a line of defense against false findings based on the use of dense variants in pedigrees.
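The A1 statistic described in this abstract (the share of sampled inheritance vectors that are inconsistent with the observed genotypes) can be sketched for a toy case. The sketch below assumes a single nuclear family, a diallelic marker, and hand-written inheritance vectors; in the paper the IVs come from an MCMC sampler run on framework markers, which is not reproduced here. All function names, person labels, and data values are hypothetical, and the result is returned as a fraction rather than a percentage.

```python
from itertools import product


def consistent(iv, genotypes):
    """Return True if some assignment of alleles (0 or 1) to the four
    founder haplotypes reproduces every observed genotype under this IV.

    iv:        dict child -> (paternal_bit, maternal_bit); each bit selects
               which of that parent's two haplotypes was transmitted
    genotypes: dict person -> observed count of allele 1 (0, 1, or 2);
               untyped people are simply omitted
    """
    for f0, f1, m0, m1 in product((0, 1), repeat=4):
        father, mother = (f0, f1), (m0, m1)
        implied = {"father": f0 + f1, "mother": m0 + m1}
        for child, (pb, mb) in iv.items():
            implied[child] = father[pb] + mother[mb]
        if all(implied[person] == g for person, g in genotypes.items()):
            return True
    return False


def a1_statistic(sampled_ivs, genotypes):
    """Fraction of sampled IVs that are inconsistent with the genotypes."""
    bad = sum(not consistent(iv, genotypes) for iv in sampled_ivs)
    return bad / len(sampled_ivs)


# Two sibs with untyped parents. Genotypes 2 and 1 are Mendelian consistent
# on their own, but not if the sibs inherited identical haplotypes.
genotypes = {"sib1": 2, "sib2": 1}
sampled_ivs = [
    {"sib1": (0, 0), "sib2": (0, 0)},  # sibs share both haplotypes
    {"sib1": (1, 1), "sib2": (1, 1)},  # sibs share both haplotypes
    {"sib1": (0, 1), "sib2": (0, 1)},  # sibs share both haplotypes
    {"sib1": (0, 0), "sib2": (0, 1)},  # sibs differ on the maternal side
]
print(a1_statistic(sampled_ivs, genotypes))
```

A high value flags the marker for a possible Mendelian consistent error: most sampled inheritance patterns say the sibs are identical by descent at this position, yet their called genotypes differ.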
Affiliation(s)
- Charles Y K Cheung
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America
|
43
|
Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate. PLoS Genet 2014; 10:e1004229. [PMID: 24625924 PMCID: PMC3953017 DOI: 10.1371/journal.pgen.1004229] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 10/21/2013] [Accepted: 01/24/2014] [Indexed: 11/19/2022]
Abstract
Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.
Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable efforts, genetic studies have yet to reveal the precise genetic underpinnings of the disorder. In this study we have analyzed a large extended pedigree of Old Order Amish that segregates bipolar disorder. Our study design integrates both dense genotype and whole-genome sequence data. In a combined linkage and association analysis we identify five chromosomal regions with nominally significant or suggestive evidence for linkage, several of which constitute replication of earlier linkage findings for bipolar disorder in non-Amish families. Association analysis of genetic variants in each of the linkage regions yielded a number of plausible candidate genes for bipolar disorder. The striking genetic heterogeneity we observed in this genetic isolate has profound implications for the study of bipolar disorder in the general population.
|
44
|
A statistical framework to guide sequencing choices in pedigrees. Am J Hum Genet 2014; 94:257-67. [PMID: 24507777 DOI: 10.1016/j.ajhg.2014.01.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/14/2013] [Accepted: 01/13/2014] [Indexed: 11/23/2022]
Abstract
The use of large pedigrees is an effective design for identifying rare functional variants affecting heritable traits. Cost-effective studies using sequence data can be achieved via pedigree-based genotype imputation in which some subjects are sequenced and missing genotypes are inferred on the remaining subjects. Because of high cost, it is important to carefully prioritize subjects for sequencing. Here, we introduce a statistical framework that enables systematic comparison among subject-selection choices for sequencing. We introduce a metric "local coverage," which allows the use of inferred inheritance vectors to measure genotype-imputation ability specifically in a region of interest, such as one with prior evidence of linkage. In the absence of linkage information, we can instead use a "genome-wide coverage" metric computed with the pedigree structure. These metrics enable the development of a method that identifies efficient selection choices for sequencing. As implemented in GIGI-Pick, this method also flexibly allows initial manual selection of subjects and optimizes selections within the constraint that only some subjects might be available for sequencing. In the present study, we used simulations to compare GIGI-Pick with PRIMUS, ExomePicks, and common ad hoc methods of selecting subjects. In genotype imputation of both common and rare alleles, GIGI-Pick substantially outperformed all other methods considered and had the added advantage of incorporating prior linkage information. We also used a real pedigree to demonstrate the utility of our approach in identifying causal mutations. Our work enables prioritization of subjects for sequencing to facilitate dissection of the genetic basis of heritable traits.
|
45
|
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022]
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Zhao Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Fan Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
|
46
|
Saad M, Wijsman EM. Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. Genet Epidemiol 2013; 38:1-9. [PMID: 24243664 DOI: 10.1002/gepi.21776] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 08/12/2013] [Revised: 09/30/2013] [Accepted: 10/15/2013] [Indexed: 01/09/2023]
Abstract
Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with current availability of next-generation sequencing. Family-based designs are well suited for discovery of rare variants, with large and carefully selected pedigrees enriching for multiple copies of such variants. However, sequencing a large number of samples is still prohibitive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which is able to efficiently handle large pedigrees and accurately impute rare variants. We used burden and kernel association tests, famWS and famSKAT, both of which account for family relationships; famSKAT additionally accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of association tests for pseudosequence data, a subset of sequence data used for imputation, and all subjects sequenced. We also compared, within the pseudosequence data, the power of association tests using best-guess genotypes and allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that the use of allelic dosages results in much higher power than use of best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
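The distinction between allelic dosages and best-guess genotypes that this abstract compares can be shown with a small sketch. It assumes imputation yields, for each subject and variant, three posterior probabilities for carrying 0, 1, or 2 copies of the alternate allele; the function names and the probability values are hypothetical and do not reflect the actual output format of GIGI or any other program.

```python
def allelic_dosage(probs):
    """Expected number of alternate alleles: 0*p0 + 1*p1 + 2*p2."""
    p0, p1, p2 = probs
    return p1 + 2 * p2


def best_guess(probs):
    """Most probable genotype (0, 1, or 2 copies); ties go to the lower count."""
    return max(range(3), key=lambda g: probs[g])


# Hypothetical posterior for a rare variant in an unsequenced relative:
# probably a non-carrier, but with substantial remaining uncertainty.
probs = (0.5, 0.25, 0.25)
print(best_guess(probs))      # hard call discards the uncertainty
print(allelic_dosage(probs))  # dosage keeps it as a fractional allele count
```

For rare variants the best-guess call tends to collapse to the non-carrier genotype whenever the posterior is uncertain, whereas the dosage still carries partial evidence of carrier status into the association test.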
Affiliation(s)
- Mohamad Saad
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America; Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
|
47
|
Marchani EE, Chapman NH, Cheung CYK, Ankenman K, Stanaway IB, Coon HH, Nickerson D, Bernier R, Brkanac Z, Wijsman EM. Identification of rare variants from exome sequence in a large pedigree with autism. Hum Hered 2013; 74:153-64. [PMID: 23594493 DOI: 10.1159/000346560] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/25/2022]
Abstract
We carried out analyses with the goal of identifying rare variants in exome sequence data that contribute to disease risk for a complex trait. We analyzed a large, 47-member, multigenerational pedigree with 11 cases of autism spectrum disorder, using genotypes from 3 technologies representing increasing resolution: a multiallelic linkage marker panel, a dense diallelic marker panel, and variants from exome sequencing. Genome-scan marker genotypes were available on most subjects, and exome sequence data was available on 5 subjects. We used genome-scan linkage analysis to identify and prioritize the chromosome 22 region of interest, and to select subjects for exome sequencing. Inheritance vectors (IVs) generated by Markov chain Monte Carlo analysis of multilocus marker data were the foundation of most analyses. Genotype imputation used IVs to determine which sequence variants reside on the haplotype that co-segregates with the autism diagnosis. Together with a rare-allele frequency filter, we identified only one rare variant on the risk haplotype, illustrating the potential of this approach to prioritize variants. The associated gene, MYH9, is biologically unlikely, and we speculate that for this complex trait, the key variants may lie outside the exome.
Affiliation(s)
- E E Marchani
- Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
|