1
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023]
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture of many complex traits is still not fully understood. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS and in variant set tests for rare variants. Here we leverage identity-by-descent (IBD) segments inferred by a random projection-based IBD detection algorithm to develop a computationally efficient statistical test for mapping genetic associations with complex traits in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and that it outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive GWAS tag variants in the ±3 centimorgan flanking regions.
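As a rough illustration of the per-locus IBD sharing structure that IBD mapping works with, the sketch below builds a sparse map of sample pairs whose IBD segments cover a given genetic position, then computes a simple score-type quadratic form of phenotype residuals over those pairs. It is a minimal sketch under stated assumptions, not the FiMAP statistic: the segment tuple layout `(sample_a, sample_b, start_cM, end_cM)`, the function names `local_ibd_pairs` and `pair_sharing_score`, and the use of precomputed phenotype residuals are all hypothetical, and no p-value calibration (which the abstract attributes to sparse linear algebra and random matrix algorithms) is attempted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical segment layout: (sample_a, sample_b, start_cM, end_cM).
Segment = Tuple[int, int, float, float]


def local_ibd_pairs(segments: List[Segment], pos_cm: float) -> Dict[Tuple[int, int], int]:
    """Sparse local IBD sharing: {(i, j): 1} for pairs with a segment covering pos_cm.

    Pairs are stored with i < j, so each unordered pair appears once.
    """
    shared: Dict[Tuple[int, int], int] = {}
    for a, b, start, end in segments:
        if a == b:
            continue  # skip malformed self-pairs
        if start <= pos_cm <= end:
            shared[(min(a, b), max(a, b))] = 1
    return shared


def pair_sharing_score(shared: Dict[Tuple[int, int], int], residuals: Sequence[float]) -> float:
    """Score-type quadratic form sum_{i<j} K_ij * r_i * r_j over IBD-sharing pairs.

    Large positive values suggest that pairs sharing IBD at this position have
    similar phenotype residuals; turning it into a p-value is not attempted here.
    """
    return sum(k * residuals[i] * residuals[j] for (i, j), k in shared.items())


if __name__ == "__main__":
    segs = [(0, 1, 10.0, 20.0), (2, 1, 15.0, 30.0), (0, 3, 40.0, 50.0)]
    pairs = local_ibd_pairs(segs, pos_cm=18.0)
    print(pairs)  # {(0, 1): 1, (1, 2): 1}
    print(pair_sharing_score(pairs, [1.0, 2.0, 0.5, -1.0]))  # 3.0
```

Storing only the pairs that actually share a segment keeps the local sharing matrix sparse, which is the property that makes biobank-scale scans tractable at all.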
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
2
Ledesma A, Ribeiro FAS, Uberti A, Edwards J, Hearne S, Frei U, Lübberstedt T. Molecular characterization of doubled haploid lines derived from different cycles of the Iowa Stiff Stalk Synthetic (BSSS) maize population. Front Plant Sci 2023; 14:1226072. [PMID: 37600186 PMCID: PMC10433169 DOI: 10.3389/fpls.2023.1226072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/20/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023]
Abstract
Molecular characterization of a given set of maize germplasm can be useful for understanding how the assembled germplasm could be used for further improvement in a breeding program, for example by analyzing genetic diversity, selecting parental lines, assigning heterotic groups, creating a core set of germplasm, and/or performing association analysis for traits of interest. In this study, we used single nucleotide polymorphism (SNP) markers to assess the genetic variability in a set of doubled haploid (DH) lines derived from the unselected Iowa Stiff Stalk Synthetic (BSSS) maize population (BSSS(R)C0, denoted C0), from the seventeenth cycle of reciprocal recurrent selection in BSSS (BSSS(R)C17, denoted C17), and from the cross between BSSS(R)C0 and BSSS(R)C17 (denoted C0/C17). Our aim was to explore whether diversity was lost from C0 to C17 derived DH lines and whether useful genetic variation in C0 was left behind during the selection process, since C0 could be a reservoir of genetic diversity that could be tapped using DH technology. Additionally, we quantified the contribution of the BSSS progenitors to each set of DH lines. The molecular characterization analysis confirmed the apparent separation and the loss of genetic variability from C0 to C17 through the recurrent selection process, as reflected in the degree of differentiation between the C0_DHL and C17_DHL groups measured by Wright's F-statistics (FST). Similarly, population structure analysis based on principal component analysis (PCA) revealed a clear separation among the groups of DH lines. Some of the progenitors had a higher genetic contribution in C0 than in C0/C17 and C17 derived DH lines. Although genetic drift can explain most of the genome-wide genetic structure, phenotypic data provide evidence that selection has altered favorable allele frequencies in the BSSS maize population through the reciprocal recurrent selection program.
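Wright's FST summarizes how much of the total allele-frequency variation lies between groups rather than within them. A minimal sketch, assuming biallelic SNPs and two equally weighted groups of DH lines supplied as per-locus frequencies of the same allele, computes Nei's heterozygosity-based form, FST = (HT - HS) / HT, combined across loci as a ratio of sums. The abstract does not state which estimator the authors used (Weir and Cockerham's θ is a common alternative), and the function name `nei_fst` is hypothetical.

```python
from typing import Sequence


def nei_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Multi-locus FST = sum(HT - HS) / sum(HT) for two groups of equal weight.

    p1[k] and p2[k] are the frequencies of the same allele at biallelic locus k.
    """
    if len(p1) != len(p2):
        raise ValueError("allele-frequency vectors must have equal length")
    num = 0.0  # accumulated HT - HS
    den = 0.0  # accumulated HT
    for a, b in zip(p1, p2):
        # Mean expected heterozygosity within the two groups.
        hs = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        # Expected heterozygosity of the pooled groups.
        p_bar = (a + b) / 2
        ht = 2 * p_bar * (1 - p_bar)
        num += ht - hs
        den += ht
    if den == 0.0:
        raise ValueError("all loci are monomorphic in the pooled sample")
    return num / den


if __name__ == "__main__":
    # Locus 1 is strongly differentiated between groups, locus 2 not at all.
    print(round(nei_fst([0.2, 0.5], [0.8, 0.5]), 6))  # 0.18
```

Taking the ratio of sums rather than averaging per-locus ratios avoids dividing by zero at loci that are monomorphic in the pooled sample and weights loci by their informativeness.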
Affiliation(s)
- Alejandro Ledesma
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Alison Uberti
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Jode Edwards
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA, United States
- Sarah Hearne
- International Maize and Wheat Improvement Center (CIMMYT), El Batan, Texcoco, Mexico
- Ursula Frei
- Department of Agronomy, Iowa State University, Ames, IA, United States
3
Yu Z, Abdel-Azim S, Duggal P, Vergara C. Identity by descent mapping of HCV spontaneous clearance in populations of diverse ancestry. Res Sq 2023:rs.3.rs-2433454. [PMID: 36712049 PMCID: PMC9882640 DOI: 10.21203/rs.3.rs-2433454/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Background Acute infection with hepatitis C virus (HCV) affects millions of individuals worldwide. Host genetics plays a role in spontaneous clearance of the acute infection, which occurs in approximately 30% of individuals. Common variants in GPR158, genes in the interferon lambda (IFNL) cluster, and the MHC region have been associated with HCV clearance in populations of diverse ancestry. Fine mapping of those regions has identified some key variants and amino acids as potential causal variants, but the role of rare variants in those regions, and in the genome in general, has not been explored. We aimed to detect haplotypes containing rare variants related to HCV clearance using identity-by-descent (IBD) haplotype sharing between unrelated case/case pairs and case/control pairs in 3,608 individuals of European and African ancestry. Results We detected 1,711,832 and 5,678,043 pairwise IBD segments in the European and African ancestry individuals, respectively. As expected, individuals of African descent had more and shorter segments than Europeans. We did not detect any significant IBD signals in the known associated gene regions. Conclusions IBD is based on haplotype sharing and is most powerful in populations with a shared founder or recent common ancestor. For the complex trait of HCV clearance, we used two outbred, global populations, which limited our power to detect IBD associations. Overall, in this population-based sample we failed to detect rare variants associated with HCV clearance in individuals of European and African ancestry.
Affiliation(s)
- Zixuan Yu
- Johns Hopkins University, Bloomberg School of Public Health
- Priya Duggal
- Johns Hopkins University, Bloomberg School of Public Health
4
Angarita Barajas BK, Cantet RJC, Steibel JP, Schrauf MF, Forneris NS. Heritability estimates and predictive ability for pig meat quality traits using identity-by-state and identity-by-descent relationships in an F2 population. J Anim Breed Genet 2023; 140:13-27. [PMID: 36300585 DOI: 10.1111/jbg.12742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 10/05/2022] [Indexed: 12/13/2022]
Abstract
Genomic relationships can be computed from dense genome-wide genotypes with different methods, based on either identity-by-state (IBS) or identity-by-descent (IBD). The latter has been shown to increase the accuracy of both estimated relationships and predicted breeding values. However, it is not clear whether an IBD approach would achieve greater heritability ($h^2$) and predictive ability ($\hat{r}_{y,\hat{y}}$) than its IBS counterpart for data with low-depth pedigrees. Here, we compare both approaches in terms of the estimates of $h^2$ and $\hat{r}_{y,\hat{y}}$, using data on meat quality and carcass traits recorded in experimental crossbred pigs, with a pedigree constrained to only three generations. Three animal models were fitted, differing in the relationship matrix: an IBS model ($\mathbf{G}_{\mathrm{IBS}}$), an IBD (defined within the known pedigree) model ($\mathbf{G}_{\mathrm{IBD}}$), and a pedigree model ($\mathbf{A}_{22}$). In 9 of 20 traits, the estimates of $\sigma_u^2$ and $h^2$ were 1.2 to 2.9 times greater with the $\mathbf{G}_{\mathrm{IBS}}$ and $\mathbf{G}_{\mathrm{IBD}}$ models than with $\mathbf{A}_{22}$, whereas for all traits both parameters were similar between the genomic models. The $\hat{r}_{y,\hat{y}}$ of the genomic models was higher than that of $\mathbf{A}_{22}$. A slight increase in $\hat{r}_{y,\hat{y}}$ was found with $\mathbf{G}_{\mathrm{IBS}}$ compared with $\mathbf{G}_{\mathrm{IBD}}$, most likely because the former recovers sizeable relationships among the founder F0 animals.
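For readers unfamiliar with IBS-based genomic relationships, a common way to build a matrix like $\mathbf{G}_{\mathrm{IBS}}$ is VanRaden's first method, $\mathbf{G} = \mathbf{Z}\mathbf{Z}^\top / \left(2\sum_k p_k(1-p_k)\right)$, where $\mathbf{Z}$ holds genotypes coded 0/1/2 and centered by twice the allele frequency $p_k$. The abstract does not say this exact construction was used, so treat the sketch below as an illustration under assumptions: the function name `vanraden_grm` is hypothetical, allele frequencies are estimated from the genotyped sample itself, and missing genotypes are not handled.

```python
from typing import List


def vanraden_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """IBS genomic relationship matrix (VanRaden method 1).

    genotypes[i][k] is the count (0, 1, or 2) of the counted allele for
    individual i at SNP k. Returns an n x n list-of-lists matrix.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample frequency of the counted allele at each SNP.
    p = [sum(row[k] for row in genotypes) / (2 * n) for k in range(m)]
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Center each genotype by its expectation 2p.
    z = [[row[k] - 2 * p[k] for k in range(m)] for row in genotypes]
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Individuals 0 and 1 are opposite homozygotes at two SNPs, so their
    # relationship comes out negative relative to the sample average.
    g = vanraden_grm([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
    for row in g:
        print([round(x, 3) for x in row])
```

An IBD-based matrix such as $\mathbf{G}_{\mathrm{IBD}}$ would instead count alleles traced to common ancestors within the known pedigree, which is why it can miss relationships among founders that IBS similarity still picks up.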
Affiliation(s)
- Rodolfo J C Cantet
- Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina.,Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, Michigan, USA.,Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA
- Matias F Schrauf
- Departamento de Métodos Cuantitativos y Sistemas de Información, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina.,Animal Breeding & Genomics, Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands
- Natalia S Forneris
- Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina.,Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
5
Burkett KM, Rakesh M, Morris P, Vézina H, Laprise C, Freeman EE, Roy-Gagnon MH. Correspondence Between Genomic- and Genealogical/Coalescent-Based Inference of Homozygosity by Descent in Large French-Canadian Genealogies. Front Genet 2022; 12:808829. [PMID: 35126470 PMCID: PMC8814340 DOI: 10.3389/fgene.2021.808829] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Accepted: 12/06/2021] [Indexed: 01/03/2023]
Abstract
Research on the genetics of complex traits overwhelmingly focuses on the additive effects of genes. Yet, animal studies have shown that non-additive effects, in particular homozygosity effects, can shape complex traits. Recent investigations in human studies found some significant homozygosity effects. However, most human populations display restricted ranges of homozygosity by descent (HBD), making the identification of homozygosity effects challenging. Founder populations give rise to higher HBD levels. When deep genealogical data are available in a founder population, it is possible to gain information on the time to the most recent common ancestor (MRCA) from whom a chromosomal segment has been transmitted to both parents of an individual and in turn to that individual. This information on the time to MRCA can be combined with the time to MRCA inferred from coalescent models of gene genealogies. HBD can also be estimated from genomic data. The extent to which the genomic HBD measures correspond to the genealogical/coalescent measures has not been documented in founder populations with extensive genealogical data. In this study, we used simulations to relate genomic and genealogical/coalescent HBD measures. We based our simulations on genealogical data from two ongoing studies from the French-Canadian founder population displaying different levels of inbreeding. We simulated single-nucleotide polymorphisms (SNPs) in a 1-Mb genomic segment from a coalescent model in conjunction with the observed genealogical data. We compared genealogical/coalescent HBD to two genomic methods of HBD estimation based on hidden Markov models (HMMs). We found that genomic estimates of HBD correlated well with genealogical/coalescent HBD measures in both study genealogies. We described generation time to coalescence in terms of genomic HBD estimates and found a large variability in generation time captured by genomic HBD when considering each SNP. 
However, SNPs in longer segments were more likely to capture recent time to coalescence, as expected. Our study suggests that estimating the coalescent gene genealogy from the genomic data to use in conjunction with observed genealogical data could provide valuable information on HBD.
Affiliation(s)
- Kelly M. Burkett
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
- Mohan Rakesh
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Patricia Morris
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
- Hélène Vézina
- Projet BALSAC, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Département des Sciences Humaines et Sociales, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Centre Intersectoriel en Santé Durable, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Catherine Laprise
- Centre Intersectoriel en Santé Durable, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Département des Sciences Fondamentales, Université du Québec à Chicoutimi, Chicoutimi, QC, Canada
- Ellen E. Freeman
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Centre de Recherche, Hôpital Maisonneuve-Rosemont, Montréal, QC, Canada
- Marie-Hélène Roy-Gagnon
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- *Correspondence: Marie-Hélène Roy-Gagnon,
6
Thumma BR, Joyce KR, Jacobs A. Genomic studies with preselected markers reveal dominance effects influencing growth traits in Eucalyptus nitens. G3 (Bethesda) 2022; 12:6423988. [PMID: 34791210 PMCID: PMC8728041 DOI: 10.1093/g3journal/jkab363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 10/13/2021] [Indexed: 11/17/2022]
Abstract
Genomic selection (GS) is being increasingly adopted by the tree breeding community. Most GS studies in trees have focused on estimating additive genetic effects. Exploiting dominance effects offers additional opportunities to improve genetic gain. To detect dominance effects, trait-relevant markers may be more important than nonselected markers. Here, we used preselected markers to study dominance effects in a Eucalyptus nitens (E. nitens) breeding population consisting of open-pollinated (OP) and controlled-pollinated (CP) families. We used 8221 trees from six progeny trials in this study. Of these, 868 progeny and 255 parents were genotyped with the E. nitens marker panel. Three traits were analyzed: diameter at breast height (DBH), wood basic density (DEN), and kraft pulp yield (KPY). Two types of genomic relationship matrices, based on identity-by-state (IBS) and identity-by-descent (IBD), were tested. The performance of the genomic best linear unbiased prediction (GBLUP) models with IBS and IBD matrices was compared with that of pedigree-based additive best linear unbiased prediction (ABLUP) models with and without pedigree reconstruction. Similarly, the performance of single-step GBLUP (ssGBLUP) with IBS and IBD matrices was compared with that of ABLUP models using all 8221 trees. Significant dominance effects were observed with the GBLUP-AD model for DBH. The predictive ability for DBH is higher with the GBLUP-AD model than with the other models. Similarly, the prediction accuracy of genotypic values is higher with GBLUP-AD than with the GBLUP-A model. Between the two GBLUP models (IBS and IBD), no differences were observed in predictive abilities or prediction accuracies. While the estimates of predictive ability with additive effects were similar among all four models, the prediction accuracies of ABLUP were lower than those of the GBLUP models.
The prediction accuracy of ssGBLUP-IBD is higher than that of the other three models, while the theoretical accuracy of ssGBLUP-IBS is consistently higher than that of the other three models across all three groups tested (parents, genotyped, and nongenotyped). Significant inbreeding depression was observed for DBH and KPY. While the relationship between inbreeding and DBH is linear, the relationship between inbreeding and KPY is nonlinear (quadratic). These results indicate that inbreeding depression in DBH is mainly due to directional dominance, while in KPY it may be due to epistasis. Inbreeding depression may be the main source of the observed dominance effects in DBH. The significant dominance effect observed for DBH may be used to select complementary parents to improve the genetic merit of the progeny in E. nitens.
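The contrast between a linear (DBH) and a quadratic (KPY) relationship with inbreeding can be illustrated by regressing a trait on an inbreeding coefficient F and on F². The sketch below is a minimal ordinary least squares fit in pure Python via the normal equations, assuming per-tree inbreeding coefficients are already available. The function names are hypothetical, and the sketch ignores the trial, family, and genetic effects that the mixed models in the study would include; judging whether the quadratic coefficient `b2` differs meaningfully from zero would need a formal test that is not shown.

```python
from typing import List, Sequence, Tuple


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least 3 distinct F values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def quadratic_inbreeding_fit(f: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y ~ b0 + b1*F + b2*F^2, returning (b0, b1, b2)."""
    # With design columns [1, F, F^2], X'X has entries sum F^(i+j)
    # and X'y has entries sum y*F^i.
    s = [sum(fi ** k for fi in f) for k in range(5)]
    t = [sum(yi * fi ** k for fi, yi in zip(f, y)) for k in range(3)]
    xtx = [[s[i + j] for j in range(3)] for i in range(3)]
    b0, b1, b2 = _solve3(xtx, t)
    return b0, b1, b2


if __name__ == "__main__":
    f = [0.0, 0.1, 0.2, 0.3]
    y = [10.0 - 5.0 * fi - 20.0 * fi ** 2 for fi in f]
    print([round(b, 6) for b in quadratic_inbreeding_fit(f, y)])
```

Under directional dominance the trait is expected to decline roughly linearly with F, so a fit dominated by `b1` matches the DBH pattern, while a sizeable `b2` corresponds to the curved KPY pattern the abstract attributes to possible epistasis.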
Affiliation(s)
- Bala R Thumma
- Gondwana Genomics Pty Ltd, Canberra, ACT 2600, Australia
7
Jurcic EJ, Villalba PV, Pathauer PS, Palazzini DA, Oberschelp GPJ, Harrand L, Garcia MN, Aguirre NC, Acuña CV, Martínez MC, Rivas JG, Cisneros EF, López JA, Poltri SNM, Munilla S, Cappa EP. Single-step genomic prediction of Eucalyptus dunnii using different identity-by-descent and identity-by-state relationship matrices. Heredity (Edinb) 2021; 127:176-189. [PMID: 34145424 PMCID: PMC8322403 DOI: 10.1038/s41437-021-00450-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/12/2021] [Revised: 06/07/2021] [Accepted: 06/07/2021] [Indexed: 02/05/2023]
Abstract
Genomic selection based on the single-step genomic best linear unbiased prediction (ssGBLUP) approach is becoming an important tool in forest tree breeding. The quality of the variance components and the predictive ability of the genomic estimated breeding values (GEBV) depend on how well marker-based genomic relationships describe the actual genetic relationships at unobserved causal loci. We investigated the performance of GEBV obtained when fitting models with genomic covariance matrices based on two identity-by-descent (IBD) and two identity-by-state (IBS) relationship measures. Multiple-trait multiple-site ssGBLUP models were fitted to diameter and stem straightness in five open-pollinated progeny trials of Eucalyptus dunnii, genotyped using the EUChip60K. We also fitted the conventional ABLUP model with a pedigree-based covariance matrix. Estimated relationships from the IBD estimators displayed consistently lower standard deviations than those from the IBS approaches. Although ssGBLUP based on IBS estimators resulted in higher trait-site heritabilities, the gain in accuracy of the relationships using IBD estimators resulted in higher predictive ability and lower bias of GEBV, especially for low-heritability trait-site combinations. ssGBLUP based on IBS and IBD approaches performed considerably better than the traditional ABLUP. In summary, our results advocate the use of the ssGBLUP approach jointly with the IBD relationship matrix in open-pollinated forest tree evaluation.
Affiliation(s)
- Esteban J Jurcic
- Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina.
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina.
- Pamela V Villalba
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Pablo S Pathauer
- Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
- Dino A Palazzini
- Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
- Gustavo P J Oberschelp
- Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Concordia, Entre Ríos, Argentina
- Leonel Harrand
- Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Concordia, Entre Ríos, Argentina
- Martín N Garcia
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Natalia C Aguirre
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Cintia V Acuña
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- María C Martínez
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Juan G Rivas
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Esteban F Cisneros
- Facultad de Ciencias Forestales, Universidad Nacional de Santiago del Estero, Santiago del Estero, Argentina
- Juan A López
- Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Bella Vista, Corrientes, Argentina
- Susana N Marcucci Poltri
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo), INTA-CONICET, Buenos Aires, Argentina
- Sebastián Munilla
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Eduardo P Cappa
- Instituto Nacional de Tecnología Agropecuaria (INTA), Instituto de Recursos Biológicos, Centro de Investigación en Recursos Naturales, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
8
Mollon J, Knowles EEM, Mathias SR, Rodrigue A, Moore TM, Calkins ME, Gur RC, Peralta JM, Weiner DJ, Robinson EB, Gur RE, Blangero J, Almasy L, Glahn DC. Genetic influences on externalizing psychopathology overlap with cognitive functioning and show developmental variation. Eur Psychiatry 2021; 64:e29. [PMID: 33785081 PMCID: PMC8080212 DOI: 10.1192/j.eurpsy.2021.21] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/23/2022]
Abstract
BACKGROUND Questions remain regarding whether genetic influences on early life psychopathology overlap with cognition and show developmental variation. METHODS Using data from 9,421 individuals aged 8-21 from the Philadelphia Neurodevelopmental Cohort, factors of psychopathology were generated using a bifactor model of item-level data from a psychiatric interview. Five orthogonal factors were generated: anxious-misery (mood and anxiety), externalizing (attention deficit hyperactivity and conduct disorder), fear (phobias), psychosis-spectrum, and a general factor. Genetic analyses were conducted on a subsample of 4,662 individuals of European American ancestry. A genetic relatedness matrix was used to estimate the heritability of these factors and their genetic correlations with executive function, episodic memory, complex reasoning, social cognition, motor speed, and general cognitive ability. Gene × Age analyses determined whether genetic influences on these factors show developmental variation. RESULTS Externalizing was heritable ($h^2 = 0.46$, $p = 1 \times 10^{-6}$), but not anxious-misery ($h^2 = 0.09$, $p = 0.183$), fear ($h^2 = 0.04$, $p = 0.337$), psychosis-spectrum ($h^2 = 0.00$, $p = 0.494$), or general psychopathology ($h^2 = 0.21$, $p = 0.040$). Externalizing showed genetic overlap with face memory ($\rho_g = -0.412$, $p = 0.004$), verbal reasoning ($\rho_g = -0.485$, $p = 0.001$), spatial reasoning ($\rho_g = -0.426$, $p = 0.010$), motor speed ($\rho_g = 0.659$, $p = 1 \times 10^{-4}$), verbal knowledge ($\rho_g = -0.314$, $p = 0.002$), and general cognitive ability, g ($\rho_g = -0.394$, $p = 0.002$). Gene × Age analyses revealed decreasing genetic variance ($\gamma_g = -0.146$, $p = 0.004$) and increasing environmental variance ($\gamma_e = 0.059$, $p = 0.009$) on externalizing. CONCLUSIONS Cognitive impairment may be a useful endophenotype of externalizing psychopathology and may therefore help elucidate its pathophysiological underpinnings.
Decreasing genetic variance suggests that gene discovery efforts may be more fruitful in children than adolescents or young adults.
Affiliation(s)
- Josephine Mollon
- Department of Psychiatry, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Emma E M Knowles
- Department of Psychiatry, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Samuel R Mathias
- Department of Psychiatry, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Amanda Rodrigue
- Department of Psychiatry, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Tyler M Moore
- Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, Penn-CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Monica E Calkins
- Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, Penn-CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ruben C Gur
- Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, Penn-CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Juan Manuel Peralta
- South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas of the Rio Grande Valley, Brownsville, Texas, USA
- Daniel J Weiner
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.,Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.,Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Elise B Robinson
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.,Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.,Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Raquel E Gur
- Brain Behavior Laboratory, Department of Psychiatry, Perelman School of Medicine, Penn-CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- John Blangero
- South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas of the Rio Grande Valley, Brownsville, Texas, USA
- Laura Almasy
- Department of Genetics, Perelman School of Medicine, Penn-CHOP Lifespan Brain Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- David C Glahn
- Department of Psychiatry, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA.,Olin Neuropsychiatry Research Center, Institute of Living, Hartford, Connecticut, USA
9
Genetic influence on cognitive development between childhood and adulthood. Mol Psychiatry 2021; 26:656-665. [PMID: 30644433 PMCID: PMC6570578 DOI: 10.1038/s41380-018-0277-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 04/17/2018] [Revised: 08/15/2018] [Accepted: 09/11/2018] [Indexed: 12/17/2022]
Abstract
Successful cognitive development between childhood and adulthood has important consequences for future mental and physical wellbeing, as well as occupational and financial success. Therefore, delineating the genetic influences underlying changes in cognitive abilities during this developmental period will provide important insights into the biological mechanisms that govern both typical and atypical maturation. Using data from the Philadelphia Neurodevelopmental Cohort (PNC), a large population-based sample of individuals aged 8 to 21 years old (n = 6634), we used an empirical relatedness matrix to establish the heritability of general and specific cognitive functions and determine if genetic factors influence cognitive maturation (i.e., Gene × Age interactions) between childhood and early adulthood. We found that neurocognitive measures across childhood and early adulthood were significantly heritable. Moreover, genetic variance on general cognitive ability, or g, increased significantly between childhood and early adulthood. Finally, we did not find evidence for decay in genetic correlation on neurocognition throughout childhood and adulthood, suggesting that the same genetic factors underlie cognition at different ages throughout this developmental period. Establishing significant Gene × Age interactions in neurocognitive functions across childhood and early adulthood is a necessary first step in identifying genes that influence cognitive development, rather than genes that influence cognition per se. Moreover, since aberrant cognitive development confers risk for several psychiatric disorders, further examination of these Gene × Age interactions may provide important insights into their etiology.
10
Browning SR, Browning BL. Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection. Am J Hum Genet 2020; 107:895-910. [PMID: 33053335 PMCID: PMC7553009 DOI: 10.1016/j.ajhg.2020.09.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/15/2020] [Accepted: 09/25/2020] [Indexed: 12/18/2022]
Abstract
Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions, which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
11
Barrera-Redondo J, Piñero D, Eguiarte LE. Genomic, Transcriptomic and Epigenomic Tools to Study the Domestication of Plants and Animals: A Field Guide for Beginners. Front Genet 2020; 11:742. [PMID: 32760427 PMCID: PMC7373799 DOI: 10.3389/fgene.2020.00742] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/25/2020] [Accepted: 06/22/2020] [Indexed: 01/07/2023]
Abstract
In the last decade, genomics and the related fields of transcriptomics and epigenomics have revolutionized the study of the domestication process in plants and animals, leading to new discoveries and new unresolved questions. Given that some domesticated taxa have been more studied than others, the extent of genomic data can range from vast to nonexistent, depending on the domesticated taxon of interest. This review is meant as a rough guide for students and academics that want to start a domestication research project using modern genomic tools, as well as for researchers already conducting domestication studies that are interested in following a genomic approach and looking for alternate strategies (cheaper or more efficient) and future directions. We summarize the theoretical and technical background needed to carry out domestication genomics, starting from the acquisition of a reference genome and genome assembly, to the sampling design for population genomics, paleogenomics, transcriptomics, epigenomics and experimental validation of domestication-related genes. We also describe some examples of the aforementioned approaches and the relevant discoveries they made to understand the domestication of the studied taxa.
Affiliation(s)
- Luis E. Eguiarte
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico
12
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022]
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
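The hap-IBD abstract names the positional Burrows-Wheeler transform (PBWT) as a core component. As a minimal sketch of that data structure only, following Durbin's (2014) prefix- and divergence-array construction and not hap-IBD's compressed, multithreaded implementation, the function below re-sorts haplotypes site by site so that haplotypes sharing long identical-by-state stretches ending at the current site become neighbours. The function name is hypothetical, and it assumes biallelic 0/1 haplotypes of equal length.

```python
def pbwt_prefix_divergence(haplotypes):
    """Build PBWT positional prefix and divergence arrays (hypothetical helper).

    haplotypes: M equal-length lists of 0/1 alleles.
    Returns (a, d) after the last site: a lists haplotype indices sorted by
    their reversed prefixes; d[i] is the first site of the identical run
    shared by a[i] and a[i-1] (d[0] is a sentinel equal to the site count).
    """
    num_haps = len(haplotypes)
    num_sites = len(haplotypes[0])
    a = list(range(num_haps))
    d = [0] * num_haps
    for k in range(num_sites):
        a0, d0, a1, d1 = [], [], [], []
        # Sentinel k + 1 means "no match yet" for the next 0- or 1-haplotype.
        p = q = k + 1
        for idx, div in zip(a, d):
            p = max(p, div)
            q = max(q, div)
            if haplotypes[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        # Stable partition: 0-alleles first, then 1-alleles.
        a = a0 + a1
        d = d0 + d1
    return a, d
```

For the haplotypes `[[0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 1]]` the sketch returns `a == [0, 1, 2]` and `d == [4, 0, 3]`: haplotypes 0 and 1 are identical, so they sit next to each other with a match starting at site 0, while haplotype 2 matches haplotype 1 only from site 3 onward. A detector built on this structure would report adjacent pairs whose match span exceeds a length threshold; converting that threshold to centimorgans requires a genetic map, which is not shown here.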
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
13
Abney M, ElSherbiny A. Kinpute: using identity by descent to improve genotype imputation. Bioinformatics 2019; 35:4321-4326. [PMID: 30918937 PMCID: PMC6821425 DOI: 10.1093/bioinformatics/btz221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/23/2018] [Revised: 02/21/2019] [Accepted: 03/26/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION Genotype imputation, though generally accurate, often results in many genotypes being poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), it is possible to use this information in combination with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information (due either to recent, familial relatedness or to distant, unknown ancestors) in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree, and results in substantially improved imputation quality. RESULTS Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute is designed to use as input both this SSRP and the genotype probabilities output from other LD-based imputation software, and uses a new method to combine the LD imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate where 98 individuals have been sequenced. In half of this sample, whose sequence data were masked, we used Impute2 to perform LD-based imputation and Kinpute was used to obtain higher accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty. AVAILABILITY AND IMPLEMENTATION Kinpute is an open-source and freely available C++ software package that can be downloaded from. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark Abney
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Aisha ElSherbiny
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
14
Herzig AF, Nutile T, Ruggiero D, Ciullo M, Perdry H, Leutenegger AL. Detecting the dominance component of heritability in isolated and outbred human populations. Sci Rep 2018; 8:18048. [PMID: 30575761 PMCID: PMC6303332 DOI: 10.1038/s41598-018-36050-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/29/2018] [Accepted: 11/10/2018] [Indexed: 11/21/2022]
Abstract
Inconsistencies between published estimates of dominance heritability from studies of human genetic isolates and from studies of outbred human populations prompt investigation into whether such differences result from particular trait architectures or from specific population structures. We analyse simulated datasets, characteristic of genetic isolates and of unrelated individuals, before analysing the isolate of Cilento for various commonly studied traits. We show the strengths of using genetic relationship matrices for variance decomposition over identity-by-descent-based methods in a population isolate, and we show that heritability estimates in isolates will avoid the downward biases that may occur in studies of samples of unrelated individuals, irrespective of the simulated distribution of causal variants. Yet, we also show that precise estimates of dominance in isolates are demonstrably problematic in the presence of shared environmental effects, and such effects should be accounted for. Nevertheless, we demonstrate how studying isolates can help determine the existence or non-existence of dominance for complex traits, and we find strong indications of non-zero dominance for low-density lipoprotein level in Cilento. Finally, we recommend future study designs to analyse trait variance decomposition from ensemble data across multiple population isolates.
Affiliation(s)
- Anthony F Herzig
- Inserm, U946, Genetic variation and Human diseases, Paris, France; Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
- Teresa Nutile
- Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy
- Daniela Ruggiero
- Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
- Marina Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso - CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
- Hervé Perdry
- Université Paris-Saclay, Univ. Paris-Sud, Inserm, CESP, Villejuif, France
- Anne-Louise Leutenegger
- Inserm, U946, Genetic variation and Human diseases, Paris, France; Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
15
Fazia T, Pastorino R, Foco L, Han L, Abney M, Beecham A, Hadjixenofontos A, Guo H, Gentilini D, Papachristou C, Bitti PP, Ticca A, Berzuini C, McCauley JL, Bernardinelli L. Investigating multiple sclerosis genetic susceptibility on the founder population of east-central Sardinia via association and linkage analysis of immune-related loci. Mult Scler 2018; 24:1815-1824. [PMID: 28933650 DOI: 10.1177/1352458517732841] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Abstract
BACKGROUND A wealth of single-nucleotide polymorphisms (SNPs) responsible for multiple sclerosis (MS) susceptibility have been identified; however, they explain only a fraction of MS heritability. OBJECTIVES We contributed to the discovery of new MS susceptibility SNPs by studying a founder population with high MS prevalence. METHODS We analyzed ImmunoChip data from 15 multiplex families and 94 unrelated controls from the Nuoro Province, Sardinia, Italy. We tested each SNP for both association and linkage with MS, the linkage being explored in terms of identity-by-descent (IBD) sharing excess and using gene dropping to compute a corresponding empirical p-value. By targeting regions that are both associated and in linkage with MS, we increase the chances of identifying interesting genomic regions. RESULTS We identified 486 MS-associated (p < 1 × 10⁻⁴) and 18,426 MS-linked (p < 0.05) SNPs. A total of 111 loci were both linked and associated with MS, 18 of them pointing to 14 non-major histocompatibility complex (MHC) genes, and 93 of them located in the MHC region. CONCLUSION We discovered new suggestive signals and confirmed some previously identified ones. We believe this to represent a significant step toward an understanding of the genetic basis of MS.
Affiliation(s)
- Teresa Fazia
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy
- Roberta Pastorino
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy
- Luisa Foco
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy; Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Bolzano, Italy
- Lide Han
- Department of Human Genetics, The University of Chicago, Chicago, IL, USA
- Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, IL, USA
- Ashley Beecham
- John P. Hussman Institute for Human Genomics and Dr John Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Athena Hadjixenofontos
- John P. Hussman Institute for Human Genomics and Dr John Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Hui Guo
- Center for Biostatistics, Institute of Population Health, The University of Manchester, Manchester, UK
- Davide Gentilini
- Unità di Bioinformatica e Statistica Genomica, Istituto Auxologico Italiano-IRCCS, Milano, Italy
- Pier Paolo Bitti
- Immunoematologia e Medicina Trasfusionale, Ospedale "San Francesco" Nuoro, ASSL Nuoro, Azienda Tutela Salute Sardegna, Nuoro, Italy
- Anna Ticca
- Neurologia e Stroke Unit, Ospedale "San Francesco" Nuoro, ASSL Nuoro, Azienda Tutela Salute Sardegna, Nuoro, Italy
- Carlo Berzuini
- Center for Biostatistics, Institute of Population Health, The University of Manchester, Manchester, UK
- Jacob L McCauley
- John P. Hussman Institute for Human Genomics and Dr John Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Luisa Bernardinelli
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy
16
Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens. PLoS Genet 2018; 14:e1007279. [PMID: 29791438 PMCID: PMC5988311 DOI: 10.1371/journal.pgen.1007279] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 10/09/2017] [Revised: 06/05/2018] [Accepted: 02/26/2018] [Indexed: 12/30/2022]
Abstract
Identification of genomic regions that are identical by descent (IBD) has proven useful for human genetic studies where analyses have led to the discovery of familial relatedness and fine-mapping of disease critical regions. Unfortunately, IBD analyses have been underutilized in analysis of other organisms, including human pathogens. This is in part due to the lack of statistical methodologies for non-diploid genomes in addition to the added complexity of multiclonal infections. As such, we have developed an IBD methodology, called isoRelate, for analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection and propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes that are known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Additionally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries.
IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a valuable tool for monitoring disease control and elimination efforts of many diseases. There are growing concerns over the emergence of antimicrobial drug resistance, which threatens the efficacy of treatments for infectious diseases such as malaria. As such, it is important to understand the dynamics of resistance by investigating population structure, natural selection and disease transmission in microorganisms. The study of disease dynamics has been hampered by the lack of suitable statistical models for analysis of isolates containing multiple infections. We introduce a statistical model that uses population genomic data to identify genomic regions (loci) that are inherited from a common ancestor, in the presence of multiple infections. We demonstrate its potential for biological discovery using a global Plasmodium falciparum dataset. We identify low genetic diversity in isolates from Southeast Asia, possibly from clonal expansion following intensified control efforts after the emergence of artemisinin resistance. We also identify loci under positive selection, most of which contain genes that have been associated with antimalarial drug resistance. We discover two loci under strong selection in multiple countries throughout Southeast Asia and Africa where the selection pressure is currently unknown. We find that the selection pressure at one of these loci has originated from gene flow, while the other locus has originated from multiple independent events.
17
Hsueh WC, Bennett PH, Esparza-Romero J, Urquidez-Romero R, Valencia ME, Ravussin E, Williams RC, Knowler WC, Baier LJ, Schulz LO, Hanson RL. Analysis of type 2 diabetes and obesity genetic variants in Mexican Pima Indians: Marked allelic differentiation among Amerindians at HLA. Ann Hum Genet 2018; 82:287-299. [PMID: 29774533 DOI: 10.1111/ahg.12252] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/05/2017] [Revised: 02/11/2018] [Accepted: 03/08/2018] [Indexed: 01/21/2023]
Abstract
Prevalence of diabetes and obesity in Mexican Pima Indians is low, while prevalence in US Pima Indians is high. Although lifestyle likely accounts for much of the difference, the role of genetic factors is not well explored. To examine this, we genotyped 359 single nucleotide polymorphisms, including established type 2 diabetes and obesity variants from genome-wide association studies (GWAS) and 96 random markers, in 342 Mexican Pimas. A multimarker risk score of obesity variants was associated with body mass index (BMI; β = 0.81 kg/m² per SD, P = 0.0066). The mean value of the score was lower in Mexican Pimas than in US Pimas (P = 4.3 × 10⁻¹¹), and differences in allele frequencies at established loci could account for approximately 7% of the population difference in BMI; however, the difference in risk scores was consistent with evolutionary neutrality given genetic distance. To identify loci potentially under recent natural selection, allele frequencies at 283 variants were compared between US and Mexican Pimas, accounting for genetic distance. The largest differences were seen at HLA markers (e.g., rs9271720, difference = 0.75, P = 8.7 × 10⁻⁹); genetic distances at HLA were greater than at random markers (P = 1.6 × 10⁻⁴⁶). Analyses of GWAS data in 937 US Pimas also showed sharing of alleles identical by descent at HLA that exceeds its genomic expectation (P = 7.0 × 10⁻¹⁰). These results suggest that, in addition to the widely recognized balancing selection at HLA, recent directional selection may also occur, resulting in marked allelic differentiation between closely related populations.
Affiliation(s)
- Wen-Chi Hsueh
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Peter H Bennett
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Julian Esparza-Romero
- Departamento de Nutrición Pública y Salud, Coordinación de Nutrición, Centro de Investigación en Alimentación y Desarrollo, Hermosillo, Sonora, México
- Rene Urquidez-Romero
- Instituto de Ciencias Biomédicas, Departamento de Ciencias de la Salud, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Chihuahua, México
- Mauro E Valencia
- Departamento de Nutrición Pública y Salud, Coordinación de Nutrición, Centro de Investigación en Alimentación y Desarrollo, Hermosillo, Sonora, México
- Eric Ravussin
- Pennington Biomedical Research Center, Louisiana State University Systems, Baton Rouge, LA, USA
- Robert C Williams
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- William C Knowler
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Leslie J Baier
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Leslie O Schulz
- College of Health and Human Services, Northern Arizona University, Flagstaff, AZ, USA
- Robert L Hanson
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
18
Discrimination of relationships with the same degree of kinship using chromosomal sharing patterns estimated from high-density SNPs. Forensic Sci Int Genet 2017; 33:10-16. [PMID: 29172066 DOI: 10.1016/j.fsigen.2017.11.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/11/2017] [Revised: 11/07/2017] [Accepted: 11/14/2017] [Indexed: 11/21/2022]
Abstract
Distinguishing relationships with the same degree of kinship (e.g., uncle-nephew and grandfather-grandson) is generally difficult in forensic genetics by using the commonly employed short tandem repeat loci. In this study, we developed a new method for discerning such relationships between two individuals by examining the number of chromosomal shared segments estimated from high-density single nucleotide polymorphisms (SNPs). We computationally generated second-degree kinships (i.e., uncle-nephew and grandfather-grandson) and third-degree kinships (i.e., first cousins and great-grandfather-great-grandson) for 174,254 autosomal SNPs considering the effect of linkage disequilibrium and recombination for each SNP. We investigated shared chromosomal segments between two individuals that were estimated based on identity by state regions. We then counted the number of segments in each pair. Based on our results, the number of shared chromosomal segments in collateral relationships was larger than that in lineal relationships with both the second-degree and third-degree kinships. This was probably caused by differences involving chromosomal transitions and recombination between relationships. As we probabilistically evaluated the relationships between simulated pairs based on the number of shared segments using logistic regression, we could determine accurate relationships in >90% of second-degree relatives and >70% of third-degree relatives, using a probability criterion for the relationship ≥0.9. Furthermore, we could judge the true relationships of actual sample pairs from volunteers, as well as simulated data. Therefore, this method can be useful for discerning relationships between two individuals with the same degree of kinship.
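The abstract counts chromosomal segments shared by a pair of individuals, estimated from identity-by-state (IBS) regions in dense SNP genotypes. The paper's exact segment-calling rules are not reproduced here. As a minimal sketch under stated assumptions, one common heuristic for unphased genotypes treats a shared segment as a run of SNPs with no opposite homozygotes (one individual carries 0 copies, the other 2) that spans at least a minimum genetic length; the per-pair segment count and total shared length could then serve as inputs to a classifier such as the logistic regression mentioned above. The function names, the cM threshold and both summary features are hypothetical.

```python
def ibs_shared_segments(g1, g2, cm_positions, min_cm):
    """Find runs where two genotype vectors never show opposite homozygotes.

    Hypothetical helper. g1, g2: allele counts (0, 1, 2) for two individuals at
    the same SNPs; cm_positions: sorted genetic-map positions in centimorgans.
    Returns (start_index, end_index, length_cm) for runs spanning >= min_cm.
    """
    n = len(g1)
    segments = []
    start = 0
    for i in range(n + 1):
        # A run ends at the last SNP or at an opposite homozygote (0 vs 2),
        # where the pair shares no allele identical by state.
        if i == n or abs(g1[i] - g2[i]) == 2:
            if i - 1 >= start:
                length = cm_positions[i - 1] - cm_positions[start]
                if length >= min_cm:
                    segments.append((start, i - 1, length))
            start = i + 1
    return segments


def segment_summary(segments):
    """Hypothetical per-pair features: segment count and total length in cM."""
    return len(segments), sum(length for _, _, length in segments)
```

Real array data would also need tolerance for genotyping errors, which can break a true shared segment into several shorter runs and so inflate the segment count; this sketch does not handle that.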
19
Abstract
Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by increasingly large population scale sequence data.
20
Robust Inference of Identity by Descent from Exome-Sequencing Data. Am J Hum Genet 2016; 99:1106-1116. [PMID: 27745837 DOI: 10.1016/j.ajhg.2016.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/21/2016] [Accepted: 09/13/2016] [Indexed: 01/26/2023]
Abstract
Identifying and characterizing genomic regions that are shared identical by descent (IBD) among individuals can yield insight into population history, facilitate the identification of adaptively evolving loci, and be an important tool in disease gene mapping. Although increasingly large collections of exome sequences have been generated, it is challenging to detect IBD segments in exomes, precluding many potentially informative downstream analyses. Here, we describe an approach, ExIBD, to robustly detect IBD segments in exome-sequencing data, rigorously evaluate its performance, and apply this method to high-coverage exomes from 6,515 European and African Americans. Furthermore, we show how IBD networks, constructed from patterns of pairwise IBD between individuals, and principles from graph theory provide insight into recent population history and reveal cryptic population structure in European Americans. Our results enable IBD analyses to be performed on exome data, which will expand the scope of inferences that can be made from existing massively large exome-sequencing datasets.
21
Morimoto C, Manabe S, Kawaguchi T, Kawai C, Fujimoto S, Hamano Y, Yamada R, Matsuda F, Tamaki K. Pairwise Kinship Analysis by the Index of Chromosome Sharing Using High-Density Single Nucleotide Polymorphisms. PLoS One 2016; 11:e0160287. [PMID: 27472558 PMCID: PMC4966930 DOI: 10.1371/journal.pone.0160287] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/15/2016] [Accepted: 07/15/2016] [Indexed: 11/18/2022]
Abstract
We developed a new approach for pairwise kinship analysis in forensic genetics based on chromosomal sharing between two individuals. Here, we defined "index of chromosome sharing" (ICS) calculated using 174,254 single nucleotide polymorphism (SNP) loci typed by SNP microarray and genetic length of the shared segments from the genotypes of two individuals. To investigate the expected ICS distributions from first- to fifth-degree relatives and unrelated pairs, we used computationally generated genotypes to consider the effect of linkage disequilibrium and recombination. The distributions were used for probabilistic evaluation of the pairwise kinship analysis, such as likelihood ratio (LR) or posterior probability, without allele frequencies and haplotype frequencies. Using our method, all actual sample pairs from volunteers showed significantly high LR values (i.e., ≥ 10⁸); therefore, we can distinguish distant relationships (up to the fifth-degree) from unrelated pairs based on LR. Moreover, we can determine accurate degrees of kinship in up to third-degree relationships with a probability of > 80% using the criterion of posterior probability ≥ 0.90, even if the kinship of the pair is totally unpredictable. This approach greatly improves pairwise kinship analysis of distant relationships, specifically in cases involving identification of disaster victims or missing persons.
Collapse
Affiliation(s)
- Chie Morimoto
- Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Sho Manabe
- Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Takahisa Kawaguchi
- Unit of Human Disease Genomics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Chihiro Kawai
- Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Shuntaro Fujimoto
- Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Yuya Hamano
- Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Forensic Science Laboratory, Kyoto Prefectural Police Headquarters, Kyoto, Japan
- Ryo Yamada
- Unit of Statistical Genetics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Fumihiko Matsuda
- Unit of Human Disease Genomics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Keiji Tamaki
- Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
22
Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution. G3-GENES GENOMES GENETICS 2016; 6:1287-96. [PMID: 26935417 PMCID: PMC4856080 DOI: 10.1534/g3.116.027581] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 12/18/2022]
Abstract
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that significant proportions of inferred segments 1–2 cM long are results of conflations of two or more shorter segments, each at least 0.2 cM or longer, under demographic scenarios typical for modern humans for all programs tested. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.
23
Genome-Wide Association Studies of the Human Gut Microbiota. PLoS One 2015; 10:e0140301. [PMID: 26528553 PMCID: PMC4631601 DOI: 10.1371/journal.pone.0140301]
Abstract
The bacterial composition of the human fecal microbiome is influenced by many lifestyle factors, notably diet. It is less clear, however, what role host genetics plays in dictating the composition of bacteria living in the gut. In this study, we examined the association of ~200K host genotypes with the relative abundance of fecal bacterial taxa in a founder population, the Hutterites, during two seasons (n = 91 summer, n = 93 winter, n = 57 individuals collected in both). These individuals live and eat communally, minimizing variation due to environmental exposures, including diet, which could potentially mask small genetic effects. Using a GWAS approach that takes into account the relatedness between subjects, we identified at least 8 bacterial taxa whose abundances were associated with single nucleotide polymorphisms in the host genome in each season (at genome-wide FDR of 20%). For example, we identified an association between a taxon known to affect obesity (genus Akkermansia) and a variant near PLD1, a gene previously associated with body mass index. Moreover, we replicate a previously reported association from a quantitative trait locus (QTL) mapping study of fecal microbiome abundance in mice (genus Lactococcus, rs3747113, P = 3.13 × 10⁻⁷). Finally, based on the significance distribution of the associated microbiome QTLs in our study with respect to chromatin accessibility profiles, we identified tissues in which host genetic variation may be acting to influence bacterial abundance in the gut.
24
Gazal S, Génin E, Leutenegger AL. Relationship inference from the genetic data on parents or offspring: A comparative study. Theor Popul Biol 2015; 107:31-8. [PMID: 26431644 DOI: 10.1016/j.tpb.2015.09.002]
Abstract
Relationship inference in a population is of interest for many areas of research, from anthropology to genetics. The relationship between the two individuals in a couple can be inferred directly from their genetic data, or indirectly from the genetic data of one of their offspring. One may therefore ask whether it is more advantageous to sample couples or single individuals to study relationships of couples in a population. Sampling two individuals is more informative than sampling one, as four haplotypes are observed instead of two, but it also doubles the cost of the study and is a more complex sampling scheme. To answer this question, we performed simulations of 1000 trios from 10 different relationships using real human haplotypes to obtain realistic genome-wide genetic data. Then, we compared the genome sharing coefficients and the relationship inference obtained from either a pair of individuals or one of their offspring using both single-point and multi-point approaches. We observed that for relationships closer than 1st cousin, pairs of individuals were more informative than one of their offspring for relationship inference, and kinship coefficients obtained from single-point methods gave more accurate or equivalent genome sharing estimations. For more remote relationships, offspring were more informative for relationship inference, and inbreeding coefficients obtained from multi-point methods gave more accurate genome sharing estimations. In conclusion, relationship inference on a parental pair or on one of their offspring provides complementary information. When possible, sampling trios should be encouraged, as it could allow a wider range of potential relationships to be covered.
Affiliation(s)
- Steven Gazal
- Inserm, UMR 1137, IAME, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, UMR 1137, Paris, France; Plateforme de Génétique constitutionnelle-Nord (PfGC-Nord), Paris, France
- Emmanuelle Génin
- Inserm, UMR 1078, Brest, France; Université Bretagne Occidentale, Brest, France; Centre Hospitalier Régional Universitaire, Brest, France
- Anne-Louise Leutenegger
- Inserm, U946, Genetic Variation and Human Diseases Lab, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, UMR 946, Paris, France.
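The offspring-based design rests on a standard identity: the inbreeding coefficient F of a child equals the kinship coefficient φ of its parents. For a non-inbred pair, φ = k1/4 + k2/2, where k0, k1 and k2 are the probabilities of sharing 0, 1 or 2 alleles IBD at a locus. The Python sketch below evaluates this formula with textbook expected values for a few relationships. It does not reproduce the single-point or multi-point estimators compared in the study.

```python
from fractions import Fraction as Fr

# Expected (k0, k1, k2) IBD-sharing probabilities for non-inbred relative pairs.
RELATIONSHIPS = {
    "parent-offspring": (Fr(0), Fr(1), Fr(0)),
    "full siblings": (Fr(1, 4), Fr(1, 2), Fr(1, 4)),
    "half siblings": (Fr(1, 2), Fr(1, 2), Fr(0)),
    "first cousins": (Fr(3, 4), Fr(1, 4), Fr(0)),
    "second cousins": (Fr(15, 16), Fr(1, 16), Fr(0)),
}

def kinship(k0: Fr, k1: Fr, k2: Fr) -> Fr:
    """Kinship coefficient phi = k1/4 + k2/2 for a non-inbred pair."""
    assert k0 + k1 + k2 == 1, "IBD-sharing probabilities must sum to 1"
    return k1 / 4 + k2 / 2

for name, ks in RELATIONSHIPS.items():
    phi = kinship(*ks)
    # A child of this pair has an expected inbreeding coefficient F equal to phi.
    print(f"{name:17s} phi = F_offspring = {phi}")
```

For example, first cousins have φ = 1/16, so their child is expected to have F = 1/16. Second cousins give only 1/64.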
25
Hunter-Zinck H, Clark AG. Aberrant Time to Most Recent Common Ancestor as a Signature of Natural Selection. Mol Biol Evol 2015; 32:2784-97. [PMID: 26093129 DOI: 10.1093/molbev/msv142]
Abstract
Natural selection inference methods often target one mode of selection of a particular age and strength. However, detecting multiple modes simultaneously, or with atypical representations, would be advantageous for understanding a population's evolutionary history. We have developed an anomaly detection algorithm using distributions of pairwise time to most recent common ancestor (TMRCA) to simultaneously detect multiple modes of natural selection in whole-genome sequences. As natural selection distorts local genealogies in distinct ways, the method uses pairwise TMRCA distributions, which approximate genealogies at a nonrecombining locus, to detect distortions without targeting a specific mode of selection. We evaluate the performance of our method, TSel, for both positive and balancing selection over different time-scales and selection strengths and compare TSel's performance with that of other methods. We then apply TSel to the Complete Genomics diversity panel, a set of human whole-genome sequences, and recover loci previously inferred to be under positive or balancing selection.
Affiliation(s)
- Haley Hunter-Zinck
- Department of Biological Statistics and Computational Biology, Cornell University
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University
26
Carmi S, Wilton PR, Wakeley J, Pe'er I. A renewal theory approach to IBD sharing. Theor Popul Biol 2014; 97:35-48. [PMID: 25149691 DOI: 10.1016/j.tpb.2014.08.002]
Abstract
A long genomic segment inherited by a pair of individuals from a single, recent common ancestor is said to be identical-by-descent (IBD). Shared IBD segments have numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC'), as well as introduce and justify a new model, which we term the renewal approximation, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC') for the mean number of shared segments longer than a cutoff and the fraction of the chromosome found in such segments. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also compute (again, in Laplace space) the distribution of the fraction of the chromosome in shared segments, from which we obtain explicit expressions for the first two moments. Finally, we generalize all results to populations with a variable effective size.
Affiliation(s)
- Shai Carmi
- Department of Computer Science, Columbia University, New York, NY, 10027, USA.
- Peter R Wilton
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, NY, 10027, USA
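One quantity this framework targets is the fraction of the chromosome found in shared segments longer than a cutoff m. It can be sketched under a deliberately simplified model, which is an assumption and not the SMC/SMC' or renewal analysis of the paper. For two haplotypes in a constant population of N diploid individuals, the TMRCA t at a point is taken as exponential with mean 2N generations. Given t, the segment covering that point is length-biased, Gamma(shape 2, rate 2t) in Morgans, so P(length > m | t) = e^(-2tm)(1 + 2tm). Averaging over t gives f(m) = (1 + 8Nm) / (1 + 4Nm)^2. The Python sketch below checks this closed form against Monte Carlo draws. N and the cutoff are hypothetical.

```python
import numpy as np

def ibd_fraction_closed_form(n_diploid: float, m_morgans: float) -> float:
    """Fraction of the genome in segments longer than m: (1 + 8Nm) / (1 + 4Nm)^2."""
    x = 4.0 * n_diploid * m_morgans
    return (1.0 + 2.0 * x) / (1.0 + x) ** 2

def ibd_fraction_monte_carlo(n_diploid: float, m_morgans: float,
                             n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate under the same simplified model at a fixed point."""
    rng = np.random.default_rng(seed)
    # Pairwise TMRCA in generations: exponential with mean 2N.
    t = rng.exponential(scale=2.0 * n_diploid, size=n_draws)
    # Length-biased segment covering the point: Gamma(shape 2, rate 2t), in Morgans.
    length = rng.gamma(shape=2.0, scale=1.0 / (2.0 * t))
    return float(np.mean(length > m_morgans))

n_pop, cutoff = 10_000, 0.02  # hypothetical N and a 2 cM cutoff
print(ibd_fraction_closed_form(n_pop, cutoff))   # about 0.0025
print(ibd_fraction_monte_carlo(n_pop, cutoff))
```

The simplification ends each segment at the first recombination on either side. The Markovian models analysed in the paper treat the process along the chromosome more carefully, so this sketch is only an intuition aid.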
27
Durand EY, Eriksson N, McLean CY. Reducing pervasive false-positive identical-by-descent segments detected by large-scale pedigree analysis. Mol Biol Evol 2014; 31:2212-22. [PMID: 24784137 PMCID: PMC4104314 DOI: 10.1093/molbev/msu151]
Abstract
Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases, but IBD detection accuracy in nonsimulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. data set. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false-positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. Nearly all false positives arose from the allowance of haplotype switch errors when detecting IBD, a necessity for retrieving long (>6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that scores IBD segments proportional to the number of switch errors they contain. Applying HaploScore filtering to the IBD data at a precision of 0.8 produced a 13-fold increase in recall when compared with length-based filtering. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative data sources using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
28
Abstract
The past fifty years have seen the development and application of numerous statistical methods to identify genomic regions that appear to be shaped by natural selection. These methods have been used to investigate the macro- and microevolution of a broad range of organisms, including humans. Here, we provide a comprehensive outline of these methods, explaining their conceptual motivations and statistical interpretations. We highlight areas of recent and future development in evolutionary genomics methods and discuss ongoing challenges for researchers employing such tests. In particular, we emphasize the importance of functional follow-up studies to characterize putative selected alleles and the use of selection scans as hypothesis-generating tools for investigating evolutionary histories.
Affiliation(s)
- Joseph J Vitti
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
29
Gazal S, Sahbatou M, Babron MC, Génin E, Leutenegger AL. FSuite: exploiting inbreeding in dense SNP chip and exome data. Bioinformatics 2014; 30:1940-1. [PMID: 24632498 DOI: 10.1093/bioinformatics/btu149]
Abstract
FSuite is a user-friendly pipeline developed for exploiting inbreeding information derived from human genomic data. It can make use of single nucleotide polymorphism chip or exome data. Compared with other software, the advantage of FSuite is that it provides a complete suite of scripts to describe and use the inbreeding information. It includes a module to detect inbred individuals and estimate their inbreeding coefficient, and a module to describe the proportion of different mating types in the population and the probability that each individual is the offspring of each mating type, which can be useful for population genetic studies. It also allows the identification of shared regions of homozygosity between affected individuals (homozygosity mapping), which can be used to identify rare recessive mutations involved in monogenic or multifactorial diseases. Availability and implementation: FSuite is developed in Perl and uses R functions to generate graphical outputs. This pipeline is freely available under the GNU GPL license at: http://genestat.cephb.fr/software/index.php/FSuite.
Affiliation(s)
- Steven Gazal
- Inserm, U946, Genetic variability and human diseases, Paris, 75010, Université Paris Sud, Kremlin-Bicêtre, 94270, Fondation Jean Dausset CEPH, Paris, 75010, Université Paris-Diderot, UMR 946, Institut Universitaire d'Hématologie, Paris, 75475, Inserm, U1078, Génétique, Génomique fonctionnelle et Biotechnologies, Brest, 29218 and Centre Hospitalier Régional Universitaire de Brest, Brest, 29200, France
- Mourad Sahbatou
- Inserm, U946, Genetic variability and human diseases, Paris, 75010, Université Paris Sud, Kremlin-Bicêtre, 94270, Fondation Jean Dausset CEPH, Paris, 75010, Université Paris-Diderot, UMR 946, Institut Universitaire d'Hématologie, Paris, 75475, Inserm, U1078, Génétique, Génomique fonctionnelle et Biotechnologies, Brest, 29218 and Centre Hospitalier Régional Universitaire de Brest, Brest, 29200, France
- Marie-Claude Babron
- Inserm, U946, Genetic variability and human diseases, Paris, 75010, Université Paris Sud, Kremlin-Bicêtre, 94270, Fondation Jean Dausset CEPH, Paris, 75010, Université Paris-Diderot, UMR 946, Institut Universitaire d'Hématologie, Paris, 75475, Inserm, U1078, Génétique, Génomique fonctionnelle et Biotechnologies, Brest, 29218 and Centre Hospitalier Régional Universitaire de Brest, Brest, 29200, France
- Emmanuelle Génin
- Inserm, U946, Genetic variability and human diseases, Paris, 75010, Université Paris Sud, Kremlin-Bicêtre, 94270, Fondation Jean Dausset CEPH, Paris, 75010, Université Paris-Diderot, UMR 946, Institut Universitaire d'Hématologie, Paris, 75475, Inserm, U1078, Génétique, Génomique fonctionnelle et Biotechnologies, Brest, 29218 and Centre Hospitalier Régional Universitaire de Brest, Brest, 29200, France
- Anne-Louise Leutenegger
- Inserm, U946, Genetic variability and human diseases, Paris, 75010, Université Paris Sud, Kremlin-Bicêtre, 94270, Fondation Jean Dausset CEPH, Paris, 75010, Université Paris-Diderot, UMR 946, Institut Universitaire d'Hématologie, Paris, 75475, Inserm, U1078, Génétique, Génomique fonctionnelle et Biotechnologies, Brest, 29218 and Centre Hospitalier Régional Universitaire de Brest, Brest, 29200, France
30
Li MJ, Wang LY, Xia Z, Wong MP, Sham PC, Wang J. dbPSHP: a database of recent positive selection across human populations. Nucleic Acids Res 2014; 42:D910-6. [PMID: 24194603 PMCID: PMC3965004 DOI: 10.1093/nar/gkt1052]
Abstract
The dbPSHP database (http://jjwanglab.org/dbpshp) aims to help researchers to efficiently identify, validate and visualize putative positively selected loci in human evolution and further discover the mechanism governing these natural selections. Recent evolution of human populations at the genomic level reflects the adaptations to the living environments, including climate change and availability and stability of nutrients. Many genetic regions under positive selection have been identified, which assist us to understand how natural selection has shaped population differences. Here, we manually collect recent positive selections in different human populations, consisting of 15,472 loci from 132 publications. We further compiled a database that used 15 statistical terms of different evolutionary attributes for single nucleotide variant sites from the HapMap 3 and 1000 Genomes Project to identify putative regions under positive selection. These attributes include variant allele/genotype properties, variant heterozygosity, within population diversity, long-range haplotypes, pairwise population differentiation and evolutionary conservation. We also provide interactive pages for visualization and annotation of different selective signals. The database is freely available to the public and will be frequently updated.
Affiliation(s)
- Mulin Jun Li
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China, Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, State Key Laboratory in Cognitive and Brain Sciences, The University of Hong Kong, Hong Kong SAR, China and Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Lily Yan Wang
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China, Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, State Key Laboratory in Cognitive and Brain Sciences, The University of Hong Kong, Hong Kong SAR, China and Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Zhengyuan Xia
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China, Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, State Key Laboratory in Cognitive and Brain Sciences, The University of Hong Kong, Hong Kong SAR, China and Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Maria P. Wong
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China, Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, State Key Laboratory in Cognitive and Brain Sciences, The University of Hong Kong, Hong Kong SAR, China and Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Pak Chung Sham
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China, Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, State Key Laboratory in Cognitive and Brain Sciences, The University of Hong Kong, Hong Kong SAR, China and Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Junwen Wang
- Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China, Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China, State Key Laboratory in Cognitive and Brain Sciences, The University of Hong Kong, Hong Kong SAR, China and Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
31
Abstract
Summary: Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent. Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and they can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions were studied in the context of demographic inference, and they were used to reconstruct recent demographic events in several populations. Here we extend the framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formulation in several demographic scenarios, compare our approach with methods based on ancestry deconvolution and use this method to analyze Masai samples from the HapMap 3 dataset. Availability: DoRIS, a Java implementation of the proposed method, and its source code are freely available at http://www.cs.columbia.edu/~pier/doris. Contact: itsik@cs.columbia.edu
Affiliation(s)
- Pier Francesco Palamara
- Department of Computer Science, Columbia University, 500 West 120th, New York City, NY 10027, USA
32
Practical considerations regarding the use of genotype and pedigree data to model relatedness in the context of genome-wide association studies. G3 (Bethesda) 2013; 3:1861-7. [PMID: 23979941 PMCID: PMC3789811 DOI: 10.1534/g3.113.007948]
Abstract
Genome-wide association studies of complex traits are often complicated by relatedness among individuals. Ignoring or inappropriately accounting for relatedness often results in inflated type I error rates. Either genotype or pedigree data can be used to estimate relatedness for use in mixed-models when undertaking quantitative trait locus mapping. We performed simulations to investigate methods for controlling type I error and optimizing power considering both full and partial pedigrees and, similarly, both sparse and dense marker coverage; we also examined real data sets. (1) When marker density was low, estimating relatedness by genotype data alone failed to control the type I error rate; (2) this was resolved by combining both genotype and pedigree data. (3) When sufficiently dense marker data were used to estimate relatedness, type I error was well controlled and power increased; however, (4) this was only true when the relatedness was estimated using genotype data that excluded genotypes on the chromosome currently being scanned for a quantitative trait locus.
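Finding (4) describes what is often called a leave-one-chromosome-out (LOCO) relatedness matrix. The minimal NumPy sketch below builds one. It assumes a standardized-genotype GRM, K = ZZᵀ/M over M markers. This is one common estimator and not necessarily the one used in the study. The genotype matrix and chromosome labels are simulated placeholders.

```python
import numpy as np

def standardize(genotypes: np.ndarray) -> np.ndarray:
    """Center and scale 0/1/2 allele counts per marker (columns)."""
    p = genotypes.mean(axis=0) / 2.0          # sample allele frequency per marker
    sd = np.sqrt(2.0 * p * (1.0 - p))
    keep = sd > 0                              # drop monomorphic markers
    return (genotypes[:, keep] - 2.0 * p[keep]) / sd[keep]

def loco_grm(genotypes: np.ndarray, chrom: np.ndarray, scanned_chrom: int) -> np.ndarray:
    """GRM built from all markers except those on the chromosome being tested."""
    z = standardize(genotypes[:, chrom != scanned_chrom])
    return z @ z.T / z.shape[1]

rng = np.random.default_rng(1)
n_samples, n_markers = 50, 600
geno = rng.binomial(2, 0.3, size=(n_samples, n_markers)).astype(float)
chrom = np.repeat(np.arange(1, 4), n_markers // 3)  # three toy chromosomes
k_without_chr1 = loco_grm(geno, chrom, scanned_chrom=1)
print(k_without_chr1.shape)  # (50, 50)
```

In a mixed-model scan, markers on chromosome c would be tested with the matrix built from the other chromosomes. The tested variant then does not also contribute to the random-effect covariance.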
33
Abstract
Segments of identity-by-descent (IBD) detected from high-density genetic data are useful for many applications, including long-range phase determination, phasing family data, imputation, IBD mapping, and heritability analysis in founder populations. We present Refined IBD, a new method for IBD segment detection. Refined IBD achieves both computational efficiency and highly accurate IBD segment reporting by searching for IBD in two steps. The first step (identification) uses the GERMLINE algorithm to find shared haplotypes exceeding a length threshold. The second step (refinement) evaluates candidate segments with a probabilistic approach to assess the evidence for IBD. Like GERMLINE, Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses. To investigate the properties of Refined IBD, we simulate SNP data from a model with recent superexponential population growth that is designed to match United Kingdom data. The simulation results show that Refined IBD achieves a better power/accuracy profile than fastIBD or GERMLINE. We find that a single run of Refined IBD achieves greater power than 10 runs of fastIBD. We also apply Refined IBD to SNP data for samples from the United Kingdom and from Northern Finland and describe the IBD sharing in these data sets. Refined IBD is powerful, highly accurate, and easy to use and is implemented in Beagle version 4.
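The identification step can be pictured with a toy seed-and-extend sketch in the spirit of GERMLINE. Haplotypes are cut into fixed-width marker windows, and identical windows are found by hashing. Runs of consecutive matching windows at least `min_windows` long become candidate segments. The Python sketch below makes several simplifying assumptions. The window width, threshold and input haplotypes are hypothetical, no mismatches are tolerated, and the probabilistic refinement step that Refined IBD applies afterwards is omitted. It is not Beagle's implementation.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

def candidate_segments(haplotypes: List[str], window: int,
                       min_windows: int) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Return {(hap_i, hap_j): [(start_marker, end_marker_exclusive), ...]}."""
    n_windows = len(haplotypes[0]) // window
    matches = defaultdict(set)  # (i, j) -> indices of windows where i and j are identical
    for w in range(n_windows):
        buckets = defaultdict(list)  # hashed allele string -> haplotype indices
        for h, hap in enumerate(haplotypes):
            buckets[hap[w * window:(w + 1) * window]].append(h)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                matches[(i, j)].add(w)
    segments = defaultdict(list)
    for pair, wins in matches.items():
        run_start = None
        for w in range(n_windows + 1):  # the extra index closes any open run
            if w in wins and run_start is None:
                run_start = w
            elif w not in wins and run_start is not None:
                if w - run_start >= min_windows:  # keep runs exceeding the threshold
                    segments[pair].append((run_start * window, w * window))
                run_start = None
    return dict(segments)

haps = ["010110100111001010", "010110100111110101", "111000100111001010"]
print(candidate_segments(haps, window=3, min_windows=3))
# {(0, 1): [(0, 12)], (0, 2): [(6, 18)]}
```

Haplotypes 1 and 2 also match in two windows, but that run falls below `min_windows`. Lowering the threshold would report it as a short candidate segment. A refinement step would then have to weigh evidence for or against IBD on such a segment.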