1
Garcia-Erill G, Wang X, Rasmussen MS, Quinn L, Khan A, Bertola LD, Santander CG, Balboa RF, Ogutu JO, Pečnerová P, Hanghøj K, Kuja J, Nursyifa C, Masembe C, Muwanika V, Bibi F, Moltke I, Siegismund HR, Albrechtsen A, Heller R. Extensive Population Structure Highlights an Apparent Paradox of Stasis in the Impala (Aepyceros melampus). Mol Ecol 2024:e17539. [PMID: 39373069 DOI: 10.1111/mec.17539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 08/30/2024] [Accepted: 09/18/2024] [Indexed: 10/08/2024]
Abstract
Impalas are unusual among bovids because they have remained morphologically similar over millions of years, a phenomenon referred to as evolutionary stasis. Here, we sequenced 119 whole genomes from the two extant subspecies of impala, the common (Aepyceros melampus melampus) and black-faced (A. m. petersi) impala. We investigated the evolutionary forces working within the species to explore how they might be associated with its evolutionary stasis as a taxon. Despite being one of the most abundant bovid species, we found low genetic diversity overall, and a phylogeographic signal of spatial expansion from southern to eastern Africa. Contrary to expectations under a scenario of evolutionary stasis, we found pronounced genetic structure between and within the two subspecies with indications of ancient, but not recent, gene flow. Black-faced impala and eastern African common impala populations had more runs of homozygosity than common impala in southern Africa, and, using a proxy for genetic load, we found that natural selection is working less efficiently in these populations compared to the southern African populations. Together with the fossil record, our results are consistent with a fixed-optimum model of evolutionary stasis, in which impalas in the southern African core of the range are able to stay near their evolutionary fitness optimum as a generalist ecotone species, whereas eastern African impalas may struggle to do so due to the effects of genetic drift and reduced adaptation to the local habitat, leading to recurrent local extinction in eastern Africa and re-colonisation from the South.
Affiliation(s)
- Genís Garcia-Erill
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Department of Molecular Biology and Genetics, Bioinformatics Research Center, Aarhus University, Aarhus, Denmark
- Xi Wang
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Liam Quinn
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Anubhab Khan
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Laura D Bertola
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Cindy G Santander
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Renzo F Balboa
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Joseph O Ogutu
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
- Kristian Hanghøj
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Josiah Kuja
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Casia Nursyifa
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Charles Masembe
- College of Natural Sciences, Makerere University, Kampala, Uganda
- Vincent Muwanika
- College of Agricultural and Environmental Sciences, Makerere University, Kampala, Uganda
- Faysal Bibi
- Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
- Ida Moltke
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Hans R Siegismund
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Heller
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
2
Liu X, Lin L, Sinding MHS, Bertola LD, Hanghøj K, Quinn L, Garcia-Erill G, Rasmussen MS, Schubert M, Pečnerová P, Balboa RF, Li Z, Heaton MP, Smith TPL, Pinto RR, Wang X, Kuja J, Brüniche-Olsen A, Meisner J, Santander CG, Ogutu JO, Masembe C, da Fonseca RR, Muwanika V, Siegismund HR, Albrechtsen A, Moltke I, Heller R. Introgression and disruption of migration routes have shaped the genetic integrity of wildebeest populations. Nat Commun 2024; 15:2921. [PMID: 38609362 PMCID: PMC11014984 DOI: 10.1038/s41467-024-47015-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 03/11/2024] [Indexed: 04/14/2024] Open
Abstract
The blue wildebeest (Connochaetes taurinus) is a keystone species in savanna ecosystems from southern to eastern Africa, and is well known for its spectacular migrations and locally extreme abundance. In contrast, the black wildebeest (C. gnou) is endemic to southern Africa, barely escaped extinction in the 1900s and is feared to be in danger of genetic swamping from the blue wildebeest. Despite the ecological importance of the wildebeest, there is a lack of understanding of how its unique migratory ecology has affected its gene flow, genetic structure and phylogeography. Here, we analyze whole genomes from 121 blue and 22 black wildebeest across the genus' range. We find discrete genetic structure consistent with the morphologically defined subspecies. Unexpectedly, our analyses reveal no signs of recent interspecific admixture, but rather a late Pleistocene introgression of black wildebeest into the southern blue wildebeest populations. Finally, we find that migratory blue wildebeest populations exhibit a combination of long-range panmixia, higher genetic diversity and lower inbreeding levels compared to neighboring populations whose migration has recently been disrupted. These findings provide crucial insights into the evolutionary history of the wildebeest, and tangible genetic evidence for the negative effects of anthropogenic activities on highly migratory ungulates.
Affiliation(s)
- Xiaodong Liu
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Long Lin
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Laura D Bertola
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristian Hanghøj
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Liam Quinn
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Mikkel Schubert
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Renzo F Balboa
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Zilong Li
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Michael P Heaton
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA
- Timothy P L Smith
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA
- Rui Resende Pinto
- CIIMAR-Interdisciplinary Centre of Marine and Environmental Research-University of Porto, Porto, Portugal
- Section for Biodiversity, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Xi Wang
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Josiah Kuja
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jonas Meisner
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Copenhagen Research Centre for Mental Health, Copenhagen University Hospital, Copenhagen, Denmark
- Cindy G Santander
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Joseph O Ogutu
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
- Charles Masembe
- Department of Zoology, Entomology and Fisheries Sciences, Makerere University, P. O. Box 7062, Kampala, Uganda
- Rute R da Fonseca
- CIIMAR-Interdisciplinary Centre of Marine and Environmental Research-University of Porto, Porto, Portugal
- Section for Biodiversity, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Vincent Muwanika
- Department of Environmental Management, Makerere University, PO Box 7062, Kampala, Uganda
- Hans R Siegismund
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Ida Moltke
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Heller
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
3
Ritland K. Relatedness coefficients and their applications for triplets and quartets of genetic markers. G3 (Bethesda, Md.) 2024; 14:jkad236. [PMID: 38411620 PMCID: PMC10989858 DOI: 10.1093/g3journal/jkad236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 07/26/2023] [Indexed: 02/28/2024]
Abstract
Relatedness coefficients which seek the identity-by-descent of genetic markers are described. The markers are in groups of two, three or four, and if four, can consist of two pairs. It is essential to use cumulants (not moments) for four-marker-gene probabilities, as the covariance of homozygosity, used in four-marker applications, can only be described with cumulants. A covariance of homozygosity between pairs of markers arises when populations follow a mixture distribution. Also, the probability of four markers all identical-by-descent equals the normalized fourth cumulant. In this article, a "genetic marker" generally represents either a gene locus or an allele at a locus. Applications of three marker coefficients mainly involve conditional regression, and applications of four marker coefficients can involve identity disequilibrium. Estimation of relatedness using genetic marker data is discussed. However, three- and four-marker estimators suffer from statistical and numerical problems, including higher statistical variance, complexity of estimation formula, and singularity at some intermediate allele frequencies.
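For readers unfamiliar with the distinction between moments and cumulants drawn above, the standard textbook form of the joint fourth cumulant may help. The symbols below are generic zero-mean random variables, not Ritland's notation, and the normalization used in the article is not reproduced here.

```latex
\kappa(X_1, X_2, X_3, X_4)
  = E[X_1 X_2 X_3 X_4]
  - E[X_1 X_2]\,E[X_3 X_4]
  - E[X_1 X_3]\,E[X_2 X_4]
  - E[X_1 X_4]\,E[X_2 X_3],
\qquad E[X_i] = 0 .
```

The three subtracted products remove the part of the fourth moment that is explained by pairwise covariances alone, which is why a four-way association is measured by the cumulant rather than the raw moment.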
Affiliation(s)
- Kermit Ritland
- Biodiversity Research Center, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada
4
He S, Wang Y, Luo Y, Xue M, Wu M, Tan H, Peng Y, Wang K, Fang M. Integrated analysis strategy of genome-wide functional gene mining reveals DKK2 gene underlying meat quality in Shaziling synthesized pigs. BMC Genomics 2024; 25:30. [PMID: 38178019 PMCID: PMC10765619 DOI: 10.1186/s12864-023-09925-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024] Open
Abstract
BACKGROUND The Shaziling pig is a well-known indigenous Chinese breed with superior meat quality traits. However, the genetic mechanism and genomic evidence underlying the meat quality characteristics of Shaziling pigs remain unclear. To investigate the germplasm characteristics of Shaziling pigs, we analyzed, for the first time, whole-genome sequencing data from 67 individuals (20 Shaziling pigs [S], 20 Dabasha pigs [DBS], 11 Yorkshire pigs [Y], 10 Berkshire pigs [BKX], 5 Basha pigs [BS] and 1 warthog). RESULTS A total of 2,538,577 high-quality SNPs were detected, and 9 candidate genes that were specifically selected in S and shared between S and DBS were identified using an integrated analysis strategy combining identity-by-descent (IBD) and selective sweep analyses. Among them, dickkopf WNT signaling pathway inhibitor 2 (DKK2), an antagonist of the Wnt signaling pathway, was the most promising candidate: it was associated with quantitative trait loci for palmitic acid and palmitoleic acid in PigQTLdb, and it was specifically selected in S compared with 48 other Chinese local pigs from 12 populations and 39 foreign pigs from 4 populations. Subsequently, a mutation at position 12,726 bp of DKK2 intron 1 (g.114874954 A > C) was found to be associated with intramuscular fat content by PCR-RFLP in 21 different pig populations. DKK2 was specifically expressed in adipose tissues. Overexpression of DKK2 decreased triglyceride content, fatty acid synthase levels and the expression of genes related to adipogenesis and the Wnt signaling pathway, whereas interference with DKK2 had the opposite effect during adipogenic differentiation of porcine preadipocytes and 3T3-L1 cells. CONCLUSIONS Our findings provide an analysis strategy for mining functional genes underlying economically important traits, together with fundamental data and molecular evidence for improving pig meat quality traits and for molecular breeding.
Affiliation(s)
- Shuaihan He
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Yubei Wang
- Sanya Institute of China Agricultural University, Sanya, 572025, China
- Yabiao Luo
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Mingming Xue
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Maisheng Wu
- Xiangtan Bureau of Animal Husbandry and Veterinary Medicine and Aquatic Product, Xiangtan, 411102, China
- Hong Tan
- Xiangtan Bureau of Animal Husbandry and Veterinary Medicine and Aquatic Product, Xiangtan, 411102, China
- Yinglin Peng
- Hunan Institute of Animal & Veterinary Science, Changsha, 410131, China
- Kejun Wang
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, 450002, China
- Meiying Fang
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
5
Balboa RF, Bertola LD, Brüniche-Olsen A, Rasmussen MS, Liu X, Besnard G, Salmona J, Santander CG, He S, Zinner D, Pedrono M, Muwanika V, Masembe C, Schubert M, Kuja J, Quinn L, Garcia-Erill G, Stæger FF, Rakotoarivony R, Henrique M, Lin L, Wang X, Heaton MP, Smith TPL, Hanghøj K, Sinding MHS, Atickem A, Chikhi L, Roos C, Gaubert P, Siegismund HR, Moltke I, Albrechtsen A, Heller R. African bushpigs exhibit porous species boundaries and appeared in Madagascar concurrently with human arrival. Nat Commun 2024; 15:172. [PMID: 38172616 PMCID: PMC10764920 DOI: 10.1038/s41467-023-44105-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
Several African mammals exhibit a phylogeographic pattern where closely related taxa are split between West/Central and East/Southern Africa, but their evolutionary relationships and histories remain controversial. Bushpigs (Potamochoerus larvatus) and red river hogs (P. porcus) are recognised as separate species due to morphological distinctions, a perceived lack of interbreeding at contact, and putatively old divergence times, but historically, they were considered conspecific. Moreover, the presence of Malagasy bushpigs as the sole large terrestrial mammal shared with the African mainland raises intriguing questions about its origin and arrival in Madagascar. Analyses of 67 whole genomes revealed a genetic continuum between the two species, with putative signatures of historical gene flow, variable FST values, and a recent divergence time (<500,000 years). Thus, our study challenges key arguments for splitting Potamochoerus into two species and suggests their speciation might be incomplete. Our findings also indicate that Malagasy bushpigs diverged from southern African populations and underwent a limited bottleneck 1000-5000 years ago, concurrent with human arrival in Madagascar. These results shed light on the evolutionary history of an iconic and widespread African mammal and provide insight into the longstanding biogeographic puzzle surrounding the bushpig's presence in Madagascar.
Affiliation(s)
- Renzo F Balboa
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Laura D Bertola
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Xiaodong Liu
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Guillaume Besnard
- Laboratoire Evolution et Diversité Biologique (EDB), UMR 5174, CNRS, IRD, Université Toulouse Paul Sabatier, 31062, Toulouse, France
- Jordi Salmona
- Laboratoire Evolution et Diversité Biologique (EDB), UMR 5174, CNRS, IRD, Université Toulouse Paul Sabatier, 31062, Toulouse, France
- Cindy G Santander
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Shixu He
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Dietmar Zinner
- Cognitive Ecology Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077, Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, 37077, Göttingen, Germany
- Leibniz Science Campus Primate Cognition, 37077, Göttingen, Germany
- Miguel Pedrono
- UMR ASTRE, CIRAD, Campus International de Baillarguet, Montpellier, France
- Vincent Muwanika
- College of Agricultural and Environmental Sciences, Makerere University, Kampala, Uganda
- Charles Masembe
- College of Natural Sciences, Makerere University, Kampala, Uganda
- Mikkel Schubert
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Josiah Kuja
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Liam Quinn
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Long Lin
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Xi Wang
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristian Hanghøj
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Anagaw Atickem
- Department of Zoological Sciences, Addis Ababa University, PO Box 1176, Addis Ababa, Ethiopia
- Lounès Chikhi
- Laboratoire Evolution et Diversité Biologique (EDB), UMR 5174, CNRS, IRD, Université Toulouse Paul Sabatier, 31062, Toulouse, France
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077, Göttingen, Germany
- Philippe Gaubert
- Laboratoire Evolution et Diversité Biologique (EDB), UMR 5174, CNRS, IRD, Université Toulouse Paul Sabatier, 31062, Toulouse, France
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIIMAR), Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
- Hans R Siegismund
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Ida Moltke
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Heller
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
6
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
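As background for the variance-components framing above, the following is a minimal sketch of a conventional genotype-based genetic relatedness matrix (GRM). It is not the local eGRM of the paper, which is an expectation conditional on the ARG; the function name and the simulated genotypes are assumptions made purely for illustration.

```python
import numpy as np

def genetic_relatedness_matrix(genotypes):
    """Conventional GRM from an (individuals x sites) matrix of 0/1/2 allele counts.

    Illustrative only: this is the standard standardized-genotype GRM,
    not the ARG-based local eGRM described in the abstract.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # sample allele frequency per site
    keep = (p > 0) & (p < 1)                 # drop monomorphic sites (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each site
    return z @ z.T / z.shape[1]              # average over the retained sites

# Hypothetical toy data: 5 individuals, 200 biallelic sites.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(5, 200))
grm = genetic_relatedness_matrix(genotypes)
```

Because each site is centered on its sample mean, every row of the resulting matrix sums to approximately zero, which is a quick sanity check when adapting the sketch to real data.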
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
7
Coonahan E, Gage H, Chen D, Noormahomed EV, Buene TP, Mendes de Sousa I, Akrami K, Chambal L, Schooley RT, Winzeler EA, Cowell AN. Whole-genome surveillance identifies markers of Plasmodium falciparum drug resistance and novel genomic regions under selection in Mozambique. mBio 2023; 14:e0176823. [PMID: 37750720 PMCID: PMC10653802 DOI: 10.1128/mbio.01768-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2023] [Accepted: 08/02/2023] [Indexed: 09/27/2023] Open
Abstract
IMPORTANCE Malaria is a devastating disease caused by Plasmodium parasites. The evolution of parasite drug resistance continues to hamper progress toward malaria elimination, and despite extensive efforts to control malaria, it remains a leading cause of death in Mozambique and other countries in the region. The development of successful vaccines and identification of molecular markers to track drug efficacy are essential for managing the disease burden. We present an analysis of the parasite genome in Mozambique, a country with one of the highest malaria burdens globally and limited available genomic data, revealing current selection pressure. We contribute additional evidence to limited prior studies supporting the effectiveness of selective whole-genome amplification (SWGA) in producing reliable genomic data from complex clinical samples. Our results provide the identity of genomic loci that may be associated with current antimalarial drug use, including artemisinin and lumefantrine, and reveal selection pressure predicted to compromise the efficacy of current vaccine candidates.
Affiliation(s)
- Erin Coonahan
- School of Medicine, University of California San Diego, La Jolla, California, USA
- Hunter Gage
- School of Medicine, University of California San Diego, La Jolla, California, USA
- Daisy Chen
- Department of Pediatrics, University of California San Diego (UCSD), La Jolla, California, USA
- Emilia Virginia Noormahomed
- School of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Microbiology, Parasitology Laboratory, Faculty of Medicine, Eduardo Mondlane University, Maputo, Mozambique
- Mozambique Institute of Health Education and Research (MIHER), Maputo, Mozambique
- Titos Paulo Buene
- Department of Microbiology, Parasitology Laboratory, Faculty of Medicine, Eduardo Mondlane University, Maputo, Mozambique
- Mozambique Institute of Health Education and Research (MIHER), Maputo, Mozambique
- Irina Mendes de Sousa
- Mozambique Institute of Health Education and Research (MIHER), Maputo, Mozambique
- Biological Sciences Department, Faculty of Sciences, Eduardo Mondlane University, Maputo, Mozambique
- Kevan Akrami
- School of Medicine, University of California San Diego, La Jolla, California, USA
- Faculdade de Medicina da Bahia, Universidade Federal da Bahia, Salvador, Brazil
- Lucia Chambal
- Mozambique Institute of Health Education and Research (MIHER), Maputo, Mozambique
- Department of Internal Medicine, Faculty of Medicine, Eduardo Mondlane University, Maputo, Mozambique
- Maputo Central Hospital, Maputo, Mozambique
- Robert T. Schooley
- School of Medicine, University of California San Diego, La Jolla, California, USA
- Elizabeth A. Winzeler
- Department of Pediatrics, University of California San Diego (UCSD), La Jolla, California, USA
- Annie N. Cowell
- School of Medicine, University of California San Diego, La Jolla, California, USA
8
Sanga S, Chakraborty S, Bardhan M, Polavarapu K, Kumar VP, Bhattacharya C, Nashi S, Vengalil S, Geetha TS, Ramprasad V, Nalini A, Basu A, Acharya M. Identification of a shared, common haplotype segregating with an SGCB c.544 T > G mutation in Indian patients affected with sarcoglycanopathy. Sci Rep 2023; 13:15095. [PMID: 37699968 PMCID: PMC10497502 DOI: 10.1038/s41598-023-41487-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 08/28/2023] [Indexed: 09/14/2023] Open
Abstract
Sarcoglycanopathy is the most frequent form of autosomal recessive limb-girdle muscular dystrophies caused by mutations in the SGCB gene encoding beta-sarcoglycan proteins. In this study, we describe a shared, common haplotype co-segregating in 14 sarcoglycanopathy cases from 13 unrelated families from the south Indian region with the likely pathogenic homozygous mutation c.544 T > G (p.Thr182Pro) in SGCB. The haplotype was reconstructed based on 10 polymorphic markers surrounding the c.544 T > G mutation in the cases and related family members as well as 150 unrelated controls from Indian populations using PLINK1.9. We identified haplotype H1 = G, A, G, T, G, G, A, C, T, G, T at a significantly higher frequency in cases compared to related controls and unrelated controls from the Indian population. Upon segregation analysis within the family pedigrees, H1 was observed to co-segregate with c.544 T > G in a homozygous state in all the pedigrees of cases except one, indicating a probable founder effect. Furthermore, identity-by-descent and inbreeding coefficient analyses revealed relatedness among 33 new pairs of seemingly unrelated individuals from the sarcoglycanopathy cohort and a higher proportion of homozygous markers, thereby indicating common ancestry. Since all these patients are from the south Indian region, we suggest this region to be a primary target of mutation screening in patients diagnosed with sarcoglycanopathy.
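To make the inbreeding coefficient analysis mentioned above concrete, here is a minimal sketch of a method-of-moments estimator that compares observed with expected homozygosity. It is a simplified illustration under Hardy-Weinberg assumptions; the function name and inputs are hypothetical, and the estimator used by the authors in PLINK1.9 may differ in details such as small-sample corrections.

```python
import numpy as np

def inbreeding_coefficient(genotypes, allele_freqs):
    """Method-of-moments F for one individual.

    genotypes: 0/1/2 allele counts at each marker (1 = heterozygous).
    allele_freqs: population frequency of the counted allele at each marker.
    Illustrative only; not a reimplementation of any specific tool.
    """
    g = np.asarray(genotypes)
    p = np.asarray(allele_freqs, dtype=float)
    observed_hom = np.sum(g != 1)                 # homozygous markers observed
    expected_hom = np.sum(1 - 2 * p * (1 - p))    # expected under Hardy-Weinberg
    return (observed_hom - expected_hom) / (g.size - expected_hom)

# Hypothetical example: four markers, each with allele frequency 0.5.
freqs = [0.5, 0.5, 0.5, 0.5]
f_all_hom = inbreeding_coefficient([0, 2, 0, 2], freqs)   # every marker homozygous
f_expected = inbreeding_coefficient([0, 1, 2, 1], freqs)  # matches expectation
```

Values near 1 indicate an excess of homozygous markers relative to random mating, values near 0 match the expectation, and negative values indicate an excess of heterozygotes.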
Affiliation(s)
- Shamita Sanga
- National Institute of Biomedical Genomics, P.O: N.S.S, Kalyani, West Bengal, 741251, India
- Sudipta Chakraborty
- National Institute of Biomedical Genomics, P.O: N.S.S, Kalyani, West Bengal, 741251, India
- Regional Centre for Biotechnology, Faridabad, India
- Mainak Bardhan
- National Institute of Mental Health and Neurosciences, Bangalore, India
- Kiran Polavarapu
- National Institute of Mental Health and Neurosciences, Bangalore, India
- Chandrika Bhattacharya
- National Institute of Biomedical Genomics, P.O: N.S.S, Kalyani, West Bengal, 741251, India
- Saraswati Nashi
- National Institute of Mental Health and Neurosciences, Bangalore, India
- Seena Vengalil
- National Institute of Mental Health and Neurosciences, Bangalore, India
- Atchayaram Nalini
- National Institute of Mental Health and Neurosciences, Bangalore, India
- Analabha Basu
- National Institute of Biomedical Genomics, P.O: N.S.S, Kalyani, West Bengal, 741251, India
- Moulinath Acharya
- National Institute of Biomedical Genomics, P.O: N.S.S, Kalyani, West Bengal, 741251, India
9
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CW, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv 2023:2023.04.07.536093. [PMID: 37066144 PMCID: PMC10104234 DOI: 10.1101/2023.04.07.536093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
  - Department of Quantitative and Computational Biology, University of Southern California
- Joshua G. Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California
- Caoqi Fan
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Bryan Dinh
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Charleston W.K. Chiang
  - Department of Quantitative and Computational Biology, University of Southern California
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Michael D. Edge
  - Department of Quantitative and Computational Biology, University of Southern California
10
Oriol Sabat B, Mas Montserrat D, Giro-i-Nieto X, Ioannidis AG. SALAI-Net: species-agnostic local ancestry inference network. Bioinformatics 2022; 38:ii27-ii33. [PMID: 36124792 PMCID: PMC9486591 DOI: 10.1093/bioinformatics/btac464] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: Local ancestry inference (LAI) is the high resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications. RESULTS: We present SALAI-Net, a portable statistical LAI method that can be applied on any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity by descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on whole-genome data of humans and we test these models' ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM memory than competing methods. AVAILABILITY AND IMPLEMENTATION: We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
SUPPLEMENTARY INFORMATION: Supplementary data are available from Bioinformatics online.
Affiliation(s)
- Benet Oriol Sabat
  - Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain
  - Department of Biomedical Data Science, Stanford Medical School
- Xavier Giro-i-Nieto
  - Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain
- Alexander G Ioannidis
  - Department of Biomedical Data Science, Stanford Medical School
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
11
Enbody ED, Sin SYW, Boersma J, Edwards SV, Ketaloya S, Schwabl H, Webster MS, Karubian J. The evolutionary history and mechanistic basis of female ornamentation in a tropical songbird. Evolution 2022; 76:1720-1736. [PMID: 35748580 PMCID: PMC9543242 DOI: 10.1111/evo.14545] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Revised: 05/26/2022] [Accepted: 05/31/2022] [Indexed: 01/22/2023]
Abstract
Ornamentation, such as the showy plumage of birds, is widespread among female vertebrates, yet the evolutionary pressures shaping female ornamentation remain uncertain. In part this is due to a poor understanding of the mechanistic route to ornamentation in females. To address this issue, we evaluated the evolutionary history of ornament expression in a tropical passerine bird, the White-shouldered Fairywren, whose females, but not males, strongly vary between populations in occurrence of ornamented black-and-white plumage. We first use phylogenomic analysis to demonstrate that female ornamentation is derived and that female ornamentation evolves independently of changes in male plumage. We then use exogenous testosterone in a field experiment to induce partial ornamentation in naturally unornamented females. By sequencing the transcriptome of experimentally induced ornamented and natural feathers, we identify genes expressed during ornament production and evaluate the degree to which female ornamentation in this system is associated with elevated testosterone, as is common in males. We reveal that some ornamentation in females is linked to testosterone and that sexes differ in ornament-linked gene expression. Lastly, using genomic outlier analysis we identify a candidate melanogenesis gene that lies in a region of high genomic divergence among populations that is also differentially expressed in feather follicles of different female plumages. Taken together, these findings are consistent with sex-specific selection favoring the evolution of female ornaments and demonstrate a key role for testosterone in generating population divergence in female ornamentation through gene regulation. More broadly, our work highlights similarities and differences in how ornamentation evolves in the sexes.
Affiliation(s)
- Erik D. Enbody
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, Louisiana 70118
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala SE-75123, Sweden
- Simon Y. W. Sin
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
  - School of Biological Sciences, The University of Hong Kong, Pok Fu Lam Road, Hong Kong
- Jordan Boersma
  - School of Biological Sciences, Center for Reproductive Biology, Washington State University, Pullman, Washington 99164
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, New York 14853
  - Macaulay Library, Cornell Lab of Ornithology, Ithaca, New York 14850
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
- Serena Ketaloya
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, Louisiana 70118
- Hubert Schwabl
  - School of Biological Sciences, Center for Reproductive Biology, Washington State University, Pullman, Washington 99164
- Michael S. Webster
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, New York 14853
  - Macaulay Library, Cornell Lab of Ornithology, Ithaca, New York 14850
- Jordan Karubian
  - Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, Louisiana 70118
12
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:bbac043. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
13
Severson AL, Korneliussen TS, Moltke I. LocalNgsRelate: a software tool for inferring IBD sharing along the genome between pairs of individuals from low-depth NGS data. Bioinformatics 2022; 38:1159-1161. [PMID: 34718411 PMCID: PMC8796377 DOI: 10.1093/bioinformatics/btab732] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/22/2021] [Revised: 09/28/2021] [Accepted: 10/24/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: Inference of identity-by-descent (IBD) sharing along the genome between pairs of individuals has important uses. But all existing inference methods are based on genotypes, which is not ideal for low-depth Next Generation Sequencing (NGS) data from which genotypes can only be called with high uncertainty. RESULTS: We present a new probabilistic software tool, LocalNgsRelate, for inferring IBD sharing along the genome between pairs of individuals from low-depth NGS data. Its inference is based on genotype likelihoods instead of genotypes, and thereby it takes the uncertainty of the genotype calling into account. Using real data from the 1000 Genomes project, we show that LocalNgsRelate provides more accurate IBD inference for low-depth NGS data than two state-of-the-art genotype-based methods, Albrechtsen et al. (2009) and hap-IBD. We also show that the method works well for NGS data down to a depth of 2×. AVAILABILITY AND IMPLEMENTATION: LocalNgsRelate is freely available at https://github.com/idamoltke/LocalNgsRelate. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
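To illustrate why genotype likelihoods carry more information than hard genotype calls at low depth, here is a minimal sketch under a simple textbook model: a biallelic site, independent reads, and a single symmetric sequencing error rate. The function names and the default error rate are assumptions made for this illustration; this is not the model or code used by LocalNgsRelate.

```python
def genotype_likelihoods(n_ref, n_alt, error_rate=0.01):
    """Return P(reads | genotype) for 0, 1 and 2 copies of the alternate allele.

    Assumed model (illustrative only): each read is sampled from one of the
    two chromosomes at random and is misread with probability `error_rate`.
    The binomial coefficient is omitted because it is the same for all
    three genotypes.
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # Probability that a single read shows the alternate allele.
        p_alt = frac_alt * (1 - error_rate) + (1 - frac_alt) * error_rate
        likelihoods.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def normalise(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


if __name__ == "__main__":
    # Depth 2, both reads show the reference allele.
    gl = genotype_likelihoods(n_ref=2, n_alt=0)
    print(normalise(gl))
```

Under these assumptions, two reference reads still leave a normalised weight of roughly 0.2 on the heterozygous genotype, which a hard call of "homozygous reference" would discard.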
Affiliation(s)
- Alissa L Severson
  - Department of Genetics, Stanford University, Stanford, CA 94305-5020, USA
- Ida Moltke
  - Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
14
Tan KT, Kim H, Carrot-Zhang J, Zhang Y, Kim WJ, Kugener G, Wala JA, Howard TP, Chi YY, Beroukhim R, Li H, Ha G, Alper SL, Perlman EJ, Mullen EA, Hahn WC, Meyerson M, Hong AL. Haplotype-resolved germline and somatic alterations in renal medullary carcinomas. Genome Med 2021; 13:114. [PMID: 34261517 PMCID: PMC8281718 DOI: 10.1186/s13073-021-00929-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 06/25/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Renal medullary carcinomas (RMCs) are rare kidney cancers that occur in adolescents and young adults of African ancestry. Although RMC is associated with the sickle cell trait and somatic loss of the tumor suppressor, SMARCB1, the ancestral origins of RMC remain unknown. Further, characterization of structural variants (SVs) involving SMARCB1 in RMC remains limited. METHODS: We used linked-read genome sequencing to reconstruct germline and somatic haplotypes in 15 unrelated patients with RMC registered on the Children's Oncology Group (COG) AREN03B2 study between 2006 and 2017 or from our prior study. We performed fine-mapping of the HBB locus and assessed the germline for cancer predisposition genes. Subsequently, we assessed the tumor samples for mutations outside of SMARCB1 and integrated RNA sequencing to interrogate the structural variants at the SMARCB1 locus. RESULTS: We find that the haplotype of the sickle cell mutation in patients with RMC originated from three geographical regions in Africa. In addition, fine-mapping of the HBB locus identified the sickle cell mutation as the sole candidate variant. We further identify that the SMARCB1 structural variants are characterized by blunt or 1-bp homology events. CONCLUSIONS: Our findings suggest that RMC does not arise from a single founder population and that the HbS allele is a strong candidate germline allele which confers risk for RMC. Furthermore, we find that the SVs that disrupt SMARCB1 function are likely repaired by non-homologous end-joining. These findings highlight how haplotype-based analyses using linked-read genome sequencing can be applied to identify potential risk variants in small and rare disease cohorts and provide nucleotide resolution to structural variants.
Affiliation(s)
- Kar-Tong Tan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Hyunji Kim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Jian Carrot-Zhang
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Yuxiang Zhang
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Won Jun Kim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jeremiah A Wala
  - Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Thomas P Howard
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yueh-Yun Chi
  - Department of Pediatrics, University of Southern California, Los Angeles, CA, USA
- Rameen Beroukhim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Heng Li
  - Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gavin Ha
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Seth L Alper
  - Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Elizabeth A Mullen
  - Department of Hematology and Oncology, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- William C Hahn
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew Meyerson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Andrew L Hong
  - Department of Pediatrics, Emory University, Atlanta, GA, USA
  - Aflac Center for Cancer and Blood Disorders, Children's Healthcare of Atlanta, Atlanta, GA, USA
15
Nøhr AK, Hanghøj K, Erill GG, Li Z, Moltke I, Albrechtsen A. NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data. G3: Genes, Genomes, Genetics 2021; 11:6279082. [PMID: 34015083 PMCID: PMC8496226 DOI: 10.1093/g3journal/jkab174] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/01/2021] [Accepted: 05/03/2021] [Indexed: 12/04/2022]
Abstract
Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here, we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below we show that our method works well and clearly outperforms all the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate that all perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ in a multi-threaded software and is freely available on Github https://github.com/KHanghoj/NGSremix.
Affiliation(s)
- Anne Krogh Nøhr
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
  - H. Lundbeck A/S, 2500 Valby, Denmark
- Kristian Hanghøj
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Genis Garcia Erill
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Zilong Li
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Ida Moltke
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Anders Albrechtsen
  - The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
16
Browning SR, Browning BL. Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection. Am J Hum Genet 2020; 107:895-910. [PMID: 33053335 PMCID: PMC7553009 DOI: 10.1016/j.ajhg.2020.09.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Accepted: 09/25/2020] [Indexed: 12/18/2022] Open
Abstract
Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.
Affiliation(s)
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
17
De-la-Cruz IM, Merilä J, Valverde PL, Flores-Ortiz CM, Núñez-Farfán J. Genomic and chemical evidence for local adaptation in resistance to different herbivores in Datura stramonium. Evolution 2020; 74:2629-2643. [PMID: 32935854 DOI: 10.1111/evo.14097] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/12/2020] [Revised: 08/28/2020] [Accepted: 09/12/2020] [Indexed: 12/18/2022]
Abstract
Because most species are collections of genetically variable populations distributed to habitats differing in their abiotic/biotic environmental factors and community composition, the pattern and strength of natural selection imposed by species on each other's traits are also expected to be highly spatially variable. Here, we used genomic and quantitative genetic approaches to understand how spatially variable selection operates on the genetic basis of plant defenses to herbivores. To this end, an F2 progeny was generated by crossing Datura stramonium (Solanaceae) parents from two populations differing in their level of chemical defense. This F2 progeny was reciprocally transplanted into the parental plants' habitats and by measuring the identity by descent (IBD) relationship of each F2 plant to each parent, we were able to elucidate how spatially variable selection imposed by herbivores operated on the genetic background (IBD) of resistance to herbivory, promoting local adaptation. The results highlight that plants possessing the highest total alkaloid concentrations (sum of all alkaloid classes) were not the most well-defended or fit. Instead, specific alkaloids and their linked loci/alleles were favored by selection imposed by different herbivores. This has led to population differentiation in plant defenses and thus, to local adaptation driven by plant-herbivore interactions.
Affiliation(s)
- Ivan M De-la-Cruz
  - Laboratory of Ecological Genetics and Evolution, Department of Evolutionary Ecology, Institute of Ecology, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Juha Merilä
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Pedro L Valverde
  - Department of Biology, Universidad Autónoma Metropolitana Campus Iztapalapa, Mexico City, Mexico
- César M Flores-Ortiz
  - Facultad de Estudios Superiores Iztacala, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Juan Núñez-Farfán
  - Laboratory of Ecological Genetics and Evolution, Department of Evolutionary Ecology, Institute of Ecology, Universidad Nacional Autónoma de México, Mexico City, Mexico
18
|
Identity by descent analysis identifies founder events and links SOD1 familial and sporadic ALS cases. NPJ Genom Med 2020; 5:32. [PMID: 32789025 PMCID: PMC7414871 DOI: 10.1038/s41525-020-00139-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/05/2020] [Accepted: 07/14/2020] [Indexed: 12/11/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder characterised by the loss of upper and lower motor neurons resulting in paralysis and eventual death. Approximately 10% of ALS cases have a family history of disease, while the remainder present as apparently sporadic cases. Heritability studies suggest a significant genetic component to sporadic ALS, and although most sporadic cases have an unknown genetic aetiology, some familial ALS mutations have also been found in sporadic cases. This suggests that some sporadic cases may be unrecognised familial cases with reduced disease penetrance in their ancestors. A powerful strategy to uncover a familial link is identity-by-descent (IBD) analysis, which detects genomic regions that have been inherited from a common ancestor. IBD analysis was performed on 83 Australian familial ALS cases from 25 families and three sporadic ALS cases, each of whom carried one of three SOD1 mutations (p.I114T, p.V149G and p.E101G). We defined five unique 350-SNP haplotypes that carry these mutations in our cohort, indicative of five founder events. This included two founder haplotypes that carry SOD1 p.I114T; linking familial and sporadic cases. We found that SOD1 p.E101G arose independently in each family that carries this mutation and linked two families that carry SOD1 p.V149G. The age of disease onset varied between cases that carried each SOD1 p.I114T haplotype. Linking families with identical ALS mutations allows for larger sample sizes and increased statistical power to identify putative phenotypic modifiers.
19
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022] Open
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
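The positional Burrows-Wheeler transform mentioned in this abstract can be sketched in a few lines. The example below follows the generic prefix-array and divergence-array update commonly attributed to Durbin's PBWT paper; it is an illustration only, not hap-IBD's code, and the function names, the toy haplotypes and the site-count threshold (hap-IBD itself works with centiMorgan lengths) are assumptions made here.

```python
def pbwt_arrays(haps):
    """Sweep a binary haplotype matrix and return (order, div) after the last site.

    order: haplotype indices sorted by their reversed prefixes.
    div:   div[i] is the first site of the match between order[i] and
           order[i - 1] that extends to the last site (equal to the number
           of sites when the two differ at the last site).
    """
    n_haps = len(haps)
    n_sites = len(haps[0]) if haps else 0
    order = list(range(n_haps))
    div = [0] * n_haps
    for k in range(n_sites):
        zeros, ones, div_zeros, div_ones = [], [], [], []
        p = q = k + 1  # sentinel: no match with the predecessor yet
        for idx, d in zip(order, div):
            p = max(p, d)
            q = max(q, d)
            if haps[idx][k] == 0:
                zeros.append(idx)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(idx)
                div_ones.append(q)
                q = 0
        # Stable partition: haplotypes carrying 0 at site k come first.
        order = zeros + ones
        div = div_zeros + div_ones
    return order, div


def adjacent_matches(haps, min_sites):
    """List (hap_a, hap_b, start, end) for neighbours in the sorted order
    whose shared segment ending at the last site spans >= min_sites sites."""
    order, div = pbwt_arrays(haps)
    n_sites = len(haps[0])
    return [(order[i - 1], order[i], div[i], n_sites)
            for i in range(1, len(order))
            if n_sites - div[i] >= min_sites]


if __name__ == "__main__":
    toy = [[0, 1, 0, 1], [1, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0]]
    print(adjacent_matches(toy, min_sites=3))
```

Because haplotypes that share a long match ending at the current site sit next to each other in the sorted order, candidate segments can be found in a single pass over the data instead of by comparing all pairs.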
Affiliation(s)
- Ying Zhou
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
20
Enabling population assignment from cancer genomes with SNP2pop. Sci Rep 2020; 10:4846. [PMID: 32179800 PMCID: PMC7075896 DOI: 10.1038/s41598-020-61854-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/09/2019] [Accepted: 03/04/2020] [Indexed: 11/08/2022] Open
Abstract
In many cancers, incidence, treatment efficacy and overall prognosis vary between geographic populations. Studies disentangling the contributing factors may help in both understanding cancer biology and tailoring therapeutic interventions. Ancestry estimation in such studies should preferably be driven by genomic data, due to frequently missing or erroneous self-reported or inferred metadata. While respective algorithms have been demonstrated for baseline genomes, such a strategy has not been shown for cancer genomes carrying a substantial somatic mutation load. We have developed a bioinformatics tool for the assignment of population groups from genome profiling data for both unaltered and cancer genomes. Despite extensive somatic mutations in the cancer genomes, consistency between germline and cancer data reached of 97% and 92% for assignment into 5 and 26 ancestral groups, respectively. Comparison with self-reported meta-data estimated a matching rate between 88-92%, mostly limited by interpretation of self-reported ethnicity labels compared to the standardized mapping output. Our SNP2pop application allows to assess population information from SNP arrays as well as sequencing platforms and to estimate the population structure in cancer genomics projects, to facilitate research into the interplay between ethnicity-related genetic background, environmental factors and somatic mutation patterns in cancer biology.
21
Waples RK, Albrechtsen A, Moltke I. Allele frequency-free inference of close familial relationships from genotypes or low-depth sequencing data. Mol Ecol 2019; 28:35-48. [PMID: 30462358 PMCID: PMC6850436 DOI: 10.1111/mec.14954] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 03/21/2018] [Accepted: 10/12/2018] [Indexed: 01/03/2023]
Abstract
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with a similar accuracy as a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.
Affiliation(s)
- Ryan K Waples
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Anders Albrechtsen
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Ida Moltke
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
22
Ko A, Nielsen R. Joint Estimation of Pedigrees and Effective Population Size Using Markov Chain Monte Carlo. Genetics 2019; 212:855-868. [PMID: 31123041 PMCID: PMC6614905 DOI: 10.1534/genetics.119.302280] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/06/2018] [Accepted: 05/16/2019] [Indexed: 12/31/2022]
Abstract
Pedigrees provide the genealogical relationships among individuals at a fine resolution and serve an important function in many areas of genetic studies. One such use of pedigree information is in the estimation of the short-term effective population size ($N_e$), which is of great relevance in fields such as conservation genetics. Despite the usefulness of pedigrees, however, they are often an unknown parameter and must be inferred from genetic data. In this study, we present a Bayesian method to jointly estimate pedigrees and $N_e$ from genetic markers using Markov Chain Monte Carlo. Our method supports analysis of a large number of markers and individuals within a single generation with the use of a composite likelihood, which significantly increases computational efficiency. We show, on simulated data, that our method is able to jointly estimate relationships up to first cousins and $N_e$ with high accuracy. We also apply the method on a real dataset of house sparrows to reconstruct their previously unreported pedigree.
Affiliation(s)
- Amy Ko
  - Department of Integrative Biology, University of California, Berkeley, 94720 California
- Rasmus Nielsen
  - Department of Integrative Biology, University of California, Berkeley, 94720 California
  - Department of Statistics, University of California, Berkeley, 94720 California
  - Museum of Natural History, University of Copenhagen, 1123 Denmark
23
Meisner J, Albrechtsen A. Testing for Hardy-Weinberg equilibrium in structured populations using genotype or low-depth next generation sequencing data. Mol Ecol Resour 2019; 19:1144-1152. [PMID: 30977299 DOI: 10.1111/1755-0998.13019] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/27/2018] [Revised: 04/02/2019] [Accepted: 04/03/2019] [Indexed: 11/29/2022]
Abstract
Testing for deviations from Hardy-Weinberg equilibrium (HWE) is a common practice for quality control in genetic studies. Variable sites violating HWE may be identified as technical errors in the sequencing or genotyping process, or they may be of particular evolutionary interest. Large-scale genetic studies based on next-generation sequencing (NGS) methods have become more prevalent as cost is decreasing, but these methods are still associated with statistical uncertainty. The large-scale studies usually consist of samples from diverse ancestries, which makes the existence of some degree of population structure almost inevitable. Precautions are therefore needed when analysing these data sets, as population structure causes deviations from HWE. Here we propose a method that takes population structure into account in the testing for HWE, such that other factors causing deviations from HWE can be detected. We show the effectiveness of PCAngsd in low-depth NGS data, as well as in genotype data, for both simulated and real data sets, where the use of genotype likelihoods enables us to model the uncertainty.
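For orientation, the standard single-site test that ignores population structure can be sketched as follows. This is a minimal illustration of the textbook chi-square HWE test on called genotype counts, not the PCAngsd method; the function name `hwe_chi_square` and all example counts are assumptions made for illustration. The second call pools two hypothetical subpopulations that are each in HWE, which shows how structure alone can produce a deviation.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Textbook 1-df chi-square test for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes at one biallelic site and returns
    (chi-square statistic, p-value). Ignores population structure and
    genotype uncertainty.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# A single hypothetical population in exact HWE (allele frequency 0.5).
chi2_single, p_single = hwe_chi_square(25, 50, 25)

# Two hypothetical subpopulations, each in HWE (frequencies 0.9 and 0.1),
# pooled into one sample: heterozygotes are now deficient.
chi2_pooled, p_pooled = hwe_chi_square(81 + 1, 18 + 18, 1 + 81)
```

In the pooled case the allele frequency is 0.5 but only 36 heterozygotes are observed against 100 expected, so the naive test rejects HWE even though neither subpopulation deviates from it.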
Affiliation(s)
- Jonas Meisner
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Anders Albrechtsen
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
24
Reconstructing recent population history while mapping rare variants using haplotypes. Sci Rep 2019; 9:5849. [PMID: 30971755 PMCID: PMC6458133 DOI: 10.1038/s41598-019-42385-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/21/2018] [Accepted: 03/28/2019] [Indexed: 12/11/2022]
Abstract
Haplotype-based methods are a cost-effective alternative to characterize unobserved rare variants and map disease-associated alleles. Moreover, they can be used to reconstruct recent population history, which shaped the distribution of rare variants and thus can be used to guide gene mapping studies. In this study, we analysed an Illumina 650k genotype dataset on three underrepresented populations from Eastern Europe, where ancestors of Russians came into contact with two indigenous ethnic groups, Bashkirs and Tatars. Using the IBD mapping approach, we identified two rare IBD haplotypes strongly enriched in asthma patients of distinct ethnic background. We reconstructed recent population history using haplotype-based methods to reconcile this contradictory finding. Our ChromoPainter analysis showed that these haplotypes each descend from a single ancestor coming from one of the ethnic groups studied. Next, we used the DoRIS approach and showed that source populations for patients exchanged recent (<60 generations) asymmetric gene flow, which supported the ChromoPainter-based scenario that patients share haplotypes through inter-ethnic admixture. Finally, we show that these IBD haplotypes overlap with asthma-associated genomic regions ascertained in European populations. This finding is consistent with the fact that the two donor populations for the rare IBD haplotypes, Russians and Tatars, have European ancestry.
25
Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens. PLoS Genet 2018; 14:e1007279. [PMID: 29791438 PMCID: PMC5988311 DOI: 10.1371/journal.pgen.1007279] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 10/09/2017] [Revised: 06/05/2018] [Accepted: 02/26/2018] [Indexed: 12/30/2022]
Abstract
Identification of genomic regions that are identical by descent (IBD) has proven useful for human genetic studies where analyses have led to the discovery of familial relatedness and fine-mapping of disease critical regions. Unfortunately, however, IBD analyses have been underutilized in the analysis of other organisms, including human pathogens. This is in part due to the lack of statistical methodologies for non-diploid genomes in addition to the added complexity of multiclonal infections. As such, we have developed an IBD methodology, called isoRelate, for analysis of haploid recombining microorganisms in the presence of multiclonal infections. Using the inferred IBD status at genomic locations, we have also developed a novel statistic for identifying loci under positive selection and propose relatedness networks as a means of exploring shared haplotypes within populations. We evaluate the performance of our methodologies for detecting IBD and selection, including comparisons with existing tools, then perform an exploratory analysis of whole genome sequencing data from a global Plasmodium falciparum dataset of more than 2500 genomes. This analysis identifies Southeast Asia as having many highly related isolates, possibly as a result of both reduced transmission from intensified control efforts and population bottlenecks following the emergence of antimalarial drug resistance. Many signals of selection are also identified, most of which overlap genes that are known to be associated with drug resistance, in addition to two novel signals observed in multiple countries that have yet to be explored in detail. Additionally, we investigate relatedness networks over the selected loci and determine that one of these sweeps has spread between continents while the other has arisen independently in different countries.
IBD analysis of microorganisms using isoRelate can be used for exploring population structure, positive selection and haplotype distributions, and will be a valuable tool for monitoring disease control and elimination efforts of many diseases. There are growing concerns over the emergence of antimicrobial drug resistance, which threatens the efficacy of treatments for infectious diseases such as malaria. As such, it is important to understand the dynamics of resistance by investigating population structure, natural selection and disease transmission in microorganisms. The study of disease dynamics has been hampered by the lack of suitable statistical models for analysis of isolates containing multiple infections. We introduce a statistical model that uses population genomic data to identify genomic regions (loci) that are inherited from a common ancestor, in the presence of multiple infections. We demonstrate its potential for biological discovery using a global Plasmodium falciparum dataset. We identify low genetic diversity in isolates from Southeast Asia, possibly from clonal expansion following intensified control efforts after the emergence of artemisinin resistance. We also identify loci under positive selection, most of which contain genes that have been associated with antimalarial drug resistance. We discover two loci under strong selection in multiple countries throughout Southeast Asia and Africa where the selection pressure is currently unknown. We find that the selection pressure at one of these loci has originated from gene flow, while the other locus has originated from multiple independent events.
26
Salvoro C, Bortoluzzi S, Coppe A, Valle G, Feltrin E, Mostacciuolo ML, Vazza G. Rare Risk Variants Identification by Identity-by-Descent Mapping and Whole-Exome Sequencing Implicates Neuronal Development Pathways in Schizophrenia and Bipolar Disorder. Mol Neurobiol 2018; 55:7366-7376. [PMID: 29411265 DOI: 10.1007/s12035-018-0922-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/01/2017] [Accepted: 01/22/2018] [Indexed: 12/31/2022]
Abstract
Schizophrenia (SCZ) and bipolar disorder (BPD) are highly heritable disorders with an estimated co-heritability of 68%. Hundreds of common alleles have been implicated, but recently a role for rare, high-penetrant variants has also been suggested in both disorders. This study investigated a familial cohort of SCZ and BPD patients from a closed population sample, where the high recurrence of the disorders and the homogenous genetic background indicate a possible enrichment in rare risk alleles. A total of 230 subjects (161 cases, 22 unaffected relatives, and 47 controls) were genetically investigated through an innovative strategy that integrates identity-by-descent (IBD) mapping and whole-exome sequencing (WES). IBD analysis allowed tracking of high-risk haplotypes (IBDrisk) shared exclusively by multiple patients from different families and possibly carrying the most penetrant alleles. A total of 444 non-synonymous sequence variants, of which 137 were disruptive, were identified in IBDrisk haplotypes by WES. Interestingly, gene sets previously implicated in SCZ (i.e., post-synaptic density (PSD) proteins, voltage-gated calcium channels (VGCCs), and fragile X mental retardation protein (FMRP) targets) were found to be significantly enriched in genes carrying IBDrisk variants. Further, IBDrisk variants preferentially affected genes involved in extracellular matrix (ECM) biology and axon guidance processes, which appeared to be functionally connected in the pathway-derived meta-network analysis. Results thus confirm rare risk variants as key factors in SCZ and BPD pathogenesis and highlight a role for the development of neuronal connectivity in the etiology of both disorders.
Affiliation(s)
- C Salvoro
  - Department of Biology, University of Padova, Padova, Italy
- S Bortoluzzi
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- A Coppe
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- G Valle
  - Department of Biology, University of Padova, Padova, Italy
- E Feltrin
  - Department of Biology, University of Padova, Padova, Italy
- G Vazza
  - Department of Biology, University of Padova, Padova, Italy
27
Attard CRM, Beheregaray LB, Möller LM. Genotyping‐by‐sequencing for estimating relatedness in nonmodel organisms: Avoiding the trap of precise bias. Mol Ecol Resour 2018; 18:381-390. [DOI: 10.1111/1755-0998.12739] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 06/29/2017] [Revised: 11/02/2017] [Accepted: 11/02/2017] [Indexed: 12/29/2022]
Affiliation(s)
- Catherine R. M. Attard
  - Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
- Luciano B. Beheregaray
  - Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
- Luciana M. Möller
  - Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
28
Moreno-Mayar JV, Potter BA, Vinner L, Steinrücken M, Rasmussen S, Terhorst J, Kamm JA, Albrechtsen A, Malaspinas AS, Sikora M, Reuther JD, Irish JD, Malhi RS, Orlando L, Song YS, Nielsen R, Meltzer DJ, Willerslev E. Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature 2018; 553:203-207. [DOI: 10.1038/nature25173] [Citation(s) in RCA: 212] [Impact Index Per Article: 35.3] [Received: 03/29/2017] [Accepted: 11/26/2017] [Indexed: 12/30/2022]
29
Mathiesen JS, Kroustrup JP, Vestergaard P, Stochholm K, Poulsen PL, Rasmussen ÅK, Feldt-Rasmussen U, Gaustadnes M, Ørntoft TF, Rossing M, Nielsen FC, Albrechtsen A, Brixen K, Godballe C, Frederiksen AL. Founder Effect of the RET C611Y Mutation in Multiple Endocrine Neoplasia 2A in Denmark: A Nationwide Study. Thyroid 2017; 27:1505-1510. [PMID: 29020875 DOI: 10.1089/thy.2017.0404] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Multiple endocrine neoplasia (MEN) 2A and 2B are caused by REarranged during Transfection (RET) germline mutations. In a recent nationwide study, an unusually high prevalence (33%) of families with the C611Y mutation was reported, and it was hypothesized that this might be due to a founder effect. The first nationwide study of haplotypes in MEN2A families was conducted, with the aim of investigating the relatedness and occurrence of de novo mutations among Danish families carrying similar mutations. METHODS: The study included 21 apparently unrelated MEN2A families identified from a nationwide Danish RET cohort from 1994 to 2014. Twelve, two, two, three, and two families carried the C611Y, C618F, C618Y, C620R, and C634R mutations, respectively. Single nucleotide polymorphism chip data and identity by descent analysis were used to assess relatedness. RESULTS: A common founder mutation was found among all 12 C611Y families and between both C618Y families. No relatedness was identified in the remaining families. CONCLUSION: The data suggest that all families with the C611Y germline mutation in Denmark originate from a recent common ancestor, probably explaining the unusually high prevalence of this mutation. Additionally, the results indicate that the C611Y mutation rarely arises de novo, thus underlining the need for thorough multigenerational genetic work up in carriers of this mutation.
Affiliation(s)
- Jes Sloth Mathiesen
  - Department of ORL Head and Neck Surgery, Odense University Hospital, Odense, Denmark
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Jens Peter Kroustrup
  - Department of Clinical Medicine and Endocrinology, Aalborg University Hospital, Aalborg, Denmark
- Peter Vestergaard
  - Department of Clinical Medicine and Endocrinology, Aalborg University Hospital, Aalborg, Denmark
- Kirstine Stochholm
  - Department of Internal Medicine and Endocrinology, Aarhus University Hospital, Aarhus, Denmark
- Per Løgstrup Poulsen
  - Department of Internal Medicine and Endocrinology, Aarhus University Hospital, Aarhus, Denmark
- Åse Krogh Rasmussen
  - Department of Medical Endocrinology, Copenhagen University Hospital, Copenhagen, Denmark
- Ulla Feldt-Rasmussen
  - Department of Medical Endocrinology, Copenhagen University Hospital, Copenhagen, Denmark
- Mette Gaustadnes
  - Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
- Torben Falck Ørntoft
  - Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
- Maria Rossing
  - Center for Genomic Medicine, Copenhagen University Hospital, Copenhagen, Denmark
- Finn Cilius Nielsen
  - Center for Genomic Medicine, Copenhagen University Hospital, Copenhagen, Denmark
- Anders Albrechtsen
  - Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kim Brixen
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Christian Godballe
  - Department of ORL Head and Neck Surgery, Odense University Hospital, Odense, Denmark
- Anja Lisbeth Frederiksen
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
  - Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
30
Blant A, Kwong M, Szpiech ZA, Pemberton TJ. Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics 2017; 18:928. [PMID: 29191164 PMCID: PMC5709839 DOI: 10.1186/s12864-017-4312-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/23/2017] [Accepted: 11/16/2017] [Indexed: 12/14/2022]
Abstract
Background: Genomic regions of autozygosity (ROA) arise when an individual is homozygous for haplotypes inherited identical-by-descent from ancestors shared by both parents. Over the past decade, they have gained importance for understanding evolutionary history and the genetic basis of complex diseases and traits. However, methods to infer ROA in dense genotype data have not evolved in step with advances in genome technology that now enable us to rapidly create large high-resolution genotype datasets, limiting our ability to investigate their constituent ROA patterns. Methods: We report a weighted likelihood approach for inferring ROA in dense genotype data that accounts for autocorrelation among genotyped positions and the possibilities of unobserved mutation and recombination events, and variability in the confidence of individual genotype calls in whole genome sequence (WGS) data. Results: Forward-time genetic simulations under two demographic scenarios that reflect situations where inbreeding and its effect on fitness are of interest suggest this approach is better powered than existing state-of-the-art methods to infer ROA at marker densities consistent with WGS and popular microarray genotyping platforms used in human and non-human studies. Moreover, we present evidence that suggests this approach is able to distinguish ROA arising via consanguinity from ROA arising via endogamy. Using subsets of The 1000 Genomes Project Phase 3 data we show that, relative to WGS, intermediate and long ROA are captured robustly with popular microarray platforms, while detection of short ROA is more variable and improves with marker density. Worldwide ROA patterns inferred from WGS data are found to accord well with those previously reported on the basis of microarray genotype data.
Finally, we highlight the potential of this approach to detect genomic regions enriched for autozygosity signals in one group relative to another based upon comparisons of per-individual autozygosity likelihoods instead of inferred ROA frequencies. Conclusions: This weighted likelihood ROA inference approach can assist population- and disease-geneticists working with a wide variety of data types and species to explore ROA patterns and to identify genomic regions with differential ROA signals among groups, thereby advancing our understanding of evolutionary history and the role of recessive variation in phenotypic variation and disease.
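To make the notion of a run of homozygous genotypes concrete, here is a deliberately naive sketch. It is a simple fixed-threshold scan, not the weighted likelihood approach described in the abstract: it ignores genotype errors, allele frequencies, recombination and physical distance between markers. The function name `homozygous_runs`, the 0/1/2 genotype coding and the `min_len` threshold are all assumptions made for illustration.

```python
def homozygous_runs(genotypes, min_len):
    """Return (start, end) index pairs, inclusive, of homozygous runs.

    genotypes: sequence of alternate-allele counts per marker
               (0 or 2 = homozygous, 1 = heterozygous).
    min_len:   minimum number of consecutive homozygous markers to report.
    """
    runs = []
    start = None  # index where the current homozygous run began, if any
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run.
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes) - 1))
    return runs

# Hypothetical genotypes at 11 consecutive markers.
example = [0, 0, 2, 1, 0, 2, 2, 2, 0, 1, 0]
runs = homozygous_runs(example, min_len=3)
```

A hard threshold like this treats a single miscalled heterozygote as the end of a run, which is one reason likelihood-based approaches that model genotype confidence are attractive for dense data.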
Affiliation(s)
- Alexandra Blant
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Michelle Kwong
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Zachary A Szpiech
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Trevor J Pemberton
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
31
Sikora M, Seguin-Orlando A, Sousa VC, Albrechtsen A, Korneliussen T, Ko A, Rasmussen S, Dupanloup I, Nigst PR, Bosch MD, Renaud G, Allentoft ME, Margaryan A, Vasilyev SV, Veselovskaya EV, Borutskaya SB, Deviese T, Comeskey D, Higham T, Manica A, Foley R, Meltzer DJ, Nielsen R, Excoffier L, Mirazon Lahr M, Orlando L, Willerslev E. Ancient genomes show social and reproductive behavior of early Upper Paleolithic foragers. Science 2017; 358:659-662. [DOI: 10.1126/science.aao1807] [Citation(s) in RCA: 203] [Impact Index Per Article: 29.0] [Received: 06/22/2017] [Accepted: 09/25/2017] [Indexed: 01/01/2023]
32
He D, Wang Z, Parida L, Eskin E. IPED2: Inheritance Path Based Pedigree Reconstruction Algorithm for Complicated Pedigrees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1094-1103. [PMID: 28368828 DOI: 10.1109/tcbb.2017.2688439] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Reconstruction of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. The problem is known to be NP-hard even for datasets known to contain only siblings. Some recent methods have been developed to accurately and efficiently reconstruct pedigrees. These methods, however, still consider relatively simple pedigrees; for example, they are not able to handle half-sibling situations where a pair of individuals share only one parent. In this work, we propose an efficient method, IPED2, based on our previous work, which specifically targets reconstruction of complicated pedigrees that include half-siblings. We note that the presence of half-siblings makes the reconstruction problem significantly more challenging, which is why previous methods exclude the possibility of half-siblings. We propose a novel model as well as an efficient graph algorithm, and experiments show that our algorithm achieves relatively accurate reconstruction. To our knowledge, this is the first method that is able to handle pedigree reconstruction from genotype data when half-siblings exist in any generation of the pedigree.
33
Ko A, Nielsen R. Composite likelihood method for inferring local pedigrees. PLoS Genet 2017; 13:e1006963. [PMID: 28827797 PMCID: PMC5578687 DOI: 10.1371/journal.pgen.1006963] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/10/2017] [Revised: 08/31/2017] [Accepted: 08/07/2017] [Indexed: 12/21/2022]
Abstract
Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit. Pedigrees contain information about the genealogical relationships among individuals. This information can be used in many areas of genetic studies such as disease association studies, conservation efforts, and for inferences about the demographic history and social structure of a population. Despite their importance, pedigrees are often unknown and must be estimated from genetic information. However, pedigree inference remains a difficult problem due to the high cost of likelihood computation and the enormous number of possible pedigrees that must be considered. These difficulties limit existing methods in their ability to infer pedigrees when the sample size or the number of markers is large, or when the sample contains only distant relatives. In this report, we present a method that circumvents these computational challenges in order to infer pedigrees of complex structure for a large number of individuals. 
Using simulations, we find that the method can infer distant relatives much more accurately than existing methods. Furthermore, we show that even pairwise inferences of relatedness can be improved substantially by consideration of the pedigree structure with other related individuals in the sample.
Affiliation(s)
- Amy Ko
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Rasmus Nielsen
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
  - Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
  - Museum of Natural History, University of Copenhagen, Copenhagen, Denmark
34
Martin MD, Jay F, Castellano S, Slatkin M. Determination of genetic relatedness from low-coverage human genome sequences using pedigree simulations. Mol Ecol 2017; 26:4145-4157. [PMID: 28543951 DOI: 10.1111/mec.14188] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/10/2015] [Accepted: 05/05/2017] [Indexed: 02/01/2023]
Abstract
We develop and evaluate methods for inferring relatedness among individuals from low-coverage DNA sequences of their genomes, with particular emphasis on sequences obtained from fossil remains. We suggest the major factors complicating the determination of relatedness among ancient individuals are sequencing depth, the number of overlapping sites, the sequencing error rate and the presence of contamination from present-day genetic sources. We develop a theoretical model that facilitates the exploration of these factors and their relative effects, via measurement of pairwise genetic distances, without calling genotypes, and determine the power to infer relatedness under various scenarios of varying sequencing depth, present-day contamination and sequencing error. The model is validated by a simulation study as well as the analysis of aligned sequences from present-day human genomes. We then apply the method to the recently published genome sequences of ancient Europeans, developing a statistical treatment to determine confidence in assigned relatedness that is, in some cases, more precise than previously reported. As the majority of ancient specimens are from animals, this method would be applicable to investigate kinship in nonhuman remains. The developed software grups (Genetic Relatedness Using Pedigree Simulations) is implemented in Python and freely available.
Affiliation(s)
- Michael D Martin
  - Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  - Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Flora Jay
  - Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris-Sud, Paris-Saclay, France
- Sergi Castellano
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Montgomery Slatkin
  - Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
35
Genetic screening of the FLCN gene identify six novel variants and a Danish founder mutation. J Hum Genet 2016; 62:151-157. [PMID: 27734835 DOI: 10.1038/jhg.2016.118] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/13/2016] [Revised: 08/12/2016] [Accepted: 09/06/2016] [Indexed: 12/19/2022]
Abstract
Pathogenic germline mutations in the folliculin (FLCN) tumor suppressor gene predispose to Birt-Hogg-Dubé (BHD) syndrome, a rare disease characterized by the development of cutaneous hamartomas (fibrofolliculomas), multiple lung cysts, spontaneous pneumothoraces and renal cell cancer. In this study, we report the identification of 13 variants and three polymorphisms in the FLCN gene in 143 Danish patients or families with suspected BHD syndrome. Functional mini-gene splicing analysis revealed that two intronic variants (c.1062+2T>G and c.1177-5_1177-3del) introduced splicing aberrations. Eleven families exhibited the c.1062+2T>G mutation. Combined single nucleotide polymorphism array-haplotype analysis showed that these families share a 3-Mb genomic fragment containing the FLCN gene, revealing that the c.1062+2T>G mutation is a Danish founder mutation. On the basis of in silico prediction and functional splicing assays, we classify the 16 identified variants in the FLCN gene as follows: nine as pathogenic, one as likely pathogenic, three as likely benign and three as polymorphisms. In conclusion, the study describes the FLCN mutation spectrum in Danish BHD patients, and contributes to a better understanding of BHD syndrome and management of BHD patients.

36
Gregianin E, Pallafacchina G, Zanin S, Crippa V, Rusmini P, Poletti A, Fang M, Li Z, Diano L, Petrucci A, Lispi L, Cavallaro T, Fabrizi GM, Muglia M, Boaretto F, Vettori A, Rizzuto R, Mostacciuolo ML, Vazza G. Loss-of-function mutations in the SIGMAR1 gene cause distal hereditary motor neuropathy by impairing ER-mitochondria tethering and Ca²⁺ signalling. Hum Mol Genet 2016; 25:3741-3753. [PMID: 27402882 DOI: 10.1093/hmg/ddw220] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 05/27/2016] [Revised: 06/29/2016] [Accepted: 06/30/2016] [Indexed: 01/14/2023]
Abstract
Distal hereditary motor neuropathies (dHMNs) are clinically and genetically heterogeneous neurological conditions characterized by degeneration of the lower motor neurons. So far, 18 dHMN genes have been identified; however, about 80% of dHMN cases remain without a molecular diagnosis. By a combination of autozygosity mapping, identity-by-descent segment detection and whole-exome sequencing approaches, we identified two novel homozygous mutations in the SIGMAR1 gene (p.E138Q and p.E150K) in two distinct Italian families affected by an autosomal recessive form of HMN. Functional analyses in several neuronal cell lines strongly support the pathogenicity of the mutations and provide insights into the underlying pathomechanisms involving the regulation of ER-mitochondria tethering, Ca²⁺ homeostasis and autophagy. Indeed, in vitro, both mutations reduce cell viability, the formation of abnormal protein aggregates preventing the correct targeting of sigma-1R protein to the mitochondria-associated ER membrane (MAM) and thus impinging on the global Ca²⁺ signalling. Our data definitively demonstrate the involvement of SIGMAR1 in motor neuron maintenance and survival by correlating, for the first time in the Caucasian population, mutations in this gene to distal motor dysfunction and highlight the chaperone activity of sigma-1R at the MAM as a critical aspect in dHMN pathology.
Affiliation(s)
- Giorgia Pallafacchina
- Department of Biomedical Sciences, University of Padova and CNR Neuroscience Institute, Padova, Italy
- Sofia Zanin
- Department of Biomedical Sciences, University of Padova and CNR Neuroscience Institute, Padova, Italy
- Valeria Crippa
- Experimental Neurobiology Lab, IRCCS "C. Mondino" National Neurological Institute, Pavia, Italy
- Paola Rusmini
- Department of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Angelo Poletti
- Department of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Mingyan Fang
- Department of Science & Technology, BGI-Shenzhen, Shenzhen, China
- Zhouxuan Li
- Department of Science & Technology, BGI-Shenzhen, Shenzhen, China
- Laura Diano
- Medical Genetics, University Hospital "Tor Vergata", Roma, Italy
- Antonio Petrucci
- Neuromuscular and Rare Neurological Diseases Centre Neurology & Neurophysiopathology Unit, ASO San Camillo-Forlanini Hospital of Rome, Rome, Italy
- Ludovico Lispi
- Neuromuscular and Rare Neurological Diseases Centre Neurology & Neurophysiopathology Unit, ASO San Camillo-Forlanini Hospital of Rome, Rome, Italy
- Tiziana Cavallaro
- Section of Neuropathology, Neurological and Movement Sciences, University of Verona, Verona, Italy
- Gian M Fabrizi
- Section of Neuropathology, Neurological and Movement Sciences, University of Verona, Verona, Italy
- Maria Muglia
- CNR Institute of Neurological Sciences, Mangone, Cosenza, Italy
- Rosario Rizzuto
- Department of Biomedical Sciences, University of Padova and CNR Neuroscience Institute, Padova, Italy

37
Henden L, Freytag S, Afawi Z, Baldassari S, Berkovic SF, Bisulli F, Canafoglia L, Casari G, Crompton DE, Depienne C, Gecz J, Guerrini R, Helbig I, Hirsch E, Keren B, Klein KM, Labauge P, LeGuern E, Licchetta L, Mei D, Nava C, Pippucci T, Rudolf G, Scheffer IE, Striano P, Tinuper P, Zara F, Corbett M, Bahlo M. Identity by descent fine mapping of familial adult myoclonus epilepsy (FAME) to 2p11.2-2q11.2. Hum Genet 2016; 135:1117-25. [PMID: 27368338 DOI: 10.1007/s00439-016-1700-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/18/2016] [Accepted: 06/21/2016] [Indexed: 02/03/2023]
Abstract
Familial adult myoclonus epilepsy (FAME) is a rare autosomal dominant disorder characterized by adult onset, involuntary muscle jerks, cortical myoclonus and occasional seizures. FAME is genetically heterogeneous with more than 70 families reported worldwide and five potential disease loci. The efforts to identify potential causal variants have been unsuccessful in all but three families. To date, linkage analysis has been the main approach to find and narrow FAME critical regions. We propose an alternative method, pedigree free identity-by-descent (IBD) mapping, that infers regions of the genome between individuals that have been inherited from a common ancestor. IBD mapping provides an alternative to linkage analysis in the presence of allelic and locus heterogeneity by detecting clusters of individuals who share a common allele. Succeeding IBD mapping, gene prioritization based on gene co-expression analysis can be used to identify the most promising candidate genes. We performed an IBD analysis using high-density single nucleotide polymorphism (SNP) array data followed by gene prioritization on a FAME cohort of ten European families and one Australian/New Zealander family; eight of which had known disease loci. By identifying IBD regions common to multiple families, we were able to narrow the FAME2 locus to a 9.78 megabase interval within 2p11.2-q11.2. We provide additional evidence of a founder effect in four Italian families and allelic heterogeneity with at least four distinct founders responsible for FAME at the FAME2 locus. In addition, we suggest candidate disease genes using gene prioritization based on gene co-expression analysis.
Affiliation(s)
- Lyndal Henden
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia
- Zaid Afawi
- Tel Aviv University Medical School, 69978, Tel Aviv, Israel
- Sara Baldassari
- Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi-Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Samuel F Berkovic
- Epilepsy Research Centre, Department of Medicine, University of Melbourne Austin Health, Melbourne, VIC, 3084, Australia
- Francesca Bisulli
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy; Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Laura Canafoglia
- Neurophysiopathology and Epilepsy Center, IRCCS Foundation C. Besta Neurological Institute, Milan, Italy
- Giorgio Casari
- Division of Genetics and Cell Biology, Università Vita-Salute San Raffaele, San Raffaele Scientific Institute, Milan, Italy
- Christel Depienne
- Département de Médicine translationnelle et Neurogénétique, IGBMC, CNRS UMR 7104/INSERM U964/Université de Strasbourg, Illkirch, France; Laboratoire de diagnostic génétique, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Jozef Gecz
- Robinson Institute and School of Medicine, The University of Adelaide, Adelaide, SA, 5005, Australia; School of Biological Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia
- Renzo Guerrini
- Pediatric Neurology, Neurogenetics and Neurobiology Unit and Laboratories, Neuroscience Department, A Meyer Children's Hospital, University of Florence, Florence, Italy; IRCCS Stella Maris Foundation, Pisa, Italy
- Ingo Helbig
- Department of Neuropediatrics, Christian-Albrechts-University of Kiel and University Medical Center, Kiel, Schleswig-Holstein, Germany; Departments of Brain and Cognitive Sciences, Physiology and Cell Biology, Zlotowski Center for Neuroscience, Ben-Gurion University of the Negev, Negev, Israel; Division of Neurology, The Children's Hospital of Philadelphia, Philadelphia, USA
- Edouard Hirsch
- Medical and Surgical Epilepsy Unit, Hautepierre Hospital, University of Strasbourg, Strasbourg, France
- Boris Keren
- Département de Génétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, 75013, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- Karl Martin Klein
- Department of Neurology, Epilepsy Center Frankfurt Rhine-Main, Center of Neurology and Neurosurgery, University Hospital, Goethe-University Frankfurt, Frankfurt, Germany; Department of Neurology, Epilepsy Center Hessen, University Hospitals Giessen and Marburg, Philipps-University Marburg, Marburg, Germany
- Pierre Labauge
- Department of Neurology, Montpellier University, Gui de Chauliac, 34295, Montpellier, Cedex 5, France
- Eric LeGuern
- Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France; INSERM, U 1127; CNRS, UMR 7225; INSERM UMR 975; Institut du Cerveau et de la Moelle Epinière; and Département de Génétique et de Cytogénétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux De Paris (AP-HP), Paris, France; Université Pierre et Marie Curie (Paris 6) (UPMC), UMRS 975, Paris, France
- Laura Licchetta
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy; Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Davide Mei
- Pediatric Neurology, Neurogenetics and Neurobiology Unit and Laboratories, Neuroscience Department, A Meyer Children's Hospital, University of Florence, Florence, Italy
- Caroline Nava
- Département de Génétique, Hôpital de la Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, 75013, Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, ICM, 75013, Paris, France
- Tommaso Pippucci
- Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi-Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Gabrielle Rudolf
- Département de Médicine translationnelle et Neurogénétique, IGBMC, CNRS UMR 7104/INSERM U964/Université de Strasbourg, Illkirch, France; Department of Neurology, Hautepierre Hospital, University of Strasbourg, Strasbourg, France
- Ingrid Eileen Scheffer
- Epilepsy Research Centre, Department of Medicine, University of Melbourne Austin Health, Melbourne, VIC, 3084, Australia; Florey Institute of Neuroscience and Mental Health, Melbourne, VIC, 3084, Australia; Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Melbourne, VIC, 3052, Australia
- Pasquale Striano
- Pediatric Neurology and Muscular Diseases Unit, Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, Gaslini Institute, Genoa, Italy
- Paolo Tinuper
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy; Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy
- Federico Zara
- Laboratory of Neurogenetics, Department of Neurosciences, Gaslini Institute, Genoa, Italy
- Mark Corbett
- Robinson Institute and School of Medicine, The University of Adelaide, Adelaide, SA, 5005, Australia
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, 3052, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, 3010, Australia

38
Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution. G3: Genes, Genomes, Genetics 2016; 6:1287-96. [PMID: 26935417 PMCID: PMC4856080 DOI: 10.1534/g3.116.027581] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/18/2022]
Abstract
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that significant proportions of inferred segments 1–2 cM long are results of conflations of two or more shorter segments, each at least 0.2 cM or longer, under demographic scenarios typical for modern humans for all programs tested. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.

39
Henden L, Wakeham D, Bahlo M. XIBD: software for inferring pairwise identity by descent on the X chromosome. Bioinformatics 2016; 32:2389-91. [PMID: 27153693 DOI: 10.1093/bioinformatics/btw124] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/01/2015] [Accepted: 03/01/2016] [Indexed: 11/13/2022]
Abstract
XIBD performs pairwise relatedness mapping on the X chromosome using dense single nucleotide polymorphism (SNP) data from either SNP chips or next generation sequencing data. It correctly accounts for the difference in chromosomal numbers between males and females and estimates global relatedness as well as regions of the genome that are identical by descent (IBD). XIBD also generates novel graphical summaries of all pairwise IBD tracts for a cohort making it very useful for disease locus mapping. Availability and implementation: XIBD is written in R/Rcpp and executed from shell scripts that are freely available from http://bioinf.wehi.edu.au/software/XIBD along with accompanying reference datasets. Contact: henden.l@wehi.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lyndal Henden
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology
- Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology; School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia

40
Dad S, Rendtorff ND, Kann E, Albrechtsen A, Mehrjouy MM, Bak M, Tommerup N, Tranebjærg L, Rosenberg T, Jensen H, Møller LB. Partial USH2A deletions contribute to Usher syndrome in Denmark. Eur J Hum Genet 2015; 23:1646-51. [PMID: 25804404 DOI: 10.1038/ejhg.2015.54] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/16/2014] [Revised: 02/16/2015] [Accepted: 02/20/2015] [Indexed: 12/18/2022]
Abstract
Usher syndrome is an autosomal recessive disorder characterized by congenital hearing impairment, progressive visual loss owing to retinitis pigmentosa and in some cases vestibular dysfunction. Usher syndrome is divided into three subtypes, USH1, USH2 and USH3. Twelve loci and eleven genes have so far been identified. Duplications and deletions in PCDH15 and USH2A that lead to USH1 and USH2, respectively, have previously been identified in patients from United Kingdom, Spain and Italy. In this study, we investigate the proportion of exon deletions and duplications in PCDH15 and USH2A in 20 USH1 and 30 USH2 patients from Denmark using multiplex ligation-dependent probe amplification (MLPA). Two heterozygous deletions were identified in USH2A, but no deletions or duplications were identified in PCDH15. Next-generation mate-pair sequencing was used to identify the exact breakpoints of the two deletions identified in USH2A. Our results suggest that USH2 is caused by USH2A exon deletions in a small fraction of the patients, whereas deletions or duplications in PCDH15 might be rare in Danish Usher patients.
Affiliation(s)
- Shzeena Dad
- Clinical Genetics Clinic, The Kennedy Center, Rigshospitalet, University of Copenhagen, Glostrup, Denmark
- Nanna D Rendtorff
- Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark; Department of Otorhinolaryngology, Head and Neck Surgery and Audiology, Bispebjerg Hospital/Rigshospitalet, Copenhagen, Denmark
- Erik Kann
- Clinical Genetics Clinic, The Kennedy Center, Rigshospitalet, University of Copenhagen, Glostrup, Denmark
- Anders Albrechtsen
- Department of Biology, Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
- Mana M Mehrjouy
- Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
- Mads Bak
- Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
- Niels Tommerup
- Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark
- Lisbeth Tranebjærg
- Department of Cellular and Molecular Medicine, University of Copenhagen, Copenhagen, Denmark; Department of Otorhinolaryngology, Head and Neck Surgery and Audiology, Bispebjerg Hospital/Rigshospitalet, Copenhagen, Denmark
- Thomas Rosenberg
- Department of Ophthalmology, The National Eye Clinic, Copenhagen University Hospital, The Kennedy Center, Glostrup, Denmark
- Hanne Jensen
- Department of Ophthalmology, The National Eye Clinic, Copenhagen University Hospital, The Kennedy Center, Glostrup, Denmark
- Lisbeth B Møller
- Clinical Genetics Clinic, The Kennedy Center, Rigshospitalet, University of Copenhagen, Glostrup, Denmark

41
Park DS, Baran Y, Hormozdiari F, Eng C, Torgerson DG, Burchard EG, Zaitlen N. PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling. BMC Bioinformatics 2015; 16 Suppl 5:S9. [PMID: 25860540 PMCID: PMC4402697 DOI: 10.1186/1471-2105-16-s5-s9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/19/2022]
Abstract
Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has proven to be computationally difficult. To overcome this, many state of the art methods estimate the probability of IBD between each pair of haplotypes separately. While computationally efficient, these methods fail to leverage the clique structure of IBD resulting in less powerful IBD identification, especially for small IBD segments. We develop a hybrid approach (PIGS), which combines the computational efficiency of pairwise methods with the power of multiway methods. It leverages the IBD graph structure to compute the probability of IBD conditional on all pairwise estimates simultaneously. We show via extensive simulations and analysis of real data that our method produces a substantial increase in the number of identified small IBD segments.

42
Zheng C, Kuhner MK, Thompson EA. Joint inference of identity by descent along multiple chromosomes from population samples. J Comput Biol 2014; 21:185-200. [PMID: 24606562 DOI: 10.1089/cmb.2013.0140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Abstract
There has been much interest in detecting genomic identity by descent (IBD) segments from modern dense genetic marker data and in using them to identify human disease susceptibility loci. Here we present a novel Bayesian framework using Markov chain Monte Carlo (MCMC) realizations to jointly infer IBD states among multiple individuals not known to be related, together with the allelic typing error rate and the IBD process parameters. The data are phased single nucleotide polymorphism (SNP) haplotypes. We model changes in latent IBD state along homologous chromosomes by a continuous time Markov model having the Ewens sampling formula as its stationary distribution. We show by simulation that this model for the IBD process fits quite well with the coalescent predictions. Using simulation data sets of 40 haplotypes over regions of 1 and 10 million base pairs (Mbp), we show that the jointly estimated IBD states are very close to the true values, although the presence of linkage disequilibrium decreases the accuracy. We also present comparisons with the ibd_haplo program, which estimates IBD among sets of four haplotypes. Our new IBD detection method focuses on the scale between genome-wide methods using simple IBD models and complex coalescent-based methods that are limited to short genome segments. At the scale of a few Mbp, our approach offers potentially more power for fine-scale IBD association mapping.
Affiliation(s)
- Chaozhi Zheng
- Department of Statistics, University of Washington, Seattle, Washington

43
Raghavan M, DeGiorgio M, Albrechtsen A, Moltke I, Skoglund P, Korneliussen TS, Grønnow B, Appelt M, Gulløv HC, Friesen TM, Fitzhugh W, Malmström H, Rasmussen S, Olsen J, Melchior L, Fuller BT, Fahrni SM, Stafford T, Grimes V, Renouf MAP, Cybulski J, Lynnerup N, Lahr MM, Britton K, Knecht R, Arneborg J, Metspalu M, Cornejo OE, Malaspinas AS, Wang Y, Rasmussen M, Raghavan V, Hansen TVO, Khusnutdinova E, Pierre T, Dneprovsky K, Andreasen C, Lange H, Hayes MG, Coltrain J, Spitsyn VA, Götherström A, Orlando L, Kivisild T, Villems R, Crawford MH, Nielsen FC, Dissing J, Heinemeier J, Meldgaard M, Bustamante C, O'Rourke DH, Jakobsson M, Gilbert MTP, Nielsen R, Willerslev E. The genetic prehistory of the New World Arctic. Science 2014; 345:1255832. [PMID: 25170159 DOI: 10.1126/science.1255832] [Citation(s) in RCA: 147] [Impact Index Per Article: 14.7] [Indexed: 12/16/2022]
Abstract
The New World Arctic, the last region of the Americas to be populated by humans, has a relatively well-researched archaeology, but an understanding of its genetic history is lacking. We present genome-wide sequence data from ancient and present-day humans from Greenland, Arctic Canada, Alaska, Aleutian Islands, and Siberia. We show that Paleo-Eskimos (~3000 BCE to 1300 CE) represent a migration pulse into the Americas independent of both Native American and Inuit expansions. Furthermore, the genetic continuity characterizing the Paleo-Eskimo period was interrupted by the arrival of a new population, representing the ancestors of present-day Inuit, with evidence of past gene flow between these lineages. Despite periodic abandonment of major Arctic regions, a single Paleo-Eskimo metapopulation likely survived in near-isolation for more than 4000 years, only to vanish around 700 years ago.
Affiliation(s)
- Maanasa Raghavan
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University, 502 Wartik Laboratory, University Park, PA 16802, USA
- Anders Albrechtsen
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark
- Ida Moltke
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark. Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Pontus Skoglund
- Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden. Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Thorfinn S Korneliussen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Bjarne Grønnow
- Arctic Centre at the Ethnographic Collections (SILA), National Museum of Denmark, Frederiksholms Kanal 12, 1220 Copenhagen, Denmark
- Martin Appelt
- Arctic Centre at the Ethnographic Collections (SILA), National Museum of Denmark, Frederiksholms Kanal 12, 1220 Copenhagen, Denmark
- Hans Christian Gulløv
- Arctic Centre at the Ethnographic Collections (SILA), National Museum of Denmark, Frederiksholms Kanal 12, 1220 Copenhagen, Denmark
- T Max Friesen
- Department of Anthropology, University of Toronto, Toronto, Ontario M5S 2S2, Canada
- William Fitzhugh
- Arctic Studies Center, Post Office Box 37012, Department of Anthropology, MRC 112, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA
- Helena Malmström
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark. Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Simon Rasmussen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Jesper Olsen
- AMS 14C Dating Centre, Department of Physics and Astronomy, Aarhus University, Ny Munkegade 120, 8000 Aarhus C, Denmark
- Linea Melchior
- Anthropological Laboratory, Institute of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Frederik V's Vej 11, 2100 Copenhagen, Denmark
- Benjamin T Fuller
- Department of Earth System Science, University of California, Irvine, CA 92697, USA
- Simon M Fahrni
- Department of Earth System Science, University of California, Irvine, CA 92697, USA
- Thomas Stafford
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark. AMS 14C Dating Centre, Department of Physics and Astronomy, Aarhus University, Ny Munkegade 120, 8000 Aarhus C, Denmark
- Vaughan Grimes
- Department of Archaeology, Memorial University, Queen's College, 210 Prince Philip Drive, St. John's, Newfoundland, A1C 5S7, Canada. Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- M A Priscilla Renouf
- Department of Archaeology, Memorial University, Queen's College, 210 Prince Philip Drive, St. John's, Newfoundland, A1C 5S7, Canada
- Jerome Cybulski
- Canadian Museum of History, 100 Rue Laurier, Gatineau, Quebec K1A 0M8, Canada. Department of Anthropology, University of Western Ontario, 1151 Richmond Street North, London N6A 5C2, Canada
- Niels Lynnerup
- Anthropological Laboratory, Institute of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Frederik V's Vej 11, 2100 Copenhagen, Denmark
- Marta Mirazon Lahr
- Leverhulme Centre for Human Evolutionary Studies, Department of Archaeology and Anthropology, University of Cambridge, Cambridge CB2 1QH, UK
- Kate Britton
- Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany. Department of Archaeology, University of Aberdeen, St. Mary's Building, Elphinstone Road, Aberdeen AB24 3UF, Scotland, UK
- Rick Knecht
- Department of Archaeology, University of Aberdeen, St. Mary's Building, Elphinstone Road, Aberdeen AB24 3UF, Scotland, UK
- Jette Arneborg
- National Museum of Denmark, Frederiksholms kanal 12, 1220 Copenhagen, Denmark. School of Geosciences, University of Edinburgh, Edinburgh EH8 9XP, UK
- Mait Metspalu
- Estonian Biocentre, Evolutionary Biology Group, Tartu 51010, Estonia. Department of Evolutionary Biology, University of Tartu, Tartu 51010, Estonia
- Omar E Cornejo
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA. School of Biological Sciences, Washington State University, Post Office Box 644236, Pullman, WA 99164, USA
- Anna-Sapfo Malaspinas
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Yong Wang
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA. Ancestry.com DNA LLC, San Francisco, CA 94107, USA
- Morten Rasmussen
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Vibha Raghavan
- Informatics and Bio-computing, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, M5G 0A3, Canada
- Thomas V O Hansen
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark
- Elza Khusnutdinova
- Institute of Biochemistry and Genetics, Ufa Scientific Center of Russian Academy of Sciences, Ufa, Russia. Department of Genetics and Fundamental Medicine, Bashkir State University, Ufa, Bashkortostan 450074, Russia
- Tracey Pierre
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Kirill Dneprovsky
- State Museum for Oriental Art, 12a, Nikitsky Boulevard, Moscow 119019, Russia
- Claus Andreasen
- Greenland National Museum and Archives, Post Office Box 145, 3900 Nuuk, Greenland
- Hans Lange
- Greenland National Museum and Archives, Post Office Box 145, 3900 Nuuk, Greenland
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA. Department of Anthropology, Weinberg College of Arts and Sciences, Northwestern University, Evanston, IL 60208, USA. Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Joan Coltrain
- Department of Anthropology, University of Utah, Salt Lake City, UT 84112, USA
- Victor A Spitsyn
- Research Centre for Medical Genetics of Russian Academy of Medical Sciences, 1 Moskvorechie, Moscow 115478, Russia
- Anders Götherström
- Department of Archaeology and Classical Studies, Stockholm University, Stockholm, Sweden
- Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Toomas Kivisild
- Estonian Biocentre, Evolutionary Biology Group, Tartu 51010, Estonia. Department of Archaeology and Anthropology, University of Cambridge, Cambridge CB2 1QH, UK
- Richard Villems
- Estonian Biocentre, Evolutionary Biology Group, Tartu 51010, Estonia. Department of Evolutionary Biology, University of Tartu, Tartu 51010, Estonia
- Michael H Crawford
- Laboratory of Biological Anthropology, University of Kansas, Lawrence, KS 66045, USA
- Finn C Nielsen
- Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark
- Jørgen Dissing
- Anthropological Laboratory, Institute of Forensic Medicine, Faculty of Health Sciences, University of Copenhagen, Frederik V's Vej 11, 2100 Copenhagen, Denmark
- Jan Heinemeier
- AMS 14C Dating Centre, Department of Physics and Astronomy, Aarhus University, Ny Munkegade 120, 8000 Aarhus C, Denmark
- Morten Meldgaard
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Carlos Bustamante
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Dennis H O'Rourke
- Department of Anthropology, University of Utah, Salt Lake City, UT 84112, USA
- Mattias Jakobsson
- Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- M Thomas P Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
- Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
|
44
|
Bahlo M, Tankard R, Lukic V, Oliver KL, Smith KR. Using familial information for variant filtering in high-throughput sequencing studies. Hum Genet 2014; 133:1331-41. [PMID: 25129038 PMCID: PMC4185103 DOI: 10.1007/s00439-014-1479-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/01/2014] [Accepted: 08/07/2014] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing studies (HTS) have been highly successful in identifying the genetic causes of human disease, particularly those following Mendelian inheritance. Many HTS studies to date have been performed without utilizing available family relationships between samples. Here, we discuss the many merits and occasional pitfalls of using identity by descent information in conjunction with HTS studies. These methods are not only applicable to family studies but are also useful in cohorts of apparently unrelated, ‘sporadic’ cases and small families underpowered for linkage and allow inference of relationships between individuals. Incorporating familial/pedigree information not only provides powerful filtering options for the extensive variant lists that are usually produced by HTS but also allows valuable quality control checks, insights into the genetic model and the genotypic status of individuals of interest. In particular, these methods are valuable for challenging discovery scenarios in HTS analysis, such as in the study of populations poorly represented in variant databases typically used for filtering, and in the case of poor-quality HTS data.
Affiliation(s)
- Melanie Bahlo
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
|
45
|
Leblois R, Pudlo P, Néron J, Bertaux F, Reddy Beeravolu C, Vitalis R, Rousset F. Maximum-likelihood inference of population size contractions from microsatellite data. Mol Biol Evol 2014; 31:2805-23. [PMID: 25016583 DOI: 10.1093/molbev/msu212] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 01/23/2023]
Abstract
Understanding the demographic history of populations and species is a central issue in evolutionary biology and molecular ecology. In this work, we develop a maximum-likelihood method for the inference of past changes in population size from microsatellite allelic data. Our method is based on importance sampling of gene genealogies, extended for new mutation models, notably the generalized stepwise mutation model (GSM). Using simulations, we test its performance to detect and characterize past reductions in population size. First, we test the estimation precision and confidence intervals coverage properties under ideal conditions, then we compare the accuracy of the estimation with another available method (MSVAR) and we finally test its robustness to misspecification of the mutational model and population structure. We show that our method is very competitive compared with alternative ones. Moreover, our implementation of a GSM allows more accurate analysis of microsatellite data, as we show that the violations of a single step mutation assumption induce very high bias toward false contraction detection rates. However, our simulation tests also showed some limits, which most importantly are large computation times for strong disequilibrium scenarios and a strong influence of some form of unaccounted population structure. This inference method is available in the latest implementation of the MIGRAINE software package.
Affiliation(s)
- Raphaël Leblois
- INRA, UMR 1062 CBGP (INRA-IRD-CIRAD-Montpellier Supagro), Montpellier, France; Muséum National d'Histoire Naturelle, CNRS, UMR OSEB, Paris, France; Institut de Biologie Computationnelle, Montpellier, France
- Pierre Pudlo
- INRA, UMR 1062 CBGP (INRA-IRD-CIRAD-Montpellier Supagro), Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France; Université Montpellier 2, CNRS, UMR I3M, Montpellier, France
- Joseph Néron
- Muséum National d'Histoire Naturelle, CNRS, UMR OSEB, Paris, France
- François Bertaux
- Muséum National d'Histoire Naturelle, CNRS, UMR OSEB, Paris, France; INRIA Paris-Rocquencourt, BANG Team, Le Chesnay, France
- Renaud Vitalis
- INRA, UMR 1062 CBGP (INRA-IRD-CIRAD-Montpellier Supagro), Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France
- François Rousset
- Institut de Biologie Computationnelle, Montpellier, France; Université Montpellier 2, CNRS, UMR ISEM, Montpellier, France
|
46
|
Durand EY, Eriksson N, McLean CY. Reducing pervasive false-positive identical-by-descent segments detected by large-scale pedigree analysis. Mol Biol Evol 2014; 31:2212-22. [PMID: 24784137 PMCID: PMC4104314 DOI: 10.1093/molbev/msu151] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 01/01/2023]
Abstract
Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases, but IBD detection accuracy in nonsimulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. data set. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false-positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. Nearly all false positives arose from the allowance of haplotype switch errors when detecting IBD, a necessity for retrieving long (>6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that scores IBD segments proportional to the number of switch errors they contain. Applying HaploScore filtering to the IBD data at a precision of 0.8 produced a 13-fold increase in recall when compared with length-based filtering. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative data sources using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
|
47
|
He D. IBD-Groupon: an efficient method for detecting group-wise identity-by-descent regions simultaneously in multiple individuals based on pairwise IBD relationships. Bioinformatics 2013; 29:i162-70. [PMID: 23812980 PMCID: PMC3694672 DOI: 10.1093/bioinformatics/btt237] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Detecting IBD tracts is an important problem in genetics. Most of the existing methods focus on detecting pairwise IBD tracts, which have relatively low power to detect short IBD tracts. Methods to detect IBD tracts among multiple individuals simultaneously, or group-wise IBD tracts, have better performance for short IBD tracts detection. Group-wise IBD tracts can be applied to a wide range of applications, such as disease mapping, pedigree reconstruction and so forth. The existing group-wise IBD tract detection method is computationally inefficient and is only able to handle small datasets, such as 20, 30 individuals with hundreds of SNPs. It also requires a previous specification of the number of IBD groups, or partitions of the individuals where all the individuals in the same partition are IBD with each other, which may not be realistic in many cases. The method can only handle a small number of IBD groups, such as two or three, because of scalability issues. What is more, it does not take LD (linkage disequilibrium) into consideration. RESULTS: In this work, we developed an efficient method IBD-Groupon, which detects group-wise IBD tracts based on pairwise IBD relationships, and it is able to address all the drawbacks aforementioned. To our knowledge, our method is the first practical group-wise IBD tracts detection method that is scalable to very large datasets, for example, hundreds of individuals with thousands of SNPs, and in the meanwhile, it is powerful to detect short IBD tracts. Our method does not need to specify the number of IBD groups, which will be detected automatically. And our method takes LD into consideration, as it is based on pairwise IBD tracts where LD can be easily incorporated.
Affiliation(s)
- Dan He
- Computational Genomics, IBM TJ Watson Research, Yorktown Heights, NY 10598, USA.
|
48
|
Moltke I, Albrechtsen A. RelateAdmix: a software tool for estimating relatedness between admixed individuals. Bioinformatics 2013; 30:1027-8. [PMID: 24215025 DOI: 10.1093/bioinformatics/btt652] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Pairwise relatedness plays an important role in a range of genetic research fields. However, currently only few estimators exist for individuals that are admixed, i.e. have ancestry from more than one population, and these estimators fail in some situations. RESULTS: We present a new software tool, RelateAdmix, for obtaining maximum likelihood estimates of pairwise relatedness from genetic data between admixed individuals. We show using simulated data that it gives rise to better estimates than three state-of-the-art software tools, REAP, KING and Plink, while still being fast enough to be applicable to large datasets. AVAILABILITY AND IMPLEMENTATION: The software tool, implemented in C and R, is freely available from www.popgen.dk/software.
Affiliation(s)
- Ida Moltke
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA and Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen N, Denmark
|
49
|
Gauvin H, Moreau C, Lefebvre JF, Laprise C, Vézina H, Labuda D, Roy-Gagnon MH. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur J Hum Genet 2013; 22:814-21. [PMID: 24129432 PMCID: PMC4023206 DOI: 10.1038/ejhg.2013.227] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 03/25/2013] [Revised: 08/07/2013] [Accepted: 09/04/2013] [Indexed: 12/16/2022]
Abstract
In genetics the ability to accurately describe the familial relationships among a group of individuals can be very useful. Recent statistical tools succeeded in assessing the degree of relatedness up to 6-7 generations with good power using dense genome-wide single-nucleotide polymorphism data to estimate the extent of identity-by-descent (IBD) sharing. It is therefore important to describe genome-wide patterns of IBD sharing for more remote and complex relatedness between individuals, such as that observed in a founder population like Quebec, Canada. Taking advantage of the extended genealogical records of the French Canadian founder population, we first compared different tools to identify regions of IBD in order to best describe genome-wide IBD sharing and its correlation with genealogical characteristics. Results showed that the extent of IBD sharing identified with FastIBD correlates best with relatedness measured using genealogical data. Total length of IBD sharing explained 85% of the genealogical kinship's variance. In addition, we observed significantly higher sharing in pairs of individuals with at least one inbred ancestor compared with those without any. Furthermore, patterns of IBD sharing and average sharing were different across regional populations, consistent with the settlement history of Quebec. Our results suggest that, as expected, the complex relatedness present in founder populations is reflected in patterns of IBD sharing. Using these patterns, it is thus possible to gain insight on the types of distant relationships in a sample from a founder population like Quebec.
Affiliation(s)
- Héloïse Gauvin
- [1] Département de médecine sociale et préventive, Université de Montréal, Montréal, Québec, Canada [2] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Claudia Moreau
- Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Jean-François Lefebvre
- Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Catherine Laprise
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
- Hélène Vézina
- Département des sciences humaines, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
- Damian Labuda
- [1] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada [2] Département de pédiatrie, Université de Montréal, Montréal, Québec, Canada
- Marie-Hélène Roy-Gagnon
- [1] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada [2] Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada
|
50
|
He D, Wang Z, Han B, Parida L, Eskin E. IPED: inheritance path-based pedigree reconstruction algorithm using genotype data. J Comput Biol 2013; 20:780-91. [PMID: 24093229 PMCID: PMC3791035 DOI: 10.1089/cmb.2013.0080] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
The problem of inference of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. Various methods have been proposed to automate the process of pedigree reconstruction given the genotypes or haplotypes of a set of individuals. Current methods, unfortunately, are very time-consuming and inaccurate for complicated pedigrees, such as pedigrees with inbreeding. In this work, we propose an efficient algorithm that is able to reconstruct large pedigrees with reasonable accuracy. Our algorithm reconstructs the pedigrees generation by generation, backward in time from the extant generation. We predict the relationships between individuals in the same generation using an inheritance path-based approach implemented with an efficient dynamic programming algorithm. Experiments show that our algorithm runs in linear time with respect to the number of reconstructed generations, and therefore, it can reconstruct pedigrees that have a large number of generations. Indeed it is the first practical method for reconstruction of large pedigrees from genotype data.
Affiliation(s)
- Dan He
- IBM T.J. Watson Research, Yorktown Heights, New York
- Zhanyong Wang
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California
- Buhm Han
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts
- Laxmi Parida
- IBM T.J. Watson Research, Yorktown Heights, New York
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, California
|