1
Aanes H, Vigeland MD, Star B, Gilfillan GD, Mattingsdal M, Trøan S, Strand M, Eide LM, Hanssen EN. Heating up three cold cases in Norway using investigative genetic genealogy. Forensic Sci Int Genet 2024; 76:103217. [PMID: 39787642 DOI: 10.1016/j.fsigen.2024.103217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Revised: 09/09/2024] [Accepted: 12/22/2024] [Indexed: 01/12/2025]
Abstract
With the advent of commercial DNA databases, investigative genetic genealogy (IGG) has emerged as a powerful forensic tool, rivalling the impact of STR analyses, introduced four decades ago. IGG has been frequently applied in the US and tested in other countries, but never in Norway. Here, we apply IGG to three cold criminal cases and successfully identify the donor of the DNA in two of these cases. Our findings suggest that when combined with phenotypic prediction and case information, IGG holds substantial potential for resolving both active and cold cases in Norway. This potential is amplified by the digitalization of archives and the transparent and structured nature of society in Norway. Additionally, the databases exhibit sufficient representation to yield matches with distant relatives. Moreover, this work has uncovered a series of lingering research questions spanning the entire workflow from DNA extraction to genealogy research. Finally, we highlight the possibility that more insights can be gleaned from genetic profiles, for instance using an accurate age prediction method. The results show that IGG can be successfully applied in Norway, having reached a level of maturity that enables identification of unknown individuals in cases where DNA is accessible.
Affiliation(s)
- Håvard Aanes
  - Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway
- Magnus D Vigeland
  - Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway
- Bastiaan Star
  - Center for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Blindern, PO Box 1066, Oslo 0316, Norway
- Gregor D Gilfillan
  - Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Morten Mattingsdal
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0313, Norway; Department of Medical Research, Vestre Viken Hospital Trust, Bærum Hospital, Gjettum 1346, Norway
2
Llorin H, Tennen R, Laskey S, Zhan J, Detweiler S, Abul-Husn NS. Shortcomings of ethnicity-based carrier screening for conditions associated with Ashkenazi Jewish ancestry. Genetics in Medicine Open 2024; 2:101869. [PMID: 39669632 PMCID: PMC11613755 DOI: 10.1016/j.gimo.2024.101869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 07/09/2024] [Accepted: 07/11/2024] [Indexed: 12/14/2024]
Abstract
Purpose: Carrier screening identifies reproductive risk for autosomal recessive and X-linked genetic conditions. Currently, some medical society guidelines continue to recommend ethnicity-based carrier screening for conditions associated with Ashkenazi Jewish (AJ) ancestry. We assessed the utility and limitations of these guidelines in a large, ethnically and genetically diverse cohort of genotyped individuals. Methods: We characterized the self-reported ethnicity and genetic ancestry of over 110,000 consenting research participants identified as heterozygous for pathogenic variants associated with 15 autosomal recessive conditions recommended by the American College of Obstetricians and Gynecologists for screening in individuals of AJ descent. Results: Of 7.2 million research participants, 116,517 were identified as heterozygous for pathogenic variants associated with the 15 conditions evaluated. The majority (54.9%) of heterozygotes did not report qualifying ethnicity under American College of Obstetricians and Gynecologists ethnicity-based screening guidelines. Approximately half (51.3%) of all individuals heterozygous for pathogenic variants in genes associated with 1 or more conditions recommended to be screened exclusively in individuals of AJ descent had <20% computed AJ ancestry. Conclusion: Ethnicity-based carrier screening leads to the underdetection of heterozygotes and associated reproductive risk for conditions historically associated with AJ ancestry.
3
Ji Q, Yao Y, Li Z, Zhou Z, Qian J, Tang Q, Xie J. Characterizing identity by descent segments in Chinese interpopulation unrelated individual pairs. Mol Genet Genomics 2024; 299:37. [PMID: 38494535 DOI: 10.1007/s00438-024-02132-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 02/22/2024] [Indexed: 03/19/2024]
Abstract
Identity by descent (IBD) segments, uninterrupted DNA segments derived from the same ancestral chromosomes, are widely used as indicators of relationships in genetics. A great deal of research focuses on IBD segments between related pairs, while statistical analyses of segments in unrelated individuals are rare. In this study, we investigated the basic informative features of IBD segments in unrelated pairs in Chinese populations from the 1000 Genomes Project. A total of 5922 IBD segments in Chinese interpopulation unrelated individual pairs were detected via IBIS, and the average segment length was 3.71 Mb. It was found that 17.86% of unrelated pairs shared at least one IBD segment in the Chinese cohort. Furthermore, a total of 49 chromosomal regions where IBD segments clustered in high abundance were identified, which might be sharing hotspots in the human genome. Such regions could also be observed in other ancestry populations, which implies that similar IBD backgrounds also exist. Altogether, these results demonstrated the distribution of common background IBD segments, which helps improve the accuracy of pedigree studies based on IBD analysis.
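The summary statistics reported in this abstract (segment count, mean segment length, share of pairs with at least one segment) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `summarize_ibd`, the tuple layout and the toy data are assumptions for illustration, and the denominator here counts every possible pair, whereas the study restricts itself to interpopulation pairs.

```python
def summarize_ibd(segments, sample_ids):
    """Summarize pairwise IBD segments.

    segments: iterable of (id_a, id_b, length_mb) tuples (hypothetical layout).
    sample_ids: every individual analysed, including those sharing nothing.
    """
    lengths = [length for _, _, length in segments]
    # A pair counts once, no matter how many segments it shares.
    sharing_pairs = {frozenset((a, b)) for a, b, _ in segments}
    n = len(sample_ids)
    n_pairs = n * (n - 1) // 2  # assumption: all pairs are eligible
    return {
        "n_segments": len(lengths),
        "mean_length_mb": sum(lengths) / len(lengths) if lengths else 0.0,
        "fraction_pairs_sharing": len(sharing_pairs) / n_pairs if n_pairs else 0.0,
    }


# Toy data, invented for illustration only.
toy_ids = ["s1", "s2", "s3", "s4"]
toy_segments = [("s1", "s2", 3.0), ("s1", "s2", 5.0), ("s3", "s4", 4.0)]
summary = summarize_ibd(toy_segments, toy_ids)
print(summary)
```

With real IBIS output, the segment tuples would be parsed from the tool's segment file and the pair denominator limited to the pairs actually compared.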
Affiliation(s)
- Qiqi Ji
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Yining Yao
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhimin Li
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhihan Zhou
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Jinglei Qian
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Qiqun Tang
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Jianhui Xie
  - Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
4
Nguyen R, Kapp JD, Sacco S, Myers SP, Green RE. A computational approach for positive genetic identification and relatedness detection from low-coverage shotgun sequencing data. J Hered 2023; 114:504-512. [PMID: 37381815 PMCID: PMC10445519 DOI: 10.1093/jhered/esad041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 06/28/2023] [Indexed: 06/30/2023]
Abstract
Several methods exist for detecting genetic relatedness or identity by comparing DNA information. These methods generally require genotype calls, either single-nucleotide polymorphisms or short tandem repeats, at the sites used for comparison. For some DNA samples, like those obtained from bone fragments or single rootless hairs, there is often not enough DNA present to generate genotype calls that are accurate and complete enough for these comparisons. Here, we describe IBDGem, a fast and robust computational procedure for detecting genomic regions of identity-by-descent by comparing low-coverage shotgun sequence data against genotype calls from a known query individual. At less than 1× genome coverage, IBDGem reliably detects segments of relatedness and can make high-confidence identity detections with as little as 0.01× genome coverage.
Affiliation(s)
- Remy Nguyen
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, United States
- Joshua D Kapp
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, United States
- Samuel Sacco
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, United States
- Steven P Myers
  - California Department of Justice Jan Bashinski DNA Laboratory, Richmond, CA, United States
- Richard E Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, United States
5
Bridging Disciplines to Form a New One: The Emergence of Forensic Genetic Genealogy. Genes (Basel) 2022; 13:1381. [PMID: 36011291 PMCID: PMC9407302 DOI: 10.3390/genes13081381] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/22/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 01/27/2023]
Abstract
Forensic Genetic Genealogy (FGG) has fast become a popular tool in criminal investigations since it first emerged in 2018. FGG is a novel investigatory tool that has been applied to hundreds of unresolved cold cases in the United States to generate investigative leads and identify unknown individuals. Consumer DNA testing and the public's increased curiosity about their own DNA and genetic ancestry have greatly contributed to the availability of human genetic data. Genetic genealogy has been a field of study and interest for many years, as both amateur and professional genetic genealogists use consumer DNA data to explore genetic connections in family trees. FGG builds on this knowledge by applying advanced sequencing technologies to forensic DNA evidence samples and by performing genetic genealogy methods and genealogical research to produce possible identities of unknown perpetrators of violent crimes and unidentified human remains. This combination of forensic genetics, genetic genealogy, and genealogical research has formed a new subdiscipline within the forensic sciences. This paper will summarize the individual disciplines that led to the emergence of FGG, its practice in forensic investigations, and current and future considerations for its use.
6
Tournebize R, Chu G, Moorjani P. Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals. PLoS Genet 2022; 18:e1010243. [PMID: 35737729 PMCID: PMC9223333 DOI: 10.1371/journal.pgen.1010243] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/06/2022] [Accepted: 05/08/2022] [Indexed: 11/30/2022]
Abstract
Founder events play a critical role in shaping genetic diversity, fitness and disease risk in a population. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods require large sample sizes or phased genomes. Thus, we developed ASCEND, which measures the correlation in allele sharing between pairs of individuals across the genome to infer the age and strength of founder events. We show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios. We then apply ASCEND to two species with contrasting evolutionary histories: ~460 worldwide human populations and ~40 modern dog breeds. In humans, we find that over half of the analyzed populations have evidence for recent founder events, associated with geographic isolation, modes of sustenance, or cultural practices such as endogamy. Notably, island populations have lower population sizes than continental groups, and most hunter-gatherer, nomadic and indigenous groups have evidence of recent founder events. Many present-day groups (including Native Americans, Oceanians and South Asians) have experienced more extreme founder events than Ashkenazi Jews, who have high rates of recessive diseases due to their known history of founder events. Using ancient genomes, we show that the strength of founder events differs markedly across geographic regions and time, with three major founder events related to the peopling of the Americas and a trend of decreasing strength of founder events in Europe following the Neolithic transition and steppe migrations. In dogs, we estimate extreme founder events in most breeds that occurred in the last 25 generations, concordant with the establishment of many dog breeds during Victorian times. Our analysis highlights a widespread history of founder events in humans and dogs and elucidates some of the demographic and cultural practices related to these events.
A founder event occurs when small numbers of ancestral individuals give rise to a large fraction of the population. Founder events reduce genetic variation and increase the risk of recessive diseases. Despite their importance in evolutionary and disease studies, we still only have a limited comprehension of their prevalence and properties in humans and other species, as most existing methods require large sample sizes or phased genomes. Here, we present a flexible method, ASCEND, to infer the timing and the strength of founder events that is suitable for sparse datasets with few samples or limited coverage. ASCEND provides reliable estimates across a wide range of demographic scenarios. By applying it to data from two species (humans and dogs), we document a widespread history of recent founder events in both species and provide insights about the demographic processes related to these events. Our analysis helps to identify groups with strong founder events that should be prioritized for future studies as they offer a unique opportunity for biological discovery and reducing disease burden through mapping of recessive disease-causing genes and pathways, as previously shown in studies of Ashkenazi Jews and Finns.
Affiliation(s)
- Rémi Tournebize
  - Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
  - Center for Computational Biology, University of California, Berkeley, California, United States of America
- Gillian Chu
  - Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, United States of America
- Priya Moorjani
  - Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
  - Center for Computational Biology, University of California, Berkeley, California, United States of America
7
Arciero E, Dogra SA, Malawsky DS, Mezzavilla M, Tsismentzoglou T, Huang QQ, Hunt KA, Mason D, Sharif SM, van Heel DA, Sheridan E, Wright J, Small N, Carmi S, Iles MM, Martin HC. Fine-scale population structure and demographic history of British Pakistanis. Nat Commun 2021; 12:7189. [PMID: 34893604 PMCID: PMC8664933 DOI: 10.1038/s41467-021-27394-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/05/2021] [Accepted: 11/09/2021] [Indexed: 02/08/2023]
Abstract
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.
Affiliation(s)
- Elena Arciero
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Sufyan A Dogra
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Theofanis Tsismentzoglou
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Qin Qin Huang
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Karen A Hunt
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dan Mason
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Saghira Malik Sharif
  - Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- David A van Heel
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eamonn Sheridan
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- John Wright
  - Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Neil Small
  - Faculty of Health Studies, University of Bradford, Richmond Road, Bradford, UK
- Shai Carmi
  - Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Mark M Iles
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Hilary C Martin
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
8
de Vries JH, Kling D, Vidaki A, Arp P, Kalamara V, Verbiest MMPJ, Piniewska-Róg D, Parsons TJ, Uitterlinden AG, Kayser M. Impact of SNP microarray analysis of compromised DNA on kinship classification success in the context of investigative genetic genealogy. Forensic Sci Int Genet 2021; 56:102625. [PMID: 34753062 DOI: 10.1016/j.fsigen.2021.102625] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/27/2021] [Revised: 10/25/2021] [Accepted: 10/27/2021] [Indexed: 11/04/2022]
Abstract
Single nucleotide polymorphism (SNP) data generated with microarray technologies have been used to solve murder cases via investigative leads obtained from identifying relatives of the unknown perpetrator included in accessible genomic databases, an approach referred to as investigative genetic genealogy (IGG). However, SNP microarrays were developed for relatively high input DNA quantity and quality, while DNA typically obtainable from crime scene stains is of low DNA quantity and quality, and SNP microarray data obtained from compromised DNA are largely missing. By applying the Illumina Global Screening Array (GSA) to 264 DNA samples with systematically altered quantity and quality, we empirically tested the impact of SNP microarray analysis of compromised DNA on kinship classification success, as relevant in IGG. Reference data from manufacturer-recommended input DNA quality and quantity were used to estimate genotype accuracy in the compromised DNA samples and for simulating data of different degree relatives. Although stepwise decrease of input DNA amount from 200 ng to 6.25 pg led to decreased SNP call rates and increased genotyping errors, kinship classification success did not decrease down to 250 pg for siblings and 1st cousins, 1 ng for 2nd cousins, while at 25 pg and below kinship classification success was zero. Stepwise decrease of input DNA quality via increased DNA fragmentation resulted in the decrease of genotyping accuracy as well as kinship classification success, which went down to zero at the average DNA fragment size of 150 base pairs. Combining decreased DNA quantity and quality in mock casework and skeletal samples further highlighted possibilities and limitations. Overall, GSA analysis achieved maximal kinship classification success from 800 to 200 times lower input DNA quantities than manufacturer-recommended, although DNA quality plays a key role too, while compromised DNA produced false negative kinship classifications rather than false positive ones.
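The "800 to 200 times lower" figure can be checked with simple arithmetic. The sketch below assumes that the 200 ng upper end of the dilution series is the manufacturer-recommended input, which the abstract implies but does not state outright; the variable names are illustrative.

```python
# Assumption: the manufacturer-recommended input equals the 200 ng starting
# point of the dilution series described in the abstract.
RECOMMENDED_INPUT_PG = 200 * 1000  # 200 ng expressed in picograms

# Lowest inputs at which kinship classification success had not yet decreased.
lowest_working_input_pg = {
    "siblings and 1st cousins": 250,  # 250 pg
    "2nd cousins": 1000,              # 1 ng
}

fold_reduction = {
    relationship: RECOMMENDED_INPUT_PG / picograms
    for relationship, picograms in lowest_working_input_pg.items()
}

for relationship, fold in fold_reduction.items():
    print(f"{relationship}: {fold:.0f}x lower than the recommended input")
```

Under that assumption, the two thresholds reproduce the 800-fold and 200-fold reductions quoted in the abstract.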
Affiliation(s)
- Jard H de Vries
  - Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Daniel Kling
  - Department of Forensic Genetics and Toxicology, National Board of Forensic Medicine, Artillerigatan 12, 587 58 Linköping, Sweden
- Athina Vidaki
  - Erasmus MC, University Medical Center Rotterdam, Department of Genetic Identification, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Pascal Arp
  - Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Vivian Kalamara
  - Erasmus MC, University Medical Center Rotterdam, Department of Genetic Identification, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Michael M P J Verbiest
  - Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Danuta Piniewska-Róg
  - Malopolska Centre of Biotechnology, Jagiellonian University, 30-387 Krakow, Poland; Department of Forensic Medicine, Jagiellonian University Medical College, 31-531 Krakow, Poland
- Thomas J Parsons
  - International Commission on Missing Persons, Koninginnegracht 12a, 2514 AA Den Haag, the Netherlands
- André G Uitterlinden
  - Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands; Erasmus MC, University Medical Center Rotterdam, Department of Epidemiology, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Manfred Kayser
  - Erasmus MC, University Medical Center Rotterdam, Department of Genetic Identification, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
9
Zimmerman KD, Schurr TG, Chen W, Nayak U, Mychaleckyj JC, Quet Q, Moultrie LH, Divers J, Keene KL, Kamen DL, Gilkeson GS, Hunt KJ, Spruill IJ, Fernandes JK, Aldrich MC, Reich D, Garvey WT, Langefeld CD, Sale MM, Ramos PS. Genetic landscape of Gullah African Americans. American Journal of Physical Anthropology 2021; 175:905-919. [PMID: 34008864 PMCID: PMC8286328 DOI: 10.1002/ajpa.24333] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/23/2020] [Revised: 03/30/2021] [Accepted: 04/17/2021] [Indexed: 01/20/2023]
Abstract
Objectives: Gullah African Americans are descendants of formerly enslaved Africans living in the Sea Islands along the coast of the southeastern U.S., from North Carolina to Florida. Their relatively high numbers and geographic isolation were conducive to the development and preservation of a unique culture that retains deep African features. Although historical evidence supports a West-Central African ancestry for the Gullah, linguistic and cultural evidence of a connection to Sierra Leone has led to the suggestion of this country/region as their ancestral home. This study sought to elucidate the genetic structure and ancestry of the Gullah. Materials and Methods: We leveraged whole-genome genotype data from Gullah, African Americans from Jackson, Mississippi, African populations from Sierra Leone, and population reference panels from Africa and Europe to infer population structure, ancestry proportions, and global estimates of admixture. Results: Relative to non-Gullah African Americans from the Southeast US, the Gullah exhibited higher mean African ancestry, lower European admixture, a similarly small Native American contribution, and increased male-biased European admixture. A slightly tighter bottleneck in the Gullah 13 generations ago suggests a largely shared demographic history with non-Gullah African Americans. Despite a slightly higher relatedness to populations from Sierra Leone, our data demonstrate that the Gullah are genetically related to many West African populations. Discussion: This study confirms that subtle differences in African American population structure exist at finer regional levels. Such observations can help to inform medical genetics research in African Americans, and guide the interpretation of genetic data used by African Americans seeking to explore ancestral identities.
Affiliation(s)
- Kip D. Zimmerman
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Theodore G. Schurr
  - Department of Anthropology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Wei-Min Chen
  - Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Uma Nayak
  - Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Josyf C. Mychaleckyj
  - Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Queen Quet
  - Gullah/Geechee Nation, St. Helena Island, South Carolina, USA
- Lee H. Moultrie
  - Lee H. Moultrie & Associates, North Charleston, South Carolina, USA
- Jasmin Divers
  - Department of Health Services Research, New York University Winthrop Hospital, Mineola, New York, USA
- Keith L. Keene
  - Department of Biology, East Carolina University, Greenville, North Carolina, USA
  - Center for Health Disparities, East Carolina University Brody School of Medicine, Greenville, North Carolina, USA
- Diane L. Kamen
  - Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Gary S. Gilkeson
  - Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Kelly J. Hunt
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
- Ida J. Spruill
  - College of Nursing, Medical University of South Carolina, Charleston, South Carolina, USA
- Jyotika K. Fernandes
  - Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Melinda C. Aldrich
  - Department of Thoracic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- David Reich
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- W. Timothy Garvey
  - Department of Nutrition Science, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Carl D. Langefeld
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michèle M. Sale
  - Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Paula S. Ramos
  - Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
10
Rapid detection of identity-by-descent tracts for mega-scale datasets. Nat Commun 2021; 12:3546. [PMID: 34112768 PMCID: PMC8192555 DOI: 10.1038/s41467-021-22910-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/03/2019] [Accepted: 04/01/2021] [Indexed: 01/08/2023]
Abstract
The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly, leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections.
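To give a sense of how similarity detection can avoid comparing every pair of individuals, here is a generic MinHash and locality-sensitive hashing sketch in Python. It is not the iLASH implementation: the window size, shingle length, banding scheme, function names and toy haplotypes are all assumptions chosen for illustration, and a real tool would work on phased genotypes and extend candidate windows into full segments.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shingles(window: str, k: int = 4) -> set:
    """Position-tagged k-mers of alleles within one window."""
    return {f"{i}:{window[i:i + k]}" for i in range(len(window) - k + 1)}


def minhash(tokens: set, n_hashes: int) -> tuple:
    """MinHash signature: the smallest hash value under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(n_hashes))


def candidate_pairs(haplotypes: dict, window: int = 16, n_hashes: int = 8, bands: int = 4) -> set:
    """Return (name_a, name_b, window_start) for haplotypes sharing an LSH bucket."""
    rows = n_hashes // bands
    length = min(len(h) for h in haplotypes.values())
    candidates = set()
    for start in range(0, length - window + 1, window):
        buckets = defaultdict(set)
        for name, hap in haplotypes.items():
            sig = minhash(shingles(hap[start:start + window]), n_hashes)
            for b in range(bands):
                buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
        # Only haplotypes that collide in some band are compared further.
        for members in buckets.values():
            for a, b in combinations(sorted(members), 2):
                candidates.add((a, b, start))
    return candidates


def flip(alleles: str) -> str:
    """Complement a 0/1 allele string (used only to build dissimilar toy data)."""
    return alleles.translate(str.maketrans("01", "10"))


# Toy haplotypes, invented for illustration.
w1, w2 = "0110100111010010", "1011000101101110"
toy = {"A": w1 + w2, "B": w1 + flip(w2), "C": flip(w1) + flip(w2)}
pairs = candidate_pairs(toy)
print(sorted(pairs))
```

In the toy data, A and B are identical in the first window and B and C in the second, so those pairs fall into the same buckets and surface as candidates, while windows with no shared shingles are never compared directly.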
11
Carress H, Lawson DJ, Elhaik E. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics 2021; 22:351. [PMID: 34001009 PMCID: PMC8127217 DOI: 10.1186/s12864-021-07618-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/16/2020] [Accepted: 04/14/2021] [Indexed: 12/11/2022]
Abstract
The past years have seen the rise of genomic biobanks and mega-scale meta-analysis of genomic data, which promises to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limits the global understanding of disease risk but also inhibits viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; the diversity must be quantified, compared and annotated to lead to insight. Genetic annotations from separate biobanks need to be comparable and computable and to operate without access to raw data due to privacy concerns. Comparability is key both for regular research and to allow international comparison in response to pandemics. Here, we evaluate the appropriateness of the most common genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks (locally and internationally), increase the accuracy of association analyses and inform developmental efforts.
Affiliation(s)
- Hannah Carress
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
| | - Daniel John Lawson
- School of Mathematics and Integrative Epidemiology Unit, University of Bristol, Bristol, UK
| | - Eran Elhaik
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK; Department of Biology, Lund University, Lund, Sweden.
|
12
|
Jain A, Sharma D, Bajaj A, Gupta V, Scaria V. Founder variants and population genomes-Toward precision medicine. ADVANCES IN GENETICS 2021; 107:121-152. [PMID: 33641745 DOI: 10.1016/bs.adgen.2020.11.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Human migration and community specific cultural practices have contributed to founder events and enrichment of the variants associated with genetic diseases. While many founder events in isolated populations have remained uncharacterized, the application of genomics in clinical settings as well as for population scale studies in the recent years have provided an unprecedented push towards identification of founder variants associated with human health and disease. The discovery and characterization of founder variants could have far reaching implications not only in understanding the history or genealogy of the disease, but also in implementing evidence based policies and genetic testing frameworks. This further enables precise diagnosis and prevention in an attempt towards precision medicine. This review provides an overview of founder variants along with methods and resources cataloging them. We have also discussed the public health implications and examples of prevalent disease associated founder variants in specific populations.
Affiliation(s)
- Abhinav Jain
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
| | - Disha Sharma
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
| | - Anjali Bajaj
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
| | - Vishu Gupta
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
| | - Vinod Scaria
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India.
|
13
|
Naseri A, Tang K, Geng X, Shi J, Zhang J, Shakya P, Liu X, Zhang S, Zhi D. Personalized genealogical history of UK individuals inferred from biobank-scale IBD segments. BMC Biol 2021; 19:32. [PMID: 33593342 PMCID: PMC7888130 DOI: 10.1186/s12915-021-00964-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/06/2020] [Accepted: 01/19/2021] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND The genealogical histories of individuals within populations are of interest to studies aiming both to uncover detailed pedigree information and overall quantitative population demographic histories. However, the analysis of quantitative details of individual genealogical histories has faced challenges from incomplete available pedigree records and an absence of objective and quantitative details in pedigree information. Although complete pedigree information for most individuals is difficult to track beyond a few generations, it is possible to describe a person's genealogical history using their genetic relatives revealed by identity by descent (IBD) segments-long genomic segments shared by two individuals within a population, which are identical due to inheritance from common ancestors. When modern biobanks collect genotype information for a significant fraction of a population, dense genetic connections of a person can be traced using such IBD segments, offering opportunities to characterize individuals in the context of the underlying populations. Here, we conducted an individual-centric analysis of IBD segments among the UK Biobank participants that represent 0.7% of the UK population. RESULTS We made a high-quality call set of IBD segments over 5 cM among all 500,000 UK Biobank participants. On average, one UK individual shares IBD segments with 14,000 UK Biobank participants, which we refer to as "relatives." Using these segments, approximately 80% of a person's genome can be imputed. We subsequently propose genealogical descriptors based on the genetic connections of relative cohorts of individuals sharing at least one IBD segment and show that such descriptors offer important information about one's genetic makeup, personal genealogical history, and social behavior. Through analysis of relative counts sharing segments at different lengths, we identified a group, potentially British Jews, who has a distinct pattern of familial expansion history. 
Finally, using the enrichment of relatives in one's neighborhood, we identified regional variations of personal preference favoring living closer to one's extended families. CONCLUSIONS Our analysis revealed genetic makeup, personal genealogical history, and social behaviors at the population scale, opening possibilities for further studies of individual's genetic connections in biobank data.
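The idea of counting, for each person, the "relatives" with whom they share at least one IBD segment above a length threshold can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tuple layout `(id_a, id_b, length_cM)`, the function name `count_relatives`, and the toy data are assumptions; only the 5 cM threshold comes from the abstract.

```python
from collections import defaultdict


def count_relatives(segments, min_cm=5.0):
    """Count distinct individuals sharing at least one IBD segment >= min_cm.

    segments: iterable of (id_a, id_b, length_cm) tuples (hypothetical layout).
    """
    relatives = defaultdict(set)
    for a, b, length_cm in segments:
        # Keep only segments above the threshold; skip self-matches.
        if length_cm >= min_cm and a != b:
            relatives[a].add(b)
            relatives[b].add(a)
    # Several segments between the same pair still count as one relative.
    return {person: len(others) for person, others in relatives.items()}


segments = [
    ("p1", "p2", 7.2),
    ("p1", "p3", 4.1),   # below 5 cM, ignored
    ("p2", "p3", 12.0),
    ("p1", "p2", 5.5),   # second segment for the same pair
]
counts = count_relatives(segments)
print(counts)
```

Using sets per person keeps the count robust to pairs that share several segments, which matters when relative counts are later compared across different length thresholds.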
Affiliation(s)
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Xin Geng
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Junjie Shi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Jing Zhang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Pramesh Shakya
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, 33612, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
| | - Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Center for Precision Health, School of Biomedical Informatics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
|
14
|
Kling D, Phillips C, Kennett D, Tillmar A. Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci Int Genet 2021; 52:102474. [PMID: 33592389 DOI: 10.1016/j.fsigen.2021.102474] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 10/02/2020] [Revised: 01/12/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022]
Abstract
Investigative genetic genealogy (IGG) has emerged as a new, rapidly growing field of forensic science. We describe the process whereby dense SNP data, commonly comprising more than half a million markers, are employed to infer distant relationships. By distant we refer to degrees of relatedness exceeding that of first cousins. We review how methods of relationship matching and SNP analysis on an enlarged scale are used in a forensic setting to identify a suspect in a criminal investigation or a missing person. There is currently a strong need in forensic genetics not only to understand the underlying models to infer relatedness but also to fully explore the DNA technologies and data used in IGG. This review brings together many of the topics and examines their effectiveness and operational limits, while suggesting future directions for their forensic validation. We further investigated the methods used by the major direct-to-consumer (DTC) genetic ancestry testing companies as well as submitting a questionnaire where providers of forensic genetic genealogy summarized their operation/services. Although most of the DTC market, and genetic genealogy in general, has undisclosed, proprietary algorithms we review the current knowledge where information has been discussed and published more openly.
Affiliation(s)
- Daniel Kling
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway.
| | - Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain.
| | - Debbie Kennett
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, United Kingdom
| | - Andreas Tillmar
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden
|
15
|
Edge MD, Coop G. Attacks on genetic privacy via uploads to genealogical databases. eLife 2020; 9:e51810. [PMID: 31908268 PMCID: PMC6992384 DOI: 10.7554/elife.51810] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/12/2019] [Accepted: 12/23/2019] [Indexed: 02/06/2023] Open
Abstract
Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
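To make concrete what detecting IBS segments from unphased genotypes can look like, the sketch below scans two genotype vectors and reports runs of consecutive sites where the two individuals could share at least one allele, that is, where they are never opposite homozygotes. It is an assumption-laden illustration rather than the matching rule of any particular database: the 0/1/2 allele-count coding, the function name `ibs_runs`, and the `min_snps` threshold are all hypothetical.

```python
def ibs_runs(g1, g2, min_snps):
    """Return (start, end) index ranges, end exclusive, of candidate IBS runs.

    g1, g2: unphased genotypes coded as alternate-allele counts (0, 1 or 2).
    A site breaks a run only when the genotypes are opposite homozygotes
    (0 versus 2), since then no allele can be shared.
    """
    runs = []
    start = None
    n = min(len(g1), len(g2))
    for i in range(n):
        compatible = abs(g1[i] - g2[i]) != 2
        if compatible and start is None:
            start = i                      # open a new run
        elif not compatible and start is not None:
            if i - start >= min_snps:      # close the run if long enough
                runs.append((start, i))
            start = None
    # Close a run that extends to the last site.
    if start is not None and n - start >= min_snps:
        runs.append((start, n))
    return runs


g1 = [0, 1, 2, 2, 0, 1, 1, 2]
g2 = [0, 2, 1, 0, 0, 0, 1, 2]
print(ibs_runs(g1, g2, min_snps=3))
```

Because a heterozygous genotype is compatible with anything under this rule, an uploaded profile that is heterozygous everywhere would match every database entry along its whole length, which gives some intuition for why unphased matching is more exposed to falsified uploads.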
Affiliation(s)
- Michael D Edge
- Center for Population Biology, University of California, Davis, Davis, United States
- Department of Evolution and Ecology, University of California, Davis, Davis, United States
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, United States
| | - Graham Coop
- Center for Population Biology, University of California, Davis, Davis, United States
- Department of Evolution and Ecology, University of California, Davis, Davis, United States
|
16
|
Granot-Hershkovitz E, Karasik D, Friedlander Y, Rodriguez-Murillo L, Dorajoo R, Liu J, Sewda A, Peter I, Carmi S, Hochner H. A study of Kibbutzim in Israel reveals risk factors for cardiometabolic traits and subtle population structure. Eur J Hum Genet 2018; 26:1848-1858. [PMID: 30108283 PMCID: PMC6244281 DOI: 10.1038/s41431-018-0230-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/15/2018] [Revised: 06/24/2018] [Accepted: 07/17/2018] [Indexed: 11/09/2022] Open
Abstract
Genetic studies in isolated populations often increase power for identifying loci associated with complex diseases and traits. We present here the Kibbutzim Family Study (KFS), aimed at investigating the genetic basis of cardiometabolic traits in extended Israeli families characterized by long-term social stability and a homogeneous environment. Extensive information on cardiometabolic traits, as well as genome-wide genotypes, was collected on 901 individuals. We observed that most KFS participants were of Ashkenazi Jewish (AJ) genetic origin, confirmed a recent severe bottleneck in AJ history, and detected a subtle within-AJ population structure. Focusing on genetic variants relatively common in the KFS but very rare in Europeans, we observed that AJ-enriched variants appear in cancer-related pathways more than expected by chance. We conducted an association study of the AJ-enriched variants against 16 cardiometabolic traits, and found seven loci (24 variants) to be significantly associated. The strongest association, which we also replicated in an independent study, was between a variant upstream of MSRA (frequency ≈1% in the KFS and nearly absent in Europeans) and weight (P = 3.6 × 10⁻⁸). In conclusion, the KFS is a valuable resource for the study of the population genetics of Israel as well as the genetics of cardiometabolic traits.
Affiliation(s)
| | - David Karasik
- Faculty of Medicine in the Galilee, Bar-Ilan University, Safed, Israel
| | - Yechiel Friedlander
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
| | - Laura Rodriguez-Murillo
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Rajkumar Dorajoo
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | - Jianjun Liu
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Anshuman Sewda
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Inga Peter
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Shai Carmi
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel.
| | - Hagit Hochner
- Braun School of Public Health, Hebrew University-Hadassah Medical Center, Jerusalem, Israel.
|
17
|
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Affiliation(s)
- Annabel C. Beichman
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
| | - Emilia Huerta-Sanchez
- Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
- Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
|
18
|
Dutta R, Saha-Mandal A, Cheng X, Qiu S, Serpen J, Fedorova L, Fedorov A. 1000 human genomes carry widespread signatures of GC biased gene conversion. BMC Genomics 2018; 19:256. [PMID: 29661137 PMCID: PMC5902838 DOI: 10.1186/s12864-018-4593-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/28/2017] [Accepted: 03/12/2018] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND GC-Biased Gene Conversion (gBGC) is one of the important theories put forward to explain profound long-range non-randomness in nucleotide compositions along mammalian chromosomes. Nucleotide changes due to gBGC are hard to distinguish from regular mutations. Here, we present an algorithm for analysis of millions of known SNPs that detects a subset of so-called "SNP flip-over" events representing recent gBGC nucleotide changes, which occurred in previous generations via non-crossover meiotic recombination. RESULTS This algorithm has been applied in a large-scale analysis of 1092 sequenced human genomes. Altogether, 56,328 regions on all autosomes have been examined, which revealed 223,955 putative gBGC cases leading to SNP flip-overs. We detected a strong bias (11.7% ± 0.2% excess) in AT→GC over GC→AT base pair changes within the entire set of putative gBGC cases. CONCLUSIONS On average, a human gamete acquires 7 SNP flip-over events, in which one allele is replaced by its complementary allele during the process of meiotic non-crossover recombination. In each meiosis event, on average, gBGC results in replacement of 7 AT base pairs by GC base pairs, while only 6 GC pairs are replaced by AT pairs. Therefore, every human gamete is enriched by one GC pair. Happening over millions of years of evolution, this bias may be a noticeable force in changing the nucleotide composition landscape along chromosomes.
Affiliation(s)
- Rajib Dutta
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Present Address: Center for Cardiovascular and Pulmonary Research, Nationwide Children’s Hospital, 700 Children’s Dr, Columbus, OH USA
| | - Arnab Saha-Mandal
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Present Address: Biochemistry and Molecular Biology Graduate Program, Cumming School of Medicine, University of Calgary, Calgary, AB T2N4N1 Canada
| | - Xi Cheng
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| | - Shuhao Qiu
- Program in Biomedical Sciences, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
| | - Jasmine Serpen
- SURF Program, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- College of Arts and Sciences, Washington University in St. Louis, 1 Brookings Dr, St. Louis, MO 63130 USA
| | | | - Alexei Fedorov
- Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614 USA
|
19
|
Ramstetter MD, Dyer TD, Lehman DM, Curran JE, Duggirala R, Blangero J, Mezey JG, Williams AL. Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives. Genetics 2017; 207:75-82. [PMID: 28739658 PMCID: PMC5586387 DOI: 10.1534/genetics.117.1122] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/04/2017] [Accepted: 07/08/2017] [Indexed: 01/03/2023] Open
Abstract
Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
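The notion of a "relatedness degree" used above can be illustrated with the textbook relationship that the expected coefficient of relatedness halves with each degree (0.5 for first degree, 0.25 for second, and so on). The sketch below simply inverts that relationship; it is not any of the benchmarked methods, and the function name `relatedness_degree` and the use of a single genome-wide sharing fraction as input are assumptions for illustration.

```python
import math


def relatedness_degree(relatedness):
    """Estimate the degree of relatedness from a coefficient of relatedness.

    Assumes the expected coefficient for degree d is 2 ** -d, so the degree
    is the nearest integer to -log2(relatedness).
    """
    if not 0 < relatedness <= 1:
        raise ValueError("relatedness must be in (0, 1]")
    return round(-math.log2(relatedness))


for r in (0.5, 0.25, 0.0078125):
    print(r, relatedness_degree(r))
```

For a seventh-degree pair the expected value is already below 1%, and the real amount of sharing varies widely around it, which is consistent with the sharp drop in accuracy the abstract reports for distant relationships.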
Affiliation(s)
- Monica D Ramstetter
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
| | - Thomas D Dyer
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, Texas 78520
| | - Donna M Lehman
- Department of Medicine, University of Texas Health San Antonio, San Antonio, Texas 78229
| | - Joanne E Curran
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, Texas 78520
| | - Ravindranath Duggirala
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, Texas 78520
| | - John Blangero
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, Texas 78520
| | - Jason G Mezey
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
- Department of Genetic Medicine, Weill Cornell Medicine, New York, New York 10065
| | - Amy L Williams
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
|
20
|
Nakatsuka N, Moorjani P, Rai N, Sarkar B, Tandon A, Patterson N, Bhavani GS, Girisha KM, Mustak MS, Srinivasan S, Kaushik A, Vahab SA, Jagadeesh SM, Satyamoorthy K, Singh L, Reich D, Thangaraj K. The promise of discovering population-specific disease-associated genes in South Asia. Nat Genet 2017; 49:1403-1407. [PMID: 28714977 PMCID: PMC5675555 DOI: 10.1038/ng.3917] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 02/24/2016] [Accepted: 06/21/2017] [Indexed: 12/21/2022]
Abstract
The more than 1.5 billion people who live in South Asia are correctly viewed not as a single large population but as many small endogamous groups. We assembled genome-wide data from over 2,800 individuals from over 260 distinct South Asian groups. We identified 81 unique groups, 14 of which had estimated census sizes of more than 1 million, that descend from founder events more extreme than those in Ashkenazi Jews and Finns, both of which have high rates of recessive disease due to founder events. We identified multiple examples of recessive diseases in South Asia that are the result of such founder events. This study highlights an underappreciated opportunity for decreasing disease burden among South Asians through discovery of and testing for recessive disease-associated genes.
Affiliation(s)
- Nathan Nakatsuka
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA
| | - Priya Moorjani
- Department of Biological Sciences, Columbia University, New York, New York, USA
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Niraj Rai
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
| | | | - Arti Tandon
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Nick Patterson
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | | | - Katta Mohan Girisha
- Department of Medical Genetics, Kasturba Medical College, Manipal University, Manipal, India
| | - Mohammed S Mustak
- Department of Applied Zoology, Mangalore University, Mangalore, India
| | | | - Amit Kaushik
- Amity Institute of Biotechnology, Amity University, Noida, India
| | | | | | | | | | - David Reich
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, USA
|
21
|
Bjelland DW, Lingala U, Patel PS, Jones M, Keller MC. A fast and accurate method for detection of IBD shared haplotypes in genome-wide SNP data. Eur J Hum Genet 2017; 25:617-624. [PMID: 28176766 PMCID: PMC5437913 DOI: 10.1038/ejhg.2017.6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/05/2016] [Revised: 11/22/2016] [Accepted: 12/24/2016] [Indexed: 11/08/2022] Open
Abstract
Identical by descent (IBD) segments are used to understand a number of fundamental issues in genetics. IBD segments are typically detected using long stretches of identical alleles between haplotypes in phased, whole-genome SNP data. Phase or SNP call errors in genomic data can degrade accuracy of IBD detection and lead to false-positive/negative calls and to under/overextension of true IBD segments. Furthermore, the number of comparisons increases quadratically with sample size, requiring high computational efficiency. We developed a new IBD segment detection program, FISHR (Find IBD Shared Haplotypes Rapidly), in an attempt to accurately detect IBD segments and to better estimate their endpoints using an algorithm that is fast enough to be deployed on very large whole-genome SNP data sets. We compared the performance of FISHR to three leading IBD segment detection programs: GERMLINE, refined IBD, and HaploScore. Using simulated and real genomic sequence data, we show that FISHR is slightly more accurate than all programs at detecting long (>3 cM) IBD segments but slightly less accurate than refined IBD at detecting short (~1 cM) IBD segments. More centrally, FISHR outperforms all programs in determining the true endpoints of IBD segments, which is crucial for several applications of IBD information. FISHR takes two to three times longer than GERMLINE to run, whereas both GERMLINE and FISHR were orders of magnitude faster than refined IBD and HaploScore. Overall, FISHR provides accurate IBD detection in unrelated individuals and is computationally efficient enough to be utilized on large SNP data sets of >60 000 individuals.
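The quadratic growth in comparisons mentioned above follows from the number of unordered pairs among n samples, n(n - 1)/2. The short sketch below works this out for the sample size quoted in the abstract; the function name `pairwise_comparisons` is an assumption for illustration.

```python
def pairwise_comparisons(n):
    """Number of unordered pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


small = pairwise_comparisons(60_000)    # sample size quoted in the abstract
large = pairwise_comparisons(600_000)   # ten times more samples
print(small, large, round(large / small, 1))
```

Ten times more samples means roughly a hundred times more pairs, which is why per-pair cost dominates the feasibility of IBD detection in very large cohorts.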
Affiliation(s)
- Douglas W Bjelland
- Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
| | - Uday Lingala
- Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
| | - Piyush S Patel
- Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
| | - Matt Jones
- Department of Psychology & Neuroscience, University of Colorado at Boulder, Boulder, CO, USA
| | - Matthew C Keller
- Institute for Behavioral Genetics, University of Colorado at Boulder, Boulder, CO, USA
- Department of Psychology & Neuroscience, University of Colorado at Boulder, Boulder, CO, USA
|
22
|
Xue J, Lencz T, Darvasi A, Pe’er I, Carmi S. The time and place of European admixture in Ashkenazi Jewish history. PLoS Genet 2017; 13:e1006644. [PMID: 28376121 PMCID: PMC5380316 DOI: 10.1371/journal.pgen.1006644] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/25/2016] [Accepted: 02/18/2017] [Indexed: 12/21/2022] Open
Abstract
The Ashkenazi Jewish (AJ) population is important in genetics due to its high rate of Mendelian disorders. AJ appeared in Europe in the 10th century, and their ancestry is thought to comprise European (EU) and Middle-Eastern (ME) components. However, both the time and place of admixture are subject to debate. Here, we attempt to characterize the AJ admixture history using a careful application of new and existing methods on a large AJ sample. Our main approach was based on local ancestry inference, in which we first classified each AJ genomic segment as EU or ME, and then compared allele frequencies along the EU segments to those of different EU populations. The contribution of each EU source was also estimated using GLOBETROTTER and haplotype sharing. The time of admixture was inferred based on multiple statistics, including ME segment lengths, the total EU ancestry per chromosome, and the correlation of ancestries along the chromosome. The major source of EU ancestry in AJ was found to be Southern Europe (≈60–80% of EU ancestry), with the rest being likely Eastern European. The inferred admixture time was ≈30 generations ago, but multiple lines of evidence suggest that it represents an average over two or more events, pre- and post-dating the founder event experienced by AJ in late medieval times. The time of the pre-bottleneck admixture event, which was likely Southern European, was estimated to ≈25–50 generations ago. The Ashkenazi Jewish population has resided in Europe for much of its 1000-year existence. However, its ethnic and geographic origins are controversial, due to the scarcity of reliable historical records. Previous genetic studies have found links to Middle-Eastern and European ancestries, but the admixture history has not been studied in detail yet, partly due to technical difficulties in disentangling signals from multiple admixture events. 
Here, we present an in-depth analysis of the sources of European gene flow and the time of admixture events by using multiple new and existing methods and extensive simulations. Our results suggest a model of at least two events of European admixture. One event slightly pre-dated a late medieval founder event and was likely from a Southern European source. Another event post-dated the founder event and likely occurred in Eastern Europe. These results, as well as the methods introduced, will be highly valuable for geneticists and other researchers interested in Ashkenazi Jewish origins.
Affiliation(s)
- James Xue
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Todd Lencz
- Center for Psychiatric Neuroscience, The Feinstein Institute for Medical Research, North Shore-Long Island Jewish Health System, Manhasset, New York, United States of America
- Department of Psychiatry, Division of Research, The Zucker Hillside Hospital Division of the North Shore–Long Island Jewish Health System, Glen Oaks, New York, United States of America
- Departments of Psychiatry and Molecular Medicine, Hofstra Northwell School of Medicine, Hempstead, New York, United States of America
| | - Ariel Darvasi
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Itsik Pe’er
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
| | - Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
|
23
|
Han E, Carbonetto P, Curtis RE, Wang Y, Granka JM, Byrnes J, Noto K, Kermany AR, Myres NM, Barber MJ, Rand KA, Song S, Roman T, Battat E, Elyashiv E, Guturu H, Hong EL, Chahine KG, Ball CA. Clustering of 770,000 genomes reveals post-colonial population structure of North America. Nat Commun 2017; 8:14238. [PMID: 28169989 PMCID: PMC5309710 DOI: 10.1038/ncomms14238] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 02/24/2016] [Accepted: 12/12/2016] [Indexed: 02/06/2023] Open
Abstract
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.
Affiliation(s)
- Eunjung Han
- AncestryDNA, San Francisco, California 94107, USA
| | | | | | - Yong Wang
- AncestryDNA, San Francisco, California 94107, USA
| | | | - Jake Byrnes
- AncestryDNA, San Francisco, California 94107, USA
| | - Keith Noto
- AncestryDNA, San Francisco, California 94107, USA
| | | | | | | | | | - Shiya Song
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Theodore Roman
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Erin Battat
- W.E.B. Du Bois Research Institute, Hutchins Center for African and African American Research, Harvard University, Cambridge, Massachusetts 02138, USA
|
24
|
Gao F, Keinan A. Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 2016; 41:130-139. [PMID: 27710906 PMCID: PMC5161661 DOI: 10.1016/j.gde.2016.09.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/01/2016] [Revised: 08/26/2016] [Accepted: 09/11/2016] [Indexed: 11/19/2022]
Abstract
The advent of next-generation sequencing technology has allowed the collection of vast amounts of genetic variation data. A recurring discovery from studying larger and larger samples of individuals had been the extreme, previously unexpected, excess of very rare genetic variants, which has been shown to be mostly due to the recent explosive growth of human populations. Here, we review recent literature that inferred recent changes in population size in different human populations and with different methodologies, with many pointing to recent explosive growth, especially in European populations for which more data has been available. We also review the state-of-the-art methods and software for the inference of historical population size changes that lead to these discoveries. Finally, we discuss the implications of recent population growth on personalized genomics, on purifying selection in the non-equilibrium state it entails and, as a consequence, on the genetic architecture underlying complex disease and the performance of mapping methods in discovering rare variants that contribute to complex disease risk.
Affiliation(s)
- Feng Gao
- Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States
| | - Alon Keinan
- Department of Biological Statistics and Computational Biology, Ithaca, NY 14850, United States.
|
25
|
Minikel EV, Vallabh SM, Lek M, Estrada K, Samocha KE, Sathirapongsasuti JF, McLean CY, Tung JY, Yu LPC, Gambetti P, Blevins J, Zhang S, Cohen Y, Chen W, Yamada M, Hamaguchi T, Sanjo N, Mizusawa H, Nakamura Y, Kitamoto T, Collins SJ, Boyd A, Will RG, Knight R, Ponto C, Zerr I, Kraus TFJ, Eigenbrod S, Giese A, Calero M, de Pedro-Cuesta J, Haïk S, Laplanche JL, Bouaziz-Amar E, Brandel JP, Capellari S, Parchi P, Poleggi A, Ladogana A, O'Donnell-Luria AH, Karczewski KJ, Marshall JL, Boehnke M, Laakso M, Mohlke KL, Kähler A, Chambert K, McCarroll S, Sullivan PF, Hultman CM, Purcell SM, Sklar P, van der Lee SJ, Rozemuller A, Jansen C, Hofman A, Kraaij R, van Rooij JGJ, Ikram MA, Uitterlinden AG, van Duijn CM, Daly MJ, MacArthur DG. Quantifying prion disease penetrance using large population control cohorts. Sci Transl Med 2016; 8:322ra9. [PMID: 26791950 DOI: 10.1126/scitranslmed.aad5169] [Citation(s) in RCA: 240] [Impact Index Per Article: 26.7] [Indexed: 12/14/2022]
Abstract
More than 100,000 genetic variants are reported to cause Mendelian disease in humans, but the penetrance (the probability that a carrier of the purported disease-causing genotype will indeed develop the disease) is generally unknown. We assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30 times more common in the population than expected on the basis of genetic prion disease prevalence. Although some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1 to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, a finding that supports the safety of therapeutic suppression of prion protein expression.
Affiliation(s)
- Eric Vallabh Minikel
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA. Prion Alliance, Cambridge, MA 02139, USA.
| | - Sonia M Vallabh
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA. Prion Alliance, Cambridge, MA 02139, USA
| | - Monkol Lek
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Karol Estrada
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Kaitlin E Samocha
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | | | - Cory Y McLean
- Research, 23andMe Inc., Mountain View, CA 94041, USA
| | - Joyce Y Tung
- Research, 23andMe Inc., Mountain View, CA 94041, USA
| | - Linda P C Yu
- Research, 23andMe Inc., Mountain View, CA 94041, USA
| | - Pierluigi Gambetti
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Janis Blevins
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Shulin Zhang
- University Hospitals Case Medical Center, Cleveland, OH 44106, USA
| | - Yvonne Cohen
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Wei Chen
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Masahito Yamada
- Department of Neurology and Neurobiology of Aging, Kanazawa University Graduate School of Medical Sciences, Kanazawa 920-8640, Japan
| | - Tsuyoshi Hamaguchi
- Department of Neurology and Neurobiology of Aging, Kanazawa University Graduate School of Medical Sciences, Kanazawa 920-8640, Japan
| | - Nobuo Sanjo
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University, Tokyo 113-8519, Japan
| | - Hidehiro Mizusawa
- National Center Hospital, National Center of Neurology and Psychiatry, Tokyo 187-8551, Japan
| | - Yosikazu Nakamura
- Department of Public Health, Jichi Medical University, Shimotsuke 329-0498, Japan
| | - Tetsuyuki Kitamoto
- Department of Neurological Science, Tohoku University Graduate School of Medicine, Sendai 980-8575, Japan
| | - Steven J Collins
- Australian National Creutzfeldt-Jakob Disease Registry, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Alison Boyd
- Australian National Creutzfeldt-Jakob Disease Registry, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Robert G Will
- National Creutzfeldt-Jakob Disease Research & Surveillance Unit, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Richard Knight
- National Creutzfeldt-Jakob Disease Research & Surveillance Unit, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Claudia Ponto
- National Reference Center for the Surveillance of Human Transmissible Spongiform Encephalopathies, Georg-August-University, Goettingen 37073, Germany
| | - Inga Zerr
- National Reference Center for the Surveillance of Human Transmissible Spongiform Encephalopathies, Georg-August-University, Goettingen 37073, Germany
| | - Theo F J Kraus
- Center for Neuropathology and Prion Research (ZNP), Ludwig-Maximilians-University, Munich 81377, Germany
| | - Sabina Eigenbrod
- Center for Neuropathology and Prion Research (ZNP), Ludwig-Maximilians-University, Munich 81377, Germany
| | - Armin Giese
- Center for Neuropathology and Prion Research (ZNP), Ludwig-Maximilians-University, Munich 81377, Germany
| | - Miguel Calero
- Centro de Investigación Biomédica en Red de Enfermedades Neurodegenerativas, Instituto de Salud Carlos III, Madrid 28031, Spain
| | - Jesús de Pedro-Cuesta
- Centro de Investigación Biomédica en Red de Enfermedades Neurodegenerativas, Instituto de Salud Carlos III, Madrid 28031, Spain
| | - Stéphane Haïk
- INSERM U 1127, CNRS UMR 7225, Sorbonne Universités, Pierre and Marie Curie University Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle Epinière, 75013 Paris, France. Assistance Publique-Hôpitaux de Paris (AP-HP), Cellule Nationale de Référence des Maladies de Creutzfeldt-Jakob, Groupe Hospitalier Pitié-Salpêtrière, F-75013 Paris, France
| | - Jean-Louis Laplanche
- AP-HP, Service de Biochimie et Biologie Moléculaire, Hôpital Lariboisière, 75010 Paris, France
| | - Elodie Bouaziz-Amar
- AP-HP, Service de Biochimie et Biologie Moléculaire, Hôpital Lariboisière, 75010 Paris, France
| | - Jean-Philippe Brandel
- INSERM U 1127, CNRS UMR 7225, Sorbonne Universités, Pierre and Marie Curie University Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle Epinière, 75013 Paris, France. Assistance Publique-Hôpitaux de Paris (AP-HP), Cellule Nationale de Référence des Maladies de Creutzfeldt-Jakob, Groupe Hospitalier Pitié-Salpêtrière, F-75013 Paris, France
| | - Sabina Capellari
- Istituto di Ricovero e Cura a Carattere Scientifico, Institute of Neurological Sciences, Bologna 40123, Italy. Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna 40126, Italy
| | - Piero Parchi
- Istituto di Ricovero e Cura a Carattere Scientifico, Institute of Neurological Sciences, Bologna 40123, Italy. Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna 40126, Italy
| | - Anna Poleggi
- Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome 00161, Italy
| | - Anna Ladogana
- Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome 00161, Italy
| | - Anne H O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
| | - Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jamie L Marshall
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Markku Laakso
- Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio 70210, Finland
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599, USA
| | - Anna Kähler
- Karolinska Institutet, Stockholm SE-171 77, Sweden
| | - Kimberly Chambert
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Steven McCarroll
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Patrick F Sullivan
- Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599, USA. Karolinska Institutet, Stockholm SE-171 77, Sweden
| | | | - Shaun M Purcell
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Pamela Sklar
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sven J van der Lee
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | - Annemieke Rozemuller
- Dutch Surveillance Centre for Prion Diseases, Department of Pathology, University Medical Center, Utrecht 3584 CX, Netherlands
| | - Casper Jansen
- Dutch Surveillance Centre for Prion Diseases, Department of Pathology, University Medical Center, Utrecht 3584 CX, Netherlands
| | - Albert Hofman
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | - Robert Kraaij
- Department of Internal Medicine, Erasmus MC, Rotterdam 3000 CA, Netherlands
| | | | - M Arfan Ikram
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | - André G Uitterlinden
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands. Department of Internal Medicine, Erasmus MC, Rotterdam 3000 CA, Netherlands
| | - Cornelia M van Duijn
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | | | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
|
26
|
Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, Bustamante CD, Kenny EE, Williams SM, Aldrich MC, Gravel S. The Great Migration and African-American Genomic Diversity. PLoS Genet 2016; 12:e1006059. [PMID: 27232753 PMCID: PMC4883799 DOI: 10.1371/journal.pgen.1006059] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 12/23/2015] [Accepted: 04/26/2016] [Indexed: 12/23/2022] Open
Abstract
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus track north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ∼15–16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.

Genetic studies of African-Americans identify functional variants, elucidate historical and genealogical mysteries, and reveal basic biology. However, African-Americans have been under-represented in genetic studies, and relatively little is known about nation-wide patterns of genomic diversity in the population. Here, we study African-American genomic diversity using genotype data from nationally and regionally representative cohorts. Access to these unique cohorts allows us to clarify the role of population structure, admixture, and recent massive migrations in shaping African-American genomic diversity and sheds new light on the genetic history of this population.
Affiliation(s)
- Soheil Baharian
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
| | - Maxime Barakatt
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- School of Computer Science, McGill University, Montreal, Quebec, Canada
| | - Christopher R. Gignoux
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Suyash Shringarpure
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Jacob Errington
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
| | - William J. Blot
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- International Epidemiology Institute, Rockville, Maryland, United States of America
| | - Carlos D. Bustamante
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Eimear E. Kenny
- Department of Genetics and Genomic Sciences, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Charles Bronfman Institute for Personalized Medicine, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Icahn Institute for Genomics and Multiscale Biology, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Center for Statistical Genetics, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Scott M. Williams
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
| | - Melinda C. Aldrich
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
| | - Simon Gravel
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
|
27
|
Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution. G3-GENES GENOMES GENETICS 2016; 6:1287-96. [PMID: 26935417 PMCID: PMC4856080 DOI: 10.1534/g3.116.027581] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Indexed: 12/18/2022]
Abstract
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that significant proportions of inferred segments 1–2 cM long are results of conflations of two or more shorter segments, each at least 0.2 cM or longer, under demographic scenarios typical for modern humans for all programs tested. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.
|
28
|
Fedorova L, Qiu S, Dutta R, Fedorov A. Atlas of Cryptic Genetic Relatedness Among 1000 Human Genomes. Genome Biol Evol 2016; 8:777-90. [PMID: 26907499 PMCID: PMC4824066 DOI: 10.1093/gbe/evw034] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022] Open
Abstract
A novel computational method for detecting identical-by-descent (IBD) chromosomal segments between sequenced genomes is presented. It utilizes the distribution patterns of very rare genetic variants (vrGVs), which have minor allele frequencies <0.2%. Contrary to the existing probabilistic approaches our method is rather deterministic, because it considers a group of very rare events which cannot happen together only by chance. This method has been applied for exhaustive computational search of shared IBD segments among 1,092 sequenced individuals from 14 populations. It demonstrated that clusters of vrGVs are unique and powerful markers of genetic relatedness, that uncover IBD chromosomal segments between and within populations, irrespective of whether divergence was recent or occurred hundreds-to-thousands of years ago. We found that several IBD segments are shared by practically any possible pair of individuals belonging to the same population. Moreover, shared short IBD segments (median size 183 kb) were found in 10% of inter-continental human pairs, each comprising of a person from sub-Saharan Africa and a person from Southern Europe. The shortest shared IBD segments (median size 54 kb) were found in 0.42% of inter-continental pairs composed of individuals from Chinese/Japanese populations and Africans from Kenya and Nigeria. Knowledge of inheritance of IBD segments is important in clinical case–control and cohort studies, since unknown distant familial relationships could compromise interpretation of collected data. Clusters of vrGVs should be useful markers for familial relationship and common multifactorial disorders.
Affiliation(s)
| | - Shuhao Qiu
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo Department of Medicine, University of Toledo
| | - Rajib Dutta
- Program in Biomedical Sciences, University of Toledo
| | - Alexei Fedorov
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo Department of Medicine, University of Toledo
|
29
|
Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet 2015; 96:37-53. [PMID: 25529636 PMCID: PMC4289685 DOI: 10.1016/j.ajhg.2014.11.010] [Citation(s) in RCA: 435] [Impact Index Per Article: 43.5] [Received: 09/17/2014] [Accepted: 11/17/2014] [Indexed: 12/11/2022] Open
Abstract
Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans (brought largely by the trans-Atlantic slave trade), shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.
Affiliation(s)
- Katarzyna Bryc
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; 23andMe, Inc., Mountain View, CA 94043, USA.
| | | | | | - David Reich
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
30
|
Al-Khudhair A, Qiu S, Wyse M, Chowdhury S, Cheng X, Bekbolsynov D, Saha-Mandal A, Dutta R, Fedorova L, Fedorov A. Inference of distant genetic relations in humans using "1000 genomes". Genome Biol Evol 2015; 7:481-92. [PMID: 25573959 PMCID: PMC4350174 DOI: 10.1093/gbe/evv003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/06/2023] Open
Abstract
Nucleotide sequence differences on the whole-genome scale have been computed for 1,092 people from 14 populations publicly available by the 1000 Genomes Project. Total number of differences in genetic variants between 96,464 human pairs has been calculated. The distributions of these differences for individuals within European, Asian, or African origin were characterized by narrow unimodal peaks with mean values of 3.8, 3.5, and 5.1 million, respectively, and standard deviations of 0.1–0.03 million. The total numbers of genomic differences between pairs of all known relatives were found to be significantly lower than their respective population means and in reverse proportion to the distance of their consanguinity. By counting the total number of genomic differences it is possible to infer familial relations for people that share down to 6% of common loci identical-by-descent. Detection of familial relations can be radically improved when only very rare genetic variants are taken into account. Counting of total number of shared very rare single nucleotide polymorphisms (SNPs) from whole-genome sequences allows establishing distant familial relations for persons with eighth and ninth degrees of relationship. Using this analysis we predicted 271 distant familial pairwise relations among 1,092 individuals that have not been declared by 1000 Genomes Project. Particularly, among 89 British and 97 Chinese individuals we found three British–Chinese pairs with distant genetic relationships. Individuals from these pairs share identical-by-descent DNA fragments that represent 0.001%, 0.004%, and 0.01% of their genomes. With affordable whole-genome sequencing techniques, very rare SNPs should become important genetic markers for familial relationships and population stratification.
Affiliation(s)
- Ahmed Al-Khudhair
  - Program in Bioinformatics and Proteomics/Genomics, University of Toledo
- Shuhao Qiu
  - Program in Biomedical Sciences, University of Toledo; Department of Medicine, University of Toledo
- Meghan Wyse
  - Program in Biomedical Sciences, University of Toledo
- Xi Cheng
  - Program in Biomedical Sciences, University of Toledo
- Arnab Saha-Mandal
  - Program in Bioinformatics and Proteomics/Genomics, University of Toledo
- Rajib Dutta
  - Program in Biomedical Sciences, University of Toledo; Department of Medicine, University of Toledo
- Alexei Fedorov
  - Program in Bioinformatics and Proteomics/Genomics, University of Toledo; Department of Medicine, University of Toledo
|
31
|
Abstract
Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.
|
32
|
Zidan J, Ben-Avraham D, Carmi S, Maray T, Friedman E, Atzmon G. Genotyping of geographically diverse Druze trios reveals substructure and a recent bottleneck. Eur J Hum Genet 2014; 23:1093-9. [PMID: 25370042 DOI: 10.1038/ejhg.2014.218] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/24/2014] [Revised: 09/02/2014] [Accepted: 09/19/2014] [Indexed: 11/09/2022] Open
Abstract
Druze individuals rarely marry outside their faith (often practicing consanguinity) and are thus believed to form a genetic isolate. To comprehensively characterize the genetic structure of the Druze population, we recruited and genotyped 40 parent-offspring trios from the Upper Galilee in Israel and the Golan Heights, attempting to capture different extended families (clans) across various geographical locations. Principal component (PC) and ADMIXTURE analyses demonstrated that Druze are close to, yet distinct from, other Middle-Eastern groups (Bedouins and Palestinians), supporting the Druze's Middle-Eastern origin and their recent genetic isolation. Reconstruction of the Druze demographic history using identical-by-descent (IBD) segments suggested an ≈15-fold reduction in population size taking place ≈22-47 generations ago, close to the documented time of the foundation of the Druze faith in the 11th century. Combining the Galilee and Golan Druze genotypes with previously published data on Druze from the Carmel (Israel) and Lebanon demonstrated that all four Druze communities are genetically distinct. The Lebanese group shared fewer IBD segments (within the group and with other groups) compared with the Israeli Druze and showed higher heterozygosity (suggesting less consanguinity), but was less diverse in PC space. These findings suggest a complex recent and ancient demographic history of the Druze population.
Affiliation(s)
- Jamal Zidan
  - The Oncology Department, Ziv Medical Center, The Faculty of Medicine in the Galilee, Bar-Ilan University, Zefat, Israel
- Dan Ben-Avraham
  - Department of Medicine and Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Shai Carmi
  - Department of Computer Science, Columbia University, New York, NY, USA
- Taiseer Maray
  - Golan for Development, Majdal Shams, The Golan Heights, Israel
- Eitan Friedman
  - [1] The Susanne Levy Gertner Oncogenetics Unit, The Danek Gertner Institute of Human Genetics, Chaim Sheba Medical Center, Tel-Hashomer, Israel [2] The Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Gil Atzmon
  - Department of Medicine and Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
|