1
Runs of homozygosity in sub-Saharan African populations provide insights into complex demographic histories. Hum Genet 2019; 138:1123-1142. [DOI: 10.1007/s00439-019-02045-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/08/2019] [Accepted: 07/03/2019] [Indexed: 12/20/2022]
2
Chiang CWK, Mangul S, Robles C, Sankararaman S. A Comprehensive Map of Genetic Variation in the World's Largest Ethnic Group-Han Chinese. Mol Biol Evol 2018; 35:2736-2750. [PMID: 30169787 PMCID: PMC6693441 DOI: 10.1093/molbev/msy170] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Indexed: 12/13/2022]
Abstract
As are most non-European populations, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our data set. Individuals from this data set came from 24 out of 33 administrative divisions across China (including 19 provinces, 4 municipalities, and 1 autonomous region), thus allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identified previously unrecognized population structure along the East-West axis of China, demonstrated a general pattern of isolation-by-distance among Han Chinese, and reported unique regional signals of admixture, such as European influences among the Northwestern provinces of China. Furthermore, we identified a number of highly differentiated, putatively adaptive, loci (e.g., MTHFR, ADH7, and FADS, among others) that may be driven by immune response, climate, and diet in the Han Chinese. Finally, we have made available allele frequency estimates stratified by administrative divisions across China in the Geography of Genetic Variant browser for the broader community. By leveraging the largest currently available genetic data set for Han Chinese, we have gained insights into the history and population structure of the world's largest ethnic group.
Affiliation(s)
- Charleston W K Chiang
  - Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA
- Serghei Mangul
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA
  - Institute for Quantitative and Computational Bioscience, University of California Los Angeles, Los Angeles, CA
- Christopher Robles
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Sriram Sankararaman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
3
Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet 2018; 19:220-234. [PMID: 29335644 DOI: 10.1038/nrg.2017.109] [Citation(s) in RCA: 380] [Impact Index Per Article: 63.3] [Indexed: 12/14/2022]
Abstract
Long runs of homozygosity (ROH) arise when identical haplotypes are inherited from each parent and thus a long tract of genotypes is homozygous. Cousin marriage or inbreeding gives rise to such autozygosity; however, genome-wide data reveal that ROH are universally common in human genomes even among outbred individuals. The number and length of ROH reflect individual demographic history, while the homozygosity burden can be used to investigate the genetic architecture of complex disease. We discuss how to identify ROH in genome-wide microarray and sequence data, their distribution in human populations and their application to the understanding of inbreeding depression and disease risk.
Affiliation(s)
- Francisco C Ceballos
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Parktown 2193, Johannesburg, South Africa
  - Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
- Peter K Joshi
  - Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh EH8 9AG, UK
- David W Clark
  - Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh EH8 9AG, UK
- Michèle Ramsay
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Parktown 2193, Johannesburg, South Africa
  - Division of Human Genetics, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Braamfontein 2000, Johannesburg, South Africa
- James F Wilson
  - Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
  - Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh EH8 9AG, UK
4
Metzger BPH, Gelembiuk GW, Lee CE. Direct sequencing of haplotypes from diploid individuals through a modified emulsion PCR-based single-molecule sequencing approach. Mol Ecol Resour 2013; 13:135-43. [PMID: 23231626 DOI: 10.1111/1755-0998.12034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2012] [Revised: 10/08/2012] [Accepted: 10/11/2012] [Indexed: 11/30/2022]
Abstract
While standard DNA-sequencing approaches readily yield genotypic sequence data, haplotype information is often of greater utility for population genetic analyses. However, obtaining individual haplotype sequences can be costly and time-consuming and sometimes requires statistical reconstruction approaches that are subject to bias and error. Advancements have recently been made in determining individual chromosomal sequences in large-scale genomic studies, yet few options exist for obtaining this information from large numbers of highly polymorphic individuals in a cost-effective manner. As a solution, we developed a simple PCR-based method for obtaining sequence information from individual DNA strands using standard laboratory equipment. The method employs a water-in-oil emulsion to separate the PCR mixture into thousands of individual microreactors. PCR within these small vesicles results in amplification from only a single starting DNA template molecule and thus a single haplotype. We improved upon previous approaches by including SYBR Green I and a melted agarose solution in the PCR, allowing easy identification and separation of individually amplified DNA molecules. We demonstrate the use of this method on a highly polymorphic estuarine population of the copepod Eurytemora affinis for which current molecular and computational methods for haplotype determination have been inadequate.
5
Johnson NA, London SJ, Romieu I, Wong WH, Tang H. Accurate construction of long range haplotype in unrelated individuals. Stat Sin 2013; 23:1441-1461. [PMID: 37398638 PMCID: PMC10312227 DOI: 10.5705/ss.2012.141s] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Haplotype, or the sequence of alleles along a single chromosome, has important applications in phenotype-genotype association studies, as well as in population genetics analyses. Because haplotype cannot be experimentally assayed in diploid organisms in a high-throughput fashion, numerous statistical methods have been developed to reconstruct probable haplotype from genotype data. These methods focus primarily on accurate phasing of a short genomic region with a small number of markers, and the error rate increases rapidly for longer regions. Here we introduce a new phasing algorithm, emphases, which aims to improve long-range phasing accuracy. Using datasets from multiple populations, we found that emphases reduces long-range phasing errors by up to 50% compared to the current state-of-the-art methods. In addition to inferring the most likely haplotypes, emphases produces confidence measures, allowing downstream analyses to account for the uncertainties associated with some haplotypes. We anticipate that emphases offers a powerful tool for analyzing large-scale data generated in the genome-wide association studies (GWAS).
6
Positive natural selection of TRIB2, a novel gene that influences visceral fat accumulation, in East Asia. Hum Genet 2012; 132:201-17. [PMID: 23108367 DOI: 10.1007/s00439-012-1240-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 08/28/2012] [Accepted: 10/16/2012] [Indexed: 10/27/2022]
Abstract
Accumulation of visceral fat increases cardiovascular mortality in industrialized societies. However, during the evolution of the modern human, visceral fat may have acted as energy storage facility to survive in times of famine. Therefore, past natural selection might contribute to shaping the variation of visceral fat accumulation in present populations. Here, we report that the gene encoding tribbles homolog 2 (TRIB2) influenced visceral fat accumulation and was operated by recent positive natural selection in East Asians. Our candidate gene association analysis on 11 metabolic traits of 5,810 East Asians revealed that rs1057001, a T/A transversion polymorphism in 3' untranslated region (UTR) of TRIB2, was strongly associated with visceral fat area (VFA) and waist circumference adjusted for body mass index (P = 2.7 × 10⁻⁶ and P = 9.0 × 10⁻⁶, respectively). rs1057001 was in absolute linkage disequilibrium with a conserved insertion-deletion polymorphism in the 3' UTR and was associated with allelic imbalance of TRIB2 transcript levels in adipose tissues. rs1057001 showed high degree of interpopulation variation of the allele frequency; the low-VFA-associated A allele was found with high frequencies in East Asians. Haplotypes containing the rs1057001 A allele exhibited a signature of a selective sweep, which may have occurred 16,546-27,827 years ago in East Asians. Given the predominance of the thrifty gene hypothesis, it is surprising that the apparently non-thrifty allele was selectively favored in the evolution of modern humans. Environmental/physiological factors other than famine would be needed to explain the non-neutral evolution of TRIB2 in East Asians.
7
Teare MD, Pinyakorn S, Heighway J, Santibanez Koref MF. Comparing methods for mapping cis acting polymorphisms using allelic expression ratios. PLoS One 2011; 6:e28636. [PMID: 22174852 PMCID: PMC3236754 DOI: 10.1371/journal.pone.0028636] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/27/2011] [Accepted: 11/11/2011] [Indexed: 02/04/2023]
Abstract
Genome wide association studies frequently reveal associations between disease susceptibility and polymorphisms outside coding regions. Such associations cannot always be explained by linkage disequilibrium with changes affecting the transcription products. This has stimulated the interest in characterising sequence variation influencing gene expression levels, in particular in changes acting in cis. Differences in transcription between the two alleles at an autosomal locus can be used to test the association between candidate polymorphisms and the modulation of gene expression in cis. This type of approach requires at least one transcribed polymorphism and one candidate polymorphism. In the past five years, different methods have been proposed to analyse such data. Here we use simulations and real data sets to compare the power of some of these methods. The results show that when it is not possible to determine the phase between the transcribed and potentially cis acting allele there is some advantage in using methods that estimate phased genotype and effect on expression simultaneously. However when the phase can be determined, simple regression models seem preferable because of their simplicity and flexibility. The simulations and the analysis of experimental data suggest that in the majority of situations, methods that assume a lognormal distribution of the allelic expression ratios are both robust to deviations from this assumption and more powerful than alternatives that do not make these assumptions.
Affiliation(s)
- Marion Dawn Teare
  - School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom
8
Abstract
Whole-genome sequencing of an Irish person reveals hundreds of thousands of novel genomic variants. Imputation using previous known information improves the accuracy of low-read-depth sequencing. See research article: http://genomebiology.com/2010/11/9/R91
Affiliation(s)
- Young Seok Ju
  - Genomic Medicine Institute, Medical Research Center, Seoul National University, 28 Yongon-Dong, Jongno-Gu, Seoul 110-799, Korea
9
Kukita Y, Yahara K, Tahira T, Higasa K, Sonoda M, Yamamoto K, Kato K, Wake N, Hayashi K. A definitive haplotype map as determined by genotyping duplicated haploid genomes finds a predominant haplotype preference at copy-number variation events. Am J Hum Genet 2010; 86:918-28. [PMID: 20537301 DOI: 10.1016/j.ajhg.2010.05.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/10/2010] [Revised: 04/13/2010] [Accepted: 05/07/2010] [Indexed: 10/19/2022]
Abstract
The majority of complete hydatidiform moles (CHMs) harbor duplicated haploid genomes that originate from sperm. This makes CHMs more advantageous than conventional diploid cells for determining haplotypes of SNPs and copy-number variations (CNVs), because all of the genetic variants in a CHM genome are homozygous. Here we report SNP and CNV haplotype structures determined by analysis of 100 CHMs from Japanese subjects via high-density DNA arrays. The obtained haplotype map should be useful as a reference for the haplotype structure of Asian populations. We resolved common CNV regions (merged CNV segments across the examined samples) into CNV events (clusters of CNV segments) on the basis of mutual overlap and found that the haplotype backgrounds of different CNV events within the same CNV region were predominantly similar, perhaps because of inherent structural instability.
10
Yamaguchi-Kabata Y, Tsunoda T, Takahashi A, Hosono N, Kubo M, Nakamura Y, Kamatani N. Making a haplotype catalog with estimated frequencies based on SNP homozygotes. J Hum Genet 2010; 55:500-6. [DOI: 10.1038/jhg.2010.56] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
11
Haplotype allelic classes for detecting ongoing positive selection. BMC Bioinformatics 2010; 11:65. [PMID: 20109229 PMCID: PMC2831848 DOI: 10.1186/1471-2105-11-65] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 08/21/2009] [Accepted: 01/28/2010] [Indexed: 01/23/2023]
Abstract
Background: Natural selection eliminates detrimental and favors advantageous phenotypes. This process leaves characteristic signatures in underlying genomic segments that can be recognized through deviations in allelic or haplotypic frequency spectra. To provide an identifiable signature of recent positive selection that can be detected by comparison with the background distribution, we introduced a new way of looking at genomic polymorphisms: haplotype allelic classes. Results: The model combines segregating sites and haplotypic information in order to reveal useful data characteristics. We developed a summary statistic, Svd, to compare the distribution of the haplotypes carrying the selected allele with the distribution of the remaining ones. Coalescence simulations are used to study the distributions under standard population models assuming neutrality, demographic scenarios and selection models. To test, in practice, haplotype allelic class performance and the derived statistic in capturing deviation from neutrality due to positive selection, we analyzed haplotypic variation in detail in the locus of lactase persistence in the three HapMap Phase II populations. Conclusions: We showed that the Svd statistic is less sensitive than other tests to confounding factors such as demography or recombination. Our approach succeeds in identifying candidate loci, such as the lactase-persistence locus, as targets of strong positive selection and provides a new tool complementary to other tests to study natural selection in genomic data.