1
Lavanchy E, Weir BS, Goudet J. Detecting inbreeding depression in structured populations. Proc Natl Acad Sci U S A 2024; 121:e2315780121. [PMID: 38687793 DOI: 10.1073/pnas.2315780121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/19/2024] [Indexed: 05/02/2024] Open
Abstract
Measuring inbreeding and its consequences on fitness is central for many areas in biology including human genetics and the conservation of endangered species. However, there is no consensus on the best method, neither for quantification of inbreeding itself nor for the model to estimate its effect on specific traits. We simulated traits based on simulated genomes from a large pedigree and empirical whole-genome sequences of human data from populations with various sizes and structures (from the 1,000 Genomes project). We compare the ability of various inbreeding coefficients (F) to quantify the strength of inbreeding depression: allele-sharing, two versions of the correlation of uniting gametes which differ in the weight they attribute to each locus, and two identical-by-descent segments-based estimators. We also compare two models: the standard linear model and a linear mixed model (LMM) including a genetic relatedness matrix (GRM) as random effect to account for the nonindependence of observations. We find LMMs give better results in scenarios with population or family structure. Within the LMM, we compare three different GRMs and show that in homogeneous populations, there is little difference among the different F and GRM for inbreeding depression quantification. However, as soon as a strong population or family structure is present, the strength of inbreeding depression can be most efficiently estimated only if i) the phenotypes are regressed on F based on a weighted version of the correlation of uniting gametes, giving more weight to common alleles and ii) with the GRM obtained from an allele-sharing relatedness estimator.
Affiliation(s)
- Eléonore Lavanchy
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Population Genetics and Genomics group, Swiss Institute of Bioinformatics, University of Lausanne, Lausanne CH-1015, Switzerland
- Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle WA 98195
- Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Population Genetics and Genomics group, Swiss Institute of Bioinformatics, University of Lausanne, Lausanne CH-1015, Switzerland

2
Billenstein RJ, Höhna S. Comparison of Bayesian Coalescent Skyline Plot Models for Inferring Demographic Histories. Mol Biol Evol 2024; 41:msae073. [PMID: 38630635 PMCID: PMC11068272 DOI: 10.1093/molbev/msae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 02/16/2024] [Accepted: 04/01/2024] [Indexed: 04/19/2024] Open
Abstract
Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Overall, there exist multiple different Bayesian coalescent skyline plot models which mainly differ in two key aspects: (i) how changes in population size are modeled through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur and if the number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known because of two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models in a flexible design into the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses supplemented by a small simulation study. We find that estimated demographic histories can be grouped broadly into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.
Affiliation(s)
- Ronja J Billenstein
- GeoBio-Center, Ludwig-Maximilians-Universität München, Munich 80333, Germany
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Munich 80333, Germany
- Sebastian Höhna
- GeoBio-Center, Ludwig-Maximilians-Universität München, Munich 80333, Germany
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Munich 80333, Germany

3
Legarra A, Bermann M, Mei Q, Christensen OF. Redefining and interpreting genomic relationships of metafounders. Genet Sel Evol 2024; 56:34. [PMID: 38698373 DOI: 10.1186/s12711-024-00891-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/08/2023] [Accepted: 03/18/2024] [Indexed: 05/05/2024] Open
Abstract
Metafounders are a useful concept to characterize relationships within and across populations, and to help genetic evaluations because they help model the means and variances of unknown base population animals. Current definitions of metafounder relationships are sensitive to the choice of reference alleles and have not been compared to their counterparts in population genetics, namely heterozygosities, FST coefficients, and genetic distances. We redefine the relationships across populations with an arbitrary base of a maximum heterozygosity population in Hardy-Weinberg equilibrium. Then, the relationship between or within populations is a cross-product of the form $\Gamma_{b,b'} = \frac{2}{n}(2\mathbf{p}_b - \mathbf{1})(2\mathbf{p}_{b'} - \mathbf{1})'$, with $\mathbf{p}$ being vectors of allele frequencies at $n$ markers in populations $b$ and $b'$. This is simply the genomic relationship of two pseudo-individuals whose genotypes are equal to twice the allele frequencies. We also show that this coding is invariant to the choice of reference alleles. In addition, standard population genetics metrics (inbreeding coefficients of various forms; FST differentiation coefficients; segregation variance; and Nei's genetic distance) can be obtained from elements of matrix $\Gamma$.
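As an illustration only, the cross-product form described in this abstract can be sketched in a few lines of Python. The function name `gamma_relationship` and the allele frequencies are hypothetical and are not taken from the article; the sketch reads the cross-product as (2/n) times the sum over markers of (2p_b - 1)(2p_b' - 1), and checks the stated invariance to the choice of reference allele by replacing p with 1 - p at some markers in both populations.

```python
def gamma_relationship(p_b, p_b2):
    """Cross-product (2/n) * sum_i (2*p_b[i] - 1) * (2*p_b2[i] - 1).

    p_b and p_b2 are allele frequencies at the same n markers in
    populations b and b'. Hypothetical helper, for illustration only.
    """
    if len(p_b) != len(p_b2):
        raise ValueError("both populations need the same markers")
    n = len(p_b)
    return (2.0 / n) * sum((2 * x - 1) * (2 * y - 1) for x, y in zip(p_b, p_b2))


# Made-up allele frequencies at four markers in two populations.
pop_a = [0.1, 0.5, 0.9, 0.3]
pop_b = [0.2, 0.4, 0.7, 0.6]

within_a = gamma_relationship(pop_a, pop_a)
between_ab = gamma_relationship(pop_a, pop_b)

# Switching the reference allele at markers 0 and 2 turns p into 1 - p
# in both populations; each product term keeps its value.
flip = {0, 2}
pop_a_flipped = [1 - p if i in flip else p for i, p in enumerate(pop_a)]
pop_b_flipped = [1 - p if i in flip else p for i, p in enumerate(pop_b)]
between_ab_flipped = gamma_relationship(pop_a_flipped, pop_b_flipped)
```

Under this reading, a population with frequency 0.5 at every marker (maximum heterozygosity) has a relationship of zero with every population, which is consistent with its role as the base.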
Affiliation(s)
- Matias Bermann
- Animal and Dairy Science, University of Georgia, 425 River Rd, Athens, GA, 30602, USA
- Quanshun Mei
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
- Ole F Christensen
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 3, Bld. 1130, 8000, Aarhus C, Denmark

4
Yalinkiliç NA, Başbağ S, Altaf MT, Ali A, Nadeem MA, Baloch FS. Applicability of SCoT markers in unraveling genetic variation and population structure among sugar beet (Beta vulgaris L.) germplasm. Mol Biol Rep 2024; 51:584. [PMID: 38683231 DOI: 10.1007/s11033-024-09526-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 04/05/2024] [Indexed: 05/01/2024]
Abstract
BACKGROUND: Sugar beet (Beta vulgaris L.) is an important crop cultivated globally for sugar production. The genetic diversity present in sugar beet accessions plays a crucial role in crop improvement programs. METHODS AND RESULTS: In the present study, we collected 96 sugar beet accessions from different regions and extracted DNA from their leaves. Genomic DNA was amplified using SCoT primers, and the resulting fragments were separated by gel electrophoresis. We analyzed the data using various genetic diversity indices, inferred population structure with STRUCTURE, applied the unweighted pair-group method with arithmetic mean (UPGMA), and conducted principal coordinate analysis (PCoA). The results revealed a high level of genetic diversity among the sugar beet accessions, with 265 bands produced by the 10 SCoT primers used. The percentage of polymorphic bands was 97.60%, indicating substantial genetic variation. The study uncovered significant genetic variation, leading to higher values for overall gene diversity (0.21), genetic distance (0.517), number of effective alleles (1.36), Shannon's information index (0.33), and polymorphism information content (0.239). The analysis of molecular variance suggested a considerable amount of genetic variation, with 89% existing within the population. Using STRUCTURE and UPGMA analysis, the sugar beet germplasm was divided into two major populations. STRUCTURE analysis partitioned the germplasm based on the origin and domestication history of sugar beet, resulting in neighboring countries clustering together. CONCLUSION: The utilization of SCoT markers unveiled a noteworthy degree of genetic variation within the sugar beet germplasm in this study. These findings can be used in future breeding programs with the objective of enhancing both sugar beet yield and quality.
Affiliation(s)
- Nazlı Aybar Yalinkiliç
- Faculty of Applied Sciences, Department of Plant Production and Technologies, Mus Alparslan University, Muş, Türkiye, Turkey
- Sema Başbağ
- Department of field crops, Faculty of agriculture, Dicle University, Diyarbakir, Türkiye, Turkey
- Muhammad Tanveer Altaf
- Department of Plant Production and Technologies, Faculty of Agricultural Sciences and Technologies, Sivas University of Science and Technology, 58140, Sivas, Türkiye, Turkey
- Amjad Ali
- Department of Plant Protection, Faculty of Agricultural Sciences and Technologies, Sivas University of Science and Technology, 58140, Sivas, Türkiye, Turkey
- Muhammad Azhar Nadeem
- Department of Plant Production and Technologies, Faculty of Agricultural Sciences and Technologies, Sivas University of Science and Technology, 58140, Sivas, Türkiye, Turkey
- Faheem Shehzad Baloch
- Department of Biotechnology, Faculty of Science, Mersin University, Yenişehir, Mersin, Türkiye, 33343, Turkey.

5
Aoki S, Fukasawa K. Kernel density estimation of allele frequency including undetected alleles. PeerJ 2024; 12:e17248. [PMID: 38666077 PMCID: PMC11044881 DOI: 10.7717/peerj.17248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Whereas undetected species contribute to estimation of species diversity, undetected alleles have not been used to estimate genetic diversity. Although random sampling guarantees unbiased estimation of allele frequency and genetic diversity measures, using undetected alleles may provide biased but more precise estimators useful for conservation. We devised a kernel density estimation (KDE) method for allele frequency that includes undetected alleles and tested it in estimating allele frequency and nucleotide diversity, using populations generated by coalescent simulation as well as real population data. Contrary to expectations, nucleotide diversity estimated by KDE had worse bias and accuracy. Allele frequency estimated by KDE was also worse except when the sample size was small. These results might be due to the finiteness of the population and/or the curse of dimensionality. In conclusion, KDE of allele frequency does not contribute to genetic diversity estimation.
Affiliation(s)
- Satoshi Aoki
- Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan
- Keita Fukasawa
- Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan

6
Salvo NM, Olsen GH, Berg T, Janssen K. Biogeographical Ancestry Analyses Using the ForenSeq™ DNA Signature Prep Kit and Multiple Prediction Tools. Genes (Basel) 2024; 15:510. [PMID: 38674444 PMCID: PMC11050699 DOI: 10.3390/genes15040510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 04/28/2024] Open
Abstract
The inference of biogeographical ancestry (BGA) can assist in police investigations of serious crime cases and help to identify missing people and victims of mass disasters. In this study, we evaluated the typing performance of 56 ancestry-informative SNPs (aiSNPs) in 177 samples using the ForenSeq™ DNA Signature Prep Kit on the MiSeq FGx system. Furthermore, we compared the prediction accuracy of the tools Universal Analysis Software v1.2 (UAS), FROG-kb, and GenoGeographer when inferring the ancestry of 503 Europeans, 22 non-Europeans, and 5 individuals with co-ancestry. The kit was highly sensitive, yielding complete aiSNP profiles in samples with as little as 250 pg of input DNA. However, in line with others, we observed low read depth and occasional drop-out in some SNPs. Therefore, we suggest not using less than the recommended 1 ng of input DNA. FROG-kb and GenoGeographer accurately predicted both Europeans (99.6% and 91.8% correct, respectively) and non-Europeans (95.4% and 90.9% correct, respectively). The UAS was highly accurate when predicting Europeans (96.0% correct) but performed worse when predicting non-Europeans (40.9% correct). None of the tools were able to correctly predict individuals with co-ancestry. Our study demonstrates that the use of multiple prediction tools will increase the prediction accuracy of BGA inference in forensic casework.
Affiliation(s)
- Nina Mjølsnes Salvo
- Centre for Forensic Genetics, Department of Medical Biology, Faculty of Health Sciences, UiT The Arctic University of Norway, Post Box 6050, 9037 Tromsø, Norway
- Kirstin Janssen
- Centre for Forensic Genetics, Department of Medical Biology, Faculty of Health Sciences, UiT The Arctic University of Norway, Post Box 6050, 9037 Tromsø, Norway

7
Liu X, Yang C, Chen X, Han X, Liu H, Zhang X, Xu Q, Yang X, Liu C, Chen L, Liu C. A novel 193-plex MPS panel integrating STRs and SNPs highlights the application value of forensic genetics in individual identification and paternity testing. Hum Genet 2024; 143:371-383. [PMID: 38499885 DOI: 10.1007/s00439-024-02658-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 02/13/2024] [Indexed: 03/20/2024]
Abstract
Massively parallel sequencing (MPS) has emerged as a promising technology for targeting multiple genetic loci simultaneously in forensic genetics. Here, a novel 193-plex panel was designed to target 28 A-STRs, 41 Y-STRs, 21 X-STRs, 3 sex-identified loci, and 100 A-SNPs by employing a single-end 400 bp sequencing strategy on the MGISEQ-2000™ platform. In the present study, a series of validations and sequencing of 1642 population samples were performed to evaluate the overall performance of the MPS-based panel and its practicality in forensic application according to the SWGDAM guidelines. In general, the 193-plex markers in our panel showed good performance in terms of species specificity, stability, and repeatability. Compared to commercial kits, this panel achieved 100% concordance for standard gDNA and 99.87% concordance for 14,560 population genotypes. Moreover, this panel detected 100% of the loci from 0.5 ng of DNA template and all unique alleles at a 1:4 DNA mixture ratio (0.2 ng minor contributor), and the applicability of the proposed approach to trace and degraded DNA was further supported by case samples. In addition, several forensic parameters of STRs and SNPs were calculated in a population study. High CPE and CPD values greater than 0.9999999 were clearly demonstrated, and these results could be useful references for the application of this panel in individual identification and paternity testing. Overall, this 193-plex MPS panel has been shown to be a reliable, repeatable, robust, inexpensive, and powerful tool sufficient for forensic practice.
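To make the combined statistics concrete, here is a minimal sketch, assuming that CPE and CPD denote the usual combined power of exclusion and combined power of discrimination, and that the loci are treated as independent. The per-locus values are invented for illustration and are not results from the article; `combined_power` is a hypothetical helper name.

```python
from math import prod


def combined_power(per_locus_values):
    """Combine per-locus powers as 1 - prod(1 - x_i).

    Assumes independent loci; the same product rule applies to the
    power of discrimination (PD) and the power of exclusion (PE).
    """
    return 1.0 - prod(1.0 - x for x in per_locus_values)


# Invented per-locus powers of discrimination for three loci.
example_pd = [0.90, 0.80, 0.95]
example_cpd = combined_power(example_pd)  # 1 - 0.10 * 0.20 * 0.05
```

Because each additional informative locus multiplies the remaining probability by a factor below one, panels with many markers approach a combined value of 1 quickly, which is why values are often reported with many nines.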
Affiliation(s)
- Xueyuan Liu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, Guangdong, 510515, China
- Chengliang Yang
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, Guangdong, 510515, China
- Xiaohui Chen
- Guangdong Province Key Laboratory of Forensic Genetics, Guangzhou Forensic Science Institute, Guangzhou, Guangdong, 510030, China
- Xiaolong Han
- Guangdong Province Key Laboratory of Forensic Genetics, Guangzhou Forensic Science Institute, Guangzhou, Guangdong, 510030, China
- Hong Liu
- Guangdong Province Key Laboratory of Forensic Genetics, Guangzhou Forensic Science Institute, Guangzhou, Guangdong, 510030, China
- Xingkun Zhang
- DeepReads Biotech, Guangzhou, Guangdong, 510000, China
- Quyi Xu
- Guangdong Province Key Laboratory of Forensic Genetics, Guangzhou Forensic Science Institute, Guangzhou, Guangdong, 510030, China
- Xingyi Yang
- Guangdong Province Key Laboratory of Forensic Genetics, Guangzhou Forensic Science Institute, Guangzhou, Guangdong, 510030, China
- Changhui Liu
- Guangdong Province Key Laboratory of Forensic Genetics, Guangzhou Forensic Science Institute, Guangzhou, Guangdong, 510030, China.
- Ling Chen
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, Guangdong, 510515, China.
- Chao Liu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, Guangdong, 510515, China.
- National Anti-Drug Laboratory Guangdong Regional Center, Guangzhou, Guangdong, 510230, China.

8
Zhang S, Zhang R, Yuan K, Yang L, Liu C, Liu Y, Ni X, Xu S. Reconstructing complex admixture history using a hierarchical model. Brief Bioinform 2024; 25:bbad540. [PMID: 38261339 PMCID: PMC10805183 DOI: 10.1093/bib/bbad540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/04/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024] Open
Abstract
Various methods have been proposed to reconstruct admixture histories by analyzing the length of ancestral chromosomal tracts, such as estimating the admixture time and number of admixture events. However, available methods do not explicitly consider the complex admixture structure, which characterizes the joining and mixing patterns of different ancestral populations during the admixture process, and instead assume a simplified one-by-one sequential admixture model. In this study, we proposed a novel approach that considers the non-sequential admixture structure to reconstruct admixture histories. Specifically, we introduced a hierarchical admixture model that incorporated four ancestral populations and developed a new method, called HierarchyMix, which uses the length of ancestral tracts and the number of ancestry switches along genomes to reconstruct the four-way admixture history. By automatically selecting the optimal admixture model using the Bayesian information criterion principles, HierarchyMix effectively estimates the corresponding admixture parameters. Simulation studies confirmed the effectiveness and robustness of HierarchyMix. We also applied HierarchyMix to Uyghurs and Kazakhs, enabling us to reconstruct the admixture histories of Central Asians. Our results highlight the importance of considering complex admixture structures and demonstrate that HierarchyMix is a useful tool for analyzing complex admixture events.
Affiliation(s)
- Shi Zhang
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing, 100044, China
- Rui Zhang
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Kai Yuan
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Lu Yang
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing, 100044, China
- Chang Liu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yuting Liu
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing, 100044, China
- Xumin Ni
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing, 100044, China
- Shuhua Xu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Center for Evolutionary Biology, School of Life Sciences, Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 201203, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China

9
Vahedi SM, Ardestani SS. FSTest: an efficient tool for cross-population fixation index estimation on variant call format files. J Genet 2024; 103:04. [PMID: 38258299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Fixation index (Fst) statistics provide critical insights into evolutionary processes affecting the structure of genetic variation within and among populations. Fst statistics have been widely applied in population and evolutionary genetics to identify genomic regions targeted by selection pressures. The FSTest 1.3 software was developed to estimate four Fst statistics (Hudson, Weir and Cockerham, Nei, and Wright) using high-throughput genotyping or sequencing data. Here, we introduce FSTest 1.3 and compare its performance with two widely used software packages, VCFtools 0.1.16 and PLINK 2.0. Chromosome 1 of the 1000 Genomes Phase III variant data belonging to South Asian (n = 211) and African (n = 274) populations was included as an example case in this study. Different Fst estimates were calculated for each single-nucleotide polymorphism (SNP) in a pairwise comparison of South Asian against African populations, and the results of FSTest 1.3 were confirmed by VCFtools 0.1.16 and PLINK 2.0. Two different sliding window approaches, one based on a fixed number of SNPs and another based on a fixed number of base pairs (bp), were applied using FSTest 1.3 and VCFtools 0.1.16. Our results showed that regions with low coverage genotypic data could lead to an overestimation of Fst in sliding window analysis using a fixed number of bp. FSTest 1.3 could mitigate this challenge by estimating the average of consecutive SNPs along the chromosome. FSTest 1.3 allows direct analysis of VCF files with a small amount of code and can calculate Fst estimates on a desktop computer for more than a million SNPs in a few minutes. FSTest 1.3 is freely available at https://github.com/similab/FSTest.
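The per-SNP and fixed-SNP-count window calculations mentioned in this abstract can be illustrated with a short sketch. This is not FSTest's code or interface: the function names are hypothetical, the estimator is one commonly used form of Hudson's Fst, `n1` and `n2` are assumed to be the numbers of sampled allele copies in each population, and combining SNPs as a ratio of summed numerators to summed denominators is a design choice of this sketch rather than something stated in the abstract.

```python
def hudson_components(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's Fst estimator for one SNP.

    p1, p2: sample allele frequencies in populations 1 and 2.
    n1, n2: number of sampled allele copies (assumed > 1).
    """
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def windowed_fst(sites, snps_per_window):
    """Fst per non-overlapping window holding a fixed number of SNPs.

    sites: list of (p1, n1, p2, n2) tuples ordered along the chromosome.
    Each window gets the ratio of summed numerators to summed
    denominators; trailing SNPs that do not fill a window are dropped.
    """
    results = []
    for start in range(0, len(sites) - snps_per_window + 1, snps_per_window):
        parts = [hudson_components(*s) for s in sites[start:start + snps_per_window]]
        num = sum(part[0] for part in parts)
        den = sum(part[1] for part in parts)
        results.append(num / den if den > 0 else float("nan"))
    return results


# Made-up sites: two fixed differences followed by three weakly differentiated SNPs.
demo_sites = [
    (1.0, 10, 0.0, 10),
    (1.0, 10, 0.0, 10),
    (0.5, 10, 0.4, 10),
    (0.3, 10, 0.3, 10),
    (0.6, 10, 0.5, 10),
]
demo_windows = windowed_fst(demo_sites, 2)
```

Windows defined by SNP count always contain the same number of markers, whereas a fixed-bp window in a sparsely genotyped region may rest on very few SNPs, which is the situation the abstract links to inflated estimates.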
Affiliation(s)
- Seyed Milad Vahedi
- Department of Animal Science and Aquaculture, Dalhousie University, Bible Hill, NS B2N5E3,

10
Abstract
We consider whether one can forecast the emergence of variants of concern in the SARS-CoV-2 outbreak and similar pandemics. We explore methods of population genetics and identify key relevant principles in both deterministic and stochastic models of spread of infectious disease. Finally, we demonstrate that fitness variation, defined as a trait for which an increase in its value is associated with an increase in net Darwinian fitness if the value of other traits are held constant, is a strong indicator of imminent transition in the viral population.
Affiliation(s)
- James Kyle Miller
- Auton Systems LLC, Pittsburgh, PA, United States of America
- Kimberly Elenberg
- United States Department of Defense Covid Task Force, Washington, DC, United States of America

11
Changmai P, Jaisamut K, Kampuansai J, Kutanan W, Altınışık NE, Flegontova O, Inta A, Yüncü E, Boonthai W, Pamjav H, Reich D, Flegontov P. Indian genetic heritage in Southeast Asian populations. PLoS Genet 2022; 18:e1010036. [PMID: 35176016 PMCID: PMC8853555 DOI: 10.1371/journal.pgen.1010036] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/18/2021] [Accepted: 01/12/2022] [Indexed: 11/20/2022] Open
Abstract
The great ethnolinguistic diversity found today in mainland Southeast Asia (MSEA) reflects multiple migration waves of people in the past. Maritime trading between MSEA and India was established at the latest 300 BCE, and the formation of early states in Southeast Asia during the first millennium CE was strongly influenced by Indian culture, a cultural influence that is still prominent today. Several ancient Indian-influenced states were located in present-day Thailand, and various populations in the country are likely to be descendants of people from those states. To systematically explore Indian genetic heritage in MSEA populations, we generated genome-wide SNP data (using the Affymetrix Human Origins array) for 119 present-day individuals belonging to 10 ethnic groups from Thailand and co-analyzed them with published data using PCA, ADMIXTURE, and methods relying on f-statistics and on autosomal haplotypes. We found low levels of South Asian admixture in various MSEA populations for whom there is evidence of historical connections with the ancient Indian-influenced states but failed to find this genetic component in present-day hunter-gatherer groups and relatively isolated groups from the highlands of Northern Thailand. The results suggest that migration of Indian populations to MSEA may have been responsible for the spread of Indian culture in the region. Our results also support close genetic affinity between Kra-Dai-speaking (also known as Tai-Kadai) and Austronesian-speaking populations, which fits a linguistic hypothesis suggesting cladality of the two language families.
Affiliation(s)
- Piya Changmai
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Kitipong Jaisamut
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Jatupol Kampuansai
- Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
- Research Center in Bioresources for Agriculture, Industry and Medicine, Chiang Mai University, Chiang Mai, Thailand
- Wibhu Kutanan
- Department of Biology, Faculty of Science, Khon Kaen University, Khon Kaen, Thailand
- N Ezgi Altınışık
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Olga Flegontova
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Angkhana Inta
- Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
- Research Center in Bioresources for Agriculture, Industry and Medicine, Chiang Mai University, Chiang Mai, Thailand
- Eren Yüncü
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Worrawit Boonthai
- Research Unit in Physical Anthropology and Health Science, Thammasat University, Pathum thani, Thailand
- Horolma Pamjav
- Hungarian Institute for Forensic Sciences, Institute of Forensic Genetics, Budapest, Hungary
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America
- Pavel Flegontov
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Kalmyk Research Center of the Russian Academy of Sciences, Elista, Kalmykia, Russia

12
Xu ZM, Rüeger S, Zwyer M, Brites D, Hiza H, Reinhard M, Rutaihwa L, Borrell S, Isihaka F, Temba H, Maroa T, Naftari R, Hella J, Sasamalo M, Reither K, Portevin D, Gagneux S, Fellay J. Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations. PLoS Comput Biol 2022; 18:e1009628. [PMID: 35025869 PMCID: PMC8791479 DOI: 10.1371/journal.pcbi.1009628] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/10/2021] [Revised: 01/26/2022] [Accepted: 11/10/2021] [Indexed: 12/13/2022] Open
Abstract
Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array. Genome-wide association studies, which study the association between genetic variants and various phenotypes, typically rely on genotyping arrays. Only a small proportion of genetic variants within the genome are typed on genotyping arrays. Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants (haplotypes) observed in external reference panels are leveraged to infer untyped variants in the study population. However, for study populations that are underrepresented in existing reference panels, the quality of imputation is often sub-optimal. This is because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can be different between the reference and the study population. Here, we illustrate an approach to select a custom set of population-specific typed variants to improve genotype imputation in such underrepresented populations.
Affiliation(s)
- Zhi Ming Xu
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sina Rüeger
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michaela Zwyer
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Daniela Brites
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Hellen Hiza
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
  - Ifakara Health Institute, Dar es Salaam, Tanzania
- Miriam Reinhard
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Liliana Rutaihwa
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Sonia Borrell
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Thomas Maroa
  - Ifakara Health Institute, Dar es Salaam, Tanzania
- Jerry Hella
  - Ifakara Health Institute, Dar es Salaam, Tanzania
- Klaus Reither
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Damien Portevin
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Sebastien Gagneux
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Jacques Fellay
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Precision Medicine Unit, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
13
Privé F, Aschard H, Carmi S, Folkersen L, Hoggart C, O'Reilly PF, Vilhjálmsson BJ. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet 2022; 109:12-23. [PMID: 34995502 PMCID: PMC8764121 DOI: 10.1016/j.ajhg.2021.11.008] [Citation(s) in RCA: 88] [Impact Index Per Article: 44.0] [Received: 05/26/2021] [Accepted: 11/04/2021] [Indexed: 12/25/2022] [Open access]
Abstract
The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.
Affiliation(s)
- Florian Privé
  - National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark.
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Paris 75015, France; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Shai Carmi
  - Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Clive Hoggart
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paul F O'Reilly
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Bjarni J Vilhjálmsson
  - National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
14
Abstract
Gene drives are selfish genetic elements that are transmitted to progeny at super-Mendelian (>50%) frequencies. Recently developed CRISPR-Cas9-based gene-drive systems are highly efficient in laboratory settings, offering the potential to reduce the prevalence of vector-borne diseases, crop pests and non-native invasive species. However, concerns have been raised regarding the potential unintended impacts of gene-drive systems. This Review summarizes the phenomenal progress in this field, focusing on optimal design features for full-drive elements (drives with linked Cas9 and guide RNA components) that either suppress target mosquito populations or modify them to prevent pathogen transmission, allelic drives for updating genetic elements, mitigating strategies including trans-complementing split-drives and genetic neutralizing elements, and the adaptation of drive technology to other organisms. These scientific advances, combined with ethical and social considerations, will facilitate the transparent and responsible advancement of these technologies towards field implementation.
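The super-Mendelian transmission described in this abstract can be illustrated with a minimal sketch. It is not taken from the Review: it assumes random mating, no fitness cost, and a single hypothetical parameter `d` for the probability that a heterozygote passes on the drive allele (`d = 0.5` is Mendelian inheritance). The function names are illustrative only.

```python
from typing import List


def next_drive_frequency(p: float, d: float) -> float:
    """Drive-allele frequency after one generation.

    Assumes random mating and no fitness cost (simplifying assumptions).
    p: current frequency of the drive allele.
    d: probability that a heterozygote transmits the drive allele
       (0.5 is Mendelian; above 0.5 is super-Mendelian).
    """
    homozygotes = p * p                  # always transmit the drive allele
    heterozygotes = 2.0 * p * (1.0 - p)  # transmit it with probability d
    return homozygotes + heterozygotes * d


def simulate_drive(p0: float, d: float, generations: int) -> List[float]:
    """Return the frequency trajectory, starting at p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_drive_frequency(freqs[-1], d))
    return freqs


if __name__ == "__main__":
    # Mendelian transmission leaves the frequency unchanged;
    # strongly biased transmission spreads the allele quickly.
    print(simulate_drive(0.01, 0.5, 5))
    print(simulate_drive(0.01, 0.95, 20))
```

Under these assumptions the allele frequency stays constant at `d = 0.5` and rises toward fixation for any `d` above 0.5; real drives also face fitness costs and resistance alleles, which this sketch ignores.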
Affiliation(s)
- Ethan Bier
  - Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, USA.
15
Carrot-Zhang J, Han S, Zhou W, Damrauer JS, Kemal A, Cherniack AD, Beroukhim R. Analytical protocol to identify local ancestry-associated molecular features in cancer. STAR Protoc 2021; 2:100766. [PMID: 34585150 PMCID: PMC8456058 DOI: 10.1016/j.xpro.2021.100766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] [Open access]
Abstract
People of different ancestries vary in cancer risk and outcome, and their molecular differences may indicate sources of these variations. Determining the "local" ancestry composition at each genetic locus across ancestry-admixed populations can suggest causal associations. We present a protocol to identify local ancestry and detect the associated molecular changes, using data from the Cancer Genome Atlas. This workflow can be applied to cancer cohorts with matched tumor and normal data from admixed patients to examine germline contributions to cancer. For complete details on the use and execution of this protocol, please refer to Carrot-Zhang et al. (2020).
Affiliation(s)
- Jian Carrot-Zhang
  - The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Seunghun Han
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Wanding Zhou
  - Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jeffrey S. Damrauer
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Anab Kemal
  - National Cancer Institute, Bethesda, MD 20892, USA
- Andrew D. Cherniack
  - The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Rameen Beroukhim
  - The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Harvard Medical School, Boston, MA 02115, USA
  - Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
16
Legarra A, Garcia-Baccino CA, Wientjes YCJ, Vitezica ZG. The correlation of substitution effects across populations and generations in the presence of nonadditive functional gene action. Genetics 2021; 219:iyab138. [PMID: 34718531 PMCID: PMC8664574 DOI: 10.1093/genetics/iyab138] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Accepted: 08/19/2021] [Indexed: 11/14/2022] [Open access]
Abstract
Allele substitution effects at quantitative trait loci (QTL) are part of the basis of quantitative genetics theory and applications such as association analysis and genomic prediction. In the presence of nonadditive functional gene action, substitution effects are not constant across populations. We develop an original approach to model the difference in substitution effects across populations as a first order Taylor series expansion from a "focal" population. This expansion involves the difference in allele frequencies and second-order statistical effects (additive by additive and dominance). The change in allele frequencies is a function of relationships (or genetic distances) across populations. As a result, it is possible to estimate the correlation of substitution effects across two populations using three elements: magnitudes of additive, dominance, and additive by additive variances; relationships (Nei's minimum distances or Fst indexes); and assumed heterozygosities. Similarly, the theory applies as well to distinct generations in a population, in which case the distance across generations is a function of increase of inbreeding. Simulation results confirmed our derivations. Slight biases were observed, depending on the nonadditive mechanism and the reference allele. Our derivations are useful to understand and forecast the possibility of prediction across populations and the similarity of GWAS effects.
Affiliation(s)
- Andres Legarra
  - INRAE/INP, UMR 1388 GenPhySE, Castanet-Tolosan 31326, France
- Carolina A. Garcia-Baccino
  - INRAE/INP, UMR 1388 GenPhySE, Castanet-Tolosan 31326, France
  - Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires C1417DSQ, Argentina
  - SAS NUCLEUS, Le Rheu 35650, France
- Yvonne C. J. Wientjes
  - Wageningen University & Research, Animal Breeding and Genomics, Wageningen 6700 AH, the Netherlands
17
Kardos M, Armstrong EE, Fitzpatrick SW, Hauser S, Hedrick PW, Miller JM, Tallmon DA, Funk WC. The crucial role of genome-wide genetic variation in conservation. Proc Natl Acad Sci U S A 2021; 118:e2104642118. [PMID: 34772759 PMCID: PMC8640931 DOI: 10.1073/pnas.2104642118] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Accepted: 10/02/2021] [Indexed: 12/30/2022] [Open access]
Abstract
The unprecedented rate of extinction calls for efficient use of genetics to help conserve biodiversity. Several recent genomic and simulation-based studies have argued that the field of conservation biology has placed too much focus on conserving genome-wide genetic variation, and that the field should instead focus on managing the subset of functional genetic variation that is thought to affect fitness. Here, we critically evaluate the feasibility and likely benefits of this approach in conservation. We find that population genetics theory and empirical results show that conserving genome-wide genetic variation is generally the best approach to prevent inbreeding depression and loss of adaptive potential from driving populations toward extinction. Focusing conservation efforts on presumably functional genetic variation will only be feasible occasionally, often misleading, and counterproductive when prioritized over genome-wide genetic variation. Given the increasing rate of habitat loss and other environmental changes, failure to recognize the detrimental effects of lost genome-wide genetic variation on long-term population viability will only worsen the biodiversity crisis.
Affiliation(s)
- Marty Kardos
  - Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA 98112
- Sarah W Fitzpatrick
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, MI 49060
  - Department of Integrative Biology, Michigan State University, East Lansing, MI 48824
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824
- Samantha Hauser
  - Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI 53211
- Philip W Hedrick
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287
- Joshua M Miller
  - San Diego Zoo Wildlife Alliance, Escondido, CA 92027
  - Polar Bears International, Bozeman, MT 59772
  - Department of Biological Sciences, MacEwan University, Edmonton, AB T5J 4S2, Canada
- David A Tallmon
  - Biology and Marine Biology Program, University of Alaska Southeast, Juneau, AK 99801
- W Chris Funk
  - Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO 80523
18
Foster SD, Feutry P, Grewe P, Davies C. Sample size requirements for genetic studies on yellowfin tuna. PLoS One 2021; 16:e0259113. [PMID: 34735482 PMCID: PMC8568148 DOI: 10.1371/journal.pone.0259113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Accepted: 10/12/2021] [Indexed: 12/02/2022] [Open access]
Abstract
In population genetics, the amount of information for an analytical task is governed by the number of individuals sampled and the amount of genetic information measured on each of those individuals. In this work, we assessed the numbers of individual yellowfin tuna (Thunnus albacares) and genetic markers required for ocean-basin scale inferences. We assessed this for three distinct data analysis tasks that are often employed: testing for differences between genetic profiles; stock delineation; and assignment of individuals to stocks. For all analytical tasks, we used real (not simulated) data from four sampling locations that span the tropical Pacific Ocean. Whilst the sampling sites were spatially separated, the genetic differences between them were not substantial, with a maximum of approximately Fst = 0.02, which is quite typical of large pelagic fish. We repeatedly sub-sampled the data, mimicking a new survey, and performed the analyses. False positive rates were also assessed by re-sampling and randomly assigning fish to groups. Varying the sample sizes indicated that some analytical tasks, namely profile testing, required relatively few individuals per sampling location (n ≳ 10) and single nucleotide polymorphisms (SNPs, m ≳ 256). Stock delineation required more individuals per sampling location (n ≳ 25). Assignment of fish to sampling locations required substantially more individuals, more in fact than we had available (n > 50), although this sample size could be reduced to n ≳ 30 when individual fish were assumed to belong to one of the groups sampled. With these results, designers of molecular ecological surveys for yellowfin tuna, and users of information from them, can assess whether the information content is adequate for the required inferential task.
Affiliation(s)
- Pierre Feutry
  - CSIRO’s Oceans and Atmospheres, Hobart, Tasmania, Australia
- Peter Grewe
  - CSIRO’s Oceans and Atmospheres, Hobart, Tasmania, Australia
19
Singh VK, Singh SK, Joshi BD, Singh A, Kumar H, Chandra K, Sharma LK, Thakur M. Population genetic attributes of common leopard (Panthera pardus fusca) from Uttarkashi, Western Himalayas. Mol Biol Rep 2021; 49:1573-1579. [PMID: 34729672 DOI: 10.1007/s11033-021-06908-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Accepted: 10/29/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND: The common leopard (Panthera pardus fusca), which persists in most of its historic range, is experiencing steady population decline due to habitat loss, anthropogenic disturbances, illegal poaching for body parts, and retaliatory killings in response to leopard-human conflicts. METHODS AND RESULTS: We analysed 143 scat samples and identified 32 unique leopards using a selected panel of seven loci with cumulative PID sibs of 5.30E-04. We observed moderate genetic diversity at nuclear (Ho = 0.600 ± 0.06) and mitochondrial markers (Hd = 0.569 ± 0.009; π = 0.001 ± 0.0002) and found sub-structuring in the leopard population at Uttarkashi, Western Himalayas. CONCLUSIONS: The present study demonstrates the utility of non-invasive genetics in monitoring the leopard population and paves the way for investigating population genetic parameters in further studies.
Affiliation(s)
- Vinaya Kumar Singh
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India
- Sujeet Kumar Singh
  - Amity Institute of Forestry and Wildlife, Amity University, Sector-125, Noida, 201 303, India
- Bheem Dutt Joshi
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India
- Abhishek Singh
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India
- Hemant Kumar
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India
- Kailash Chandra
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India
- Lalit Kumar Sharma
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India
- Mukesh Thakur
  - Zoological Survey of India, New Alipore, Calcutta, West Bengal, 700053, India.
20
Urnikyte A, Molyte A, Kučinskas V. Genome-Wide Landscape of North-Eastern European Populations: A View from Lithuania. Genes (Basel) 2021; 12:genes12111730. [PMID: 34828336 PMCID: PMC8623362 DOI: 10.3390/genes12111730] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/27/2021] [Revised: 10/26/2021] [Accepted: 10/27/2021] [Indexed: 01/15/2023] [Open access]
Abstract
There are still several unanswered questions regarding ancient events in the Lithuanian population. The Lithuanians, as the subject of this study, are of great interest as they represent a partially isolated population maintaining an ancient genetic composition and show genetic uniqueness in European comparisons. To elucidate the genetic relationships between the Lithuanian, North-Eastern European, and West Siberian populations, we analyzed the population structure, effective population size, and recent positive selection from genome-wide single nucleotide polymorphism (SNP) data. We identified the close genetic proximity of Lithuanians to neighboring populations (Latvians, Estonians, Belarusians) and in part to West and South Slavs (Poles, Slovaks, and Slovenians), although with particular genetic distinctiveness. The estimated long-term Ne values ranged from ~5900 in the Estonian population to ~2400 in the South Russian population. The divergence times between the Lithuanian and study populations ranged from 240 to 12,871 YBP. We also found evidence of selection in 24 regions, 21 of which have not been discovered in previous analyses of selection. Undoubtedly, the genetic diversity analysis of geographically specific regions may provide new insights into microevolutionary processes affecting local human populations.
Affiliation(s)
- Alina Urnikyte
  - Department of Human and Medical Genetics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Santariškiu St. 2, LT-08661 Vilnius, Lithuania
  - Correspondence: Tel.: +370-698-55292
- Alma Molyte
  - Department of Human and Medical Genetics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Santariškiu St. 2, LT-08661 Vilnius, Lithuania
  - Department of Information Systems, Faculty of Fundamentals Sciences, Vilnius Gediminas Technical University, Saulėtekio Al. 11, LT-10223 Vilnius, Lithuania
- Vaidutis Kučinskas
  - Department of Human and Medical Genetics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Santariškiu St. 2, LT-08661 Vilnius, Lithuania
21
Arning N, Sheppard SK, Bayliss S, Clifton DA, Wilson DJ. Machine learning to predict the source of campylobacteriosis using whole genome data. PLoS Genet 2021; 17:e1009436. [PMID: 34662334 PMCID: PMC8553134 DOI: 10.1371/journal.pgen.1009436] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/20/2021] [Revised: 10/28/2021] [Accepted: 08/26/2021] [Indexed: 11/18/2022] [Open access]
Abstract
Campylobacteriosis is among the world's most common foodborne illnesses, caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determination of the infection source, which is challenging as transmission occurs via multiple sources such as contaminated meat, poultry, and drinking water. Strain variation has allowed source tracking based upon allelic variation in multi-locus sequence typing (MLST) genes, allowing isolates from infected individuals to be attributed to specific animal or environmental reservoirs. However, the accuracy of probabilistic attribution models has been limited by the ability to differentiate isolates based upon just 7 MLST genes. Here, we broaden the input data spectrum to include core genome MLST (cgMLST) and whole genome sequences (WGS), and implement multiple machine learning algorithms, allowing more accurate source attribution. We increase attribution accuracy from 64% using the standard iSource population genetic approach to 71% for MLST, 85% for cgMLST and 78% for kmerized WGS data using the classifier we named aiSource. To gain insight beyond the source model prediction, we use Bayesian inference to analyse the relative affinity of C. jejuni strains to infect humans and identify potential differences in source-human transmission ability among clonally related isolates in the most common disease-causing lineage (ST-21 clonal complex). Providing generalizable computationally efficient methods, based upon machine learning and population genetics, we provide a scalable approach to global disease surveillance that can continuously incorporate novel samples for source attribution and identify fine-scale variation in transmission potential.
Affiliation(s)
- Nicolas Arning
  - Big Data Institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
- Samuel K. Sheppard
  - The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- Sion Bayliss
  - The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- David A. Clifton
  - Department of Engineering Science, University of Oxford, Oxford, UK; Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Daniel J. Wilson
  - Big Data Institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
22
Li D, Li M, Li F, Weng Q, Zhou C, Huang S, Gan S. Transcriptome-derived microsatellite markers for population diversity analysis in Archidendron clypearia (Jack) I.C. Nielsen. Mol Biol Rep 2021; 48:8255-8260. [PMID: 34655020 DOI: 10.1007/s11033-021-06773-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Accepted: 09/17/2021] [Indexed: 11/26/2022]
Abstract
BACKGROUND: The medicinal woody leguminous genus Archidendron F. Mueller serves as an important herbal resource for treating upper respiratory tract infection, acute pharyngitis, tonsillitis, and gastroenteritis. However, genomic resources including transcriptomic sequences and molecular markers remain scarce in the genus. METHODS AND RESULTS: Transcriptome sequencing, genic microsatellite marker development, and population diversity analysis were conducted in Archidendron clypearia (Jack) I.C. Nielsen. Flower and flower bud transcriptomes were de novo assembled into 173,172 transcripts, with an average transcript length of 1597.3 bp and an N50 length of 2427 bp. A total of 34,701 microsatellite loci were identified from 26,716 (15.4%) transcripts. Primer pairs were designed for 718 microsatellite loci, of which 456 (63.5%) were polymorphic. Of the 456 polymorphic markers, 391 (85.7%) and 402 (88.1%) were transferable to A. lucidum (Benth.) I.C. Nielsen and A. multifoliolatum (H.Q. Wen) T.L. Wu, respectively. Using a subset of 15 microsatellite markers, relatively high genetic diversity was detected over two A. clypearia populations, with an overall mean expected heterozygosity (He) of 0.707, demonstrating the necessity of conservation. Relatively low differentiation between the two populations was revealed despite the distant separation (about 700 km), with an overall inbreeding coefficient of sub-population relative to the total population (Fst) of 8.7%. CONCLUSIONS: This study represents the first attempt to conduct transcriptome sequencing, SSR marker development, and population genetics analysis in the medicinally important genus Archidendron. Our results will offer valuable resources and information for further genetic studies and practical applications in Archidendron and related taxa.
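For readers unfamiliar with the two summary statistics quoted in this abstract, the following is a minimal sketch of the textbook definitions, not the authors' pipeline (the abstract does not say which software or estimator was used). It assumes known allele frequencies at a single locus, equally weighted populations, and no small-sample correction; the function names are hypothetical.

```python
from typing import List, Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs: Sequence[Sequence[float]]) -> float:
    """Fst = (Ht - Hs) / Ht for one locus.

    pop_freqs: one allele-frequency vector per population, all of the
    same length. Populations are weighted equally (an assumption).
    """
    n_pops = len(pop_freqs)
    # Hs: mean within-population expected heterozygosity.
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # Ht: expected heterozygosity of the pooled (mean) allele frequencies.
    n_alleles = len(pop_freqs[0])
    mean_freqs: List[float] = [
        sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)
    ]
    ht = expected_heterozygosity(mean_freqs)
    if ht == 0.0:
        return 0.0  # monomorphic locus: no differentiation to measure
    return (ht - hs) / ht


if __name__ == "__main__":
    print(expected_heterozygosity([0.5, 0.5]))
    print(fst([[0.6, 0.4], [0.4, 0.6]]))
```

Multi-locus studies typically average He over loci and use sample-size-corrected estimators, so values from this sketch would not necessarily match published figures.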
Affiliation(s)
- Dandan Li
  - State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Xiangshan Road, Beijing, 100091, China
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China
- Mei Li
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China
- Fagen Li
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China
- Qijie Weng
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China
- Changpin Zhou
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China
- Shineng Huang
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China
- Siming Gan
  - State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Xiangshan Road, Beijing, 100091, China.
  - Key Laboratory of National Forestry and Grassland Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, 682 Guangshan Yi Road, Guangzhou, 510520, Guangdong, China.
23
Vasquez KS, Willis L, Cira NJ, Ng KM, Pedro MF, Aranda-Díaz A, Rajendram M, Yu FB, Higginbottom SK, Neff N, Sherlock G, Xavier KB, Quake SR, Sonnenburg JL, Good BH, Huang KC. Quantifying rapid bacterial evolution and transmission within the mouse intestine. Cell Host Microbe 2021; 29:1454-1468.e4. [PMID: 34473943 PMCID: PMC8445907 DOI: 10.1016/j.chom.2021.08.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/04/2020] [Revised: 05/25/2021] [Accepted: 08/05/2021] [Indexed: 11/23/2022]
Abstract
Due to limitations on high-resolution strain tracking, selection dynamics during gut microbiota colonization and transmission between hosts remain mostly mysterious. Here, we introduced hundreds of barcoded Escherichia coli strains into germ-free mice and quantified strain-level dynamics and metagenomic changes. Mutations in genes involved in motility and metabolite utilization are reproducibly selected within days. Even with rapid selection, coprophagy enforced similar barcode distributions across co-housed mice. Whole-genome sequencing of hundreds of isolates revealed linked alleles that demonstrate between-host transmission. A population-genetics model predicts substantial fitness advantages for certain mutants and that migration accounted for ∼10% of the resident microbiota each day. Treatment with ciprofloxacin suggests interplay between selection and transmission. While initial colonization was mostly uniform, in two mice a bottleneck reduced diversity and selected for ciprofloxacin resistance in the absence of drug. These findings highlight the interplay between environmental transmission and rapid, deterministic selection during evolution of the intestinal microbiota.
Collapse
Affiliation(s)
- Kimberly S Vasquez
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Lisa Willis
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Nate J Cira
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Katharine M Ng
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Miguel F Pedro
- Instituto Gulbenkian de Ciência, 2780-156 Oeiras, Portugal
| | - Andrés Aranda-Díaz
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Manohary Rajendram
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | | | - Steven K Higginbottom
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Norma Neff
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Gavin Sherlock
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | | | - Stephen R Quake
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Justin L Sonnenburg
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Benjamin H Good
- Department of Physics, University of California at Berkeley, Berkeley, CA 94720, USA; Department of Applied Physics, Stanford University, Stanford, CA 94305, USA.
| | - Kerwyn Casey Huang
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA.
| |
|
24
|
Abstract
The omnigenic model was proposed as a framework to understand the highly polygenic architecture of complex traits revealed by genome-wide association studies (GWASs). I argue that this model also explains recent observations about cross-population genetic effects, specifically the low transferability of polygenic scores and the lack of clear evidence for polygenic selection. In particular, the omnigenic model explains why the effects of most GWAS variants vary between populations. This interpretation has several consequences for the evolutionary interpretation and practical use of GWAS summary statistics and polygenic scores. First, some polygenic scores may be applicable only in populations of the same ancestry and environment as the discovery population. Second, most GWAS associations will have differing effects between populations and are unlikely to be robust clinical targets. Finally, it may not always be possible to detect polygenic selection from population genetic data. These considerations make it difficult to interpret the clinical and evolutionary meanings of polygenic scores without an explicit model of genetic architecture.
Affiliation(s)
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
| |
|
25
|
Alam MJ, Mydam J, Hossain MR, Islam SMS, Mollah MNH. Robust regression based genome-wide multi-trait QTL analysis. Mol Genet Genomics 2021; 296:1103-1119. [PMID: 34170407 DOI: 10.1007/s00438-021-01801-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2020] [Accepted: 06/01/2021] [Indexed: 10/21/2022]
Abstract
In genome-wide quantitative trait locus (QTL) mapping studies, multiple quantitative traits are often measured along with the marker genotypes. Multi-trait QTL (MtQTL) analysis, which includes multiple quantitative traits together in a single model, is an efficient technique to increase the power of QTL identification. The two most widely used classical approaches for MtQTL mapping are Gaussian Mixture Model-based MtQTL (GMM-MtQTL) and Linear Regression Model-based MtQTL (LRM-MtQTL) analyses. There are two types of LRM-MtQTL approach known as least squares-based LRM-MtQTL (LS-LRM-MtQTL) and maximum likelihood-based LRM-MtQTL (ML-LRM-MtQTL). These three classical approaches are equivalent alternatives for QTL detection, but ML-LRM-MtQTL is computationally faster than GMM-MtQTL and LS-LRM-MtQTL. However, one major limitation common to all the above classical approaches is that they are very sensitive to outliers, which leads to misleading results. Therefore, in this study, we developed an LRM-based robust MtQTL approach, called LRM-RobMtQTL, for the backcross population based on the robust estimation of regression parameters by maximizing the β-likelihood function induced from the β-divergence with multivariate normal distribution. When β = 0, the proposed LRM-RobMtQTL method reduces to the classical ML-LRM-MtQTL approach. Simulation studies showed that both ML-LRM-MtQTL and LRM-RobMtQTL methods identified the same QTL positions in the absence of outliers. However, in the presence of outliers, only the proposed method was able to identify all the true QTL positions. Real data analysis results revealed that in the presence of outliers only our LRM-RobMtQTL approach can identify all the QTL positions as those identified in the absence of outliers by both methods. We conclude that our proposed LRM-RobMtQTL analysis approach outperforms the classical MtQTL analysis methods.
Affiliation(s)
- Md Jahangir Alam
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - Janardhan Mydam
- Division of Neonatology, Department of Pediatrics, John H. Stroger, Jr. Hospital of Cook County, 1969 Ogden Avenue, Chicago, IL, 60612, USA
- Department of Pediatrics, Rush Medical Center, Chicago, USA
| | - Md Ripter Hossain
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - S M Shahinul Islam
- Institute of Biological Science, University of Rajshahi, Rajshahi, 6205, Bangladesh
| | - Md Nurul Haque Mollah
- Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.
| |
|
26
|
Tokutomi N, Nakai K, Sugano S. Extreme value theory as a framework for understanding mutation frequency distribution in cancer genomes. PLoS One 2021; 16:e0243595. [PMID: 34424899 PMCID: PMC8382180 DOI: 10.1371/journal.pone.0243595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/22/2020] [Accepted: 08/10/2021] [Indexed: 12/31/2022] Open
Abstract
The population dynamics of preclonal cancer cells before clonal expansion of tumors have not been sufficiently addressed thus far. By focusing on the preclonal cancer cell population as a Darwinian evolutionary system, we formulated and analyzed the observed mutation frequency among tumors (MFaT) as a proxy for the hypothesized sequence read frequency and beneficial fitness effect of a cancer driver mutation. Analogous to intestinal crypts, we assumed that sample donor patients are separate culture tanks where proliferating cells follow certain population dynamics described by extreme value theory (EVT). To validate this, we analyzed three large-scale cancer genome datasets, each harboring >10,000 tumor samples and in total involving >177,898 observed mutation sites. We clarified the necessary premises for the application of EVT in the strong selection and weak mutation (SSWM) regime in relation to cancer genome sequences at scale. We also confirmed that the stochastic distribution of MFaT is likely of the Fréchet type, which challenges the well-known Gumbel hypothesis of beneficial fitness effects. Based on statistical data analysis, we demonstrated the potential of EVT as a population genetics framework to understand and explain the stochastic behavior of driver-mutation frequency in cancer genomes as well as its applicability in real cancer genome sequence data.
Affiliation(s)
- Natsuki Tokutomi
- Department of Computational Biology and Medical Science, Graduate School of Frontier Science, University of Tokyo, Kashiwa, Chiba, Japan
| | - Kenta Nakai
- Department of Computational Biology and Medical Science, Graduate School of Frontier Science, University of Tokyo, Kashiwa, Chiba, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan
| | - Sumio Sugano
- Medical Research Institute, Tokyo Medical and Dental University, Bunkyou-ku, Tokyo, Japan
- Future Medicine Education and Research Organization, Chiba University, Chiba, Chiba, Japan
| |
|
27
|
Skead K, Ang Houle A, Abelson S, Agbessi M, Bruat V, Lin B, Soave D, Shlush L, Wright S, Dick J, Morris Q, Awadalla P. Interacting evolutionary pressures drive mutation dynamics and health outcomes in aging blood. Nat Commun 2021; 12:4921. [PMID: 34389724 PMCID: PMC8363714 DOI: 10.1038/s41467-021-25172-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/16/2020] [Accepted: 07/27/2021] [Indexed: 01/10/2023] Open
Abstract
Age-related clonal hematopoiesis (ARCH) is characterized by age-associated accumulation of somatic mutations in hematopoietic stem cells (HSCs) or their pluripotent descendants. HSCs harboring driver mutations will be positively selected and cells carrying these mutations will rise in frequency. While ARCH is a known risk factor for blood malignancies, such as Acute Myeloid Leukemia (AML), why some people who harbor ARCH driver mutations do not progress to AML remains unclear. Here, we model the interaction of positive and negative selection in deeply sequenced blood samples from individuals who subsequently progressed to AML, compared to healthy controls, using deep learning and population genetics. Our modeling allows us to discriminate amongst evolutionary classes with high accuracy and captures signatures of purifying selection in most individuals. Purifying selection, acting on benign or mildly damaging passenger mutations, appears to play a critical role in preventing disease-predisposing clones from rising to dominance and is associated with longer disease-free survival. Through exploring a range of evolutionary models, we show how different classes of selection shape clonal dynamics and health outcomes thus enabling us to better identify individuals at a high risk of malignancy.
Affiliation(s)
- Kimberly Skead
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Armande Ang Houle
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Sagi Abelson
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | | | - Vanessa Bruat
- Ontario Institute for Cancer Research, Toronto, ON, Canada
| | - Boxi Lin
- Ontario Institute for Cancer Research, Toronto, ON, Canada
| | - David Soave
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Mathematics, Wilfrid Laurier University, Waterloo, ON, Canada
| | - Liran Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
| | - Stephen Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
| | - John Dick
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, United States.
| | - Philip Awadalla
- Ontario Institute for Cancer Research, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
| |
|
28
|
Favela-Mendoza AF, Fricke-Galindo I, Cuevas-Sánchez WF, Aguilar-Velázquez JA, Martínez-Cortés G, Rangel-Villalobos H. Population diversity of three variants of the SLC47A2 gene (MATE2-K transporter) in Mexican Mestizos and Native Americans. Mol Biol Rep 2021; 48:6343-6348. [PMID: 34383246 DOI: 10.1007/s11033-021-06628-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 08/05/2021] [Indexed: 11/25/2022]
Abstract
BACKGROUND: MATE2-K is an organic cation efflux transporter protein expressed mainly in the kidney and encoded by the SLC47A2 gene. Different variants of this gene have shown an impact on the pharmacokinetics of various drugs, including metformin, which represents one of the most widely used drugs in treating type 2 diabetes. The SLC47A2 gene variants have been scarcely studied in Mexican populations, especially in Native American groups. For this reason, we analyzed the distribution of the variants rs12943590, rs35263947, and rs9900497 within the SLC47A2 gene in 173 Native Americans (Tarahumara, Huichol, Maya, Purépecha) and 182 Mestizo (admixed) individuals from Mexico. METHODS AND RESULTS: Genotypes were determined through TaqMan probes (qPCR). Hardy-Weinberg equilibrium was confirmed for all three SLC47A2 gene variants in all the Mexican populations analyzed. When worldwide populations were included for comparison purposes, a relative interpopulation homogeneity of alleles and genotypes was observed for rs35263947 (T allele; range 23.3-51.1%) and rs9900497 (T allele; range 18.6-40.9%). Conversely, heterogeneity was evident for rs12943590 (A allele; range 22.1-59.1%), where the most differentiated population was the Huichol, with high frequencies of the risk genotype associated with decreased response to metformin treatment (A/A = 40.9%). CONCLUSIONS: Although the SLC47A2 gene variants allow predicting favorable response to the metformin treatment in Mexican populations, the probable high frequency of ineffectiveness should be ruled out in Huichols.
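A Hardy-Weinberg check of the kind mentioned in this abstract can be sketched as follows. The abstract does not say which test the authors used, so the chi-square goodness-of-fit test here is an assumption, and the function name `hwe_chi_square` and the genotype counts are hypothetical illustrations rather than data from the study.

```python
def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square statistic comparing observed genotype counts at a
    biallelic locus with their Hardy-Weinberg expectations."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)   # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    # Skip classes with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Chi-square critical value at the 5% level with 1 degree of freedom.
CRITICAL_5PCT_DF1 = 3.841

# Hypothetical genotype counts (homozygote, heterozygote, homozygote);
# these are NOT counts reported in the study.
stat = hwe_chi_square(18, 22, 4)
print(f"chi2 = {stat:.3f}, consistent with HWE: {stat < CRITICAL_5PCT_DF1}")
```

A statistic below the critical value means the sample shows no significant departure from Hardy-Weinberg proportions. With small samples or rare alleles an exact test is usually preferred over the chi-square approximation.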
Affiliation(s)
- Alma Faviola Favela-Mendoza
- Instituto de Investigación en Genética Molecular, Centro Universitario de la Ciénega, Universidad de Guadalajara (CUCiénega-UdeG), Av. Universidad, No. 1115, Col. Lindavista, CP. 47810, Ocotlán, Jalisco, Mexico.
| | - Ingrid Fricke-Galindo
- HLA Laboratory, Instituto Nacional de Enfermedades Respiratorias Ismael Cosío Villegas, Mexico City, Mexico
| | - Wendy Fernanda Cuevas-Sánchez
- Instituto de Investigación en Genética Molecular, Centro Universitario de la Ciénega, Universidad de Guadalajara (CUCiénega-UdeG), Av. Universidad, No. 1115, Col. Lindavista, CP. 47810, Ocotlán, Jalisco, Mexico
| | - José Alonso Aguilar-Velázquez
- Instituto de Investigación en Genética Molecular, Centro Universitario de la Ciénega, Universidad de Guadalajara (CUCiénega-UdeG), Av. Universidad, No. 1115, Col. Lindavista, CP. 47810, Ocotlán, Jalisco, Mexico
| | - Gabriela Martínez-Cortés
- Instituto de Investigación en Genética Molecular, Centro Universitario de la Ciénega, Universidad de Guadalajara (CUCiénega-UdeG), Av. Universidad, No. 1115, Col. Lindavista, CP. 47810, Ocotlán, Jalisco, Mexico
| | - Héctor Rangel-Villalobos
- Instituto de Investigación en Genética Molecular, Centro Universitario de la Ciénega, Universidad de Guadalajara (CUCiénega-UdeG), Av. Universidad, No. 1115, Col. Lindavista, CP. 47810, Ocotlán, Jalisco, Mexico.
| |
|
29
|
Shin J, Jung J. Comparative population genetics of the invasive mosquito Aedes albopictus and the native mosquito Aedes flavopictus in the Korean peninsula. Parasit Vectors 2021; 14:377. [PMID: 34315478 PMCID: PMC8314453 DOI: 10.1186/s13071-021-04873-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/20/2021] [Accepted: 07/07/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Aedes mosquitoes are important invasive species contributing to the spread of chikungunya, dengue fever, yellow fever, Zika virus, and other dangerous vector-borne diseases. Aedes albopictus is native to southeast Asia, with rapid expansion due to human activity, showing a wide distribution in the Korean peninsula. Aedes flavopictus is considered to be native to East Asia, with a broad distribution in the region, including the Korean peninsula. A better understanding of the genetic diversity of these species is critical for establishing strategies for disease prevention and vector control. METHODS: We obtained DNA from 148 specimens of Ae. albopictus and 166 specimens of Ae. flavopictus in Korea, and amplified two mitochondrial genes (COI and ND5) to compare the genetic diversity and structure of the two species. RESULTS: We obtained a 658-bp sequence of COI and a 423-bp sequence of ND5 from both mosquito species. We found low diversity and a nonsignificant population genetic structure in Ae. albopictus, and high diversity and a nonsignificant structure in Ae. flavopictus for these two mitochondrial genes. Aedes albopictus had fewer haplotypes with respect to the number of individuals, and a slight mismatch distribution was confirmed. By contrast, Ae. flavopictus had a large number of haplotypes compared with the number of individuals, and a large unimodal-type mismatch distribution was confirmed. Although the genetic structure of both species was nonsignificant, Ae. flavopictus exhibited higher genetic diversity than Ae. albopictus. CONCLUSIONS: Aedes albopictus appears to be an introduced species, whereas Ae. flavopictus is endemic to the Korean peninsula, and the difference in genetic diversity between the two species is related to their adaptability and introduction history. Further studies on the genetic structure and diversity of these mosquitoes will provide useful data for vector control.
Affiliation(s)
- Jiyeong Shin
- The Division of EcoCreative, Ewha Womans University, Seoul, 03760 South Korea
| | - Jongwoo Jung
- The Division of EcoCreative, Ewha Womans University, Seoul, 03760 South Korea
- Department of Science Education, Ewha Womans University, Seoul, 03760 South Korea
| |
|
30
|
Leonenko G, Baker E, Stevenson-Hoare J, Sierksma A, Fiers M, Williams J, de Strooper B, Escott-Price V. Identifying individuals with high risk of Alzheimer's disease using polygenic risk scores. Nat Commun 2021; 12:4506. [PMID: 34301930 PMCID: PMC8302739 DOI: 10.1038/s41467-021-24082-z] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 12/28/2020] [Accepted: 06/02/2021] [Indexed: 11/09/2022] Open
Abstract
Polygenic Risk Scores (PRS) for AD offer unique possibilities for reliable identification of individuals at high and low risk of AD. However, there is little agreement in the field as to what approach should be used for genetic risk score calculations, how to model the effect of APOE, what the optimal p-value threshold (pT) for SNP selection is and how to compare scores between studies and methods. We show that the best prediction accuracy is achieved with a model with two predictors (APOE and PRS excluding APOE region) with pT<0.1 for SNP selection. Prediction accuracy in a sample across different PRS approaches is similar, but individuals' scores and their associated ranking differ. We show that standardising PRS against the population mean, as opposed to the sample mean, makes the individuals' scores comparable between studies. Our work highlights the best strategies for polygenic profiling when assessing individuals for AD risk.
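The scoring steps named in this abstract (SNP selection by a p-value threshold pT, a weighted allele-count score, and standardisation against a population rather than the sample) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, effect sizes, p-values, and genotypes are all hypothetical, and the separate APOE predictor described in the abstract is not modelled.

```python
from statistics import mean, pstdev

def select_snps(p_values, p_threshold=0.1):
    """Indices of SNPs whose association p-value is below the threshold pT."""
    return [i for i, p in enumerate(p_values) if p < p_threshold]

def polygenic_score(dosages, weights, keep):
    """Sum of effect-allele dosages (0, 1 or 2) weighted by effect size."""
    return sum(dosages[i] * weights[i] for i in keep)

def standardise(scores, reference_scores):
    """Z-score each score against the mean and SD of a reference set."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)
    return [(s - mu) / sd for s in scores]

# Hypothetical toy inputs: four SNPs with made-up effect sizes and p-values.
weights = [0.2, -0.1, 0.05, 0.3]
p_values = [0.01, 0.5, 0.08, 0.2]
keep = select_snps(p_values, p_threshold=0.1)

# Hypothetical genotype dosages: a reference population and a study sample.
population = [[0, 1, 2, 0], [1, 0, 1, 2], [2, 2, 0, 1], [1, 1, 1, 1]]
sample = [[2, 0, 2, 0], [0, 2, 0, 2]]

population_scores = [polygenic_score(g, weights, keep) for g in population]
sample_scores = [polygenic_score(g, weights, keep) for g in sample]

# Standardising against the population keeps scores comparable across studies.
z_scores = standardise(sample_scores, population_scores)
print(keep, [round(z, 2) for z in z_scores])
```

Passing `sample_scores` as the reference instead would standardise against the sample mean, which is the alternative the abstract argues makes scores harder to compare between studies.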
Affiliation(s)
- Ganna Leonenko
- UK Dementia Research Institute, Cardiff University, Cardiff, UK
| | - Emily Baker
- UK Dementia Research Institute, Cardiff University, Cardiff, UK
| | | | - Annerieke Sierksma
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Laboratory for the Research of Neurodegenerative Diseases, Department of Neurosciences, Leuven Brain Institute (LBI), KU Leuven (University of Leuven), Leuven, Belgium
| | - Mark Fiers
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Laboratory for the Research of Neurodegenerative Diseases, Department of Neurosciences, Leuven Brain Institute (LBI), KU Leuven (University of Leuven), Leuven, Belgium
- UK Dementia Research Institute, University College London, London, UK
| | - Julie Williams
- UK Dementia Research Institute, Cardiff University, Cardiff, UK
- Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
| | - Bart de Strooper
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Laboratory for the Research of Neurodegenerative Diseases, Department of Neurosciences, Leuven Brain Institute (LBI), KU Leuven (University of Leuven), Leuven, Belgium
- UK Dementia Research Institute, University College London, London, UK
| | - Valentina Escott-Price
- UK Dementia Research Institute, Cardiff University, Cardiff, UK.
- Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK.
| |
|
31
|
Grzegorczyk J, Gurgul A, Oczkowicz M, Szmatoła T, Fornal A, Bugno-Poniewierska M. Single Nucleotide Polymorphism Discovery and Genetic Differentiation Analysis of Geese Bred in Poland, Using Genotyping-by-Sequencing (GBS). Genes (Basel) 2021; 12:1074. [PMID: 34356090 PMCID: PMC8307914 DOI: 10.3390/genes12071074] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/15/2021] [Revised: 07/06/2021] [Accepted: 07/12/2021] [Indexed: 11/25/2022] Open
Abstract
Poland is the largest European producer of geese, and goose breeding has become an essential and still growing branch of the poultry industry. The most frequently bred goose is the White Kołuda® breed, constituting 95% of the country’s population, whereas geese of regional varieties are bred in smaller conservation flocks. However, goose genetic diversity remains poorly explored, mainly because the most commonly used tools are of limited use in non-model organisms. Single nucleotide polymorphisms (SNPs) are among the most accurate markers used in population genetics. A highly efficient strategy for genome-wide SNP detection is genotyping-by-sequencing (GBS), which has already been widely applied in many organisms. This study attempts to use GBS in 12 conservation goose breeds and the White Kołuda® breed maintained in Poland. The GBS method allowed for the detection of 3833 common raw SNPs. After filtering for read depth and allele characters, the final marker panel used for the differentiation analysis comprised 791 SNPs. These variants were located within 11 different genes, and one of the most diversified variants was associated with the EDAR gene, which is especially interesting as it participates in plumage development, which plays a crucial role in goose breeding.
Affiliation(s)
- Joanna Grzegorczyk
- Department of Molecular Biology of Animals, National Research Institute of Animal Production, Balice n., 32-083 Kraków, Poland
| | - Artur Gurgul
- Center for Experimental and Innovative Medicine, University of Agriculture in Kraków, Al. Mickiewicza 24-28, 30-059 Kraków, Poland
| | - Maria Oczkowicz
- Department of Molecular Biology of Animals, National Research Institute of Animal Production, Balice n., 32-083 Kraków, Poland
| | - Tomasz Szmatoła
- Department of Molecular Biology of Animals, National Research Institute of Animal Production, Balice n., 32-083 Kraków, Poland
- Center for Experimental and Innovative Medicine, University of Agriculture in Kraków, Al. Mickiewicza 24-28, 30-059 Kraków, Poland
| | - Agnieszka Fornal
- Department of Molecular Biology of Animals, National Research Institute of Animal Production, Balice n., 32-083 Kraków, Poland
| | - Monika Bugno-Poniewierska
- Department of Animal Reproduction, Faculty Anatomy and Genomics of Animal Breeding and Biology, Agricultural University in Cracow, Al. Mickiewicza 24-28, 30-059 Kraków, Poland
| |
|
32
|
Huang Y, Liu C, Xiao C, Chen X, Han X, Yi S, Huang D. Mutation analysis of 28 autosomal short tandem repeats in the Chinese Han population. Mol Biol Rep 2021; 48:5363-5369. [PMID: 34213710 DOI: 10.1007/s11033-021-06522-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/06/2021] [Accepted: 06/25/2021] [Indexed: 11/26/2022]
Abstract
Short tandem repeats (STRs) have been extensively used in forensic genetics. However, according to previous studies, the mutation rates of STRs are relatively high and are affected by many factors. Therefore, it is important to analyze STR mutations and determine the influence of underlying factors on STR mutation rates. Mutation rates of 28 autosomal STRs were determined from 8708 paternity testing cases in the Chinese Han population, and the relationships between STR mutation rates and population, sex, age, allele length and heterozygosity were investigated. A total of 279 mutations were observed at 27 loci in a total of 233,530 meiosis cases, including 273 (97.8%) one-step, 5 (1.8%) two-step and 1 (0.4%) three-step mutations. The overall average mutation rate was 1.19 × 10⁻³ (95% CI 1.06 × 10⁻³ to 1.34 × 10⁻³), ranging from 0 (TPOX) to 2.79 × 10⁻³ (D13S325). Mutation rate comparisons revealed statistically significant differences at several STRs among populations. Paternal mutations occurred more frequently than maternal mutations, at a ratio of 6.04:1, and the mutation rate tended to increase with paternal age. Moreover, our study revealed a bias towards contraction mutations for long alleles and expansion mutations for short alleles. No obvious bias was observed in the overall mutation direction. In addition, STR loci with higher expected heterozygosity (Hexp) tended to have higher mutation rates. This work revealed the relationships between STR mutation rates and several influencing factors, providing useful data and information for further research on STR mutations in forensic genetics.
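The overall mutation rate quoted in this abstract follows from the two reported counts (279 mutations in 233,530 meioses). The sketch below recomputes it and attaches a 95% interval. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and `wilson_interval` is an illustrative helper name.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return centre - half, centre + half

# Counts reported in the abstract.
mutations, meioses = 279, 233_530

rate = mutations / meioses
low, high = wilson_interval(mutations, meioses)
print(f"rate = {rate:.2e}, 95% CI = ({low:.2e}, {high:.2e})")
```

The point estimate is simply mutations divided by meioses, and under this assumed method the interval lands close to the one reported. An exact (Clopper-Pearson) interval would be a reasonable alternative for counts this small relative to the number of trials.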
Affiliation(s)
- Yujie Huang
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Cong Liu
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Chao Xiao
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xiaoying Chen
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xueli Han
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Shaohua Yi
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Daixin Huang
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
| |
|
33
|
Creary LE, Sacchi N, Mazzocco M, Morris GP, Montero-Martin G, Chong W, Brown CJ, Dinou A, Stavropoulos-Giokas C, Gorodezky C, Narayan S, Periathiruvadi S, Thomas R, De Santis D, Pepperall J, ElGhazali GE, Al Yafei Z, Askar M, Tyagi S, Kanga U, Marino SR, Planelles D, Chang CJ, Fernández-Viña MA. High-resolution HLA allele and haplotype frequencies in several unrelated populations determined by next generation sequencing: 17th International HLA and Immunogenetics Workshop joint report. Hum Immunol 2021; 82:505-522. [PMID: 34030896 DOI: 10.1016/j.humimm.2021.04.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/29/2020] [Revised: 04/23/2021] [Accepted: 04/28/2021] [Indexed: 12/12/2022]
Abstract
The primary goal of the unrelated population HLA diversity (UPHD) component of the 17th International HLA and Immunogenetics Workshop was to characterize HLA alleles at maximum allelic-resolution in worldwide populations and re-evaluate patterns of HLA diversity across populations. The UPHD project included HLA genotype and sequence data, generated by various next-generation sequencing methods, from 4,240 individuals collated from 12 different countries. Population data included well-defined large datasets from the USA and smaller samples from Europe, Australia, and Western Asia. Allele and haplotype frequencies varied across populations from distant geographical regions. HLA genetic diversity estimated at 2- and 4-field allelic resolution revealed that diversity at the majority of loci, particularly for European-descent populations, was lower at the 2-field resolution. Several common alleles with identical protein sequences differing only by intronic substitutions were found in distinct haplotypes, revealing a more detailed characterization of linkage between variants within the HLA region. The examination of coding and non-coding nucleotide variation revealed many examples in which almost complete biunivocal relations between common alleles at different loci were observed resulting in higher linkage disequilibrium. Our reference data of HLA profiles characterized at maximum resolution from many populations is useful for anthropological studies, unrelated donor searches, transplantation, and disease association studies.
Affiliation(s)
- Lisa E Creary
  - Department of Pathology, Stanford University School of Medicine, Palo Alto, CA, USA; Histocompatibility and Immunogenetics Laboratory, Stanford Blood Center, Palo Alto, CA, USA
- Nicoletta Sacchi
  - Italian Bone Marrow Donor Registry Tissue Typing Laboratory, E.O. Ospedali Galliera, Genova, Italy
- Michela Mazzocco
  - Italian Bone Marrow Donor Registry Tissue Typing Laboratory, E.O. Ospedali Galliera, Genova, Italy
- Gerald P Morris
  - Department of Pathology, University of California San Diego, La Jolla, CA, USA
- Gonzalo Montero-Martin
  - Histocompatibility and Immunogenetics Laboratory, Stanford Blood Center, Palo Alto, CA, USA
- Winnie Chong
  - Histocompatibility and Immunogenetics Service Development Laboratory, NHS Blood and Transplant, London, UK
- Colin J Brown
  - Department of Histocompatibility and Immunogenetics, NHS Blood and Transplant, London, UK; Faculty of Life Sciences and Medicine, King's College London, University of London, England, UK
- Amalia Dinou
  - Biomedical Research Foundation Academy of Athens, Hellenic Cord Blood Bank, Athens, Greece
- Clara Gorodezky
  - Laboratory of Immunology and Immunogenetics, Fundación Comparte Vida, A.C. Mexico City, Mexico
- Rasmi Thomas
  - US Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, USA
- Jennifer Pepperall
  - Welsh Transplant and Immunogenetics Laboratory, Welsh Blood Service, Pontyclun, United Kingdom
- Gehad E ElGhazali
  - Sheikh Khalifa Medical City-Union 71, Abu Dhabi and the Department of Immunology, College of Medicine and Health Sciences, UAE University, Al Ain, United Arab Emirates
- Zain Al Yafei
  - Sheikh Khalifa Medical City-Union 71, Abu Dhabi and the Department of Immunology, College of Medicine and Health Sciences, UAE University, Al Ain, United Arab Emirates
- Medhat Askar
  - Department of Pathology and Laboratory Medicine, Baylor University Medical Center, Dallas, USA
- Shweta Tyagi
  - Department of Transplant Immunology and Immunogenetics, All India Institute of Medical Sciences, New Delhi, India
- Uma Kanga
  - Department of Transplant Immunology and Immunogenetics, All India Institute of Medical Sciences, New Delhi, India
- Susana R Marino
  - Department of Pathology, The University of Chicago Medicine, Chicago, IL, USA
- Dolores Planelles
  - Histocompatibility, Centro de Transfusión de la Comunidad Valenciana, Valencia, Spain; Grupo Español de Trabajo en Histocompatibilidad e Inmunología del Trasplante (GETHIT), Spanish Society for Immunology, Madrid, Spain
- Marcelo A Fernández-Viña
  - Department of Pathology, Stanford University School of Medicine, Palo Alto, CA, USA; Histocompatibility and Immunogenetics Laboratory, Stanford Blood Center, Palo Alto, CA, USA
|
34
|
Kwong AM, Blackwell TW, LeFaive J, de Andrade M, Barnard J, Barnes KC, Blangero J, Boerwinkle E, Burchard EG, Cade BE, Chasman DI, Chen H, Conomos MP, Cupples LA, Ellinor PT, Eng C, Gao Y, Guo X, Irvin MR, Kelly TN, Kim W, Kooperberg C, Lubitz SA, Mak ACY, Manichaikul AW, Mathias RA, Montasser ME, Montgomery CG, Musani S, Palmer ND, Peloso GM, Qiao D, Reiner AP, Roden DM, Shoemaker MB, Smith JA, Smith NL, Su JL, Tiwari HK, Weeks DE, Weiss ST, Scott LJ, Smith AV, Abecasis GR, Boehnke M, Kang HM. Robust, flexible, and scalable tests for Hardy-Weinberg equilibrium across diverse ancestries. Genetics 2021; 218:iyab044. [PMID: 33720349 PMCID: PMC8128395 DOI: 10.1093/genetics/iyab044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Accepted: 02/03/2021] [Indexed: 11/13/2022] Open
Abstract
Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls can often be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH), which tests for HWE while accounting for population structure and genotype uncertainty, and we evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
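Illustrative note: the following Python sketch shows the standard χ² HWE test mentioned in the abstract, not RUTH itself. The function name and all genotype counts are hypothetical. It illustrates why pooled samples from diverse ancestries can fail the test without any genotyping error: two made-up subpopulations that each match HWE expectations show a large deficit of heterozygotes once they are combined.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square HWE test (1 degree of freedom) on biallelic genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical subpopulations, each exactly at HWE (allele A at 0.9 and at 0.1).
pop1 = (810, 180, 10)
pop2 = (10, 180, 810)
pooled = tuple(a + b for a, b in zip(pop1, pop2))

chi2_pop1, p_pop1 = hwe_chi_square(*pop1)
chi2_pooled, p_pooled = hwe_chi_square(*pooled)
print(chi2_pop1, p_pop1)
print(chi2_pooled, p_pooled)
```

The sketch ignores genotype uncertainty entirely and works on hard genotype counts only.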
Affiliation(s)
- Alan M Kwong
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Thomas W Blackwell
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Jonathon LeFaive
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- John Barnard
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44106, USA
- Kathleen C Barnes
  - Department of Medicine, Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- John Blangero
  - Department of Human Genetics, South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Eric Boerwinkle
  - Department of Epidemiology, Human Genetics Center, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Esteban G Burchard
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143, USA
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
- Brian E Cade
  - Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115, USA
- Daniel I Chasman
  - Division of Preventive Medicine, Brigham and Women’s Hospital, Boston, MA 02215, USA
- Han Chen
  - Department of Epidemiology, Human Genetics Center, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Matthew P Conomos
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
  - Framingham Heart Study, Framingham, MA 01702, USA
- Patrick T Ellinor
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA 02124, USA
- Celeste Eng
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
- Yan Gao
  - Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Xiuqing Guo
  - Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Marguerite Ryan Irvin
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Tanika N Kelly
  - Department of Epidemiology, Tulane University, New Orleans, LA 70112, USA
- Wonji Kim
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Steven A Lubitz
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA 02124, USA
- Angel C Y Mak
  - Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
- Ani W Manichaikul
  - Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Rasika A Mathias
  - GeneSTAR Research Program and Division of Allergy and Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
- May E Montasser
  - Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Courtney G Montgomery
  - Sarcoidosis Research Unit, Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA
- Solomon Musani
  - Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Nicholette D Palmer
  - Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Gina M Peloso
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Dandi Qiao
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Dan M Roden
  - Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- M Benjamin Shoemaker
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Nicholas L Smith
  - Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
  - Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA 98101, USA
  - Department of Veterans Affairs, Seattle Epidemiologic Research and Information Center, Office of Research and Development, Seattle, WA 98108, USA
- Jessica Lasky Su
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Hemant K Tiwari
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Daniel E Weeks
  - Departments of Human Genetics and Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Scott T Weiss
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Laura J Scott
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Albert V Smith
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Gonçalo R Abecasis
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Michael Boehnke
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Hyun Min Kang
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
35
|
Korunes KL, Samuk K. pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Mol Ecol Resour 2021; 21:1359-1368. [PMID: 33453139 PMCID: PMC8044049 DOI: 10.1111/1755-0998.13326] [Citation(s) in RCA: 122] [Impact Index Per Article: 40.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2020] [Revised: 11/12/2020] [Accepted: 01/05/2021] [Indexed: 11/28/2022]
Abstract
Population genetic analyses often use summary statistics to describe patterns of genetic variation and provide insight into evolutionary processes. Among the most fundamental of these summary statistics are π and dXY, which are used to describe genetic diversity within and between populations, respectively. Here, we address a widespread issue in π and dXY calculation: systematic bias generated by missing data of various types. Many popular methods for calculating π and dXY operate on data encoded in the variant call format (VCF), which condenses genetic data by omitting invariant sites. When calculating π and dXY using a VCF, it is often implicitly assumed that missing genotypes (including those at sites not represented in the VCF) are homozygous for the reference allele. Here, we show how this assumption can result in substantial downward bias in estimates of π and dXY that is directly proportional to the amount of missing data. We discuss the pervasive nature and importance of this problem in population genetics, and introduce a user-friendly UNIX command line utility, pixy, that solves this problem via an algorithm that generates unbiased estimates of π and dXY in the face of missing data. We compare pixy to existing methods using both simulated and empirical data, and show that pixy alone produces unbiased estimates of π and dXY regardless of the form or amount of missing data. In summary, our software solves a long-standing problem in applied population genetics and highlights the importance of properly accounting for missing data in population genetic analyses.
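Illustrative note: the missing-data problem described in the abstract can be sketched with a toy calculation. This is not pixy's code or command-line interface; the function names, the 0/1/None allele coding, and the four example sites are assumptions made for illustration. One estimator counts pairwise differences and pairwise comparisons only among called alleles, the other first recodes every missing allele as the reference allele.

```python
from math import comb


def site_counts(alleles):
    """Return (pairwise differences, pairwise comparisons) among called alleles.

    Alleles are coded 0 (reference), 1 (alternate) or None (missing).
    """
    called = [a for a in alleles if a is not None]
    n = len(called)
    alt = sum(called)
    return alt * (n - alt), comb(n, 2)


def pi_missing_aware(sites):
    """Sum differences and comparisons over sites, skipping missing alleles."""
    diffs = comps = 0
    for alleles in sites:
        d, c = site_counts(alleles)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")


def pi_missing_as_reference(sites):
    """Same estimator after recoding every missing allele as reference (0)."""
    filled = [[0 if a is None else a for a in alleles] for alleles in sites]
    return pi_missing_aware(filled)


# Hypothetical data: four sites, four sampled allele copies each.
sites = [
    [0, 1, 0, 1],              # fully called variant site
    [None, None, None, None],  # site with no calls at all
    [0, 0, 0, 0],              # fully called invariant site
    [1, None, None, 0],        # partially called variant site
]

aware = pi_missing_aware(sites)
naive = pi_missing_as_reference(sites)
print(aware, naive)
```

In this toy input the recoded estimate is the lower of the two, because uncalled alleles are counted as comparisons that show no difference.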
Affiliation(s)
- Kieran Samuk
  - Department of Biology, Duke University, Durham, NC, USA
|
36
|
Marca-Ysabel MV, Rajabli F, Cornejo-Olivas M, Whitehead PG, Hofmann NK, Illanes Manrique MZ, Veliz Otani DM, Milla Neyra AK, Castro Suarez S, Meza Vega M, Adams LD, Mena PR, Rosario I, Cuccaro ML, Vance JM, Beecham GW, Custodio N, Montesinos R, Mazzetti Soler PE, Pericak-Vance MA. Dissecting the role of Amerindian genetic ancestry and the ApoE ε4 allele on Alzheimer disease in an admixed Peruvian population. Neurobiol Aging 2021; 101:298.e11-298.e15. [PMID: 33541779 PMCID: PMC8122013 DOI: 10.1016/j.neurobiolaging.2020.10.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2020] [Revised: 09/29/2020] [Accepted: 10/01/2020] [Indexed: 01/21/2023]
Abstract
Alzheimer disease (AD) is the leading cause of dementia in the elderly and occurs in all ethnic and racial groups. The apolipoprotein E (ApoE) ε4 allele is the most significant genetic risk factor for late-onset AD; it shows the strongest effect among East Asian populations, followed by non-Hispanic white populations, and has a relatively lower effect in African descent populations. Admixture analysis in the African American and Puerto Rican populations showed that the variation in ε4 risk is correlated with the genetic ancestral background local to the ApoE gene. Native American populations are substantially underrepresented in AD genetic studies. The Peruvian population, with up to ~80% Amerindian (AI) ancestry, provides a unique opportunity to assess the role of AI ancestry in AD. In this study, we assess the effect of the ApoE ε4 allele on AD in the Peruvian population. A total of 79 AD cases and 128 unrelated cognitively healthy controls from the Peruvian population were included in the study. Genome-wide genotyping was performed using the Illumina Global Screening Array v2.0. Global ancestry and local ancestry were assessed. The effect of the ApoE ε4 allele on AD was tested using a logistic regression model adjusting for age, gender, and population substructure (first 3 principal components). Results showed that the genetic ancestry surrounding the ApoE gene is predominantly AI (60.6%) and the ε4 allele is significantly associated with increased risk of AD in the Peruvian population (odds ratio = 5.02, confidence interval: 2.3-12.5, p-value = 2e-4). Our results showed that the risk for AD from ApoE ε4 in Peruvians is higher than we have observed in non-Hispanic white populations. Given the high admixture of AI ancestry in the Peruvian population, this suggests that the AI genetic ancestry local to the ApoE gene contributes to a strong risk for AD in ε4 carriers. Our data also support the findings of an interaction between the genetic risk allele ApoE ε4 and the ancestral backgrounds located around the genomic region of the ApoE gene.
Affiliation(s)
- Farid Rajabli
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Mario Cornejo-Olivas
  - Neurogenetics Research Center, Instituto Nacional de Ciencias Neurológicas, Lima, Peru; Center for Global Health, Universidad Peruana Cayetano Heredia, Lima, Peru
- Patrice G Whitehead
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Natalia K Hofmann
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Diego Martin Veliz Otani
  - Neurogenetics Research Center, Instituto Nacional de Ciencias Neurológicas, Lima, Peru; Fogarty Northern Pacific Global Health Fellows Program, Lima, Peru; Fogarty Interdisciplinary Cerebrovascular Diseases Training Program in South America, Lima, Peru
- Sheila Castro Suarez
  - CBI en Demencias y Enfermedades Desmielinizantes del Sistema Nervioso, Instituto Nacional de Ciencias Neurológicas, Lima, Peru; Atlantic Fellow of Global Brain Health Institute, San Francisco, CA, USA
- Maria Meza Vega
  - CBI en Demencias y Enfermedades Desmielinizantes del Sistema Nervioso, Instituto Nacional de Ciencias Neurológicas, Lima, Peru; School of Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Larry D Adams
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Pedro R Mena
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Isasi Rosario
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA; Dr. John Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Michael L Cuccaro
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA; Dr. John Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Jeffery M Vance
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA; Dr. John Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Gary W Beecham
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA; Dr. John Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Pilar E Mazzetti Soler
  - Neurogenetics Research Center, Instituto Nacional de Ciencias Neurológicas, Lima, Peru; School of Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Margaret A Pericak-Vance
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA; Dr. John Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
|
37
|
Hellenthal G, Bird N, Morris S. Structure and ancestry patterns of Ethiopians in genome-wide autosomal DNA. Hum Mol Genet 2021; 30:R42-R48. [PMID: 33547782 PMCID: PMC8242491 DOI: 10.1093/hmg/ddab019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2020] [Revised: 12/28/2020] [Accepted: 01/06/2021] [Indexed: 11/14/2022] Open
Abstract
We review some of the current insights derived from the analyses of new large-scale, genome-wide autosomal variation data studies incorporating Ethiopians. Consistent with their substantial degree of cultural and linguistic diversity, genetic diversity among Ethiopians is higher than that seen across much larger geographic regions worldwide. This genetic variation is associated in part with ethnic identity, geography and linguistic classification. Numerous and varied admixture events have been inferred in Ethiopian groups, for example, involving sources related to present-day groups in West Eurasia and North Africa, with inferred dates spanning a few hundred to more than 4500 years ago. These disparate inferred ancestry patterns are correlated in part with groups' broad linguistic classifications, though with some notable exceptions. While deciphering these complex genetic signals remains challenging with available data, these studies and other projects focused on resolving competing hypotheses on the origins of specific ethnolinguistic groups demonstrate how genetic analyses can complement findings from anthropological and linguistic studies on Ethiopians.
Affiliation(s)
- Garrett Hellenthal
  - Department of Genetics, Evolution and Environment, University College London Genetics Institute (UGI), University College London, London, WC1E 6BT, UK
- Nancy Bird
  - Department of Genetics, Evolution and Environment, University College London Genetics Institute (UGI), University College London, London, WC1E 6BT, UK
- Sam Morris
  - Department of Genetics, Evolution and Environment, University College London Genetics Institute (UGI), University College London, London, WC1E 6BT, UK
|
38
|
Abstract
Africa is the continent with the greatest genetic diversity among humans, and the level of diversity is further enhanced by incorporating non-majority groups, which are often understudied. Many of today's minority populations historically practiced foraging lifestyles, which were the only subsistence strategies prior to the rise of agriculture and pastoralism, but only a few groups practicing these strategies remain today. Genomic investigations of Holocene human remains excavated across the African continent show that the genetic landscape was vastly different from today's and that many groups that are population isolates today inhabited larger regions in the past. It is becoming clear that there have been periods of isolation among groups and geographic areas, but also genetic contact over large distances throughout human history in Africa. Genomic information from minority populations and from prehistoric remains provides an invaluable source of information on the human past, in particular deep human population history, as Holocene large-scale population movements obscure past patterns of population structure. Here we revisit questions on the nature and time of the radiation of early humans in Africa, the extent of gene flow among human populations, and introgression from archaic and extinct lineages on the continent.
Affiliation(s)
- Nina Hollfelder
  - Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Gwenna Breton
  - Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Per Sjödin
  - Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Mattias Jakobsson
  - Human Evolution, Department of Organismal Biology, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
  - Palaeo-Research Institute, University of Johannesburg, Physical, Cnr Kingsway & University Roads, Auckland Park, Johannesburg 2092, South Africa
  - SciLifeLab, Stockholm and Uppsala, Entrance C11, BMC, Husargatan 3, 752 37 Uppsala, Sweden
|
39
|
|
40
|
Si Y, Vanderwerff B, Zöllner S. Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms. Genetics 2021; 217:iyab011. [PMID: 33686438 PMCID: PMC8049559 DOI: 10.1093/genetics/iyab011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2020] [Accepted: 12/15/2020] [Indexed: 01/13/2023] Open
Abstract
Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency < 1% without sequencing. Although it is critical to consider limits of this approach, imputation methods for rare variants have only done so empirically; the theoretical basis of their imputation accuracy has not been explored. To provide theoretical consideration of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants, leveraging the joint genealogy of the sample to be imputed and reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit the ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles mostly depends on the allele count in the reference sample. We summarize the impact of this error on association tests by calculating the r² between imputed and true genotype and show that even when modeling other sources of error, the model misspecification has a significant impact on the r² of rare variants. To evaluate these predictions in practice, we compare the imputation of the same dataset across imputation panels of different sizes. Although this empirical imputation accuracy is substantially lower than our theoretical prediction, model misspecification seems to further decrease imputation accuracy for variants with low allele counts in the reference. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.
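Illustrative note: the r² between imputed and true genotypes mentioned in the abstract is a squared Pearson correlation. The Python sketch below is a generic calculation, not the coalescent model of the paper; the function name and both genotype vectors are hypothetical. It shows that a single wrongly imputed allele costs more r² when the allele is rare in the sample than when it is common.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector is constant
    return sxy * sxy / (sxx * syy)


# Hypothetical haploid allele indicators for ten samples (1 = minor allele).
true_rare = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
imputed_rare = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]    # one carrier missed

true_common = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
imputed_common = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # one carrier missed

r2_rare = squared_correlation(true_rare, imputed_rare)
r2_common = squared_correlation(true_common, imputed_common)
print(r2_rare, r2_common)
```

Real imputation output is usually a dosage between 0 and 2 per diploid sample; the same function applies to such values unchanged.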
Affiliation(s)
- Yichen Si
  - Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Brett Vanderwerff
  - Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Sebastian Zöllner
  - Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
  - Department of Psychiatry, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
|
41
|
Yun SA, Kim SC. Genetic diversity and structure of Saussurea polylepis (Asteraceae) on continental islands of Korea: Implications for conservation strategies and management. PLoS One 2021; 16:e0249752. [PMID: 33831066 PMCID: PMC8031399 DOI: 10.1371/journal.pone.0249752] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2020] [Accepted: 03/24/2021] [Indexed: 11/24/2022] Open
Abstract
Saussurea polylepis Nakai is an herbaceous perennial endemic to Korea and is highly restricted to several continental islands in the southwestern part of the Korean Peninsula. Given its very narrow geographical distribution, it is more vulnerable to anthropogenic activities and global climate changes than more widely distributed species. Despite the need for comprehensive genetic information for conservation and management, no population genetic studies of S. polylepis have been conducted. In this study, genetic diversity and population structure were evaluated for 97 individuals from 5 populations (Gwanmaedo, Gageodo, Hongdo, Heuksando, and Uido) using 19 polymorphic microsatellites. The populations were separated by a distance of 20–90 km. We found moderate levels of genetic diversity in S. polylepis (Ho = 0.42, He = 0.43). This may be due to long lifespans, outcrossing, and gene flow, despite its narrow range. High levels of gene flow (Nm = 1.76, mean Fst = 0.09), especially from wind-dispersed seeds, would contribute to low levels of genetic differentiation among populations. However, the small population size and reduced number of individuals in the reproductive phase of S. polylepis can be a major threat leading to inbreeding depression and genetic diversity loss. Bayesian cluster analysis revealed three significant clusters at K = 3, consistent with DAPC and UPGMA. It is thought that sea level rise after the last glacial maximum may have acted as a geographical barrier, limiting gene flow and leading to distinct population structures. We proposed the population on Heuksando, the largest island inhabited by S. polylepis, as a source population because of its large population size and high genetic diversity. Four management units (Gwanmaedo, Gageodo, Hongdo-Heuksando, and Uido) were suggested for conservation considering population size, genetic diversity, population structure, unique alleles, and geographical location (e.g., proximity).
Affiliation(s)
- Seon A. Yun
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, Gyeonggi-do, Korea
- Seung-Chul Kim
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, Gyeonggi-do, Korea
|
42
|
Martin AR, Atkinson EG, Chapman SB, Stevenson A, Stroud RE, Abebe T, Akena D, Alemayehu M, Ashaba FK, Atwoli L, Bowers T, Chibnik LB, Daly MJ, DeSmet T, Dodge S, Fekadu A, Ferriera S, Gelaye B, Gichuru S, Injera WE, James R, Kariuki SM, Kigen G, Koenen KC, Kwobah E, Kyebuzibwa J, Majara L, Musinguzi H, Mwema RM, Neale BM, Newman CP, Newton CRJC, Pickrell JK, Ramesar R, Shiferaw W, Stein DJ, Teferra S, van der Merwe C, Zingela Z. Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations. Am J Hum Genet 2021; 108:656-668. [PMID: 33770507 PMCID: PMC8059370 DOI: 10.1016/j.ajhg.2021.03.012] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2020] [Accepted: 03/05/2021] [Indexed: 12/21/2022] Open
Abstract
Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
Affiliation(s)
- Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
| | - Elizabeth G Atkinson
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
| | - Sinéad B Chapman
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
| | - Anne Stevenson
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Rocky E Stroud
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Tamrat Abebe
- Department of Microbiology, Immunology, and Parasitology, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
| | - Dickens Akena
- Department of Psychiatry, School of Medicine, College of Health Sciences, Makerere University, Kampala, Uganda
| | - Melkam Alemayehu
- Department of Psychiatry, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
| | - Fred K Ashaba
- Department of Immunology & Molecular Biology, College of Health Sciences, Makerere University, Kampala, Uganda
| | - Lukoye Atwoli
- Department of Mental Health, School of Medicine, Moi University College of Health Sciences, Eldoret, Kenya
| | - Tera Bowers
- Broad Genomics, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
| | - Lori B Chibnik
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Institute for Molecular Medicine Finland, Helsinki 00014, Finland
| | - Timothy DeSmet
- Broad Genomics, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
| | - Sheila Dodge
- Broad Genomics, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
| | - Abebaw Fekadu
- Department of Psychiatry, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia; Centre for Innovative Drug Development & Therapeutic Trials for Africa, Addis Ababa University, Addis Ababa, Ethiopia
| | - Steven Ferriera
- Broad Genomics, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
| | - Bizu Gelaye
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Stella Gichuru
- Department of Mental Health, Moi Teaching and Referral Hospital, Eldoret, Kenya
| | - Wilfred E Injera
- Department of Immunology, School of Medicine, Moi University College of Health Sciences, Eldoret, Kenya
| | - Roxanne James
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
| | - Symon M Kariuki
- Neurosciences Unit, Clinical Department, KEMRI-Wellcome Trust Research Programme-Coast, Kilifi, Kenya; Department of Psychiatry, University of Oxford, Oxford OX3 7JX, UK
| | - Gabriel Kigen
- Department of Pharmacology and Toxicology, School of Medicine, Moi University College of Health Sciences, Eldoret, Kenya
| | - Karestan C Koenen
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Edith Kwobah
- Department of Mental Health, Moi Teaching and Referral Hospital, Eldoret, Kenya
| | - Joseph Kyebuzibwa
- Department of Psychiatry, School of Medicine, College of Health Sciences, Makerere University, Kampala, Uganda
| | - Lerato Majara
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa; SA MRC Human Genetics Research Unit, Division of Human Genetics, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Observatory 7925, South Africa
| | - Henry Musinguzi
- Department of Immunology & Molecular Biology, College of Health Sciences, Makerere University, Kampala, Uganda
| | - Rehema M Mwema
- Neurosciences Unit, Clinical Department, KEMRI-Wellcome Trust Research Programme-Coast, Kilifi, Kenya
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
| | - Carter P Newman
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Charles R J C Newton
- Neurosciences Unit, Clinical Department, KEMRI-Wellcome Trust Research Programme-Coast, Kilifi, Kenya; Department of Psychiatry, University of Oxford, Oxford OX3 7JX, UK
| | | | - Raj Ramesar
- SA MRC Genomic and Precision Medicine Research Unit, Division of Human Genetics, Department of Pathology, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
| | - Welelta Shiferaw
- Department of Microbiology, Immunology, and Parasitology, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
| | - Dan J Stein
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa; SA MRC Unit on Risk & Resilience in Mental Disorders, University of Cape Town and Neuroscience Institute, Cape Town, South Africa
| | - Solomon Teferra
- Department of Psychiatry, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
| | - Celia van der Merwe
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
| | - Zukiswa Zingela
- Department of Psychiatry and Human Behavioral Sciences, Walter Sisulu University, Mthatha, South Africa
| |
|
43
|
Araghi S, Nguyen T. A Hybrid Supervised Approach to Human Population Identification Using Genomics Data. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:443-454. [PMID: 31150342 DOI: 10.1109/tcbb.2019.2919501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are one type of genetic variation, and each SNP represents a difference in a single DNA building block, namely a nucleotide. Previous research demonstrated that SNPs can be used to identify the correct source population of an individual. In addition, variations in DNA sequences influence human diseases. In this regard, SNP studies are helpful for personalized medicine and treatment. In the literature, unsupervised clustering methods, especially principal component analysis (PCA), have been popular for studying population structure. In this study, we investigate supervised approaches, particularly the LASSO multinomial regression classification method, for identifying an individual's genetic population of origin. We then introduce PCA-LASSO, an extension of the LASSO method that benefits from the advantageous characteristics of both PCA and LASSO regression. Experimental results on the 1000 Genomes Project dataset show that PCA-LASSO predicts an individual's population of origin with high accuracy.
|
44
|
Valli AT, Koumandou VL, Iatrou G, Andreou M, Papasotiropoulos V, Trigas P. Conservation biology of threatened Mediterranean chasmophytes: The case of Asperula naufraga endemic to Zakynthos island (Ionian islands, Greece). PLoS One 2021; 16:e0246706. [PMID: 33606745 PMCID: PMC7894959 DOI: 10.1371/journal.pone.0246706] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/06/2020] [Accepted: 01/22/2021] [Indexed: 11/19/2022] Open
Abstract
Asperula naufraga is a rare and threatened obligate chasmophyte, endemic to Zakynthos island (Ionian islands, Greece). In this study, we provide a combined approach (including monitoring of demographic and reproductive parameters and study of genetic diversity) to assess the current conservation status of the species and to estimate its future extinction risk. The five subpopulations of A. naufraga were monitored for five years (2014-2018). Population size fluctuated markedly between 68 and 130 mature individuals during the monitoring period. The extent of occurrence (EOO) was estimated at 28.7 km² and the area of occupancy (AOO) was 8 km². Stage-structure recordings were similar for all subpopulations, characterized by high proportions of adult and senescent individuals, following a common pattern that has been observed in other cliff-dwelling plants. Preliminary genetic analysis with SSR markers revealed low heterozygosity within subpopulations and significant departure from Hardy-Weinberg equilibrium, which, combined with small population size, suggest an increased threat of genetic diversity loss. Our results indicate that the species should be placed in the Critically Endangered (CR) IUCN threat category, while according to Population Viability Analysis results its extinction risk increases to 47.8% in the next 50 years. The small population size, combined with large fluctuations in its size, low recruitment and low genetic diversity, indicates the need for effective in situ and ex situ conservation measures.
Affiliation(s)
- Anna-Thalassini Valli
- Laboratory of Systematic Botany, Department of Crop Science, School of Plant Sciences, Agricultural University of Athens, Athens, Greece
| | - Vassiliki Lila Koumandou
- Genetics Laboratory, Department of Biotechnology, School of Applied Biology & Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Gregoris Iatrou
- Division of Plant Biology, Laboratory of Botany, Department of Biology, University of Patras, Patras, Greece
| | - Marios Andreou
- Nature Conservation Unit, Frederick University, Nicosia, Cyprus
| | | | - Panayiotis Trigas
- Laboratory of Systematic Botany, Department of Crop Science, School of Plant Sciences, Agricultural University of Athens, Athens, Greece
| |
|
45
|
Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, Siewert KM, Kim SS, Luo Y, Amariuta T, Huang H, Okada Y, Raychaudhuri S, Sunyaev SR, Price AL. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat Commun 2021; 12:1098. [PMID: 33597505 PMCID: PMC7889654 DOI: 10.1038/s41467-021-21286-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 11/15/2019] [Accepted: 01/15/2021] [Indexed: 01/31/2023] Open
Abstract
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Affiliation(s)
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Evan M Koch
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Katherine M Siewert
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yang Luo
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tiffany Amariuta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
| | - Soumya Raychaudhuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
46
|
De AK, Sawhney S, Bhattacharya D, Sujatha T, Sunder J, Ponraj P, Ravi SK, Mondal S, Malakar D, Kundu A. Origin, genetic diversity and evolution of Andaman local duck, a native duck germplasm of an insular region of India. PLoS One 2021; 16:e0245138. [PMID: 33561119 PMCID: PMC7872295 DOI: 10.1371/journal.pone.0245138] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/27/2020] [Accepted: 12/23/2020] [Indexed: 11/29/2022] Open
Abstract
Domestic ducks are of paramount importance as a cheap source of protein in rural India. Andaman local duck (ALD) is an indigenous avian genetic resource of the Andaman and Nicobar islands (ANI) and is mainly distributed in the Middle and Northern parts of these islands. Negligence has brought this breed to the edge of extinction, necessitating immediate conservation efforts. Here, we report the genetic diversity, population structure and matrilineal genetic root of ALD. Partial mtDNA D-loop sequences were analyzed in 71 ALD samples, and the analysis revealed 19 polymorphic sites and 13 haplotypes. Estimated haplotype diversity (Hd ± SD) and nucleotide diversity (π ± SD) were 0.881 ± 0.017 and 0.00897 ± 0.00078, respectively. The high genetic diversity of ALD indicates introgression of genetic material from other local duck breeds. In addition, it can be postulated that ALD, bearing high genetic diversity, has a strong ability to adapt to environmental changes and can withstand impending climate change. Phylogenetic and network analyses indicate that ALD falls within the Eurasian clade of mallard and forms three clusters: one cluster is phylogenetically close to Southeast Asian countries, one is close to the Southern part of mainland India, and the third forms an independent cluster. Therefore, ALD might have migrated either from Southeast Asian countries, which have enjoyed a close cultural bond with ANI from time immemorial, or from the Southern part of India. The independent cluster may have evolved locally in these islands, and natural selection pressure imposed by environmental conditions might be the driving force for the evolution of these duck haplotypes, which mimics Darwin’s theory of natural selection. The results of the study will be beneficial for formulating future breeding programmes and conservation strategies towards sustainable development of the duck breed.
Affiliation(s)
- Arun Kumar De
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - Sneha Sawhney
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - Debasis Bhattacharya
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - T. Sujatha
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - Jai Sunder
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - Perumal Ponraj
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - S. K. Ravi
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| | - Samiran Mondal
- Department of Veterinary Pathology, West Bengal University of Animal and Fishery Sciences, Kolkata, West Bengal, India
| | - Dhruba Malakar
- Animal Biotechnology Centre, National Dairy Research Institute, Karnal, Haryana, India
| | - A. Kundu
- Animal Science Division, ICAR-Central Island Agricultural Research Institute, Port Blair, Andaman and Nicobar Islands, India
| |
|
47
|
Sun J, Li YX, Ma PC, Yan S, Cheng HZ, Fan ZQ, Deng XH, Ru K, Wang CC, Chen G, Wei LH. Shared paternal ancestry of Han, Tai-Kadai-speaking, and Austronesian-speaking populations as revealed by the high resolution phylogeny of O1a-M119 and distribution of its sub-lineages within China. Am J Phys Anthropol 2021; 174:686-700. [PMID: 33555039 DOI: 10.1002/ajpa.24240] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/27/2020] [Revised: 01/06/2021] [Accepted: 01/12/2021] [Indexed: 12/27/2022]
Abstract
OBJECTIVES: The aim of this research was to explore the origin, diversification, and demographic history of O1a-M119 over the past 10,000 years, as well as its role during the formation of East Asian and Southeast Asian populations, particularly the Han, Tai-Kadai-speaking, and Austronesian-speaking populations. MATERIALS AND METHODS: Y-chromosome sequences (n = 141) of the O1a-M119 lineage, including 17 newly generated in this study, were used to reconstruct a revised phylogenetic tree with age estimates and to identify sub-lineages. The geographic distribution of 12 O1a-M119 sub-lineages was summarized, based on 7325 O1a-M119 individuals identified among 60,009 Chinese males. RESULTS: A revised phylogenetic tree, age estimation, and distribution maps indicated continuous expansion of haplogroup O1a-M119 over the past 10,000 years, and differences in demographic history across geographic regions. We propose several sub-lineages of O1a-M119 as founding paternal lineages of Han, Tai-Kadai-speaking, and Austronesian-speaking populations. The sharing of several young O1a-M119 sub-lineages with expansion times less than 6000 years between these three population groups supports a partial common ancestry for them in the Neolithic Age; however, the paternal genetic divergence pattern is much more complex than previous hypotheses based on ethnology, archeology, and linguistics. DISCUSSION: Our analyses contribute to a better understanding of the demographic history of O1a-M119 sub-lineages over the past 10,000 years during the emergence of Han, Austronesian-speaking, and Tai-Kadai-speaking populations. The data described in this study will assist in understanding the history of Han, Tai-Kadai-speaking, and Austronesian-speaking populations from ethnological, archeological, and linguistic perspectives in the future.
Affiliation(s)
- Jin Sun
- Xingyi Normal University for Nationalities, Xingyi, China
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
| | - Ying-Xiang Li
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
| | - Peng-Cheng Ma
- School of Life Sciences, Jilin University, Changchun, China
| | - Shi Yan
- School of Ethnology and Sociology, Minzu University of China, Beijing, China
| | - Hui-Zhen Cheng
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
| | - Zhi-Quan Fan
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
| | - Xiao-Hua Deng
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
- Center for collation and studies of Fujian local literature, Fujian University of Technology, Fuzhou, China
| | - Kai Ru
- Enlighten Co., Ltd., Shanghai, China
| | - Chuan-Chao Wang
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
| | - Gang Chen
- Hunan Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Lan-Hai Wei
- Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology, Xiamen University, Xiamen, China
- B&R International Joint Laboratory for Eurasian Anthropology, Fudan University, Shanghai, China
| |
|
48
|
Garud NR, Messer PW, Petrov DA. Detection of hard and soft selective sweeps from Drosophila melanogaster population genomic data. PLoS Genet 2021; 17:e1009373. [PMID: 33635910 PMCID: PMC7946363 DOI: 10.1371/journal.pgen.1009373] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/01/2020] [Revised: 03/10/2021] [Accepted: 01/17/2021] [Indexed: 12/12/2022] Open
Abstract
Whether hard sweeps or soft sweeps dominate adaptation has been a matter of much debate. Recently, we developed haplotype homozygosity statistics that (i) can detect both hard and soft sweeps with similar power and (ii) can classify the detected sweeps as hard or soft. The application of our method to population genomic data from a natural population of Drosophila melanogaster (DGRP) allowed us to rediscover three known cases of adaptation at the loci Ace, Cyp6g1, and CHKov1 known to be driven by soft sweeps, and detected additional candidate loci for recent and strong sweeps. Surprisingly, all of the top 50 candidates showed patterns much more consistent with soft rather than hard sweeps. Recently, Harris et al. (2018) criticized this work, suggesting that all the candidate loci detected by our haplotype statistics, including the positive controls, are unlikely to be sweeps at all and that instead these haplotype patterns can be more easily explained by complex neutral demographic models. They also claim that these neutral non-sweeps are likely to be hard instead of soft sweeps. Here, we reanalyze the DGRP data using a range of complex admixture demographic models and reconfirm our original published results, which suggest that the majority of recent and strong sweeps in D. melanogaster are, first, likely to be true sweeps and, second, do appear to be soft. Furthermore, we discuss ways to take this work forward given that most demographic models employed in such analyses are necessarily too simple to capture the full demographic complexity, while more realistic models are unlikely to be inferred correctly because they require a large number of free parameters.
Affiliation(s)
- Nandita R. Garud
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
| | - Philipp W. Messer
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
| | - Dmitri A. Petrov
- Department of Biology, Stanford University, Stanford, California, United States of America
| |
|
49
|
Nsibo DL, Barnes I, Omondi DO, Dida MM, Berger DK. Population genetic structure and migration patterns of the maize pathogenic fungus, Cercospora zeina in East and Southern Africa. Fungal Genet Biol 2021; 149:103527. [PMID: 33524555 DOI: 10.1016/j.fgb.2021.103527] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/24/2020] [Revised: 12/13/2020] [Accepted: 12/20/2020] [Indexed: 11/16/2022]
Abstract
Cercospora zeina is a causal pathogen of gray leaf spot (GLS) disease of maize in Africa. This fungal pathogen exhibits a high genetic diversity in South Africa. However, little is known about the pathogen's population structure in the rest of Africa. In this study, we aimed to assess the diversity and gene flow of the pathogen between major maize producing countries in East and Southern Africa (Kenya, Uganda, Zambia, Zimbabwe, and South Africa). A total of 964 single-spore isolates were made from GLS lesions and confirmed as C. zeina using PCR diagnostics. The other causal agent of GLS, Cercospora zeae-maydis, was absent. Genotyping all the C. zeina isolates with 11 microsatellite markers and a mating-type gene diagnostic revealed (i) high genetic diversity with some population structure between the five African countries, (ii) cryptic sexual recombination, (iii) that South Africa and Kenya were the greatest donors of migrants, and (iv) that Zambia had a distinct population. We noted evidence of human-mediated long-distance dispersal, since four haplotypes from one South African site were also present at five sites in Kenya and Uganda. There was no evidence for a single-entry point of the pathogen into Africa. South Africa was the most probable origin of the populations in Kenya, Uganda, and Zimbabwe. Continuous annual maize production in the tropics (Kenya and Uganda) did not result in greater genetic diversity than a single maize season (Southern Africa). Our results will underpin future management of GLS in Africa through effective monitoring of virulent C. zeina strains.
Affiliation(s)
- David L Nsibo
- Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, South Africa
- Irene Barnes
- Department of Biochemistry, Genetics and Microbiology, FABI, University of Pretoria, South Africa
- Dave K Berger
- Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, South Africa.
|
50
|
Ghazi MG, Sharma SP, Tuboi C, Angom S, Gurumayum T, Nigam P, Hussain SA. Population genetics and evolutionary history of the endangered Eld's deer (Rucervus eldii) with implications for planning species recovery. Sci Rep 2021; 11:2564. [PMID: 33510319 PMCID: PMC7844053 DOI: 10.1038/s41598-021-82183-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/17/2020] [Accepted: 01/18/2021] [Indexed: 01/30/2023] Open
Abstract
Eld's deer (Rucervus eldii), with three recognised subspecies (R. e. eldii, R. e. thamin, and R. e. siamensis), represents one of the most threatened cervids found in Southeast Asia. The species has experienced considerable range contractions and local extinctions owing to habitat loss and fragmentation, hunting, and illegal trade across its distribution range over the last century. Understanding the patterns of genetic variation is crucial for planning effective conservation strategies. This study investigated the phylogeography, divergence events and systematics of Eld's deer subspecies using the largest mtDNA dataset compiled to date. We also analysed the genetic structure and demographic history of R. e. eldii using 19 microsatellite markers. Our results showed that R. e. siamensis exhibits two divergent mtDNA lineages (mainland and Hainan Island), which diverged around 0.2 Mya (95% HPD 0.1-0.2), possibly driven by the fluctuating sea levels of the Early Holocene period. The divergence between R. e. eldii and R. e. siamensis occurred around 0.4 Mya (95% HPD 0.3-0.5), potentially associated with adaptation to a warm and humid climate with the open grassland vegetation that predominated in the region. Furthermore, R. e. eldii exhibits low levels of genetic diversity and a small contemporary effective population size (median = 7; 95% CI 4.7-10.8) with widespread historical genetic bottlenecks, which accentuate its vulnerability to inbreeding and extinction. Based on the observed significant evolutionary and systematic distance between Eld's deer and other species of the genus Rucervus, we propose to classify Eld's deer (Cervus eldii) in the genus Cervus, which is congruent with previous phylogenetic studies. This study provides important conservation implications required to direct the ongoing population recovery programs and to plan future conservation strategies.
Affiliation(s)
- Surya Prasad Sharma
- Wildlife Institute of India, Chandrabani, Post Box #18, Dehra Dun, Uttarakhand, 248002, India
- Chongpi Tuboi
- Wildlife Institute of India, Chandrabani, Post Box #18, Dehra Dun, Uttarakhand, 248002, India
- Sangeeta Angom
- Wildlife Institute of India, Chandrabani, Post Box #18, Dehra Dun, Uttarakhand, 248002, India
- Tennison Gurumayum
- Wildlife Institute of India, Chandrabani, Post Box #18, Dehra Dun, Uttarakhand, 248002, India
- Parag Nigam
- Wildlife Institute of India, Chandrabani, Post Box #18, Dehra Dun, Uttarakhand, 248002, India
- Syed Ainul Hussain
- Wildlife Institute of India, Chandrabani, Post Box #18, Dehra Dun, Uttarakhand, 248002, India.
|