1
|
Yuan M, Hoskens H, Goovaerts S, Herrick N, Shriver MD, Walsh S, Claes P. Hybrid autoencoder with orthogonal latent space for robust population structure inference. Sci Rep 2023; 13:2612. [PMID: 36788253 PMCID: PMC9929087 DOI: 10.1038/s41598-023-28759-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 01/24/2023] [Indexed: 02/16/2023] Open
Abstract
Analysis of population structure and genomic ancestry remains an important topic in human genetics and bioinformatics. Commonly used methods require high-quality genotype data to ensure accurate inference. However, in practice, laboratory artifacts and outliers are often present in the data. Moreover, existing methods are typically affected by the presence of related individuals in the dataset. In this work, we propose a novel hybrid method, called SAE-IBS, which combines the strengths of traditional matrix decomposition-based (e.g., principal component analysis) and more recent neural network-based (e.g., autoencoders) solutions. Namely, it yields an orthogonal latent space enhancing dimensionality selection while learning non-linear transformations. The proposed approach achieves higher accuracy than existing methods for projecting poor quality target samples (genotyping errors and missing data) onto a reference ancestry space and generates a robust ancestry space in the presence of relatedness. We introduce a new approach and an accompanying open-source program for robust ancestry inference in the presence of missing data, genotyping errors, and relatedness. The obtained ancestry space allows for non-linear projections and exhibits orthogonality with clearly separable population groups.
Affiliation(s)
- Meng Yuan
  - Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Hanne Hoskens
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Seppe Goovaerts
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Noah Herrick
  - Department of Biology, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Mark D Shriver
  - Department of Anthropology, Pennsylvania State University, State College, PA, USA
- Susan Walsh
  - Department of Biology, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Peter Claes
  - Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
  - Murdoch Children's Research Institute, Melbourne, VIC, Australia
|
2
|
Aboul-Naga AM, Alsamman AM, El Allali A, Elshafie MH, Abdelal ES, Abdelkhalek TM, Abdelsabour TH, Mohamed LG, Hamwieh A. Genome-wide analysis identified candidate variants and genes associated with heat stress adaptation in Egyptian sheep breeds. Front Genet 2022; 13:898522. [PMID: 36263427 PMCID: PMC9574253 DOI: 10.3389/fgene.2022.898522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022] Open
Abstract
Heat stress caused by climatic changes is one of the most significant stresses on livestock in hot and dry areas. It has particularly adverse effects on the ability of the breed to maintain homeothermy. Developing countries are advised to protect and prepare their animal resources in the face of potential threats such as climate change. The current study was conducted in Egypt’s three hot and dry agro-ecological zones. Three local sheep breeds (Saidi, Wahati, and Barki) were studied with a total of 206 ewes. The animals were exercised under natural heat stress. The heat tolerance index of the animals was calculated to identify animals with high and low heat tolerance based on their response to meteorological and physiological parameters. Genomic variation in these breeds was assessed using 64,756 single nucleotide polymorphic markers (SNPs). From the perspective of comparative adaptability to harsh conditions, our objective was to investigate the genomic structure that might control the adaptability of local sheep breeds to environmental stress under hot and dry conditions. In addition, indices of population structure and diversity of local breeds were examined. Measures of genetic diversity showed a significant influence of breed and location on populations. The standardized index of association (rbarD) ranged from 0.0012 (Dakhla) to 0.026 (Assuit) by location and from 0.004 (Wahati) to 0.0103 (Saidi) by breed. The index of association (Ia) ranged from 1.42 (Dakhla) to 35.88 (Assuit) by location and from 6.58 (Wahati) to 15.36 (Saidi) by breed. The most significant SNPs associated with heat tolerance were found in the MYO5A, PRKG1, GSTCD, and RTN1 genes (p ≤ 0.0001). MYO5A produces a protein widely distributed in the melanin-producing neural crest of the skin. Association analysis between genetic and phenotypic variation showed that OAR1_18300122.1, located in ST3GAL3, had the greatest positive effect on heat tolerance.
Genome-wide association analysis identified SNPs associated with heat tolerance in the PLCB1, STEAP3, KSR2, UNC13C, PEBP4, and GPAT2 genes.
Affiliation(s)
- Adel M. Aboul-Naga
  - Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
  - *Correspondence: Adel M. Aboul-Naga; Alsamman M. Alsamman
- Alsamman M. Alsamman
  - Agricultural Genetic Engineering Research Institute, Giza, Egypt
  - *Correspondence: Adel M. Aboul-Naga; Alsamman M. Alsamman
- Achraf El Allali
  - African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Mohmed H. Elshafie
  - Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
- Ehab S. Abdelal
  - Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
- Tarek M. Abdelkhalek
  - Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
- Taha H. Abdelsabour
  - Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
- Layaly G. Mohamed
  - Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
- Aladdin Hamwieh
  - International Center For Agricultural Research in the Dry Areas (ICARDA), Giza, Egypt
|
3
|
Zhang R, Ni X, Yuan K, Pan Y, Xu S. MultiWaverX: modeling latent sex-biased admixture history. Brief Bioinform 2022; 23:6590437. [PMID: 35598333 DOI: 10.1093/bib/bbac179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 04/18/2022] [Accepted: 04/20/2022] [Indexed: 11/13/2022] Open
Abstract
Sex-biased gene flow has been common in the demographic history of modern humans. However, the lack of sophisticated methods for delineating the detailed sex-biased admixture process prevents insights into complex admixture history and thus our understanding of the evolutionary mechanisms of genetic diversity. Here, we present a novel algorithm, MultiWaverX, for modeling complex admixture history with sex-biased gene flow. Systematic simulations showed that MultiWaverX is a powerful tool for modeling complex admixture history and inferring sex-biased gene flow. Application of MultiWaverX to empirical data of 17 typical admixed populations in America, Central Asia, and the Middle East revealed sex-biased admixture histories that were largely consistent with the historical records. Notably, fine-scale admixture process reconstruction enabled us to recognize latent sex-biased gene flow in certain populations that would likely be overlooked by much of the routine analysis with commonly used methods. An outstanding example in the real world is the Kazakh population that experienced complex admixture with sex-biased gene flow but in which the overall signature has been canceled due to biased gene flow from an opposite direction.
Affiliation(s)
- Rui Zhang
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Xumin Ni
  - School of Mathematics and Statistics, Beijing Jiaotong University, Beijing, 100044, China
- Kai Yuan
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yuwen Pan
  - Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Shuhua Xu
  - Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai 200032, China
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, 221116, China
  - Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou 450052, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
|
4
|
Genetic Ancestry Inference and Its Application for the Genetic Mapping of Human Diseases. Int J Mol Sci 2021; 22:ijms22136962. [PMID: 34203440 PMCID: PMC8269095 DOI: 10.3390/ijms22136962] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/10/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 12/21/2022] Open
Abstract
Admixed populations arise when two or more ancestral populations interbreed. As a result of this admixture, the genome of admixed populations is defined by tracts of variable size inherited from these parental groups and has particular genetic features that provide valuable information about their demographic history. Diverse methods can be used to derive the ancestry apportionment of admixed individuals, and such inferences can be leveraged for the discovery of genetic loci associated with diseases and traits, therefore having important biomedical implications. In this review article, we summarize the most common methods of global and local genetic ancestry estimation and discuss the use of admixture mapping studies in human diseases.
|
5
|
Owens GL, Todesco M, Bercovich N, Légaré JS, Mitchell N, Whitney KD, Rieseberg LH. Standing variation rather than recent adaptive introgression probably underlies differentiation of the texanus subspecies of Helianthus annuus. Mol Ecol 2021; 30:6229-6245. [PMID: 34080243 DOI: 10.1111/mec.16008] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/26/2020] [Revised: 05/17/2021] [Accepted: 05/26/2021] [Indexed: 12/24/2022]
Abstract
The origins of geographic races in wide-ranging species are poorly understood. In Texas, the texanus subspecies of Helianthus annuus has long been thought to have acquired its defining phenotypic traits via introgression from a local congener, H. debilis, but previous tests of this hypothesis were inconclusive. Here, we explore the origins of H. a. texanus using whole genome sequencing data from across the entire range of H. annuus and possible donor species, as well as phenotypic data from a common garden study. We found that although it is morphologically convergent with H. debilis, H. a. texanus has conflicting signals of introgression. Genome wide tests (Patterson's D and TreeMix) only found evidence of introgression from H. argophyllus (sister species to H. annuus and also sympatric), but not H. debilis, with the exception of one individual of 109 analysed. We further scanned the genome for localized signals of introgression using PCAdmix and found minimal but nonzero introgression from H. debilis and significant introgression from H. argophyllus in some populations. Given the paucity of introgression from H. debilis, we argue that the morphological convergence observed in Texas is probably from standing genetic variation. We also found that genomic differentiation in H. a. texanus is mostly driven by large segregating inversions, several of which have signatures of natural selection based on haplotype frequencies.
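For readers unfamiliar with Patterson's D, the genome-wide test named in this abstract, the following is a minimal sketch of its allele-frequency (ABBA-BABA) form. It is not the authors' pipeline: the function name `patterson_d` and the toy inputs are assumptions for illustration, and each argument is taken to be a list of per-site derived-allele frequencies for populations P1, P2, P3 and an outgroup.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Frequency-based Patterson's D for the tree ((P1, P2), P3), outgroup.

    Each argument is a sequence of derived-allele frequencies, one per site.
    A positive D indicates excess allele sharing between P2 and P3,
    a negative D excess sharing between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # ABBA: P2 and P3 share the derived allele, P1 and outgroup do not.
        abba += (1 - a) * b * c * (1 - o)
        # BABA: P1 and P3 share the derived allele, P2 and outgroup do not.
        baba += a * (1 - b) * c * (1 - o)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative (ABBA or BABA) sites")
    return (abba - baba) / total


# Hypothetical toy data: one ABBA site and one BABA site cancel out.
balanced = patterson_d([0, 1], [1, 0], [1, 1], [0, 0])
# Hypothetical toy data: two ABBA sites, no BABA sites.
skewed = patterson_d([0, 0], [1, 1], [1, 1], [0, 0])
```

In practice the significance of D is usually assessed with a block jackknife over genomic windows, which this sketch omits.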
Affiliation(s)
- Gregory L Owens
  - Department of Biology, University of Victoria, Victoria, BC, Canada
- Marco Todesco
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Natalia Bercovich
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Jean-Sébastien Légaré
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Nora Mitchell
  - Department of Biology, University of Wisconsin - Eau Claire, Eau Claire, WI, USA
  - Department of Biology, University of New Mexico, Albuquerque, NM, USA
- Kenneth D Whitney
  - Department of Biology, University of New Mexico, Albuquerque, NM, USA
- Loren H Rieseberg
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
|
6
|
Insight on the Genetics of Atrial Fibrillation in Puerto Rican Hispanics. Stroke Res Treat 2021; 2021:8819896. [PMID: 33505650 PMCID: PMC7810540 DOI: 10.1155/2021/8819896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2020] [Accepted: 11/25/2020] [Indexed: 11/17/2022] Open
Abstract
Non-Hispanic Whites present with a higher atrial fibrillation (AF) prevalence than racial minorities living in the mainland USA. In two hospital-based studies, Puerto Rican Hispanics had a lower AF prevalence (2.5%) than non-Hispanic Whites (5.7%). These data are particularly controversial because Hispanics have a higher prevalence of traditional risk factors for developing AF yet a lower AF prevalence. This phenomenon is known as the atrial fibrillation paradox. Despite recent advances in understanding AF, its pathogenesis remains unclear. In this study, we examined a genetic dataset of Puerto Rican Hispanics for 111 SNPs known to be associated with AF in a large European cohort and determined whether they are associated with AF susceptibility in our cohort. To achieve this aim, we performed a secondary analysis of existing data from two studies, (1) the Pharmacogenetics of Warfarin in Puerto Ricans study and (2) A Genomic Approach for Clopidogrel in Caribbean Hispanics, and assessed the presence of the European AF-associated SNPs reported in the genome-wide association study of 1 million people that identified 111 loci for atrial fibrillation. We used data from 555 Puerto Rican Hispanic cardiovascular patients, comprising 486 controls and 69 cases. We found that the following SNPs showed significant association with AF in Puerto Rican Hispanics: rs2834618, rs6462079, rs7508, rs2040862, and rs10458660. Some of these SNPs are linked to proteins involved in lysosomal activities responsible for breaking ceramides down to sphingosines and in collagen deposition around atrial cardiomyocytes. Furthermore, we performed a machine learning analysis and determined that Native American admixture and heart failure were strongly predictive of AF in Puerto Rican Hispanics. For the first time, this study provides some genetic insight into the mechanisms of AF in a Puerto Rican Hispanic cohort.
|
7
|
Leitwein M, Duranton M, Rougemont Q, Gagnaire PA, Bernatchez L. Using Haplotype Information for Conservation Genomics. Trends Ecol Evol 2020; 35:245-258. [DOI: 10.1016/j.tree.2019.10.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/01/2019] [Revised: 10/18/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022]
|
8
|
Duranton M, Bonhomme F, Gagnaire P. The spatial scale of dispersal revealed by admixture tracts. Evol Appl 2019; 12:1743-1756. [PMID: 31548854 PMCID: PMC6752141 DOI: 10.1111/eva.12829] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/05/2018] [Accepted: 05/28/2019] [Indexed: 12/11/2022] Open
Abstract
Evaluating species dispersal across the landscape is essential to design appropriate management and conservation actions. However, technical difficulties often preclude direct measures of individual movement, while indirect genetic approaches rely on assumptions that sometimes limit their application. Here, we show that the temporal decay of admixture tract lengths can be used to assess genetic connectivity within a population introgressed by foreign haplotypes. We present a proof-of-concept approach based on local ancestry inference in a high gene flow marine fish species, the European sea bass (Dicentrarchus labrax). Genetic admixture in the contact zone between Atlantic and Mediterranean sea bass lineages allows the introgression of Atlantic haplotype tracts within the Mediterranean Sea. Once introgressed, blocks of foreign ancestry are progressively eroded by recombination as they diffuse from the western to the eastern Mediterranean basin, providing a means to estimate dispersal. By comparing the length distributions of Atlantic tracts between two Mediterranean populations located at different distances from the contact zone, we estimated the average per-generation dispersal distance within the Mediterranean lineage at less than 50 km. Using simulations, we showed that this approach is robust to a range of demographic histories and sample sizes. Our results thus support that the length of admixture tracts can be used together with a recombination clock to estimate genetic connectivity in species for which the neutral migration-drift balance is not informative or simply does not exist.
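The "recombination clock" idea in this abstract can be illustrated with a deliberately simplified sketch. It assumes a single admixture pulse, under which introgressed tract lengths (in Morgans) are approximately exponential with rate (1 - m)·T, where m is the introgressed ancestry fraction and T the number of generations since admixture. This textbook approximation is a stand-in and is not the dispersal estimator used in the paper; the function name and example values are hypothetical.

```python
def generations_since_admixture(tract_lengths_morgans, migrant_fraction):
    """Estimate generations since a single admixture pulse.

    Assumes tract lengths (in Morgans) are exponential with rate
    (1 - migrant_fraction) * T, so T is recovered from the mean length.
    This single-pulse model is an illustrative assumption only.
    """
    if not tract_lengths_morgans:
        raise ValueError("need at least one tract length")
    if not 0.0 <= migrant_fraction < 1.0:
        raise ValueError("migrant_fraction must be in [0, 1)")
    mean_length = sum(tract_lengths_morgans) / len(tract_lengths_morgans)
    if mean_length <= 0:
        raise ValueError("tract lengths must be positive")
    # Mean of an exponential is 1 / rate, so T = 1 / ((1 - m) * mean).
    return 1.0 / ((1.0 - migrant_fraction) * mean_length)


# Hypothetical example: tracts averaging 0.02 Morgans with m = 0.5.
estimate = generations_since_admixture([0.01, 0.03], 0.5)
```

Shorter mean tract lengths give larger estimates of T, which matches the abstract's point that foreign blocks are progressively eroded by recombination over time.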
Affiliation(s)
- Maud Duranton
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
|
9
|
Ni X, Yuan K, Liu C, Feng Q, Tian L, Ma Z, Xu S. MultiWaver 2.0: modeling discrete and continuous gene flow to reconstruct complex population admixtures. Eur J Hum Genet 2019; 27:133-139. [PMID: 30206356 PMCID: PMC6303267 DOI: 10.1038/s41431-018-0259-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/27/2018] [Revised: 07/12/2018] [Accepted: 08/09/2018] [Indexed: 11/08/2022] Open
Abstract
Our goal in developing the MultiWaver software series was to be able to infer population admixture history under various complex scenarios. The earlier version of MultiWaver considered only discrete admixture models. Here, we report a newly developed version, MultiWaver 2.0, that implements a more flexible framework and is capable of inferring multiple-wave admixture histories under both discrete and continuous admixture models. MultiWaver 2.0 can automatically select an optimal admixture model based on the length distribution of ancestral tracks of chromosomes, and the program can estimate the corresponding parameters under the selected model. Specifically, for discrete admixture models, we used a likelihood ratio test (LRT) to determine the optimal discrete model and an expectation-maximization algorithm to estimate the parameters. In addition, according to the principles of the Bayesian Information Criterion (BIC), we compared the optimal discrete model with several continuous admixture models. In MultiWaver 2.0, we also applied a bootstrapping technique to provide levels of support for the chosen model and the confidence interval (CI) of the estimations of admixture time. Simulation studies validated the reliability and effectiveness of our method. Finally, the program performed well when applied to real datasets of typical admixed populations, such as African Americans, Uyghurs, and Hazaras.
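As a rough illustration of the BIC-based comparison this abstract describes, the sketch below scores one candidate model on ancestral tract lengths. A single exponential distribution is used here as a hypothetical stand-in for a candidate model; it is not one of the discrete or continuous admixture models implemented in MultiWaver 2.0, and the function names are assumptions.

```python
import math


def exponential_log_likelihood(lengths):
    """Maximized log-likelihood of an exponential fit to tract lengths.

    The maximum-likelihood rate is n / sum(lengths).
    """
    n = len(lengths)
    total = sum(lengths)
    if n == 0 or total <= 0:
        raise ValueError("need positive tract lengths")
    rate = n / total
    return n * math.log(rate) - rate * total


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical tract lengths (arbitrary units), one free parameter (the rate).
lengths = [1.0, 1.0, 1.0, 1.0]
score = bic(exponential_log_likelihood(lengths), n_params=1, n_obs=len(lengths))
```

Competing models would each be fitted to the same tract lengths and the one with the lowest BIC preferred, with the extra `n_params * ln(n_obs)` term penalizing models that add parameters.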
Affiliation(s)
- Xumin Ni
  - Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing, 100044, China
- Kai Yuan
  - Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, CAS, Shanghai, 200031, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Chang Liu
  - Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, CAS, Shanghai, 200031, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Qidi Feng
  - Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, CAS, Shanghai, 200031, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Lei Tian
  - Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, CAS, Shanghai, 200031, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhiming Ma
  - Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing, 100044, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- Shuhua Xu
  - Chinese Academy of Sciences (CAS) Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology (PICB), Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, CAS, Shanghai, 200031, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
  - Collaborative Innovation Center of Genetics and Development, Shanghai, 200438, China
|