1
Gladieux P, van Oosterhout C, Fairhead S, Jouet A, Ortiz D, Ravel S, Shrestha RK, Frouin J, He X, Zhu Y, Morel JB, Huang H, Kroj T, Jones JDG. Extensive immune receptor repertoire diversity in disease-resistant rice landraces. Curr Biol 2024; 34:3983-3995.e6. [PMID: 39146939 DOI: 10.1016/j.cub.2024.07.061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 04/19/2024] [Accepted: 07/16/2024] [Indexed: 08/17/2024]
Abstract
Plants have powerful defense mechanisms and extensive immune receptor repertoires, yet crop monocultures are prone to epidemic diseases. Rice (Oryza sativa) is susceptible to many diseases, such as rice blast caused by Magnaporthe oryzae. Varietal resistance of rice to blast relies on intracellular nucleotide binding, leucine-rich repeat (NLR) receptors that recognize specific pathogen molecules and trigger immune responses. In the Yuanyang terraces in southwest China, rice landraces rarely show severe losses to disease whereas commercial inbred lines show pronounced field susceptibility. Here, we investigate within-landrace NLR sequence diversity of nine rice landraces and eleven modern varieties using complexity reduction techniques. We find that NLRs display high sequence diversity in landraces, consistent with balancing selection, and that balancing selection at NLRs is more pervasive in landraces than modern varieties. Notably, modern varieties lack many ancient NLR haplotypes that are retained in some landraces. Our study emphasizes the value of standing genetic variation that is maintained in farmer landraces as a resource to make modern crops and agroecosystems less prone to disease. The conservation of landraces is, therefore, crucial for ensuring food security in the face of dynamic biotic and abiotic threats.
Affiliation(s)
- Pierre Gladieux
  - Plant Health Institute Montpellier, University of Montpellier, INRAE, CIRAD, IRD, Institut Agro, 34398 Montpellier, France
- Cock van Oosterhout
  - School of Environmental Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Sebastian Fairhead
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
- Agathe Jouet
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
- Diana Ortiz
  - Plant Health Institute Montpellier, University of Montpellier, INRAE, CIRAD, IRD, Institut Agro, 34398 Montpellier, France
- Sebastien Ravel
  - Plant Health Institute Montpellier, University of Montpellier, INRAE, CIRAD, IRD, Institut Agro, 34398 Montpellier, France
- Ram-Krishna Shrestha
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
- Julien Frouin
  - CIRAD, UMR AGAP Institut, 34398 Montpellier, France; UMR AGAP Institut, Université de Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- Xiahong He
  - School of Landscape and Horticulture, Southwest Forestry University, Kunming 650233, China
- Youyong Zhu
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China; Key Laboratory of Agro-Biodiversity and Pest Management of Education Ministry of China, Yunnan Agricultural University, Kunming 650201, China
- Jean-Benoit Morel
  - Plant Health Institute Montpellier, University of Montpellier, INRAE, CIRAD, IRD, Institut Agro, 34398 Montpellier, France
- Huichuan Huang
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China; Key Laboratory of Agro-Biodiversity and Pest Management of Education Ministry of China, Yunnan Agricultural University, Kunming 650201, China
- Thomas Kroj
  - Plant Health Institute Montpellier, University of Montpellier, INRAE, CIRAD, IRD, Institut Agro, 34398 Montpellier, France
- Jonathan D G Jones
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
2
Banjoko AW, Ng'uni T, Naidoo N, Ramsuran V, Hyrien O, Ndhlovu ZM. High Resolution Class I HLA-A, -B, and -C Diversity in Eastern and Southern African Populations. bioRxiv 2024:2024.09.04.611164. [PMID: 39282263 PMCID: PMC11398358 DOI: 10.1101/2024.09.04.611164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2024]
Abstract
Africa remains significantly underrepresented in high-resolution Human Leukocyte Antigen (HLA) data, despite being one of the most genetically diverse regions in the world. This critical gap in genetic information poses a substantial barrier to HLA-based research on the continent. In this study, Class I HLA data from Eastern and Southern African populations were analysed to assess genetic diversity across the region. We examined allele and haplotype frequency distributions, deviations from Hardy-Weinberg Equilibrium (HWE), linkage disequilibrium (LD), and conducted neutrality tests of homozygosity across various populations. Additionally, the African HLA data were compared to those of Caucasian and African American populations using the Jaccard index and multidimensional scaling (MDS) methods. The study revealed that South African populations exhibited 50.4% more genetic diversity within the Class I HLA region compared to other African populations. Zambia showed an estimated 36.5% genetic diversity, with Kenya, Rwanda and Uganda showing 35.7%, 34.2%, and 31.1%, respectively. Furthermore, an analysis of in-country diversity among different tribes indicated an average Class I HLA diversity of 25.7% in Kenya, 17% in Rwanda, 2.8% in South Africa, 13.6% in Uganda, and 6.5% in Zambia. The study also highlighted the genetic distinctness of Caucasian and African American populations compared to African populations. Notably, the differential frequencies of disease-promoting and disease-preventing HLA alleles across these populations emphasize the urgent need to generate high-quality HLA data for all regions of Africa and its major ethnic groups. Such efforts will be crucial in enhancing healthcare outcomes across the continent.
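The Jaccard index mentioned in this abstract measures the overlap between two sets. A minimal sketch follows, assuming each population is summarized simply as the set of alleles observed in it; the function name, the population labels, and the allele sets are hypothetical illustrations, not data or code from the study.

```python
def jaccard_index(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; taken to be 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical allele sets for two populations (illustration only, not study data).
pop_x = {"A*02:01", "A*30:01", "B*15:03", "C*04:01"}
pop_y = {"A*02:01", "A*01:01", "B*15:03", "C*07:01"}

similarity = jaccard_index(pop_x, pop_y)  # 2 shared alleles out of 6 distinct alleles
distance = 1.0 - similarity               # a dissimilarity that could feed an MDS embedding
print(f"Jaccard similarity: {similarity:.3f}, distance: {distance:.3f}")
```

Pairwise distances of this kind are one plausible input to multidimensional scaling, although the abstract does not specify how the two methods were combined.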
Affiliation(s)
- Alabi W Banjoko
  - Africa Health Research Institute (AHRI), Nelson R. Mandela School of Medicine, Durban, South Africa
  - Department of Statistics, University of Ilorin, Kwara state, Nigeria
- Tiza Ng'uni
  - Africa Health Research Institute (AHRI), Nelson R. Mandela School of Medicine, Durban, South Africa
- Nitalia Naidoo
  - Africa Health Research Institute (AHRI), Nelson R. Mandela School of Medicine, Durban, South Africa
- Veron Ramsuran
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Olivier Hyrien
  - Fred Hutchinson Cancer Center, Vaccine and Infectious Disease Division, Vaccine and Immunology Statistical Centre, Seattle, USA
- Zaza M Ndhlovu
  - Africa Health Research Institute (AHRI), Nelson R. Mandela School of Medicine, Durban, South Africa
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Cambridge, MA, United States
  - School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
3
Mikula LC, Vogl C. The expected sample allele frequencies from populations of changing size via orthogonal polynomials. Theor Popul Biol 2024; 157:55-85. [PMID: 38552964 DOI: 10.1016/j.tpb.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 03/24/2024] [Accepted: 03/26/2024] [Indexed: 04/11/2024]
Abstract
In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by Hidden-Markov-Model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele spectra is explored, and sufficient statistics for stepwise changes in population size are found. Further, convergence behaviours of the polymorphic sample spectra for population size changes on different time scales are examined and discussed within the context of inference of the effective population size. Joint visual assessment of the sample spectra and the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.
Affiliation(s)
- Lynette Caitlin Mikula
  - Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews KY16 9TH, UK
- Claus Vogl
  - Department of Biomedical Sciences and Pathobiology, Vetmeduni Vienna, Veterinärplatz 1, A-1210 Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Veterinärplatz 1, A-1210 Wien, Austria
4
Tran LN, Sun CK, Struck TJ, Sajan M, Gutenkunst RN. Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning. Mol Biol Evol 2024; 41:msae077. [PMID: 38636507 PMCID: PMC11082913 DOI: 10.1093/molbev/msae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 04/08/2024] [Accepted: 04/12/2024] [Indexed: 04/20/2024]
Abstract
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite-likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we present donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future genomic data summarized by an AFS. We demonstrate that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
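Mean Variance Estimation networks, as named in this abstract, output both a predicted mean and a predicted variance for each target and are commonly trained with a Gaussian negative log-likelihood. The sketch below illustrates that loss and a normal-approximation interval for a single parameter. It is not donni's code: the function names and the numeric values are hypothetical, and the additive constant 0.5·log(2π) is omitted.

```python
import math


def gaussian_nll(y_true: float, mean: float, variance: float) -> float:
    """Gaussian negative log-likelihood for one target, without the 0.5*log(2*pi) constant."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * (math.log(variance) + (y_true - mean) ** 2 / variance)


def normal_interval(mean: float, variance: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval (for z = 1.96) from a predicted mean and variance."""
    half_width = z * math.sqrt(variance)
    return (mean - half_width, mean + half_width)


# Hypothetical network output for one demographic parameter (illustration only).
predicted_mean, predicted_variance = 0.0, 4.0
loss = gaussian_nll(y_true=2.0, mean=predicted_mean, variance=predicted_variance)
lower, upper = normal_interval(predicted_mean, predicted_variance)
print(f"loss = {loss:.4f}, interval = ({lower:.2f}, {upper:.2f})")
```

Predicting a variance alongside the mean is what allows a single forward pass to yield both a point estimate and an uncertainty estimate.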
Affiliation(s)
- Linh N Tran
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721, USA
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Connie K Sun
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Travis J Struck
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Mathews Sajan
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Ryan N Gutenkunst
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
5
Salter JF, Brumfield RT, Faircloth BC. An island 'endemic' born out of hybridization between introduced lineages. Mol Ecol 2024; 33:e16990. [PMID: 37208829 DOI: 10.1111/mec.16990] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/16/2022] [Revised: 04/06/2023] [Accepted: 05/04/2023] [Indexed: 05/21/2023]
Abstract
Humans have profoundly impacted the distribution of plant and animal species over thousands of years. The most direct example of these effects is human-mediated movement of individuals, either through translocation of individuals within their range or through the introduction of species to new habitats. While human involvement may be suspected in species with obvious range disjunctions, it can be difficult to detect natural versus human-mediated dispersal events for populations at the edge of a species' range, and this uncertainty muddles how we understand the evolutionary history of populations and broad biogeographical patterns. Studies combining genetic data with archaeological, linguistic and historical evidence have confirmed prehistoric examples of human-mediated dispersal; however, it is unclear whether these methods can disentangle recent dispersal events, such as species translocated by European colonizers during the past 500 years. We use genomic DNA from historical museum specimens and historical records to evaluate three hypotheses regarding the timing and origin of Northern Bobwhites (Colinus virginianus) in Cuba, whose status as an endemic or introduced population has long been debated. We discovered that bobwhites from southern Mexico arrived in Cuba between the 12th and 16th centuries, followed by the subsequent introduction of bobwhites from the southeastern USA to Cuba between the 18th and 20th centuries. These dates suggest the introduction of bobwhites to Cuba was human-mediated and concomitant with Spanish colonial shipping routes between Veracruz, Mexico and Havana, Cuba during this period. Our results identify endemic Cuban bobwhites as a genetically distinct population born of hybridization between divergent, introduced lineages.
Affiliation(s)
- Jessie F Salter
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
- Robb T Brumfield
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
- Brant C Faircloth
  - Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
6
Tran LN, Sun CK, Struck TJ, Sajan M, Gutenkunst RN. Computationally efficient demographic history inference from allele frequencies with supervised machine learning. bioRxiv 2024:2023.05.24.542158. [PMID: 38405827 PMCID: PMC10888863 DOI: 10.1101/2023.05.24.542158] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/27/2024]
Abstract
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we developed donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future input data AFS. We demonstrated that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
Affiliation(s)
- Linh N. Tran
  - Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Connie K. Sun
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Travis J. Struck
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Mathews Sajan
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
- Ryan N. Gutenkunst
  - Department of Molecular & Cellular Biology, University of Arizona, Tucson, AZ, USA
7
Lucas-Sánchez M, Abdeli A, Bekada A, Calafell F, Benhassine T, Comas D. The Impact of Recent Demography on Functional Genetic Variation in North African Human Groups. Mol Biol Evol 2024; 41:msad283. [PMID: 38152862 PMCID: PMC10783648 DOI: 10.1093/molbev/msad283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 11/22/2023] [Accepted: 12/19/2023] [Indexed: 12/29/2023]
Abstract
The strategic location of North Africa has made the region the core of a wide range of human demographic events, including migrations, bottlenecks, and admixture processes. This has led to a complex and heterogeneous genetic and cultural landscape, which remains poorly studied compared to other world regions. Whole-exome sequencing is particularly relevant to determine the effects of these demographic events on current-day North Africans' genomes, since it allows a focus on those parts of the genome that are more likely to have direct biomedical consequences. Whole-exome sequencing can also be used to assess the effect of recent demography on functional genetic variation and the efficacy of natural selection, a long-lasting debate. In the present work, we use newly generated whole-exome sequencing and genome-wide array genotypes to investigate the effect of demography on functional variation in 7 North African populations, considering both cultural and demographic differences and with a special focus on Amazigh (plur. Imazighen) groups. We detect genetic differences among populations related to their degree of isolation and the presence of bottlenecks in their recent history. We find differences in the functional part of the genome that suggest a relaxation of purifying selection in the more isolated groups, allowing for an increase of putatively damaging variation. Our results also show a shift in mutational load coinciding with major demographic events in the region and reveal differences within and between cultural and geographic groups.
Affiliation(s)
- Marcel Lucas-Sánchez
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Amine Abdeli
  - Faculté des Sciences Biologiques, Laboratoire de Biologie Cellulaire et Moléculaire, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algeria
- Asmahan Bekada
  - Département de Biotechnologie, Faculté des Sciences de la Nature et de la Vie, Université Oran 1 (Ahmad Ben Bella), Oran, Algeria
- Francesc Calafell
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Traki Benhassine
  - Faculté des Sciences Biologiques, Laboratoire de Biologie Cellulaire et Moléculaire, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algeria
- David Comas
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
8
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
9
Liao K, Carlson J, Zöllner S. The effect of mutation subtypes on the allele frequency spectrum and population genetics inference. G3 (Bethesda) 2023; 13:jkad035. [PMID: 36759699 PMCID: PMC10085755 DOI: 10.1093/g3journal/jkad035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 02/11/2023]
Abstract
Population genetics has adapted as technological advances in next-generation sequencing have resulted in an exponential increase of genetic data. A common approach to efficiently analyze genetic variation present in large sequencing data is through the allele frequency spectrum, defined as the distribution of allele frequencies in a sample. While the frequency spectrum serves to summarize patterns of genetic variation, it implicitly assumes that mutation types (A→C vs C→T) are interchangeable. However, mutations of different types arise and spread due to spatial and temporal variation in forces such as mutation rate and biased gene conversion that result in heterogeneity in the distribution of allele frequencies across sites. In this work, we explore the impact of this simplification on multiple aspects of population genetic modeling. As a site's mutation rate is strongly affected by flanking nucleotides, we defined a mutation subtype by the base pair change and adjacent nucleotides (e.g. AAA→ATA) and systematically assessed the heterogeneity in the frequency spectrum across 96 distinct 3-mer mutation subtypes using n = 3556 whole-genome sequenced individuals of European ancestry. We observed substantial variation across the subtype-specific frequency spectra, with some of the variation being influenced by molecular factors previously identified for single base mutation types. Estimates of model parameters from demographic inference performed for each mutation subtype's AFS individually varied drastically across the 96 subtypes. In local patterns of variation, a combination of regional subtype composition and local genomic factors shaped the regional frequency spectrum across genomic regions. Our results illustrate how treating variants in large sequencing samples as interchangeable may confound population genetic frameworks and encourage us to consider the unique evolutionary mechanisms of analyzed polymorphisms.
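The abstract defines the allele frequency spectrum as the distribution of allele frequencies in a sample and describes computing it separately for each 3-mer mutation subtype. A minimal sketch of that bookkeeping is given below, assuming each variant site is already annotated with a subtype label and a derived-allele count; the function name, the input layout, and the toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def subtype_spectra(sites: list[tuple[str, int]], n_chromosomes: int) -> dict[str, list[int]]:
    """Build one allele frequency spectrum per mutation subtype.

    sites: (subtype label, derived-allele count) pairs, one per variant site.
    Returns, for each subtype, a list whose entry i is the number of sites
    at which the derived allele was seen i times among n_chromosomes.
    """
    spectra: dict[str, list[int]] = defaultdict(lambda: [0] * (n_chromosomes + 1))
    for subtype, count in sites:
        if not 0 <= count <= n_chromosomes:
            raise ValueError(f"count {count} outside 0..{n_chromosomes}")
        spectra[subtype][count] += 1
    return dict(spectra)


# Hypothetical toy input: five sites in a sample of 4 chromosomes.
toy_sites = [
    ("AAA>ATA", 1),
    ("AAA>ATA", 1),
    ("ACG>ATG", 2),
    ("ACG>ATG", 1),
    ("AAA>ATA", 3),
]
spectra = subtype_spectra(toy_sites, n_chromosomes=4)
for subtype, afs in sorted(spectra.items()):
    print(subtype, afs)
```

Summing the per-subtype lists entry by entry recovers the pooled spectrum, which is the summary that treats all mutation types as interchangeable.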
Affiliation(s)
- Kevin Liao
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jedidiah Carlson
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712, USA
  - Department of Population Health, University of Texas at Austin, Austin, TX 78712, USA
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
10
Mavura Y, Song H, Xie J, Tamayo P, Mohammed A, Lawal AT, Bello A, Ibrahim S, Faruk M, Huang FW. Transcriptomic profiling and genomic rearrangement landscape of Nigerian prostate cancer. Prostate 2023; 83:395-402. [PMID: 36598071 DOI: 10.1002/pros.24471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/10/2022] [Accepted: 12/02/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND: Men of African ancestry have disproportionately high incidence rates of prostate cancer (PCa) and have high mortality rates. While there is evidence for a higher genetic predisposition for incidence of PCa in men of African ancestry compared to men of European ancestry, there have been few transcriptomic studies on PCa in men of African ancestry in the African continent.
OBJECTIVE: We performed transcriptomic profiling and fusion analysis on bulk RNA sequencing (RNA-seq) samples from 24 Nigerian PCa patients to investigate the transcriptomic and genomic rearrangement landscape of PCa in Nigerian men.
DESIGN: Bulk RNA-seq was performed on 24 formalin-fixed paraffin-embedded (FFPE) prostatectomy specimens of Nigerian men. Transcriptomic analysis was performed on 11 high-quality samples. Arriba Fusion and STAR Fusion were used for fusion detection.
RESULTS: 4/11 (36%) of the samples harbored an erythroblast transformation-specific (ETS) fusion event; 1/11 (9%) had a TMPRSS2-ERG fusion; 2/11 had a TMPRSS2-ETV5 fusion, and 1/11 had a SLC45A3-SKIL fusion. Hierarchical clustering of normalized and mean-centered gene expression showed clustering of fusion positive samples. Furthermore, we developed gene set signatures for Nigerian PCa based on fusion events. By projecting The Cancer Genome Atlas prostate adenocarcinoma (TCGA-PRAD) bulk RNA-seq data set onto the transcriptional space defined by these signatures derived from Nigerian PCa patients, we identified a positive correlation between the Nigerian fusion signature and fusion positive samples in the TCGA-PRAD data set.
CONCLUSIONS: Less frequent ETS fusion events other than TMPRSS2-ERG such as TMPRSS2-ETV5 and non-ETS fusion events such as SLC45A3-SKIL may be more common in PCa in Nigerian men. This study provides useful working transcriptomic signatures that characterize oncogenic states representative of specific gene fusion events in PCa from Nigerian men.
Affiliation(s)
- Yusuph Mavura
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Institute for Human Genetics, University of California, San Francisco, California, USA
| | - Hanbing Song
- Institute for Human Genetics, University of California, San Francisco, California, USA
- Department of Medicine, Division of Hematology/Oncology, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
| | - Jamie Xie
- Institute for Human Genetics, University of California, San Francisco, California, USA
- Department of Medicine, Division of Hematology/Oncology, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
| | - Pablo Tamayo
- Moores Cancer Center, University of California San Diego, La Jolla, California, USA
- Center for Novel Therapeutics, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Abdullahi Mohammed
- Department of Pathology, Faculty of Basic Clinical Sciences, College of Medical Sciences, Ahmadu Bello University, Zaria, Nigeria
| | - Ahmad T Lawal
- Department of Surgery, Division of Urology, Faculty of Clinical Sciences, College of Medical Sciences, Ahmadu Bello University, Zaria, Nigeria
| | - Ahmad Bello
- Department of Surgery, Division of Urology, Faculty of Clinical Sciences, College of Medical Sciences, Ahmadu Bello University, Zaria, Nigeria
| | - Sani Ibrahim
- Department of Biochemistry, Faculty of Life Sciences, Ahmadu Bello University, Zaria, Nigeria
| | - Mohammed Faruk
- Department of Pathology, Faculty of Basic Clinical Sciences, College of Medical Sciences, Ahmadu Bello University, Zaria, Nigeria
| | - Franklin W Huang
- Institute for Human Genetics, University of California, San Francisco, California, USA
- Department of Medicine, Division of Hematology/Oncology, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA
- San Francisco Veterans Affairs Health Care System, San Francisco, California, USA
|
11
|
Vendrami DLJ, Hoffman JI, Wilding CS. Heterogeneous Genomic Divergence Landscape in Two Commercially Important European Scallop Species. Genes (Basel) 2022; 14:14. [PMID: 36672754 PMCID: PMC9858869 DOI: 10.3390/genes14010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 12/15/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
Two commercially important scallop species of the genus Pecten are found in Europe: the north Atlantic Pecten maximus and the Mediterranean Pecten jacobaeus whose distributions abut at the Almeria-Orán front. Whilst previous studies have quantified genetic divergence between these species, the pattern of differentiation along the Pecten genome is unknown. Here, we mapped RADseq data from 235 P. maximus and 27 P. jacobaeus to a chromosome-level reference genome, finding a heterogeneous landscape of genomic differentiation. Highly divergent genomic regions were identified across 14 chromosomes, while the remaining five showed little differentiation. Demographic and comparative genomics analyses suggest that this pattern resulted from an initial extended period of isolation, which promoted divergence, followed by differential gene flow across the genome during secondary contact. Single nucleotide polymorphisms present within highly divergent genomic regions were located in areas of low recombination and contrasting patterns of LD decay were found between the two species, hinting at the presence of chromosomal inversions in P. jacobaeus. Functional annotations revealed that highly differentiated regions were enriched for immune-related processes and mRNA modification. While future work is necessary to characterize structural differences, this study provides new insights into the speciation genomics of P. maximus and P. jacobaeus.
Affiliation(s)
- David L. J. Vendrami
- Department of Animal Behaviour, University of Bielefeld, Postfach 100131, 33615 Bielefeld, Germany
| | - Joseph I. Hoffman
- Department of Animal Behaviour, University of Bielefeld, Postfach 100131, 33615 Bielefeld, Germany
- British Antarctic Survey, High Cross, Madingley Road, Cambridge CB3 0ET, UK
| | - Craig S. Wilding
- School of Biological and Environmental Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
|
12
|
Dedato MN, Robert C, Taillon J, Shafer ABA, Côté SD. Demographic history and conservation genomics of caribou (Rangifer tarandus) in Québec. Evol Appl 2022; 15:2043-2053. [PMID: 36540642 PMCID: PMC9753816 DOI: 10.1111/eva.13495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2022] [Revised: 08/31/2022] [Accepted: 10/06/2022] [Indexed: 08/04/2023]
Abstract
The loss of genetic diversity is a challenge many species are facing, with genomics being a potential tool to inform and prioritize decision-making. Most caribou (Rangifer tarandus) populations have experienced significant recent declines throughout Québec, Canada, and are considered of concern, threatened or endangered. Here, we calculated the ancestral and contemporary patterns of genomic diversity of five representative caribou populations and applied a comparative population genomics framework to assess the interplay between demographic events and genomic diversity. We first calculated a caribou-specific mutation rate, μ, by extracting orthologous genes from related ungulates and estimating the rate of synonymous mutations. Whole genome re-sequencing was then completed on 67 caribou: from these data we calculated nucleotide diversity, θπ, and estimated the coalescent or ancestral effective population size (Ne), which ranged from 12,030 to 15,513. When compared to the census size, NC, the endangered Gaspésie Mountain caribou population had the highest ancestral Ne:NC ratio, which is consistent with recent work suggesting high ancestral Ne:NC is of conservation concern. In contrast, values of contemporary Ne, estimated from linkage-disequilibrium, ranged from 11 to 162, with Gaspésie having among the highest contemporary Ne:NC ratio. Importantly, classic conservation genetics theory would predict this population to be of less concern based on this ratio. Interestingly, F varied only slightly between populations, and despite evidence of bottlenecks across the province, runs of homozygosity were not abundant in the genome. Tajima's D estimates mirrored the demographic models and current conservation status. Our study highlights how genomic patterns are nuanced and potentially misleading if viewed only through a contemporary lens; we argue a holistic conservation genomics view should integrate ancestral Ne and Tajima's D into management decisions.
Affiliation(s)
- Morgan N. Dedato
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, Ontario, Canada
| | - Claude Robert
- Département des Sciences Animales, Université Laval, Québec, Québec, Canada
| | - Joëlle Taillon
- Direction de l'expertise sur la Faune Terrestre, l'herpétofaune et l'avifaune, Ministère des Forêts, de la faune et des parcs, Gouvernement du Québec, Québec, Québec, Canada
| | - Aaron B. A. Shafer
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, Ontario, Canada
- Forensics Department, Trent University, Peterborough, Ontario, Canada
| | - Steeve D. Côté
- Département de Biologie, Caribou Ungava and Centre d'Études Nordiques, Université Laval, Québec, Québec, Canada
|
13
|
Schiebelhut LM, Giakoumis M, Castilho R, Duffin PJ, Puritz JB, Wares JP, Wessel GM, Dawson MN. Minor Genetic Consequences of a Major Mass Mortality: Short-Term Effects in Pisaster ochraceus. The Biological Bulletin 2022; 243:328-338. [PMID: 36716481 PMCID: PMC10668074 DOI: 10.1086/722284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Mass mortality events are increasing globally in frequency and magnitude, largely as a result of human-induced change. The effects of these mass mortality events, in both the long and short term, are of imminent concern because of their ecosystem impacts. Genomic data can be used to reveal some of the population-level changes associated with mass mortality events. Here, we use reduced-representation sequencing to identify potential short-term genetic impacts of a mass mortality event associated with a sea star wasting outbreak. We tested for changes in the population for genetic differentiation, diversity, and effective population size between pre-sea star wasting and post-sea star wasting populations of Pisaster ochraceus, a species that suffered high sea star wasting-associated mortality (75%-100% at 80% of sites). We detected no significant population-based genetic differentiation over the spatial scale sampled; however, the post-sea star wasting population tended toward more differentiation across sites than the pre-sea star wasting population. Genetic estimates of effective population size did not detectably change, consistent with theoretical expectations; however, rare alleles were lost. While we were unable to detect significant population-based genetic differentiation or changes in effective population size over this short time period, the genetic burden of this mass mortality event may be borne by future generations, unless widespread recruitment mitigates the population decline. Prior results from P. ochraceus indicated that natural selection played a role in altering allele frequencies following this mass mortality event. In addition to the role of selection found in a previous study on the genomic impacts of sea star wasting on P. ochraceus, our current study highlights the potential role the stochastic loss of many individuals plays in altering how genetic variation is structured across the landscape. Future genetic monitoring is needed to determine long-term genetic impacts in this long-lived species. Given the increased frequency of mass mortality events, it is important to implement demographic and genetic monitoring strategies that capture baselines and background dynamics to better contextualize species' responses to large perturbations.
Affiliation(s)
- Lauren M. Schiebelhut
- Life and Environmental Sciences, University of California, Merced, 5200 N. Lake Road, Merced, California 95343
| | - Melina Giakoumis
- Graduate Center, City University of New York, 365 5th Avenue, New York, New York 10016
- Department of Biology, City College of New York, 160 Convent Avenue, New York, New York 10031
| | - Rita Castilho
- University of Algarve, Campus de Gambelas, Faro, Portugal
- Center of Marine Sciences (CCMAR), Campus de Gambelas, Faro, Portugal
| | - Paige J. Duffin
- Odum School of Ecology and Department of Genetics, University of Georgia, 120 Green Street, Athens, Georgia 30602
| | - Jonathan B. Puritz
- Department of Biological Sciences, University of Rhode Island, 120 Flagg Road, Kingston, Rhode Island 02881
| | - John P. Wares
- Odum School of Ecology and Department of Genetics, University of Georgia, 120 Green Street, Athens, Georgia 30602
| | - Gary M. Wessel
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island 02912
| | - Michael N Dawson
- Life and Environmental Sciences, University of California, Merced, 5200 N. Lake Road, Merced, California 95343
|
14
|
Rasmussen MS, Garcia-Erill G, Korneliussen TS, Wiuf C, Albrechtsen A. Estimation of site frequency spectra from low-coverage sequencing data using stochastic EM reduces overfitting, runtime, and memory usage. Genetics 2022; 222:iyac148. [PMID: 36173322 PMCID: PMC9713400 DOI: 10.1093/genetics/iyac148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 09/14/2022] [Indexed: 12/13/2022]
Abstract
The site frequency spectrum is an important summary statistic in population genetics used for inference on demographic history and selection. However, estimation of the site frequency spectrum from called genotypes introduces bias when working with low-coverage sequencing data. Methods exist for addressing this issue but sometimes suffer from 2 problems. First, they can have very high computational demands, to the point that it may not be possible to run estimation for genome-scale data. Second, existing methods are prone to overfitting, especially for multidimensional site frequency spectrum estimation. In this article, we present a stochastic expectation-maximization algorithm for inferring the site frequency spectrum from NGS data that address these challenges. We show that this algorithm greatly reduces runtime and enables estimation with constant, trivial RAM usage. Furthermore, the algorithm reduces overfitting and thereby improves downstream inference. An implementation is available at github.com/malthesr/winsfs.
Affiliation(s)
| | - Genís Garcia-Erill
- Department of Biology, University of Copenhagen, 2200 København N, Denmark
| | | | - Carsten Wiuf
- Department of Mathematical Sciences, University of Copenhagen, 2100 København Ø, Denmark
| | - Anders Albrechtsen
- Department of Biology, University of Copenhagen, 2200 København N, Denmark
|
15
|
Lewanski AL, Golcher-Benavides J, Rick JA, Wagner CE. Variable hybridization between two Lake Tanganyikan cichlid species in recent secondary contact. Mol Ecol 2022; 31:5041-5059. [PMID: 35913373 DOI: 10.1111/mec.16636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Revised: 07/22/2022] [Accepted: 07/26/2022] [Indexed: 12/01/2022]
Abstract
Closely related taxa frequently exist in sympatry before the evolution of robust reproductive barriers, which can lead to substantial gene flow. Post-divergence gene flow can promote several disparate trajectories of divergence ranging from the erosion of distinctiveness and eventual collapse of the taxa to the strengthening of reproductive isolation. Among many relevant factors, understanding the demographic history of divergence (e.g. divergence time, extent of historical gene flow) can be particularly informative when examining contemporary gene flow between closely related taxa because this history can influence gene flow's prevalence and consequences. Here, we used genotyping-by-sequencing data to investigate speciation and contemporary hybridization in two closely related and sympatrically distributed Lake Tanganyikan cichlid species in the genus Petrochromis. Demographic modeling supported a speciation scenario involving divergence in isolation followed by secondary contact with bidirectional gene flow. Further investigation of this recent gene flow found evidence of ongoing hybridization between the species that varied in extent between different co-occurring populations. Relationships between abundance and the degree of admixture across populations suggest that the availability of conspecific mates may influence patterns of hybridization. These results, together with the observation that sets of recently diverged cichlid taxa are generally geographically separated in the lake, suggest that ongoing speciation in Lake Tanganyikan cichlids relies on initial spatial isolation. Additionally, the spatially heterogeneous patterns of admixture between the Petrochromis species illustrates the complexities of hybridization when species are in recent secondary contact.
Affiliation(s)
| | - Jimena Golcher-Benavides
- Department of Botany, University of Wyoming, Laramie, WY, USA; Program in Ecology, University of Wyoming, Laramie, WY, USA
| | - Jessica A Rick
- Department of Botany, University of Wyoming, Laramie, WY, USA; Program in Ecology, University of Wyoming, Laramie, WY, USA
| | - Catherine E Wagner
- Department of Botany, University of Wyoming, Laramie, WY, USA; Program in Ecology, University of Wyoming, Laramie, WY, USA; Biodiversity Institute, University of Wyoming, Laramie, WY, USA
|
16
|
You W, Henneberg R, Henneberg M. Healthcare services relaxing natural selection may contribute to increase of dementia incidence. Sci Rep 2022; 12:8873. [PMID: 35614150 PMCID: PMC9132962 DOI: 10.1038/s41598-022-12678-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2021] [Accepted: 05/03/2022] [Indexed: 11/10/2022]
Abstract
Ageing and genetic traits can only explain the increasing dementia incidence partially. Advanced healthcare services allow dementia patients to survive natural selection and pass their genes onto the next generation. Country-specific estimates of dementia incidence rates (all ages and 15-49 years old), Biological State Index expressing reduced natural selection (Is), ageing indexed by life expectancy e(65), GDP PPP and urbanization were obtained for analysing the global and regional correlations between reduced natural selection and dementia incidence with SPSS v. 27. Worldwide, Is significantly, but inversely, correlates with dementia incidence rates for both all ages and 15-49 years old in bivariate correlations. These relationships remain inversely correlated regardless of the competing contributing effects from ageing, GDP and urbanization in partial correlation model. Results of multiple linear regression (enter) have shown that Is is the significant predictor of dementia incidence among all ages and 15-49 years old. Subsequently, Is was selected as the variable having the greatest influence on dementia incidence in stepwise multiple linear regression. The Is correlated with dementia incidence more strongly in developed population groupings. Worldwide, reduced natural selection may be yet another significant contributor to dementia incidence with special regard to developed populations.
Affiliation(s)
- Wenpeng You
- Biological Anthropology and Comparative Anatomy Unit, School of Biomedicine, The University of Adelaide, Adelaide, SA, 5005, Australia.
| | - Renata Henneberg
- Biological Anthropology and Comparative Anatomy Unit, School of Biomedicine, The University of Adelaide, Adelaide, SA, 5005, Australia
| | - Maciej Henneberg
- Biological Anthropology and Comparative Anatomy Unit, School of Biomedicine, The University of Adelaide, Adelaide, SA, 5005, Australia
- Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland
|
17
|
Gossmann TI, Waxman D. Correcting Bias in Allele Frequency Estimates Due to an Observation Threshold: A Markov Chain Analysis. Genome Biol Evol 2022; 14:evac047. [PMID: 35349695 PMCID: PMC9016752 DOI: 10.1093/gbe/evac047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2022] [Indexed: 11/30/2022]
Abstract
There are many problems in biology and related disciplines involving stochasticity, where a signal can only be detected when it lies above a threshold level, while signals lying below threshold are simply not detected. A consequence is that the detected signal is conditioned to lie above threshold, and is not representative of the actual signal. In this work, we present some general results for the conditioning that occurs due to the existence of such an observational threshold. We show that this conditioning is relevant, for example, to gene-frequency trajectories, where many loci in the genome are simultaneously measured in a given generation. Such a threshold can lead to severe biases of allele frequency estimates under purifying selection. In the analysis presented, within the context of Markov chains such as the Wright-Fisher model, we address two key questions: (1) "What is a natural measure of the strength of the conditioning associated with an observation threshold?" (2) "What is a principled way to correct for the effects of the conditioning?". We answer the first question in terms of a proportion. Starting with a large number of trajectories, the relevant quantity is the proportion of these trajectories that are above threshold at a later time and hence are detected. The smaller the value of this proportion, the stronger the effects of conditioning. We provide an approximate analytical answer to the second question, that corrects the bias produced by an observation threshold, and performs to reasonable accuracy in the Wright-Fisher model for biologically plausible parameter values.
Affiliation(s)
- Toni I. Gossmann
- Department of Evolutionary Genetics, Bielefeld University, Konsequenz 45, 33501 Bielefeld, Germany
- Berlin Institute for Advanced Study, Wallotstrasse 19, 14193 Berlin, Germany
| | - David Waxman
- Centre for Computational Systems Biology, ISTBI, Fudan University, 220 Handan Road, Shanghai 20433, People’s Republic of China
|
18
|
Wang Z, Pierce NE. Fine-scale genome-wide signature of Pleistocene glaciation in Thitarodes moths (Lepidoptera: Hepialidae), host of Ophiocordyceps fungus in the Hengduan Mountains. Mol Ecol 2022; 32:2695-2714. [PMID: 35377501 DOI: 10.1111/mec.16457] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/28/2021] [Revised: 02/12/2022] [Accepted: 03/21/2022] [Indexed: 11/28/2022]
Abstract
The Hengduan Mountains region is a biodiversity hotspot known for its topologically complex, deep valleys and high mountains. While landscape and glacial refugia have been evoked to explain patterns of inter-species divergence, the accumulation of intra-species (i.e. population level) genetic divergence across the mountain-valley landscape in this region has received less attention. We used genome-wide restriction site-associated DNA sequencing (RADseq) to reveal signatures of Pleistocene glaciation in populations of Thitarodes shambalaensis (Lepidoptera: Hepialidae), the host moth of parasitic Ophiocordyceps sinensis (Hypocreales: Ophiocordycipitaceae) or "caterpillar fungus" endemic to the glacier of eastern Mt. Gongga. We used moraine history along the glacier valleys to model the distribution and environmental barriers to gene flow across populations of T. shambalaensis. We found that moth populations separated by less than 10 km exhibited valley-based population genetic clustering and isolation-by-distance (IBD), while gene flow among populations was best explained by models using information about their distributions at the local last glacial maximum (LGML, 58 kya), not their contemporary distribution. Maximum likelihood lineage history among populations, and among subpopulations as little as 500 meters apart, recapitulated glaciation history across the landscape. We also found signals of isolated population expansion following the retreat of LGML glaciers. These results reveal the fine-scale, long-term historical influence of landscape and glaciation on the genetic structuring of populations of an endangered and economically important insect species. Similar mechanisms, given enough time and continued isolation, could explain the contribution of glacier refugia to the generation of species diversity among the Hengduan Mountains.
Affiliation(s)
- Zhengyang Wang
- Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
| | - Naomi E Pierce
- Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
|
19
|
Hoffman JI, Chen RS, Vendrami DLJ, Paijmans AJ, Dasmahapatra KK, Forcada J. Demographic Reconstruction of Antarctic Fur Seals Supports the Krill Surplus Hypothesis. Genes (Basel) 2022; 13:541. [PMID: 35328094 PMCID: PMC8954904 DOI: 10.3390/genes13030541] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2022] [Accepted: 03/10/2022] [Indexed: 11/16/2022]
Abstract
Much debate surrounds the importance of top-down and bottom-up effects in the Southern Ocean, where the harvesting of over two million whales in the mid twentieth century is thought to have produced a massive surplus of Antarctic krill. This excess of krill may have allowed populations of other predators, such as seals and penguins, to increase, a top-down hypothesis known as the 'krill surplus hypothesis'. However, a lack of pre-whaling population baselines has made it challenging to investigate historical changes in the abundance of the major krill predators in relation to whaling. Therefore, we used reduced representation sequencing and a coalescent-based maximum composite likelihood approach to reconstruct the recent demographic history of the Antarctic fur seal, a pinniped that was hunted to the brink of extinction by 18th and 19th century sealers. In line with the known history of this species, we found support for a demographic model that included a substantial reduction in population size around the time period of sealing. Furthermore, maximum likelihood estimates from this model suggest that the recovered, post-sealing population at South Georgia may have been around two times larger than the pre-sealing population. Our findings lend support to the krill surplus hypothesis and illustrate the potential of genomic approaches to shed light on long-standing questions in population biology.
Affiliation(s)
- Joseph I. Hoffman
- Department of Animal Behavior, University of Bielefeld, P.O. BOX 100131, 33615 Bielefeld, Germany
- British Antarctic Survey, High Cross, Madingley Road, Cambridge CB3 0ET, UK
| | - Rebecca S. Chen
- Department of Animal Behavior, University of Bielefeld, P.O. BOX 100131, 33615 Bielefeld, Germany
| | - David L. J. Vendrami
- Department of Animal Behavior, University of Bielefeld, P.O. BOX 100131, 33615 Bielefeld, Germany
| | - Anna J. Paijmans
- Department of Animal Behavior, University of Bielefeld, P.O. BOX 100131, 33615 Bielefeld, Germany
| | | | - Jaume Forcada
- British Antarctic Survey, High Cross, Madingley Road, Cambridge CB3 0ET, UK
|
20
|
Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, Zhu S, Eldon B, Ellerman EC, Galloway JG, Gladstein AL, Gorjanc G, Guo B, Jeffery B, Kretzschumar WW, Lohse K, Matschiner M, Nelson D, Pope NS, Quinto-Cortés CD, Rodrigues MF, Saunack K, Sellinger T, Thornton K, van Kemenade H, Wohns AW, Wong Y, Gravel S, Kern AD, Koskela J, Ralph PL, Kelleher J. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 2022; 220:iyab229. [PMID: 34897427 PMCID: PMC9176297 DOI: 10.1093/genetics/iyab229] [Citation(s) in RCA: 116] [Impact Index Per Article: 58.0] [Received: 09/22/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022]
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Affiliation(s)
- Franz Baumdicker
- Cluster of Excellence “Controlling Microbes to Fight Infections”, Mathematical and Computational Population Genetics, University of Tübingen, 72076 Tübingen, Germany
| | - Gertjan Bisschop
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
| | - Daniel Goldstein
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
| | - Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin–Madison, Madison, WI 53706, USA
| | - Georgia Tsambos
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
| | - Sha Zhu
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Bjarki Eldon
- Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin 10115, Germany
| | | | - Jared G Galloway
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
| | - Ariella L Gladstein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7264, USA
- Embark Veterinary, Inc., Boston, MA 02111, USA
| | - Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh EH25 9RG, UK
| | - Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Ben Jeffery
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Warren W Kretzschumar
- Center for Hematology and Regenerative Medicine, Karolinska Institute, 141 83 Huddinge, Sweden
| | - Konrad Lohse
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
| | | | - Dominic Nelson
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
| | - Nathaniel S Pope
- Department of Entomology, Pennsylvania State University, State College, PA 16802, USA
| | - Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
| | - Murillo F Rodrigues
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
| | | | - Thibaut Sellinger
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, 85354 Freising, Germany
| | - Kevin Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
| | | | - Anthony W Wohns
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Simon Gravel
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
| | - Andrew D Kern
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
| | - Jere Koskela
- Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
| | - Peter L Ralph
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Department of Mathematics, University of Oregon, Eugene, OR 97403-5289, USA
| | - Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
|
21
|
Moinet A, Schlichta F, Peischl S, Excoffier L. Strong neutral sweeps occurring during a population contraction. Genetics 2022; 220:6529544. [PMID: 35171980 PMCID: PMC8982045 DOI: 10.1093/genetics/iyac021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/24/2021] [Accepted: 01/22/2022] [Indexed: 11/14/2022]
Abstract
A strong reduction in diversity around a specific locus is often interpreted as a recent rapid fixation of a positively selected allele, a phenomenon called a selective sweep. Rapid fixation of neutral variants can however lead to a similar reduction in local diversity, especially when the population experiences changes in population size, e.g. bottlenecks or range expansions. The fact that demographic processes can lead to signals of nucleotide diversity very similar to signals of selective sweeps is at the core of an ongoing discussion about the roles of demography and natural selection in shaping patterns of neutral variation. Here, we quantitatively investigate the shape of such neutral valleys of diversity under a simple model of a single population size change, and we compare it to signals of a selective sweep. We analytically describe the expected shape of such "neutral sweeps" and show that selective sweep valleys of diversity are, for the same fixation time, wider than neutral valleys. On the other hand, it is always possible to parametrize our model to find a neutral valley that has the same width as a given selected valley. Our findings provide further insight into how simple demographic models can create valleys of genetic diversity similar to those attributed to positive selection.
Affiliation(s)
- Antoine Moinet
- Interfaculty Bioinformatics Unit, University of Bern, Bern 3012, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Flávia Schlichta
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
- Stephan Peischl
- Interfaculty Bioinformatics Unit, University of Bern, Bern 3012, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Corresponding author.
- Laurent Excoffier
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, 3012 Bern, Switzerland
|
22
|
Rivera D, Prates I, Firneno TJ, Rodrigues MT, Caldwell JP, Fujita MK. Phylogenomics, introgression, and demographic history of South American true toads (Rhinella). Mol Ecol 2021; 31:978-992. [PMID: 34784086 DOI: 10.1111/mec.16280] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/03/2021] [Revised: 10/24/2021] [Accepted: 11/11/2021] [Indexed: 11/28/2022]
Abstract
The effects of genetic introgression on species boundaries and how they affect species' integrity and persistence over evolutionary time have received increased attention. The increasing availability of genomic data has revealed contrasting patterns of gene flow across genomic regions, which impose challenges to inferences of evolutionary relationships and of patterns of genetic admixture across lineages. By characterizing patterns of variation across thousands of genomic loci in a widespread complex of true toads (Rhinella), we assess the true extent of genetic introgression across species thought to hybridize to extreme degrees based on natural history observations and multi-locus analyses. Comprehensive geographic sampling of five large-ranged Neotropical taxa revealed multiple distinct evolutionary lineages that span large geographic areas and, at times, distinct biomes. The inferred major clades and genetic clusters largely correspond to currently recognized taxa; however, we also found evidence of cryptic diversity within taxa. While previous phylogenetic studies revealed extensive mito-nuclear discordance, our genetic clustering analyses uncovered several admixed individuals within major genetic groups. Accordingly, historical demographic analyses supported that the evolutionary history of these toads involved cross-taxon gene flow both at ancient and recent times. Lastly, ABBA-BABA tests revealed widespread allele sharing across species boundaries, a pattern that can be confidently attributed to genetic introgression as opposed to incomplete lineage sorting. These results confirm previous assertions that the evolutionary history of Rhinella was characterized by various levels of hybridization even across environmentally heterogeneous regions, posing exciting questions about what factors prevent complete fusion of diverging yet highly interdependent evolutionary lineages.
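The abstract above names ABBA-BABA tests without giving the computation. As a rough illustration only, Patterson's D for a four-taxon arrangement (P1, P2, P3, outgroup) can be sketched as follows. The function name, the 0/1 allele coding (0 = ancestral, 1 = derived), and the toy site patterns are assumptions made for this example; they are not taken from the study, which would also need allele frequencies and a significance test such as a block jackknife.

```python
from typing import Iterable, Tuple

Site = Tuple[int, int, int, int]  # alleles at (P1, P2, P3, outgroup); 0 = ancestral, 1 = derived


def patterson_d(sites: Iterable[Site]) -> float:
    """Return D = (ABBA - BABA) / (ABBA + BABA) from single-haplotype site patterns.

    Hypothetical helper for illustration: each site carries one allele per taxon.
    """
    abba = 0
    baba = 0
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0 or p3 != 1:
            continue  # only sites where P3 is derived and the outgroup is ancestral are informative
        if p1 == 0 and p2 == 1:
            abba += 1  # P2 shares the derived allele with P3
        elif p1 == 1 and p2 == 0:
            baba += 1  # P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Toy data: three ABBA sites, one BABA site, one uninformative site.
toy_sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)] + [(1, 1, 0, 0)]
d_value = patterson_d(toy_sites)
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so D close to zero is the null expectation; a consistent excess of one pattern is what such tests read as allele sharing through introgression.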
Affiliation(s)
- Danielle Rivera
- Department of Biology, University of Texas at Arlington, Arlington, TX, USA; Amphibian and Reptile Diversity Research Center, University of Texas at Arlington, TX, USA
- Ivan Prates
- Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, Ann Arbor, MI, USA
- Thomas J Firneno
- Department of Biology, University of Texas at Arlington, Arlington, TX, USA; Amphibian and Reptile Diversity Research Center, University of Texas at Arlington, TX, USA
- Miguel Trefaut Rodrigues
- Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
- Janalee P Caldwell
- Sam Noble Museum & Department of Biology, University of Oklahoma, Norman, Oklahoma, 73072-7029, USA
- Matthew K Fujita
- Department of Biology, University of Texas at Arlington, Arlington, TX, USA; Amphibian and Reptile Diversity Research Center, University of Texas at Arlington, TX, USA
|
23
|
Whole-exome analysis in Tunisian Imazighen and Arabs shows the impact of demography in functional variation. Sci Rep 2021; 11:21125. [PMID: 34702931 PMCID: PMC8548440 DOI: 10.1038/s41598-021-00576-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2021] [Accepted: 10/14/2021] [Indexed: 11/08/2022] Open
Abstract
Human populations are genetically affected by their demographic history, which shapes the distribution of their functional genomic variation. However, the genetic impact of recent demography is debated. This issue has been studied in different populations, but never in North Africans, despite their relevant cultural and demographic diversity. In this study we address the question by analyzing new whole-exome sequences from two culturally different Tunisian populations, an isolated Amazigh population and a close non-isolated Arab-speaking population, focusing on the distribution of functional variation. Both populations present clear differences in their variant frequency distribution, in general and for putatively damaging variation. This suggests a relevant effect in the Amazigh population of genetic isolation, drift, and inbreeding, pointing to relaxed purifying selection. We also discover the enrichment in Imazighen of variation associated to specific diseases or phenotypic traits, but the scarce genetic and biomedical data in the region limits further interpretation. Our results show the genomic impact of recent demography and reveal a clear genetic differentiation probably related to culture. These findings highlight the importance of considering cultural and demographic heterogeneity within North Africa when defining population groups, and the need for more data to improve knowledge on the region's health and disease landscape.
|
24
|
Dokan K, Kawamura S, Teshima KM. Effects of single nucleotide polymorphism ascertainment on population structure inferences. G3-GENES GENOMES GENETICS 2021; 11:6237890. [PMID: 33871576 PMCID: PMC8496283 DOI: 10.1093/g3journal/jkab128] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/28/2021] [Accepted: 04/08/2021] [Indexed: 11/14/2022]
Abstract
Single nucleotide polymorphism (SNP) data are widely used in research on natural populations. Although they are useful, SNP genotyping data are known to contain bias, normally referred to as ascertainment bias, because they are conditioned by already confirmed variants. This bias is introduced during the genotyping process, including the selection of populations for novel SNP discovery and the number of individuals involved in the discovery panel and selection of SNP markers. It is widely recognized that ascertainment bias can cause inaccurate inferences in population genetics and several methods to address these bias issues have been proposed. However, especially in natural populations, it is not always possible to apply an ideal ascertainment scheme because natural populations tend to have complex structures and histories. In addition, it was not fully assessed if ascertainment bias has the same effect on different types of population structure. Here, we examine the effects of bias produced during the selection of population for SNP discovery and consequent SNP marker selection processes under three demographic models: the island, stepping-stone, and population split models. Results show that site frequency spectra and summary statistics contain biases that depend on the joint effect of population structure and ascertainment schemes. Additionally, population structure inferences are also affected by ascertainment bias. Based on these results, it is recommended to evaluate the validity of the ascertainment strategy prior to the actual typing process because the direction and extent of ascertainment bias vary depending on several factors.
Affiliation(s)
- Kotaro Dokan
- Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Sayu Kawamura
- Graduate School of System Life Science, Kyushu University, Fukuoka 819-0395, Japan
- Kosuke M Teshima
- Department of Biology, Kyushu University, Fukuoka 819-0395, Japan
|
25
|
Barthelemy E, Fortunel C, Jaunatre M, Munoz F. Imprints of Past Habitat Area Reduction on Extant Taxonomic, Functional, and Phylogenetic Composition. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.634413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022] Open
Abstract
Past environmental changes have shaped the evolutionary and ecological diversity of extant organisms. Specifically, climatic fluctuations have made environmental conditions alternately common or rare over time. Accordingly, most taxa have undergone restriction of their distribution to local refugia during habitat contraction, from which they could expand when suitable habitat became more common. Assessing how past restrictions in refugia have shaped species distributions and genetic diversity has motivated much research in evolutionary biology and biogeography. But there is still a lack of clear synthesis on whether and how the taxonomic, functional and phylogenetic composition of extant multispecies assemblages retains the imprint of past restriction in refugia. We devised an original eco-evolutionary model to investigate the temporal dynamics of a regional species pool that inhabits a given habitat today and experienced habitat reduction in the past. The model includes three components: (i) a demographic component driving stochastic changes in population sizes and extinctions due to habitat availability, (ii) a mutation and speciation component representing how divergent genotypes emerge and define new species over time, and (iii) a trait evolution component representing how trait values have changed across descendants over time. We used this model to simulate dynamics of multispecies assemblages that occupied restricted refugia in the past and could expand their distribution subsequently. We characterized the past restriction in refugia in terms of two parameters representing the ending time of past refugia, and the extent of habitat restriction in the refugia. We characterized extant patterns of taxonomic, functional and phylogenetic diversity depending on these parameters. We found that extant relative abundances reflect the lasting influence of more recent refugia on demographic dynamics, while phylogenetic composition reflects the influence of more ancient habitat change. Extant functional diversity depends on the interplay between diversification dynamics and trait evolution, offering new options to jointly infer current trait adaptation and past trait evolution dynamics.
|
26
|
Birolo G, Aneli S, Di Gaetano C, Cugliari G, Russo A, Allione A, Casalone E, Giorgio E, Paraboschi EM, Ardissino D, Duga S, Asselta R, Matullo G. Functional and clinical implications of genetic structure in 1686 Italian exomes. Hum Mutat 2021; 42:272-289. [PMID: 33326653 DOI: 10.1002/humu.24156] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/19/2020] [Revised: 11/13/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022]
Abstract
To reconstruct the phenotypical and clinical implications of the Italian genetic structure, we thoroughly analyzed a whole-exome sequencing data set comprised of 1686 healthy Italian individuals. We found six previously unreported variants with remarkable frequency differences between Northern and Southern Italy in the HERC2, OR52R1, ADH1B, and THBS4 genes. We reported 36 clinically relevant variants (submitted as pathogenic, risk factors, or drug response in ClinVar) with significant frequency differences between Italy and Europe. We then explored putatively pathogenic variants in the Italian exome. On average, our Italian individuals carried 16.6 protein-truncating variants (PTVs), with 2.5% of the population having a PTV in one of the 59 American College of Medical Genetics (ACMG) actionable genes. Lastly, we looked for PTVs that are likely to cause Mendelian diseases. We found four heterozygous PTVs in haploinsufficient genes (KAT6A, PTCH1, and STXBP1) and three homozygous PTVs in genes causing recessive diseases (DPYD, FLG, and PYGM). Comparing frequencies from our data set to other public databases, like gnomAD, we showed the importance of population-specific databases for a more accurate assessment of variant pathogenicity. For this reason, we made aggregated frequencies from our data set publicly available as a tool for both clinicians and researchers (http://nigdb.cineca.it; NIG-ExIT).
Affiliation(s)
- Giovanni Birolo
- Department of Medical Sciences, University of Turin, Turin, Italy
- Serena Aneli
- Department of Medical Sciences, University of Turin, Turin, Italy
- Alessia Russo
- Department of Medical Sciences, University of Turin, Turin, Italy
- Elisa Giorgio
- Department of Medical Sciences, University of Turin, Turin, Italy
- Elvezia M Paraboschi
- Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy; Humanitas Clinical and Research Center-IRCCS, Rozzano, Milan, Italy
- Diego Ardissino
- Division of Cardiology, Azienda Ospedaliero-Universitaria di Parma, Parma, Italy
- Stefano Duga
- Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy; Humanitas Clinical and Research Center-IRCCS, Rozzano, Milan, Italy
- Rosanna Asselta
- Department of Biomedical Sciences, Humanitas University, Rozzano, Milan, Italy; Humanitas Clinical and Research Center-IRCCS, Rozzano, Milan, Italy
- Giuseppe Matullo
- Department of Medical Sciences, University of Turin, Turin, Italy
|
27
|
Schrider DR. Background Selection Does Not Mimic the Patterns of Genetic Diversity Produced by Selective Sweeps. Genetics 2020; 216:499-519. [PMID: 32847814 PMCID: PMC7536861 DOI: 10.1534/genetics.120.303469] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/13/2019] [Accepted: 08/04/2020] [Indexed: 12/28/2022] Open
Abstract
It is increasingly evident that natural selection plays a prominent role in shaping patterns of diversity across the genome. The most commonly studied modes of natural selection are positive selection and negative selection, which refer to directional selection for and against derived mutations, respectively. Positive selection can result in hitchhiking events, in which a beneficial allele rapidly replaces all others in the population, creating a valley of diversity around the selected site along with characteristic skews in allele frequencies and linkage disequilibrium among linked neutral polymorphisms. Similarly, negative selection reduces variation not only at selected sites but also at linked sites, a phenomenon called background selection (BGS). Thus, discriminating between these two forces may be difficult, and one might expect efforts to detect hitchhiking to produce an excess of false positives in regions affected by BGS. Here, we examine the similarity between BGS and hitchhiking models via simulation. First, we show that BGS may somewhat resemble hitchhiking in simplistic scenarios in which a region constrained by negative selection is flanked by large stretches of unconstrained sites, echoing previous results. However, this scenario does not mirror the actual spatial arrangement of selected sites across the genome. By performing forward simulations under more realistic scenarios of BGS, modeling the locations of protein-coding and conserved noncoding DNA in real genomes, we show that the spatial patterns of variation produced by BGS rarely mimic those of hitchhiking events. Indeed, BGS is not substantially more likely than neutrality to produce false signatures of hitchhiking. This holds for simulations modeled after both humans and Drosophila, and for several different demographic histories. These results demonstrate that appropriately designed scans for hitchhiking need not consider BGS's impact on false-positive rates. 
However, we do find evidence that BGS increases the false-negative rate for hitchhiking, an observation that demands further investigation.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514
|
28
|
Kerdoncuff E, Lambert A, Achaz G. Testing for population decline using maximal linkage disequilibrium blocks. Theor Popul Biol 2020; 134:171-181. [DOI: 10.1016/j.tpb.2020.03.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/09/2019] [Revised: 03/26/2020] [Accepted: 03/29/2020] [Indexed: 02/02/2023]
|
29
|
Werren EA, Garcia O, Bigham AW. Identifying adaptive alleles in the human genome: from selection mapping to functional validation. Hum Genet 2020; 140:241-276. [PMID: 32728809 DOI: 10.1007/s00439-020-02206-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/19/2020] [Accepted: 07/07/2020] [Indexed: 12/19/2022]
Abstract
The suite of phenotypic diversity across geographically distributed human populations is the outcome of genetic drift, gene flow, and natural selection throughout human evolution. Human genetic variation underlying local biological adaptations to selective pressures is incompletely characterized. With the emergence of population genetics modeling of large-scale genomic data derived from diverse populations, scientists are able to map signatures of natural selection in the genome in a process known as selection mapping. Inferred selection signals further can be used to identify candidate functional alleles that underlie putative adaptive phenotypes. Phenotypic association, fine mapping, and functional experiments facilitate the identification of candidate adaptive alleles. Functional investigation of candidate adaptive variation using novel techniques in molecular biology is slowly beginning to unravel how selection signals translate to changes in biology that underlie the phenotypic spectrum of our species. In addition to informing evolutionary hypotheses of adaptation, the discovery and functional annotation of adaptive alleles also may be of clinical significance. While selection mapping efforts in non-European populations are growing, there remains a stark under-representation of diverse human populations in current public genomic databases, of both clinical and non-clinical cohorts. This lack of inclusion limits the study of human biological variation. Identifying and functionally validating candidate adaptive alleles in more global populations is necessary for understanding basic human biology and human disease.
Affiliation(s)
- Elizabeth A Werren
- Department of Human Genetics, The University of Michigan, Ann Arbor, MI, USA
- Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
- Obed Garcia
- Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
- Abigail W Bigham
- Department of Anthropology, University of California Los Angeles, 341 Haines Hall, Los Angeles, CA, 90095, USA.
|
30
|
Recent effective population size in Eastern European plain Russians correlates with the key historical events. Sci Rep 2020; 10:9729. [PMID: 32546820 PMCID: PMC7298007 DOI: 10.1038/s41598-020-66734-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/06/2020] [Accepted: 05/12/2020] [Indexed: 11/13/2022] Open
Abstract
Effective population size reflects the history of population growth, contraction, and structuring. When the effect of structuring is negligible, the inferred trajectory of the effective population size can be informative about the key events in the history of a population. We used the IBDNe and DoRIS approaches, which exploit the data on IBD sharing between genomes, to reconstruct the recent effective population size in two population datasets of Russians from the Eastern European plain: (1) ethnic Russians sampled from the westernmost part of Russia; (2) ethnic Russians, Bashkirs, and Tatars sampled from the Volga-Ural region. In this way, we examined changes in effective population size among ethnic Russians who reside in their historical area in the west of the plain and among those who expanded eastward to come into contact with the indigenous peoples in the east of the plain. We compared the inferred demographic trajectories of each ethnic group to written historical data related to demographic events such as migration, war, colonization, famine, establishment, and collapse of empires. According to IBDNe estimations, 200 generations (~6000 years) ago, the effective size of the ancestral populations of Russians, Bashkirs, and Tatars hovered around 3,000, 30,000, and 8,000 respectively. Then, the ethnic Russians grew exponentially at increasing rates over the last 115 generations and became the largest ethnic group of the plain. Russians do not show any drop in effective population size after the key historical conflicts, including the Mongol invasion. The only exception is a moderate drop in the 17th century, which is well known in Russian history as The Smuta. Our analyses suggest a more eventful recent population history for the two small ethnic groups that came into contact with ethnic Russians in the Volga-Ural region. We found that the effective population size of Bashkirs and Tatars started to decrease during the time of the Mongol invasion. Interestingly, there is an even stronger drop in the effective population size that coincides with the expansion of Russians to the East. Thus, 15–20 generations ago (i.e., in the 16th–18th centuries), the trajectories of Bashkirs and Tatars show bottlenecks of four thousand and twenty thousand, respectively. Our results on the recent effective population size correlate with the key events in the history of populations of the Eastern European plain and have importance for designing biomedical studies in the region.
|
31
|
Chen H. A Computational Approach for Modeling the Allele Frequency Spectrum of Populations with Arbitrarily Varying Size. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 17:635-644. [PMID: 32173599 PMCID: PMC7212486 DOI: 10.1016/j.gpb.2019.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/28/2018] [Revised: 06/04/2019] [Accepted: 08/02/2019] [Indexed: 11/25/2022]
Abstract
The allele frequency spectrum (AFS), or site frequency spectrum, is commonly used to summarize the genomic polymorphism pattern of a sample, which is informative for inferring population history and detecting natural selection. In 2013, Chen and Chen developed a method for analytically deriving the AFS for populations with temporally varying size through the coalescence time-scaling function. However, their approach is only applicable to population history scenarios in which the analytical form of the time-scaling function is tractable. In this paper, we propose a computational approach to extend the method to populations with arbitrary complex varying size by numerically approximating the time-scaling function. We demonstrate the performance of the approach by constructing the AFS for two population history scenarios: the logistic growth model and the Gompertz growth model, for which the AFS are unavailable with existing approaches. Software for implementing the algorithm can be downloaded at http://chenlab.big.ac.cn/software/.
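For readers unfamiliar with the summary statistic named in this abstract, the sketch below shows one common way to tabulate an unfolded AFS from a small sample. It is not the method of the paper, which concerns the expected AFS under varying population size. The function name, the 0/1 haplotype matrix layout (rows are sampled chromosomes, columns are sites, 1 marks the derived allele), and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def allele_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Tabulate the unfolded AFS of a sample of n haplotypes.

    Entry i - 1 of the result counts the sites at which the derived allele
    is carried by exactly i of the n haplotypes (1 <= i <= n - 1).
    Hypothetical helper: assumes biallelic sites coded 0 (ancestral) / 1 (derived).
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    spectrum = [0] * (n - 1)
    for site in range(num_sites):
        derived_count = sum(hap[site] for hap in haplotypes)
        if 0 < derived_count < n:  # skip sites that are monomorphic in the sample
            spectrum[derived_count - 1] += 1
    return spectrum


# Toy sample: 4 haplotypes, 5 sites (the last site is fixed for the derived allele).
toy_sample = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
toy_afs = allele_frequency_spectrum(toy_sample)
```

In the toy sample, two sites are singletons and one is a doubleton, so the spectrum has counts only in its first two entries. Demographic inference methods compare an observed spectrum of this kind with the spectrum expected under a candidate population history.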
Affiliation(s)
- Hua Chen
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China.
|
32
|
Noskova E, Ulyantsev V, Koepfli KP, O’Brien SJ, Dobrynin P. GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data. Gigascience 2020; 9:giaa005. [PMID: 32112099 PMCID: PMC7049072 DOI: 10.1093/gigascience/giaa005] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 04/27/2019] [Revised: 09/16/2019] [Accepted: 01/13/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND The demographic history of any population is imprinted in the genomes of the individuals that make up the population. One of the most popular and convenient representations of genetic information is the allele frequency spectrum (AFS), the distribution of allele frequencies in populations. The joint AFS is commonly used to reconstruct the demographic history of multiple populations, and several methods based on diffusion approximation (e.g., ∂a∂i) and ordinary differential equations (e.g., moments) have been developed and applied for demographic inference. These methods provide an opportunity to simulate AFS under a variety of researcher-specified demographic models and to estimate the best model and associated parameters using likelihood-based local optimizations. However, there are no known algorithms to perform global searches of demographic models with a given AFS. RESULTS Here, we introduce a new method that implements a global search using a genetic algorithm for the automatic and unsupervised inference of demographic history from joint AFS data. Our method is implemented in the software GADMA (Genetic Algorithm for Demographic Model Analysis, https://github.com/ctlab/GADMA). CONCLUSIONS We demonstrate the performance of GADMA by applying it to sequence data from humans and non-model organisms and show that it is able to automatically infer a demographic model close to or even better than the one that was previously obtained manually. Moreover, GADMA is able to infer multiple demographic models at different local optima close to the global one, providing a larger set of possible scenarios to further explore demographic history.
Affiliation(s)
- Ekaterina Noskova
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Vladimir Ulyantsev
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Klaus-Peter Koepfli
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Ave., NW Washington, D.C. 20008, USA
- Stephen J O’Brien
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Guy Harvey Oceanographic Center, Nova Southeastern University Ft. Lauderdale, 8000 North Ocean Drive, Ft. Lauderdale, Florida 33004, USA
- Pavel Dobrynin
- Computer Technologies Laboratory, ITMO University, 49 Kronverkskiy Pr., St. Petersburg 197101, Russian Federation
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, 3001 Connecticut Ave., NW Washington, D.C. 20008, USA
|
33
|
Kioukis A, Michalopoulou VA, Briers L, Pirintsos S, Studholme DJ, Pavlidis P, Sarris PF. Intraspecific diversification of the crop wild relative Brassica cretica Lam. using demographic model selection. BMC Genomics 2020; 21:48. [PMID: 31937246 PMCID: PMC6961386 DOI: 10.1186/s12864-019-6439-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/17/2019] [Accepted: 12/29/2019] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Crop wild relatives (CWRs) contain genetic diversity, representing an invaluable resource for crop improvement. Many of their traits have the potential to help crops to adapt to changing conditions that they experience due to climate change. An impressive global effort for the conservation of various CWRs will facilitate their use in crop breeding for food security. The genus Brassica is listed in Annex I of the International Treaty on Plant Genetic Resources for Food and Agriculture. Brassica oleracea (or wild cabbage), a species native to southern and western Europe, has become established as an important human food crop plant because of its large reserves stored over the winter in its leaves. Brassica cretica Lam. (Bc) is a CWR in the brassica group and B. cretica subsp. nivea (Bcn) has been suggested as a separate subspecies. The species Bc has been proposed as a potential gene donor to brassica crops, including broccoli, cabbage, cauliflower, oilseed rape, etc. RESULTS We sequenced genomes of four Bc individuals, including two Bcn and two Bc. Demographic analysis based on our whole-genome sequence data suggests that populations of Bc are not isolated. Classification of the Bc into distinct subspecies is not supported by the data. Using only the non-coding part of the data (thus, the parts of the genome that have evolved nearly neutrally), we find that gene flow between different Bc populations is recent and that genomic diversity is high. CONCLUSIONS Despite predictions on the disruptive effect of gene flow in adaptation, when selection is not strong enough to prevent the loss of locally adapted alleles, studies show that gene flow can promote adaptation, that local adaptations can be maintained despite high gene flow, and that genetic architecture plays a fundamental role in the origin and maintenance of local adaptation with gene flow. Thus, in the genomic era it is important to link the selected demographic models with the underlying processes of genomic variation because, if this variation is largely selectively neutral, we cannot assume that a diverse population of crop wild relatives will necessarily exhibit the wide-ranging adaptive diversity required for further crop improvement.
Affiliation(s)
- Antonios Kioukis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 70013, Crete, Greece
- Vassiliki A Michalopoulou
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, 70013, Crete, Greece
- Laura Briers
- Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
- Stergios Pirintsos
- Department of Biology, University of Crete, 714 09, Heraklion, Greece
- Botanical Garden, University of Crete, Gallos Campus, 741 00, Rethymnon, Greece
- David J Studholme
- Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK.
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 70013, Crete, Greece
- Panagiotis F Sarris
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion, 70013, Crete, Greece.
- Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK.
- Department of Biology, University of Crete, 714 09, Heraklion, Greece.
|
34
|
Campbell MC, Ashong B, Teng S, Harvey J, Cross CN. Multiple selective sweeps of ancient polymorphisms in and around LTα located in the MHC class III region on chromosome 6. BMC Evol Biol 2019; 19:218. [PMID: 31791241 PMCID: PMC6889576 DOI: 10.1186/s12862-019-1516-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/28/2019] [Accepted: 09/20/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Lymphotoxin-α (LTα), located in the Major Histocompatibility Complex (MHC) class III region on chromosome 6, encodes a cytotoxic protein that mediates a variety of antiviral responses among other biological functions. Furthermore, several genotypes at this gene have been implicated in the onset of a number of complex diseases, including myocardial infarction, autoimmunity, and various types of cancer. However, little is known about levels of nucleotide variation and linkage disequilibrium (LD) in and near LTα, which could also influence phenotypic variance. To address this gap in knowledge, we examined sequence variation across ~10 kilobases (kb), encompassing LTα and the upstream region, in 2039 individuals from the 1000 Genomes Project originating from 21 global populations. RESULTS Here, we observed striking patterns of diversity, including an excess of intermediate-frequency alleles, the maintenance of multiple common haplotypes and a deep coalescence time for variation (dating > 1.0 million years ago), in global populations. While these results are generally consistent with a model of balancing selection, we also uncovered a signature of positive selection in the form of long-range LD on chromosomes with derived alleles primarily in Eurasian populations. To reconcile these findings, which appear to support different models of selection, we argue that selective sweeps (particularly, soft sweeps) of multiple derived alleles in and/or near LTα occurred in non-Africans after their ancestors left Africa. Furthermore, these targets of selection were predicted to alter transcription factor binding site affinity and protein stability, suggesting they play a role in gene function. Our data also showed that a subset of these functional adaptive variants is present in archaic hominin genomes.
CONCLUSIONS Overall, this study identifies candidate functional alleles in a biologically relevant genomic region and offers new insights into the evolutionary origins of these loci in modern human populations.
Affiliation(s)
- Michael C. Campbell
  - Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Bryan Ashong
  - Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Shaolei Teng
  - Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Jayla Harvey
  - Department of Biology, College of Arts and Sciences, Howard University, Washington, DC 20059 USA
- Christopher N. Cross
  - Department of Anatomy, College of Medicine, Howard University, Washington, DC 20059 USA
|
35
|
Torada L, Lorenzon L, Beddis A, Isildak U, Pattini L, Mathieson S, Fumagalli M. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics 2019; 20:337. [PMID: 31757205 PMCID: PMC6873651 DOI: 10.1186/s12859-019-2927-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/21/2019] [Accepted: 05/31/2019] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An evolutionary framework offers an alternative to classic association studies for determining such genetic bases. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in the loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks to population genomic data for the detection and quantification of natural selection. RESULTS ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how the misspecification of the demographic model used to produce training data can influence the quantification of positive selection. We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques.
CONCLUSIONS While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
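The image encoding and the row and column sorting mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary haplotype matrix (rows are haplotypes, columns are sites, 0 = ancestral, 1 = derived) and a simple frequency-based ordering; the function names are hypothetical and this is not ImaGene's own code or API.

```python
from collections import Counter


def sort_rows_by_frequency(haplotypes):
    """Order haplotypes (rows) so the most common row pattern comes first."""
    counts = Counter(tuple(row) for row in haplotypes)
    # Ties are broken by the row pattern itself to keep the order deterministic.
    return sorted((list(row) for row in haplotypes),
                  key=lambda row: (-counts[tuple(row)], row))


def sort_columns_by_frequency(haplotypes):
    """Order sites (columns) by decreasing derived-allele count."""
    n_sites = len(haplotypes[0])
    derived = [sum(row[j] for row in haplotypes) for j in range(n_sites)]
    order = sorted(range(n_sites), key=lambda j: (-derived[j], j))
    return [[row[j] for j in order] for row in haplotypes]


def to_image(haplotypes):
    """Encode alleles as pixel intensities: ancestral -> 0, derived -> 255."""
    return [[255 * allele for allele in row] for row in haplotypes]


if __name__ == "__main__":
    sample = [[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    image = to_image(sort_columns_by_frequency(sort_rows_by_frequency(sample)))
    for pixel_row in image:
        print(pixel_row)
```

Sorting removes the arbitrary ordering of individuals, so two samples with the same genetic content produce the same image before it is passed to a network.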
Affiliation(s)
- Luis Torada
  - Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
- Lucrezia Lorenzon
  - Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133 Italy
- Alice Beddis
  - Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
- Ulas Isildak
  - Department of Biological Sciences, Middle East Technical University, METU Üniversiteler Mah. Dumlupınar Blv. No:1, Ankara, 06800 Çankaya Turkey
- Linda Pattini
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133 Italy
- Sara Mathieson
  - Department of Computer Science, Swarthmore College, 500 College Ave, Swarthmore, 19081 PA USA
- Matteo Fumagalli
  - Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY UK
|
36
|
Garot E, Joët T, Combes MC, Severac D, Lashermes P. Plant population dynamics on oceanic islands during the Late Quaternary climate changes: genetic evidence from a tree species (Coffea mauritiana) in Reunion Island. The New Phytologist 2019; 224:974-986. [PMID: 31291469 DOI: 10.1111/nph.16052] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/21/2019] [Accepted: 07/04/2019] [Indexed: 06/09/2023]
Abstract
Past climatic fluctuations have played a major role in shaping current plant biodiversity. Although they harbour an exceptional biota, oceanic islands have received little attention in studies of species demographic history and past vegetation patterns. We investigated the impact of past climatic changes on the effective population size of a tree (Coffea mauritiana) that is endemic to Reunion Island, located in the south-western Indian Ocean (SWIO). Demographic changes were inferred using summary statistics calculated from genomic data. Using ecological niche modelling and the current distribution of genetic diversity, the paleodistribution of the species was also assessed. A reduction in the effective population size of C. mauritiana during the Last Glacial Maximum was inferred. The distribution of the species was reduced on the western side of the island, due to low rainfall. It appeared that a major reduction in rainfall and a slight temperature decrease prevailed in the SWIO. Our findings indicate that analyses of current patterns of intraspecific genetic variation can contribute efficiently to the characterisation of past climatic changes on remote islands. Identifying areas with higher resilience on oceanic islands could guide forest management and conservation in the face of global climate change.
Affiliation(s)
- Edith Garot
  - IRD, University of Montpellier, DIADE, 34394, Montpellier, France
- Thierry Joët
  - IRD, University of Montpellier, DIADE, 34394, Montpellier, France
- Dany Severac
  - MGX, University of Montpellier, CNRS, INSERM, 34095, Montpellier, France
|
37
|
Goodman KR, Prost S, Bi K, Brewer MS, Gillespie RG. Host and geography together drive early adaptive radiation of Hawaiian planthoppers. Mol Ecol 2019; 28:4513-4528. [PMID: 31484218 DOI: 10.1111/mec.15231] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/14/2018] [Revised: 08/19/2019] [Accepted: 08/27/2019] [Indexed: 11/30/2022]
Abstract
The interactions between insects and their plant hosts have been implicated in driving diversification of both players. Early arguments highlighted the role of ecological opportunity, with the idea that insects "escape and radiate" on new hosts, with subsequent hypotheses focusing on the interplay between host shifting and host tracking, coupled with isolation and fusion, in generating diversity. Because it is rarely possible to capture the initial stages of diversification, it is particularly difficult to ascertain the relative roles of geographic isolation versus host shifts in initiating the process. The current study examines genetic diversity between populations and hosts within a single species of endemic Hawaiian planthopper, Nesosydne umbratica (Hemiptera, Delphacidae). Given that the species was known as a host generalist occupying unrelated hosts, Clermontia (Campanulaceae) and Pipturus (Urticaceae), we set out to determine the relative importance of geography and host in structuring populations in the early stages of differentiation on the youngest islands of the Hawaiian chain. Results from extensive exon capture data showed that N. umbratica is highly structured, both by geography, with discrete populations on each volcano, and by host plant, with parallel radiations on Clermontia and Pipturus leading to extensive co-occurrence. The marked genetic structure suggests that populations can readily become established on novel hosts given the opportunity; subsequent adaptation allows monopolization of the new host. The results support the role of geographic isolation in structuring populations, with host shifts occurring as discrete events that facilitate subsequent parallel geographic range expansion.
Affiliation(s)
- Kari Roesch Goodman
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
- Stefan Prost
  - Department of Integrative Biology, University of California, Berkeley, CA, USA
  - LOEWE-Centre for Translational Biodiversity Genomics, Senckenberg Research Institute, Frankfurt/Main, Germany
- Ke Bi
  - Computational Genomics Resource Laboratory (CGRL), California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
  - Ancestry, San Francisco, CA, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
- Michael S Brewer
  - Department of Biology, East Carolina University, Greenville, NC, USA
- Rosemary G Gillespie
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
|
38
|
Zeng K, Jackson BC, Barton HJ. Methods for Estimating Demography and Detecting Between-Locus Differences in the Effective Population Size and Mutation Rate. Mol Biol Evol 2019; 36:423-433. [PMID: 30428070 PMCID: PMC6409433 DOI: 10.1093/molbev/msy212] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/16/2022] Open
Abstract
It is known that the effective population size (Ne) and the mutation rate (u) vary across the genome. Here, we show that ignoring this heterogeneity may lead to biased estimates of past demography. To solve the problem, we develop new methods for jointly inferring past changes in population size and detecting variation in Ne and u between loci. These methods rely on either polymorphism data alone or both polymorphism and divergence data. In addition to inferring demography, we can use the methods to study a variety of questions: 1) comparing sex chromosomes with autosomes (for finding evidence for male-driven evolution, an unequal sex ratio, or sex-biased demographic changes) and 2) analyzing multilocus data from within autosomes or sex chromosomes (for studying determinants of variability in Ne and u). Simulations suggest that the methods can provide accurate parameter estimates and have substantial statistical power for detecting differences in Ne and u. As an example, we use the methods to analyze a polymorphism data set from Drosophila simulans. We find clear evidence for rapid population expansion. The results also indicate that the autosomes have a higher mutation rate than the X chromosome and that the sex ratio is probably female-biased. The new methods have been implemented in a user-friendly package.
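Why a comparison of X-linked and autosomal Ne can point to an unequal sex ratio may be clearer with a small worked example. The sketch below uses the standard textbook expressions for effective size with Nm breeding males and Nf breeding females; it is not the inference method of the paper, and the function names are hypothetical.

```python
def autosomal_ne(n_males, n_females):
    """Textbook autosomal effective size: 4*Nm*Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


def x_linked_ne(n_males, n_females):
    """Textbook X-linked effective size: 9*Nm*Nf / (4*Nm + 2*Nf)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Ratio of X-linked to autosomal effective size."""
    return x_linked_ne(n_males, n_females) / autosomal_ne(n_males, n_females)


if __name__ == "__main__":
    # Equal sex ratio gives the classic expectation of 3/4.
    print(x_to_autosome_ratio(100, 100))
    # A female-biased breeding population pushes the ratio above 3/4.
    print(x_to_autosome_ratio(50, 150))
```

Under these expressions an observed X-to-autosome diversity ratio above 0.75 is compatible with a female-biased sex ratio, although mutation rate differences and demographic change can produce similar shifts, which is why the abstract treats them jointly.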
Affiliation(s)
- Kai Zeng
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Benjamin C Jackson
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Henry J Barton
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
|
39
|
Ragsdale AP, Gravel S. Models of archaic admixture and recent history from two-locus statistics. PLoS Genet 2019; 15:e1008204. [PMID: 31181058 PMCID: PMC6586359 DOI: 10.1371/journal.pgen.1008204] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 01/08/2019] [Revised: 06/20/2019] [Accepted: 05/17/2019] [Indexed: 11/18/2022] Open
Abstract
We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 - 8% genetic ancestry to individuals in world-wide populations.
Affiliation(s)
- Aaron P Ragsdale
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
|
40
|
Linck E, Battey CJ. Minor allele frequency thresholds strongly affect population structure inference with genomic data sets. Mol Ecol Resour 2019; 19:639-647. [DOI: 10.1111/1755-0998.12995] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Received: 10/23/2018] [Revised: 12/14/2018] [Accepted: 01/04/2019] [Indexed: 01/25/2023]
Affiliation(s)
- Ethan Linck
  - Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Seattle, Washington
- C. J. Battey
  - Department of Biology and Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon
|
41
|
Warmuth VM, Ellegren H. Genotype-free estimation of allele frequencies reduces bias and improves demographic inference from RADSeq data. Mol Ecol Resour 2019; 19:586-596. [DOI: 10.1111/1755-0998.12990] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 06/29/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 02/07/2023]
Affiliation(s)
- Vera M. Warmuth
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
  - Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Martinsried, Germany
- Hans Ellegren
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
|
42
|
Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference. Mol Biol Evol 2019; 36:220-238. [PMID: 30517664 PMCID: PMC6367976 DOI: 10.1093/molbev/msy224] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Indexed: 12/28/2022] Open
Abstract
Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
Affiliation(s)
- Lex Flagel
  - Monsanto Company, Chesterfield, MO
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Yaniv Brandvain
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC
|
43
|
Beichman AC, Huerta-Sanchez E, Lohmueller KE. Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annual Review of Ecology, Evolution, and Systematics 2018. [DOI: 10.1146/annurev-ecolsys-110617-062431] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Indexed: 11/09/2022]
Abstract
Genome sequence data are now being routinely obtained from many nonmodel organisms. These data contain a wealth of information about the demographic history of the populations from which they originate. Many sophisticated statistical inference procedures have been developed to infer the demographic history of populations from this type of genomic data. In this review, we discuss the different statistical methods available for inference of demography, providing an overview of the underlying theory and logic behind each approach. We also discuss the types of data required and the pros and cons of each method. We then discuss how these methods have been applied to a variety of nonmodel organisms. We conclude by presenting some recommendations for researchers looking to use genomic data to infer demographic history.
Affiliation(s)
- Annabel C. Beichman
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
- Emilia Huerta-Sanchez
  - Department of Molecular and Cell Biology, University of California, Merced, California 95343, USA
  - Current affiliation: Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912, USA
- Kirk E. Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095, USA
  - Interdepartmental Program in Bioinformatics and Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
|
44
|
Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference. Genetics 2018; 210:665-682. [PMID: 30064984 DOI: 10.1534/genetics.118.300733] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/29/2018] [Accepted: 07/30/2018] [Indexed: 11/18/2022] Open
Abstract
The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to zero or diverge to infinity, and show undesirable sensitivity to perturbations in the data. The goal of this article is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographies and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model, and generalize our intuition to arbitrary sample sizes using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only [Formula: see text] epochs, where [Formula: see text] is between [Formula: see text] and [Formula: see text]. The set of expected SFS for piecewise-constant demographies with fewer than [Formula: see text] epochs is open and nonconvex, which causes the above phenomena for inference from data.
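As a concrete reference point for the SFS described above, the following minimal sketch counts the unfolded spectrum from a binary haplotype matrix and gives the standard neutral expectation for a constant-size population, in which the expected number of sites with i derived copies is theta / i. The function names and the input layout are assumptions made for illustration; the piecewise-constant case analysed in the article is not implemented here.

```python
def sample_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 counts sites with i derived alleles (0 < i < n)."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in range(len(haplotypes[0])):
        derived = sum(row[site] for row in haplotypes)
        if 0 < derived < n:  # skip monomorphic and fixed-derived sites
            sfs[derived - 1] += 1
    return sfs


def expected_sfs_constant_size(n, theta):
    """Standard neutral expectation for constant size: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


if __name__ == "__main__":
    sample = [[1, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, 1]]
    print(sample_frequency_spectrum(sample))
    print(expected_sfs_constant_size(4, 6.0))
```

Demographic inference methods of the kind discussed in the abstract compare an observed vector like the first output with expected vectors like the second, computed under candidate histories.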
|
45
|
Guo J, Wu Y, Zhu Z, Zheng Z, Trzaskowski M, Zeng J, Robinson MR, Visscher PM, Yang J. Global genetic differentiation of complex traits shaped by natural selection in humans. Nat Commun 2018; 9:1865. [PMID: 29760457 PMCID: PMC5951811 DOI: 10.1038/s41467-018-04191-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 10/29/2017] [Accepted: 04/12/2018] [Indexed: 11/09/2022] Open
Abstract
There are mean differences in complex traits among global human populations. We hypothesize that part of the phenotypic differentiation is due to natural selection. To address this hypothesis, we assess the differentiation in allele frequencies of trait-associated SNPs among African, Eastern Asian, and European populations for ten complex traits using data of large sample size (up to ~405,000). We show that SNPs associated with height ([Formula: see text]), waist-to-hip ratio ([Formula: see text]), and schizophrenia ([Formula: see text]) are significantly more differentiated among populations than matched "control" SNPs, suggesting that these trait-associated SNPs have undergone natural selection. We further find that SNPs associated with height ([Formula: see text]) and schizophrenia ([Formula: see text]) show significantly higher variance in linkage disequilibrium (LD) scores across populations than control SNPs. Our results support the hypothesis that natural selection has shaped the genetic differentiation of complex traits, such as height and schizophrenia, among worldwide populations.
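Differentiation in allele frequencies among populations is commonly summarised with a fixation index. The sketch below computes a simple heterozygosity-based FST for one biallelic SNP, assuming equally weighted populations and known allele frequencies; it is an illustration of the general idea rather than the estimator used in the study, and the function name is hypothetical.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: allele frequency of the same allele in each population,
    with all populations weighted equally (an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                    # pooled expected heterozygosity
    h_within = sum(2 * p * (1 - p) for p in freqs) / k   # mean within-population value
    if h_total == 0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5, 0.5]))  # identical frequencies
    print(fst_biallelic([0.1, 0.5, 0.9]))  # strongly differentiated SNP
```

Comparing such values for trait-associated SNPs against frequency-matched control SNPs is the kind of contrast the abstract describes.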
Affiliation(s)
- Jing Guo
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Yang Wu
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Zhihong Zhu
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Zhili Zheng
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
  - The Eye Hospital, School of Ophthalmology and Optometry, Wenzhou Medical University, 325027, Zhejiang, China
- Maciej Trzaskowski
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Jian Zeng
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Matthew R Robinson
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
  - Department of Computational Biology, University of Lausanne, 1011, Lausanne, Switzerland
- Peter M Visscher
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
- Jian Yang
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
|
46
|
Population genetic evidence for positive and purifying selection acting at the human IFN-γ locus in Africa. Genes Immun 2018; 20:143-157. [PMID: 29599512 DOI: 10.1038/s41435-018-0016-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/24/2017] [Revised: 01/22/2018] [Accepted: 01/26/2018] [Indexed: 01/09/2023]
Abstract
Despite its critical role in the defense against microbial infection and tumor development, little is known about the range of nucleotide and haplotype variation at IFN-γ, or the evolutionary forces that have shaped patterns of diversity at this locus. To address this gap in knowledge, we examined sequence data from the IFN-γ gene in 1461 individuals from 15 worldwide populations. Our analyses uncovered novel patterns of variation in distinct African populations, including an excess of high frequency-derived alleles, unusually long haplotype structure surrounding the IFN-γ gene, and a "star-like" genealogy of African-specific haplotypes carrying variants previously associated with infectious disease. We also inferred a deep time to coalescence of variation at IFN-γ (~ 0.8 million years ago) and ancient ages for common polymorphisms predating the evolution of modern humans. Taken together, these results are congruent with a model of positive selection on standing variation in African populations. Furthermore, we inferred that common variants in intron 3 of IFN-γ are the likely targets of selection. In addition, we observed a paucity of non-synonymous substitutions relative to synonymous changes in the exons of IFN-γ in African and non-African populations, suggestive of strong purifying selection. Therefore, we contend that positive and purifying selection have influenced levels of diversity in different regions of IFN-γ, implying that these distinct genic regions are, or have been, functionally important. Overall, this study provides additional insights into the evolutionary events that have contributed to the frequency and distribution of alleles having a role in human health and disease.
|
47
|
Baharian S, Gravel S. On the decidability of population size histories from finite allele frequency spectra. Theor Popul Biol 2018; 120:42-51. [PMID: 29305873 DOI: 10.1016/j.tpb.2017.12.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/14/2017] [Revised: 12/15/2017] [Accepted: 12/20/2017] [Indexed: 10/18/2022]
Abstract
Understanding the historical events that shaped current genomic diversity has applications in historical, biological, and medical research. However, the amount of historical information that can be inferred from genetic data is finite, which leads to an identifiability problem. For example, different historical processes can lead to identical distributions of allele frequencies. This identifiability issue casts a shadow of uncertainty over the results of any study which uses the frequency spectrum to infer past demography. It has been argued that imposing mild 'reasonableness' constraints on demographic histories can enable unique reconstruction, at least in an idealized setting where the length of the genome is nearly infinite. Here, we discuss this problem for finite sample size and genome length. Using the diffusion approximation, we obtain bounds on likelihood differences between similar demographic histories, and use them to construct pairs of very different reasonable histories that produce almost-identical frequency distributions. The finite-genome problem therefore remains poorly determined even among reasonable histories. Where fits to few-parameter models produce narrow parameter confidence intervals, large uncertainties lurk hidden by model assumptions.
Affiliation(s)
- Soheil Baharian
- Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, QC, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.
|
48
|
Li X, Redline S, Zhang X, Williams S, Zhu X. Height associated variants demonstrate assortative mating in human populations. Sci Rep 2017; 7:15689. [PMID: 29146993 PMCID: PMC5691191 DOI: 10.1038/s41598-017-15864-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/02/2017] [Accepted: 11/03/2017] [Indexed: 12/23/2022] Open
Abstract
Understanding human mating patterns, which can affect population genetic structure, is important for correctly modeling populations and performing genetic association studies. Prior studies of assortative mating in humans focused on trait similarity among spouses and relatives via phenotypic correlations. Limited research has quantified the genetic consequences of assortative mating. The degree to which non-random mating influences genetic architecture remains unclear. Here, we studied genetic variants associated with human height to assess the degree of height-related assortative mating in European-American and African-American populations. We compared the inbreeding coefficient estimated using known height-associated variants with that calculated from frequency-matched sets of random variants. We observed significantly higher inbreeding coefficients for the height-associated variants than for frequency-matched random variants (P < 0.05), demonstrating height-related assortative mating in both populations.
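The comparison described in this abstract rests on a per-variant inbreeding coefficient. A minimal sketch follows, assuming the common estimator F = 1 - H_obs / H_exp with H_exp = 2p(1 - p) for a biallelic variant; the abstract does not state which estimator or frequency-matching procedure the authors used, and the function names and the 0/1/2 genotype coding are hypothetical choices for illustration.

```python
from typing import Sequence


def inbreeding_coefficient(genotypes: Sequence[int]) -> float:
    """Estimate F = 1 - H_obs / H_exp for one biallelic variant.

    `genotypes` holds alternate-allele counts (0, 1 or 2) per individual.
    This coding and estimator are assumptions, not taken from the paper.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Alternate-allele frequency across 2n chromosomes.
    p = sum(genotypes) / (2 * n)
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("variant is monomorphic; F is undefined")
    # Observed fraction of heterozygous individuals.
    h_obs = sum(1 for g in genotypes if g == 1) / n
    return 1 - h_obs / h_exp


def mean_inbreeding(variants: Sequence[Sequence[int]]) -> float:
    """Average the per-variant estimates over a set of variants."""
    return sum(inbreeding_coefficient(v) for v in variants) / len(variants)
```

Under this sketch, a set of trait-associated variants would be compared with frequency-matched random sets by calling `mean_inbreeding` on each; a positive F indicates a deficit of heterozygotes relative to random mating.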
Affiliation(s)
- Xiaoyin Li
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Susan Redline
- Departments of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Xiang Zhang
- College of Information Sciences and Technology, The Pennsylvania State University, University Park, State College, PA, USA
- Scott Williams
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA.
|
49
|
Comparison of Single Genome and Allele Frequency Data Reveals Discordant Demographic Histories. G3-GENES GENOMES GENETICS 2017; 7:3605-3620. [PMID: 28893846 PMCID: PMC5677151 DOI: 10.1534/g3.117.300259] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 02/03/2023]
Abstract
Inference of demographic history from genetic data is a primary goal of population genetics of model and nonmodel organisms. Whole genome-based approaches such as the pairwise/multiple sequentially Markovian coalescent methods use genomic data from one to four individuals to infer the demographic history of an entire population, while site frequency spectrum (SFS)-based methods use the distribution of allele frequencies in a sample to reconstruct the same historical events. Although both methods are extensively used in empirical studies and perform well on data simulated under simple models, there have been only limited comparisons of them in more complex and realistic settings. Here we use published demographic models based on data from three human populations (Yoruba, descendants of northwest-Europeans, and Han Chinese) as an empirical test case to study the behavior of both inference procedures. We find that several of the demographic histories inferred by the whole genome-based methods do not predict the genome-wide distribution of heterozygosity, nor do they predict the empirical SFS. However, using simulated data, we also find that the whole genome methods can reconstruct the complex demographic models inferred by SFS-based methods, suggesting that the discordant patterns of genetic variation are not attributable to a lack of statistical power, but may reflect unmodeled complexities in the underlying demography. More generally, our findings indicate that demographic inference from a small number of genomes, routine in genomic studies of nonmodel organisms, should be interpreted cautiously, as these models cannot recapitulate other summaries of the data.
|
50
|
Cubry P, Vigouroux Y, François O. The Empirical Distribution of Singletons for Geographic Samples of DNA Sequences. Front Genet 2017; 8:139. [PMID: 29033977 PMCID: PMC5627571 DOI: 10.3389/fgene.2017.00139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/27/2017] [Accepted: 09/14/2017] [Indexed: 12/31/2022] Open
Abstract
Rare variants are important for drawing inference about past demographic events in a species' history. A singleton is a rare variant for which genetic variation is carried by a unique chromosome in a sample. How singletons are distributed across geographic space provides a local measure of genetic diversity that can be measured at the individual level. Here, we define the empirical distribution of singletons in a sample of chromosomes as the proportion of the total number of singletons that each chromosome carries, and we present a theoretical background for studying this distribution. Next, we use computer simulations to evaluate the potential for the empirical distribution of singletons to provide a description of genetic diversity across geographic space. In a Bayesian framework, we show that the empirical distribution of singletons leads to accurate estimates of the geographic origin of range expansions. We apply the Bayesian approach to estimating the origin of the cultivated plant species Pennisetum glaucum [L.] R. Br. (pearl millet) in Africa, and find support for range expansion having started from Northern Mali. Overall, we report that the empirical distribution of singletons is a useful measure to analyze results of sequencing projects based on large scale sampling of individuals across geographic space.
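The definition given in this abstract (each chromosome's share of the total number of singletons) can be illustrated with a short sketch. It assumes a 0/1 haplotype matrix in which 1 marks the derived allele and treats a site as a singleton when exactly one chromosome carries a 1; the input layout and the function name are hypothetical, and the authors' actual implementation is not described in the abstract.

```python
from typing import List, Sequence


def singleton_distribution(haplotypes: Sequence[Sequence[int]]) -> List[float]:
    """Return, per chromosome, its proportion of all singletons in the sample.

    `haplotypes[i][s]` is 1 if chromosome i carries the derived allele at
    site s, else 0 (an assumed encoding for this illustration).
    """
    if not haplotypes:
        raise ValueError("no chromosomes supplied")
    n_chrom = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = [0] * n_chrom
    for site in range(n_sites):
        # Chromosomes carrying the derived allele at this site.
        carriers = [i for i in range(n_chrom) if haplotypes[i][site] == 1]
        # A singleton is carried by exactly one chromosome in the sample.
        if len(carriers) == 1:
            counts[carriers[0]] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no singletons")
    return [c / total for c in counts]
```

The returned proportions sum to 1, so they can be treated as an empirical distribution over sampled chromosomes and mapped onto sampling locations.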
Affiliation(s)
- Philippe Cubry
- UMR DIADE, University of Montpellier, Montpellier, France
- Yves Vigouroux
- UMR DIADE, University of Montpellier, Montpellier, France
- Olivier François
- TIMC-IMAG UMR 5525, Centre National de la Recherche Scientifique (CNRS), Université Grenoble-Alpes, Grenoble, France
|