1
Bieniek-Kobuszewska M, Panasiewicz G. Polymorphism Identification in the Coding Sequences (ORFs) of the Porcine Pregnancy-Associated Glycoprotein 2-like Gene Subfamily in Pigs. Genes (Basel) 2024; 15:1149. [PMID: 39336740 PMCID: PMC11431107 DOI: 10.3390/genes15091149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/22/2024] [Accepted: 08/27/2024] [Indexed: 09/30/2024] Open
Abstract
Pregnancy-associated glycoproteins (PAGs) are a polygenic family with many scattered genes and pseudogenes resulting from the duplication or fusion of a pseudogene, with expression beginning in the trophoblast during the peri-implantation period and continuing in the trophectoderm. In this study, single-nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) in the open reading frame (nine exons) of the pPAG2-L subfamily in crossbreed pigs are reported for the first time. Novel SNPs/InDels were sought using genomic DNA templates isolated from the leukocytes of crossbreed pigs (N = 25), which were amplified, gel-purified, and sequenced. Sixteen SNPs and one InDel (g.6961_6966 Ins TGCCAA) were identified in the crossbreed pigs. In silico analysis revealed that, of the 16 SNPs, only 10 cause amino acid (aa) substitutions, and that the InDel encodes asparagine (N298) and alanine (A299). The results provide a novel broad-based database (main pattern) that will be critical for future research into possible correlations between the SNP genotypes of the pPAG2-L subfamily and the known reproductive traits of pigs of various breeds.
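The distinction the abstract draws between SNPs that do and do not change an amino acid can be illustrated with a minimal sketch. It assumes the standard genetic code and a SNP located inside a single known codon; the function name `classify_snp` and the example codons are hypothetical and are not taken from the pPAG2-L sequences.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon, position, alt_base):
    """Return (ref_aa, alt_aa, is_substitution) for a SNP within one codon.

    position is the 0-based index of the changed base inside the codon.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa != alt_aa


# Hypothetical examples: a silent third-position change and a missense change.
print(classify_snp("GCC", 2, "T"))  # alanine codon to another alanine codon
print(classify_snp("AAT", 2, "A"))  # asparagine codon to a lysine codon
```

A real analysis would also need the reading frame and exon boundaries, which this sketch leaves out.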
Affiliation(s)
- Martyna Bieniek-Kobuszewska
- Voivodeship Sanitary-Epidemiological Station in Olsztyn, Laboratory of Epidemiological and Clinical Research, Department of Virology and Serology, Zolnierska Str. 16, 10-561 Olsztyn, Poland
- Grzegorz Panasiewicz
- Department of Animal Anatomy and Physiology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Oczapowskiego Str. 1A, 10-719 Olsztyn, Poland
2
Seyum EG, Bille NH, Abtew WG, Rastas P, Arifianto D, Domonhédo H, Cochard B, Jacob F, Riou V, Pomiès V, Lopez D, Bell JM, Cros D. Genome properties of key oil palm (Elaeis guineensis Jacq.) breeding populations. J Appl Genet 2022; 63:633-650. [PMID: 35691996 DOI: 10.1007/s13353-022-00708-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 05/26/2022] [Accepted: 06/04/2022] [Indexed: 11/29/2022]
Abstract
A good knowledge of the genome properties of breeding populations makes it possible to optimize breeding methods, in particular genomic selection (GS). In oil palm (Elaeis guineensis Jacq.), the world's main source of vegetable oil, this would provide insight into the promising GS results obtained so far. The present study considered two complex breeding populations, Deli and La Mé, with 943 individuals and 7324 single-nucleotide polymorphisms (SNPs) from genotyping-by-sequencing. Linkage disequilibrium (LD), haplotype sharing, effective size (Ne), and fixation index (Fst) were investigated. A genetic linkage map spanning 1778.52 cM and with a recombination rate of 2.85 cM/Mbp was constructed. The LD at r² = 0.3, considered the minimum to get reliable GS results, spanned over 1.05 cM/0.22 Mbp in Deli and 0.9 cM/0.21 Mbp in La Mé. The significant degree of differentiation existing between Deli and La Mé was confirmed by the high Fst value (0.53), the pattern of correlation of SNP heterozygosity and allele frequency among populations, and the decrease of persistence of LD and of haplotype sharing among populations with increasing SNP distance. However, the level of resemblance between the two populations over short genomic distances (correlation of r values between populations >0.6 for SNPs separated by <0.5 cM/1 kbp and percentage of common haplotypes >40% for haplotypes <3600 bp/0.20 cM) likely explains the superiority of GS models ignoring the parental origin of marker alleles over models taking this information into account. The two populations had low Ne (<5). Population-specific genetic maps and reference genomes are recommended for future studies.
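The fixation index mentioned in the abstract can be illustrated for a single biallelic SNP. The sketch below uses the textbook definition Fst = (Ht - Hs) / Ht for two equally weighted populations; this is an assumption made for illustration and is not necessarily the estimator the authors applied. The function name `fst_two_pops` and the allele frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The locus is monomorphic overall, so there is no differentiation to measure.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(fst_two_pops(0.9, 0.1))  # strongly differentiated locus
print(fst_two_pops(0.5, 0.5))  # identical frequencies, no differentiation
```

Genome-wide values such as the one reported are typically obtained by combining many loci, with corrections for sample size that this sketch omits.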
Affiliation(s)
- Essubalew Getachew Seyum
- Department of Plant Biology and Physiology, Faculty of Sciences, University of Yaoundé I, Yaoundé, Cameroon
- CETIC (African Center of Excellence in Information and Communication Technologies), University of Yaoundé I, Yaoundé, Cameroon
- Department of Horticulture and Plant Sciences, Jimma University College of Agriculture and Veterinary Medicine, P.O. Box 307, Jimma, Ethiopia
- Ngalle Hermine Bille
- Department of Plant Biology and Physiology, Faculty of Sciences, University of Yaoundé I, Yaoundé, Cameroon
- Wosene Gebreselassie Abtew
- Department of Horticulture and Plant Sciences, Jimma University College of Agriculture and Veterinary Medicine, P.O. Box 307, Jimma, Ethiopia
- Pasi Rastas
- Institute of Biotechnology, Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014, Helsinki, Finland
- Virginie Riou
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- Virginie Pomiès
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- David Lopez
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- Joseph Martin Bell
- Department of Plant Biology and Physiology, Faculty of Sciences, University of Yaoundé I, Yaoundé, Cameroon
- David Cros
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, University of Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
3
Waller DM. Addressing Darwin's dilemma: Can pseudo-overdominance explain persistent inbreeding depression and load? Evolution 2021; 75:779-793. [PMID: 33598971 DOI: 10.1111/evo.14189] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/11/2020] [Revised: 01/06/2021] [Accepted: 01/30/2021] [Indexed: 01/01/2023]
Abstract
Darwin spent years investigating the effects of self-fertilization, concluding that "nature abhors perpetual self-fertilization." Given that selection purges inbred populations of strongly deleterious mutations and drift fixes mild mutations, why does inbreeding depression (ID) persist in highly inbred taxa and why do no purely selfing taxa exist? Background selection, associations and interference among loci, and drift within small inbred populations all limit selection while often increasing fixation. These mechanisms help to explain why more inbred populations in most species consistently show more fixed load. This drift load is manifest in the considerable heterosis regularly observed in between-population crosses. Such heterosis results in subsequent high ID, suggesting a mechanism by which small populations could retain variation and inbreeding load. Multiple deleterious recessive mutations linked in repulsion generate pseudo-overdominance. Many tightly linked load loci could generate a balanced segregating load high enough to sustain ID over many generations. Such pseudo-overdominance blocks (or "PODs") are more likely to occur in regions of low recombination. They should also result in clear genetic signatures including genomic hotspots of heterozygosity; distinct haplotypes supporting alleles at intermediate frequency; and high linkage disequilibrium in and around POD regions. Simulation and empirical studies tend to support these predictions. Additional simulations and comparative genomic analyses should explore POD dynamics in greater detail to resolve whether PODs exist in sufficient strength and number to account for why ID and load persist within inbred lineages.
Affiliation(s)
- Donald M Waller
- Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin, 53706
4
Lee CY. The fractal dimension as a measure for characterizing genetic variation of the human genome. Comput Biol Chem 2020; 87:107278. [PMID: 32563074 DOI: 10.1016/j.compbiolchem.2020.107278] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/16/2019] [Revised: 11/18/2019] [Accepted: 05/04/2020] [Indexed: 11/18/2022]
Abstract
Motivated by the highly clustered distribution of single nucleotide polymorphisms (SNPs) across the human genome, we propose a set of chromosome-wise fractal dimensions as a measure for characterizing human polymorphism in an individual. The fractal dimension quantifies the degree of clustered distribution of SNPs and represents parsimoniously the genetic variation in a chromosome. In this sense, the proposed scheme projects the SNP genotype data into a new space which is simpler and lower in dimension. As an illustrative example, we estimate the chromosome-wise fractal dimensions of SNPs extracted from the HapMap Phase III data set. To determine the validity of the proposed measure, we apply principal component analysis (PCA) to the set of estimated fractal dimensions and demonstrate that the set more or less describes the population structure of 11 global populations. We also use multidimensional scaling to relate the genetic distances based on PCA to the geographical distances between global populations. This shows that, similar to the SNP genotype data, the fractal dimensions also have a role in genetic distance in the population structure. In addition, we use the proposed measure as a signature for the classification of global populations by developing a support vector machine model. The selected feature model predicts the global population with a balanced accuracy of about 77%. These results support that the fractal dimension is an efficient way to describe the genetic variation of global populations.
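One common way to estimate a fractal dimension for points on a line is box counting, sketched below for SNP positions on a chromosome. The abstract does not state which estimator the author used, so box counting is an assumption chosen for illustration; the function name `box_counting_dimension`, the box sizes, and the toy positions are hypothetical.

```python
import math


def box_counting_dimension(positions, box_sizes):
    """Estimate the box-counting dimension of integer positions on a line.

    For each box size, count the boxes containing at least one position, then
    fit the least-squares slope of log(count) against log(1 / box_size).
    """
    xs, ys = [], []
    for size in box_sizes:
        occupied = {pos // size for pos in positions}
        xs.append(math.log(1.0 / size))
        ys.append(math.log(len(occupied)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


sizes = [1, 2, 4, 8, 16, 32, 64]
# A segment with a variant at every position fills the line (dimension near 1).
print(box_counting_dimension(list(range(1024)), sizes))
# A single isolated variant occupies one box at every scale (dimension 0).
print(box_counting_dimension([500], sizes))
```

Clustered SNP positions would be expected to fall between these two extremes, which is the property the abstract uses as a per-chromosome summary.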
Affiliation(s)
- Chang-Yong Lee
- The Department of Industrial and Systems Engineering, Kongju National University, Cheonan, 31080, South Korea.
5
Boenn M. ShRangeSim: Simulation of Single Nucleotide Polymorphism Clusters in Next-Generation Sequencing Data. J Comput Biol 2018; 25:613-622. [PMID: 29658778 DOI: 10.1089/cmb.2018.0007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
Genomic variations are a focus of research to uncover mechanisms of host-pathogen interactions and diseases such as cancer. Nowadays, next-generation sequencing (NGS) data are analyzed through dedicated pipelines to detect them. Surrogate NGS data in conjunction with genomic variations help to evaluate pipelines and validate their outcomes, fostering selection of proper tools for a given scientific question. I describe how existing approaches for simulating NGS data in conjunction with genomic variations fail to model local enrichments of single nucleotide polymorphisms (SNPs), so-called SNP clusters. Two distributions for count data are applied to publicly available collections of genomic variations. The results suggest modeling of SNP cluster sizes by overdispersion-aware distributions.
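Overdispersion in count data, the property the abstract's conclusion relies on, can be checked with the variance-to-mean ratio: a Poisson model implies a ratio near 1, while clustered counts give a larger value. The sketch below also derives method-of-moments parameters for a negative binomial distribution, used here as one example of an overdispersion-aware distribution; the abstract does not name the two distributions it compares, so this choice is an assumption. Function names and the toy counts are hypothetical.

```python
def mean_and_variance(counts):
    """Return the sample mean and the unbiased sample variance of the counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio; values well above 1 indicate overdispersion."""
    mean, variance = mean_and_variance(counts)
    return variance / mean


def negative_binomial_moments(counts):
    """Method-of-moments (r, p) for a negative binomial with mean r(1-p)/p.

    Returns None when the variance does not exceed the mean, because the
    moment equations then have no valid solution.
    """
    mean, variance = mean_and_variance(counts)
    if variance <= mean:
        return None
    r = mean * mean / (variance - mean)
    p = mean / variance
    return r, p


# Hypothetical SNP counts per window: mostly empty windows plus one dense cluster.
counts = [0, 0, 0, 0, 1, 1, 2, 8]
print(dispersion_index(counts))
print(negative_binomial_moments(counts))
```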
Affiliation(s)
- Markus Boenn
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle/Saale, Germany
- Department of Soil Ecology, UFZ - Helmholtz Centre for Environmental Research, Halle/Saale, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
6
7
Lee CY. A model for the clustered distribution of SNPs in the human genome. Comput Biol Chem 2016; 64:94-98. [PMID: 27318295 DOI: 10.1016/j.compbiolchem.2016.06.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/14/2016] [Revised: 04/16/2016] [Accepted: 06/06/2016] [Indexed: 12/17/2022]
Abstract
Motivated by the non-random, clustered distribution of SNPs, we introduce a phenomenological model to account for the clustering properties of SNPs in the human genome. The phenomenological model is based on preferential mutation in close proximity to existing SNPs. With the HapMap SNP data, we empirically demonstrate that the preferential model is better for illustrating the clustered distribution of SNPs than the random model. Moreover, the model is applicable not only to autosomes but also to the X chromosome, although the X chromosome has different characteristics from autosomes. The analysis of the estimated parameters in the model can explain the pronounced population structure and the low genetic diversity of the X chromosome. In addition, correlation between the parameters reveals the population-wise difference of the mutation probability. These results support the mutational non-independence hypothesis against random mutation.
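The contrast between a preferential and a random mutation model can be sketched with a toy simulation. The abstract does not give the model's equations, so the mechanism below is an assumption made purely for illustration: with probability `q` a new SNP lands within `spread` bases of a randomly chosen existing SNP, and otherwise it lands uniformly at random. All names and parameter values are hypothetical.

```python
import random


def simulate_snps(n_snps, length, q, spread, rng):
    """Place n_snps distinct positions on a sequence of the given length.

    With probability q a new SNP is placed near an existing one (the assumed
    preferential step); otherwise it is placed uniformly at random.
    """
    seen = set()
    ordered = []
    while len(ordered) < n_snps:
        if ordered and rng.random() < q:
            anchor = rng.choice(ordered)
            pos = anchor + rng.randint(-spread, spread)
        else:
            pos = rng.randrange(length)
        # Reject positions outside the sequence or already occupied.
        if 0 <= pos < length and pos not in seen:
            seen.add(pos)
            ordered.append(pos)
    return sorted(ordered)


def window_dispersion(positions, length, window):
    """Variance-to-mean ratio of SNP counts in non-overlapping windows.

    Assumes length is an exact multiple of window.
    """
    n_windows = length // window
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    mean = sum(counts) / n_windows
    variance = sum((c - mean) ** 2 for c in counts) / (n_windows - 1)
    return variance / mean


preferential = simulate_snps(2000, 1_000_000, 0.9, 100, random.Random(0))
uniform = simulate_snps(2000, 1_000_000, 0.0, 100, random.Random(1))
print(window_dispersion(preferential, 1_000_000, 10_000))
print(window_dispersion(uniform, 1_000_000, 10_000))
```

Under these assumptions the preferential run should show a much larger dispersion than the uniform run, which is the qualitative signature of clustering.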
Affiliation(s)
- Chang-Yong Lee
- The Department of Industrial and Systems Engineering, Kongju National University, Cheonan 330-717, South Korea.
8
Genome-wide resequencing of KRICE_CORE reveals their potential for future breeding, as well as functional and evolutionary studies in the post-genomic era. BMC Genomics 2016; 17:408. [PMID: 27229151 PMCID: PMC4882841 DOI: 10.1186/s12864-016-2734-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 08/27/2015] [Accepted: 05/12/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Rice germplasm collections continue to grow in number and size around the world. Since maintaining and screening such massive resources remains challenging, it is important to establish practical methods to manage them. A core collection, by definition, refers to a subset of the entire population that preserves the majority of genetic diversity, enhancing the efficiency of germplasm utilization. Results: Here, we report whole-genome resequencing of the 137-accession rice mini core collection, or Korean rice core set (KRICE_CORE), which represents 25,604 rice germplasms deposited in the Korean genebank of the Rural Development Administration (RDA). We used the Illumina HiSeq 2000 and 2500 platforms to produce short reads and then assembled them at 9.8× depth using Nipponbare as the reference. Comparisons of the sequences with the reference genome yielded more than 15 million (M) single nucleotide polymorphisms (SNPs) and 1.3 M INDELs. Phylogenetic and population analyses using 2,046,529 high-quality SNPs successfully assigned rice accessions to the relevant rice subgroups, suggesting that these SNPs capture evolutionary signatures that have accumulated in rice subpopulations. Furthermore, genome-wide association studies (GWAS) for four exemplary agronomic traits in KRICE_CORE demonstrate its utility, namely in identifying previously defined genes or novel genetic factors that potentially regulate important phenotypes. Conclusion: This study provides strong evidence that KRICE_CORE is small in size but contains high genetic and functional diversity across the genome. Thus, our resequencing results will be useful for future breeding, as well as functional and evolutionary studies, in the post-genomic era.
9
Xenikoudakis G, Ersmark E, Tison JL, Waits L, Kindberg J, Swenson JE, Dalén L. Consequences of a demographic bottleneck on genetic structure and variation in the Scandinavian brown bear. Mol Ecol 2015; 24:3441-54. [PMID: 26042479 DOI: 10.1111/mec.13239] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 10/09/2014] [Revised: 05/07/2015] [Accepted: 05/08/2015] [Indexed: 11/30/2022]
Abstract
The Scandinavian brown bear went through a major decline in population size approximately 100 years ago, due to intense hunting. After being protected, the population subsequently recovered and today numbers in the thousands. The genetic diversity in the contemporary population has been investigated in considerable detail, and it has been shown that the population consists of several subpopulations that display relatively high levels of genetic variation. However, previous studies have been unable to resolve the degree to which the demographic bottleneck impacted the contemporary genetic structure and diversity. In this study, we used mitochondrial and microsatellite DNA markers from pre- and postbottleneck Scandinavian brown bear samples to investigate the effect of the bottleneck. Simulation and multivariate analysis suggested the same genetic structure for the historical and modern samples, which are clustered into three subpopulations in southern, central and northern Scandinavia. However, the southern subpopulation appears to have gone through a marked change in allele frequencies. When comparing the mitochondrial DNA diversity in the whole population, we found a major decline in haplotype numbers across the bottleneck. However, the loss of autosomal genetic diversity was less pronounced, although a significant decline in allelic richness was observed in the southern subpopulation. Approximate Bayesian computations provided clear support for a decline in effective population size during the bottleneck, in both the southern and northern subpopulations. These results have implications for the future management of the Scandinavian brown bear because they indicate a recent loss in genetic diversity and also that the current genetic structure may have been caused by historical ecological processes rather than recent anthropogenic persecution.
Affiliation(s)
- G Xenikoudakis
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405, Stockholm, Sweden
- Department of Zoology, Stockholm University, SE-106 91, Stockholm, Sweden
- E Ersmark
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405, Stockholm, Sweden
- Department of Zoology, Stockholm University, SE-106 91, Stockholm, Sweden
- J-L Tison
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405, Stockholm, Sweden
- L Waits
- Department of Fish and Wildlife Sciences, University of Idaho, 875 Perimeter Drive MS 1136, Moscow, ID, 83844, USA
- J Kindberg
- Department of Wildlife, Fish, and Environmental Studies, Swedish University of Agricultural Sciences, SE-90183, Umeå, Sweden
- J E Swenson
- Department of Ecology and Natural Resource Management, Norwegian University of Life Sciences, NO-1432, Ås, Norway
- Norwegian Institute for Nature Research, PO Box 5685 Sluppen, NO-7485, Trondheim, Norway
- L Dalén
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405, Stockholm, Sweden
10
A linkage disequilibrium perspective on the genetic mosaic of speciation in two hybridizing Mediterranean white oaks. Heredity (Edinb) 2014; 114:373-86. [PMID: 25515016 DOI: 10.1038/hdy.2014.113] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/16/2014] [Revised: 10/11/2014] [Accepted: 11/12/2014] [Indexed: 01/09/2023] Open
Abstract
We analyzed the genetic mosaic of speciation in two hybridizing Mediterranean white oaks from the Iberian Peninsula (Quercus faginea Lamb. and Quercus pyrenaica Willd.). The two species show ecological divergence in flowering phenology, leaf morphology and composition, and in their basic or acidic soil preferences. Ninety expressed sequence tag-simple sequence repeats (EST-SSRs) and eight nuclear SSRs were genotyped in 96 trees from each species. Genotyping was carried out in two steps. First, we used 69 markers evenly distributed over the 12 linkage groups (LGs) of the oak linkage map to confirm the species genetic identity of the sampled genotypes, and searched for differentiation outliers. Then, we genotyped 29 additional markers from the chromosome bins containing the outliers and repeated the multilocus scans. We found one or two additional outliers within four saturated bins, thus confirming that outliers are organized into clusters. Linkage disequilibrium (LD) was extensive, even for loosely linked and for independent markers. Consequently, score tests for association between two-marker haplotypes and the 'species trait' showed a broad genomic divergence, although substantial variation across the genome and within LGs was also observed. We discuss the influence of several confounding effects on neutrality tests and review the evolutionary processes leading to extensive LD. Finally, we examine how LD analyses within regions that contain outlier clusters and quantitative trait loci can help to identify regions of divergence and/or genomic hitchhiking in the light of predictions from ecological speciation theory.
11
Abstract
The "LD curve" relates the linkage disequilibrium (LD) between pairs of nucleotide sites to the distance that separates them along the chromosome. The shape of this curve reflects natural selection, admixture between populations, and the history of population size. This article derives new results about the last of these effects. When a population expands in size, the LD curve grows steeper, and this effect is especially pronounced following a bottleneck in population size. When a population shrinks, the LD curve rises but remains relatively flat. As LD converges toward a new equilibrium, its time path may not be monotonic. Following an episode of growth, for example, it declines to a low value before rising toward the new equilibrium. These changes happen at different rates for different LD statistics. They are especially slow for estimates of [Formula: see text], which therefore allow inferences about ancient population history. For the human population of Europe, these results suggest a history of population growth.
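As background for the relationship between LD, distance, and population size, the sketch below evaluates the classical equilibrium approximation E[r²] ≈ 1 / (1 + 4·N·c), where N is the effective population size and c the recombination fraction between two sites. This is a standard textbook approximation for a constant-size population and is not one of the new non-equilibrium results the abstract describes; the function name and parameter values are hypothetical.

```python
def expected_r2(effective_size, recombination_fraction):
    """Equilibrium approximation E[r^2] = 1 / (1 + 4 * N * c)."""
    return 1.0 / (1.0 + 4.0 * effective_size * recombination_fraction)


# Tabulate a toy LD curve for a small and a large constant-size population.
distances = [0.0, 0.0001, 0.001, 0.01]  # recombination fractions
for n_e in (100, 10_000):
    curve = [round(expected_r2(n_e, c), 4) for c in distances]
    print(n_e, curve)
```

In this approximation a larger population has lower LD at every non-zero distance, which is the equilibrium baseline against which transient effects of growth or decline would be compared.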
12
Choudhury A, Hazelhurst S, Meintjes A, Achinike-Oduaran O, Aron S, Gamieldien J, Jalali Sefid Dashti M, Mulder N, Tiffin N, Ramsay M. Population-specific common SNPs reflect demographic histories and highlight regions of genomic plasticity with functional relevance. BMC Genomics 2014; 15:437. [PMID: 24906912 PMCID: PMC4092225 DOI: 10.1186/1471-2164-15-437] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 11/12/2013] [Accepted: 05/19/2014] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data. RESULTS The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT) have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and the population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters close to recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones. CONCLUSIONS Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. 
Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.
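The 5-kb window scan mentioned in the abstract can be sketched as a count of SNP positions inside fixed-width windows that slide along a chromosome. The step size, the half-open window convention, and the function name `sliding_window_counts` are assumptions for illustration; the abstract does not specify how the authors' windows were advanced or how clusters were called.

```python
from bisect import bisect_left


def sliding_window_counts(positions, window=5000, step=1000):
    """Count SNPs in half-open windows [start, start + window).

    Windows start at 0 and advance by step until the start passes the last SNP.
    Returns a list of (window_start, snp_count) tuples.
    """
    positions = sorted(positions)
    if not positions:
        return []
    results = []
    last = positions[-1]
    start = 0
    while start <= last:
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        results.append((start, hi - lo))
        start += step
    return results


# Hypothetical SNP coordinates: a tight cluster plus one distant SNP.
snps = [100, 200, 300, 12000]
print(sliding_window_counts(snps, window=5000, step=5000))
```

Windows whose counts stand out against the chromosome-wide background would be candidate clusters; choosing that threshold is outside the scope of this sketch.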
Affiliation(s)
- Ananyo Choudhury
- Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Division of Human Genetics, National Health Laboratory Service, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Scott Hazelhurst
- Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- School of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
- Ayton Meintjes
- Department Clinical Laboratory Sciences, Computational Biology Group, IDM, University of Cape Town, Cape Town, South Africa
- Ovokeraye Achinike-Oduaran
- Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Division of Human Genetics, National Health Laboratory Service, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Shaun Aron
- Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Junaid Gamieldien
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Mahjoubeh Jalali Sefid Dashti
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Nicola Mulder
- Department Clinical Laboratory Sciences, Computational Biology Group, IDM, University of Cape Town, Cape Town, South Africa
- Nicki Tiffin
- South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Michèle Ramsay
- Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Division of Human Genetics, National Health Laboratory Service, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
13
Amos W. Variation in heterozygosity predicts variation in human substitution rates between populations, individuals and genomic regions. PLoS One 2013; 8:e63048. [PMID: 23646173 PMCID: PMC3639965 DOI: 10.1371/journal.pone.0063048] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 12/11/2012] [Accepted: 03/28/2013] [Indexed: 01/11/2023] Open
Abstract
The "heterozygote instability" (HI) hypothesis suggests that gene conversion events focused on heterozygous sites during meiosis locally increase the mutation rate, but this hypothesis remains largely untested. As humans left Africa they lost variability, which, if HI operates, should have reduced the mutation rate in non-Africans. Relative substitution rates were quantified in diverse humans using aligned whole genome sequences from the 1,000 genomes project. Substitution rate is consistently greater in Africans than in non-Africans, but only in diploid regions of the genome, consistent with a role for heterozygosity. Analysing the same data partitioned into a series of non-overlapping 2 Mb windows reveals a strong, non-linear correlation between the amount of heterozygosity lost "out of Africa" and the difference in substitution rate between Africans and non-Africans. Putative recent mutations, derived variants that occur only once among the 80 human chromosomes sampled, occur preferentially at the centre of 2 Kb windows that have elevated heterozygosity compared both with the same region in a closely related population and with an immediately adjacent region in the same population. More than half of all substitutions appear attributable to variation in heterozygosity. This observation provides strong support for HI with implications for many branches of evolutionary biology.
Affiliation(s)
- William Amos
- Department of Zoology, Cambridge University, Cambridge, Cambridgeshire, United Kingdom.
14
The transcript-centric mutations in human genomes. Genomics Proteomics & Bioinformatics 2012; 10:11-22. [PMID: 22449397 PMCID: PMC5054492 DOI: 10.1016/s1672-0229(11)60029-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 01/06/2012] [Accepted: 02/15/2012] [Indexed: 01/30/2023]
Abstract
Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) which are present in germline cells were defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic data. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation, where highly-expressed genes accumulate more mutations than lowly-expressed genes. Furthermore, we found four major types of transcript-centric mutations: C→T, A→G, C→G, and G→T in human genomes and identified a negative gradient of the sequence variations aligning from the 5′ end to the 3′ end of the transcription units (TUs). The periodical occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces for gene and genome evolution along with creation of new genes, gene/genome duplication, and horizontal gene transfer.
|
15
|
Hausmann A, Haszprunar G, Hebert PDN. DNA barcoding the geometrid fauna of Bavaria (Lepidoptera): successes, surprises, and questions. PLoS One 2011; 6:e17134. [PMID: 21423340 PMCID: PMC3040642 DOI: 10.1371/journal.pone.0017134] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Received: 11/08/2010] [Accepted: 01/21/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: The State of Bavaria is involved in a research program that will lead to the construction of a DNA barcode library for all animal species within its territorial boundaries. The present study provides a comprehensive DNA barcode library for the Geometridae, one of the most diverse of insect families. METHODOLOGY/PRINCIPAL FINDINGS: This study reports DNA barcodes for 400 Bavarian geometrid species, 98 per cent of the known fauna, and approximately one per cent of all Bavarian animal species. Although 98.5% of these species possess diagnostic barcode sequences in Bavaria, records from neighbouring countries suggest that species-level resolution may be compromised in up to 3.5% of cases. All taxa which apparently share barcodes are discussed in detail. One case of modest divergence (1.4%) revealed a species overlooked by the current taxonomic system: Eupithecia goossensiata Mabille, 1869 stat.n. is raised from synonymy with Eupithecia absinthiata (Clerck, 1759) to species rank. Deep intraspecific sequence divergences (>2%) were detected in 20 traditionally recognized species. CONCLUSIONS/SIGNIFICANCE: The study emphasizes the effectiveness of DNA barcoding as a tool for monitoring biodiversity. Open access is provided to a data set that includes records for 1,395 geometrid specimens (331 species) from Bavaria, with 69 additional species from neighbouring regions. Taxa with deep intraspecific sequence divergences are undergoing more detailed analysis to ascertain if they represent cases of cryptic diversity.
Affiliation(s)
- Axel Hausmann
- Entomology Department, Zoological Collection of the State of Bavaria, Munich, Germany.
|
16
|
Amos W, Bryant C. Using human demographic history to infer natural selection reveals contrasting patterns on different families of immune genes. Proc Biol Sci 2010; 278:1587-94. [PMID: 21068042 DOI: 10.1098/rspb.2010.2056] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/31/2022] Open
Abstract
Detecting regions of the human genome that are, or have been, influenced by natural selection remains an important goal for geneticists. Many methods are used to infer selection, but there is a general reliance on an accurate understanding of how mutation and recombination events are distributed, and the well-known link between these processes and their evolutionary transience introduces uncertainty into inferences. Here, we present and apply two new, independent approaches; one based on single nucleotide polymorphisms (SNPs) that exploits geographical patterns in how humans lost variability as we colonized the world, the other based on the relationship between microsatellite repeat number and heterozygosity. We show that the two methods give concordant results. Of these, the SNP-based method is both widely applicable and detects selection over a well-defined time interval, the last 50 000 years. Analysis of all human genes by their Gene Ontology codes reveals how accelerated and decelerated loss of variability are both preferentially associated with immune genes. Applied to 168 immune genes used as the focus of a previous study, we show that members of the same gene family tend to yield similar indices of selection, even when located on different chromosomes. We hope our approach will provide a useful tool with which to infer where selection has acted to shape the human genome.
Affiliation(s)
- William Amos
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK.
|
17
|
Amos W. Heterozygosity and mutation rate: evidence for an interaction and its implications: the potential for meiotic gene conversions to influence both mutation rate and distribution. Bioessays 2010; 32:82-90. [PMID: 19967709 DOI: 10.1002/bies.200900108] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 01/30/2023]
Abstract
If natural selection chose where new mutations occur it might well favour placing them near existing polymorphisms, thereby avoiding disruption of areas that work while adding novelty to regions where variation is tolerated or even beneficial. Such a system could operate if heterozygous sites are recognised and 'repaired' during the initial stages of crossing over. Such repairs involve an extra round of DNA replication, providing an opportunity for further mutations, thereby raising the local mutation rate. If so, the changes in heterozygosity that occur when populations grow or shrink could feed back to modulate both the rate and the distribution of mutations. Here, I review evidence from isozymes, microsatellites and single nucleotide polymorphisms that this potential is realised in real populations. I then consider the likely implications, focusing particularly on how these processes might affect microsatellites, concluding that heterozygosity does impact on the rate and distribution of mutations.
Affiliation(s)
- William Amos
- Department of Zoology, University of Cambridge, UK.
|
18
|
Amos W. Even small SNP clusters are non-randomly distributed: is this evidence of mutational non-independence? Proc Biol Sci 2010; 277:1443-9. [PMID: 20071383 DOI: 10.1098/rspb.2009.1757] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are distributed highly non-randomly in the human genome through a variety of processes from ascertainment biases (i.e. the preferential development of SNPs around interesting genes) to the action of mutation hotspots and natural selection. However, with more systematic SNP development, one might expect an increasing proportion of SNPs to be distributed more or less randomly. Here, I test this null hypothesis using stochastic simulations and compare this output with that of an alternative hypothesis that mutations are more likely to occur near existing SNPs, a possibility suggested both by molecular studies of meiotic mismatch repair in yeast and by data showing that SNPs cluster around heterozygous deletions. A purely Poisson process generates SNP clusters that differ from equivalent data from human chromosome 1 in both the frequency of different-sized clusters and the SNP density within each cluster, even for small clusters of just four or five SNPs, while clusters on the X chromosome differ from those on the autosomes. In contrast, modest levels of mutational non-independence generate a reasonable fit to the real data for both cluster frequency and density, and also exhibit the evolutionary transience noted for 'mutation hotspots'. Mutational non-independence therefore provides an interesting new hypothesis that appears capable of explaining the distribution of SNPs in the human genome.
Affiliation(s)
- William Amos
- Department of Zoology, Cambridge University, Downing Street, Cambridge CB2 3EJ, UK
|
19
|
Fine-scale mapping of recombination rate in Drosophila refines its correlation to diversity and divergence. Proc Natl Acad Sci U S A 2008; 105:10051-6. [PMID: 18621713 DOI: 10.1073/pnas.0801848105] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.0] [Indexed: 12/15/2022] Open
Abstract
Regional rates of recombination often correlate with levels of nucleotide diversity, and either selective or neutral hypotheses can explain this relationship. Regional recombination rates also correlate with nucleotide differences between human and chimpanzee, consistent with models where recombination is mutagenic; however, a lack of correlation is observed in the Drosophila melanogaster group, consistent with models invoking natural selection. Here, we revisit the relationship among recombination, diversity, and interspecies difference by generating empirical estimates of these parameters in Drosophila pseudoobscura. To measure recombination rate, we genotyped 1,294 backcross hybrids at 50 markers across the largest assembled linkage group in this species. Genome-wide diversity was estimated by sequencing a second isolate of D. pseudoobscura at shallow coverage. Alignment to the sequenced genome of the closely related species, Drosophila persimilis, provided nucleotide site orthology. Our findings demonstrate that scale is critical in determining correlates to recombination rate: fine-scale cross-over rate estimates are far stronger predictors of both diversity and interspecies difference than broad-scale estimates. The correlation of fine-scale recombination rate to diversity and interspecies difference appears to be genome-wide, evidenced by examination of an X-linked region in greater detail. Because we observe a strong correlation of cross-over rate with interspecies difference, even after correcting for segregating ancestral variation, we suggest that both mutagenic and selective forces generate these correlations, the latter in regions of low crossing over. We propose that it is not cross-overs per se that are mutagenic, but rather repair of DNA double-strand break precursors via crossing over and gene conversion.
|