1
Abstract
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing errors, alignment errors, and missing data. All of these introduce noise and uncertainty into variant discovery and genotype calling, which makes meaningful analysis of the data difficult. Our primary interest is how to accurately infer, or impute, missing genotypes in HTS-derived datasets. Many existing genotype imputation algorithms and software packages were developed by and optimized for the human genetics community. In that field, a complete and accurate reference genome has been constructed and SNP arrays have largely been the common genotyping platform. We set out to answer two questions: 1) can imputation methods developed by the human genetics community impute missing genotypes in datasets derived from non-human species, and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable to imputing missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, as our imputation contender. We performed a series of cross-validation experiments using GBS data collected from Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in its datasets using a LASSO-penalized linear regression method (denoted 'glmnet'), so we selected glmnet as a benchmark imputation method.
We estimated imputation accuracy by masking a subset of observed genotypes, imputing them, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual levels. Computation time served as a second metric for comparison. We then examined factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition.
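The masking-based accuracy metric described above can be sketched in Python. The sketch masks observed genotypes, imputes them, and then correlates true and imputed dosages per site and per individual. This is a minimal illustration, not the authors' pipeline. It assumes a genotype matrix with individuals as rows and sites as columns, holding allele dosages 0/1/2 with `None` for missing calls. It also swaps in a deliberately simple nearest-neighbour imputer (the hypothetical `nn_impute`) in place of Beagle v.4 or glmnet. All function names here are illustrative.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def mask_genotypes(geno, fraction, seed=0):
    """Hide a random fraction of observed dosages (None = missing).

    Returns a masked copy of the matrix and the list of hidden (individual, site) cells.
    """
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(geno)
                for j, g in enumerate(row) if g is not None]
    cells = rng.sample(observed, round(fraction * len(observed)))
    masked = [row[:] for row in geno]
    for i, j in cells:
        masked[i][j] = None
    return masked, cells


def nn_impute(geno):
    """Hypothetical stand-in imputer: copy each missing dosage from the most similar
    individual (highest fraction of identical dosages at co-observed sites)."""
    out = [row[:] for row in geno]
    for i, row in enumerate(geno):
        for j, g in enumerate(row):
            if g is not None:
                continue
            best, best_score = None, -1.0
            for k, other in enumerate(geno):
                if k == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                score = sum(a == b for a, b in shared) / len(shared) if shared else 0.0
                if score > best_score:
                    best, best_score = other[j], score
            out[i][j] = best  # stays None if no other individual observed site j
    return out


def accuracy(truth, imputed, cells, by="site"):
    """Pearson r between true and imputed dosages of masked cells, grouped by site or individual."""
    groups = {}
    for i, j in cells:
        if imputed[i][j] is None:  # could not be imputed; excluded from accuracy
            continue
        key = j if by == "site" else i
        true_vals, imp_vals = groups.setdefault(key, ([], []))
        true_vals.append(truth[i][j])
        imp_vals.append(imputed[i][j])
    return {k: pearson(t, p) for k, (t, p) in groups.items() if len(t) >= 2}
```

Usage: call `masked, cells = mask_genotypes(geno, 0.1)` and then `accuracy(geno, nn_impute(masked), cells, by="site")`. This returns one correlation per site with at least two masked calls; passing `by="individual"` groups the masked calls by sample instead. If all masked true dosages at a site are identical, as can happen at low-MAF sites, the correlation is undefined and comes back as NaN.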
Affiliation(s)
- Ariel W. Chan
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States of America
- Martha T. Hamblin
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, United States of America
- Jean-Luc Jannink
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, United States of America
- RW Holley Center for Agriculture and Health, United States Department of Agriculture—Agricultural Research Service, Ithaca, NY, United States of America

2
Lozano R, Hamblin MT, Prochnik S, Jannink JL. Identification and distribution of the NBS-LRR gene family in the Cassava genome. BMC Genomics 2015; 16:360. [PMID: 25948536 PMCID: PMC4422547 DOI: 10.1186/s12864-015-1554-9] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 11/18/2014] [Accepted: 04/20/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analysing the genomic organization of resistance genes in this crop.
RESULTS: Through searches for Pfam domains and manual curation of the cassava gene annotations, we identified 228 NBS-LRR type genes and 99 partial NBS genes. These represent almost 1% of the total predicted genes and show high sequence similarity to proteins from other plant species. Furthermore, 34 contained an N-terminal toll/interleukin (TIR)-like domain, and 128 contained an N-terminal coiled-coil (CC) domain. Of the 327 R genes, 63% occurred in 39 clusters on the chromosomes. These clusters are mostly homogeneous, containing NBS-LRRs derived from a recent common ancestor.
CONCLUSIONS: This study provides insight into the evolution of NBS-LRR genes in the cassava genome; the phylogenetic and mapping information may aid efforts to further characterize the function of these predicted R genes.
Affiliation(s)
- Roberto Lozano
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853, USA.
- Martha T Hamblin
- Institute for Genomic Diversity, Biotechnology Building, Cornell University, Ithaca, NY, 14853, USA.
- Simon Prochnik
- US Department of Energy, Joint Genome Institute, Walnut Creek, CA, 94598, USA.
- Jean-Luc Jannink
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853, USA.
- United States Department of Agriculture, Agricultural Research Service (USDA-ARS) R.W. Holley Center for Agriculture and Health, Ithaca, NY, 14853, USA.

3
Caniato FF, Hamblin MT, Guimaraes CT, Zhang Z, Schaffert RE, Kochian LV, Magalhaes JV. Association mapping provides insights into the origin and the fine structure of the sorghum aluminum tolerance locus, AltSB. PLoS One 2014; 9:e87438. [PMID: 24498106 PMCID: PMC3907521 DOI: 10.1371/journal.pone.0087438] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 09/12/2013] [Accepted: 12/24/2013] [Indexed: 11/18/2022]
Abstract
Root damage caused by aluminum (Al) toxicity is a major cause of grain yield reduction on acid soils, which are prevalent in tropical and subtropical regions of the world where food security is most tenuous. In sorghum, Al tolerance is conferred by SbMATE, an Al-activated root citrate efflux transporter that underlies the major Al tolerance locus, AltSB, on sorghum chromosome 3. We used association mapping to gain insights into the origin and evolution of Al tolerance in sorghum and to detect functional variants amenable to allele mining applications. Linkage disequilibrium across the AltSB locus decreased much faster than in previous reports in sorghum, and reached basal levels at approximately 1000 bp. Accordingly, intra-locus recombination events were found to be extensive. SNPs and indels highly associated with Al tolerance showed a narrow frequency range, between 0.06 and 0.1, suggesting a rather recent origin of Al tolerance mutations within AltSB. A haplotype network analysis suggested a single geographic and racial origin of causative mutations in primordial guinea domesticates in West Africa. Al tolerance assessment in accessions harboring recombinant haplotypes suggests that causative polymorphisms are localized to a ∼6 kb region including intronic polymorphisms and a transposon (MITE) insertion, whose size variation has been shown to be positively correlated with Al tolerance. The SNP with the strongest association signal, located in the second SbMATE intron, recovers 9 of the 14 highly Al tolerant accessions and 80% of all the Al tolerant and intermediately tolerant accessions in the association panel. Our results also demonstrate the pivotal importance of knowledge of the origin and evolution of Al tolerance mutations in molecular breeding applications.
Allele mining strategies based on associated loci are expected to lead to the efficient identification, in diverse sorghum germplasm, of Al tolerant accessions able to maintain grain yields under Al toxicity.
Affiliation(s)
- Martha T. Hamblin
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
- Zhiwu Zhang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
- Leon V. Kochian
- Robert W. Holley Center for Agriculture and Health, U.S. Department of Agriculture – Agricultural Research Service, Cornell University, Ithaca, New York, United States of America

4
Longhi S, Hamblin MT, Trainotti L, Peace CP, Velasco R, Costa F. A candidate gene based approach validates Md-PG1 as the main responsible for a QTL impacting fruit texture in apple (Malus x domestica Borkh). BMC Plant Biol 2013; 13:37. [PMID: 23496960 PMCID: PMC3599472 DOI: 10.1186/1471-2229-13-37] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/27/2012] [Accepted: 02/22/2013] [Indexed: 05/03/2023]
Abstract
BACKGROUND: Apple is a widely cultivated fruit crop valued for its quality properties and extended storability. Among the several quality factors, texture is the most important and appreciated, and cortex texture varies widely across apple varieties. Anatomically, these variations depend on degradation events occurring in both the primary cell wall and the middle lamella of the fruit. This physiological process is regulated by an enzymatic network generally encoded by large gene families, among which polygalacturonase is devoted to the depolymerization of pectin. In apple, Md-PG1, a key gene belonging to the polygalacturonase gene family, was mapped on chromosome 10 and co-localized within the statistical interval of a major hot-spot QTL associated with several fruit texture sub-phenotypes.
RESULTS: In this work, a QTL corresponding to the position of Md-PG1 was validated, and new functional alleles associated with fruit texture properties were discovered in 77 apple cultivars. Thirty-eight SNPs genotyped by full-length resequencing of the gene and two SSR markers specifically targeted to the gene metacontig were employed. Of these SNPs, eleven were used to define three haplotypes significantly associated with several texture components. The impact of Md-PG1 on fruit cell wall disassembly was further confirmed by scanning electron microscopy of the cortex structure in two apple varieties with opposite texture performance, 'Golden Delicious' and 'Granny Smith'.
CONCLUSIONS: These results advance the genetic dissection of fruit texture in apple. This new set of haplotypes and microsatellite alleles can represent a valuable toolbox for more efficient parental selection, as well as for the identification of new apple accessions with superior fruit quality features.
Affiliation(s)
- Sara Longhi
- Research and Innovation Centre, Foundation Edmund Mach, Via Mach 1, 38010, San Michele all’Adige, TN, Italy
- Martha T Hamblin
- Institute for Genomic Diversity, Cornell University, 130 Biotechnology Building, 14853-2703, Ithaca, NY, USA
- Livio Trainotti
- Dipartimento di Biologia, Università di Padova, Viale G. Colombo 3, 35121, Padova, Italy
- Cameron P Peace
- Horticulture and Landscape Architecture, Washington State University, PO Box 646414, 99164-6414, Pullman, WA, USA
- Riccardo Velasco
- Research and Innovation Centre, Foundation Edmund Mach, Via Mach 1, 38010, San Michele all’Adige, TN, Italy
- Fabrizio Costa
- Research and Innovation Centre, Foundation Edmund Mach, Via Mach 1, 38010, San Michele all’Adige, TN, Italy

5
Zamora A, Sun Q, Hamblin MT, Aquadro CF, Kresovich S. Positively selected disease response orthologous gene sets in the cereals identified using Sorghum bicolor L. Moench expression profiles and comparative genomics. Mol Biol Evol 2009; 26:2015-30. [PMID: 19506000 DOI: 10.1093/molbev/msp114] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Abstract
Disease response genes (DRGs) diverge under recurrent positive selection as a result of a molecular arms race between hosts and pathogens. Most studies of this process have been conducted in animals, and few defense genes have been shown to evolve adaptively in plants. To test for adaptation in the molecules mediating disease resistance in the cereals, we first combined information from the expression pattern of Sorghum bicolor genes and from divergence to the full genome of rice to identify candidate DRGs. We then used evolutionary analyses of orthologous gene sets from several grass species to determine whether the DRGs show signals of positive selection and which residues are targeted. We found 140 divergent genes upregulated under biotic stress in S. bicolor by evaluating the relative abundance of expressed sequence tags in different libraries and comparing them with rice genes. For 10 of these genes, we found sets of orthologs including sequences from rice and three other cereals; six genes showed a pattern of substitution that was consistent with positive selection. Three of these genes, a thaumatin, a peroxidase, and a barley mlo homolog, are known antifungal proteins. The other three genes with evidence of positive selection were a MADS-box (MCM1, AGAMOUS, DEFICIENS, SRF) transcription factor, an eIF5 translation initiation factor, and a gene of unknown function but with evidence of expression during stress. Permutation analyses, using different ortholog and paralog sequences, consistently identified five positively selected codons in the peroxidase, a member of a cluster of genes and a large gene family. We mapped the positively selected residues onto the structures of the peroxidase and thaumatin and found that all sites are on the surface of these proteins and several are close to biochemically determined active sites.
Identifying new positively selected plant disease resistance genes and the critical amino acid sites provides a basis for functional studies that may increase our understanding of their underlying molecular mechanisms of action. Additionally, it may lead to the identification of individuals having variation at functionally important sites, as well as eventually using this information in the rational design and engineering of proteins involved in plant disease resistance.
Affiliation(s)
- Alejandro Zamora
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA.

6
Hamblin MT, Casa AM, Sun H, Murray SC, Paterson AH, Aquadro CF, Kresovich S. Challenges of detecting directional selection after a bottleneck: lessons from Sorghum bicolor. Genetics 2006; 173:953-64. [PMID: 16547110 PMCID: PMC1526520 DOI: 10.1534/genetics.105.054312] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Received: 12/05/2005] [Accepted: 03/13/2006] [Indexed: 11/18/2022]
Abstract
Multilocus surveys of sequence variation can be used to identify targets of directional selection, which are expected to have reduced levels of variation. Following a population bottleneck, the signal of directional selection may be hard to detect because many loci may have low variation by chance and the frequency spectrum of variation may be perturbed in ways that resemble the effects of selection. Cultivated Sorghum bicolor contains a subset of the genetic diversity found in its wild ancestor(s) due to the combined effects of a domestication bottleneck and human selection on traits associated with agriculture. As a framework for distinguishing between the effects of demography and selection, we sequenced 204 loci in a diverse panel of 17 cultivated S. bicolor accessions. Genomewide patterns of diversity depart strongly from equilibrium expectations with regard to the variance of the number of segregating sites, the site frequency spectrum, and haplotype configuration. Furthermore, gene genealogies of most loci with an excess of low frequency variants and/or an excess of segregating sites do not show the characteristic signatures of directional and diversifying selection, respectively. A simple bottleneck model provides an improved but inadequate fit to the data, suggesting the action of other population-level factors, such as population structure and migration. Despite a known history of recent selection, we find little evidence for directional selection, likely due to low statistical power and lack of an appropriate null model.
Affiliation(s)
- Martha T Hamblin
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA

7
Hamblin MT, Salas Fernandez MG, Casa AM, Mitchell SE, Paterson AH, Kresovich S. Equilibrium processes cannot explain high levels of short- and medium-range linkage disequilibrium in the domesticated grass Sorghum bicolor. Genetics 2005; 171:1247-56. [PMID: 16157678 PMCID: PMC1456844 DOI: 10.1534/genetics.105.041566] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.3] [Indexed: 01/23/2023]
Abstract
Patterns of linkage disequilibrium (LD) are of interest because they provide evidence of both equilibrium (e.g., mating system or long-term population structure) and nonequilibrium (e.g., demographic or selective) processes, and because of their importance in strategies for identifying the genetic basis of complex phenotypes. We report patterns of short- and medium-range (up to 100 kb) LD in six unlinked genomic regions in the partially selfing domesticated grass, Sorghum bicolor. The extent of allelic associations in S. bicolor, as assessed by pairwise measures of LD, is higher than in maize but lower than in Arabidopsis, in qualitative agreement with expectations based on mating system. However, quantitative analyses of the population recombination parameter, rho, based on empirical estimates of rates of recombination, mutation, and self-pollination, show that LD is more extensive than expected under a neutral equilibrium model. The disparity between rho and the population mutation parameter, theta, is similar to that observed in other species whose population history appears to be complex. From a practical standpoint, these results suggest that S. bicolor is well suited for association studies using reasonable numbers of markers, since LD typically extends at least several kilobases but has largely decayed by 15 kb.
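To illustrate what a pairwise LD measure computes, the sketch below calculates r² between two biallelic loci and pairs each value with the physical distance between the loci. Distance-paired values of this kind are what underlie statements such as "LD has largely decayed by 15 kb". The abstract does not say which pairwise statistic was used, so r² is assumed here only as a common choice. The sketch also assumes phased haplotypes coded 0/1. For a largely selfing species, one simplification is to treat each homozygous accession as a single haplotype. Both loci must be polymorphic. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def r2_by_distance(positions, loci):
    """(distance, r^2) for every pair of loci, sorted by distance (positions in bp)."""
    pairs = combinations(range(len(positions)), 2)
    return sorted((abs(positions[j] - positions[i]), pairwise_r2(loci[i], loci[j]))
                  for i, j in pairs)
```

Averaging r² within distance bins (for example, 1 kb windows) over many such pairs gives a decay curve that can be compared across species or regions.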
Affiliation(s)
- Martha T Hamblin
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA

8
Casa AM, Mitchell SE, Hamblin MT, Sun H, Bowers JE, Paterson AH, Aquadro CF, Kresovich S. Diversity and selection in sorghum: simultaneous analyses using simple sequence repeats. Theor Appl Genet 2005; 111:23-30. [PMID: 15864526 DOI: 10.1007/s00122-005-1952-5] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Received: 10/01/2004] [Accepted: 02/03/2005] [Indexed: 05/24/2023]
Abstract
Although molecular markers and DNA sequence data are now available for many crop species, our ability to identify genetic variation associated with functional or adaptive diversity is still limited. In this study, our aim was to quantify and characterize diversity in a panel of cultivated and wild sorghums (Sorghum bicolor), establish genetic relationships, and, simultaneously, identify selection signals that might be associated with sorghum domestication. We assayed 98 simple sequence repeat (SSR) loci distributed throughout the genome in a panel of 104 accessions comprising 73 landraces (i.e., cultivated lines) and 31 wild sorghums. Evaluation of SSR polymorphisms indicated that landraces retained 86% of the diversity observed in the wild sorghums. The landraces and wilds were moderately differentiated (FST = 0.13), but there was little evidence of population differentiation among racial groups of cultivated sorghums (FST = 0.06). Neighbor-joining analysis showed that wild sorghums generally formed a distinct group, and about half the landraces tended to cluster by race. Overall, bootstrap support was low, indicating a history of gene flow among the various cultivated types or recent common ancestry. Statistical methods for identifying genomic regions with patterns of variation consistent with selection (the Ewens-Watterson test for allele excess, lnRH, and FST) gave significant results for 11 loci (approx. 15% of the SSRs used in the final analysis). Interestingly, seven of these loci mapped in or near genomic regions associated with domestication-related QTLs (i.e., shattering, seed weight, and rhizomatousness). We anticipate that such population genetics-based statistical approaches will be useful for re-evaluating extant SSR data for mining interesting genomic regions from germplasm collections.
Affiliation(s)
- A M Casa
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA

9
Hamblin MT, Mitchell SE, White GM, Gallego J, Kukatla R, Wing RA, Paterson AH, Kresovich S. Comparative population genetics of the panicoid grasses: sequence polymorphism, linkage disequilibrium and selection in a diverse sample of Sorghum bicolor. Genetics 2004; 167:471-83. [PMID: 15166170 PMCID: PMC1470838 DOI: 10.1534/genetics.167.1.471] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.1] [Indexed: 11/18/2022]
Abstract
Levels of genetic variation and linkage disequilibrium (LD) are critical factors in association mapping methods as well as in identification of loci that have been targets of selection. Maize, an outcrosser, has a high level of sequence variation and a limited extent of LD. Sorghum, a closely related but largely self-pollinating panicoid grass, is expected to have higher levels of LD. As a first step in estimation of population genetic parameters in sorghum, we surveyed 27 diverse S. bicolor accessions for sequence variation at a total of 29,186 bp in 95 short regions derived from genetically mapped RFLPs located throughout the genome. Consistent with its higher level of inbreeding, the extent of LD is at least severalfold greater in sorghum than in maize. Total sequence variation in sorghum is about fourfold lower than that in maize, while synonymous variation is fivefold lower, suggesting a smaller effective population size in sorghum. Because we surveyed a species-wide sample, the mating system, which primarily affects population-level diversity, may not be primarily responsible for this difference. Comparisons of polymorphism and divergence suggest that both directional and diversifying selection have played important roles in shaping variation in the sorghum genome.
Affiliation(s)
- Martha T Hamblin
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA

10
White GM, Hamblin MT, Kresovich S. Molecular Evolution of the Phytochrome Gene Family in Sorghum: Changing Rates of Synonymous and Replacement Evolution. Mol Biol Evol 2004; 21:716-23. [PMID: 14963106 DOI: 10.1093/molbev/msh067] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
The photoreceptor phytochromes, encoded by a small gene family, are responsible for controlling the expression of a number of light-responsive genes and photomorphogenic events, including agronomically important phenotypes such as flowering time and shade-avoidance behavior. The understanding and control of flowering time are particularly important goals in sorghum cultivar development for diverse environments, and naturally occurring variation in the phytochrome genes might prove useful in breeding programs. Also of interest is whether variation observed at the phytochrome loci in domesticated sorghum, or in particular races, is a result of human selection. Population genetic studies can reveal evidence of such selection in patterns of polymorphism and divergence. In this study we report a population genetic analysis of the PHY gene family in Sorghum bicolor (L.) Moench in a diverse panel including both cultivated and wild accessions. We show that the level of nucleotide variation in all gene family members is about half the average for this species, consistent with purifying selection acting on these loci. However, the rate of amino acid substitution is accelerated at PHYC compared to the other two loci. In comparisons to a closely related sorghum species, PHYC shows a pattern of intermediate frequency amino acid changes that differ from the patterns observed in comparisons across longer evolutionary distances. There is also a departure from expected patterns of polymorphism and divergence at synonymous sites in PHYC, although the data do not fit a simple model of directional or diversifying selection. Cultivated sorghum has a level of variation similar to that of wild relatives (ssp. verticilliflorum), but many polymorphisms are subspecies-specific, including several amino acid variants.
Affiliation(s)
- Gemma M White
- Plant and Invertebrate Ecology Department, Institute of Arable Crops Research-Rothamsted, Harpenden, Hertfordshire, United Kingdom

11
Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet 2002; 70:369-83. [PMID: 11753822 PMCID: PMC419988 DOI: 10.1086/338628] [Citation(s) in RCA: 200] [Impact Index Per Article: 9.1] [Received: 10/04/2001] [Accepted: 11/08/2001] [Indexed: 11/03/2022]
Abstract
The Duffy blood group locus (FY) has long been considered a likely target of natural selection, because of the extreme pattern of geographic differentiation of its three major alleles (FY*B, FY*A, and FY*O). In the present study, we resequenced the FY region in samples of Hausa from Cameroon (fixed for FY*O), Han Chinese (fixed for FY*A), Italians, and Pakistanis. Our goals were to characterize the signature of directional selection on FY*O in sub-Saharan Africa and to understand the extent to which natural selection has also played a role in the extreme geographic differentiation of the other derived allele at this locus, FY*A. The data from the FY region are compared with the patterns of variation observed at 10 unlinked, putatively neutral loci from the same populations, as well as with theoretical expectations from the neutral-equilibrium model. The FY region in the Hausa shows evidence of directional selection in two independent properties of the data (i.e., level of sequence variation and frequency spectrum), observations that are consistent with the FY*O mutation being the target. The Italian and Chinese FY data show patterns of variation that are very unusual, particularly with regard to frequency spectrum and linkage disequilibrium, but do not fit the predictions of any simple model of selection. These patterns may represent a more complex and previously unrecognized signature of positive selection.
Affiliation(s)
- Martha T Hamblin
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA

12
Abstract
The Toll-like receptor 4 protein acts as the transducing subunit of the lipopolysaccharide receptor complex and assists in the detection of Gram-negative pathogens within the mammalian host. Several lines of evidence support the view that variation at the TLR4 locus may alter host susceptibility to Gram-negative infection or the outcome of infection. Here, we surveyed TLR4 sequence variation in the complete coding region (2.4 kb) in 348 individuals from several population samples; in addition, a subset of the individuals was surveyed at 1.1 kb of intronic sequence. More than 90% of the chromosomes examined encoded the same structural isoform of TLR4, while the rest harbored 12 rare amino acid variants. Conversely, the variants at silent sites (intronic and synonymous positions) occur at both low and high frequencies and are consistent with a neutral model of mutation and random drift. The spectrum of allele frequencies for amino acid variants shows a significant skew toward lower frequencies relative to both the neutral model and the pattern observed at linked silent sites. This is consistent with the hypothesis that weak purifying selection acted on TLR4 and that most mutations affecting TLR4 protein structure have at least mildly deleterious phenotypic effects. These results may imply that genetic variants contributing to disease susceptibility occur at low frequencies in the population and suggest strategies for optimizing the design of disease-mapping studies.
Affiliation(s)
- I Smirnova
- Department of Internal Medicine and the Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA

13
Hamblin MT, Di Rienzo A. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am J Hum Genet 2000; 66:1669-79. [PMID: 10762551 PMCID: PMC1378024 DOI: 10.1086/302879] [Citation(s) in RCA: 233] [Impact Index Per Article: 9.7] [Received: 01/31/2000] [Accepted: 02/28/2000] [Indexed: 11/03/2022]
Abstract
The Duffy blood group locus, which encodes a chemokine receptor, is characterized by three alleles: FY*A, FY*B, and FY*O. The frequency of the FY*O allele, which corresponds to the absence of Fy antigen on red blood cells, is at or near fixation in most sub-Saharan African populations but is very rare outside Africa. The FST value for the FY*O allele is the highest observed for any allele in humans, providing strong evidence for the action of natural selection at this locus. Homozygosity for the FY*O allele confers complete resistance to vivax malaria, suggesting that this allele has been the target of selection by Plasmodium vivax or some other infectious agent. To characterize the signature of directional selection at this locus, we surveyed DNA sequence variation, both in a 1.9-kb region centered on the FY*O mutation site and in a 1-kb region 5-6 kb away from it, in 17 Italians and in a total of 24 individuals from five sub-Saharan African populations. The level of variation across both regions is two- to threefold lower in the Africans than in the Italians. As a result, the pooled African sample shows a significant departure from the neutral expectation for the number of segregating sites, whereas the Italian sample does not. The FY*O allele occurs on two major haplotypes in three of the five African populations. This finding could be due to recombination, recurrent mutation, population structure, and/or mutation accumulation and drift. Although we are unable to distinguish among these alternative hypotheses, it is likely that the two major haplotypes originated prior to selection on the FY*O mutation.
Affiliation(s)
- M T Hamblin
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
14
Abstract
The relationship between rates of recombination and DNA sequence polymorphism was analyzed for the second chromosome of Drosophila pseudoobscura. We constructed integrated genetic and physical maps of this chromosome using molecular markers at 10 loci spanning most of its physical length. The total length of the map was 128.2 cM, almost twice that of the homologous chromosome arm (3R) in D. melanogaster. There appears to be very little centromeric suppression of recombination, and rates of recombination are quite uniform across most of the chromosome. Levels of sequence variation (theta(W), based on the number of segregating sites) at seven loci (tropomyosin 1, Rhodopsin 3, Rhodopsin 1, bicoid, Xanthine dehydrogenase, Myosin light chain 1, and ribosomal protein 49) varied from 0.0036 to 0.0167. Generally consistent with earlier studies, the average estimate of theta(W) at total sites is 1.5-fold higher than that in D. melanogaster, while average theta(W) at silent sites is almost 3-fold higher. These estimates of variation were analyzed in the context of a background selection model under the same parameters of mutation rate and selection as have been proposed for D. melanogaster. It is likely that a significant fraction of the higher level of sequence variation in D. pseudoobscura can be explained by differences in regional rates of recombination rather than a larger species-level effective population size. However, the distribution of variation among synonymous, nonsynonymous, and noncoding sites appears to be quite different between the species, making direct comparisons of neutral variation, and hence inferences about effective population size, difficult. Tajima's D statistics for 6 out of the 7 loci surveyed are negative, suggesting that D. pseudoobscura may have experienced a rapid population expansion in the recent past or, alternatively, that slightly deleterious mutations constitute an important component of standing variation in this species.
Affiliation(s)
- M T Hamblin
- Section of Genetics and Development, Cornell University, Ithaca, New York 14853, USA.
15
Hamblin MT, Veuille M. Population structure among African and derived populations of Drosophila simulans: evidence for ancient subdivision and recent admixture. Genetics 1999; 153:305-17. [PMID: 10471714 PMCID: PMC1460727 DOI: 10.1093/genetics/153.1.305]
Abstract
Previous studies based on allozyme variation have found little evidence for genetic differentiation in Drosophila simulans. On the basis of DNA sequence variation at two nuclear loci in four African populations of D. simulans, we show that there is significant structure to D. simulans populations within Africa. Variation at one of the loci, vermilion, appears to be neutral and supports an eastern African origin for European and American populations. Samples from the West Indies, Europe, and North America had a nucleotide diversity lower than that of African populations at vermilion and show nonequilibrium haplotype distributions at both vermilion and G6pd, consistent with a hypothesis of recent bottleneck and possibly also admixture in the history of these populations. Directional selection, previously documented at G6pd, appears to have occurred within the coalescence time of the species, obscuring deep population history.
Affiliation(s)
- M T Hamblin
- Laboratoire d'Ecologie-EPHE, Université Pierre-et-Marie Curie, 75252 Paris Cedex 05, France.
16
Hamblin MT, Aquadro CF. Contrasting patterns of nucleotide sequence variation at the glucose dehydrogenase (Gld) locus in different populations of Drosophila melanogaster. Genetics 1997; 145:1053-62. [PMID: 9093857 PMCID: PMC1207875 DOI: 10.1093/genetics/145.4.1053]
Abstract
We have analyzed nucleotide sequence variation at the Glucose dehydrogenase (Gld) locus from four populations of Drosophila melanogaster from four continents. All four population samples show a significant reduction in silent variation compared to the neutral expectation. The levels of silent variation across all four populations are consistent with the predictions of the background selection model; however, Zimbabwe has a remarkably low level of variation. In the face of dramatically reduced silent polymorphism, an amino acid variant, leading to the common allozyme polymorphism at Gld, remains in low to intermediate frequency in all non-African samples. In the Chinese population sample, the ratio of replacement to silent variation is significantly elevated compared to the neutral expectation. The difference in patterns of variation across these population samples suggests that selection on Gld (or the Gld region) has been different in the Chinese population than in the other three.
Affiliation(s)
- M T Hamblin
- Section of Genetics and Development, Cornell University, Ithaca, NY 14853, USA.
17
Hamblin MT, Aquadro CF. High nucleotide sequence variation in a region of low recombination in Drosophila simulans is consistent with the background selection model. Mol Biol Evol 1996; 13:1133-40. [PMID: 8865667 DOI: 10.1093/oxfordjournals.molbev.a025676]
Abstract
We surveyed nucleotide sequence variation at glucose dehydrogenase (Gld), in a region of low recombination on chromosome 3R, from a population sample of Drosophila simulans. The levels of nucleotide variation were surprisingly high. There was no departure from the expectation of a neutral model for the level of polymorphism, indicating no evidence of a selective sweep in this region. There was a significant deficiency of singleton polymorphisms according to the Fu and Li test, although Tajima and Hudson, Kreitman, and Aguade (HKA) tests do not provide evidence of a significant elevation of variation due to balancing selection. Genetic map data for the D. simulans third chromosome were used to calculate expected values of pi for Gld under a current model of background selection, varying the values for the parameter sh (selection coefficient against deleterious mutations). We show that the recombinational landscape of D. simulans is sufficiently different from that of D. melanogaster that we expect higher variation under the background selection model, even when effective population sizes are assumed to be equal. The data for Gld were tested against the predictions using computer simulations of the distribution of the number of segregating sites conditioned on pi. Background selection alone can explain our observations as long as sh is larger than 0.005 and species-level effective population size is assumed to be several-fold larger than in D. melanogaster. Alternatively, the deleterious mutation rate may be smaller in D. simulans, or balancing selection may be acting nearby, thereby reducing the effect of background selection.
Affiliation(s)
- M T Hamblin
- Section of Genetics and Development, Cornell University, Ithaca, New York 14853, USA.
18
Lin Y, Hamblin MT, Edwards MJ, Barillas-Mury C, Kanost MR, Knipple DC, Wolfner MF, Hagedorn HH. Structure, expression, and hormonal control of genes from the mosquito, Aedes aegypti, which encode proteins similar to the vitelline membrane proteins of Drosophila melanogaster. Dev Biol 1993; 155:558-68. [PMID: 8432405 DOI: 10.1006/dbio.1993.1052]
Abstract
Genomic and cDNA clones of a gene expressed after a blood meal in the mosquito, Aedes aegypti, were identified as having significant similarity to the vitelline membrane protein genes of Drosophila melanogaster. The predicted protein had unusually high contents of alanine, histidine, and proline and contained a region of hydrophobic amino acids that was highly conserved in the predicted protein of the D. melanogaster vitelline membrane protein genes. The 15a gene was expressed from 5 to 40 hr after a blood meal. It was expressed only in the follicle cells of the ovary, particularly in the cells surrounding the oocyte. The 15a gene was expressed in ovaries of the blood-fed, decapitated female in response to an injection of 20-hydroxyecdysone, and in ovaries from non-blood-fed females incubated with the hormone, even in the presence of cycloheximide. A second gene, with weaker homology to 15a, is presumably another member of a family of related genes, as is the case with D. melanogaster vitelline membrane protein genes. This second gene contained a coding sequence similar to a decapeptide recently isolated from mosquito ovaries as an "oostatic factor" (Borovsky et al., FASEB J. 4, 3015-3020, 1990).
Affiliation(s)
- Y Lin
- Department of Entomology, University of Arizona, Tucson 85721
19
Abstract
We have been interested in identifying genes that play a role in reproduction of the mosquito Aedes aegypti. Our interests are currently focused on the vitellogenin genes which in the mosquito are expressed only in the fat body in response to the insect steroid hormone, 20-hydroxyecdysone. Four of the five vitellogenin genes in the genome have been cloned. We have examined the relationships between these genes and find that they form a small gene family exhibiting different levels of relationship.
Affiliation(s)
- M T Hamblin
- Department of Entomology, Cornell University, Ithaca, New York 14853