26
Fernandes JD, Faust TB, Strauli NB, Smith C, Crosby DC, Nakamura RL, Hernandez RD, Frankel AD. Functional Segregation of Overlapping Genes in HIV. Cell 2016; 167:1762-1773.e12. [PMID: 27984726 DOI: 10.1016/j.cell.2016.11.031] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 05/22/2016] [Revised: 09/29/2016] [Accepted: 11/15/2016] [Indexed: 11/28/2022]
Abstract
Overlapping genes pose an evolutionary dilemma as one DNA sequence evolves under the selection pressures of multiple proteins. Here, we perform systematic statistical and mutational analyses of the overlapping HIV-1 genes tat and rev and engineer exhaustive libraries of non-overlapped viruses to perform deep mutational scanning of each gene independently. We find a "segregated" organization in which overlapped sites encode functional residues of one gene or the other, but never both. Furthermore, this organization eliminates unfit genotypes, providing a fitness advantage to the population. Our comprehensive analysis reveals the extraordinary manner in which HIV minimizes the constraint of overlapping genes and repurposes that constraint to its own advantage. Thus, overlaps are not just consequences of evolutionary constraints, but rather can provide population fitness advantages.
27
Mathias RA, Taub MA, Gignoux CR, Fu W, Musharoff S, O'Connor TD, Vergara C, Torgerson DG, Pino-Yanes M, Shringarpure SS, Huang L, Rafaels N, Boorgula MP, Johnston HR, Ortega VE, Levin AM, Song W, Torres R, Padhukasahasram B, Eng C, Mejia-Mejia DA, Ferguson T, Qin ZS, Scott AF, Yazdanbakhsh M, Wilson JG, Marrugo J, Lange LA, Kumar R, Avila PC, Williams LK, Watson H, Ware LB, Olopade C, Olopade O, Oliveira R, Ober C, Nicolae DL, Meyers D, Mayorga A, Knight-Madden J, Hartert T, Hansel NN, Foreman MG, Ford JG, Faruque MU, Dunston GM, Caraballo L, Burchard EG, Bleecker E, Araujo MI, Herrera-Paz EF, Gietzen K, Grus WE, Bamshad M, Bustamante CD, Kenny EE, Hernandez RD, Beaty TH, Ruczinski I, Akey J, Barnes KC. A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 2016; 7:12522. [PMID: 27725671 PMCID: PMC5062574 DOI: 10.1038/ncomms12522] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 03/29/2016] [Accepted: 07/12/2016] [Indexed: 01/20/2023]
Abstract
The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry.
28
Strauli NB, Hernandez RD. Statistical inference of a convergent antibody repertoire response to influenza vaccine. Genome Med 2016; 8:60. [PMID: 27255379 PMCID: PMC4891843 DOI: 10.1186/s13073-016-0314-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 12/10/2015] [Accepted: 05/05/2016] [Indexed: 12/31/2022]
Abstract
Background Vaccines dramatically affect an individual’s adaptive immune system and thus provide an excellent means to study human immunity. Upon vaccination, the B cells that express antibodies (Abs) that happen to bind the vaccine are stimulated to proliferate and undergo mutagenesis at their Ab locus. This process may alter the composition of B cell lineages within an individual, which are known collectively as the antibody repertoire (AbR). Antibodies are also highly expressed in whole blood, potentially enabling RNA sequencing (RNA-seq) technologies to query this diversity. Less is known about the diversity of AbR responses across individuals to a given vaccine and whether individuals tend to mount a similar response to the same antigenic stimulus. Methods Here we implement a bioinformatic pipeline that extracts the AbR information from a time-series RNA-seq dataset of five patients who were administered a seasonal trivalent influenza vaccine (TIV). We harness the detailed time-series nature of this dataset and use methods based on functional data analysis (FDA) to identify the Abs that respond to the vaccine. We then design and implement rigorous statistical tests in order to ask whether or not these patients exhibit a convergent AbR response to the same TIV. Results We find that high-resolution time-series data can be used to help identify the Abs that respond to an antigenic stimulus and that this response can exhibit a convergent nature across patients inoculated with the same vaccine. However, correlations in AbR diversity among individuals prior to inoculation can confound inference of a convergent signal unless they are taken into account. Conclusions We developed a framework to identify the elements of an AbR that respond to an antigen. This information could be used to understand the diversity of different immune responses in different individuals, as well as to gauge the effectiveness of the immune response to a given stimulus within an individual.
We also present a framework for testing a convergent hypothesis between AbRs, a hypothesis that is more difficult to test than previously appreciated. Our discovery of a convergent signal suggests that similar epitopes do select for antibodies with similar sequence characteristics.
29
Uricchio LH, Zaitlen NA, Ye CJ, Witte JS, Hernandez RD. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res 2016; 26:863-73. [PMID: 27197206 PMCID: PMC4937562 DOI: 10.1101/gr.202440.115] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 11/25/2015] [Accepted: 05/16/2016] [Indexed: 12/20/2022]
Abstract
The role of rare alleles in complex phenotypes has been hotly debated, but most rare variant association tests (RVATs) do not account for the evolutionary forces that affect genetic architecture. Here, we use simulation and numerical algorithms to show that explosive population growth, as experienced by human populations, can dramatically increase the impact of very rare alleles on trait variance. We then assess the ability of RVATs to detect causal loci using simulations and human RNA-seq data. Surprisingly, we find that statistical performance is worst for phenotypes in which genetic variance is due mainly to rare alleles, and explosive population growth decreases power. Although many studies have attempted to identify causal rare variants, few have reported novel associations. This has sometimes been interpreted to mean that rare variants make negligible contributions to complex trait heritability. Our work shows that RVATs are not robust to realistic human evolutionary forces, so general conclusions about the impact of rare variants on complex traits may be premature.
30
Torgerson DG, Giri T, Druley TE, Zheng J, Huntsman S, Seibold MA, Young AL, Schweiger T, Yin-Declue H, Sajol GD, Schechtman KB, Hernandez RD, Randolph AG, Bacharier LB, Castro M. Pooled Sequencing of Candidate Genes Implicates Rare Variants in the Development of Asthma Following Severe RSV Bronchiolitis in Infancy. PLoS One 2015; 10:e0142649. [PMID: 26587832 PMCID: PMC4654486 DOI: 10.1371/journal.pone.0142649] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/18/2014] [Accepted: 02/06/2015] [Indexed: 12/17/2022]
Abstract
Severe infection with respiratory syncytial virus (RSV) during infancy is strongly associated with the development of asthma. To identify genetic variation that contributes to asthma following severe RSV bronchiolitis during infancy, we sequenced the coding exons of 131 asthma candidate genes in 182 European and African American children with severe RSV bronchiolitis in infancy using anonymous pools for variant discovery, and then directly genotyped a set of 190 nonsynonymous variants. Association testing was performed for physician-diagnosed asthma before the 7th birthday (asthma) using genotypes from 6,500 individuals from the Exome Sequencing Project (ESP) as controls to gain statistical power. In addition, among patients with severe RSV bronchiolitis during infancy, we examined genetic associations with asthma, active asthma, persistent wheeze, and bronchial hyperreactivity (methacholine PC20) at age 6 years. We identified four rare nonsynonymous variants that were significantly associated with asthma following severe RSV bronchiolitis, including single variants in ADRB2, FLG and NCAM1 in European Americans (p = 4.6 × 10(-4), 1.9 × 10(-13) and 5.0 × 10(-5), respectively), and NOS1 in African Americans (p = 2.3 × 10(-11)). One of the variants was a highly functional nonsynonymous variant in ADRB2 (rs1800888), which was also nominally associated with asthma (p = 0.027) and active asthma (p = 0.013) among European Americans with severe RSV bronchiolitis without including the ESP. Our results suggest that rare nonsynonymous variants contribute to the development of asthma following severe RSV bronchiolitis in infancy, notably in ADRB2. Additional studies are required to explore the role of rare variants in the etiology of asthma and asthma-related traits following severe RSV bronchiolitis.
31
Pino-Yanes M, Gignoux CR, Galanter JM, Levin AM, Campbell CD, Eng C, Huntsman S, Nishimura KK, Gourraud PA, Mohajeri K, O'Roak BJ, Hu D, Mathias RA, Nguyen EA, Roth LA, Padhukasahasram B, Moreno-Estrada A, Sandoval K, Winkler CA, Lurmann F, Davis A, Farber HJ, Meade K, Avila PC, Serebrisky D, Chapela R, Ford JG, Lenoir MA, Thyne SM, Brigino-Buenaventura E, Borrell LN, Rodriguez-Cintron W, Sen S, Kumar R, Rodriguez-Santana JR, Bustamante CD, Martinez FD, Raby BA, Weiss ST, Nicolae DL, Ober C, Meyers DA, Bleecker ER, Mack SJ, Hernandez RD, Eichler EE, Barnes KC, Williams LK, Torgerson DG, Burchard EG. Genome-wide association study and admixture mapping reveal new loci associated with total IgE levels in Latinos. J Allergy Clin Immunol 2015; 135:1502-10. [PMID: 25488688 PMCID: PMC4458233 DOI: 10.1016/j.jaci.2014.10.033] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 02/04/2014] [Revised: 09/06/2014] [Accepted: 10/15/2014] [Indexed: 12/20/2022]
Abstract
BACKGROUND IgE is a key mediator of allergic inflammation, and its levels are frequently increased in patients with allergic disorders. OBJECTIVE We sought to identify genetic variants associated with IgE levels in Latinos. METHODS We performed a genome-wide association study and admixture mapping of total IgE levels in 3334 Latinos from the Genes-environments & Admixture in Latino Americans (GALA II) study. Replication was evaluated in 454 Latinos, 1564 European Americans, and 3187 African Americans from independent studies. RESULTS We confirmed associations of 6 genes identified by means of previous genome-wide association studies and identified a novel genome-wide significant association of a polymorphism in the zinc finger protein 365 gene (ZNF365) with total IgE levels (rs200076616, P = 2.3 × 10(-8)). We next identified 4 admixture mapping peaks (6p21.32-p22.1, 13q22-31, 14q23.2, and 22q13.1) at which local African, European, and/or Native American ancestry was significantly associated with IgE levels. The most significant peak was 6p21.32-p22.1, where Native American ancestry was associated with lower IgE levels (P = 4.95 × 10(-8)). All but 22q13.1 were replicated in an independent sample of Latinos, and 2 of the peaks were replicated in African Americans (6p21.32-p22.1 and 14q23.2). Fine mapping of 6p21.32-p22.1 identified 6 genome-wide significant single nucleotide polymorphisms in Latinos, 2 of which replicated in European Americans. Another single nucleotide polymorphism was peak-wide significant within 14q23.2 in African Americans (rs1741099, P = 3.7 × 10(-6)) and replicated in non-African American samples (P = .011). CONCLUSION We confirmed genetic associations at 6 genes and identified novel associations within ZNF365, HLA-DQA1, and 14q23.2. Our results highlight the importance of studying diverse multiethnic populations to uncover novel loci associated with total IgE levels.
32
Maher MC, Hernandez RD. CauseMap: fast inference of causality from complex time series. PeerJ 2015; 3:e824. [PMID: 25780776 PMCID: PMC4359046 DOI: 10.7717/peerj.824] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 11/03/2014] [Accepted: 02/17/2015] [Indexed: 11/20/2022]
Abstract
Background. Establishing health-related causal relationships is a central pursuit in biomedical research. Yet, the interdependent non-linearity of biological systems renders causal dynamics laborious and at times impractical to disentangle. This pursuit is further impeded by the dearth of time series that are sufficiently long to observe and understand recurrent patterns of flux. However, as data generation costs plummet and technologies like wearable devices democratize data collection, we anticipate a coming surge in the availability of biomedically-relevant time series data. Given the life-saving potential of these burgeoning resources, it is critical to invest in the development of open source software tools that are capable of drawing meaningful insight from vast amounts of time series data. Results. Here we present CauseMap, the first open source implementation of convergent cross mapping (CCM), a method for establishing causality from long time series data (≳25 observations). Compared to existing time series methods, CCM has the advantage of being model-free and robust to unmeasured confounding that could otherwise induce spurious associations. CCM builds on Takens' Theorem, a well-established result from dynamical systems theory that requires only mild assumptions. This theorem allows us to reconstruct high dimensional system dynamics using a time series of only a single variable. These reconstructions can be thought of as shadows of the true causal system. If reconstructed shadows can predict points from opposing time series, we can infer that the corresponding variables are providing views of the same causal system, and so are causally related. Unlike traditional metrics, this test can establish the directionality of causation, even in the presence of feedback loops. Furthermore, since CCM can extract causal relationships from time series of, e.g., a single individual, it may be a valuable tool for personalized medicine.
We implement CCM in Julia, a high-performance programming language designed for facile technical computing. Our software package, CauseMap, is platform-independent and freely available as an official Julia package. Conclusions. CauseMap is an efficient implementation of a state-of-the-art algorithm for detecting causality from time series data. We believe this tool will be a valuable resource for biomedical research and personalized medicine.
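Illustrative sketch (not part of the original abstract and not the CauseMap API): the shadow-manifold reconstruction and cross-mapping test described above can be approximated in a few lines of Python with NumPy. The function names `delay_embed` and `cross_map_skill`, the fixed embedding dimension `E` and lag `tau`, and the use of the full library instead of a sweep over library sizes are assumptions made for brevity; a real CCM analysis would also check that prediction skill converges as the library grows.

```python
import numpy as np


def delay_embed(x, E, tau):
    """Shadow manifold of x: row i is (x[t], x[t - tau], ..., x[t - (E - 1) * tau]),
    with t = i + (E - 1) * tau (Takens-style delay embedding)."""
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    if len(x) <= start:
        raise ValueError("time series too short for this E and tau")
    # Column k holds the series lagged by k * tau, aligned on the same time index.
    cols = [x[start - k * tau : len(x) - k * tau] for k in range(E)]
    return np.column_stack(cols)


def cross_map_skill(source, target, E=3, tau=1):
    """Estimate `target` from the shadow manifold of `source` and return the
    Pearson correlation between the estimates and the observed values."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    M = delay_embed(source, E, tau)
    y = np.asarray(target, dtype=float)[(E - 1) * tau :]
    preds = np.empty(len(M))
    for i in range(len(M)):
        d = np.linalg.norm(M - M[i], axis=1)
        d[i] = np.inf                      # leave the query point out
        nn = np.argsort(d)[: E + 1]        # E + 1 nearest neighbours on the manifold
        d_nn = d[nn]
        # Exponential weights relative to the nearest-neighbour distance.
        w = np.exp(-d_nn / max(d_nn[0], 1e-12))
        preds[i] = np.sum(w * y[nn]) / np.sum(w)
    return np.corrcoef(preds, y)[0, 1]


t = np.linspace(0, 20 * np.pi, 1000)
print("skill sin -> cos:", cross_map_skill(np.sin(t), np.cos(t)))
```

Under the usual CCM convention, high skill when estimating X from the shadow manifold of Y (`cross_map_skill(y, x)`) is read as evidence that X influences Y. For two phase-locked signals such as sin t and cos t, skill is close to 1 in both directions, a reminder that a single high correlation does not by itself establish the direction of causation.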
33
Davis ZH, Verschueren E, Jang GM, Kleffman K, Johnson JR, Park J, Von Dollen J, Maher MC, Johnson T, Newton W, Jäger S, Shales M, Horner J, Hernandez RD, Krogan NJ, Glaunsinger BA. Global mapping of herpesvirus-host protein complexes reveals a transcription strategy for late genes. Mol Cell 2015; 57:349-60. [PMID: 25544563 PMCID: PMC4305015 DOI: 10.1016/j.molcel.2014.11.026] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Received: 04/16/2014] [Revised: 08/20/2014] [Accepted: 11/21/2014] [Indexed: 12/19/2022]
Abstract
Mapping host-pathogen interactions has proven instrumental for understanding how viruses manipulate host machinery and how numerous cellular processes are regulated. DNA viruses such as herpesviruses have relatively large coding capacity and thus can target an extensive network of cellular proteins. To identify the host proteins hijacked by this pathogen, we systematically affinity tagged and purified all 89 proteins of Kaposi's sarcoma-associated herpesvirus (KSHV) from human cells. Mass spectrometry of this material identified over 500 virus-host interactions. KSHV causes AIDS-associated cancers, and its interaction network is enriched for proteins linked to cancer and overlaps with proteins that are also targeted by HIV-1. We found that the conserved KSHV protein ORF24 binds to RNA polymerase II and brings it to viral late promoters by mimicking and replacing cellular TATA-box-binding protein (TBP). This is required for herpesviral late gene expression, a complex and poorly understood phase of the viral lifecycle.
34
Uricchio LH, Torres R, Witte JS, Hernandez RD. Population genetic simulations of complex phenotypes with implications for rare variant association tests. Genet Epidemiol 2015; 39:35-44. [PMID: 25417809 DOI: 10.1002/gepi.21866] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/17/2014] [Revised: 09/09/2014] [Accepted: 09/26/2014] [Indexed: 12/12/2022]
Abstract
Demographic events and natural selection alter patterns of genetic variation within populations and may play a substantial role in shaping the genetic architecture of complex phenotypes and disease. However, the joint impact of these basic evolutionary forces is often ignored in the assessment of statistical tests of association. Here, we provide a simulation-based framework for generating DNA sequences that incorporates selection and demography with flexible models for simulating phenotypic variation (sfs_coder). This tool also allows the user to perform locus-specific simulations by automatically querying annotated genomic functional elements and genetic maps. We demonstrate the effects of evolutionary forces on patterns of genetic variation by simulating recently inferred models of human selection and demography. We use these simulations to show that the demographic model and locus-specific features, such as the proportion of sites under selection, may have practical implications for estimating the statistical power of sequencing-based rare variant association tests. In particular, for some phenotype models, there may be higher power to detect rare variant associations in African populations compared to non-Africans, but power is considerably reduced in regions of the genome with rampant negative selection. Furthermore, we show that existing methods for simulating large samples based on resampling from a small set of observed haplotypes fail to recapitulate the distribution of rare variants in the presence of rapid population growth (as has been observed in several human populations).
35
Chen HS, Hutter CM, Mechanic LE, Amos CI, Bafna V, Hauser ER, Hernandez RD, Li C, Liberles DA, McAllister K, Moore JH, Paltoo DN, Papanicolaou GJ, Peng B, Ritchie MD, Rosenfeld G, Witte JS, Gillanders EM, Feuer EJ. Genetic simulation tools for post-genome wide association studies of complex diseases. Genet Epidemiol 2015; 39:11-19. [PMID: 25371374 DOI: 10.1002/gepi.21870] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/15/2014] [Revised: 09/02/2014] [Accepted: 09/26/2014] [Indexed: 01/12/2023]
Abstract
Genetic simulation programs are used to model data under specified assumptions to facilitate the understanding and study of complex genetic systems. Standardized data sets generated using genetic simulation are essential for the development and application of novel analytical tools in genetic epidemiology studies. With continuing advances in high-throughput genomic technologies and generation and analysis of larger, more complex data sets, there is a need for updating current approaches in genetic simulation modeling. To provide a forum to address current and emerging challenges in this area, the National Cancer Institute (NCI) sponsored a workshop, entitled "Genetic Simulation Tools for Post-Genome Wide Association Studies of Complex Diseases" at the National Institutes of Health (NIH) in Bethesda, Maryland on March 11-12, 2014. The goals of the workshop were to (1) identify opportunities, challenges, and resource needs for the development and application of genetic simulation models; (2) improve the integration of tools for modeling and analysis of simulated data; and (3) foster collaborations to facilitate development and applications of genetic simulation. During the course of the meeting, the group identified challenges and opportunities for the science of simulation, software and methods development, and collaboration. This paper summarizes key discussions at the meeting, and highlights important challenges and opportunities to advance the field of genetic simulation.
36
Szpiech ZA, Hernandez RD. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol 2014; 31:2824-7. [PMID: 25015648 PMCID: PMC4166924 DOI: 10.1093/molbev/msu211] [Citation(s) in RCA: 421] [Impact Index Per Article: 42.1] [Indexed: 11/12/2022]
Abstract
Haplotype-based scans to detect natural selection are useful to identify recent or ongoing positive selection in genomes. As both real and simulated genomic data sets grow larger, spanning thousands of samples and millions of markers, there is a need for a fast and efficient implementation of these scans for general use. Here, we present selscan, an efficient multithreaded application that implements Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross-population EHH (XPEHH). selscan accepts phased genotypes in multiple formats, including TPED, performs extremely well on both simulated and real data, and runs over an order of magnitude faster than existing available implementations. It calculates iHS on chromosome 22 (22,147 loci) across 204 CEU haplotypes in 353 s on one thread (33 s on 16 threads) and calculates XPEHH for the same data relative to 210 YRI haplotypes in 578 s on one thread (52 s on 16 threads). Source code and binaries (Windows, OSX, and Linux) are available at https://github.com/szpiech/selscan.
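Illustrative sketch (not selscan's implementation): in its commonly used form, EHH between a core marker and a marker x, computed among the n haplotypes that carry a given core allele, is the sum over distinct haplotypes spanning core to x of C(n_h, 2), divided by C(n, 2), where n_h is the number of copies of each distinct haplotype. The Python below assumes haplotypes are equal-length strings of '0'/'1' and measures distance in markers rather than map units; the function name `ehh` is hypothetical. iHS and XPEHH, which integrate EHH over genetic distance, are not reproduced here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, site, allele):
    """Extended haplotype homozygosity between marker `core` and marker `site`,
    computed among haplotypes that carry `allele` at `core`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carriers
    lo, hi = min(core, site), max(core, site)
    # Count identical stretches spanning core..site (inclusive).
    counts = Counter(h[lo : hi + 1] for h in carriers)
    # Probability that two randomly chosen carriers are identical over the stretch.
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


haps = ["000", "001", "011", "100"]
print([ehh(haps, core=0, site=s, allele="0") for s in range(3)])
```

For these toy haplotypes, EHH for core allele "0" decays from 1.0 at the core to 1/3 one marker away and to 0 two markers away; slow decay of this curve around a high-frequency allele is the signal that EHH-based scans look for.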
37
Vujkovic-Cvijin I, Dunham RM, Iwai S, Maher MC, Albright RG, Broadhurst MJ, Hernandez RD, Lederman MM, Huang Y, Somsouk M, Deeks SG, Hunt PW, Lynch SV, McCune JM. Dysbiosis of the gut microbiota is associated with HIV disease progression and tryptophan catabolism. Sci Transl Med 2013; 5:193ra91. [PMID: 23843452 DOI: 10.1126/scitranslmed.3006438] [Citation(s) in RCA: 491] [Impact Index Per Article: 49.1] [Indexed: 12/12/2022]
Abstract
Progressive HIV infection is characterized by dysregulation of the intestinal immune barrier, translocation of immunostimulatory microbial products, and chronic systemic inflammation that is thought to drive progression of disease to AIDS. Elements of this pathologic process persist despite viral suppression during highly active antiretroviral therapy (HAART), and drivers of these phenomena remain poorly understood. Disrupted intestinal immunity can precipitate dysbiosis that induces chronic inflammation in the mucosa and periphery of mice. However, putative microbial drivers of HIV-associated immunopathology versus recovery have not been identified in humans. Using high-resolution bacterial community profiling, we identified a dysbiotic mucosal-adherent community enriched in Proteobacteria and depleted of Bacteroidia members that was associated with markers of mucosal immune disruption, T cell activation, and chronic inflammation in HIV-infected subjects. Furthermore, this dysbiosis was evident among HIV-infected subjects undergoing HAART, and the extent of dysbiosis correlated with activity of the kynurenine pathway of tryptophan catabolism and plasma concentrations of the inflammatory cytokine interleukin-6 (IL-6), two established markers of disease progression. Gut-resident bacteria with capacity to catabolize tryptophan through the kynurenine pathway were found to be enriched in HIV-infected subjects, strongly correlated with kynurenine levels in HIV-infected subjects, and capable of kynurenine production in vitro. These observations demonstrate a link between mucosal-adherent colonic bacteria and immunopathogenesis during progressive HIV infection that is apparent even in the setting of viral suppression during HAART. This link suggests that gut-resident microbial populations may influence intestinal homeostasis during HIV disease.
38
Drake KA, Torgerson DG, Gignoux CR, Galanter JM, Roth LA, Huntsman S, Eng C, Oh SS, Yee SW, Lin L, Bustamante CD, Moreno-Estrada A, Sandoval K, Davis A, Borrell LN, Farber HJ, Kumar R, Avila PC, Brigino-Buenaventura E, Chapela R, Ford JG, Lenoir MA, Lurmann F, Meade K, Serebrisky D, Thyne S, Rodríguez-Cintrón W, Sen S, Rodríguez-Santana JR, Hernandez RD, Giacomini KM, Burchard EG. A genome-wide association study of bronchodilator response in Latinos implicates rare variants. J Allergy Clin Immunol 2014; 133:370-8. [PMID: 23992748 PMCID: PMC3938989 DOI: 10.1016/j.jaci.2013.06.043] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Received: 01/16/2013] [Revised: 05/09/2013] [Accepted: 06/18/2013] [Indexed: 01/29/2023]
Abstract
BACKGROUND The primary rescue medication to treat acute asthma exacerbation is the short-acting β₂-adrenergic receptor agonist; however, there is variation in how well a patient responds to treatment. Although these differences might be due to environmental factors, there is mounting evidence for a genetic contribution to variability in bronchodilator response (BDR). OBJECTIVE To identify genetic variation associated with bronchodilator drug response in Latino children with asthma. METHODS We performed a genome-wide association study (GWAS) for BDR in 1782 Latino children with asthma using standard linear regression, adjusting for genetic ancestry and ethnicity, and performed replication studies in an additional 531 Latinos. We also performed admixture mapping across the genome by testing for an association between local European, African, and Native American ancestry and BDR, adjusting for genomic ancestry and ethnicity. RESULTS We identified 7 genetic variants associated with BDR at a genome-wide significant threshold (P < 5 × 10(-8)), all of which had frequencies of less than 5%. Furthermore, we observed an excess of small P values driven by rare variants (frequency, <5%) and by variants in the proximity of solute carrier (SLC) genes. Admixture mapping identified 5 significant peaks; fine mapping within these peaks identified 2 rare variants in SLC22A15 as being associated with increased BDR in Mexicans. Quantitative PCR and immunohistochemistry identified SLC22A15 as being expressed in the lung and bronchial epithelial cells. CONCLUSION Our results suggest that rare variation contributes to individual differences in response to albuterol in Latinos, notably in SLC genes that include membrane transport proteins involved in the transport of endogenous metabolites and xenobiotics. Resequencing in larger, multiethnic population samples and additional functional studies are required to further understand the role of rare variation in BDR.
39
Maher MC, Uricchio LH, Torgerson DG, Hernandez RD. Population genetics of rare variants and complex diseases. Hum Hered 2013; 74:118-28. [PMID: 23594490 PMCID: PMC3698246 DOI: 10.1159/000346826] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 01/27/2023]
Abstract
OBJECTIVES Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high-throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so necessitates the simultaneous elucidation of the targets of natural selection and population-specific demographic history. METHODS Here we characterize the action of natural selection operating across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with many human populations), and describe the differences between evolutionarily deleterious mutations and those that are neutral. RESULTS Genes associated with several complex disease categories exhibit stronger signatures of purifying selection than non-disease genes. In addition, loci identified through genome-wide association studies of complex traits also exhibit signatures consistent with being in regions recurrently targeted by purifying selection. Through simulations, we show that population bottlenecks and rapid growth enable deleterious rare variants to persist at low frequencies just as long as neutral variants, but low-frequency and common deleterious variants tend to be much younger than neutral variants. This has resulted in a large proportion of modern-day rare alleles that have a deleterious effect on function and that potentially contribute to disease susceptibility. CONCLUSIONS The key question for sequencing-based association studies of complex traits is how to distinguish between deleterious and benign genetic variation.
We used population genetic simulations to uncover patterns of genetic variation that distinguish these two categories, especially derived allele age, thereby providing inroads into novel methods for characterizing rare genetic variation driving complex diseases.
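For orientation, the neutral baseline against which allele-age comparisons are made can be computed from the Kimura-Ohta (1973) result for a population of constant size: a neutral allele now at frequency p has an expected age of -4N_e p ln(p)/(1-p) generations. The sketch below assumes constant size, so it does not capture the bottleneck-and-growth histories simulated in the paper; the function name and the N_e value are illustrative only.

```python
import math


def expected_neutral_allele_age(p: float, ne: float) -> float:
    """Expected age (in generations) of a neutral allele now at frequency p.

    Kimura-Ohta (1973) result for a population of constant diploid
    effective size ne; it does not model bottlenecks or growth.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -4.0 * ne * p * math.log(p) / (1.0 - p)


if __name__ == "__main__":
    # Illustrative N_e of 10,000; rarer neutral alleles are expected to be younger.
    for freq in (0.001, 0.01, 0.1, 0.5):
        age = expected_neutral_allele_age(freq, ne=10_000)
        print(f"p = {freq:<6} expected age ~ {age:,.0f} generations")
```

With N_e = 10,000, a neutral allele at 1% frequency is expected to be roughly 1,860 generations old under this baseline. The abstract's claim concerns how deleterious alleles compare with neutral alleles of the same frequency under the simulated demographies.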
40
Torgerson DG, Gignoux CR, Galanter JM, Drake KA, Roth LA, Eng C, Huntsman S, Torres R, Avila PC, Chapela R, Ford JG, Rodríguez-Santana JR, Rodríguez-Cintrón W, Hernandez RD, Burchard EG. Case-control admixture mapping in Latino populations enriches for known asthma-associated genes. J Allergy Clin Immunol 2012; 130:76-82.e12. [PMID: 22502797 DOI: 10.1016/j.jaci.2012.02.040] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 10/16/2011] [Revised: 12/20/2011] [Accepted: 02/02/2012] [Indexed: 12/22/2022]
Abstract
BACKGROUND Polymorphisms in more than 100 genes have been associated with asthma susceptibility, yet much of the heritability remains to be explained. Asthma disproportionately affects different racial and ethnic groups in the United States, suggesting that admixture mapping is a useful strategy to identify novel asthma-associated loci. OBJECTIVE We sought to identify novel asthma-associated loci in Latino populations using case-control admixture mapping. METHODS We performed genome-wide admixture mapping by comparing levels of local Native American, European, and African ancestry between children with asthma and nonasthmatic control subjects in Puerto Rican and Mexican populations. Within candidate peaks, we performed allelic tests of association, controlling for differences in local ancestry. RESULTS Across the 2 populations, we identified a total of 62 admixture mapping peaks at a P value of less than 10^-3 that were significantly enriched for previously identified asthma-associated genes (P = .0051). One of the peaks was statistically significant based on 100 permutations in the Mexican sample (6q15); however, it was not significant in Puerto Rican subjects. Another peak was identified at nominal significance in both populations (8q12); however, the association was observed with different ancestries. CONCLUSION Case-control admixture mapping is a promising strategy for identifying novel asthma-associated loci in Latino populations and implicates genetic variation in the 6q15 and 8q12 regions in asthma susceptibility. This approach might be useful for identifying regions that contribute to both shared and population-specific differences in asthma susceptibility.
41
Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T, Leffler EM, Bowden R, Aneas I, Broxholme J, Humburg P, Iqbal Z, Lunter G, Maller J, Hernandez RD, Melton C, Venkat A, Nobrega MA, Bontrop R, Myers S, Donnelly P, Przeworski M, McVean G. A fine-scale chimpanzee genetic map from population sequencing. Science 2012; 336:193-8. [PMID: 22422862 PMCID: PMC3532813 DOI: 10.1126/science.1216872] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Indexed: 12/11/2022]
Abstract
To study the evolution of recombination rates in apes, we developed methodology to construct a fine-scale genetic map from high-throughput sequence data from 10 Western chimpanzees, Pan troglodytes verus. Compared to the human genetic map, broad-scale recombination rates tend to be conserved, but with exceptions, particularly in regions of chromosomal rearrangements and around the site of ancestral fusion in human chromosome 2. At fine scales, chimpanzee recombination is dominated by hotspots, which show no overlap with those of humans even though rates are similarly elevated around CpG islands and decreased within genes. The hotspot-specifying protein PRDM9 shows extensive variation among Western chimpanzees, and there is little evidence that any sequence motifs are enriched in hotspots. The contrasting locations of hotspots provide a natural experiment, which demonstrates the impact of recombination on base composition.
42
Jäger S, Cimermancic P, Gulbahce N, Johnson JR, McGovern KE, Clarke SC, Shales M, Mercenne G, Pache L, Li K, Hernandez H, Jang GM, Roth SL, Akiva E, Marlett J, Stephens M, D'Orso I, Fernandes J, Fahey M, Mahon C, O'Donoghue AJ, Todorovic A, Morris JH, Maltby DA, Alber T, Cagney G, Bushman FD, Young JA, Chanda SK, Sundquist WI, Kortemme T, Hernandez RD, Craik CS, Burlingame A, Sali A, Frankel AD, Krogan NJ. Global landscape of HIV-human protein complexes. Nature 2011; 481:365-70. [PMID: 22190034 DOI: 10.1038/nature10719] [Citation(s) in RCA: 552] [Impact Index Per Article: 42.5] [Received: 03/26/2011] [Accepted: 11/18/2011] [Indexed: 12/16/2022]
Abstract
Human immunodeficiency virus (HIV) has a small genome and therefore relies heavily on the host cellular machinery to replicate. Identifying which host proteins and complexes come into physical contact with the viral proteins is crucial for a comprehensive understanding of how HIV rewires the host's cellular machinery during the course of infection. Here we report the use of affinity tagging and purification mass spectrometry to determine systematically the physical interactions of all 18 HIV-1 proteins and polyproteins with host proteins in two different human cell lines (HEK293 and Jurkat). Using a quantitative scoring system that we call MiST, we identified with high confidence 497 HIV-human protein-protein interactions involving 435 individual human proteins, with ∼40% of the interactions being identified in both cell types. We found that the host proteins hijacked by HIV, especially those found interacting in both cell types, are highly conserved across primates. We uncovered a number of host complexes targeted by viral proteins, including the finding that HIV protease cleaves eIF3d, a subunit of eukaryotic translation initiation factor 3. This host protein is one of eleven identified in this analysis that act to inhibit HIV replication. This data set facilitates a more comprehensive and detailed understanding of how the host machinery is manipulated during the course of HIV infection.
43
Wilson DJ, Hernandez RD, Andolfatto P, Przeworski M. A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet 2011; 7:e1002395. [PMID: 22144911 PMCID: PMC3228810 DOI: 10.1371/journal.pgen.1002395] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 08/20/2010] [Accepted: 10/08/2011] [Indexed: 01/23/2023]
Abstract
Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.
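Models that combine mutation, selection, and drift typically build on Kimura's fixation probability. A standard consequence, sketched below, is that a new mutation with population-scaled selection coefficient S = 4N_e s (additive fitnesses 1, 1+s, 1+2s) is substituted at S/(1 - e^{-S}) times the neutral rate. This is a textbook building block, not necessarily the exact parameterization of the paper's codon model; the function name is illustrative.

```python
import math


def relative_substitution_rate(S: float) -> float:
    """Substitution rate of new mutations relative to the neutral rate.

    S is the population-scaled selection coefficient 4*Ne*s under additive
    (genic) selection; S > 0 is beneficial, S < 0 deleterious.
    Uses the standard diffusion result S / (1 - exp(-S)).
    """
    if abs(S) < 1e-8:
        # Series expansion around S = 0 avoids the 0/0 limit.
        return 1.0 + S / 2.0
    if S > 0:
        return S / -math.expm1(-S)
    # Rewritten as S*e^S / (e^S - 1) so strongly deleterious S cannot overflow.
    return S * math.exp(S) / math.expm1(S)


if __name__ == "__main__":
    for S in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(f"S = {S:>5}: relative rate {relative_substitution_rate(S):.4g}")
```

Strongly deleterious changes (S = -10) are substituted at well under 0.1% of the neutral rate, whereas strongly beneficial ones (S = 10) are substituted about ten times faster; this asymmetry is what lets polymorphism and divergence data jointly inform selection coefficients.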
44
Torgerson DG, Ampleford EJ, Chiu GY, Gauderman WJ, Gignoux CR, Graves PE, Himes BE, Levin AM, Mathias RA, Hancock DB, Baurley JW, Eng C, Stern DA, Celedón JC, Rafaels N, Capurso D, Conti DV, Roth LA, Soto-Quiros M, Togias A, Li X, Myers RA, Romieu I, Van Den Berg DJ, Hu D, Hansel NN, Hernandez RD, Israel E, Salam MT, Galanter J, Avila PC, Avila L, Rodriquez-Santana JR, Chapela R, Rodriguez-Cintron W, Diette GB, Adkinson NF, Abel RA, Ross KD, Shi M, Faruque MU, Dunston GM, Watson HR, Mantese VJ, Ezurum SC, Liang L, Ruczinski I, Ford JG, Huntsman S, Chung KF, Vora H, Li X, Calhoun WJ, Castro M, Sienra-Monge JJ, del Rio-Navarro B, Deichmann KA, Heinzmann A, Wenzel SE, Busse WW, Gern JE, Lemanske RF, Beaty TH, Bleecker ER, Raby BA, Meyers DA, London SJ, Gilliland FD, Burchard EG, Martinez FD, Weiss ST, Williams LK, Barnes KC, Ober C, Nicolae DL. Meta-analysis of genome-wide association studies of asthma in ethnically diverse North American populations. Nat Genet 2011; 43:887-92. [PMID: 21804549 PMCID: PMC3445408 DOI: 10.1038/ng.888] [Citation(s) in RCA: 619] [Impact Index Per Article: 47.6] [Received: 12/14/2010] [Accepted: 06/16/2011] [Indexed: 11/09/2022]
Abstract
Asthma is a common disease with a complex risk architecture including both genetic and environmental factors. We performed a meta-analysis of North American genome-wide association studies of asthma in 5,416 individuals with asthma (cases) including individuals of European American, African American or African Caribbean, and Latino ancestry, with replication in an additional 12,649 individuals from the same ethnic groups. We identified five susceptibility loci. Four were at previously reported loci on 17q21, near IL1RL1, TSLP and IL33, but we report for the first time, to our knowledge, that these loci are associated with asthma risk in three ethnic groups. In addition, we identified a new asthma susceptibility locus at PYHIN1, with the association being specific to individuals of African descent (P = 3.9 × 10^-9). These results suggest that some asthma susceptibility loci are robust to differences in ancestry when sufficiently large sample sizes are investigated, and that ancestry-specific associations also contribute to the complex genetic architecture of asthma.
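The abstract does not spell out the meta-analysis model. A common choice for combining per-study GWAS effect estimates at one SNP is inverse-variance weighted fixed-effects meta-analysis, sketched below under that assumption; the function name and the example numbers are hypothetical, not results from the study.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios for one SNP)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical log odds ratios from three ancestry groups.
    beta, se, z, p = ivw_fixed_effects([0.20, 0.15, 0.25], [0.05, 0.08, 0.10])
    print(f"beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

A pooled fixed-effects estimate assumes one effect shared across groups; ancestry-specific signals such as the PYHIN1 association are the situation where heterogeneity statistics or random-effects models are usually examined alongside it.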
45
Sun C, Southard C, Huo D, Hernandez RD, Witonsky DB, Olopade OI, Di Rienzo A. SNP discovery, expression and cis-regulatory variation in the UGT2B genes. Pharmacogenomics J 2011; 12:287-96. [PMID: 21358749 DOI: 10.1038/tpj.2011.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/19/2023]
Abstract
UGT2B enzymes metabolize multiple endogenous and exogenous molecules, including steroid hormones and clinical drugs. However, little is known about the inter-individual variation in their gene expression and its determinants. We re-sequenced candidate regulatory regions and part of the coding regions (41.1 kb) of UGT2B genes and identified 332 genetic variants. We measured gene expression in normal breast and liver samples and observed different patterns. The expression levels varied greatly across individuals in both tissues and were significantly correlated with each other in liver. Genotyping of tagging single-nucleotide polymorphisms (SNPs) in the same samples and association tests between genotype and transcript levels identified 62 variants associated with the mRNA level of at least one UGT2B gene in either tissue. Most of these cis-regulatory SNPs were not shared between tissues, suggesting that this gene family is regulated in a tissue-specific manner. Our results provide insight into studying the role of UGT2B variation in hormone-dependent cancers and drug response.
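Genotype-expression association tests of this kind are commonly run as a linear regression of transcript level on allele dosage (0, 1, or 2 copies of one allele), fitted separately in each tissue. The sketch below assumes that additive model and reports the slope and its t statistic; it is not necessarily the test the authors used, and the data are made up for illustration.

```python
import math


def dosage_regression(dosages, expression):
    """Ordinary least squares of expression on allele dosage (0/1/2).

    Returns (slope, t_statistic) for the additive genotype effect.
    Converting t to a p-value needs the t distribution with n - 2
    degrees of freedom (e.g. from SciPy), omitted to stay stdlib-only.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; t statistic undefined")
    return slope, slope / se


if __name__ == "__main__":
    # Hypothetical liver expression values for eight individuals.
    g = [0, 0, 1, 1, 1, 2, 2, 2]
    y = [5.1, 4.9, 6.0, 6.2, 5.8, 7.1, 6.9, 7.0]
    print(dosage_regression(g, y))
```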
46
Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Sella G, Przeworski M. Classic selective sweeps were rare in recent human evolution. Science 2011; 331:920-4. [PMID: 21330547 PMCID: PMC3669691 DOI: 10.1126/science.1198878] [Citation(s) in RCA: 307] [Impact Index Per Article: 23.6] [Indexed: 01/02/2023]
Abstract
Efforts to identify the genetic basis of human adaptations from polymorphism data have sought footprints of "classic selective sweeps" (in which a beneficial mutation arises and rapidly fixes in the population). Yet it remains unknown whether this form of natural selection was common in our evolution. We examined the evidence for classic sweeps in resequencing data from 179 human genomes. As expected under a recurrent-sweep model, we found that diversity levels decrease near exons and conserved noncoding regions. In contrast to expectation, however, the trough in diversity around human-specific amino acid substitutions is no more pronounced than around synonymous substitutions. Moreover, relative to the genome background, amino acid and putative regulatory sites are not significantly enriched in alleles that are highly differentiated between populations. These findings indicate that classic sweeps were not a dominant mode of human adaptation over the past ~250,000 years.
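The trough comparison rests on a simple statistic: average nucleotide diversity (pi) as a function of distance from a class of focal sites, such as amino acid or synonymous substitutions. A minimal sketch follows, assuming every base is callable and no window runs off the end of the sequence; the function names and toy data are hypothetical, and this is not the published pipeline.

```python
def site_heterozygosity(derived_count: int, n: int) -> float:
    """Unbiased per-site diversity for a SNP with derived_count copies in n chromosomes."""
    return 2.0 * derived_count * (n - derived_count) / (n * (n - 1))


def diversity_by_distance(snps, n, focal_sites, bin_width, n_bins):
    """Mean per-base diversity in distance bins around focal sites.

    snps: iterable of (position, derived_count); focal_sites: positions.
    Bin b holds distances b*bin_width + 1 .. (b + 1)*bin_width on both
    sides of each focal site, i.e. 2*bin_width bases per focal site.
    Assumes all bases are callable and windows stay inside the sequence.
    """
    if not focal_sites:
        raise ValueError("need at least one focal site")
    snps = list(snps)
    totals = [0.0] * n_bins
    for focal in focal_sites:
        for pos, k in snps:
            d = abs(pos - focal)
            if d == 0:
                continue  # skip the focal site itself
            b = (d - 1) // bin_width
            if b < n_bins:
                totals[b] += site_heterozygosity(k, n)
    bases_per_bin = 2 * bin_width * len(focal_sites)
    return [t / bases_per_bin for t in totals]
```

Under a recurrent-sweep model, the diversity profile computed around amino acid substitutions should dip more steeply than the one around synonymous substitutions; the abstract reports that it does not.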
47
Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet 2009; 5:e1000695. [PMID: 19851460 PMCID: PMC2760211 DOI: 10.1371/journal.pgen.1000695] [Citation(s) in RCA: 1119] [Impact Index Per Article: 74.6] [Received: 02/12/2009] [Accepted: 09/23/2009] [Indexed: 11/18/2022]
Abstract
Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40–270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17–43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3–26.9 kya), and our analysis yields no evidence for subsequent migration. Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
The demographic history of our species is reflected in patterns of genetic variation within and among populations. We developed an efficient method for calculating the expected distribution of genetic variation, given a demographic model including such events as population size changes, population splits and joins, and migration. We applied our approach to publicly available human sequencing data, searching for models that best reproduce the observed patterns. Our joint analysis of data from African, European, and Asian populations yielded new dates for when these populations diverged. In particular, we found that African and Eurasian populations diverged around 100,000 years ago. This is earlier than other genetic studies suggest, because our model includes the effects of migration, which we found to be important for reproducing observed patterns of variation in the data. We also analyzed data from European, Asian, and Mexican populations to model the peopling of the Americas. Here, we find no evidence for recurrent migration after East Asian and Native American populations diverged. Our methods are not limited to studying humans, and we hope that future sequencing projects will offer more insights into the history of both our own species and others.
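The composite-likelihood idea can be sketched in a few lines. Assuming each entry of the (flattened) frequency spectrum is an independent Poisson count, and scaling the model spectrum by the theta that maximizes the likelihood (which equates total expected and total observed counts), candidate models can be scored against data. Here the model shape is the constant-size neutral spectrum, proportional to 1/i, as a stand-in for the diffusion-based expectation described above; the function names and counts are hypothetical, and this is not the API of the authors' software.

```python
import math


def poisson_composite_loglik(observed, expected):
    """Sum of Poisson log-likelihoods over SFS entries, treated as independent."""
    ll = 0.0
    for d, m in zip(observed, expected):
        if m <= 0:
            raise ValueError("expected counts must be positive")
        ll += -m + d * math.log(m) - math.lgamma(d + 1)
    return ll


def optimally_scaled_loglik(observed, model_shape):
    """Scale an unnormalized model spectrum by the maximum-likelihood theta,
    which makes total expected counts equal total observed counts."""
    theta = sum(observed) / sum(model_shape)
    expected = [theta * m for m in model_shape]
    return theta, poisson_composite_loglik(observed, expected)


if __name__ == "__main__":
    n = 5  # haploid sample size; unfolded SFS entries i = 1 .. n-1
    observed = [120, 60, 40, 30]  # hypothetical counts
    neutral = [1.0 / i for i in range(1, n)]
    flat = [1.0] * (n - 1)
    print(optimally_scaled_loglik(observed, neutral))
    print(optimally_scaled_loglik(observed, flat))
```

A joint spectrum for two or three populations can be scored the same way after flattening its array. Because linked sites are not independent, this likelihood is suited to point estimates and model comparison but understates uncertainty, which is why the abstract pairs it with bootstraps that preserve linkage.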
48
Torgerson DG, Boyko AR, Hernandez RD, Indap A, Hu X, White TJ, Sninsky JJ, Cargill M, Adams MD, Bustamante CD, Clark AG. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet 2009; 5:e1000592. [PMID: 19662163 PMCID: PMC2714078 DOI: 10.1371/journal.pgen.1000592] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Received: 12/23/2008] [Accepted: 07/10/2009] [Indexed: 01/30/2023]
Abstract
Analysis of polymorphism and divergence in the non-coding portion of the human genome yields crucial information about factors driving the evolution of gene regulation. Candidate cis-regulatory regions spanning more than 15,000 genes in 15 African Americans and 20 European Americans were re-sequenced and aligned to the chimpanzee genome in order to identify potentially functional polymorphism and to characterize and quantify departures from neutral evolution. Distortions of the site frequency spectra suggest a general pattern of selective constraint on conserved non-coding sites in the flanking regions of genes (CNCs). Moreover, there is an excess of fixed differences that cannot be explained by a Gamma model of deleterious fitness effects, suggesting the presence of positive selection on CNCs. Extensions of the McDonald-Kreitman test identified candidate cis-regulatory regions with high probabilities of positive and negative selection near many known human genes, the biological characteristics of which exhibit genome-wide trends that differ from patterns observed in protein-coding regions. Notably, there is a higher probability of positive selection in candidate cis-regulatory regions near genes expressed in the fetal brain, suggesting that a larger portion of adaptive regulatory changes has occurred in genes expressed during brain development. Overall we find that natural selection has played an important role in the evolution of candidate cis-regulatory regions throughout hominid evolution.
It has been suggested that changes in gene expression may have played a more important role in the evolution of modern humans than changes in protein-coding sequences. In order to identify signatures of natural selection on candidate cis-regulatory regions, we examined single nucleotide polymorphisms obtained from the complete re-sequencing of conserved non-coding sites (CNCs) in the flanking regions of over 15,000 genes in 35 humans. Patterns of allele frequencies in CNCs indicate the presence of both positive and negative selection acting on standing variation within these candidate cis-regulatory regions, particularly for the 5′ and 3′ UTRs of genes. Gene-specific tests comparing levels of polymorphism and divergence identify several genes with strong signatures of selection on candidate cis-regulatory regions and suggest that the biological characteristics of genes subject to selection are different between coding and candidate cis-regulatory regions with respect to gene expression and function. For example, we find stronger signatures of positive selection in candidate cis-regulatory regions near genes expressed in the fetal brain, which we do not observe in a concurrent analysis on protein-coding regions. Our results suggest that both positive and negative selection have acted on candidate cis-regulatory regions and that the evolution of non-coding DNA has played an important role throughout hominid evolution.
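The McDonald-Kreitman logic behind these tests compares polymorphism (P) and divergence (D) at putatively functional sites, here candidate cis-regulatory sites, with putatively neutral reference sites. A minimal sketch computes the neutrality index, the estimated proportion of adaptive divergence alpha = 1 - (D_ref * P_test) / (D_test * P_ref), and a two-sided Fisher's exact test on the 2x2 table. The paper used extensions of this test that are not reproduced here; the function name and counts are hypothetical.

```python
import math


def _hypergeom_pmf(a, row1, col1, total):
    """Probability of count a in the top-left cell given fixed table margins."""
    return math.comb(col1, a) * math.comb(total - col1, row1 - a) / math.comb(total, row1)


def mk_test(p_test, p_ref, d_test, d_ref):
    """McDonald-Kreitman summary for test vs. reference (neutral) sites.

    Table layout:           test     reference
        polymorphic        p_test     p_ref
        fixed (divergent)  d_test     d_ref
    Returns (neutrality_index, alpha, fisher_two_sided_p).
    """
    if min(p_ref, d_test, d_ref) <= 0:
        raise ValueError("p_ref, d_test and d_ref must be positive")
    ni = (p_test / p_ref) / (d_test / d_ref)
    alpha = 1.0 - ni  # alpha > 0 suggests an excess of adaptive fixed differences

    row1 = p_test + p_ref   # polymorphic sites
    col1 = p_test + d_test  # test sites
    total = p_test + p_ref + d_test + d_ref
    p_obs = _hypergeom_pmf(p_test, row1, col1, total)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    p_value = 0.0
    for a in range(lo, hi + 1):
        prob = _hypergeom_pmf(a, row1, col1, total)
        if prob <= p_obs * (1 + 1e-7):  # tables at least as extreme as observed
            p_value += prob
    return ni, alpha, min(p_value, 1.0)


if __name__ == "__main__":
    print(mk_test(p_test=20, p_ref=60, d_test=30, d_ref=30))
```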
49
Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, Green RD, Hamernik DL, Kappes SM, Lien S, Matukumalli LK, McEwan JC, Nazareth LV, Schnabel RD, Weinstock GM, Wheeler DA, Ajmone-Marsan P, Boettcher PJ, Caetano AR, Garcia JF, Hanotte O, Mariani P, Skow LC, Sonstegard TS, Williams JL, Diallo B, Hailemariam L, Martinez ML, Morris CA, Silva LOC, Spelman RJ, Mulatu W, Zhao K, Abbey CA, Agaba M, Araujo FR, Bunch RJ, Burton J, Gorni C, Olivier H, Harrison BE, Luff B, Machado MA, Mwakaya J, Plastow G, Sim W, Smith T, Thomas MB, Valentini A, Williams P, Womack J, Woolliams JA, Liu Y, Qin X, Worley KC, Gao C, Jiang H, Moore SS, Ren Y, Song XZ, Bustamante CD, Hernandez RD, Muzny DM, Patil S, San Lucas A, Fu Q, Kent MP, Vega R, Matukumalli A, McWilliam S, Sclep G, Bryc K, Choi J, Gao H, Grefenstette JJ, Murdoch B, Stella A, Villa-Angulo R, Wright M, Aerts J, Jann O, Negrini R, Goddard ME, Hayes BJ, Bradley DG, Barbosa da Silva M, Lau LPL, Liu GE, Lynn DJ, Panzitta F, Dodds KG. Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science 2009; 324:528-32. [PMID: 19390050 PMCID: PMC2735092 DOI: 10.1126/science.1167936] [Citation(s) in RCA: 561] [Impact Index Per Article: 37.4] [Indexed: 12/21/2022]
Abstract
The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.
50
Hernandez RD. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 2008; 24:2786-7. [PMID: 18842601 DOI: 10.1093/bioinformatics/btn522] [Citation(s) in RCA: 180] [Impact Index Per Article: 11.3] [Indexed: 11/14/2022]
Abstract
This article introduces a new forward population genetic simulation program that can efficiently generate samples from populations with complex demographic histories under various models of natural selection. The program (SFS_CODE) is highly flexible, allowing the user to simulate realistic genomic regions with several loci evolving according to a variety of mutation models (from simple to context-dependent); it also allows for insertions and deletions. Each locus can be annotated as either coding or non-coding, sex-linked or autosomal, selected or neutral, and can have an arbitrary linkage structure (from completely linked to independent). AVAILABILITY The source code (written in the C programming language) is available at http://sfscode.sourceforge.net, and a web server (http://cbsuapps.tc.cornell.edu/sfscode.aspx) allows the user to perform simulations using the high-performance computing cluster hosted by the Cornell University Computational Biology Service Unit.
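To illustrate what a forward simulator does at its core, the sketch below follows a single biallelic site through a Wright-Fisher population: each generation applies diploid selection deterministically and then draws the next generation's allele count binomially, with the population size taken from a user-supplied demographic schedule. This is a toy under those assumptions, not SFS_CODE or its options, which support far richer models (multiple loci, context-dependent mutation, indels, linkage); the function name and parameters are hypothetical.

```python
import random


def wright_fisher_trajectory(p0, s, sizes, h=0.5, seed=None):
    """Forward-simulate the frequency of one allele under selection and drift.

    p0:    starting frequency of the selected allele
    s:     selection coefficient (homozygote fitness 1 + s; heterozygote 1 + h*s)
    sizes: diploid population size for each successive generation (the demography)
    Stops early if the allele is lost or fixed.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for n in sizes:
        q = 1.0 - p
        w_hom, w_het = 1.0 + s, 1.0 + h * s
        mean_w = p * p * w_hom + 2 * p * q * w_het + q * q
        p_sel = (p * p * w_hom + p * q * w_het) / mean_w  # frequency after selection
        copies = sum(rng.random() < p_sel for _ in range(2 * n))  # binomial drift
        p = copies / (2 * n)
        trajectory.append(p)
        if p in (0.0, 1.0):
            break
    return trajectory


if __name__ == "__main__":
    # Hypothetical history: constant size, a bottleneck, then 5% growth per generation.
    demography = [1000] * 50 + [100] * 20 + [int(100 * 1.05 ** t) for t in range(50)]
    traj = wright_fisher_trajectory(p0=0.05, s=-0.01, sizes=demography, seed=1)
    print(f"{len(traj) - 1} generations simulated, final frequency {traj[-1]:.3f}")
```

Drawing each gene copy with a Bernoulli trial keeps the sketch dependency-free; a real simulator would use a binomial sampler and track many sites and individuals at once.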