For: Chen GK, Marjoram P, Wall JD. Fast and flexible simulation of DNA sequence data. Genome Res 2009;19:136-42. [PMID: 19029539 DOI: 10.1101/gr.083634.108] [Citation(s) in RCA: 254] [Impact Index Per Article: 15.9] [Indexed: 11/25/2022]

Number

Cited by Other Article(s)

201

Gao C, Tignor NL, Salit J, Strulovici-Barel Y, Hackett NR, Crystal RG, Mezey JG. HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors. Bioinformatics 2013;30:369-76. [PMID: 24307700 DOI: 10.1093/bioinformatics/btt690] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023]

Abstract

MOTIVATION

Identification of expression Quantitative Trait Loci (eQTL), the genetic loci that contribute to heritable variation in gene expression, can be obstructed by factors that produce variation in expression profiles if these factors are unmeasured or hidden from direct analysis.

METHODS

We have developed a method for Hidden Expression Factor analysis (HEFT) that identifies individual and pleiotropic effects of eQTL in the presence of hidden factors. The HEFT model is a combined multivariate regression and factor analysis, where the complete likelihood of the model is used to derive a ridge estimator for simultaneous factor learning and detection of eQTL. HEFT requires no pre-estimation of hidden factor effects; it provides P-values and is extremely fast, requiring just a few hours to complete an eQTL analysis of thousands of expression variables when analyzing hundreds of thousands of single nucleotide polymorphisms on a standard 8 core 2.6 G desktop.

RESULTS

By analyzing simulated data, we demonstrate that HEFT can correct for an unknown number of hidden factors and significantly outperforms all related hidden factor methods for eQTL analysis when there are eQTL with univariate and multivariate (pleiotropic) effects. To demonstrate a real-world application, we applied HEFT to identify eQTL affecting gene expression in the human lung for a study that included presumptive hidden factors. HEFT identified all of the cis-eQTL found by other hidden factor methods and 91 additional cis-eQTL. HEFT also identified a number of eQTLs with direct relevance to lung disease that could not be found without a hidden factor analysis, including cis-eQTL for GTF2H1 and MTRR, genes that have been independently associated with lung cancer.

AVAILABILITY

Software is available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

202

Sikora MJ, Colonna V, Xue Y, Tyler-Smith C. Modeling the contrasting Neolithic male lineage expansions in Europe and Africa. INVESTIGATIVE GENETICS 2013;4:25. [PMID: 24262073 PMCID: PMC4177147 DOI: 10.1186/2041-2223-4-25] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/16/2013] [Accepted: 10/21/2013] [Indexed: 11/10/2022]

203

Browning BL, Browning SR. Detecting identity by descent and estimating genotype error rates in sequence data. Am J Hum Genet 2013;93:840-51. [PMID: 24207118 DOI: 10.1016/j.ajhg.2013.09.014] [Citation(s) in RCA: 114] [Impact Index Per Article: 10.4] [Received: 08/06/2013] [Revised: 09/21/2013] [Accepted: 09/26/2013] [Indexed: 11/17/2022]

204

Clark SA, Kinghorn BP, Hickey JM, van der Werf JHJ. The effect of genomic information on optimal contribution selection in livestock breeding programs. Genet Sel Evol 2013;45:44. [PMID: 24171942 PMCID: PMC4176995 DOI: 10.1186/1297-9686-45-44] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 12/06/2012] [Accepted: 10/13/2013] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Long-term benefits in animal breeding programs require that increases in genetic merit be balanced with the need to maintain diversity (lost due to inbreeding). This can be achieved by using optimal contribution selection. The availability of high-density DNA marker information enables the incorporation of genomic data into optimal contribution selection but this raises the question about how this information affects the balance between genetic merit and diversity.

METHODS

The effect of using genomic information in optimal contribution selection was examined based on simulated and real data on dairy bulls. We compared the genetic merit of selected animals at various levels of co-ancestry restrictions when using estimated breeding values based on parent average, genomic or progeny test information. Furthermore, we estimated the proportion of variation in estimated breeding values that is due to within-family differences.

RESULTS

Optimal selection on genomic estimated breeding values increased genetic gain. Genetic merit was further increased using genomic rather than pedigree-based measures of co-ancestry under an inbreeding restriction policy. Using genomic instead of pedigree relationships to restrict inbreeding had a significant effect only when the population consisted of many large full-sib families; with a half-sib family structure, no difference was observed. In real data from dairy bulls, optimal contribution selection based on genomic estimated breeding values allowed for additional improvements in genetic merit at low to moderate inbreeding levels. Genomic estimated breeding values were more accurate and showed more within-family variation than parent average breeding values; for genomic estimated breeding values, 30 to 40% of the variation was due to within-family differences. Finally, there was no difference between constraining inbreeding via pedigree or genomic relationships in the real data.

CONCLUSIONS

The use of genomic estimated breeding values increased genetic gain in optimal contribution selection. Genomic estimated breeding values were more accurate and showed more within-family variation, which led to higher genetic gains for the same restriction on inbreeding. Using genomic relationships to restrict inbreeding provided no additional gain, except in the case of very large full-sib families.

205

Ferretti L, Ramos-Onsins SE, Pérez-Enciso M. Population genomics from pool sequencing. Mol Ecol 2013;22:5561-76. [DOI: 10.1111/mec.12522] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.9] [Received: 11/30/2011] [Revised: 08/03/2013] [Accepted: 09/06/2013] [Indexed: 11/30/2022]

206

Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüş ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GRS, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C, Gerstein M. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 2013;342:1235587. [PMID: 24092746 PMCID: PMC3947637 DOI: 10.1126/science.1235587] [Citation(s) in RCA: 270] [Impact Index Per Article: 24.5] [Indexed: 12/15/2022]

Affiliation(s)

Ekta Khurana Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Yao Fu Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Vincenza Colonna Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK Institute of Genetics and Biophysics, National Research Council (CNR), 80131 Naples, Italy
Xinmeng Jasmine Mu Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Hyun Min Kang Center for Statistical Genetics, Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
Tuuli Lappalainen Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
Andrea Sboner Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
Lucas Lochovsky Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Jieming Chen Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA
Arif Harmanci Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Jishnu Das Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
Alexej Abyzov Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Suganthi Balasubramanian Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Kathryn Beal European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Dimple Chakravarty Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
Daniel Challis Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Yuan Chen Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Declan Clarke Department of Chemistry, Yale University, New Haven, CT 06520, USA
Laura Clarke European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Fiona Cunningham European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Uday S. Evani Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Paul Flicek European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Robert Fragoza Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
Erik Garrison Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
Richard Gibbs Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Zeynep H. Gümüş The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
Javier Herrero European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Naoki Kitabayashi Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
Yong Kong Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA Keck Biotechnology Resource Laboratory, Yale University, New Haven, CT 06511, USA
Kasper Lage Pediatric Surgical Research Laboratories, MassGeneral Hospital for Children, Massachusetts General Hospital, Boston, MA 02114, USA Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA Harvard Medical School, Boston, MA 02115, USA Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
Vaja Liluashvili The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
Steven M. Lipkin Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
Daniel G. MacArthur Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA
Gabor Marth Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
Donna Muzny Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Tune H. Pers Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark Division of Endocrinology and Center for Basic and Translational Obesity Research, Children’s Hospital, Boston, MA 02115, USA Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Graham R. S. Ritchie European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Jeffrey A. Rosenfeld Department of Medicine, Rutgers New Jersey Medical School, Newark, NJ 07101, USA IST/High Performance and Research Computing, Rutgers University Newark, NJ 07101, USA Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
Cristina Sisu Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Xiaomu Wei Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
Michael Wilson Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Child Study Center, Yale University, New Haven, CT 06520, USA
Yali Xue Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Fuli Yu Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
1000 Genomes Project Consortium
Emmanouil T. Dermitzakis Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
Haiyuan Yu Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
Mark A. Rubin Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
Chris Tyler-Smith Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Mark Gerstein Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA Department of Computer Science, Yale University, New Haven, CT 06520, USA

207

A novel approach to estimating heterozygosity from low-coverage genome sequence. Genetics 2013;195:553-61. [PMID: 23934885 DOI: 10.1534/genetics.113.154500] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]

208

Cridland JM, Macdonald SJ, Long AD, Thornton KR. Abundance and distribution of transposable elements in two Drosophila QTL mapping resources. Mol Biol Evol 2013;30:2311-27. [PMID: 23883524 PMCID: PMC3773372 DOI: 10.1093/molbev/mst129] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Indexed: 12/17/2022]

Abstract

Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome, and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs are located at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families, we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels since individual elements are unlikely to segregate in both panels. Our data suggest that “burden tests” that test for the effect of TEs as a class may be more fruitful.

209

Pavlidis P, Živkovic D, Stamatakis A, Alachiotis N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol 2013;30:2224-34. [PMID: 23777627 PMCID: PMC3748355 DOI: 10.1093/molbev/mst112] [Citation(s) in RCA: 278] [Impact Index Per Article: 25.3] [Indexed: 12/31/2022]

Abstract

The advent of modern DNA sequencing technology is the driving force in obtaining complete intra-specific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows to deploy SweeD on cluster systems with queue execution time restrictions, as well as to resume long-running analyses after processor failures. In addition, the user can specify various demographic models via the command-line to calculate their theoretically expected site frequency spectra. Therefore, (in contrast to SweepFinder) the neutral site frequencies can optionally be directly calculated from a given demographic model. We show that an increase of sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the 1000 human Genomes project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
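The site frequency spectrum mentioned in the abstract above can be illustrated with a minimal sketch. This is not SweeD's or SweepFinder's implementation; it only shows how an unfolded spectrum is tabulated, assuming haplotypes are coded as lists of 0 (ancestral) and 1 (derived) alleles. The function name and the toy data are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS for n haplotypes.

    Entry i-1 of the result counts the sites at which the derived
    allele (coded 1) is carried by exactly i of the n haplotypes,
    for i = 1 .. n-1. Monomorphic sites (0 or n copies) are ignored.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")

    # zip(*haplotypes) iterates over sites (columns); the column sum
    # is the number of derived-allele copies at that site.
    derived_counts = Counter(sum(column) for column in zip(*haplotypes))
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Toy data (hypothetical): 4 haplotypes, 5 sites.
haplotypes = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
sfs = site_frequency_spectrum(haplotypes)
print(sfs)
```

In this toy sample one site is a singleton, one a doubleton and one a tripleton, while the two monomorphic sites do not contribute to the spectrum.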

210

A sequential coalescent algorithm for chromosomal inversions. Heredity (Edinb) 2013;111:200-9. [PMID: 23632894 DOI: 10.1038/hdy.2013.38] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/18/2012] [Revised: 02/04/2013] [Accepted: 03/25/2013] [Indexed: 01/06/2023]

211

Hara Y, Imanishi T, Satta Y. Reconstructing the demographic history of the human lineage using whole-genome sequences from human and three great apes. Genome Biol Evol 2013;4:1133-45. [PMID: 22975719 PMCID: PMC3752010 DOI: 10.1093/gbe/evs075] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]

212

Padhukasahasram B, Rannala B. Meiotic gene-conversion rate and tract length variation in the human genome. Eur J Hum Genet 2013:ejhg201330. [PMID: 23443031 DOI: 10.1038/ejhg.2013.30] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 09/17/2012] [Revised: 12/17/2012] [Accepted: 01/10/2013] [Indexed: 01/11/2023]

213

Inferring admixture histories of human populations using linkage disequilibrium. Genetics 2013;193:1233-54. [PMID: 23410830 PMCID: PMC3606100 DOI: 10.1534/genetics.112.147330] [Citation(s) in RCA: 295] [Impact Index Per Article: 26.8] [Indexed: 12/18/2022]

214

Higher levels of neanderthal ancestry in East Asians than in Europeans. Genetics 2013;194:199-209. [PMID: 23410836 DOI: 10.1534/genetics.112.148213] [Citation(s) in RCA: 158] [Impact Index Per Article: 14.4] [Indexed: 11/18/2022]

215

Shang J, Zhang J, Lei X, Zhao W, Dong Y. EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis. Genes Genomics 2013. [DOI: 10.1007/s13258-013-0081-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/22/2022]

216

Hickey JM, Kinghorn BP, Tier B, Clark SA, van der Werf JHJ, Gorjanc G. Genomic evaluations using similarity between haplotypes. J Anim Breed Genet 2012;130:259-69. [PMID: 23855628 DOI: 10.1111/jbg.12020] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/27/2012] [Accepted: 11/07/2012] [Indexed: 10/27/2022]

217

Chan AH, Jenkins PA, Song YS. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet 2012;8:e1003090. [PMID: 23284288 PMCID: PMC3527307 DOI: 10.1371/journal.pgen.1003090] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Received: 05/04/2012] [Accepted: 09/29/2012] [Indexed: 01/18/2023]

Abstract

Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine-scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features—including recombination rates, diversity, divergence, GC content, gene content, and sequence quality—is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.

Recombination is a process by which chromosomes exchange genetic material during meiosis. It is important in evolution because it provides offspring with new combinations of genes, and so estimating the rate of recombination is of fundamental importance in various population genomic inference problems. In this paper, we develop a new statistical method to enable robust estimation of fine-scale recombination maps of Drosophila, a genus of common fruit flies, in which the background recombination rate is high and natural selection has been prevalent. We apply our method to produce fine-scale recombination maps for a North American population and an African population of D. melanogaster. For both populations, we find extensive fine-scale variation in recombination rate throughout the genome. We provide a quantitative characterization of the similarities and differences between the recombination maps of the two populations; our study reveals high correlation at broad scales and low correlation at fine scales, as has been documented among human populations. We also examine the correlation between various genomic features. Furthermore, using a conservative approach, we find a handful of putative recombination “hotspot” regions with solid statistical support for a local elevation of at least 10 times the background recombination rate.

218

Pool JE, Corbett-Detig RB, Sugino RP, Stevens KA, Cardeno CM, Crepeau MW, Duchen P, Emerson JJ, Saelao P, Begun DJ, Langley CH. Population Genomics of sub-saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet 2012;8:e1003080. [PMID: 23284287 PMCID: PMC3527209 DOI: 10.1371/journal.pgen.1003080] [Citation(s) in RCA: 229] [Impact Index Per Article: 19.1] [Received: 05/04/2012] [Accepted: 09/27/2012] [Indexed: 11/25/2022]

Abstract

Drosophila melanogaster has played a pivotal role in the development of modern population genetics. However, many basic questions regarding the demographic and adaptive history of this species remain unresolved. We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection within an African population, between African populations, and between European and African populations. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. Outliers for Europe-Africa F(ST) were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations.

219

Mailund T, Halager AE, Westergaard M, Dutheil JY, Munch K, Andersen LN, Lunter G, Prüfer K, Scally A, Hobolth A, Schierup MH. A new isolation with migration model along complete genomes infers very different divergence processes among closely related great ape species. PLoS Genet 2012;8:e1003125. [PMID: 23284294 PMCID: PMC3527290 DOI: 10.1371/journal.pgen.1003125] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Received: 06/21/2012] [Accepted: 10/14/2012] [Indexed: 11/18/2022]

220

Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 2012;193:347-65. [PMID: 23222650 DOI: 10.1534/genetics.112.147983] [Citation(s) in RCA: 239] [Impact Index Per Article: 19.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Abstract

The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.

221

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics 2012;192:1065-93. [PMID: 22960212 PMCID: PMC3522152 DOI: 10.1534/genetics.112.145037] [Citation(s) in RCA: 1504] [Impact Index Per Article: 125.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2012] [Accepted: 08/28/2012] [Indexed: 12/11/2022] Open

222

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. Ancient admixture in human history. Genetics 2012. [PMID: 22960212 DOI: 10.1534/genetics.112.145037/-/dc1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/28/2023] Open

223

Haubold B, Pfaffelhuber P. Alignment-free population genomics: an efficient estimator of sequence diversity. G3 (Bethesda) 2012;2:883-9. [PMID: 22908037 PMCID: PMC3411244 DOI: 10.1534/g3.112.002527] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/09/2012] [Accepted: 05/29/2012] [Indexed: 11/18/2022]

224

Theunert C, Tang K, Lachmann M, Hu S, Stoneking M. Inferring the history of population size change from genome-wide SNP data. Mol Biol Evol 2012;29:3653-67. [PMID: 22787284 DOI: 10.1093/molbev/mss175] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

225

Alachiotis N, Stamatakis A, Pavlidis P. OmegaPlus: a scalable tool for rapid detection of selective sweeps in whole-genome datasets. Bioinformatics 2012;28:2274-5. [DOI: 10.1093/bioinformatics/bts419] [Citation(s) in RCA: 88] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

226

Pavlidis P, Jensen JD, Stephan W, Stamatakis A. A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Mol Biol Evol 2012;29:3237-48. [PMID: 22617950 DOI: 10.1093/molbev/mss136] [Citation(s) in RCA: 159] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

227

Adhikari K, AlChawa T, Ludwig K, Mangold E, Laird N, Lange C. Is it rare or common? Genet Epidemiol 2012;36:419-29. [PMID: 22549767 DOI: 10.1002/gepi.21637] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2011] [Revised: 03/09/2012] [Accepted: 03/09/2012] [Indexed: 11/09/2022]

228

Simulated data for genomic selection and genome-wide association studies using a combination of coalescent and gene drop methods. G3 (Bethesda) 2012;2:425-7. [PMID: 22540033 PMCID: PMC3337470 DOI: 10.1534/g3.111.001297] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2011] [Accepted: 11/09/2011] [Indexed: 11/18/2022]

229

Brown MD, Glazner CG, Zheng C, Thompson EA. Inferring coancestry in population samples in the presence of linkage disequilibrium. Genetics 2012;190:1447-60. [PMID: 22298700 PMCID: PMC3316655 DOI: 10.1534/genetics.111.137570] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2011] [Accepted: 01/16/2012] [Indexed: 01/03/2023] Open

230

Clark SA, Hickey JM, Daetwyler HD, van der Werf JHJ. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet Sel Evol 2012. [PMID: 22321529 DOI: 10.1186/1297-9686-44-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

Abstract

BACKGROUND

The theory of genomic selection is based on the prediction of the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to accurately predict genetic value. This study aimed to examine the importance of information on relatives versus that of unrelated or more distantly related individuals on the estimation of genomic breeding values.

METHODS

Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared to two pedigree based BLUP methods, one with a shallow one generation pedigree and the other with a deep ten generation pedigree. The accuracy of estimated breeding values for different groups of selection candidates that had varying degrees of relationships to a reference data set of 1750 animals was investigated.

RESULTS

The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. Similarly, the pedigree based BLUP methods were also accurate for closely related animals; however, when the pedigree based BLUP methods were used to predict unrelated animals, the accuracy was close to zero. In contrast, gBLUP breeding values for animals that had no pedigree relationship with animals in the reference data set still achieved substantial accuracy.

CONCLUSIONS

An animal's relationship to the reference data set is an important factor for the accuracy of genomic predictions. Animals that share a close relationship to the reference data set had the highest accuracy from genomic predictions. However, a baseline accuracy that is driven by the reference data set size and the effective population size of the overall population enables gBLUP to estimate a breeding value for unrelated animals within a population (breed), using information previously ignored by pedigree based BLUP methods.

231

Clark SA, Hickey JM, Daetwyler HD, van der Werf JHJ. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet Sel Evol 2012;44:4. [PMID: 22321529 PMCID: PMC3299588 DOI: 10.1186/1297-9686-44-4] [Citation(s) in RCA: 184] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2011] [Accepted: 02/09/2012] [Indexed: 01/17/2023] Open

232

Computer simulations: tools for population and evolutionary genetics. Nat Rev Genet 2012;13:110-22. [DOI: 10.1038/nrg3130] [Citation(s) in RCA: 169] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]

233

Axelsson E, Webster MT, Ratnakumar A, Ponting CP, Lindblad-Toh K. Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res 2012;22:51-63. [PMID: 22006216 PMCID: PMC3246206 DOI: 10.1101/gr.124123.111] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2011] [Accepted: 10/05/2011] [Indexed: 11/25/2022]

234

Parida L. Nonredundant representation of ancestral recombinations graphs. Methods Mol Biol 2012;856:315-32. [PMID: 22399465 DOI: 10.1007/978-1-61779-585-5_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

235

Metspalu M, Romero I, Yunusbayev B, Chaubey G, Mallick C, Hudjashov G, Nelis M, Mägi R, Metspalu E, Remm M, Pitchappan R, Singh L, Thangaraj K, Villems R, Kivisild T. Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia. Am J Hum Genet 2011;89:731-44. [PMID: 22152676 DOI: 10.1016/j.ajhg.2011.11.010] [Citation(s) in RCA: 121] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2011] [Revised: 09/06/2011] [Accepted: 11/12/2011] [Indexed: 02/06/2023] Open

236

Yuan X, Miller DJ, Zhang J, Herrington D, Wang Y. An overview of population genetic data simulation. J Comput Biol 2011;19:42-54. [PMID: 22149682 DOI: 10.1089/cmb.2010.0188] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

237

Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet 2011;7:e1002316. [PMID: 22022279 PMCID: PMC3192833 DOI: 10.1371/journal.pgen.1002316] [Citation(s) in RCA: 273] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2011] [Accepted: 07/30/2011] [Indexed: 12/30/2022] Open

Abstract

The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.

There are hundreds of dog breeds that exhibit massive differences in appearance and behavior sculpted by tightly controlled selective breeding. This large-scale natural experiment has provided an ideal resource that geneticists can use to search for genetic variants that control these differences. With this goal, we developed a high-density array that surveys variable sites at more than 170,000 positions in the dog genome and used it to analyze genetic variation in 46 breeds. We identify 44 chromosomal regions that are extremely variable between breeds and are likely to control many of the traits that vary between them, including curly tails and sociality. Many other regions also bear the signature of strong artificial selection. We characterize one such region, known to associate with body size and ear type, in detail using “next-generation” sequencing technology to identify candidate mutations that may control these traits. Our results suggest that artificial selection has targeted genes involved in development and metabolism and that it may have increased the incidence of disease in dog breeds. Knowledge of these regions will be of great importance for uncovering the genetic basis of variation between dog breeds and for finding mutations that cause disease.

238

Esteve-Codina A, Kofler R, Himmelbauer H, Ferretti L, Vivancos AP, Groenen MAM, Folch JM, Rodríguez MC, Pérez-Enciso M. Partial short-read sequencing of a highly inbred Iberian pig and genomics inference thereof. Heredity (Edinb) 2011;107:256-64. [PMID: 21407255 PMCID: PMC3183945 DOI: 10.1038/hdy.2011.13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2010] [Revised: 01/20/2011] [Accepted: 01/27/2011] [Indexed: 11/08/2022] Open

239

Wegmann D, Kessner DE, Veeramah KR, Mathias RA, Nicolae DL, Yanek LR, Sun YV, Torgerson DG, Rafaels N, Mosley T, Becker LC, Ruczinski I, Beaty TH, Kardia SLR, Meyers DA, Barnes KC, Becker DM, Freimer NB, Novembre J. Recombination rates in admixed individuals identified by ancestry-based inference. Nat Genet 2011;43:847-53. [DOI: 10.1038/ng.894] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2011] [Accepted: 07/01/2011] [Indexed: 12/17/2022]

240

He Y, Li C, Amos CI, Xiong M, Ling H, Jin L. Accelerating haplotype-based genome-wide association study using perfect phylogeny and phase-known reference data. PLoS One 2011;6:e22097. [PMID: 21789217 PMCID: PMC3137625 DOI: 10.1371/journal.pone.0022097] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2010] [Accepted: 06/17/2011] [Indexed: 11/18/2022] Open

241

Clark SA, Hickey JM, van der Werf JHJ. Different models of genetic variation and their effect on genomic evaluation. Genet Sel Evol 2011;43:18. [PMID: 21575265 PMCID: PMC3114710 DOI: 10.1186/1297-9686-43-18] [Citation(s) in RCA: 125] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2010] [Accepted: 05/17/2011] [Indexed: 12/30/2022] Open

Abstract

Background

The theory of genomic selection is based on the prediction of the effects of quantitative trait loci (QTL) in linkage disequilibrium (LD) with markers. However, there is increasing evidence that genomic selection also relies on "relationships" between individuals to accurately predict genetic values. Therefore, a better understanding of what genomic selection actually predicts is relevant so that appropriate methods of analysis are used in genomic evaluations.

Methods

Simulation was used to compare the performance of estimates of breeding values based on pedigree relationships (Best Linear Unbiased Prediction, BLUP), genomic relationships (gBLUP), and based on a Bayesian variable selection model (Bayes B) to estimate breeding values under a range of different underlying models of genetic variation. The effects of different marker densities and varying animal relationships were also examined.

Results

This study shows that genomic selection methods can predict a proportion of the additive genetic value when genetic variation is controlled by common quantitative trait loci (QTL model), rare loci (rare variant model), all loci (infinitesimal model) and a random association (a polygenic model). The Bayes B method was able to estimate breeding values more accurately than gBLUP under the QTL and rare variant models, for the alternative marker densities and reference populations. The Bayes B and gBLUP methods had similar accuracies under the infinitesimal model.

Conclusions

Our results suggest that Bayes B is superior to gBLUP for estimating breeding values from genomic data. The underlying model of genetic variation greatly affects the predictive ability of genomic selection methods, and the superiority of Bayes B over gBLUP is highly dependent on the presence of large QTL effects. The use of SNP sequence data will outperform the less dense marker panels. However, the size and distribution of QTL effects and the size of reference populations still greatly influence the effectiveness of using sequence data for genomic prediction.

242

Amaral AJ, Ferretti L, Megens HJ, Crooijmans RPMA, Nie H, Ramos-Onsins SE, Perez-Enciso M, Schook LB, Groenen MAM. Genome-wide footprints of pig domestication and selection revealed through massive parallel sequencing of pooled DNA. PLoS One 2011;6:e14782. [PMID: 21483733 PMCID: PMC3070695 DOI: 10.1371/journal.pone.0014782] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2010] [Accepted: 01/29/2011] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can help uncover the genetic basis of phenotypic diversity.

METHODOLOGY/MAIN FINDINGS

Genome wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ∼2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar.

CONCLUSIONS/SIGNIFICANCE

These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify footprints of selection.

243

Paul JS, Steinrücken M, Song YS. An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics 2011;187:1115-28. [PMID: 21270390 PMCID: PMC3070520 DOI: 10.1534/genetics.110.125534] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 01/21/2011] [Indexed: 02/07/2023] Open

244

Excoffier L, Foll M. fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 2011;27:1332-4. [DOI: 10.1093/bioinformatics/btr124] [Citation(s) in RCA: 343] [Impact Index Per Article: 26.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

245

Hickey JM, Kinghorn BP, Tier B, Wilson JF, Dunstan N, van der Werf JHJ. A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes. Genet Sel Evol 2011;43:12. [PMID: 21388557 PMCID: PMC3068938 DOI: 10.1186/1297-9686-43-12] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2010] [Accepted: 03/10/2011] [Indexed: 06/24/2024] Open

Abstract

BACKGROUND

Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.

METHODS

A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.

RESULTS

The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.

CONCLUSIONS

The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.

246

Mailund T, Dutheil JY, Hobolth A, Lunter G, Schierup MH. Estimating divergence time and ancestral effective population size of Bornean and Sumatran orangutan subspecies using a coalescent hidden Markov model. PLoS Genet 2011;7:e1001319. [PMID: 21408205 PMCID: PMC3048369 DOI: 10.1371/journal.pgen.1001319] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2009] [Accepted: 01/25/2011] [Indexed: 12/01/2022] Open

Abstract

Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations is variable along the genome. Within genomic segments all bases will share the same divergence—because they share a most recent common ancestor—when no recombination event has occurred to split them apart. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, as well as demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan sub-species, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans from the Orangutan Genome Project. We estimate the speciation time between the two sub-species to be thousand years ago and the effective population size of the ancestral orangutan species to be , consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection.

We present a hidden Markov model that uses variation in coalescence times between two distantly related populations, or closely related species, to infer population genetics parameters in ancestral population or species. The model infers the divergence times in segments along the alignment. Using coalescent simulations, we show that the model accurately estimates the divergence time between the two populations and the effective population size of the ancestral population. We apply the model to the recently sequenced orangutan sub-species and estimate their divergence time and the effective population size of their ancestor population.

247

Parida L, Palamara PF, Javed A. A minimal descriptor of an ancestral recombinations graph. BMC Bioinformatics 2011;12 Suppl 1:S6. [PMID: 21342589 PMCID: PMC3044314 DOI: 10.1186/1471-2105-12-s1-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

248

Albert FW, Hodges E, Jensen JD, Besnier F, Xuan Z, Rooks M, Bhattacharjee A, Brizuela L, Good JM, Green RE, Burbano HA, Plyusnina IZ, Trut L, Andersson L, Schöneberg T, Carlborg O, Hannon GJ, Pääbo S. Targeted resequencing of a genomic region influencing tameness and aggression reveals multiple signals of positive selection. Heredity (Edinb) 2011;107:205-14. [PMID: 21304545 DOI: 10.1038/hdy.2011.4] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open

249

Yuan X, Zhang J, Wang Y. Simulating linkage disequilibrium structures in a human population for SNP association studies. Biochem Genet 2011;49:395-409. [PMID: 21234669 DOI: 10.1007/s10528-011-9416-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2010] [Accepted: 12/02/2010] [Indexed: 12/22/2022]

250

Le SQ, Durbin R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res 2010;21:952-60. [PMID: 20980557 DOI: 10.1101/gr.113084.110] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]