101
Parida L, Haiminen N. SimBA: simulation algorithm to fit extant-population distributions. BMC Bioinformatics 2015; 16:82. [PMID: 25886895 PMCID: PMC4372275 DOI: 10.1186/s12859-015-0525-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/07/2014] [Accepted: 03/04/2015] [Indexed: 12/28/2022]
Abstract
Background: Simulation of populations with specified characteristics, such as allele frequencies and linkage disequilibrium, is an integral component of many studies, including in-silico breeding optimization. Since the accuracy and sensitivity of population simulation are critical to the quality of the output of the applications that use them, accurate algorithms are required to provide a strong foundation for the methods in these studies. Results: In this paper we present SimBA (Simulation using Best-fit Algorithm), a non-generative approach based on a combination of stochastic techniques and discrete methods. We optimize a hill climbing algorithm and extend the framework to include multiple subpopulation structures. Additionally, we show that SimBA is very sensitive to the input specifications, i.e., very similar but distinct input characteristics result in distinct outputs with high fidelity to the specified distributions. This property of the simulation is not explicitly modeled or studied by previous methods. Conclusions: We show that SimBA outperforms the existing population simulation methods in terms of both accuracy and time-efficiency. Not only does it construct populations that meet the input specifications more stringently than other published methods, but SimBA is also easy to use. It does not require explicit parameter adaptations or calibrations. It can also work with input specified as distributions, without an exemplar matrix or population as required by some methods. SimBA is available at http://researcher.ibm.com/project/5669.
Affiliation(s)
- Laxmi Parida
- Computational Biology Center, IBM T. J. Watson Research, Yorktown Heights, NY, USA.
- Niina Haiminen
- Computational Biology Center, IBM T. J. Watson Research, Yorktown Heights, NY, USA.
102
Pérez-Losada M, Arenas M, Galán JC, Palero F, González-Candelas F. Recombination in viruses: mechanisms, methods of study, and evolutionary consequences. Infect Genet Evol 2015; 30:296-307. [PMID: 25541518 PMCID: PMC7106159 DOI: 10.1016/j.meegid.2014.12.022] [Citation(s) in RCA: 198] [Impact Index Per Article: 22.0] [Received: 11/23/2014] [Revised: 12/15/2014] [Accepted: 12/17/2014] [Indexed: 02/08/2023]
Abstract
Recombination is a pervasive process generating diversity in most viruses. It joins variants that arise independently within the same molecule, creating new opportunities for viruses to overcome selective pressures and to adapt to new environments and hosts. Consequently, the analysis of viral recombination attracts the interest of clinicians, epidemiologists, molecular biologists and evolutionary biologists. In this review we present an overview of three major areas related to viral recombination: (i) the molecular mechanisms that underlie recombination in model viruses, including DNA-viruses (Herpesvirus) and RNA-viruses (Human Influenza Virus and Human Immunodeficiency Virus), (ii) the analytical procedures to detect recombination in viral sequences and to determine the recombination breakpoints, along with the conceptual and methodological tools currently used and a brief overview of the impact of new sequencing technologies on the detection of recombination, and (iii) the major areas in the evolutionary analysis of viral populations on which recombination has an impact. These include the evaluation of selective pressures acting on viral populations, the application of evolutionary reconstructions in the characterization of centralized genes for vaccine design, and the evaluation of linkage disequilibrium and population structure.
Affiliation(s)
- Marcos Pérez-Losada
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Portugal; Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Juan Carlos Galán
- Servicio de Microbiología, Hospital Ramón y Cajal and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain; CIBER en Epidemiología y Salud Pública, Spain
- Ferran Palero
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain
- Fernando González-Candelas
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain.
103
Emerson BC, Hickerson MJ. Lack of support for the time-dependent molecular evolution hypothesis. Mol Ecol 2015; 24:702-9. [DOI: 10.1111/mec.13070] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/28/2014] [Revised: 12/17/2014] [Accepted: 12/30/2014] [Indexed: 01/24/2023]
Affiliation(s)
- Brent C. Emerson
- Island Ecology and Evolution Research Group; Instituto de Productos Naturales y Agrobiología (IPNA-CSIC); C/Astrofísico Francisco Sánchez 3 La Laguna Tenerife, Canary Islands 38206 Spain
- Michael J. Hickerson
- Biology Department; City College of New York; New York NY 10031 USA
- The Graduate Center; City University of New York; New York NY 10016 USA
104
Andrello M, Manel S. MetaPopGen: an R package to simulate population genetics in large size metapopulations. Mol Ecol Resour 2015; 15:1153-62. [DOI: 10.1111/1755-0998.12371] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/10/2014] [Revised: 12/27/2014] [Accepted: 01/01/2015] [Indexed: 11/28/2022]
Affiliation(s)
- Marco Andrello
- CEFE UMR 5175; CNRS - Université de Montpellier - Université Paul-Valéry Montpellier - EPHE; laboratoire Biogéographie et écologie des vertébrés; 1919 route de Mende 34293 Montpellier Cedex 5 France
- Stéphanie Manel
- CEFE UMR 5175; CNRS - Université de Montpellier - Université Paul-Valéry Montpellier - EPHE; laboratoire Biogéographie et écologie des vertébrés; 1919 route de Mende 34293 Montpellier Cedex 5 France
105
Beheregaray LB, Cooke GM, Chao NL, Landguth EL. Ecological speciation in the tropics: insights from comparative genetic studies in Amazonia. Front Genet 2015; 5:477. [PMID: 25653668 PMCID: PMC4301025 DOI: 10.3389/fgene.2014.00477] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 07/10/2014] [Accepted: 12/29/2014] [Indexed: 11/26/2022]
Abstract
Evolution creates and sustains biodiversity via adaptive changes in ecologically relevant traits. Ecologically mediated selection contributes to genetic divergence both in the presence or absence of geographic isolation between populations, and is considered an important driver of speciation. Indeed, the genetics of ecological speciation is becoming increasingly studied across a variety of taxa and environments. In this paper we review the literature of ecological speciation in the tropics. We report on low research productivity in tropical ecosystems and discuss reasons accounting for the rarity of studies. We argue for research programs that simultaneously address biogeographical and taxonomic questions in the tropics, while effectively assessing relationships between reproductive isolation and ecological divergence. To contribute toward this goal, we propose a new framework for ecological speciation that integrates information from phylogenetics, phylogeography, population genomics, and simulations in evolutionary landscape genetics (ELG). We introduce components of the framework, describe ELG simulations (a largely unexplored approach in ecological speciation), and discuss design and experimental feasibility within the context of tropical research. We then use published genetic datasets from populations of five codistributed Amazonian fish species to assess the performance of the framework in studies of tropical speciation. We suggest that these approaches can assist in distinguishing the relative contribution of natural selection from biogeographic history in the origin of biodiversity, even in complex ecosystems such as Amazonia. We also discuss on how to assess ecological speciation using ELG simulations that include selection. These integrative frameworks have considerable potential to enhance conservation management in biodiversity rich ecosystems and to complement historical biogeographic and evolutionary studies of tropical biotas.
Affiliation(s)
- Luciano B Beheregaray
- Molecular Ecology Lab, School of Biological Sciences, Flinders University, Adelaide, SA, Australia
- Georgina M Cooke
- The Australian Museum, The Australian Museum Research Institute, Sydney, NSW, Australia
- Ning L Chao
- Departamento de Ciências Pesqueiras, Universidade Federal do Amazonas, Manaus, Brazil; National Museum of Marine Biology and Aquarium, Pintung, Taiwan
- Erin L Landguth
- Division of Biological Sciences, University of Montana, Missoula, MT, USA
106
Peng B. Reproducible simulations of realistic samples for next-generation sequencing studies using Variant Simulation Tools. Genet Epidemiol 2015; 39:45-52. [PMID: 25395236 PMCID: PMC6432799 DOI: 10.1002/gepi.21867] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/16/2014] [Revised: 09/14/2014] [Accepted: 09/26/2014] [Indexed: 12/31/2022]
Abstract
Computer simulations have been widely used to validate and evaluate the power of statistical methods for genetic epidemiological studies. Although a large number of simulation methods and software packages have been developed for genome-wide association studies, methodological and bioinformatics challenges have limited their applications in simulating datasets for whole-genome and whole-exome sequencing studies. With the development of more sophisticated statistical methods that make fuller use of available data and our knowledge of the human genome, there is a pressing need for genetic simulators that capture more features of empirical data (e.g., multiallele variants, indels, use of the Variant Call Format) and the human genome (e.g., functional annotations of genetic variants). This article introduces Variant Simulation Tools (VST), a module of Variant Tools for the simulation of genetic variants for sequencing-based genetic epidemiological studies. Although multiple simulation engines are provided, the core of VST is a novel forward-time simulation engine that simulates real nucleotide sequences of the human genome using DNA mutation models, fine-scale recombination maps, and a selection model based on amino acid changes of translated protein sequences. The design of VST allows users to easily create and distribute simulation methods and simulated datasets for a variety of applications and encourages fair comparison between statistical methods through the use of existing or reproduced simulated datasets.
Affiliation(s)
- Bo Peng
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler Street, Unit 1401, Houston, TX, 77030
107
Peng B, Chen HS, Mechanic LE, Racine B, Clarke J, Gillanders E, Feuer EJ. Genetic data simulators and their applications: an overview. Genet Epidemiol 2014; 39:2-10. [PMID: 25504286 DOI: 10.1002/gepi.21876] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 06/15/2014] [Revised: 09/14/2014] [Accepted: 10/31/2014] [Indexed: 11/10/2022]
Abstract
Computer simulations have played an indispensable role in the development and applications of statistical models and methods for genetic studies across multiple disciplines. The need to simulate complex evolutionary scenarios and pseudo-datasets for various studies has fueled the development of dozens of computer programs with varying reliability, performance, and application areas. To help researchers compare and choose the most appropriate simulators for their studies, we have created the genetic simulation resources (GSR) website, which allows authors of simulation software to register their applications and describe them with more than 160 defined attributes. This article summarizes the properties of 93 simulators currently registered at GSR and provides an overview of the development and applications of genetic simulators. Unlike other review articles that address technical issues or compare simulators for particular application areas, we focus on software development, maintenance, and features of simulators, often from a historical perspective. Publications that cite these simulators are used to summarize both the applications of genetic simulations and the utilization of simulators.
Affiliation(s)
- Bo Peng
- Department of Bioinformatics and Computational Biology, The University of Texas, MD Anderson Cancer Center, Houston, Texas, United States of America
108
Schiffers KH, Travis JM. ALADYN - a spatially explicit, allelic model for simulating adaptive dynamics. Ecography 2014; 37:1288-1291. [PMID: 25698848 PMCID: PMC4330972 DOI: 10.1111/ecog.00680] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 05/22/2023]
Abstract
ALADYN is a freely available cross-platform C++ modeling framework for stochastic simulation of joint allelic and demographic dynamics of spatially-structured populations. Juvenile survival is linked to the degree of match between an individual's phenotype and the local phenotypic optimum. There is considerable flexibility provided for the demography of the considered species and the genetic architecture of the traits under selection. ALADYN facilitates the investigation of adaptive processes to spatially and/or temporally changing conditions and the resulting niche and range dynamics. To our knowledge ALADYN is so far the only model that allows a continuous resolution of individuals' locations in a spatially explicit landscape together with the associated patterns of selection.
Affiliation(s)
- Katja H Schiffers
- Evolution, Modeling and Analysis of BIOdiversity group, Laboratoire d'Ecologie Alpine, UMR CNRS 5553, Université Joseph Fourier, Grenoble Cedex 9, France
- Justin MJ Travis
- Institute of Biological and Environmental Sciences, Zoology building, Tillydrone Avenue, University of Aberdeen, Aberdeen AB24 2TZ, United Kingdom
109
Mijangos JL, Pacioni C, Spencer PBS, Craig MD. Contribution of genetics to ecological restoration. Mol Ecol 2014; 24:22-37. [DOI: 10.1111/mec.12995] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Received: 08/27/2013] [Revised: 10/17/2014] [Accepted: 11/01/2014] [Indexed: 12/17/2022]
Affiliation(s)
- Jose Luis Mijangos
- School of Veterinary and Life Sciences; Murdoch University; Murdoch WA 6150 Australia
- Carlo Pacioni
- School of Veterinary and Life Sciences; Murdoch University; Murdoch WA 6150 Australia
- Peter B. S. Spencer
- School of Veterinary and Life Sciences; Murdoch University; Murdoch WA 6150 Australia
- Michael D. Craig
- School of Veterinary and Life Sciences; Murdoch University; Murdoch WA 6150 Australia
- School of Plant Biology; University of Western Australia; Crawley WA 6009 Australia
110
Mäkinen H, Vasemägi A, McGinnity P, Cross TF, Primmer CR. Population genomic analyses of early-phase Atlantic Salmon (Salmo salar) domestication/captive breeding. Evol Appl 2014; 8:93-107. [PMID: 25667605 PMCID: PMC4310584 DOI: 10.1111/eva.12230] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 06/03/2014] [Accepted: 10/10/2014] [Indexed: 12/28/2022]
Abstract
Domestication can have adverse genetic consequences, which may reduce the fitness of individuals once released back into the wild. Many wild Atlantic salmon (Salmo salar L.) populations are threatened by anthropogenic influences, and they are supplemented with captively bred fish. The Atlantic salmon is also widely used in selective breeding programs to increase the mean trait values for desired phenotypic traits. We analyzed a genomewide set of SNPs in three domesticated Atlantic salmon strains and their wild conspecifics to identify loci underlying domestication. The genetic differentiation between domesticated strains and wild populations was low (FST < 0.03), and domesticated strains harbored similar levels of genetic diversity compared to their wild conspecifics. Only a few loci showed footprints of selection, and these loci were located in different linkage groups among the different wild population/hatchery strain comparisons. Simulated scenarios indicated that differentiation in quantitative trait loci exceeded that in neutral markers during the early phases of divergence only when the difference in the phenotypic optimum between populations was large. This study indicates that detecting selection using standard approaches in the early phases of domestication might be challenging unless selection is strong and the traits under selection show simple inheritance patterns.
Affiliation(s)
- Hannu Mäkinen
- Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
- Anti Vasemägi
- Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland; Department of Aquaculture, Estonian University of Life Sciences, Tartu, Estonia
- Philip McGinnity
- Aquaculture and Fisheries Development Centre, School of Biological, Earth, and Environmental Sciences, University College Cork, Cork, Ireland; Marine Institute, Furnace, Newport, Co. Mayo, Ireland
- Tom F Cross
- Aquaculture and Fisheries Development Centre, School of Biological, Earth, and Environmental Sciences, University College Cork, Cork, Ireland
- Craig R Primmer
- Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
111
Chen HS, Hutter CM, Mechanic LE, Amos CI, Bafna V, Hauser ER, Hernandez RD, Li C, Liberles DA, McAllister K, Moore JH, Paltoo DN, Papanicolaou GJ, Peng B, Ritchie MD, Rosenfeld G, Witte JS, Gillanders EM, Feuer EJ. Genetic simulation tools for post-genome wide association studies of complex diseases. Genet Epidemiol 2014; 39:11-19. [PMID: 25371374 DOI: 10.1002/gepi.21870] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/15/2014] [Revised: 09/02/2014] [Accepted: 09/26/2014] [Indexed: 01/12/2023]
Abstract
Genetic simulation programs are used to model data under specified assumptions to facilitate the understanding and study of complex genetic systems. Standardized data sets generated using genetic simulation are essential for the development and application of novel analytical tools in genetic epidemiology studies. With continuing advances in high-throughput genomic technologies and generation and analysis of larger, more complex data sets, there is a need for updating current approaches in genetic simulation modeling. To provide a forum to address current and emerging challenges in this area, the National Cancer Institute (NCI) sponsored a workshop, entitled "Genetic Simulation Tools for Post-Genome Wide Association Studies of Complex Diseases" at the National Institutes of Health (NIH) in Bethesda, Maryland on March 11-12, 2014. The goals of the workshop were to (1) identify opportunities, challenges, and resource needs for the development and application of genetic simulation models; (2) improve the integration of tools for modeling and analysis of simulated data; and (3) foster collaborations to facilitate development and applications of genetic simulation. During the course of the meeting, the group identified challenges and opportunities for the science of simulation, software and methods development, and collaboration. This paper summarizes key discussions at the meeting, and highlights important challenges and opportunities to advance the field of genetic simulation.
Affiliation(s)
- Huann-Sheng Chen
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- Carolyn M Hutter
- Division of Genomic Medicine, National Human Genome Research Institute, NIH, Bethesda, MD 20892
- Leah E Mechanic
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- Christopher I Amos
- Division of Community, Family Medicine, Dartmouth College, Lebanon, NH 03755
- Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093
- Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143
- Chun Li
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37235
- David A Liberles
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071
- Kimberly McAllister
- Susceptibility and Population Health Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709
- Jason H Moore
- Department of Genetics, Dartmouth College, Lebanon, NH 03755
- Dina N Paltoo
- Office of Director, National Institutes of Health, Bethesda, MD 20892
- George J Papanicolaou
- Division of Cardiovascular Sciences, Prevention and Population Sciences Program, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892
- Bo Peng
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802
- Gabriel Rosenfeld
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- John S Witte
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94107
- Elizabeth M Gillanders
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
- Eric J Feuer
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, MD 20892
112
Putman AI, Carbone I. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol Evol 2014; 4:4399-428. [PMID: 25540699 PMCID: PMC4267876 DOI: 10.1002/ece3.1305] [Citation(s) in RCA: 237] [Impact Index Per Article: 23.7] [Received: 05/02/2014] [Revised: 10/02/2014] [Accepted: 10/03/2014] [Indexed: 12/14/2022]
Abstract
Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (FST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas. This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical limitations, and identifies lessons that could be applied toward emerging markers and high-throughput technologies in population genetics.
Affiliation(s)
- Alexander I Putman
- Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina, 27695-7616
- Ignazio Carbone
- Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina, 27695-7616
113
Hoban S, Arntzen JA, Bruford MW, Godoy JA, Rus Hoelzel A, Segelbacher G, Vilà C, Bertorelle G. Comparative evaluation of potential indicators and temporal sampling protocols for monitoring genetic erosion. Evol Appl 2014; 7:984-98. [PMID: 25553062 PMCID: PMC4231590 DOI: 10.1111/eva.12197] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 03/24/2014] [Accepted: 06/27/2014] [Indexed: 01/13/2023]
Abstract
Genetic biodiversity contributes to individual fitness, species' evolutionary potential, and ecosystem stability. Temporal monitoring of the genetic status and trends of wild populations' genetic diversity can provide vital data to inform policy decisions and management actions. However, there is a lack of knowledge regarding which genetic metrics, temporal sampling protocols, and genetic markers are sufficiently sensitive and robust, on conservation-relevant timescales. Here, we tested six genetic metrics and various sampling protocols (number and arrangement of temporal samples) for monitoring genetic erosion following demographic decline. To do so, we utilized individual-based simulations featuring an array of different initial population sizes, types and severity of demographic decline, and DNA markers [single nucleotide polymorphisms (SNPs) and microsatellites] as well as decline followed by recovery. Number of alleles markedly outperformed other indicators across all situations. The type and severity of demographic decline strongly affected power, while the number and arrangement of temporal samples had small effect. Sampling 50 individuals at as few as two time points with 20 microsatellites performed well (good power), and could detect genetic erosion while 80-90% of diversity remained. This sampling and genotyping effort should often be affordable. Power increased substantially with more samples or markers, and we observe that power of 2500 SNPs was nearly equivalent to 250 microsatellites, a result of theoretical and practical interest. Our results suggest high potential for using historic collections in monitoring programs, and demonstrate the need to monitor genetic as well as other levels of biodiversity.
Affiliation(s)
- Sean Hoban
- National Institute for Mathematical and Biological Synthesis (NIMBioS), University of Tennessee, Knoxville, TN, USA
- Department of Life Science, Università di Ferrara, Ferrara, Italy
- Jan A Arntzen
- Naturalis Biodiversity Center, Leiden, the Netherlands
- José A Godoy
- Estación Biológica de Doñana (EBD-CSIC), Seville, Spain
- Carles Vilà
- Estación Biológica de Doñana (EBD-CSIC), Seville, Spain
114
Dellicour S, Kastally C, Hardy OJ, Mardulyn P. Comparing phylogeographic hypotheses by simulating DNA sequences under a spatially explicit model of coalescence. Mol Biol Evol 2014; 31:3359-72. [PMID: 25261404 DOI: 10.1093/molbev/msu277] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Computer simulations of genetic data are increasingly used to investigate the impact of complex historical scenarios on patterns of genetic variation. Yet, in most empirical studies, relatively large portions of species ranges are often treated as panmictic populations, ignoring the underlying spatial context. In some cases, however, a more accurate spatial model is required. We use a spatially explicit model of coalescence (easily constructed by overlaying a two-dimensional grid on maps displaying an estimate of past and current species ranges) to evaluate the potential of several summary statistics to differentiate three typical phylogeographic scenarios. We first explore the variation of each summary statistic within the boundaries of each phylogeographic scenario, and identify those that appear most promising for a comparison of historical scenarios and/or to infer historical parameters. We then combine a selected set of summary statistics in a single chi-square statistic and evaluate whether it can be used to differentiate past geographic fragmentation or range expansion from a simple scenario of isolation by distance. We also investigate the benefits of using a spatially explicit model by comparing its performance to alternative models that are less spatially explicit (lower geographic resolution). The results identify conditions in which each summary statistic is useful to infer the evolution of a species range, and allow us to validate our spatially explicit model of coalescence and our procedure to compare simulated and observed sequence data. We also provide a detailed description of the spatially explicit model of coalescence used, which is currently lacking.
Affiliation(s)
- Simon Dellicour
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
| | - Chedly Kastally
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
| | - Olivier J Hardy
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
| | - Patrick Mardulyn
- Evolutionary Biology and Ecology, Université Libre de Bruxelles, Brussels, Belgium
|
115
|
Li P, Guo M, Wang C, Liu X, Zou Q. An overview of SNP interactions in genome-wide association studies. Brief Funct Genomics 2014; 14:143-55. [PMID: 25241224 DOI: 10.1093/bfgp/elu036] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Indexed: 12/28/2022]
Abstract
With the recent explosion in high-throughput genotyping technology, the amount and quality of single-nucleotide polymorphism (SNP) data has increased exponentially. Therefore, the identification of SNP interactions that are associated with common diseases is playing an increasing and important role in interpreting the genetic basis of disease susceptibility and in devising new diagnostic tests and treatments. However, because these data sets are large, although they typically have small sample sizes and low signal-to-noise ratios, there has been no major breakthrough despite many efforts, making this a major focus in the field of bioinformatics. In this article, we review the two main aspects of SNP interaction studies in recent years-the simulation and identification of SNP interactions-and then discuss the principles, efficiency and differences between these methods.
|
116
|
Nyman T, Valtonen M, Aspi J, Ruokonen M, Kunnasranta M, Palo JU. Demographic histories and genetic diversities of Fennoscandian marine and landlocked ringed seal subspecies. Ecol Evol 2014; 4:3420-34. [PMID: 25535558 PMCID: PMC4228616 DOI: 10.1002/ece3.1193] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/08/2014] [Revised: 07/21/2014] [Accepted: 07/22/2014] [Indexed: 11/18/2022]
Abstract
Island populations are on average smaller, genetically less diverse, and at a higher risk to go extinct than mainland populations. Low genetic diversity may elevate extinction probability, but the genetic component of the risk can be affected by the mode of diversity loss, which, in turn, is connected to the demographic history of the population. Here, we examined the history of genetic erosion in three Fennoscandian ringed seal subspecies, of which one inhabits the Baltic Sea 'mainland' and two the 'aquatic islands' composed of Lake Saimaa in Finland and Lake Ladoga in Russia. Both lakes were colonized by marine seals after their formation c. 9500 years ago, but Lake Ladoga is larger and more contiguous than Lake Saimaa. All three populations suffered dramatic declines during the 20th century, but the bottleneck was particularly severe in Lake Saimaa. Data from 17 microsatellite loci and mitochondrial control-region sequences show that Saimaa ringed seals have lost most of the genetic diversity present in their Baltic ancestors, while the Ladoga population has experienced only minor reductions. Using Approximate Bayesian computing analyses, we show that the genetic uniformity of the Saimaa subspecies derives from an extended founder event and subsequent slow erosion, rather than from the recent bottleneck. This suggests that the population has persisted for nearly 10,000 years despite having low genetic variation. The relatively high diversity of the Ladoga population appears to result from a high number of initial colonizers and a high post-colonization population size, but possibly also by a shorter isolation period and/or occasional gene flow from the Baltic Sea.
Affiliation(s)
- Tommi Nyman
- Department of Biology, University of Eastern Finland, PO Box 111, Joensuu, FI-80101, Finland
- Institute for Systematic Botany, University of Zurich, Zollikerstrasse 107, Zurich, CH-8008, Switzerland
| | - Mia Valtonen
- Department of Biology, University of Eastern Finland, PO Box 111, Joensuu, FI-80101, Finland
| | - Jouni Aspi
- Department of Biology, University of Oulu, PO Box 3000, Oulu, FI-90014, Finland
| | - Minna Ruokonen
- Department of Biology, University of Oulu, PO Box 3000, Oulu, FI-90014, Finland
| | - Mervi Kunnasranta
- Department of Biology, University of Eastern Finland, PO Box 111, Joensuu, FI-80101, Finland
| | - Jukka U Palo
- Laboratory of Forensic Biology, Hjelt Institute, University of Helsinki, PO Box 40, Helsinki, FI-00014, Finland
|
117
|
Benguigui M, Arenas M. Spatial and temporal simulation of human evolution. Methods, frameworks and applications. Curr Genomics 2014; 15:245-55. [PMID: 25132795 PMCID: PMC4133948 DOI: 10.2174/1389202915666140506223639] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 03/14/2014] [Revised: 04/05/2014] [Accepted: 05/04/2014] [Indexed: 01/29/2023]
Abstract
Analyses of human evolution are fundamental to understand the current gradients of human diversity. In this concern, genetic samples collected from current populations together with archaeological data are the most important resources to study human evolution. However, they are often insufficient to properly evaluate a variety of evolutionary scenarios, leading to continuous debates and discussions. A commonly applied strategy consists of the use of computer simulations based on, as realistic as possible, evolutionary models, to evaluate alternative evolutionary scenarios through statistical correlations with the real data. Computer simulations can also be applied to estimate evolutionary parameters or to study the role of each parameter on the evolutionary process. Here we review the mainly used methods and evolutionary frameworks to perform realistic spatially explicit computer simulations of human evolution. Although we focus on human evolution, most of the methods and software we describe can also be used to study other species. We also describe the importance of considering spatially explicit models to better mimic human evolutionary scenarios based on a variety of phenomena such as range expansions, range shifts, range contractions, sex-biased dispersal, long-distance dispersal or admixtures of populations. We finally discuss future implementations to improve current spatially explicit simulations and their derived applications in human evolution.
Affiliation(s)
- Macarena Benguigui
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
|
118
|
Petitjean M, Vanet A. VIRAPOPS2 supports the influenza virus reassortments. Source Code for Biology and Medicine 2014; 9:18. [PMID: 25183993 PMCID: PMC4144320 DOI: 10.1186/1751-0473-9-18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/24/2014] [Accepted: 08/07/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND For over 400 years, because of the reassortment of their segmented genomes, influenza viruses have evolved extremely quickly and caused devastating epidemics. Reassortment arises because two flu viruses can infect the same cell, so the genomes of the new virions are composed of segments reassorted from the two parental strains. A treatment developed against the parents could then be ineffective if the virions' genomes differ enough from their parents' genomes. It is therefore essential to simulate such reassortment phenomena to assess the risk of emergence of new flu strains. FINDINGS We therefore upgraded the forward simulator VIRAPOPS, which already contained the options needed to handle non-segmented viral populations. The new version can mimic single or successive reassortments in birds, humans and/or swine. Other options, such as the ability to treat populations of positive- or negative-sense viral RNAs, were also added. Finally, we propose output options giving statistics of the results. CONCLUSION In this paper we present a new version of VIRAPOPS that now handles viral segment reassortment and negative-sense single-stranded RNA viruses, both of which cause serious public health problems.
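The reassortment mechanism described in this abstract can be illustrated with a minimal sketch. It is not VIRAPOPS code: the function name, the segment labels and the assumption that each segment is drawn independently and with equal probability from either parent are all hypothetical choices made for illustration.

```python
import random

N_SEGMENTS = 8  # influenza A genomes consist of eight RNA segments

def reassort(parent_a, parent_b, rng):
    """Build one progeny genome by drawing each segment from either parent.

    Assumes independent, equiprobable segment choice (illustrative only).
    """
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

rng = random.Random(1)
parent_a = [f"A{i}" for i in range(1, N_SEGMENTS + 1)]  # hypothetical segment labels
parent_b = [f"B{i}" for i in range(1, N_SEGMENTS + 1)]
progeny = reassort(parent_a, parent_b, rng)

# Number of distinct segment combinations, counting both parental genotypes.
n_possible = 2 ** N_SEGMENTS
```

Under these assumptions a single co-infection can yield up to 256 genotypes, which gives a sense of why successive reassortments are handled by simulation rather than enumeration.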
Affiliation(s)
- Michel Petitjean
- Univ Paris Diderot, Sorbonne Paris Cité, F-75013 Paris, France; MTI, INSERM UMR-S 973, F-75013 Paris, France
| | - Anne Vanet
- Univ Paris Diderot, Sorbonne Paris Cité, F-75013 Paris, France; CNRS, UMR7592, Institut Jacques Monod, F-75013 Paris, France; Atelier de Bio Informatique, F-75005 Paris, France
|
119
|
Spurgin LG, Wright DJ, van der Velde M, Collar NJ, Komdeur J, Burke T, Richardson DS. Museum DNA reveals the demographic history of the endangered Seychelles warbler. Evol Appl 2014; 7:1134-43. [PMID: 25553073 PMCID: PMC4231601 DOI: 10.1111/eva.12191] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 05/12/2014] [Accepted: 06/30/2014] [Indexed: 01/10/2023]
Abstract
The importance of evolutionary conservation – how understanding evolutionary forces can help guide conservation decisions – is widely recognized. However, the historical demography of many endangered species is unknown, despite the fact that this can have important implications for contemporary ecological processes and for extinction risk. Here, we reconstruct the population history of the Seychelles warbler (Acrocephalus sechellensis) – an ecological model species. By the 1960s, this species was on the brink of extinction, but its previous history is unknown. We used DNA samples from contemporary and museum specimens spanning 140 years to reconstruct bottleneck history. We found a 25% reduction in genetic diversity between museum and contemporary populations, and strong genetic structure. Simulations indicate that the Seychelles warbler was bottlenecked from a large population, with an ancestral Ne of several thousands falling to <50 within the last century. Such a rapid decline, due to anthropogenic factors, has important implications for extinction risk in the Seychelles warbler, and our results will inform conservation practices. Reconstructing the population history of this species also allows us to better understand patterns of genetic diversity, inbreeding and promiscuity in the contemporary populations. Our approaches can be applied across species to test ecological hypotheses and inform conservation.
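To see why a small effective population size erodes diversity quickly, the standard drift expectation for an idealized diploid Wright-Fisher population can be used: expected heterozygosity after t generations is H_t = H_0 (1 - 1/(2 Ne))^t. The sketch below only evaluates that formula. It is not the simulation approach used in the study, and the parameter values are illustrative rather than estimates from the paper.

```python
def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after drift in an idealized diploid population.

    H_t = H_0 * (1 - 1 / (2 * Ne)) ** t
    """
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations

# Illustrative values only: a constant Ne of 50 for 10 generations.
h_small = expected_heterozygosity(1.0, 50, 10)
# The same number of generations at Ne = 5000 loses far less diversity.
h_large = expected_heterozygosity(1.0, 5000, 10)
```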
Affiliation(s)
- Lewis G Spurgin
- School of Biological Sciences, University of East Anglia, Norwich, Norfolk, UK; Behavioural Ecology and Self-organization Group, Centre for Ecological and Evolutionary Studies, University of Groningen, Groningen, The Netherlands
| | - David J Wright
- School of Biological Sciences, University of East Anglia, Norwich, Norfolk, UK; Department of Animal and Plant Sciences, NERC Biomolecular Analysis Facility, University of Sheffield, Sheffield, UK
| | - Marco van der Velde
- Behavioural Ecology and Self-organization Group, Centre for Ecological and Evolutionary Studies, University of Groningen, Groningen, The Netherlands
| | - Nigel J Collar
- School of Biological Sciences, University of East Anglia, Norwich, Norfolk, UK; BirdLife International, Cambridge, UK
| | - Jan Komdeur
- Behavioural Ecology and Self-organization Group, Centre for Ecological and Evolutionary Studies, University of Groningen, Groningen, The Netherlands
| | - Terry Burke
- Department of Animal and Plant Sciences, NERC Biomolecular Analysis Facility, University of Sheffield, Sheffield, UK
| | - David S Richardson
- School of Biological Sciences, University of East Anglia, Norwich, Norfolk, UK; Nature Seychelles, Roche Caiman, Mahé, Republic of Seychelles
|
120
|
Rebaudo F, Costa J, Almeida CE, Silvain JF, Harry M, Dangles O. Simulating population genetics of pathogen vectors in changing landscapes: guidelines and application with Triatoma brasiliensis. PLoS Negl Trop Dis 2014; 8:e3068. [PMID: 25102068 PMCID: PMC4125301 DOI: 10.1371/journal.pntd.0003068] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/11/2014] [Accepted: 06/21/2014] [Indexed: 11/28/2022]
Abstract
Background Understanding the mechanisms that influence the population dynamics and spatial genetic structure of the vectors of pathogens infecting humans is a central issue in tropical epidemiology. In view of the rapid changes in the features of landscape pathogen vectors live in, this issue requires new methods that consider both natural and human systems and their interactions. In this context, individual-based model (IBM) simulations represent powerful yet poorly developed approaches to explore the response of pathogen vectors in heterogeneous social-ecological systems, especially when field experiments cannot be performed. Methodology/Principal Findings We first present guidelines for the use of a spatially explicit IBM, to simulate population genetics of pathogen vectors in changing landscapes. We then applied our model with Triatoma brasiliensis, originally restricted to sylvatic habitats and now found in peridomestic and domestic habitats, posing as the most important Trypanosoma cruzi vector in Northeastern Brazil. We focused on the effects of vector migration rate, maximum dispersal distance and attraction by domestic habitat on T. brasiliensis population dynamics and spatial genetic structure. Optimized for T. brasiliensis using field data pairwise fixation index (FST) from microsatellite loci, our simulations confirmed the importance of these three variables to understand vector genetic structure at the landscape level. We then ran prospective scenarios accounting for land-use change (deforestation and urbanization), which revealed that human-induced land-use change favored higher genetic diversity among sampling points. Conclusions/Significance Our work shows that mechanistic models may be useful tools to link observed patterns with processes involved in the population genetics of tropical pathogen vectors in heterogeneous social-ecological landscapes. 
Our hope is that our study may provide a testable and applicable modeling framework to a broad community of epidemiologists for formulating scenarios of landscape change consequences on vector dynamics, with potential implications for their surveillance and control. Worldwide, humans are modifying landscapes at an unprecedented rate. These modifications have an influence on the ecology of pathogen vectors, yet this issue has received relatively little input from modeling research. The current study presents guidelines for the use of a modeling framework for the representation of the dynamics and spatial genetic structure of pathogen vectors. It allows considering spatiotemporal landscape modifications explicitly, to represent human-altered modifications and consequences. We applied this modeling framework to Triatoma brasiliensis, vector of the pathogen Trypanosoma cruzi responsible for the Chagas disease, in the semi-arid Northeastern Brazil. Using field data of pairwise fixation index (FST) from microsatellite loci, we found that migration rate, maximum dispersal distance and attraction by domestic habitat were all key parameters to understand vector spatial genetic structure at the landscape level. At the interface across disciplines, this study provides to the community of epidemiologists a testable and applicable framework to foresee landscape modification consequences on vector dynamics and genetic structure, with potential implications for their surveillance and control.
Affiliation(s)
- Francois Rebaudo
- BEI-UR072, IRD, Gif-sur-Yvette, France
- LEGS-UPR9034, CNRS-UPSud11, Gif-sur-Yvette, France
| | - Jane Costa
- Laboratório de Biodiversidade Entomológica, Instituto Oswaldo Cruz - Fiocruz, Rio de Janeiro, Rio de Janeiro, Brasil
| | - Carlos E. Almeida
- Departamento de Ciências Biológicas, Faculdade de Ciências Farmacêuticas, UNESP, Araraquara, São Paulo, Brasil
| | - Jean-Francois Silvain
- BEI-UR072, IRD, Gif-sur-Yvette, France
- LEGS-UPR9034, CNRS-UPSud11, Gif-sur-Yvette, France
| | - Myriam Harry
- LEGS-UPR9034, CNRS-UPSud11, Gif-sur-Yvette, France
| | - Olivier Dangles
- BEI-UR072, IRD, Gif-sur-Yvette, France
- LEGS-UPR9034, CNRS-UPSud11, Gif-sur-Yvette, France
- Instituto de Ecología, Campus Cotacota, Universidad Mayor San Andrés, La Paz, Bolivia
|
121
|
Tellier A, Lemaire C. Coalescence 2.0: a multiple branching of recent theoretical developments and their applications. Mol Ecol 2014; 23:2637-52. [DOI: 10.1111/mec.12755] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 01/23/2014] [Revised: 04/08/2014] [Accepted: 04/13/2014] [Indexed: 02/01/2023]
Affiliation(s)
- Aurélien Tellier
- Section of Population Genetics; Center of Life and Food Sciences Weihenstephan; Technische Universität München; 85354 Freising Germany
| | - Christophe Lemaire
- LUNAM; UMR1345 Institut de Recherche en Horticulture et Semences; Université d'Angers; SFR 4207 QUASAV 49045 Angers France
- INRA; UMR1345 Institut de Recherche en Horticulture et Semences; 49071 Beaucouzé France
- AgroCampus-Ouest; UMR1345 Institut de Recherche en Horticulture et Semences; 49045 Angers France
|
122
|
Killcoyne S, del Sol A. FIGG: simulating populations of whole genome sequences for heterogeneous data analyses. BMC Bioinformatics 2014; 15:149. [PMID: 24885193 PMCID: PMC4039316 DOI: 10.1186/1471-2105-15-149] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/30/2013] [Accepted: 05/09/2014] [Indexed: 12/15/2022]
Abstract
Background High-throughput sequencing has become one of the primary tools for investigation of the molecular basis of disease. The increasing use of sequencing in investigations that aim to understand both individuals and populations is challenging our ability to develop analysis tools that scale with the data. This issue is of particular concern in studies that exhibit a wide degree of heterogeneity or deviation from the standard reference genome. The advent of population scale sequencing studies requires analysis tools that are developed and tested against matching quantities of heterogeneous data. Results We developed a large-scale whole genome simulation tool, FIGG, which generates large numbers of whole genomes with known sequence characteristics based on direct sampling of experimentally known or theorized variations. For normal variations we used publicly available data to determine the frequency of different mutation classes across the genome. FIGG then uses this information as a background to generate new sequences from a parent sequence with matching frequencies, but different actual mutations. The background can be normal variations, known disease variations, or a theoretical frequency distribution of variations. Conclusion In order to enable the creation of large numbers of genomes, FIGG generates simulated sequences from known genomic variation and iteratively mutates each genome separately. The result is multiple whole genome sequences with unique variations that can primarily be used to provide different reference genomes, model heterogeneous populations, and can offer a standard test environment for new analysis algorithms or bioinformatics tools.
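The idea of generating a new sequence from a parent by sampling mutation classes according to background frequencies can be sketched as follows. This is not FIGG's implementation: the three mutation classes, the weights, the uniform choice of position and the function name are assumptions made only to show the sampling step.

```python
import random

BASES = "ACGT"

def mutate(parent, class_weights, n_mutations, rng):
    """Apply n_mutations to a copy of parent, choosing each mutation class
    in proportion to its background weight (illustrative scheme only)."""
    seq = list(parent)
    classes = list(class_weights)
    weights = [class_weights[c] for c in classes]
    for _ in range(n_mutations):
        kind = rng.choices(classes, weights=weights)[0]
        pos = rng.randrange(len(seq))
        if kind == "substitution":
            # Replace the base with a different one.
            seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
        elif kind == "insertion":
            seq.insert(pos, rng.choice(BASES))
        elif kind == "deletion" and len(seq) > 1:
            del seq[pos]
    return "".join(seq)

# Hypothetical background frequencies for the three classes.
background = {"substitution": 0.8, "insertion": 0.1, "deletion": 0.1}
parent = "ACGT" * 5
child = mutate(parent, background, 3, random.Random(0))
```

Calling `mutate` repeatedly on the same parent with different random states would give distinct sequences that share the same expected class frequencies, which mirrors the "matching frequencies, but different actual mutations" description in the abstract.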
Affiliation(s)
| | - Antonio del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts fourneaux, Esch/Alzette L-4362, Luxembourg.
|
123
|
Bielejec F, Lemey P, Carvalho LM, Baele G, Rambaut A, Suchard MA. πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios. BMC Bioinformatics 2014; 15:133. [PMID: 24885610 PMCID: PMC4020384 DOI: 10.1186/1471-2105-15-133] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/10/2013] [Accepted: 04/24/2014] [Indexed: 01/12/2023]
Abstract
Background Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies. Results We present a flexible Monte Carlo simulation tool, called πBUSS, that employs the BEAGLE high performance library for phylogenetic computations to rapidly generate large sequence alignments under complex evolutionary models. πBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. πBUSS may serve as an easy-to-use, standard sequence simulation tool, but the available models and data types are particularly useful to assess the performance of complex BEAST inferences. The connection with BEAST is further strengthened through the use of a common extensible markup language (XML), allowing to specify also more advanced evolutionary models. To support simulation under the latter, as well as to support simulation and analysis in a single run, we also add the πBUSS core simulation routine to the list of BEAST XML parsers. Conclusions πBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, πBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. Applications are not restricted to the BEAST framework, or even time-measured evolutionary histories, and πBUSS can be connected to various other programs using standard input and output format.
Affiliation(s)
- Filip Bielejec
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium.
|
124
|
Della Croce P, Poole GC, Payn RA, Izurieta C. Simulating the effects of stream network topology on the spread of introgressive hybridization across fish populations. Ecol Modell 2014. [DOI: 10.1016/j.ecolmodel.2014.02.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022]
|
125
|
Hoban S. An overview of the utility of population simulation software in molecular ecology. Mol Ecol 2014; 23:2383-401. [DOI: 10.1111/mec.12741] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 11/14/2013] [Revised: 03/22/2014] [Accepted: 03/26/2014] [Indexed: 01/12/2023]
Affiliation(s)
- Sean Hoban
- National Institute for Mathematical and Biological Synthesis; University of Tennessee; 1122 Volunteer Blvd. Suite 110A Knoxville TN 37996-3410 USA
|
126
|
Bocedi G, Palmer SC, Pe'er G, Heikkinen RK, Matsinos YG, Watts K, Travis JM. RangeShifter: a platform for modelling spatial eco-evolutionary dynamics and species' responses to environmental changes. Methods Ecol Evol 2014. [DOI: 10.1111/2041-210x.12162] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Indexed: 11/28/2022]
Affiliation(s)
- Greta Bocedi
- Institute of Biological and Environmental Sciences; University of Aberdeen; Zoology Building, Tillydrone Avenue Aberdeen AB24 2TZ UK
| | - Stephen C.F. Palmer
- Institute of Biological and Environmental Sciences; University of Aberdeen; Zoology Building, Tillydrone Avenue Aberdeen AB24 2TZ UK
| | - Guy Pe'er
- Department of Conservation Biology; UFZ - Helmholtz Centre for Environmental Research; Permoserstr. 15 Leipzig 04318 Germany
| | - Risto K. Heikkinen
- Finnish Environment Institute; Natural Environment Centre; P.O. Box 140 Helsinki FI-00251 Finland
| | - Yiannis G. Matsinos
- Department of Environment; Biodiversity Conservation Laboratory; University of the Aegean; Mytilini 81100 Greece
| | - Kevin Watts
- Forest Research; Alice Holt Lodge, Farnham Surrey GU10 4LH UK
| | - Justin M.J. Travis
- Institute of Biological and Environmental Sciences; University of Aberdeen; Zoology Building, Tillydrone Avenue Aberdeen AB24 2TZ UK
|
127
|
Arenas M, Posada D. Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 2014; 31:1295-301. [PMID: 24557445 PMCID: PMC3995339 DOI: 10.1093/molbev/msu078] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 01/28/2023]
Abstract
Genomic evolution can be highly heterogeneous. Here, we introduce a new framework to simulate genome-wide sequence evolution under a variety of substitution models that may change along the genome and the phylogeny, following complex multispecies coalescent histories that can include recombination, demographics, longitudinal sampling, population subdivision/species history, and migration. A key aspect of our simulation strategy is that the heterogeneity of the whole evolutionary process can be parameterized according to statistical prior distributions specified by the user. We used this framework to carry out a study of the impact of variable codon frequencies across genomic regions on the estimation of the genome-wide nonsynonymous/synonymous ratio. We found that both variable codon frequencies across genes and rate variation among sites and regions can lead to severe underestimation of the global dN/dS values. The program SGWE—Simulation of Genome-Wide Evolution—is freely available from http://code.google.com/p/sgwe-project/, including extensive documentation and detailed examples.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
|
128
|
Cope RC, Lanyon JM, Seddon JM, Pollett PK. Development and testing of a genetic marker-based pedigree reconstruction system 'PR-genie' incorporating size-class data. Mol Ecol Resour 2014; 14:857-70. [PMID: 24373173 DOI: 10.1111/1755-0998.12219] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/23/2013] [Revised: 12/02/2013] [Accepted: 12/11/2013] [Indexed: 11/28/2022]
Abstract
For wildlife populations, it is often difficult to determine biological parameters that indicate breeding patterns and population mixing, but knowledge of these parameters is essential for effective management. A pedigree encodes the relationship between individuals and can provide insight into the dynamics of a population over its recent history. Here, we present a method for the reconstruction of pedigrees for wild populations of animals that live long enough to breed multiple times over their lifetime and that have complex or unknown generational structures. Reconstruction was based on microsatellite genotype data along with ancillary biological information: sex and observed body size class as an indicator of relative age of individuals within the population. Using body size-class data to infer relative age has not been considered previously in wildlife genealogy and provides a marked improvement in accuracy of pedigree reconstruction. Body size-class data are particularly useful for wild populations because it is much easier to collect noninvasively than absolute age data. This new pedigree reconstruction system, PR-genie, performs reconstruction using maximum likelihood with optimization driven by the cross-entropy method. We demonstrated pedigree reconstruction performance on simulated populations (comparing reconstructed pedigrees to known true pedigrees) over a wide range of population parameters and under assortative and intergenerational mating schema. Reconstruction accuracy increased with the presence of size-class data and as the amount and quality of genetic data increased. We provide recommendations as to the amount and quality of data necessary to provide insight into detailed familial relationships in a wildlife population using this pedigree reconstruction technique.
Affiliation(s)
- Robert C Cope
- School of Biological Science, The University of Queensland, St Lucia, Qld, 4072, Australia
|
129
|
Bay RA, Ramakrishnan U, Hadly EA. A call for tiger management using "reserves" of genetic diversity. J Hered 2013; 105:295-302. [PMID: 24336928 DOI: 10.1093/jhered/est086] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
Tigers (Panthera tigris), like many large carnivores, are threatened by anthropogenic impacts, primarily habitat loss and poaching. Current conservation plans for tigers focus on population expansion, with the goal of doubling census size in the next 10 years. Previous studies have shown that because the demographic decline was recent, tiger populations still retain a large amount of genetic diversity. Although maintaining this diversity is extremely important to avoid deleterious effects of inbreeding, management plans have yet to consider predictive genetic models. We used coalescent simulations based on previously sequenced mitochondrial fragments (n = 125) from 5 of 6 extant subspecies to predict the population growth needed to maintain current genetic diversity over the next 150 years. We found that the level of gene flow between populations has a large effect on the local population growth necessary to maintain genetic diversity, without which tigers may face decreases in fitness. In the absence of gene flow, we demonstrate that maintaining genetic diversity is impossible based on known demographic parameters for the species. Thus, managing for the genetic diversity of the species should be prioritized over the riskier preservation of distinct subspecies. These predictive simulations provide unique management insights, hitherto not possible using existing analytical methods.
Affiliation(s)
- Rachael A Bay
- the Department of Biology, Stanford University, Stanford, CA 94305
|
130
|
Kessner D, Novembre J. forqs: forward-in-time simulation of recombination, quantitative traits and selection. Bioinformatics 2013; 30:576-7. [PMID: 24336146 PMCID: PMC3928523 DOI: 10.1093/bioinformatics/btt712] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
Summary: forqs is a forward-in-time simulation of recombination, quantitative traits and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits. Availability and implementation: forqs is implemented as a command-line C++ program. Source code and binary executables for Linux, OSX and Windows are freely available under a permissive BSD license: https://bitbucket.org/dkessner/forqs. Contact: jnovembre@uchicago.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Darren Kessner
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA 90095 and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
|
131
|
Hoban S, Arntzen JW, Bertorelle G, Bryja J, Fernandes M, Frith K, Gaggiotti O, Galbusera P, Godoy JA, Hauffe HC, Rus Hoelzel A, Nichols RA, Pérez-Espona S, Primmer C, Russo IRM, Segelbacher G, Siegismund HR, Sihvonen M, Sjögren-Gulve P, Vernesi C, Vilà C, Bruford MW. Conservation Genetic Resources for Effective Species Survival (ConGRESS): Bridging the divide between conservation research and practice. J Nat Conserv 2013. [DOI: 10.1016/j.jnc.2013.07.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 10/26/2022]
|
132
|
White JW, Rassweiler A, Samhouri JF, Stier AC, White C. Ecologists should not use statistical significance tests to interpret simulation model results. Oikos 2013. [DOI: 10.1111/j.1600-0706.2013.01073.x] [Citation(s) in RCA: 242] [Impact Index Per Article: 22.0] [Indexed: 11/28/2022]
|
133
|
Ellstrand NC, Meirmans P, Rong J, Bartsch D, Ghosh A, de Jong TJ, Haccou P, Lu BR, Snow AA, Neal Stewart C, Strasburg JL, van Tienderen PH, Vrieling K, Hooftman D. Introgression of Crop Alleles into Wild or Weedy Populations. Annual Review of Ecology, Evolution, and Systematics 2013. [DOI: 10.1146/annurev-ecolsys-110512-135840] [Citation(s) in RCA: 149] [Impact Index Per Article: 13.5] [Indexed: 11/09/2022]
Affiliation(s)
- Norman C. Ellstrand
- Department of Botany and Plant Sciences, University of California, Riverside, California 92521
- Patrick Meirmans
- Instituut voor Biodiversiteit en Ecosysteem Dynamica, Universiteit van Amsterdam, 1098 XH Amsterdam, The Netherlands
- Jun Rong
- Center for Watershed Ecology, Institute of Life Science and Key Laboratory of Poyang Lake Environment and Resource Utilization, Ministry of Education, Nanchang University, 330031 Honggutan Nanchang, People's Republic of China
- Detlef Bartsch
- Federal Office of Consumer Protection and Food Safety, 10117 Berlin, Germany
- Atiyo Ghosh
- Integrative Systems Biology, Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan
- Tom J. de Jong
- Institute of Biology, Leiden University, 2333 BE Leiden, The Netherlands
- Patsy Haccou
- Leiden University College The Hague, Leiden University, 2514 EG The Hague, The Netherlands
- Bao-Rong Lu
- Ministry of Education Key Laboratory for Biodiversity and Ecological Engineering, Department of Ecology and Evolutionary Biology, Fudan University, Shanghai 200433, People's Republic of China
- Allison A. Snow
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, Ohio 43210
- C. Neal Stewart
- Department of Plant Sciences, University of Tennessee, Knoxville, Tennessee 37996
- Peter H. van Tienderen
- Instituut voor Biodiversiteit en Ecosysteem Dynamica, Universiteit van Amsterdam, 1090 GE Amsterdam, The Netherlands
- Klaas Vrieling
- Institute of Biology, Leiden University, 2333 BE Leiden, The Netherlands
- Danny Hooftman
- Center for Ecology and Hydrology, National Environmental Research Council, Wallingford, Oxfordshire OX10 8BB, United Kingdom
|
134
|
Hoban SM, Mezzavilla M, Gaggiotti OE, Benazzo A, van Oosterhout C, Bertorelle G. High variance in reproductive success generates a false signature of a genetic bottleneck in populations of constant size: a simulation study. BMC Bioinformatics 2013; 14:309. [PMID: 24131797 PMCID: PMC3852946 DOI: 10.1186/1471-2105-14-309] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 05/08/2013] [Accepted: 10/09/2013] [Indexed: 11/26/2022]
Abstract
Background Demographic bottlenecks can severely reduce the genetic variation of a population or a species. Establishing whether low genetic variation is caused by a bottleneck or a constantly low effective number of individuals is important to understand a species’ ecology and evolution, and it has implications for conservation management. Recent studies have evaluated the power of several statistical methods developed to identify bottlenecks. However, the false positive rate, i.e. the rate with which a bottleneck signal is misidentified in demographically stable populations, has received little attention. We analyse this type of error (type I) in forward computer simulations of stable populations having greater than Poisson variance in reproductive success (i.e., variance in family sizes). The assumption of Poisson variance underlies bottleneck tests, yet it is commonly violated in species with high fecundity. Results With large variance in reproductive success (Vk ≥ 40, corresponding to a ratio between effective and census size smaller than 0.1), tests based on allele frequencies, allelic sizes, and DNA sequence polymorphisms (heterozygosity excess, M-ratio, and Tajima’s D test) tend to show erroneous signals of a bottleneck. Similarly, strong evidence of population decline is erroneously detected when ancestral and current population sizes are estimated with the model based method MSVAR. Conclusions Our results suggest caution when interpreting the results of bottleneck tests in species showing high variance in reproductive success. Particularly in species with high fecundity, computer simulations are recommended to confirm the occurrence of a population bottleneck.
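The correspondence quoted in this abstract between Vk ≥ 40 and an effective-to-census ratio below 0.1 can be checked with a small calculation. The sketch below assumes the classical approximation for a diploid population of constant size, Ne = (4N - 2) / (Vk + 2); the abstract does not state which formula the authors used, so treat both the formula and the function name `ne_over_n` as illustrative assumptions.

```python
def ne_over_n(vk: float, n: int) -> float:
    """Approximate Ne/N for a constant-size diploid population.

    Assumes the classical relation Ne = (4N - 2) / (Vk + 2), where Vk is the
    variance in family size. This formula is an assumption of this sketch,
    not something stated in the abstract.
    """
    if n <= 0 or vk < 0:
        raise ValueError("n must be positive and vk non-negative")
    ne = (4 * n - 2) / (vk + 2)
    return ne / n


if __name__ == "__main__":
    census = 1000  # hypothetical census size
    for vk in (2, 10, 40, 100):
        print(f"Vk = {vk:>3}: Ne/N = {ne_over_n(vk, census):.3f}")
```

Under this approximation, Poisson-like variance (Vk = 2) gives a ratio close to 1, while Vk = 40 gives roughly 4/42, which is just under 0.1 and so agrees with the threshold quoted in the abstract.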
Affiliation(s)
- Giorgio Bertorelle
- Department of Life Sciences and Biotechnology, University of Ferrara, via Borsari 46, Ferrara I-44121, Italy.
|
135
|
Affiliation(s)
- K. Petren
- Department of Biological Sciences; University of Cincinnati; Cincinnati Ohio 45221
|
136
|
Johansson ML, Raimondi PT, Reed DC, Coelho NC, Serrão EA, Alberto FA. Looking into the black box: simulating the role of self-fertilization and mortality in the genetic structure of Macrocystis pyrifera. Mol Ecol 2013; 22:4842-54. [PMID: 23962179 DOI: 10.1111/mec.12444] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/03/2012] [Accepted: 07/03/2013] [Indexed: 01/10/2024]
Abstract
Patterns of spatial genetic structure (SGS), typically estimated by genotyping adults, integrate migration over multiple generations and measure the effective gene flow of populations. SGS results can be compared with direct ecological studies of dispersal or mating system to gain additional insights. When mismatches occur, simulations can be used to illuminate the causes of these mismatches. Here, we report a SGS and simulation-based study of self-fertilization in Macrocystis pyrifera, the giant kelp. We found that SGS is weaker than expected in M. pyrifera and used computer simulations to identify selfing and early mortality rates for which the individual heterozygosity distribution fits that of the observed data. Only one (of three) population showed both elevated kinship in the smallest distance class and a significant negative slope between kinship and geographical distance. All simulations had poor fit to the observed data unless mortality due to inbreeding depression was imposed. This mortality could only be imposed for selfing, as these were the only simulations to show an excess of homozygous individuals relative to the observed data. Thus, the expected data consistently achieved nonsignificant differences from the observed data only under models of selfing with mortality, with best fits between 32% and 42% selfing. Inbreeding depression ranged from 0.70 to 0.73. The results suggest that density-dependent mortality of early life stages is a significant force in structuring Macrocystis populations, with few highly homozygous individuals surviving. The success of these results should help to validate simulation approaches even in data-poor systems, as a means to estimate otherwise difficult-to-measure life cycle parameters.
Affiliation(s)
- Mattias L Johansson
- Department of Biological Sciences, University of Wisconsin - Milwaukee, PO Box 413, Milwaukee, WI, 53201, USA
|
137
|
Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 2013; 29:3020-8. [PMID: 24037213 DOI: 10.1093/bioinformatics/btt530] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Models of molecular evolution aim at describing the evolutionary processes at the molecular level. However, current models rarely incorporate information from protein structure. Conversely, structure-based models of protein evolution have not been commonly applied to simulate sequence evolution in a phylogenetic framework, and they often ignore relevant evolutionary processes such as recombination. A simulation evolutionary framework that integrates substitution models that account for protein structure stability should be able to generate more realistic in silico evolved proteins for a variety of purposes. RESULTS We developed a method to simulate protein evolution that combines models of protein folding stability, such that the fitness depends on the stability of the native state both with respect to unfolding and misfolding, with phylogenetic histories that can be either specified by the user or simulated with the coalescent under complex evolutionary scenarios, including recombination, demographics and migration. We have implemented this framework in a computer program called ProteinEvolver. Remarkably, comparing these models with empirical amino acid replacement models, we found that the former produce amino acid distributions closer to distributions observed in real protein families, and proteins that are predicted to be more stable. Therefore, we conclude that evolutionary models that consider protein stability and realistic evolutionary histories constitute a better approximation of the real evolutionary process.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology 'Severo Ochoa', Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain and Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
|
138
|
Capocasa M, Taglioli L, Anagnostou P, Paoli G, Danubio ME. Determinants of marital behaviour in five Apennine communities of Central Italy inferred by surname analysis, repeated pairs and kinship estimates. HOMO - Journal of Comparative Human Biology 2013; 65:64-74. [PMID: 24012323 DOI: 10.1016/j.jchb.2013.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/29/2013] [Accepted: 07/23/2013] [Indexed: 10/26/2022]
Abstract
The work makes use of surname analysis, repeated pairs and kinship estimates in 11,009 marriage records celebrated in five communities of the Italian Central Apennine (Celano, Lecce dei Marsi, Ortucchio, Roio, Villavallelonga) from 1802 to 1965 with the objective to deepen knowledge of the relative influence of several determinants on their marital behaviour. These towns are part of the same geographic and economic environment: the slopes of the ancient Fucino Lake. This work further elaborates the results from previous studies on the bio-demographic model of the region. The data were analyzed according to three periods of approximately 50 years. Results show the highest inbreeding coefficients in the pastoral towns of Roio and Villavallelonga. Repeated pair analysis highlights a certain degree of population subdivision which declined in time in Celano, Lecce dei Marsi and Ortucchio. The highest and increasing values of RP-RPr in time in Roio suggest a general reduction in genetic heterogeneity. This is possibly due to the celebration of marriages among families selected on the economic basis of pastoralism, as this town historically has had a leading tradition of sheep-farming. Villavallelonga, excluding isonymous marriages, shows an increase in repeated pair unions in time, thus revealing a substructure with marriages among preferred lineages. This is in line with previous results on consanguineous marriages which indicated the tendency of avoiding unions between close relatives in this small geographic isolate. This study demonstrates the influence of geographical (altitude) and social factors (pastoralism) on the marital structures of the investigated populations.
Affiliation(s)
- M Capocasa
- Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, Piazzale Aldo Moro 5, 00185 Rome, Italy; Istituto Italiano di Antropologia, Piazzale Aldo Moro 5, 00185 Rome, Italy.
- L Taglioli
- Dipartimento di Biologia, Università di Pisa, Via Luca Ghini 13, 56126 Pisa, Italy
- P Anagnostou
- Dipartimento di Biologia Ambientale, Sapienza Università di Roma, Piazzale Aldo Moro 5, 00185 Rome, Italy; Istituto Italiano di Antropologia, Piazzale Aldo Moro 5, 00185 Rome, Italy
- G Paoli
- Dipartimento di Biologia, Università di Pisa, Via Luca Ghini 13, 56126 Pisa, Italy
- M E Danubio
- Dipartimento di Medicina clinica, sanità pubblica, scienze della vita e dell'ambiente, Università di L'Aquila, Piazzale Salvatore Tommasi 1, L'Aquila, Italy; Istituto Italiano di Antropologia, Piazzale Aldo Moro 5, 00185 Rome, Italy
|
139
|
Aberer AJ, Stamatakis A. Rapid forward-in-time simulation at the chromosome and genome level. BMC Bioinformatics 2013; 14:216. [PMID: 23834340 PMCID: PMC3718712 DOI: 10.1186/1471-2105-14-216] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/07/2013] [Accepted: 07/03/2013] [Indexed: 11/10/2022]
Abstract
Background In population genetics, simulation is a fundamental tool for analyzing how basic evolutionary forces such as natural selection, recombination, and mutation shape the genetic landscape of a population. Forward simulation represents the most powerful, but, at the same time, most compute-intensive approach for simulating the genetic material of a population. Results We introduce AnA-FiTS, a highly optimized forward simulation software, that is up to two orders of magnitude faster than current state-of-the-art software. In addition, we present a novel algorithm that further improves runtimes by up to an additional order of magnitude, for simulations where a fraction of the mutations is neutral (e.g., only 10% of mutations have an effect on fitness). Apart from simulated sequences, our tool also generates a graph structure that depicts the complete observable history of neutral mutations. Conclusions The substantial performance improvements allow for conducting forward simulations at the chromosome and genome level. The graph structure generated by our algorithm can give rise to novel approaches for visualizing and analyzing the output of forward simulations.
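For readers unfamiliar with the term, the basic shape of a forward-in-time simulation can be sketched in a few lines. The example below is a textbook haploid Wright-Fisher model with selection and drift at a single biallelic locus; it is not the AnA-FiTS algorithm, and the function name `wright_fisher` and all parameter names are hypothetical.

```python
import random


def wright_fisher(pop_size, p0, s, generations, seed=0):
    """Track the frequency of allele A forward in time.

    pop_size    -- number of haploid individuals (held constant)
    p0          -- starting frequency of allele A
    s           -- selection coefficient (fitness of A is 1 + s, of a is 1)
    generations -- number of generations to simulate
    Returns the list of frequencies, one per generation including the start.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: expected frequency of A among offspring.
        expected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: each offspring independently inherits A with that probability.
        count = sum(rng.random() < expected for _ in range(pop_size))
        p = count / pop_size
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    print(wright_fisher(pop_size=200, p0=0.1, s=0.05, generations=10, seed=42))
```

Every individual is resampled in every generation, so the cost grows with population size times number of generations. That is the compute burden the abstract refers to, and it becomes far heavier once whole chromosomes rather than one locus are tracked.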
Affiliation(s)
- Andre J Aberer
- The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, Heidelberg D-69118, Germany.
|
140
|
Chung RH, Shih CC. SeqSIMLA: a sequence and phenotype simulation tool for complex disease studies. BMC Bioinformatics 2013; 14:199. [PMID: 23782512 PMCID: PMC3693898 DOI: 10.1186/1471-2105-14-199] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 03/29/2013] [Accepted: 06/14/2013] [Indexed: 11/22/2022]
Abstract
Background Association studies based on next-generation sequencing (NGS) technology have become popular, and statistical association tests for NGS data have been developed rapidly. A flexible tool for simulating sequence data in either unrelated case–control or family samples with different disease and quantitative trait models would be useful for evaluating the statistical power for planning a study design and for comparing power among statistical methods based on NGS data. Results We developed a simulation tool, SeqSIMLA, which can simulate sequence data with user-specified disease and quantitative trait models. We implemented two disease models, in which the user can flexibly specify the number of disease loci, effect sizes or population attributable risk, disease prevalence, and risk or protective loci. We also implemented a quantitative trait model, in which the user can specify the number of quantitative trait loci (QTL), proportions of variance explained by the QTL, and genetic models. We compiled recombination rates from the HapMap project so that genomic structures similar to the real data can be simulated. Conclusions SeqSIMLA can efficiently simulate sequence data with disease or quantitative trait models specified by the user. SeqSIMLA will be very useful for evaluating statistical properties for new study designs and new statistical methods using NGS. SeqSIMLA can be downloaded for free at http://seqsimla.sourceforge.net.
Affiliation(s)
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli, Taiwan.
|
141
|
Ruths T, Nakhleh L. Boosting forward-time population genetic simulators through genotype compression. BMC Bioinformatics 2013; 14:192. [PMID: 23763838 PMCID: PMC3700844 DOI: 10.1186/1471-2105-14-192] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/07/2013] [Accepted: 05/24/2013] [Indexed: 12/17/2022]
Abstract
BACKGROUND Forward-time population genetic simulations play a central role in deriving and testing evolutionary hypotheses. Such simulations may be data-intensive, depending on the settings to the various parameters controlling them. In particular, for certain settings, the data footprint may quickly exceed the memory of a single compute node. RESULTS We develop a novel and general method for addressing the memory issue inherent in forward-time simulations by compressing and decompressing, in real-time, active and ancestral genotypes, while carefully accounting for the time overhead. We propose a general graph data structure for compressing the genotype space explored during a simulation run, along with efficient algorithms for constructing and updating compressed genotypes which support both mutation and recombination. We tested the performance of our method in very large-scale simulations. Results show that our method not only scales well, but that it also overcomes memory issues that would cripple existing tools. CONCLUSIONS As evolutionary analyses are being increasingly performed on genomes, pathways, and networks, particularly in the era of systems biology, scaling population genetic simulators to handle large-scale simulations is crucial. We believe our method offers a significant step in that direction. Further, the techniques we provide are generic and can be integrated with existing population genetic simulators to boost their performance in terms of memory usage.
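The general idea of storing genotypes in a graph rather than as full sequences can be illustrated with a toy structure. The sketch below is a deliberate simplification and not the authors' data structure: each genotype records only a pointer to its parent and the sites that differ from it, sites are assumed biallelic (a second mutation at the same site reverts it), and recombination, which the abstract says the real method supports, is left out. The names `Node`, `mutate` and `decompress` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Node:
    """A compressed genotype: a parent pointer plus the sites changed from it."""
    parent: Optional["Node"]
    changed: FrozenSet[int]


def mutate(parent: Node, site: int) -> Node:
    """Create a child genotype that differs from its parent at one site."""
    return Node(parent, frozenset({site}))


def decompress(node: Optional[Node]) -> Set[int]:
    """Recover the full set of derived sites by walking back to the root.

    Symmetric difference toggles a site, so two mutations at the same
    biallelic site cancel out.
    """
    derived: Set[int] = set()
    while node is not None:
        derived ^= node.changed
        node = node.parent
    return derived


if __name__ == "__main__":
    root = Node(None, frozenset())
    a = mutate(root, 3)
    b = mutate(a, 7)
    c = mutate(b, 3)  # back-mutation at site 3
    print(sorted(decompress(b)), sorted(decompress(c)))
```

Memory per genotype is proportional to the number of new changes rather than to genome length, at the price of a walk up the ancestry whenever a full genotype is needed. This is the time overhead the abstract says must be carefully accounted for.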
Affiliation(s)
- Troy Ruths
- Department of Computer Science, Rice University, Houston, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, USA
|
142
|
Paz-Vinas I, Quéméré E, Chikhi L, Loot G, Blanchet S. The demographic history of populations experiencing asymmetric gene flow: combining simulated and empirical data. Mol Ecol 2013; 22:3279-91. [PMID: 23718226 DOI: 10.1111/mec.12321] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 04/04/2012] [Revised: 03/05/2013] [Accepted: 03/11/2013] [Indexed: 11/27/2022]
Abstract
Population structure can significantly affect genetic-based demographic inferences, generating spurious bottleneck-like signals. Previous studies have typically assumed island or stepping-stone models, which are characterized by symmetric gene flow. However, many organisms are characterized by asymmetric gene flow. Here, we combined simulated and empirical data to test whether asymmetric gene flow affects the inference of past demographic changes. Through the analysis of simulated genetic data with three methods (i.e. bottleneck, M-ratio and msvar), we demonstrated that asymmetric gene flow biases past demographic changes. Most biases were towards spurious signals of expansion, albeit their strength depended on values of effective population size and migration rate. It is noteworthy that the spurious signals of demographic changes also depended on the statistical approach underlying each of the three methods. For one of the three methods, biases induced by asymmetric gene flow were confirmed in an empirical multispecific data set involving four freshwater fish species (Squalius cephalus, Leuciscus burdigalensis, Gobio gobio and Phoxinus phoxinus). However, for the two other methods, strong signals of bottlenecks were detected for all species and across two rivers. This suggests that, although potentially biased by asymmetric gene flow, some of these methods were able to bypass this bias when a bottleneck actually occurred. Our results show that population structure and dispersal patterns have to be considered for proper inference of demographic changes from genetic data.
Affiliation(s)
- I Paz-Vinas
- Centre National de la Recherche Scientifique (CNRS), Station d'Écologie Expérimentale du CNRS à Moulis, USR 2936, Moulis, F-09200, France; Centre National de la Recherche Scientifique (CNRS), Université Paul Sabatier, École Nationale de Formation Agronomique (ENFA), UMR 5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, Toulouse cedex 4, F-31062, France; Université de Toulouse, UPS, UMR 5174 (EDB), 118 route de Narbonne, Toulouse cedex 4, F-31062, France
|
143
|
Abstract
SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene structure, and user-defined recombination maps.
|
144
|
Sousa V, Hey J. Understanding the origin of species with genome-scale data: modelling gene flow. Nat Rev Genet 2013; 14:404-14. [PMID: 23657479 DOI: 10.1038/nrg3446] [Citation(s) in RCA: 181] [Impact Index Per Article: 16.5] [Indexed: 12/16/2022]
Abstract
As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population genomic data sets. Such data hold the potential to resolve long-standing questions in evolutionary biology about the role of gene exchange in species formation. In principle, the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However, there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here.
Affiliation(s)
- Vitor Sousa
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, USA
|
145
|
Rebaudo F, Le Rouzic A, Dupas S, Silvain JF, Harry M, Dangles O. SimAdapt: an individual-based genetic model for simulating landscape management impacts on populations. Methods Ecol Evol 2013. [DOI: 10.1111/2041-210x.12041] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/26/2022]
Affiliation(s)
- François Rebaudo
- Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes; IRD-BEI-UR072; 91198 Gif-sur-Yvette Cedex France
- Laboratoire Évolution Génome et Spéciation; CNRS-LEGS-UPR9034; Université Paris-Sud 11 91198 Gif-sur-Yvette Cedex France
- Arnaud Le Rouzic
- Laboratoire Évolution Génome et Spéciation; CNRS-LEGS-UPR9034; Université Paris-Sud 11 91198 Gif-sur-Yvette Cedex France
- Stéphane Dupas
- Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes; IRD-BEI-UR072; 91198 Gif-sur-Yvette Cedex France
- Laboratoire Évolution Génome et Spéciation; CNRS-LEGS-UPR9034; Université Paris-Sud 11 91198 Gif-sur-Yvette Cedex France
- Jean-François Silvain
- Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes; IRD-BEI-UR072; 91198 Gif-sur-Yvette Cedex France
- Laboratoire Évolution Génome et Spéciation; CNRS-LEGS-UPR9034; Université Paris-Sud 11 91198 Gif-sur-Yvette Cedex France
- Myriam Harry
- Laboratoire Évolution Génome et Spéciation; CNRS-LEGS-UPR9034; Université Paris-Sud 11 91198 Gif-sur-Yvette Cedex France
- Olivier Dangles
- Biodiversité et évolution des complexes plantes-insectes ravageurs-antagonistes; IRD-BEI-UR072; 91198 Gif-sur-Yvette Cedex France
- Laboratoire Évolution Génome et Spéciation; CNRS-LEGS-UPR9034; Université Paris-Sud 11 91198 Gif-sur-Yvette Cedex France
- Facultad de Ciencias Naturales y Biológicas; Pontificia Universidad Católica del Ecuador; Quito Ecuador
|
146
|
Peng B, Chen HS, Mechanic LE, Racine B, Clarke J, Clarke L, Gillanders E, Feuer EJ. Genetic Simulation Resources: a website for the registration and discovery of genetic data simulators. Bioinformatics 2013; 29:1101-2. [PMID: 23435068 PMCID: PMC3624809 DOI: 10.1093/bioinformatics/btt094] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
Summary: Many simulation methods and programs have been developed to simulate genetic data of the human genome. These data have been widely used, for example, to predict properties of populations retrospectively or prospectively according to mathematically intractable genetic models, and to assist the validation, statistical inference and power analysis of a variety of statistical models. However, owing to the differences in type of genetic data of interest, simulation methods, evolutionary features, input and output formats, terminologies and assumptions for different applications, choosing the right tool for a particular study can be a resource-intensive process that usually involves searching, downloading and testing many different simulation programs. Genetic Simulation Resources (GSR) is a website provided by the National Cancer Institute (NCI) that aims to help researchers compare and choose the appropriate simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with well-defined attributes, thus allowing site users to search and compare simulators according to specified features. Availability: http://popmodels.cancercontrol.cancer.gov/gsr. Contact: gsr@mail.nih.gov
Affiliation(s)
- Bo Peng
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
|
147
|
Interplay between isolation by distance and genetic clusters in the red coral Corallium rubrum: insights from simulated and empirical data. Conserv Genet 2013. [DOI: 10.1007/s10592-013-0464-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 10/27/2022]
|
148
|
Capocasa M, Battaggia C, Anagnostou P, Montinaro F, Boschi I, Ferri G, Alù M, Coia V, Crivellaro F, Bisol GD. Detecting genetic isolation in human populations: a study of European language minorities. PLoS One 2013; 8:e56371. [PMID: 23418562 PMCID: PMC3572090 DOI: 10.1371/journal.pone.0056371] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/24/2012] [Accepted: 01/08/2013] [Indexed: 12/01/2022]
Abstract
The identification of isolation signatures is fundamental to better understand the genetic structure of human populations and to test the relations between cultural factors and genetic variation. However, with current approaches, it is not possible to distinguish between the consequences of long-term isolation and the effects of reduced sample size, selection and differential gene flow. To overcome these limitations, we have integrated the analysis of classical genetic diversity measures with a Bayesian method to estimate gene flow and have carried out simulations based on the coalescent. Combining these approaches, we first tested whether the relatively short history of cultural and geographical isolation of four “linguistic islands” of the Eastern Alps (Lessinia, Sauris, Sappada and Timau) had left detectable signatures in their genetic structure. We then compared our findings to previous studies of European population isolates. Finally, we explored the importance of demographic and cultural factors in shaping genetic diversity among the groups under study. A combination of small initial effective size and continued genetic isolation from surrounding populations seems to provide a coherent explanation for the diversity observed among Sauris, Sappada and Timau, which was found to be substantially greater than in other groups of European isolated populations. Simulations of micro-evolutionary scenarios indicate that ethnicity might have been important in increasing genetic diversity among these culturally related and spatially close populations.
Affiliation(s)
- Marco Capocasa
  - Dipartimento Biologia e Biotecnologie “Charles Darwin”, Università La Sapienza, Rome, Italy
  - Istituto Italiano di Antropologia, Rome, Italy
- Cinzia Battaggia
  - Dipartimento di Biologia Ambientale, Università “La Sapienza”, Rome, Italy
- Paolo Anagnostou
  - Dipartimento di Biologia Ambientale, Università “La Sapienza”, Rome, Italy
  - Istituto Italiano di Antropologia, Rome, Italy
- Francesco Montinaro
  - Facolta di Medicina, Istituto di Medicina Legale, Università Cattolica, Rome, Italy
- Ilaria Boschi
  - Facolta di Medicina, Istituto di Medicina Legale, Università Cattolica, Rome, Italy
- Gianmarco Ferri
  - Dipartimento ad Attività Integrata di Laboratori, Anatomia Patologica, Medicina Legale, Struttura Complessa di Medicina Legale, Università di Modena e Reggio Emilia, Modena, Italy
- Milena Alù
  - Dipartimento ad Attività Integrata di Laboratori, Anatomia Patologica, Medicina Legale, Struttura Complessa di Medicina Legale, Università di Modena e Reggio Emilia, Modena, Italy
- Valentina Coia
  - Dipartimento di Filosofia, Storia e Beni culturali, Universita degli Studi di Trento, Trento, Italy
- Federica Crivellaro
  - Division of Biological Anthropology, Leverhulme Centre for Human Evolutionary Studies, Cambridge, United Kingdom
- Giovanni Destro Bisol
  - Dipartimento di Biologia Ambientale, Università “La Sapienza”, Rome, Italy
  - Dipartimento Biologia e Biotecnologie “Charles Darwin”, Università La Sapienza, Rome, Italy
|
149
|
McGaughran A, Morgan K, Sommer RJ. Unraveling the evolutionary history of the nematode Pristionchus pacificus: from lineage diversification to island colonization. Ecol Evol 2013; 3:667-75. [PMID: 23532968 PMCID: PMC3605854 DOI: 10.1002/ece3.495] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 09/24/2012] [Revised: 01/04/2013] [Accepted: 01/08/2013] [Indexed: 11/14/2022] Open
Abstract
The hermaphroditic nematode Pristionchus pacificus is a model organism with a range of fully developed genetic tools. The species is globally widespread and highly diverse genetically, consisting of four major independent lineages (lineages A, B, C, and D). Despite its young age (∼2.1 Ma), volcanic La Réunion Island harbors all four lineages. Ecological and population genetic research studies suggest that this diversity is due to repeated independent island colonizations by P. pacificus. Here, we use model-based statistical methods to rigorously test hypotheses regarding the evolutionary history of P. pacificus. First, we employ divergence analyses to date diversification events among the four “world” lineages. Next, we examine demographic properties of a subset of four populations (“a”, “b”, “c”, and “d”), present on La Réunion Island. Finally, we use the results of the divergence and demographic analyses to inform a modeling-based approximate Bayesian computation (ABC) approach, where we test hypotheses about the order and timing of establishment of the Réunion populations. Our dating estimates place the recent common ancestor of P. pacificus lineages at nearly 500,000 generations past. Our demographic analysis supports recent (<150,000 generations) spatial expansion for the island populations, and our ABC approach supports c>a>b>d as the most likely colonization order of the island populations. Collectively, our study comprehensively improves previous inferences about the evolutionary history of P. pacificus.
Affiliation(s)
- Angela McGaughran
  - Department for Evolutionary Biology, Max Planck Institute for Developmental Biology Tübingen, Germany
|
150
|
Arenas M. Computer programs and methodologies for the simulation of DNA sequence data with recombination. Front Genet 2013; 4:9. [PMID: 23378848 PMCID: PMC3561691 DOI: 10.3389/fgene.2013.00009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 11/20/2012] [Accepted: 01/17/2013] [Indexed: 11/13/2022] Open
Abstract
Computer simulations are useful in evolutionary biology for hypothesis testing, to verify analytical methods, to analyze interactions among evolutionary processes, and to estimate evolutionary parameters. In particular, the simulation of DNA sequences with recombination may help in understanding the role of recombination in diverse evolutionary questions, such as the genome structure. Consequently, plenty of computer simulators have been developed to simulate DNA sequence data with recombination. However, the choice of an appropriate tool, among all currently available simulators, is critical if recombination simulations are to be biologically meaningful. This review provides a practical survival guide to commonly used computer programs and methodologies for the simulation of coding and non-coding DNA sequences with recombination. It may help in the correct design of computer simulation experiments of recombination. In addition, the study includes a review of simulation studies investigating the impact of ignoring recombination when performing various evolutionary analyses, such as phylogenetic tree and ancestral sequence reconstructions. Alternative analytical methodologies accounting for recombination are also reviewed.
Affiliation(s)
- Miguel Arenas
  - Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas Madrid, Spain
|