1
Lauterbur ME, Cavassim MIA, Gladstein AL, Gower G, Pope NS, Tsambos G, Adrion J, Belsare S, Biddanda A, Caudill V, Cury J, Echevarria I, Haller BC, Hasan AR, Huang X, Iasi LNM, Noskova E, Obsteter J, Pavinato VAC, Pearson A, Peede D, Perez MF, Rodrigues MF, Smith CCR, Spence JP, Teterina A, Tittes S, Unneberg P, Vazquez JM, Waples RK, Wohns AW, Wong Y, Baumdicker F, Cartwright RA, Gorjanc G, Gutenkunst RN, Kelleher J, Kern AD, Ragsdale AP, Ralph PL, Schrider DR, Gronau I. Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife 2023; 12:RP84874. [PMID: 37342968 DOI: 10.7554/elife.84874] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/23/2023]
Abstract
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
Affiliation(s)
- M Elise Lauterbur
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Maria Izabel A Cavassim
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Graham Gower
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Nathaniel S Pope
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Georgia Tsambos
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Jeffrey Adrion
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Ancestry DNA, San Francisco, United States
- Saurabh Belsare
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Victoria Caudill
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jean Cury
  - Universite Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numerique, Orsay, France
- Benjamin C Haller
  - Department of Computational Biology, Cornell University, Ithaca, United States
- Ahmed R Hasan
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
  - Department of Biology, University of Toronto Mississauga, Mississauga, Canada
- Xin Huang
  - Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
  - Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Ekaterina Noskova
  - Computer Technologies Laboratory, ITMO University, St Petersburg, Russian Federation
- Jana Obsteter
  - Agricultural Institute of Slovenia, Department of Animal Science, Ljubljana, Slovenia
- Alice Pearson
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- David Peede
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, United States
  - Center for Computational Molecular Biology, Brown University, Providence, United States
- Manolo F Perez
  - Department of Genetics and Evolution, Federal University of Sao Carlos, Sao Carlos, Brazil
- Murillo F Rodrigues
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Chris C R Smith
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Jeffrey P Spence
  - Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Anastasia Teterina
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Silas Tittes
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Per Unneberg
  - Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Juan Manuel Vazquez
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Ryan K Waples
  - Department of Biostatistics, University of Washington, Seattle, United States
- Yan Wong
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Franz Baumdicker
  - Cluster of Excellence - Controlling Microbes to Fight Infections, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Reed A Cartwright
  - School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, United States
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Andrew D Kern
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, United States
- Peter L Ralph
  - Institute of Ecology and Evolution, University of Oregon, Eugene, United States
  - Department of Mathematics, University of Oregon, Eugene, United States
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Ilan Gronau
  - Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
2
Gower G, Ragsdale AP, Bisschop G, Gutenkunst RN, Hartfield M, Noskova E, Schiffels S, Struck TJ, Kelleher J, Thornton KR. Demes: a standard format for demographic models. Genetics 2022; 222:iyac131. [PMID: 36173327 PMCID: PMC9630982 DOI: 10.1093/genetics/iyac131] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/13/2022] [Accepted: 08/23/2022] [Indexed: 11/12/2022]
Abstract
Understanding the demographic history of populations is a key goal in population genetics, and with improving methods and data, ever more complex models are being proposed and tested. Demographic models of current interest typically consist of a set of discrete populations, their sizes and growth rates, and continuous and pulse migrations between those populations over a number of epochs, which can require dozens of parameters to fully describe. There is currently no standard format to define such models, significantly hampering progress in the field. In particular, the important task of translating the model descriptions in published work into input suitable for population genetic simulators is labor intensive and error prone. We propose the Demes data model and file format, built on widely used technologies, to alleviate these issues. Demes provides a well-defined and unambiguous model of populations and their properties that is straightforward to implement in software, and a text file format that is designed for simplicity and clarity. We provide thoroughly tested implementations of Demes parsers in multiple languages including Python and C, and showcase initial support in several simulators and inference methods. An introduction to the file format and a detailed specification are available at https://popsim-consortium.github.io/demes-spec-docs/.
Affiliation(s)
- Graham Gower
  - Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
- Aaron P Ragsdale
  - Department of Integrative Biology, University of Wisconsin–Madison, Madison, WI 53706, USA
- Gertjan Bisschop
  - Institute of Ecology and Evolution, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Matthew Hartfield
  - Institute of Ecology and Evolution, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Ekaterina Noskova
  - Computer Technologies Laboratory, ITMO University, 197101 Saint-Petersburg, Russia
- Stephan Schiffels
  - Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Travis J Struck
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Kevin R Thornton
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
3
Stahlke A, Bell D, Dhendup T, Kern B, Pannoni S, Robinson Z, Strait J, Smith S, Hand BK, Hohenlohe PA, Luikart G. Population Genomics Training for the Next Generation of Conservation Geneticists: ConGen 2018 Workshop. J Hered 2021; 111:227-236. [PMID: 32037446 PMCID: PMC7117792 DOI: 10.1093/jhered/esaa001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/17/2019] [Accepted: 01/06/2020] [Indexed: 12/30/2022]
Abstract
The increasing availability and complexity of next-generation sequencing (NGS) data sets make ongoing training an essential component of conservation and population genetics research. A workshop entitled “ConGen 2018” was recently held to train researchers in conceptual and practical aspects of NGS data production and analysis for conservation and ecological applications. Sixteen instructors provided helpful lectures, discussions, and hands-on exercises regarding how to plan, produce, and analyze data for many important research questions. Lecture topics ranged from understanding probabilistic (e.g., Bayesian) genotype calling to the detection of local adaptation signatures from genomic, transcriptomic, and epigenomic data. We report on progress in addressing central questions of conservation genomics, advances in NGS data analysis, the potential for genomic tools to assess adaptive capacity, and strategies for training the next generation of conservation genomicists.
Affiliation(s)
- Amanda Stahlke
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID
- Donavan Bell
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
- Tashi Dhendup
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
  - Department of Forest and Park Services, Ugyen Wangchuck Institute for Conservation and Environmental Research, Bumthang, Bhutan
- Brooke Kern
  - Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN
- Samuel Pannoni
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
  - Flathead Lake Biological Station, Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT
- Zachary Robinson
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
- Jeffrey Strait
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
- Seth Smith
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
  - Flathead Lake Biological Station, Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI
- Brian K Hand
  - Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT
  - Flathead Lake Biological Station, Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT
- Paul A Hohenlohe
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID
- Gordon Luikart
  - Wildlife Biology Program, College of Forestry and Conservation, University of Montana, Missoula, MT
  - Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT
  - Flathead Lake Biological Station, Division of Biological Sciences, College of Humanities and Sciences, University of Montana, Missoula, MT