1
|
Fonseca EM, Carstens BC. Artificial intelligence enables unified analysis of historical and landscape influences on genetic diversity. Mol Phylogenet Evol 2024; 198:108116. [PMID: 38871263 DOI: 10.1016/j.ympev.2024.108116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 04/04/2024] [Accepted: 06/04/2024] [Indexed: 06/15/2024]
Abstract
While genetic variation in any species is potentially shaped by a range of processes, phylogeography and landscape genetics are largely concerned with inferring how environmental conditions and landscape features impact neutral intraspecific diversity. However, even as both disciplines have come to utilize SNP data over the last decades, analytical approaches have remained for the most part focused on either broad-scale inferences of historical processes (phylogeography) or on more localized inferences about environmental and/or landscape features (landscape genetics). Here we demonstrate that an artificial intelligence model-based analytical framework can consider both deeper historical factors and landscape-level processes in an integrated analysis. We implement this framework using data collected from two Brazilian anurans, the Brazilian sibilator frog (Leptodactylus troglodytes) and granular toad (Rhinella granulosa). Our results indicate that historical demographic processes shape most of the genetic variation in the sibilator frog, while landscape processes primarily influence variation in the granular toad. The machine learning framework used here allows both historical and landscape processes to be considered equally, rather than requiring researchers to make an a priori decision about which factors are important.
Affiliation(s)
- Emanuel M Fonseca
- Museum of Biological Diversity & Department of Evolution, Ecology and Organismal Biology, The Ohio State University, 1315 Kinnear Rd., Columbus OH 43212, USA
| | - Bryan C Carstens
- Museum of Biological Diversity & Department of Evolution, Ecology and Organismal Biology, The Ohio State University, 1315 Kinnear Rd., Columbus OH 43212, USA.
| |
|
2
|
Gómez-Palacio A, Morinaga G, Turner PE, Micieli MV, Elnour MAB, Salim B, Surendran SN, Ramasamy R, Powell JR, Soghigian J, Gloria-Soria A. Robustness in population-structure and demographic-inference results derived from the Aedes aegypti genotyping chip and whole-genome sequencing data. G3 (BETHESDA, MD.) 2024; 14:jkae082. [PMID: 38626295 PMCID: PMC11152066 DOI: 10.1093/g3journal/jkae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 03/04/2024] [Accepted: 04/04/2024] [Indexed: 04/18/2024]
Abstract
The mosquito Aedes aegypti is the primary vector of many human arboviruses such as dengue, yellow fever, chikungunya, and Zika, which affect millions of people worldwide. Population genetic studies on this mosquito have been important in understanding its invasion pathways and success as a vector of human disease. The Axiom aegypti1 SNP chip was developed from a sample of geographically diverse A. aegypti populations to facilitate genomic studies on this species. We evaluate the utility of the Axiom aegypti1 SNP chip for population genetics and compare it with a low-depth shotgun sequencing approach using mosquitoes from the native (Africa) and invasive ranges (outside Africa). These analyses indicate that results from the SNP chip are highly reproducible and have a higher sensitivity to capture alternative alleles than a low-coverage whole-genome sequencing approach. Although the SNP chip suffers from ascertainment bias, results from population structure, ancestry, demographic, and phylogenetic analyses using the SNP chip were congruent with those derived from low-coverage whole-genome sequencing, and consistent with previous reports on Africa and outside Africa populations using microsatellites. More importantly, we identified a subset of SNPs that can be reliably used to generate merged databases, opening the door to combined analyses. We conclude that the Axiom aegypti1 SNP chip is a convenient, more accurate, low-cost alternative to low-depth whole-genome sequencing for population genetic studies of A. aegypti that do not rely on full allelic frequency spectra. Whole-genome sequencing and SNP chip data can be easily merged, extending the usefulness of both approaches.
Affiliation(s)
- Andrés Gómez-Palacio
- Department of Entomology, Center for Vector Biology & Zoonotic Diseases, The Connecticut Agricultural Experiment Station, 123 Huntington St., New Haven, CT 06511, USA
- Laboratorio de Investigación en Genética Evolutiva, Universidad Pedagógica y Tecnológica de Colombia, Avenida Central del Norte 39-115, Boyacá 150003, Colombia
| | - Gen Morinaga
- Faculty of Veterinary Medicine, University of Calgary, 2500 University Drive NW., Calgary, AB T2N 1N4, Canada
| | - Paul E Turner
- Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect St., New Haven, CT 06511, USA
- Quantitative Biology Institute, Yale University, 260 Whitney Ave., New Haven, CT 06511, USA
| | - Maria Victoria Micieli
- Centro de Estudios Parasitológicos y de Vectores (CEPAVE), CONICET, Universidad Nacional de la Plata, Boulevard 120 s/n between Av. 60 and Calle 64, La Plata 1900, Argentina
| | - Mohammed-Ahmed B Elnour
- Department of Parasitology and Medical Entomology, Tropical Medicine Research Institute, National Center for Research, Khartoum 11111, Sudan
| | - Bashir Salim
- Faculty of Veterinary Medicine, Department of Parasitology, University of Khartoum, Khartoum North 11111, Sudan
- Camel Research Center, King Faisal University, P.O. Box. 400, Al-Ahsa 31982, Saudi Arabia
| | | | - Ranjan Ramasamy
- Department of Zoology, University of Jaffna, Jaffna 40000, Sri Lanka
| | - Jeffrey R Powell
- Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect St., New Haven, CT 06511, USA
| | - John Soghigian
- Faculty of Veterinary Medicine, University of Calgary, 2500 University Drive NW., Calgary, AB T2N 1N4, Canada
| | - Andrea Gloria-Soria
- Department of Entomology, Center for Vector Biology & Zoonotic Diseases, The Connecticut Agricultural Experiment Station, 123 Huntington St., New Haven, CT 06511, USA
- Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect St., New Haven, CT 06511, USA
| |
|
3
|
Lee S, Choi T, Son D. Multiple introductions of divergent lineages and admixture conferred the high invasiveness in a widespread weed (Hypochaeris radicata). Evol Appl 2024; 17:e13740. [PMID: 38911265 PMCID: PMC11192970 DOI: 10.1111/eva.13740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 05/21/2024] [Accepted: 05/27/2024] [Indexed: 06/25/2024]
Abstract
Biological invasion consists of spatially and temporally varying stages, accompanied by ecological and evolutionary changes. Understanding the genomics underlying invasion dynamics provides critical insights into the geographic sources and genetic diversity, contributing to successful invasions across space and time. Here, we used genomic data and model-based approaches to characterize the invasion dynamics of Hypochaeris radicata L., a noxious weed in Korea. Genetic diversity and assignment patterns were investigated using 3563 SNPs of 283 individuals sampled from 22 populations. We employed a coalescent-based simulation method to estimate demographic changes for each population and inferred colonization history using both phylogenetic and population genetic model-based approaches. Our data suggest that H. radicata has repeatedly been introduced to Korea from multiple genetic sources within the last 50 years, experiencing weak population bottlenecks followed by subsequent population expansions. These findings highlight the potential for further range expansion, particularly in the presence of human-mediated dispersal. Our study represents the first population-level genomic research documenting the invasion dynamics of the successful worldwide invader, H. radicata, outside of Europe.
Affiliation(s)
- Soo‐Rang Lee
- Department of Biology Education, College of Education, Chosun University, Gwangju, South Korea
| | - Tae‐Young Choi
- Department of Biology Education, College of Education, Chosun University, Gwangju, South Korea
| | - Dong‐Chan Son
- Division of Forest Biodiversity and Herbarium, Korea National Arboretum, Pocheon, Korea
| |
|
4
|
Wei Z, Alam S, Verma M, Hilderbran M, Wu Y, Anderson B, Ho DE, Suckale J. Integrating water quality data with a Bayesian network model to improve spatial and temporal phosphorus attribution: Application to the Maumee River Basin. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 360:121120. [PMID: 38759558 DOI: 10.1016/j.jenvman.2024.121120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 04/22/2024] [Accepted: 05/07/2024] [Indexed: 05/19/2024]
Abstract
Surface water nutrient pollution, the primary cause of eutrophication, remains a major environmental concern in Western Lake Erie despite intergovernmental efforts to regulate nutrient sources. The Maumee River Basin has been the largest nutrient contributor. The two primary nutrient sources are inorganic fertilizer and livestock manure applied to croplands, which are later carried to the streams via runoff and soil erosion. Prior studies of nutrient source attribution have focused on large watersheds or counties at annual time scales. Source attribution at finer spatiotemporal scales, which enables more effective nutrient management, remains a substantial challenge. This study aims to address this challenge by developing a generalizable Bayesian network model for phosphorus source attribution at the subwatershed scale (12-digit Hydrologic Unit Code). Since phosphorus release is uncertain, we combine excess phosphorus derived from manure and fertilizer application and crop uptake data, flow information simulated by the SWAT model, and in-stream water quality measurements using Approximate Bayesian Computation to derive a posterior that attributes phosphorus contributions to subwatersheds. Our results show significant variability in subwatershed-scale phosphorus release that is lost in coarse-scale attribution. Phosphorus contributions attributed to the subwatersheds are on average lower than the excess phosphorus estimated by the nutrient balance approach currently adopted by environmental agencies. Fertilizer contributes more soluble reactive phosphorus than manure, while manure contributes most of the unreactive phosphorus. While developed for the specific context of Maumee River Basin, our lightweight and generalizable model framework could be adapted to other regions and pollutants and could help inform targeted environmental regulation and enforcement.
Affiliation(s)
- Zihan Wei
- Department of Geophysics, Stanford University, Stanford, 94305, CA, USA.
| | - Sarfaraz Alam
- Department of Geophysics, Stanford University, Stanford, 94305, CA, USA; Regulation, Evaluation, and Governance Lab, Stanford University, Stanford, 94305, CA, USA.
| | - Miki Verma
- Symbolic Systems Program, Stanford University, Stanford, 94305, CA, USA.
| | - Margaret Hilderbran
- Regulation, Evaluation, and Governance Lab, Stanford University, Stanford, 94305, CA, USA.
| | - Yuchen Wu
- Department of Statistics, Stanford University, Stanford, 94305, CA, USA.
| | - Brandon Anderson
- Regulation, Evaluation, and Governance Lab, Stanford University, Stanford, 94305, CA, USA.
| | - Daniel E Ho
- Regulation, Evaluation, and Governance Lab, Stanford University, Stanford, 94305, CA, USA.
| | - Jenny Suckale
- Department of Geophysics, Stanford University, Stanford, 94305, CA, USA.
| |
|
5
|
Chinazzi M, Davis JT, Y Piontti AP, Mu K, Gozzi N, Ajelli M, Perra N, Vespignani A. A multiscale modeling framework for Scenario Modeling: Characterizing the heterogeneity of the COVID-19 epidemic in the US. Epidemics 2024; 47:100757. [PMID: 38493708 DOI: 10.1016/j.epidem.2024.100757] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Revised: 01/22/2024] [Accepted: 02/26/2024] [Indexed: 03/19/2024]
Abstract
The Scenario Modeling Hub (SMH) initiative provides projections of potential epidemic scenarios in the United States (US) by using a multi-model approach. Our contribution to the SMH is generated by a multiscale model that combines the global epidemic metapopulation modeling approach (GLEAM) with a local epidemic and mobility model of the US (LEAM-US), first introduced here. The LEAM-US model consists of 3142 subpopulations each representing a single county across the 50 US states and the District of Columbia, enabling us to project state and national trajectories of COVID-19 cases, hospitalizations, and deaths under different epidemic scenarios. The model is age-structured, and multi-strain. It integrates data on vaccine administration, human mobility, and non-pharmaceutical interventions. The model contributed to all 17 rounds of the SMH, and allows for the mechanistic characterization of the spatio-temporal heterogeneities observed during the COVID-19 pandemic. Here we describe the mathematical and computational structure of our model, and present the results concerning the emergence of the SARS-CoV-2 Alpha variant (lineage designation B.1.1.7) as a case study. Our findings show considerable spatial and temporal heterogeneity in the introduction and diffusion of the Alpha variant, both at the level of individual states and combined statistical areas, as it competes against the ancestral lineage. We discuss the key factors driving the time required for the Alpha variant to rise to dominance within a population, and quantify the impact that the emergence of the Alpha variant had on the effective reproduction number at the state level. Overall, we show that our multiscale modeling approach is able to capture the complexity and heterogeneity of the COVID-19 pandemic response in the US.
Affiliation(s)
- Matteo Chinazzi
- The Roux Institute, Northeastern University, Portland, ME, USA; Laboratory for the Modeling of Biological and Socio-technical Systems, Network Science Institute, Northeastern University, Boston, MA, USA
| | - Jessica T Davis
- Laboratory for the Modeling of Biological and Socio-technical Systems, Network Science Institute, Northeastern University, Boston, MA, USA
| | - Ana Pastore Y Piontti
- Laboratory for the Modeling of Biological and Socio-technical Systems, Network Science Institute, Northeastern University, Boston, MA, USA
| | - Kunpeng Mu
- Laboratory for the Modeling of Biological and Socio-technical Systems, Network Science Institute, Northeastern University, Boston, MA, USA
| | - Nicolò Gozzi
- Institute for Scientific Interchange Foundation, Turin, Italy
| | - Marco Ajelli
- Laboratory for Computational Epidemiology and Public Health, Department of Epidemiology and Biostatistics, Indiana University School of Public Health, Bloomington, IN, USA
| | - Nicola Perra
- Laboratory for the Modeling of Biological and Socio-technical Systems, Network Science Institute, Northeastern University, Boston, MA, USA; School of Mathematical Sciences, Queen Mary University, London, UK
| | - Alessandro Vespignani
- Laboratory for the Modeling of Biological and Socio-technical Systems, Network Science Institute, Northeastern University, Boston, MA, USA; Institute for Scientific Interchange Foundation, Turin, Italy.
| |
|
6
|
Daigle A, Johri P. Hill-Robertson interference may bias the inference of fitness effects of new mutations in highly selfing species. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.06.579142. [PMID: 38370745 PMCID: PMC10871249 DOI: 10.1101/2024.02.06.579142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
The accurate estimation of the distribution of fitness effects (DFE) of new mutations is critical for population genetic inference but remains a challenging task. While various methods have been developed for DFE inference using the site frequency spectrum of putatively neutral and selected sites, their applicability in species with diverse life history traits and complex demographic scenarios is not well understood. Selfing is common among eukaryotic species and can lead to decreased effective recombination rates, increasing the effects of selection at linked sites, including interference between selected alleles. We employ forward simulations to investigate the limitations of current DFE estimation approaches in the presence of selfing and other model violations, such as linkage, departures from semidominance, population structure, and uneven sampling. We find that distortions of the site frequency spectrum due to Hill-Robertson interference in highly selfing populations lead to mis-inference of the deleterious DFE of new mutations. Specifically, while accounting for the decrease in the effective population size due to linked effects of selection allows an accurate estimation of selection coefficients in moderately selfing populations, this correction is unable to accurately estimate selection coefficients in highly selfing populations when interference between selected alleles is pervasive. In addition, the presence of cryptic population structure with low rates of migration and uneven sampling across subpopulations leads to the false inference of a deleterious DFE skewed towards effectively neutral/mildly deleterious mutations. Finally, the proportion of adaptive substitutions estimated at high rates of selfing is substantially overestimated. Our observations apply broadly to species and genomic regions with little/no recombination and where interference might be pervasive.
|
7
|
Barton DL, Chang YR, Ducker W, Dobnikar J. Data-driven modelling makes quantitative predictions regarding bacteria surface motility. PLoS Comput Biol 2024; 20:e1012063. [PMID: 38743804 PMCID: PMC11125545 DOI: 10.1371/journal.pcbi.1012063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/24/2024] [Accepted: 04/09/2024] [Indexed: 05/16/2024]
Abstract
In this work, we quantitatively compare computer simulations and existing cell tracking data of P. aeruginosa surface motility in order to analyse the underlying motility mechanism. We present a three-dimensional twitching motility model that simulates the extension, retraction and surface association of individual Type IV Pili (TFP), and is informed by recent experimental observations of TFP. Sensitivity analysis is implemented to minimise the number of model parameters, and quantitative estimates for the remaining parameters are inferred from tracking data by approximate Bayesian computation. We argue that the motility mechanism is highly sensitive to experimental conditions. We predict a TFP retraction speed for the tracking data we study that is in good agreement with experimental results obtained under very similar conditions. Furthermore, we examine whether estimates for biologically important parameters, whose direct experimental determination is challenging, can be inferred directly from tracking data. One example is the width of the distribution of TFP on the bacteria body. We predict that the TFP are broadly distributed over the bacteria pole in both walking and crawling motility types. Moreover, we identified specific configurations of TFP that lead to transitions between walking and crawling states.
Affiliation(s)
- Daniel L. Barton
- CAS Key Laboratory of Soft Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, China
| | - Yow-Ren Chang
- National Institute of Standards and Technology (NIST), 100 Bureau Dr, Gaithersburg, Maryland, United States of America
| | - William Ducker
- Department of Chemical Engineering and Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Jure Dobnikar
- CAS Key Laboratory of Soft Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, China
- Wenzhou Institute of the University of Chinese Academy of Sciences, Wenzhou, China
- School of Physical Sciences, University of Chinese Academy of Sciences, Beijing, China
| |
|
8
|
Mwima R, Hui TYJ, Kayondo JK, Burt A. The population genetics of partial diapause, with applications to the aestivating malaria mosquito Anopheles coluzzii. Mol Ecol Resour 2024; 24:e13949. [PMID: 38511493 DOI: 10.1111/1755-0998.13949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 02/27/2024] [Accepted: 03/08/2024] [Indexed: 03/22/2024]
Abstract
Diapause, a form of dormancy to delay or halt the reproductive development during unfavourable seasons, has evolved in many insect species. One example is aestivation, an adult-stage diapause enhancing malaria vectors' survival during the dry season (DS) and their re-establishment in the next rainy season (RS). This work develops a novel genetic approach to estimate the number or proportion of individuals undergoing diapause, as well as the breeding sizes of the two seasons, using signals from temporal allele frequency dynamics. Our modelling shows the magnitude of drift is dampened at early RS when previously aestivating individuals reappear. Aestivation severely biases the temporal effective population size ($N_e$), leading to overestimation of the DS breeding size by $1/(1-\alpha)^2$ across 1 year, where $\alpha$ is the aestivating proportion. We find sampling breeding individuals in three consecutive seasons starting from an RS is sufficient for parameter estimation, and perform extensive simulations to verify our derivations. This method does not require sampling individuals in the dormant state, the biggest challenge in most studies. We illustrate the method by applying it to a published data set for Anopheles coluzzii mosquitoes from Thierola, Mali. Our method and the expected evolutionary implications are applicable to any species in which a fraction of the population diapauses for more than one generation, and are difficult or impossible to sample during that stage.
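To give a feel for the size of the bias stated in the abstract, the sketch below evaluates the overestimation factor 1/(1 - α)^2 for a few aestivating proportions. It is plain arithmetic on the formula quoted above; the function name and the example values of α are illustrative assumptions and do not come from the paper.

```python
def overestimation_factor(alpha: float) -> float:
    """Return 1 / (1 - alpha)^2, the factor by which the DS breeding size
    is overestimated when a proportion `alpha` of individuals aestivates."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return 1.0 / (1.0 - alpha) ** 2


# Illustrative aestivating proportions (assumed values, not from the paper).
for alpha in (0.0, 0.1, 0.5, 0.9):
    print(f"alpha = {alpha:.1f} -> factor = {overestimation_factor(alpha):.2f}")
```

For instance, if half of the population aestivates (α = 0.5), the factor is 1/0.25 = 4, so the bias grows quickly as α approaches 1.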
Affiliation(s)
- Rita Mwima
- Department of Entomology, Uganda Virus Research Institute (UVRI), Entebbe, Uganda
- Department of Biotechnical and Diagnostic Sciences, College of Veterinary Medicine, Animal Resources and Biosecurity (COVAB), Makerere University, Kampala, Uganda
| | - Tin-Yu J Hui
- Department of Life Sciences, Imperial College London, Ascot, UK
| | - Jonathan K Kayondo
- Department of Entomology, Uganda Virus Research Institute (UVRI), Entebbe, Uganda
| | - Austin Burt
- Department of Life Sciences, Imperial College London, Ascot, UK
| |
|
9
|
Saubin M, Tellier A, Stoeckel S, Andrieux A, Halkett F. Approximate Bayesian Computation applied to time series of population genetic data disentangles rapid genetic changes and demographic variations in a pathogen population. Mol Ecol 2024; 33:e16965. [PMID: 37150947 DOI: 10.1111/mec.16965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 04/04/2023] [Accepted: 04/12/2023] [Indexed: 05/09/2023]
Abstract
Adaptation can occur at remarkably short timescales in natural populations, leading to drastic changes in phenotypes and genotype frequencies over a few generations only. The inference of demographic parameters can allow understanding how evolutionary forces interact and shape the genetic trajectories of populations during rapid adaptation. Here we propose a new Approximate Bayesian Computation (ABC) framework that couples a forward and individual-based model with temporal genetic data to disentangle genetic changes and demographic variations in a case of rapid adaptation. We test the accuracy of our inferential framework and evaluate the benefit of considering a dense versus sparse sampling. Theoretical investigations demonstrate high accuracy in both model and parameter estimations, even if a strong thinning is applied to time series data. Then, we apply our ABC inferential framework to empirical data describing the population genetic changes of the poplar rust pathogen following a major event of resistance overcoming. We successfully estimate key demographic and genetic parameters, including the proportion of resistant hosts deployed in the landscape and the level of standing genetic variation from which selection occurred. Inferred values are in accordance with our empirical knowledge of this biological system. This new inferential framework, which contrasts with coalescent-based ABC analyses, is promising for a better understanding of evolutionary trajectories of populations subjected to rapid adaptation.
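For readers unfamiliar with ABC, the basic accept/reject idea can be sketched as follows. This is a generic rejection sampler on a toy binomial simulator, not the forward individual-based model or the inference pipeline used in the study; the function names, the uniform prior, the tolerance, and all numeric values are hypothetical choices made for illustration only.

```python
import random


def simulate(p: float, n_trials: int, rng: random.Random) -> int:
    """Toy forward simulator: number of successes in n_trials Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n_trials))


def abc_rejection(observed: int, n_trials: int, n_draws: int,
                  tolerance: int, seed: int = 0) -> list[float]:
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                       # draw from a Uniform(0, 1) prior
        simulated = simulate(p, n_trials, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)                 # accepted draws approximate the posterior
    return accepted


# Hypothetical data: 30 successes observed in 100 trials.
posterior = abc_rejection(observed=30, n_trials=100, n_draws=20_000, tolerance=2)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean ~ {posterior_mean:.3f}")
```

The accepted parameter values form an approximate posterior sample. Tightening the tolerance improves the approximation at the cost of fewer accepted draws, which is one reason more elaborate schemes (for example sequential variants) are often preferred in practice.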
Affiliation(s)
- Méline Saubin
- Université de Lorraine, INRAE, IAM, Nancy, France
- Department for Life Science Systems, Technical University of Munich, Freising, Germany
| | - Aurélien Tellier
- Department for Life Science Systems, Technical University of Munich, Freising, Germany
| | - Solenn Stoeckel
- INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
| | | | | |
|
10
|
Smiley O, Hoffmann T, Onnela JP. Approximate inference for longitudinal mechanistic HIV contact network. APPLIED NETWORK SCIENCE 2024; 9:12. [PMID: 38699247 PMCID: PMC11060975 DOI: 10.1007/s41109-024-00616-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 04/06/2024] [Indexed: 05/05/2024]
Abstract
Network models are increasingly used to study infectious disease spread. Exponential Random Graph models have a history in this area, with scalable inference methods now available. An alternative approach uses mechanistic network models. Mechanistic network models directly capture individual behaviors, making them suitable for studying sexually transmitted diseases. Combining mechanistic models with Approximate Bayesian Computation allows flexible modeling using domain-specific interaction rules among agents, avoiding network model oversimplifications. These models are ideal for longitudinal settings as they explicitly incorporate network evolution over time. We implemented a discrete-time version of a previously published continuous-time model of evolving contact networks for men who have sex with men and proposed an ABC-based approximate inference scheme for it. As expected, we found that a two-wave longitudinal study design improves the accuracy of inference compared to a cross-sectional design. However, the gains in precision in collecting data twice, up to 18%, depend on the spacing of the two waves and are sensitive to the choice of summary statistics. In addition to methodological developments, our results inform the design of future longitudinal network studies in sexually transmitted diseases, specifically in terms of what data to collect from participants and when to do so.
Affiliation(s)
- Octavious Smiley
- Biostatistics, Harvard University, 677 Huntington Ave, Boston, MA 02115 USA
| | - Till Hoffmann
- Biostatistics, Harvard University, 677 Huntington Ave, Boston, MA 02115 USA
| | - Jukka-Pekka Onnela
- Biostatistics, Harvard University, 677 Huntington Ave, Boston, MA 02115 USA
| |
|
11
|
Zuckerman DM, George A. Bayesian Mechanistic Inference, Statistical Mechanics, and a New Era for Monte Carlo. J Chem Theory Comput 2024; 20:2971-2984. [PMID: 38603773 PMCID: PMC11089648 DOI: 10.1021/acs.jctc.4c00014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
On the one hand, much of computational chemistry is concerned with "bottom-up" calculations which elucidate observable behavior starting from exact or approximated physical laws, a paradigm exemplified by typical quantum mechanical calculations and molecular dynamics simulations. On the other hand, "top down" computations aiming to formulate mathematical models consistent with observed data, e.g., parametrizing force fields, binding or kinetic models, have been of interest for decades but recently have grown in sophistication with the use of Bayesian inference (BI). Standard BI provides an estimation of parameter values, uncertainties, and correlations among parameters. Used for "model selection," BI can also distinguish between model structures such as the presence or absence of individual states and transitions. Fortunately for physical scientists, BI can be formulated within a statistical mechanics framework, and indeed, BI has led to a resurgence of interest in Monte Carlo (MC) algorithms, many of which have been directly adapted from or inspired by physical strategies. Certain MC algorithms, notably procedures using an "infinite temperature" reference state, can be successful in a 5-20 parameter BI context which would be unworkable in molecular spaces of 10^3 coordinates and more. This Review provides a pedagogical introduction to BI and reviews key aspects of BI through a physical lens, setting the computations in terms of energy landscapes and free energy calculations and describing promising sampling algorithms. Statistical mechanics and basic probability theory also provide a reference for understanding intrinsic limitations of Bayesian inference with regard to model selection and the choice of priors.
Affiliation(s)
- Daniel M Zuckerman
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon 97239, United States
| | - August George
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon 97239, United States
| |
|
12
|
Dabi A, Schrider DR. Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.07.588318. [PMID: 38645049 PMCID: PMC11030438 DOI: 10.1101/2024.04.07.588318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, fixation probabilities, allele frequencies, and linkage disequilibrium) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward; thus, it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q.
In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling effect's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
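As a rough illustration of the rescaling scheme this abstract describes (population size divided by Q; mutation rate, recombination rate, and selection coefficient multiplied by Q), the Python sketch below applies it to one parameter set. The `SimParams` container, its field names, and the `rescale` helper are hypothetical names chosen for this example; they are not taken from the preprint or from any simulator, and the numeric values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for forward-simulation parameters."""
    pop_size: int                # number of individuals, N
    mutation_rate: float         # per-site, per-generation rate
    recombination_rate: float    # per-site, per-generation rate
    selection_coeff: float       # selection coefficient, s


def rescale(params: SimParams, q: float) -> SimParams:
    """Divide N by q and multiply the rates and s by q."""
    if q < 1:
        raise ValueError("rescaling factor q must be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coeff=params.selection_coeff * q,
    )


# Arbitrary example values, rescaled with Q = 10.
unscaled = SimParams(pop_size=10_000, mutation_rate=1e-8,
                     recombination_rate=1e-8, selection_coeff=0.001)
scaled = rescale(unscaled, q=10)
print(scaled)
```

The transformation leaves the products of population size with each rate unchanged, which is the usual motivation for rescaling; the abstract reports that simulation outcomes can nonetheless be biased, so a sketch like this only sets up the parameters and says nothing about the size of that bias.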
Affiliation(s)
- Amjad Dabi
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Daniel R. Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
|
13
|
Twumasi C, Cable J, Pepelyshev A. Mathematical Modelling of Parasite Dynamics: A Stochastic Simulation-Based Approach and Parameter Estimation via Modified Sequential-Type Approximate Bayesian Computation. Bull Math Biol 2024; 86:54. [PMID: 38598133 PMCID: PMC11006762 DOI: 10.1007/s11538-024-01281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/12/2024] [Indexed: 04/11/2024]
Abstract
The development of mathematical models for studying newly emerging and re-emerging infectious diseases has gained momentum due to global events. The gyrodactylid-fish system, like many host-parasite systems, serves as a valuable resource for ecological, evolutionary, and epidemiological investigations owing to its ease of experimental manipulation and long-term monitoring. Although this system has an existing individual-based model, it falls short in capturing information about species-specific microhabitat preferences and other biological details for different Gyrodactylus strains across diverse fish populations. The current study introduces a new individual-based stochastic simulation model that uses a hybrid τ-leaping algorithm to incorporate this essential data, enhancing our understanding of the complexity of the gyrodactylid-fish system. We compare the infection dynamics of three gyrodactylid strains across three host populations. A modified sequential-type approximate Bayesian computation (ABC) method, based on sequential Monte Carlo and sequential importance sampling, is developed. Additionally, we establish two penalised local-linear regression methods (based on L1 and L2 regularisations) for ABC post-processing analysis to fit our model using existing empirical data. With the support of experimental data and the fitted mathematical model, we address open biological questions for the first time and propose directions for future studies on the gyrodactylid-fish system. The adaptability of the mathematical model extends beyond the gyrodactylid-fish system to other host-parasite systems. Furthermore, the modified ABC methodologies provide efficient calibration for other multi-parameter models characterised by a large set of correlated or independent summary statistics.
Affiliation(s)
- Clement Twumasi
- Nuffield Department of Medicine, University of Oxford, South Parks Road, Oxford, Oxfordshire, OX1 3SY, UK.
- School of Public Health, Imperial College London, 68 Wood Lane, London, Greater London, W12 7RH, UK.
- School of Mathematics, Cardiff University, Senghennydd Road, Cardiff, South Glamorgan, CF24 4AG, UK.
- School of Biosciences, Cardiff University, Sir Martin Evans Building, Cardiff, South Glamorgan, CF10 3AX, UK.
- Joanne Cable
- School of Biosciences, Cardiff University, Sir Martin Evans Building, Cardiff, South Glamorgan, CF10 3AX, UK
- Andrey Pepelyshev
- School of Mathematics, Cardiff University, Senghennydd Road, Cardiff, South Glamorgan, CF24 4AG, UK.
|
14
|
Nguyen KC, Jameson CD, Baldwin SA, Nardini JT, Smith RC, Haugh JM, Flores KB. Quantifying collective motion patterns in mesenchymal cell populations using topological data analysis and agent-based modeling. Math Biosci 2024; 370:109158. [PMID: 38373479 DOI: 10.1016/j.mbs.2024.109158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 02/06/2024] [Accepted: 02/11/2024] [Indexed: 02/21/2024]
Abstract
Fibroblasts in a confluent monolayer are known to adopt elongated morphologies in which cells are oriented parallel to their neighbors. We collected and analyzed new microscopy movies to show that confluent fibroblasts are motile and that neighboring cells often move in anti-parallel directions in a collective motion phenomenon we refer to as "fluidization" of the cell population. We used machine learning to perform cell tracking for each movie and then leveraged topological data analysis (TDA) to show that time-varying point-clouds generated by the tracks contain significant topological information content that is driven by fluidization, i.e., the anti-parallel movement of individual neighboring cells and neighboring groups of cells over long distances. We then utilized the TDA summaries extracted from each movie to perform Bayesian parameter estimation for the D'Orsogna model, an agent-based model (ABM) known to produce a wide array of different patterns, including patterns that are qualitatively similar to fluidization. Although the D'Orsogna ABM is a phenomenological model that only describes inter-cellular attraction and repulsion, the estimated region of D'Orsogna model parameter space was consistent across all movies, suggesting that a specific level of inter-cellular repulsion force at close range may be a mechanism that helps drive fluidization patterns in confluent mesenchymal cell populations.
Affiliation(s)
- Kyle C Nguyen
- Biomathematics Graduate Program, North Carolina State University, Raleigh, NC 27607, USA; Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27607, USA.
- Scott A Baldwin
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695, USA
- John T Nardini
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ 08628, USA
- Ralph C Smith
- Department of Mathematics, North Carolina State University, Raleigh, NC 27607, USA
- Jason M Haugh
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695, USA
- Kevin B Flores
- Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC 27607, USA; Department of Mathematics, North Carolina State University, Raleigh, NC 27607, USA
|
15
|
Dinnage R, Sarre SD, Duncan RP, Dickman CR, Edwards SV, Greenville AC, Wardle GM, Gruber B. slimr: An R package for tailor-made integrations of data in population genomic simulations over space and time. Mol Ecol Resour 2024; 24:e13916. [PMID: 38124500 DOI: 10.1111/1755-0998.13916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 11/20/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
Software for realistically simulating complex population genomic processes is revolutionizing our understanding of evolutionary processes, and providing novel opportunities for integrating empirical data with simulations. However, the integration between standalone simulation software and R is currently not well developed. Here, we present slimr, an R package designed to create a seamless link between standalone software SLiM >3.0, one of the most powerful population genomic simulation frameworks, and the R development environment, with its powerful data manipulation and analysis tools. We show how slimr facilitates smooth integration between genetic data, ecological data and simulation in a single environment. The package enables pipelines that begin with data reading, cleaning and manipulation, proceed to constructing empirically based parameters and initial conditions for simulations, then to running numerical simulations and finally to retrieving simulation results in a format suitable for comparisons with empirical data - aided by advanced analysis and visualization tools provided by R. We demonstrate the use of slimr with an example from our own work on the landscape population genomics of desert mammals, highlighting the advantage of having a single integrated tool for both data analysis and simulation. slimr makes the powerful simulation ability of SLiM directly accessible to R users, allowing integrated simulation projects that incorporate empirical data without the need to switch between software environments. This should provide more opportunities for evolutionary biologists and ecologists to use realistic simulations to better understand the interplay between ecological and evolutionary processes.
Affiliation(s)
- Russell Dinnage
- Institute of Environment, Department of Biological Sciences, Florida International University, Miami, Florida, USA
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
- Stephen D Sarre
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
- Richard P Duncan
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
- Christopher R Dickman
- Desert Ecology Research Group, School of Life and Environmental Sciences, University of Sydney, Camperdown, New South Wales, Australia
- Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA
- Aaron C Greenville
- Desert Ecology Research Group, School of Life and Environmental Sciences, University of Sydney, Camperdown, New South Wales, Australia
- Glenda M Wardle
- Desert Ecology Research Group, School of Life and Environmental Sciences, University of Sydney, Camperdown, New South Wales, Australia
- Bernd Gruber
- Centre for Conservation Ecology and Genomics, Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
|
16
|
Zeng ZH, Zhong L, Sun HY, Wu ZK, Wang X, Wang H, Li DZ, Barrett SCH, Zhou W. Parallel evolution of morphological and genomic selfing syndromes accompany the breakdown of heterostyly. THE NEW PHYTOLOGIST 2024; 242:302-316. [PMID: 38214455 DOI: 10.1111/nph.19522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 12/18/2023] [Indexed: 01/13/2024]
Abstract
Evolutionary transitions from outcrossing to selfing in flowering plants have convergent morphological and genomic signatures and can involve parallel evolution within related lineages. Adaptive evolution of morphological traits is often assumed to evolve faster than nonadaptive features of the genomic selfing syndrome. We investigated phenotypic and genomic changes associated with transitions from distyly to homostyly in the Primula oreodoxa complex. We determined whether the transition to selfing occurred more than once and investigated stages in the evolution of morphological and genomic selfing syndromes using 22 floral traits and both nuclear and plastid genomic data from 25 populations. Two independent transitions were detected representing an earlier and a more recently derived selfing lineage. The older lineage exhibited classic features of the morphological and genomic selfing syndrome. Although features of both selfing syndromes were less developed in the younger selfing lineage, they exhibited parallel development with the older selfing lineage. This finding contrasts with the prediction that some genomic changes should lag behind adaptive changes to morphological traits. Our findings highlight the value of comparative studies on the timing and extent of transitions from outcrossing to selfing between related lineages for investigating the tempo of morphological and molecular evolution.
Affiliation(s)
- Zhi-Hua Zeng
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Li Zhong
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Hua-Ying Sun
- School of Chinese Materia Medica, Yunnan University of Chinese Medicine, Kunming, Yunnan, 650500, China
- Zhi-Kun Wu
- Department of Pharmacy, Guizhou University of Traditional Chinese Medicine, Guiyang, Guizhou, 550002, China
- Xin Wang
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Hong Wang
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- De-Zhu Li
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Spencer C H Barrett
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Wei Zhou
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Lijiang Forest Biodiversity National Observation and Research Station, Kunming Institute of Botany, Chinese Academy of Sciences, Lijiang, Yunnan, 674100, China
|
17
|
Wang MH, Onnela JP. Flexible Bayesian inference on partially observed epidemics. JOURNAL OF COMPLEX NETWORKS 2024; 12:cnae017. [PMID: 38533184 PMCID: PMC10962317 DOI: 10.1093/comnet/cnae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 03/02/2024] [Indexed: 03/28/2024]
Abstract
Individual-based models of contagious processes are useful for predicting epidemic trajectories and informing intervention strategies. In such models, the incorporation of contact network information can capture the non-randomness and heterogeneity of realistic contact dynamics. In this article, we consider Bayesian inference on the spreading parameters of an SIR contagion on a known, static network, where information regarding individual disease status is known only from a series of tests (positive or negative disease status). When the contagion model is complex or information such as infection and removal times is missing, the posterior distribution can be difficult to sample from. Previous work has considered the use of Approximate Bayesian Computation (ABC), which allows for simulation-based Bayesian inference on complex models. However, ABC methods usually require the user to select reasonable summary statistics. Here, we consider an inference scheme based on the Mixture Density Network compressed ABC, which minimizes the expected posterior entropy in order to learn informative summary statistics. This allows us to conduct Bayesian inference on the parameters of a partially observed contagious process while also circumventing the need for manual summary statistic selection. This methodology can be extended to incorporate additional simulation complexities, including behavioural change after positive tests or false test results.
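For orientation, the Python sketch below shows plain rejection ABC with a hand-chosen summary statistic, which is the manual step the abstract says its approach avoids. It is not the Mixture Density Network compressed ABC scheme of the article, and it does not model an SIR contagion on a network: the toy model (a normal distribution with unknown mean), the uniform prior, the tolerance, and the helper name `abc_rejection` are all assumptions made for this illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_sample,
                  tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within `tolerance`
    of the observed summary (basic rejection ABC)."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                 # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))     # simulate data, then summarize
        if abs(s_sim - s_obs) <= tolerance:       # accept if summaries are close
            accepted.append(theta)
    return accepted


# Toy problem (assumed): infer the mean of a normal with known sd = 1.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,                     # hand-picked summary statistic
    prior_sample=lambda r: r.uniform(-5.0, 5.0),  # flat prior on the mean
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

Here the sample mean happens to be a good summary for the toy model; for models where no such statistic is obvious, the choice made on the `summary=` line is exactly what learned-summary approaches aim to automate.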
Affiliation(s)
- Maxwell H Wang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Jukka-Pekka Onnela
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
|
18
|
Dickson ZW, Golding GB. Evolution of Transcript Abundance is Influenced by Indels in Protein Low Complexity Regions. J Mol Evol 2024; 92:153-168. [PMID: 38485789 DOI: 10.1007/s00239-024-10158-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 01/24/2024] [Indexed: 04/02/2024]
Abstract
Protein low complexity regions (LCRs) are compositionally biased amino acid sequences, many of which have significant evolutionary impacts on the proteins which contain them. They are mutationally unstable, experiencing higher rates of indels and substitutions than higher complexity regions. LCRs also impact the expression of their proteins, likely through multiple effects along the path from gene transcription, through translation, and eventual protein degradation. It has been observed that proteins which contain LCRs are associated with elevated transcript abundance (TAb), despite having lower protein abundance. We have gathered and integrated human data to investigate the co-evolution of TAb and LCRs through ancestral reconstructions and model inference using an approximate Bayesian calculation based method. We observe that on short evolutionary timescales TAb evolution is significantly impacted by changes in LCR length, with insertions driving TAb down. But in contrast, the observed data is best explained by indel rates in LCRs which are unaffected by shifts in TAb. Our work demonstrates a coupling between LCR and TAb evolution, and the utility of incorporating multiple responses into evolutionary analyses.
Affiliation(s)
- G Brian Golding
- Department of Biology, McMaster University, Hamilton, ON, Canada
|
19
|
Huang Z, Kelleher J, Chan YB, Balding DJ. Estimating evolutionary and demographic parameters via ARG-derived IBD. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.07.583855. [PMID: 38559261 PMCID: PMC10979897 DOI: 10.1101/2024.03.07.583855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to a previously-published estimator despite a 4000-fold reduction in data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- Jerome Kelleher
- Oxford Big Data Institute, University of Oxford, United Kingdom
- Yao-ban Chan
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- David J. Balding
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
|
20
|
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS: We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
|
21
|
Biello R, Ghirotto S, Schmidt DJ, Fuselli S, Roberts DT, Espinoza T, Hughes JM, Bertorelle G. Unravelling the mystery of endemic versus translocated populations of the endangered Australian lungfish (Neoceratodus forsteri). Mol Ecol 2024; 33:e17266. [PMID: 38240411 DOI: 10.1111/mec.17266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 12/24/2023] [Accepted: 01/04/2024] [Indexed: 02/22/2024]
Abstract
The Australian lungfish is a primitive and endangered representative of the subclass Dipnoi. The distribution of this species is limited to south-east Queensland, with some populations considered endemic and others possibly descending from translocations in the late nineteenth century shortly after European discovery. Attempts to resolve the historical distribution of this species have met with conflicting results based on descriptive genetic studies. Understanding if all populations are endemic or some are the result of, or influenced by, translocation events, has implications for conservation management. In this work, we analysed the genetic variation at three types of markers (mtDNA genomes, 11 STRs and 5196 nuclear SNPs) using the approximate Bayesian computation (ABC) algorithm to compare several demographic models. We postulated different contributions of Mary River and Burnett River gene pools into the Brisbane River and North Pine River populations, related to documented translocation events. We ran the analysis for each marker type separately, and we also estimated the posterior probabilities of the models combining the markers. Nuclear SNPs have the highest power to correctly identify the true model among the simulated datasets (where the model was known), but different marker types typically provided similar answers. The most supported demographic model able to explain the real dataset implies that an endemic gene pool is still present in the Brisbane and North Pine Rivers and coexists with the gene pools derived from past documented translocation events. These results support the view that ABC modelling can be useful to reconstruct complex historical translocation events with contemporary implications, and will inform ongoing conservation efforts for the endangered and iconic Australian lungfish.
Affiliation(s)
- Roberto Biello
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Department of Crop Genetics, John Innes Centre, Norwich, UK
- Silvia Ghirotto
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Daniel J Schmidt
- Australian Rivers Institute, Griffith University, Brisbane, Queensland, Australia
- Silvia Fuselli
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Tom Espinoza
- Burnett Mary Regional Group, Bargara, Queensland, Australia
- Jane M Hughes
- Australian Rivers Institute, Griffith University, Brisbane, Queensland, Australia
- Giorgio Bertorelle
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
|
22
|
Vollert SA, Drovandi C, Adams MP. Unlocking ensemble ecosystem modelling for large and complex networks. PLoS Comput Biol 2024; 20:e1011976. [PMID: 38483981 DOI: 10.1371/journal.pcbi.1011976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 03/26/2024] [Accepted: 03/07/2024] [Indexed: 03/27/2024] Open
Abstract
The potential effects of conservation actions on threatened species can be predicted using ensemble ecosystem models by forecasting populations with and without intervention. These model ensembles commonly assume stable coexistence of species in the absence of available data. However, existing ensemble-generation methods become computationally inefficient as the size of the ecosystem network increases, preventing larger networks from being studied. We present a novel sequential Monte Carlo sampling approach for ensemble generation that is orders of magnitude faster than existing approaches. We demonstrate that the methods produce equivalent parameter inferences, model predictions, and tightly constrained parameter combinations using a novel sensitivity analysis method. For one case study, we demonstrate a speed-up from 108 days to 6 hours, while maintaining equivalent ensembles. Additionally, we demonstrate how to identify the parameter combinations that strongly drive feasibility and stability, drawing ecological insight from the ensembles. Now, for the first time, larger and more realistic networks can be practically simulated and analysed.
Affiliation(s)
- Sarah A Vollert
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Christopher Drovandi
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Matthew P Adams
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Chemical Engineering, The University of Queensland, St Lucia, Australia
|
23
|
Lukaszewicz M, Salia OI, Hohenlohe PA, Buzbas EO. Approximate Bayesian computational methods to estimate the strength of divergent selection in population genomics models. JOURNAL OF COMPUTATIONAL MATHEMATICS AND DATA SCIENCE 2024; 10:100091. [PMID: 38616846 PMCID: PMC11014422 DOI: 10.1016/j.jcmds.2024.100091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Statistical estimation of parameters in large models of evolutionary processes is often too computationally inefficient to pursue using exact model likelihoods, even with single-nucleotide polymorphism (SNP) data, which offers a way to reduce the size of genetic data while retaining relevant information. Approximate Bayesian Computation (ABC) takes advantage of simulations to bypass direct evaluation of model likelihoods when performing statistical inference about the parameters of large models. We develop a mechanistic model to simulate forward-in-time divergent selection with variable migration rates, modes of reproduction (sexual, asexual), length and number of migration-selection cycles. We investigate the computational feasibility of ABC to perform statistical inference and study the quality of estimates on the position of loci under selection and the strength of selection. To expand the parameter space of positions under selection, we enhance the model by implementing an outlier scan on summarized observed data. We evaluate the usefulness of summary statistics well-known to capture the strength of selection, and assess their informativeness under divergent selection. We also evaluate the effect of genetic drift with respect to an idealized deterministic model with single-locus selection. We discuss the role of the recombination rate as a confounding factor in estimating the strength of divergent selection, and emphasize its importance in the breakdown of linkage disequilibrium (LD). We identify the parts of the model's parameter space in which we recover a strong signal for estimating selection, and determine whether population differentiation-based summary statistics or LD-based summary statistics perform well in estimating selection.
Affiliation(s)
- Martyna Lukaszewicz
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, ID, United States of America
- Ousseini Issaka Salia
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Institute for Modeling Collaboration and Innovation (IMCI), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, ID, United States of America
- Department of Horticulture, Washington State University, Pullman, WA, United States of America
- Paul A. Hohenlohe
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Institute for Modeling Collaboration and Innovation (IMCI), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, ID, United States of America
- Erkan O. Buzbas
- Institute for Interdisciplinary Data Sciences (IIDS), University of Idaho, Moscow, ID, United States of America
- Institute for Modeling Collaboration and Innovation (IMCI), University of Idaho, Moscow, ID, United States of America
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, United States of America
|
24
|
Alamoudi E, Reck F, Bundgaard N, Graw F, Brusch L, Hasenauer J, Schälte Y. A wall-time minimizing parallelization strategy for approximate Bayesian computation. PLoS One 2024; 19:e0294015. [PMID: 38386671 PMCID: PMC10883530 DOI: 10.1371/journal.pone.0294015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/24/2023] [Indexed: 02/24/2024] Open
Abstract
Approximate Bayesian Computation (ABC) is a widely applicable and popular approach to estimating unknown parameters of mechanistic models. As ABC analyses are computationally expensive, parallelization on high-performance infrastructure is often necessary. However, the existing parallelization strategies leave computing resources unused at times and thus do not yet leverage them optimally. We present look-ahead scheduling, a wall-time minimizing parallelization strategy for ABC Sequential Monte Carlo algorithms, which avoids idle times of computing units by preemptive sampling of subsequent generations. This allows all available resources to be utilized. The strategy can be integrated with, e.g., adaptive distance function and summary statistic selection schemes, which is essential in practice. Our key contribution is the theoretical assessment of the strategy of preemptive sampling and the proof of unbiasedness. In addition, we provide an implementation and evaluate the strategy on different problems and numbers of parallel cores, showing speed-ups of typically 10-20% and up to 50% compared to the best established approach, with some variability. Thus, the proposed strategy makes it possible to improve the cost and run-time efficiency of ABC methods on high-performance infrastructure.
Affiliation(s)
- Emad Alamoudi
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Felipe Reck
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Nils Bundgaard
  - BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg, Germany
- Frederik Graw
  - BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg, Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
  - Department of Medicine 5, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Lutz Brusch
  - Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
- Jan Hasenauer
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
  - Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
- Yannik Schälte
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
  - Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
25
Wygoda E, Loewenthal G, Moshe A, Alburquerque M, Mayrose I, Pupko T. Statistical framework to determine indel-length distribution. Bioinformatics 2024; 40:btae043. [PMID: 38269647 PMCID: PMC10868340 DOI: 10.1093/bioinformatics/btae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 01/10/2024] [Accepted: 01/22/2024] [Indexed: 01/26/2024] Open
Abstract
MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased due to the assumption of alignment methods that indel lengths follow a geometric distribution. RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in Github, at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.
Affiliation(s)
- Elya Wygoda
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Michael Alburquerque
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
  - School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
26
Gabrielli M, Leroy T, Salmona J, Nabholz B, Milá B, Thébaud C. Demographic responses of oceanic island birds to local and regional ecological disruptions revealed by whole-genome sequencing. Mol Ecol 2024; 33:e17243. [PMID: 38108507 DOI: 10.1111/mec.17243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/26/2023] [Accepted: 11/30/2023] [Indexed: 12/19/2023]
Abstract
Disentangling the effects of ecological disruptions operating at different spatial and temporal scales in shaping past species' demography is particularly important in the current context of rapid environmental changes driven by both local and regional factors. We argue that volcanic oceanic islands provide useful settings to study the influence of past ecological disruptions operating at local and regional scales on population demographic histories. We investigate potential drivers of past population dynamics for three closely related species of passerine birds from two volcanic oceanic islands, Reunion and Mauritius (Mascarene archipelago), with distinct volcanic history. Using ABC and PSMC inferences from complete genomes, we reconstructed the demographic history of the Reunion Grey White-eye (Zosterops borbonicus (Pennant, 1781)), the Reunion Olive White-eye (Z. olivaceus (Linnaeus, 1766)) and the Mauritius Grey White-eye (Z. mauritianus (Gmelin, 1789)) and searched for possible causes underlying similarities or differences between species living on the same or different islands. Both demographic inferences strongly support ancient and long-term expansions in all species. They also reveal different trajectories between species inhabiting different islands, but consistent demographic trajectories in species or populations from the same island. Species from Reunion appear to have experienced synchronous reductions in population size during the Last Glacial Maximum, a trend not seen in the Mauritian species. Overall, this study suggests that local events may have played a role in shaping population trajectories of these island species. It also highlights the potential of our conceptual framework to disentangle the effects of local and regional drivers on past species' demography and long-term population processes.
Affiliation(s)
- Maëva Gabrielli
  - Laboratoire Évolution et Diversité Biologique (EDB), UMR 5174 (Université Paul Sabatier, CNRS, IRD), Toulouse, France
  - Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Thibault Leroy
  - GenPhySE, INRAE, INP, ENVT, Université de Toulouse, Castanet-Tolosan, France
- Jordi Salmona
  - Laboratoire Évolution et Diversité Biologique (EDB), UMR 5174 (Université Paul Sabatier, CNRS, IRD), Toulouse, France
- Benoit Nabholz
  - Institut des Sciences de l'Evolution de Montpellier, UMR 5554 (Université de Montpellier, CNRS, IRD, EPHE), Montpellier, France
- Borja Milá
  - National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
- Christophe Thébaud
  - Laboratoire Évolution et Diversité Biologique (EDB), UMR 5174 (Université Paul Sabatier, CNRS, IRD), Toulouse, France
27
Ruiz-Montoya L, Sánchez-Rosario M, López-Gómez E, Garcia-Bautista M, Canedo-Texón A, Haymer D, Liedo P. Mass-Rearing Conditions Do Not Always Reduce Genetic Diversity: The Case of the Mexican Fruit Fly, Anastrepha ludens (Diptera: Tephritidae). Insects 2024; 15:56. [PMID: 38249062 PMCID: PMC10816967 DOI: 10.3390/insects15010056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/21/2023] [Accepted: 01/11/2024] [Indexed: 01/23/2024]
Abstract
The application of the sterile insect technique (SIT) requires the adaptation of insects to mass-rearing conditions. It is generally accepted that this adaptation may include a reduction in genetic diversity and an associated loss of desirable characteristics for the effective performance of sterile insects in the field. Here, we compare the genetic diversity of two mass-reared strains of the Mexican fruit fly, Anastrepha ludens, and a wild (WIL) population collected near Tapachula, Mexico, using seven DNA microsatellites as molecular genetic markers. The mass-reared strains were a bisexual laboratory strain (LAB) with approximately 130 generations under mass-rearing and a genetic sexing strain, Tapachula-7 (TA7), also under mass-rearing for 100 generations. Our results revealed an overall low level of genetic differentiation (approximately 15%) among the three strains, with the LAB and WIL populations being genetically most similar and TA7 most genetically differentiated. Although there were some differences in allele frequencies between strains, our results show that overall, the adaptation to mass-rearing conditions did not reduce genetic variability compared to the wild sample in terms of heterozygosity or allelic richness, nor did it appear to alter the level of inbreeding with respect to the wild populations. These results are contrary to the general idea that mass-rearing always results in a reduction in genetic diversity. Overall, our findings can contribute to a better understanding of the impact that adaptation to mass-rearing conditions may have on the genetic make-up of strains.
Affiliation(s)
- Lorena Ruiz-Montoya
  - El Colegio de la Frontera Sur (ECOSUR), Carretera Panamericana y Periférico Sur, Barrio María Auxiliadora, San Cristóbal de las Casas 29290, Chiapas, Mexico
- Mayren Sánchez-Rosario
  - El Colegio de la Frontera Sur (ECOSUR), Carretera Antiguo Aeropuerto, Tapachula 30700, Chiapas, Mexico
- Emiliano López-Gómez
  - Instituto de Biociencias, Universidad Autónoma de Chiapas, Boulevard Príncipe Akishino Sin Número Colonia Solidaridad 2000, Tapachula 30798, Chiapas, Mexico
- Maricela Garcia-Bautista
  - El Colegio de la Frontera Sur (ECOSUR), Carretera Panamericana y Periférico Sur, Barrio María Auxiliadora, San Cristóbal de las Casas 29290, Chiapas, Mexico
- Anahí Canedo-Texón
  - El Colegio de la Frontera Sur (ECOSUR), Carretera Panamericana y Periférico Sur, Barrio María Auxiliadora, San Cristóbal de las Casas 29290, Chiapas, Mexico
- David Haymer
  - Department of Cell and Molecular Biology, University of Hawaii, 1960 East-West Rd, Biomed T511, Honolulu, HI 96822, USA
- Pablo Liedo
  - El Colegio de la Frontera Sur (ECOSUR), Carretera Antiguo Aeropuerto, Tapachula 30700, Chiapas, Mexico
28
Lambert S, Voznica J, Morlon H. Deep Learning from Phylogenies for Diversification Analyses. Syst Biol 2023; 72:1262-1279. [PMID: 37556735 DOI: 10.1093/sysbio/syad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 06/20/2023] [Accepted: 08/08/2023] [Indexed: 08/11/2023] Open
Abstract
Birth-death (BD) models are widely used in combination with species phylogenies to study past diversification dynamics. Current inference approaches typically rely on likelihood-based methods. These methods are not generalizable, as a new likelihood formula must be established each time a new model is proposed; for some models, such a formula is not even tractable. Deep learning can bring solutions in such situations, as deep neural networks can be trained to learn the relation between simulations and parameter values as a regression problem. In this paper, we adapt a recently developed deep learning method from pathogen phylodynamics to the case of diversification inference, and we extend its applicability to the case of the inference of state-dependent diversification models from phylogenies associated with trait data. We demonstrate the accuracy and time efficiency of the approach for the time-constant homogeneous BD model and the Binary-State Speciation and Extinction model. Finally, we illustrate the use of the proposed inference machinery by reanalyzing a phylogeny of primates and their associated ecological role as seed dispersers. Deep learning inference provides at least the same accuracy as likelihood-based inference while being faster by several orders of magnitude, offering a promising new inference approach for the deployment of future models in the field.
Affiliation(s)
- Sophia Lambert
  - Institut de Biologie de l'École Normale Supérieure, École Normale Supérieure, CNRS, INSERM, Université Paris Sciences et Lettres, 46 Rue d'Ulm, 75005 Paris, France
  - Institute of Ecology and Evolution, Department of Biology, 5289 University of Oregon, Eugene, OR 97403, USA
- Jakub Voznica
  - Institut Pasteur, Université Paris Cité, Unité Bioinformatique Evolutive, 25-28 Rue du Dr Roux, 75015 Paris, France
  - Unité de Biologie Computationnelle, USR 3756 CNRS, 25-28 Rue du Dr Roux, 75015 Paris, France
- Hélène Morlon
  - Institut de Biologie de l'École Normale Supérieure, École Normale Supérieure, CNRS, INSERM, Université Paris Sciences et Lettres, 46 Rue d'Ulm, 75005 Paris, France
29
Kishino H, Nakamichi R, Kitada S. Genetic adaptations in the population history of Arabidopsis thaliana. G3 (Bethesda, Md.) 2023; 13:jkad218. [PMID: 37748020 PMCID: PMC10700115 DOI: 10.1093/g3journal/jkad218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 05/26/2023] [Accepted: 09/14/2023] [Indexed: 09/27/2023]
Abstract
A population encounters a variety of environmental stresses, so the full source of its resilience can only be captured by collecting all the signatures of adaptation to the selection of the local environment in its population history. Based on the multiomic data of Arabidopsis thaliana, we constructed a database of phenotypic adaptations (p-adaptations) and gene expression adaptations (e-adaptations) in the population. Through the enrichment analysis of the identified adaptations, we inferred a likely scenario of adaptation that is consistent with the biological evidence from experimental work. We analyzed the dynamics of the allele frequencies at the 23,880 QTLs of 174 traits and 8,618 eQTLs of 1,829 genes with respect to the total SNPs in the genomes and identified 650 p-adaptations and 3,925 e-adaptations [false discovery rate (FDR) = 0.05]. The population underwent large-scale p-adaptations and e-adaptations along 4 lineages. Extremely cold winters and short summers prolonged seed dormancy and expanded the root system architecture. Low temperatures prolonged the growing season, and low light intensity required increased chloroplast activity. The subtropical and humid environment enhanced phytohormone signaling pathways in response to the biotic and abiotic stresses. Exposure to heavy metals selected alleles for lower heavy metal uptake from soil, lower growth rate, lower resistance to bacteria, and higher expression of photosynthetic genes. The p-adaptations are directly interpretable, while the coadapted gene expressions reflect the physiological requirements for the adaptation. The integration of this information characterizes when and where the population has experienced environmental stress and how the population responded at the molecular level.
Affiliation(s)
- Hirohisa Kishino
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
  - Research and Development Initiative, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
- Reiichiro Nakamichi
  - Fisheries Resources Institute, Japan Fisheries Research and Education Agency, 2-12-4 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-8648, Japan
- Shuichi Kitada
  - Graduate School of Marine Science and Technology, Tokyo University of Marine Science and Technology, 4-5-7 Konan, Minato-ku, Tokyo 108-8477, Japan
30
Di Santo LN, Quilodrán CS, Currat M. Temporal Variation in Introgressed Segments' Length Statistics Computed from a Limited Number of Ancient Genomes Sheds Light on Past Admixture Pulses. Mol Biol Evol 2023; 40:msad252. [PMID: 37992125 PMCID: PMC10715198 DOI: 10.1093/molbev/msad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 10/16/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023] Open
Abstract
Hybridization is recognized as an important evolutionary force, but identifying and timing admixture events between divergent lineages remain a major aim of evolutionary biology. While this has traditionally been done using inferential tools on contemporary genomes, the latest advances in paleogenomics have provided a growing wealth of temporally distributed genomic data. Here, we used individual-based simulations to generate chromosome-level genomic data for a 2-population system and described temporal neutral introgression patterns under a single- and 2-pulse admixture model. We computed 6 summary statistics aiming to inform the timing and number of admixture pulses between interbreeding entities: lengths of introgressed sequences and their variance within genomes, as well as genome-wide introgression proportions and related measures. The first 2 statistics could confidently be used to infer interlineage hybridization history, peaking at the beginning and shortly after an admixture pulse. Temporal variation in introgression proportions and related statistics provided more limited insights, particularly when considering their application to ancient genomes still scant in number. Lastly, we computed these statistics on Homo sapiens paleogenomes and successfully inferred the hybridization pulse from Neanderthal that occurred approximately 40 to 60 kya. The scarce number of genomes dating from this period prevented more precise inferences, but the accumulation of paleogenomic data opens promising perspectives as our approach only requires a limited number of ancient genomes.
Affiliation(s)
- Lionel N Di Santo
  - Department of Genetics and Evolution, University of Geneva, Geneva CH-1205
- Mathias Currat
  - Department of Genetics and Evolution, University of Geneva, Geneva CH-1205
  - Institute of Genetics and Genomics in Geneva (IGE3), University of Geneva, Geneva CH-1205
31
Ong HG, Kim Y, Lee J, Kim B, Kang D, Jung E, Shin J, Kim Y. Approximate Bayesian computation and ecological niche models elucidate the demographic history and current fragmented population distribution of a Korean endemic shrub. Ecol Evol 2023; 13:e10792. [PMID: 38077507 PMCID: PMC10700048 DOI: 10.1002/ece3.10792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 09/15/2023] [Accepted: 11/20/2023] [Indexed: 12/26/2023] Open
Abstract
Climatic fluctuations and geological events since the LGM are believed to have significantly impacted the population size, distribution, and mobility of many species that we observe today. In this paper, we determined the processes driving the phylogeographic structure of the Korean endemic white forsythia by combining the use of genome-wide SNPs and predicting paleoclimatic habitats during the LGM (21 kya), Early Holocene (10 kya), Mid-Holocene (6 kya), and Late Holocene (3 kya). Using a maximum of 1897 SNPs retrieved from 124 samples across nine wild populations, five environmental predictors, and the species' natural occurrence records, we aimed to infer the species' demographic history and reconstruct its possible paleodistributions with the use of approximate Bayesian computation and ecological niche models, respectively. Under this integrated framework, we found strong evidence for patterns of range shift and expansion, and population divergence events from the onset of the Holocene, resulting in the formation of its five distinct genetic units. The most highly supported model inferred that after the split of an ancestral population into the southern group and a larger central metapopulation lineage, the latter gave rise to the eastern and northern clusters, before finally dividing into two sub-central groups. While the use of molecular data allowed us to identify and refine the (phylo)genetic relationships of the species' lineages and populations, the use of ecological data helped us infer a past LGM refugium and the directions of post-glacial range dynamics. The time frames of these demographic events were shown to be congruent with climatic and geological events that affected the central Korean Peninsula during these periods. These findings gave us a better understanding of the consequences of past spatiotemporal factors that may have resulted in the current fragmented population distribution of this endangered plant.
Affiliation(s)
- Yong‐In Kim
  - On Biological Resource Research Institute (OBRRI), Chuncheon, South Korea
- Jung‐Hoon Lee
  - On Biological Resource Research Institute (OBRRI), Chuncheon, South Korea
- Bo‐Yun Kim
  - National Institute of Biological Resources (NIBR), Incheon, South Korea
- Dae‐Hyun Kang
  - Korea National Park Research Institute, Wonju, South Korea
- Eui‐Kwon Jung
  - Department of Life Science, Hallym University, Chuncheon, South Korea
- Jae‐Seo Shin
  - Department of Life Science, Hallym University, Chuncheon, South Korea
- Young‐Dong Kim
  - Multidisciplinary Genome Institute, Hallym University, Chuncheon, South Korea
  - Department of Life Science, Hallym University, Chuncheon, South Korea
32
Long H, Johri P, Gout JF, Ni J, Hao Y, Licknack T, Wang Y, Pan J, Jiménez-Marín B, Lynch M. Paramecium Genetics, Genomics, and Evolution. Annu Rev Genet 2023; 57:391-410. [PMID: 38012024 DOI: 10.1146/annurev-genet-071819-104035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The ciliate genus Paramecium served as one of the first model systems in microbial eukaryotic genetics, contributing much to the early understanding of phenomena as diverse as genome rearrangement, cryptic speciation, cytoplasmic inheritance, and endosymbiosis, as well as more recently to the evolution of mating types, introns, and roles of small RNAs in DNA processing. Substantial progress has recently been made in the area of comparative and population genomics. Paramecium species combine some of the lowest known mutation rates with some of the largest known effective populations, along with likely very high recombination rates, thereby harboring a population-genetic environment that promotes an exceptionally efficient capacity for selection. As a consequence, the genomes are extraordinarily streamlined, with very small intergenic regions combined with small numbers of tiny introns. The subject of the bulk of Paramecium research, the ancient Paramecium aurelia species complex, is descended from two whole-genome duplication events that retain high degrees of synteny, thereby providing an exceptional platform for studying the fates of duplicate genes. Despite having a common ancestor dating to several hundred million years ago, the known descendant species are morphologically indistinguishable, raising significant questions about the common view that gene duplications lead to the origins of evolutionary novelties.
Affiliation(s)
- Hongan Long
  - Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
  - Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao, Shandong Province, China
- Parul Johri
  - Department of Biology, University of North Carolina, Chapel Hill, North Carolina, USA
- Jean-Francois Gout
  - Department of Biological Sciences, Mississippi State University, Starkville, Mississippi, USA
- Jiahao Ni
  - Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
- Yue Hao
  - Cancer and Cell Biology Division, Translational Genomics Research Institute, Phoenix, Arizona, USA
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
- Timothy Licknack
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
- Yaohai Wang
  - Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
- Jiao Pan
  - Institute of Evolution and Marine Biodiversity, KLMME, Ocean University of China, Qingdao, Shandong Province, China
- Berenice Jiménez-Marín
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona, USA
33
Huang X, Athrey GN, Kaufman PE, Fredregill C, Slotman MA. Effective population size of Culex quinquefasciatus under insecticide-based vector management and following Hurricane Harvey in Harris County, Texas. Front Genet 2023; 14:1297271. [PMID: 38075683 PMCID: PMC10702589 DOI: 10.3389/fgene.2023.1297271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 10/24/2023] [Indexed: 02/12/2024] Open
Abstract
Introduction: Culex quinquefasciatus is a mosquito species of significant public health importance due to its ability to transmit multiple pathogens that can cause mosquito-borne diseases, such as West Nile fever and St. Louis encephalitis. In Harris County, Texas, Cx. quinquefasciatus is a common vector species and is subjected to insecticide-based management by the Harris County Public Health Department. However, insecticide resistance in mosquitoes has increased rapidly worldwide and raises concerns about maintaining the effectiveness of vector control approaches. This concern is highly relevant in Texas, with its humid subtropical climate along the Gulf Coast that provides suitable habitat for Cx. quinquefasciatus and other mosquito species that are known disease vectors. Therefore, there is an urgent and ongoing need to monitor the effectiveness of current vector control programs. Methods: In this study, we evaluated the impact of vector control approaches by estimating the effective population size of Cx. quinquefasciatus in Harris County. We applied Approximate Bayesian Computation to microsatellite data to estimate effective population size. We collected Cx. quinquefasciatus samples from two mosquito control operation areas, 415 and 802, during routine vector monitoring in 2016 and 2017. No county mosquito control operations were applied at area 415 in 2016 and 2017, whereas extensive adulticide spraying operations were in effect at area 802 during the summer of 2016. We collected data for eighteen microsatellite markers for 713 and 723 mosquitoes at eight timepoints from 2016 to 2017 in areas 415 and 802, respectively. We also investigated the impact of Hurricane Harvey's landfall in the Houston area in August of 2017 on Cx. quinquefasciatus population fluctuation.
Results: We found that the bottleneck scenario was the most probable historical scenario describing the impact of the winter season at area 415 and area 802, with the highest posterior probability of 0.9167 and 0.4966, respectively. We also detected an expansion event following Hurricane Harvey at area 802, showing a 3.03-fold increase in 2017. Discussion: Although we did not detect significant effects of vector control interventions, we found considerable influences of the winter season and a major hurricane on the effective population size of Cx. quinquefasciatus. The fluctuations in effective population size in both areas showed a significant seasonal pattern. Additionally, the significant population expansion following Hurricane Harvey in 2017 supports the necessity for post-hurricane vector-control interventions.
Affiliation(s)
- Xinyue Huang
- Department of Entomology, Texas A&M University, College Station, TX, United States
| | - Giridhar N. Athrey
- Department of Poultry Science, Texas A&M University, College Station, TX, United States
| | - Phillip E. Kaufman
- Department of Entomology, Texas A&M University, College Station, TX, United States
| | - Chris Fredregill
- Harris County Public Health, Mosquito & Vector Control Division, Houston, TX, United States
| | - Michel A. Slotman
- Department of Entomology, Texas A&M University, College Station, TX, United States
|
34
|
Ascunce MS, Toloza AC, González-Oliver A, Reed DL. Nuclear genetic diversity of head lice sheds light on human dispersal around the world. PLoS One 2023; 18:e0293409. [PMID: 37939041 PMCID: PMC10631634 DOI: 10.1371/journal.pone.0293409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Accepted: 09/26/2023] [Indexed: 11/10/2023] Open
Abstract
The human louse, Pediculus humanus, is an obligate blood-sucking ectoparasite that has coevolved with humans for millennia. Given the intimate relationship between this parasite and the human host, the study of human lice has the potential to shed light on aspects of human evolution that are difficult to interpret using other biological evidence. In this study, we analyzed the genetic variation in 274 human lice from 25 geographic sites around the world by using nuclear microsatellite loci and female-inherited mitochondrial DNA sequences. Nuclear genetic diversity analysis revealed the presence of two distinct genetic clusters, I and II, which are subdivided into subclusters Ia-Ib and IIa-IIb, respectively. Among these samples, we observed the presence of the two most common louse mitochondrial haplogroups, A and B, which were found in both nuclear Clusters I and II. Evidence of nuclear admixture was uncommon (12%) and was predominant in the New World, potentially mirroring the history of colonization in the Americas. These findings were supported by novel DIYABC simulations that were built using both host and parasite data to define parameters and models, suggesting that admixture between cI and cII was very recent. This pattern could also be the result of a reproductive barrier between these two nuclear genetic clusters. In addition to providing new evolutionary knowledge about this human parasite, our study could guide the development of new analyses in other host-parasite systems.
Affiliation(s)
- Marina S. Ascunce
- Department of Plant Pathology, Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
- USDA-ARS Center for Medical, Agricultural, and Veterinary Entomology, Gainesville, Florida, United States of America
| | - Ariel C. Toloza
- Centro de Investigaciones de Plagas e Insecticidas (CONICET-UNIDEF), Villa Martelli, Buenos Aires, Argentina
| | - Angélica González-Oliver
- Departamento de Biología Celular, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, México
| | - David L. Reed
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
|
35
|
Hong H, Cortez MJ, Cheng YY, Kim HJ, Choi B, Josić K, Kim JK. Inferring delays in partially observed gene regulation processes. Bioinformatics 2023; 39:btad670. [PMID: 37935426 PMCID: PMC10660296 DOI: 10.1093/bioinformatics/btad670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 10/25/2023] [Accepted: 11/02/2023] [Indexed: 11/09/2023] Open
Abstract
MOTIVATION Cell function is regulated by gene regulatory networks (GRNs) defined by protein-mediated interaction between constituent genes. Despite advances in experimental techniques, we can still measure only a fraction of the processes that govern GRN dynamics. To infer the properties of GRNs using partial observation, unobserved sequential processes can be replaced with distributed time delays, yielding non-Markovian models. Inference methods based on the resulting model suffer from the curse of dimensionality. RESULTS We develop a simulation-based Bayesian MCMC method employing an approximate likelihood for the efficient and accurate inference of GRN parameters when only some of their products are observed. We illustrate our approach using a two-step activation model: an activation signal leads to the accumulation of an unobserved regulatory protein, which triggers the expression of observed fluorescent proteins. With prior information about observed fluorescent protein synthesis, our method successfully infers the dynamics of the unobserved regulatory protein. We can estimate the delay and kinetic parameters characterizing target regulation, including transcription, translation, and target searching of an unobserved protein, from experimental measurements of the products of its target gene. Our method is scalable and can be used to analyze non-Markovian models with hidden components. AVAILABILITY AND IMPLEMENTATION Our code is implemented in R and is freely available with a simple example dataset at https://github.com/Mathbiomed/SimMCMC.
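The idea of replacing unobserved sequential processes with a distributed time delay can be illustrated with a small sketch. It is not the authors' R implementation; the number of hidden steps, the rate, and the choice of a gamma (Erlang) delay are assumptions made only for illustration. The sketch relies on the standard fact that the total waiting time of `n` consecutive exponential steps with a common rate follows an Erlang distribution, so the whole chain can be summarized by a single delay draw.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical values: five hidden steps, each completing at rate 2 per unit time.
n_steps, rate = 5, 2.0
n_draws = 20000

# Explicit chain: simulate every hidden step and add up the waiting times.
chain = [
    sum(rng.expovariate(rate) for _ in range(n_steps))
    for _ in range(n_draws)
]

# Delay model: draw the total waiting time directly from a gamma distribution
# with shape n_steps and scale 1/rate (an Erlang distribution).
delay = [rng.gammavariate(n_steps, 1.0 / rate) for _ in range(n_draws)]

# Both approaches should give a mean close to n_steps / rate.
print(statistics.fmean(chain), statistics.fmean(delay))
```

The delay model no longer tracks the hidden intermediates, which is what makes the resulting process non-Markovian: the time already spent waiting matters, not just the current observed state.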
Affiliation(s)
- Hyukpyo Hong
- Department of Mathematical Sciences, KAIST, Daejeon 34141, Korea
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon 34126, Korea
| | - Mark Jayson Cortez
- Institute of Mathematical Sciences and Physics, University of the Philippines Los Baños, Laguna 4031, Philippines
| | - Yu-Yu Cheng
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706, United States
| | - Hang Joon Kim
- Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH 45221, United States
| | - Boseung Choi
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon 34126, Korea
- Division of Big Data Science, Korea University Sejong Campus, Sejong 30019, Korea
- College of Public Health, The Ohio State University, Columbus, OH 43210, United States
| | - Krešimir Josić
- Department of Mathematics, University of Houston, Houston, TX 77204, United States
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, United States
| | - Jae Kyoung Kim
- Department of Mathematical Sciences, KAIST, Daejeon 34141, Korea
- Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon 34126, Korea
|
36
|
Alamoudi E, Schälte Y, Müller R, Starruß J, Bundgaard N, Graw F, Brusch L, Hasenauer J. FitMultiCell: simulating and parameterizing computational models of multi-scale and multi-cellular processes. Bioinformatics 2023; 39:btad674. [PMID: 37947308 PMCID: PMC10666203 DOI: 10.1093/bioinformatics/btad674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/12/2023] Open
Abstract
MOTIVATION Biological tissues are dynamic and highly organized. Multi-scale models are helpful tools to analyse and understand the processes determining tissue dynamics. These models usually depend on parameters that need to be inferred from experimental data to achieve a quantitative understanding, to predict the response to perturbations, and to evaluate competing hypotheses. However, even advanced inference approaches such as approximate Bayesian computation (ABC) are difficult to apply due to the computational complexity of the simulation of multi-scale models. Thus, there is a need for a scalable pipeline for modeling, simulating, and parameterizing multi-scale models of multi-cellular processes. RESULTS Here, we present FitMultiCell, a computationally efficient and user-friendly open-source pipeline that can handle the full workflow of modeling, simulating, and parameterizing for multi-scale models of multi-cellular processes. The pipeline is modular and integrates the modeling and simulation tool Morpheus and the statistical inference tool pyABC. The easy integration of high-performance infrastructure allows scaling to computationally expensive problems. The introduction of a novel standard for the formulation of parameter inference problems for multi-scale models additionally ensures reproducibility and reusability. By applying the pipeline to multiple biological problems, we demonstrate its broad applicability, which will benefit in particular image-based systems biology. AVAILABILITY AND IMPLEMENTATION FitMultiCell is available open-source at https://gitlab.com/fitmulticell/fit.
Affiliation(s)
- Emad Alamoudi
- Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany
| | - Yannik Schälte
- Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg 85764, Germany
- Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching 85748, Germany
| | - Robert Müller
- Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany
| | - Jörn Starruß
- Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany
| | - Nils Bundgaard
- BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg 69120, Germany
| | - Frederik Graw
- BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg 69120, Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg 69120, Germany
- Department of Medicine 5, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen 91054, Germany
| | - Lutz Brusch
- Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany
| | - Jan Hasenauer
- Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg 85764, Germany
- Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching 85748, Germany
|
37
|
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
|
38
|
Boyle JH, Strickler S, Twyford AD, Ricono A, Powell A, Zhang J, Xu H, Smith R, Dalgleish HJ, Jander G, Agrawal AA, Puzey JR. Temporal matches between monarch butterfly and milkweed population changes over the past 25,000 years. Curr Biol 2023; 33:3702-3710.e5. [PMID: 37607548 DOI: 10.1016/j.cub.2023.07.057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 04/13/2023] [Accepted: 07/26/2023] [Indexed: 08/24/2023]
Abstract
In intimate ecological interactions, the interdependency of species may result in correlated demographic histories. For species of conservation concern, understanding the long-term dynamics of such interactions may shed light on the drivers of population decline. Here, we address the demographic history of the monarch butterfly, Danaus plexippus, and its dominant host plant, the common milkweed Asclepias syriaca (A. syriaca), using broad-scale sampling and genomic inference. Because genetic resources for milkweed have lagged behind those for monarchs, we first release a chromosome-level genome assembly and annotation for common milkweed. Next, we show that despite its enormous geographic range across eastern North America, A. syriaca is best characterized as a single, roughly panmictic population. Using approximate Bayesian computation with random forests (ABC-RF), a machine learning method for reconstructing demographic histories, we show that both monarchs and milkweed experienced population expansion during the most recent recession of North American glaciers 10,000-20,000 years ago. Our data also identify concurrent population expansions in both species during the large-scale clearing of eastern forests (∼200 years ago). Finally, we find no evidence that either species experienced a reduction in effective population size over the past 75 years. Thus, the well-documented decline of monarch abundance over the past 40 years is not visible in our genomic dataset, reflecting a possible mismatch of the overwintering census population to effective population size in this species.
Affiliation(s)
- John H Boyle
- Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA; Biology Department, University of Mary, 7500 University Dr., Bismarck, ND 58504, USA
| | - Susan Strickler
- Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA; Chicago Botanic Garden, Plant Science and Conservation, 1000 Lake Cook Rd., Glencoe, IL 60022, USA; Northwestern University, Plant Biology and Conservation Program, 2145 Sheridan Rd., Evanston, IL 60208, USA
| | - Alex D Twyford
- Institute of Ecology and Evolution, University of Edinburgh, Charlotte Auerbach Rd., Edinburgh EH9 3FL, UK; Royal Botanic Garden Edinburgh, Edinburgh EH3 5NZ, UK
| | - Angela Ricono
- Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
| | - Adrian Powell
- Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
| | - Jing Zhang
- Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
| | - Hongxing Xu
- Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA; College of Life Sciences, Shaanxi Normal University, South Chang'an Rd., Xi'an 710062, China
| | - Ronald Smith
- Data Science Program, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
| | - Harmony J Dalgleish
- Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA
| | - Georg Jander
- Boyce Thompson Institute, 533 Tower Rd., Ithaca, NY 14853, USA
| | - Anurag A Agrawal
- Department of Ecology and Evolutionary Biology, Cornell University, Corson Hall, Ithaca, NY 14853, USA
| | - Joshua R Puzey
- Biology Department, College of William & Mary, 540 Landrum Dr., Williamsburg, VA 23185, USA.
|
39
|
Asadi M, Oloye FF, Xie Y, Cantin J, Challis JK, McPhedran KN, Yusuf W, Champredon D, Xia P, De Lange C, El-Baroudy S, Servos MR, Jones PD, Giesy JP, Brinkmann M. A wastewater-based risk index for SARS-CoV-2 infections among three cities on the Canadian Prairie. Sci Total Environ 2023; 876:162800. [PMID: 36914129 PMCID: PMC10008033 DOI: 10.1016/j.scitotenv.2023.162800] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/19/2022] [Revised: 03/06/2023] [Accepted: 03/07/2023] [Indexed: 06/01/2023]
Abstract
Wastewater surveillance (WWS) is useful to better understand the spreading of coronavirus disease 2019 (COVID-19) in communities, which can help design and implement suitable mitigation measures. The main objective of this study was to develop the Wastewater Viral Load Risk Index (WWVLRI) for three Saskatchewan cities to offer a simple metric to interpret WWS. The index was developed by considering relationships between reproduction number, clinical data, daily per capita concentrations of virus particles in wastewater, and weekly viral load change rate. Trends of daily per capita concentrations of SARS-CoV-2 in wastewater for Saskatoon, Prince Albert, and North Battleford were similar during the pandemic, suggesting that per capita viral load can be useful to quantitatively compare wastewater signals among cities and develop an effective and comprehensible WWVLRI. The effective reproduction number (Rt) and the daily per capita efficiency-adjusted viral load thresholds of 85 × 10^6 and 200 × 10^6 N2 gene counts (gc)/population day (pd) were determined. These values, together with rates of change, were used to categorize the potential for COVID-19 outbreaks and subsequent declines. The weekly average was considered 'low risk' when the per capita viral load was below 85 × 10^6 N2 gc/pd. A 'medium risk' occurred when the per capita viral load was between 85 × 10^6 and 200 × 10^6 N2 gc/pd with a rate of change <100 %. The start of an outbreak was indicated by a 'medium-high' risk classification when the week-over-week rate of change was >100 % and the absolute magnitude of concentrations of viral particles was >85 × 10^6 N2 gc/pd. Lastly, a 'high risk' occurred when the viral load exceeded 200 × 10^6 N2 gc/pd. This methodology provides a valuable resource for decision-makers and health authorities, specifically given the limitation of COVID-19 surveillance based on clinical data.
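The risk categories described in this abstract can be read as a small decision rule. The sketch below is one possible reading, not the authors' implementation: the function name `classify_risk` is hypothetical, the two thresholds are taken as 85 and 200 million N2 gc/pd, and the handling of boundary values (a load exactly at a threshold, or a rate of change of exactly 100 %) is an assumption because the abstract does not specify it.

```python
# Thresholds in N2 gene counts per population day (gc/pd), read from the abstract.
LOW_THRESHOLD = 85e6
HIGH_THRESHOLD = 200e6


def classify_risk(viral_load: float, weekly_change_pct: float) -> str:
    """Map a weekly-average per capita viral load (N2 gc/pd) and its
    week-over-week rate of change (%) to a risk category.

    Boundary handling is an assumption: loads at or below LOW_THRESHOLD
    are treated as low risk, and a change of exactly 100 % as medium.
    """
    if viral_load > HIGH_THRESHOLD:
        return "high"
    if viral_load > LOW_THRESHOLD:
        # Between the two thresholds, the rate of change decides the category.
        return "medium-high" if weekly_change_pct > 100 else "medium"
    return "low"


# Illustrative inputs only; these are not values from the study.
for load, change in [(50e6, 10), (100e6, 50), (100e6, 150), (250e6, 0)]:
    print(f"{load:.0e} gc/pd, {change:+d} % -> {classify_risk(load, change)}")
```

Checking the high threshold first means the rate of change only matters in the band between the two thresholds, which matches the order in which the abstract introduces the categories.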
Affiliation(s)
- Mohsen Asadi
- Department of Civil, Geological and Environmental Engineering, College of Engineering, University of Saskatchewan, Saskatoon, SK, Canada; Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada.
| | - Femi F Oloye
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada.
| | - Yuwei Xie
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada
| | - Jenna Cantin
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada
| | - Kerry N McPhedran
- Department of Civil, Geological and Environmental Engineering, College of Engineering, University of Saskatchewan, Saskatoon, SK, Canada; Global Institute for Water Security, University of Saskatchewan, Saskatoon, SK, Canada
| | - Warsame Yusuf
- Public Health Risk Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
| | - David Champredon
- Public Health Risk Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
| | - Pu Xia
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada
| | - Chantel De Lange
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada
| | - Seba El-Baroudy
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada
| | - Mark R Servos
- Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
| | - Paul D Jones
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada; School of Environment and Sustainability, University of Saskatchewan, Saskatoon, SK, Canada
| | - John P Giesy
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada; Department of Veterinary Biomedical Sciences, University of Saskatchewan, Saskatoon, SK, Canada; Department of Environmental Sciences, Baylor University, Waco, TX, USA; Department of Integrative Biology and Center for Integrative Toxicology, Michigan State University, East Lansing, MI, USA.
| | - Markus Brinkmann
- Toxicology Centre, University of Saskatchewan, Saskatoon, SK, Canada; Global Institute for Water Security, University of Saskatchewan, Saskatoon, SK, Canada; School of Environment and Sustainability, University of Saskatchewan, Saskatoon, SK, Canada.
|
40
|
Berliner LM, Herbei R, Wikle CK, Milliff RF. Excursions in the Bayesian treatment of model error. PLoS One 2023; 18:e0286624. [PMID: 37267337 DOI: 10.1371/journal.pone.0286624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 05/20/2023] [Indexed: 06/04/2023] Open
Abstract
Advances in observational and computational assets have led to revolutions in the range and quality of results in many science and engineering settings. However, those advances have led to needs for new research in treating model errors and assessing their impacts. We consider two settings. The first involves physically-based statistical models that are sufficiently manageable to allow incorporation of a stochastic "model error process". In the second case we consider large-scale models in which incorporation of a model error process and updating its distribution is impractical. Our suggestion is to treat dimension-reduced model output as if it is observational data, with a data model that incorporates a bias component to represent the impacts of model error. We believe that our suggestions are valuable quantitative, yet relatively simple, ways to extract useful information from models while including adjustment for model error. These ideas are illustrated and assessed using an application inspired by a classical oceanographic problem.
Affiliation(s)
- L Mark Berliner
- Department of Statistics, The Ohio State University, Columbus, OH, United States of America
| | - Radu Herbei
- Department of Statistics, The Ohio State University, Columbus, OH, United States of America
| | - Christopher K Wikle
- Department of Statistics, University of Missouri, Columbia, MO, United States of America
| | - Ralph F Milliff
- Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO, United States of America
|
41
|
Asher M, Lomax N, Morrissey K, Spooner F, Malleson N. Dynamic calibration with approximate Bayesian computation for a microsimulation of disease spread. Sci Rep 2023; 13:8637. [PMID: 37244962 DOI: 10.1038/s41598-023-35580-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Accepted: 05/20/2023] [Indexed: 05/29/2023] Open
Abstract
The global COVID-19 pandemic brought considerable public and policy attention to the field of infectious disease modelling. A major hurdle that modellers must overcome, particularly when models are used to develop policy, is quantifying the uncertainty in a model's predictions. By including the most recent available data in a model, the quality of its predictions can be improved and uncertainties reduced. This paper adapts an existing, large-scale, individual-based COVID-19 model to explore the benefits of updating the model in pseudo-real time. We use Approximate Bayesian Computation (ABC) to dynamically recalibrate the model's parameter values as new data emerge. ABC offers advantages over alternative calibration methods by providing information about the uncertainty associated with particular parameter values and the resulting COVID-19 predictions through posterior distributions. Analysing such distributions is crucial in fully understanding a model and its outputs. We find that forecasts of future disease infection rates are improved substantially by incorporating up-to-date observations and that the uncertainty in forecasts drops considerably in later simulation windows (as the model is provided with additional data). This is an important outcome because the uncertainty in model predictions is often overlooked when models are used in policy.
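A toy sketch of recalibrating a model with ABC as new data arrive is given here. It is not the paper's individual-based model or its calibration code: the growth-curve simulator, the uniform prior, the distance function, the kept fraction, and the jittered resampling between windows are all assumptions chosen to keep the example short. A full sequential ABC scheme would also carry importance weights, which this sketch omits.

```python
import random
import statistics


def simulate(rate: float, days: int, rng: random.Random) -> list[float]:
    """Toy epidemic curve: noisy exponential growth (hypothetical stand-in model)."""
    return [10 * (1 + rate) ** t * rng.uniform(0.9, 1.1) for t in range(days)]


def distance(sim: list[float], obs: list[float]) -> float:
    """Mean absolute relative error between simulated and observed counts."""
    return statistics.fmean(abs(s - o) / o for s, o in zip(sim, obs))


def abc_rejection(obs, prior_samples, rng, keep=0.1):
    """Keep the fraction of prior draws whose simulations lie closest to obs."""
    scored = sorted(
        (distance(simulate(r, len(obs), rng), obs), r) for r in prior_samples
    )
    n_keep = max(2, int(len(scored) * keep))
    return [r for _, r in scored[:n_keep]]


rng = random.Random(0)
observed = simulate(0.08, 28, rng)  # synthetic "data" with a true rate of 0.08
particles = [rng.uniform(0.0, 0.3) for _ in range(2000)]  # draws from a uniform prior

for window_end in (7, 14, 21, 28):
    # Recalibrate using all observations available up to this window.
    posterior = abc_rejection(observed[:window_end], particles, rng)
    print(window_end,
          round(statistics.fmean(posterior), 3),
          round(statistics.stdev(posterior), 4))
    # The next window starts from a jittered resample of the current posterior.
    particles = [max(0.0, rng.choice(posterior) + rng.gauss(0, 0.01))
                 for _ in range(2000)]
```

The printed standard deviation is the quantity of interest: it is a direct, if crude, summary of how uncertain the calibrated parameter remains after each window of data.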
Affiliation(s)
- Molly Asher
- School of Earth and Environment, University of Leeds, Leeds, LS2 9JT, UK
| | - Nik Lomax
- School of Geography, University of Leeds, Leeds, LS2 9JT, UK
- British Library, Alan Turing Institute, London, NW1 2DB, UK
| | - Karyn Morrissey
- Department of Management, DTU Technical University of Denmark, Copenhagen, Denmark
| | - Fiona Spooner
- Our World in Data, Global Change Data Lab, Oxford, UK
| | - Nick Malleson
- School of Geography, University of Leeds, Leeds, LS2 9JT, UK.
- British Library, Alan Turing Institute, London, NW1 2DB, UK.
|
42
|
Schälte Y, Hasenauer J. Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation. PLoS One 2023; 18:e0285836. [PMID: 37216372 DOI: 10.1371/journal.pone.0285836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 05/02/2023] [Indexed: 05/24/2023] Open
Abstract
Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
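Scale normalization, the first ingredient mentioned in this abstract, can be sketched in a few lines. This is an illustration of the general idea only, not pyABC's API and not the regression-based sensitivity weights the paper proposes: the helper names are hypothetical, and using the inverse median absolute deviation (MAD) of the simulated data as the weight is an assumption about one common choice of scale estimate.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def scale_weights(simulations):
    """One weight per data coordinate: the inverse MAD across simulations.

    Coordinates that do not vary at all get weight 0 so they are ignored.
    """
    weights = []
    for column in zip(*simulations):
        spread = mad(column)
        weights.append(1.0 / spread if spread > 0 else 0.0)
    return weights


def weighted_distance(sim, obs, weights):
    """Weighted L1 distance between one simulation and the observed data."""
    return sum(w * abs(s - o) for w, s, o in zip(weights, sim, obs))


# Two data coordinates on very different scales (made-up numbers).
simulations = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 4000.0]]
observed = [2.0, 2000.0]

weights = scale_weights(simulations)
print(weights)
print(weighted_distance([3.0, 3000.0], observed, weights))
```

Without the weights, the second coordinate would dominate the distance simply because its values are larger. In a sequential scheme the weights would be recomputed from each new generation of simulations, which is what makes the normalization adaptive.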
Affiliation(s)
- Yannik Schälte
- Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
| | - Jan Hasenauer
- Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
|
43
|
Bon JJ, Bretherton A, Buchhorn K, Cramb S, Drovandi C, Hassan C, Jenner AL, Mayfield HJ, McGree JM, Mengersen K, Price A, Salomone R, Santos-Fernandez E, Vercelloni J, Wang X. Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics. Philos Trans A Math Phys Eng Sci 2023; 381:20220156. [PMID: 36970822 PMCID: PMC10041356 DOI: 10.1098/rsta.2022.0156] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 06/18/2023]
Abstract
Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six modern opportunities and challenges in applied Bayesian statistics: intelligent data collection, new data sources, federated analysis, inference for implicit models, model transfer and purposeful software products. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Joshua J. Bon
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Adam Bretherton
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Katie Buchhorn
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Susanna Cramb
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Christopher Drovandi
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Conor Hassan
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Adrianne L. Jenner
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Helen J. Mayfield
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Public Health, The University of Queensland, Saint Lucia, Queensland, Australia
| | - James M. McGree
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Kerrie Mengersen
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Aiden Price
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Robert Salomone
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Computer Science, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Edgar Santos-Fernandez
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Julie Vercelloni
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Xiaoyu Wang
- Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
| |
|
44
|
Nie L, Ročková V. Deep bootstrap for Bayesian inference. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2023; 381:20220154. [PMID: 36970831 DOI: 10.1098/rsta.2022.0154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 01/27/2023] [Indexed: 06/18/2023]
Abstract
For a Bayesian, the task to define the likelihood can be as perplexing as the task to define the prior. We focus on situations when the parameter of interest has been emancipated from the likelihood and is linked to data directly through a loss function. We survey existing work on both Bayesian parametric inference with Gibbs posteriors and Bayesian non-parametric inference. We then highlight recent bootstrap computational approaches to approximating loss-driven posteriors. In particular, we focus on implicit bootstrap distributions defined through an underlying push-forward mapping. We investigate independent, identically distributed (iid) samplers from approximate posteriors that pass random bootstrap weights through a trained generative network. After training the deep-learning mapping, the simulation cost of such iid samplers is negligible. We compare the performance of these deep bootstrap samplers with exact bootstrap as well as MCMC on several examples (including support vector machines or quantile regression). We also provide theoretical insights into bootstrap posteriors by drawing upon connections to model mis-specification. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Lizhen Nie
- University of Chicago Division of the Physical Sciences, Chicago, IL, USA
| | - Veronika Ročková
- University of Chicago Booth School of Business, Chicago, IL, USA
| |
|
45
|
Jackson AC, White OW, Carine M, Chapman MA. The role of geography, ecology, and hybridization in the evolutionary history of Canary Island Descurainia. AMERICAN JOURNAL OF BOTANY 2023; 110:e16162. [PMID: 36990083 DOI: 10.1002/ajb2.16162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/15/2023] [Accepted: 03/15/2023] [Indexed: 05/31/2023]
Abstract
PREMISE: Oceanic islands offer the opportunity to understand evolutionary processes underlying rapid diversification. Along with geographic isolation and ecological shifts, a growing body of genomic evidence has suggested that hybridization can play an important role in island evolution. Here we use genotyping-by-sequencing (GBS) to understand the roles of hybridization, ecology, and geographic isolation in the radiation of Canary Island Descurainia (Brassicaceae). METHODS: We carried out GBS for multiple individuals of all Canary Island species and two outgroups. Phylogenetic analyses of the GBS data were performed using both supermatrix and gene tree approaches, and hybridization events were examined using D-statistics and Approximate Bayesian Computation. Climatic data were analyzed to examine the relationship between ecology and diversification. RESULTS: Analysis of the supermatrix data set resulted in a fully resolved phylogeny. Species networks suggest a hybridization event has occurred for D. gilva, with these results being supported by Approximate Bayesian Computation analysis. Strong phylogenetic signals for temperature and precipitation indicate one major ecological shift within Canary Island Descurainia. CONCLUSIONS: Inter-island dispersal played a significant role in the diversification of Descurainia, with evidence of only one major shift in climate preferences. Despite weak reproductive barriers and the occurrence of hybrids, hybridization appears to have played only a limited role in the diversification of the group, with a single instance detected. The results highlight the need to use phylogenetic network approaches that can simultaneously accommodate incomplete lineage sorting and gene flow when studying groups prone to hybridization; patterns that might otherwise be obscured in species trees.
Affiliation(s)
- Amy C Jackson
- Biological Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Algae, Fungi and Plants Division, Department of Life Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom
| | - Oliver W White
- Biological Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Algae, Fungi and Plants Division, Department of Life Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom
| | - Mark Carine
- Algae, Fungi and Plants Division, Department of Life Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom
| | - Mark A Chapman
- Biological Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom
| |
|
46
|
Bacci M, Sukys J, Reichert P, Ulzega S, Albert C. A comparison of numerical approaches for statistical inference with stochastic models. STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT : RESEARCH JOURNAL 2023; 37:3041-3061. [PMID: 37502198 PMCID: PMC10368571 DOI: 10.1007/s00477-023-02434-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2023] [Indexed: 07/29/2023]
Abstract
Due to our limited knowledge about complex environmental systems, our predictions of their behavior under different scenarios or decision alternatives are subject to considerable uncertainty. As this uncertainty can often be relevant for societal decisions, its consideration, quantification and communication are very important. Due to internal stochasticity, often poorly known influence factors, and only partly known mechanisms, in many cases a stochastic model is needed to get an adequate description of uncertainty. As this implies the need to infer constant parameters as well as the time-course of stochastic model states, a very high-dimensional inference problem for model calibration has to be solved. This is very challenging from a methodological and a numerical perspective. To illustrate aspects of this problem and show options to successfully tackle it, we compare three numerical approaches: Hamiltonian Monte Carlo, Particle Markov Chain Monte Carlo, and Conditional Ornstein-Uhlenbeck Sampling. As a case study, we select the analysis of hydrological data with a stochastic hydrological model. We conclude that the performance of the investigated techniques is comparable for the analyzed system, and that generality and practical considerations may also be taken into account when choosing the most appropriate technique for a particular application. Supplementary Information: The online version contains supplementary material available at 10.1007/s00477-023-02434-z.
Affiliation(s)
- Marco Bacci
- SIAM, Eawag: Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
| | - Jonas Sukys
- SIAM, Eawag: Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
| | - Peter Reichert
- SIAM, Eawag: Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
| | - Simone Ulzega
- Institute of Computational Life Sciences, ZHAW Zurich University of Applied Sciences, 8820 Wädenswil, Switzerland
| | - Carlo Albert
- SIAM, Eawag: Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
| |
|
47
|
Johri P, Pfeifer SP, Jensen JD. Developing an evolutionary baseline model for humans: jointly inferring purifying selection with population history. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.11.536488. [PMID: 37090533 PMCID: PMC10120674 DOI: 10.1101/2023.04.11.536488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Building evolutionarily appropriate baseline models for natural populations is not only important for answering fundamental questions in population genetics - including quantifying the relative contributions of adaptive vs. non-adaptive processes - but it is also essential for identifying candidate loci experiencing relatively rare and episodic forms of selection (e.g., positive or balancing selection). Here, a baseline model was developed for a human population of West African ancestry, the Yoruba, comprising processes constantly operating on the genome (i.e., purifying and background selection, population size changes, recombination rate heterogeneity, and gene conversion). Specifically, to perform joint inference of selective effects with demography, an approximate Bayesian approach was employed that utilizes the decay of background selection effects around functional elements, taking into account genomic architecture. This approach inferred a recent 6-fold population growth together with a distribution of fitness effects that is skewed towards effectively neutral mutations. Importantly, these results further suggest that, while strong and/or frequent recurrent positive selection is inconsistent with observed data, weak to moderate positive selection is consistent but unidentifiable if rare.
|
48
|
Gilabert A, Rieux A, Robert S, Vitalis R, Zapater M, Abadie C, Carlier J, Ravigné V. Revisiting the historical scenario of a disease dissemination using genetic data and Approximate Bayesian Computation methodology: The case of Pseudocercospora fijiensis invasion in Africa. Ecol Evol 2023; 13:e10013. [PMID: 37091563 PMCID: PMC10116021 DOI: 10.1002/ece3.10013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2021] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 04/25/2023]
Abstract
The reconstruction of geographic and demographic scenarios of dissemination for invasive pathogens of crops is a key step toward improving the management of emerging infectious diseases. Nowadays, the reconstruction of biological invasions typically uses both genetic and historical information to test different hypotheses of colonization. The Approximate Bayesian Computation framework and its recent Random Forest development (ABC-RF) have been successfully used in evolutionary biology to decipher multiple histories of biological invasions. Yet, for some organisms, typically plant pathogens, historical data may not be reliable, notably because of the difficulty of identifying the organism and the delay between the introduction and the first mention. We investigated the history of the invasion of Africa by the fungal pathogen of banana Pseudocercospora fijiensis, by testing the historical hypothesis against other plausible hypotheses. We analyzed the genetic structure of eight populations from six eastern and western African countries, using 20 microsatellite markers, and tested competing scenarios of population foundation using the ABC-RF methodology. We do find evidence for an invasion front consistent with the historical hypothesis, but also for the existence of another front never mentioned in historical records. We question the historical introduction point of the disease on the continent. Crucially, our results illustrate that even if ABC-RF inferences may sometimes fail to infer a single, well-supported scenario of invasion, they can be helpful in rejecting unlikely scenarios, which can prove very useful for shedding light on disease dissemination routes.
Affiliation(s)
- A. Gilabert
- Université de la Réunion, UMR PVBMT, Saint-Pierre, France
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
- Present address: CIRAD, UMR AGAP Institut, Montpellier, France
- Present address: UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - A. Rieux
- CIRAD, UMR PVBMT, Saint-Pierre, France
| | - S. Robert
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - R. Vitalis
- CBGP, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - M.‐F. Zapater
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - C. Abadie
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - J. Carlier
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| | - V. Ravigné
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
| |
|
49
|
Lewinsohn MA, Bedford T, Müller NF, Feder AF. State-dependent evolutionary models reveal modes of solid tumour growth. Nat Ecol Evol 2023; 7:581-596. [PMID: 36894662 PMCID: PMC10089931 DOI: 10.1038/s41559-023-02000-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/05/2022] [Accepted: 01/26/2023] [Indexed: 03/11/2023]
Abstract
Spatial properties of tumour growth have profound implications for cancer progression, therapeutic resistance and metastasis. Yet, how spatial position governs tumour cell division remains difficult to evaluate in clinical tumours. Here, we demonstrate that faster division on the tumour periphery leaves characteristic genetic patterns, which become evident when a phylogenetic tree is reconstructed from spatially sampled cells. Namely, rapidly dividing peripheral lineages branch more extensively and acquire more mutations than slower-dividing centre lineages. We develop a Bayesian state-dependent evolutionary phylodynamic model (SDevo) that quantifies these patterns to infer the differential division rates between peripheral and central cells. We demonstrate that this approach accurately infers spatially varying birth rates of simulated tumours across a range of growth conditions and sampling strategies. We then show that SDevo outperforms state-of-the-art, non-cancer multi-state phylodynamic methods that ignore differential sequence evolution. Finally, we apply SDevo to single-time-point, multi-region sequencing data from clinical hepatocellular carcinomas and find evidence of a three- to six-times-higher division rate on the tumour edge. With the increasing availability of high-resolution, multi-region sequencing, we anticipate that SDevo will be useful in interrogating spatial growth restrictions and could be extended to model non-spatial factors that influence tumour progression.
Affiliation(s)
- Maya A Lewinsohn
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Trevor Bedford
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Nicola F Müller
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Alison F Feder
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| |
|
50
|
Barroso GV, Lohmueller KE. Inferring the mode and strength of ongoing selection. Genome Res 2023; 33:632-643. [PMID: 37055196 PMCID: PMC10234300 DOI: 10.1101/gr.276386.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Accepted: 03/29/2023] [Indexed: 04/15/2023]
Abstract
Genome sequence data are no longer scarce. The UK Biobank alone comprises 200,000 individual genomes, with more on the way, leading the field of human genetics toward sequencing entire populations. Within the next decades, other model organisms will follow suit, especially domesticated species such as crops and livestock. Having sequences from most individuals in a population will present new challenges for using these data to improve health and agriculture in the pursuit of a sustainable future. Existing population genetic methods are designed to model hundreds of randomly sampled sequences but are not optimized for extracting the information contained in the larger and richer data sets that are beginning to emerge, with thousands of closely related individuals. Here we develop a new method called trio-based inference of dominance and selection (TIDES) that uses data from tens of thousands of family trios to make inferences about natural selection acting in a single generation. TIDES further improves on the state of the art by making no assumptions regarding demography, linkage, or dominance. We discuss how our method paves the way for studying natural selection from new angles.
Affiliation(s)
- Gustavo V Barroso
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
| | - Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095-1606, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
| |
|