1. Gangavarapu K, Ji X, Baele G, Fourment M, Lemey P, Matsen FA, Suchard MA. Many-core algorithms for high-dimensional gradients on phylogenetic trees. Bioinformatics 2024; 40:btae030. [PMID: 38243701 PMCID: PMC10868298 DOI: 10.1093/bioinformatics/btae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 12/20/2023] [Accepted: 01/15/2024] [Indexed: 01/21/2024] Open
Abstract
MOTIVATION Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters, which traditionally takes O(N^2) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold higher speedups over previous CPU implementations.
RESULTS We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
AVAILABILITY AND IMPLEMENTATION We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).
Affiliation(s)
- Karthik Gangavarapu
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, United States
- Xiang Ji
  - Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA, United States
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Mathieu Fourment
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Frederick A Matsen
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States
  - Department of Statistics, University of Washington, Seattle, WA, United States
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, United States
2. Didier G, Glatt-Holtz NE, Holbrook AJ, Magee AF, Suchard MA. On the surprising effectiveness of a simple matrix exponential derivative approximation, with application to global SARS-CoV-2. Proc Natl Acad Sci U S A 2024; 121:e2318989121. [PMID: 38215186 PMCID: PMC10801879 DOI: 10.1073/pnas.2318989121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 11/30/2023] [Indexed: 01/14/2024] Open
Abstract
The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.
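As a rough illustration of the idea in this abstract, the sketch below compares a first-order approximation of the directional derivative of exp(tQ) against a central finite-difference estimate on a small rate matrix. The form used here, t·exp(tQ)·E, is an assumption, since the abstract does not spell out the formula or the ordering of the factors. The helper names (`mat_exp`, `first_order_derivative`, `finite_difference_derivative`), the truncated Taylor series, and the 3-state matrix `Q` are hypothetical choices made only for illustration.

```python
def identity(n):
    """Return the n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Add two matrices entry by entry."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, c):
    """Multiply every entry of a matrix by the scalar c."""
    return [[c * x for x in row] for row in a]


def mat_exp(a, terms=40):
    """Truncated Taylor series for exp(a); adequate only for small norms."""
    result = identity(len(a))
    term = identity(len(a))
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)  # a^k / k!
        result = mat_add(result, term)
    return result


def first_order_derivative(q, e, t):
    """Assumed first-order form: t * exp(t q) * e (exact if q and e commute)."""
    return mat_scale(mat_mul(mat_exp(mat_scale(q, t)), e), t)


def finite_difference_derivative(q, e, t, h=1e-6):
    """Central difference of exp(t (q + s e)) with respect to s at s = 0."""
    plus = mat_exp(mat_scale(mat_add(q, mat_scale(e, h)), t))
    minus = mat_exp(mat_scale(mat_add(q, mat_scale(e, -h)), t))
    return mat_scale(mat_add(plus, mat_scale(minus, -1.0)), 1.0 / (2.0 * h))


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 3-state rate matrix (rows sum to zero) and a direction that
# perturbs only the 0 -> 1 rate.
Q = [[-1.0, 0.6, 0.4],
     [0.3, -0.5, 0.2],
     [0.1, 0.9, -1.0]]
E = [[-1.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

gap = max_abs_diff(first_order_derivative(Q, E, 0.5),
                   finite_difference_derivative(Q, E, 0.5))
print("max entrywise gap between approximation and finite difference:", gap)
```

The gap vanishes (up to numerical error) when the direction commutes with `Q`, for example when `E` is replaced by `Q` itself; for general directions it is the approximation error that the cited work bounds.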
Affiliation(s)
- Gustavo Didier
  - Department of Mathematics, Tulane University, New Orleans, LA 70118
- Andrew J. Holbrook
  - Department of Biostatistics, University of California, Los Angeles, CA 90095
- Andrew F. Magee
  - Department of Biostatistics, University of California, Los Angeles, CA 90095
- Marc A. Suchard
  - Department of Biostatistics, University of California, Los Angeles, CA 90095
  - Department of Biomathematics, University of California, Los Angeles, CA 90095
  - Department of Human Genetics, University of California, Los Angeles, CA 90095
3. Junsiri W, Islam SI, Thiptara A, Jeenpun A, Sangkhapaitoon P, Thongcham K, Phakphien R, Taweethavonsawat P. First report of Strongylidae nematode from pilot whale (Globicephala macrorhynchus) by molecular analysis reveals the cosmopolitan distribution of the taxon. Front Vet Sci 2023; 10:1313783. [PMID: 38162478 PMCID: PMC10755461 DOI: 10.3389/fvets.2023.1313783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 11/20/2023] [Indexed: 01/03/2024] Open
Abstract
This study investigates the identification, genetic composition, and placement in the evolutionary tree of a particular nematode species found in a short-finned pilot whale in the Gulf of Thailand. To accomplish this, we utilized various methods, including microscopic observations, molecular techniques, and comparative analyses to better understand the characteristics of this parasite. Initially, we concentrated on studying the 18S rDNA sequence through nested PCR, resulting in a 774-bp product. After conducting a BLASTn analysis, we discovered that there were only a few sequences in GenBank that shared similarities with our nematode, particularly with Cyathostomum catinatum, although the percent identity was relatively low. To confirm the uniqueness of our sequence, we constructed a phylogenetic tree that demonstrated a distinct branch for our nematode, suggesting significant genetic differentiation from C. catinatum. Additionally, we sequenced a 399-bp section of the ITS2 gene using PCR, and the resulting data showed a close association with the Strongylidae family, specifically with Cylicocyclus insigne. This was further confirmed by BLASTn and CD-HIT-est results, which indicated 99% and ~94% sequence homology with C. insigne, respectively. The ITS2 phylogenetic tree also supported the position of our isolated sequence within the Strongylidae family, clustering closely with C. insigne. Our findings shed light on the genetic connections, taxonomy, and evolutionary trends within the Strongylidae family, with a particular focus on the widespread nature of the Cylicocyclus genus. This study emphasizes the importance of utilizing molecular techniques and interdisciplinary approaches to gain insight into nematode diversity, evolution, and ecological dynamics in marine environments.
Affiliation(s)
- Witchuta Junsiri
  - Parasitology Unit, Department of Pathology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
- Sk Injamamul Islam
  - Parasitology Unit, Department of Pathology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
- Auyarat Thiptara
  - Epidemiology and Information Group, Veterinary Research and Development Center (Upper Southern Region), Nakhon Sri Thammarat, Thailand
- Autthaporn Jeenpun
  - Epidemiology and Information Group, Veterinary Research and Development Center (Upper Southern Region), Nakhon Sri Thammarat, Thailand
- Piyanan Sangkhapaitoon
  - Animal Diagnostic Group, Veterinary Research and Development Center (Upper Southern Region), Nakhon Sri Thammarat, Thailand
- Khunanont Thongcham
  - Marine Endangered Species Unit, Marine and Coastal Resource Research Center, Lower Gulf of Thailand, Department of Marine and Coastal Resources, Thailand
- Rattanakorn Phakphien
  - Marine Endangered Species Unit, Marine and Coastal Resource Research Center, Lower Gulf of Thailand, Department of Marine and Coastal Resources, Thailand
- Piyanan Taweethavonsawat
  - Parasitology Unit, Department of Pathology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
  - Biomarkers in Animal Parasitology Research Group, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
4. Li YQ, Ghafari M, Holbrook AJ, Boonen I, Amor N, Catalano S, Webster JP, Li YY, Li HT, Vergote V, Maes P, Chong YL, Laudisoit A, Baelo P, Ngoy S, Mbalitini SG, Gembu GC, Musaba AP, Goüy de Bellocq J, Leirs H, Verheyen E, Pybus OG, Katzourakis A, Alagaili AN, Gryseels S, Li YC, Suchard MA, Bletsa M, Lemey P. The evolutionary history of hepaciviruses. bioRxiv 2023:2023.06.30.547218. [PMID: 37425679 PMCID: PMC10327235 DOI: 10.1101/2023.06.30.547218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
In the search for natural reservoirs of hepatitis C virus (HCV), a broad diversity of non-human viruses within the Hepacivirus genus has been uncovered. However, the evolutionary dynamics that shaped the diversity and timescale of hepacivirus evolution remain elusive. To gain further insights into the origins and evolution of this genus, we screened a large dataset of wild mammal samples (n = 1,672) from Africa and Asia, and generated 34 full-length hepacivirus genomes. Phylogenetic analysis of these data together with publicly available genomes emphasizes the importance of rodents as hepacivirus hosts, and we identify 13 rodent species and 3 rodent genera (in Cricetidae and Muridae families) as novel hosts of hepaciviruses. Through co-phylogenetic analyses, we demonstrate that hepacivirus diversity has been affected by cross-species transmission events against the backdrop of a detectable signal of virus-host co-divergence in the deep evolutionary history. Using a Bayesian phylogenetic multidimensional scaling approach, we explore the extent to which host relatedness and geographic distances have structured present-day hepacivirus diversity. Our results provide evidence for a substantial structuring of mammalian hepacivirus diversity by host as well as geography, with a somewhat more irregular diffusion process in geographic space. Finally, using a mechanistic model that accounts for substitution saturation, we provide the first formal estimates of the timescale of hepacivirus evolution and estimate the origin of the genus to be about 22 million years ago. Our results offer a comprehensive overview of the micro- and macroevolutionary processes that have shaped hepacivirus diversity and enhance our understanding of the long-term evolution of the Hepacivirus genus.
Affiliation(s)
- YQ Li
  - Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, KU Leuven, Leuven, 3000, Belgium
- M Ghafari
  - Department of Biology, University of Oxford, Oxford, OX1, UK
- AJ Holbrook
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- I Boonen
  - Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, KU Leuven, Leuven, 3000, Belgium
- N Amor
  - Laboratory of Biodiversity, Parasitology, and Ecology of Aquatic Ecosystems, Department of Biology - Faculty of Sciences of Tunis, University of Tunis El Manar, Tunis, 2092, Tunisia
- S Catalano
  - School of Biodiversity, One Health and Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G61 1QH, UK
  - Department of Pathobiology and Population Sciences, the Royal Veterinary College, University of London, Herts, AL9 7TA, UK
- JP Webster
  - Department of Pathobiology and Population Sciences, the Royal Veterinary College, University of London, Herts, AL9 7TA, UK
- YY Li
  - College of Life Sciences, Linyi University, Linyi, 276000, China
  - Marine College, Shandong University (Weihai), Weihai, 264209, China
- HT Li
  - College of Life Sciences, Liaocheng University, Liaocheng, 252000, China
  - Marine College, Shandong University (Weihai), Weihai, 264209, China
- V Vergote
  - Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, KU Leuven, Leuven, 3000, Belgium
- P Maes
  - Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, KU Leuven, Leuven, 3000, Belgium
- YL Chong
  - Animal Resource Science and Management Group, Faculty of Resource Science and Technology, Universiti Malaysia Sarawak (UNIMAS), 94300, Malaysia
  - Department of Science and Environmental Studies, The Education University of Hong Kong, Hong Kong, 999077, China
- A Laudisoit
  - EcoHealth Alliance, New York, NY 10018, USA
  - Evolutionary Ecology group (EVECO), Department of Biology, University of Antwerp, Antwerp, 2020, Belgium
- P Baelo
  - Faculty of Sciences, University of Kisangani, Kisangani, Democratic Republic of the Congo
- S Ngoy
  - Faculty of Sciences, University of Kisangani, Kisangani, Democratic Republic of the Congo
- SG Mbalitini
  - Faculty of Sciences, University of Kisangani, Kisangani, Democratic Republic of the Congo
- GC Gembu
  - Faculty of Sciences, University of Kisangani, Kisangani, Democratic Republic of the Congo
- Akawa P Musaba
  - Faculty of Sciences, University of Kisangani, Kisangani, Democratic Republic of the Congo
- J Goüy de Bellocq
  - Institute of Vertebrate Biology, The Czech Academy of Sciences, Květná 8, 603 65 Brno, Czech Republic
- H Leirs
  - Evolutionary Ecology group (EVECO), Department of Biology, University of Antwerp, Antwerp, 2020, Belgium
- E Verheyen
  - Evolutionary Ecology group (EVECO), Department of Biology, University of Antwerp, Antwerp, 2020, Belgium
- OG Pybus
  - Department of Biology, University of Oxford, Oxford, OX1, UK
  - Department of Pathobiology and Population Sciences, the Royal Veterinary College, University of London, Herts, AL9 7TA, UK
- A Katzourakis
  - Department of Biology, University of Oxford, Oxford, OX1, UK
- AN Alagaili
  - Laboratory of Biodiversity, Parasitology, and Ecology of Aquatic Ecosystems, Department of Biology - Faculty of Sciences of Tunis, University of Tunis El Manar, Tunis, 2092, Tunisia
- S Gryseels
  - Evolutionary Ecology group (EVECO), Department of Biology, University of Antwerp, Antwerp, 2020, Belgium
- YC Li
  - Marine College, Shandong University (Weihai), Weihai, 264209, China
- MA Suchard
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- M Bletsa
  - Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, KU Leuven, Leuven, 3000, Belgium
  - Department of Hygiene Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Athens, 11527, Greece
- P Lemey
  - Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, KU Leuven, Leuven, 3000, Belgium
5. Holbrook AJ. A quantum parallel Markov chain Monte Carlo. J Comput Graph Stat 2023; 32:1402-1415. [PMID: 38127472 PMCID: PMC10723820 DOI: 10.1080/10618600.2023.2195890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 03/19/2023] [Indexed: 04/03/2023]
Abstract
We propose a novel hybrid quantum computing strategy for parallel MCMC algorithms that generate multiple proposals at each step. This strategy makes the rate-limiting step within parallel MCMC amenable to quantum parallelization by using the Gumbel-max trick to turn the generalized accept-reject step into a discrete optimization problem. When combined with new insights from the parallel MCMC literature, such an approach allows us to embed target density evaluations within a well-known extension of Grover's quantum search algorithm. Letting P denote the number of proposals in a single MCMC iteration, the combined strategy reduces the number of target evaluations required from 𝒪(P) to 𝒪(P^{1/2}). In the following, we review the rudiments of quantum computing, quantum search and the Gumbel-max trick in order to elucidate their combination for as wide a readership as possible.
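The Gumbel-max trick mentioned in this abstract can be sketched classically as follows: adding independent Gumbel(0, 1) noise to log-weights and taking the argmax draws an index with probability proportional to the weights, which turns a sampling step into a maximization. This is only the classical trick in isolation; the quantum search component and the parallel MCMC acceptance rule are not shown, and the function names `gumbel_noise` and `gumbel_max_sample` and the example weights are hypothetical.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(log_weights, rng):
    """Return index i with probability proportional to exp(log_weights[i])."""
    perturbed = [lw + gumbel_noise(rng) for lw in log_weights]
    # Sampling becomes a discrete optimization: pick the largest perturbed value.
    return max(range(len(perturbed)), key=perturbed.__getitem__)


# Hypothetical unnormalized weights for three candidate proposals.
weights = [1.0, 2.0, 7.0]
log_weights = [math.log(w) for w in weights]

rng = random.Random(0)
draws = [gumbel_max_sample(log_weights, rng) for _ in range(20000)]
frequencies = [draws.count(i) / len(draws) for i in range(len(weights))]
print("empirical frequencies:", frequencies)
```

The empirical frequencies should approach the normalized weights (0.1, 0.2, 0.7) as the number of draws grows.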
6. Hassler GW, Magee A, Zhang Z, Baele G, Lemey P, Ji X, Fourment M, Suchard MA. Data integration in Bayesian phylogenetics. Annual Review of Statistics and Its Application 2022; 10:353-377. [PMID: 38774036 PMCID: PMC11108065 DOI: 10.1146/annurev-statistics-033021-112532] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2024]
Abstract
Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.
Affiliation(s)
- Gabriel W Hassler
  - Department of Computational Medicine, University of California, Los Angeles, USA, 90095
- Andrew Magee
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Zhenyu Zhang
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Guy Baele
  - Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Philippe Lemey
  - Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, USA, 70118
- Mathieu Fourment
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo NSW, Australia, 2007
- Marc A Suchard
  - Department of Computational Medicine, University of California, Los Angeles, USA, 90095
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
  - Department of Human Genetics, University of California, Los Angeles, USA, 90095
7. Holbrook AJ, Ji X, Suchard MA. From viral evolution to spatial contagion: a biologically modulated Hawkes model. Bioinformatics 2022; 38:1846-1856. [PMID: 35040956 PMCID: PMC8963291 DOI: 10.1093/bioinformatics/btac027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/10/2021] [Revised: 12/11/2021] [Accepted: 01/12/2022] [Indexed: 02/04/2023] Open
Abstract
SUMMARY Mutations sometimes increase contagiousness for evolving pathogens. During an epidemic, scientists use viral genome data to infer a shared evolutionary history and connect this history to geographic spread. We propose a model that directly relates a pathogen's evolution to its spatial contagion dynamics, effectively combining the two epidemiological paradigms of phylogenetic inference and self-exciting process modeling, and apply this phylogenetic Hawkes process to a Bayesian analysis of 23,421 viral cases from the 2014 to 2016 Ebola outbreak in West Africa. The proposed model is able to detect individual viruses with significantly elevated rates of spatiotemporal propagation for a subset of 1,610 samples that provide genome data. Finally, to facilitate model application in big data settings, we develop massively parallel implementations for the gradient and Hessian of the log-likelihood and apply our high-performance computing framework within an adaptively pre-conditioned Hamiltonian Monte Carlo routine.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrew J Holbrook
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, LA 70118, USA
- Marc A Suchard
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
  - Department of Biomathematics
  - Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
8. Nishimura A, Suchard MA. Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in ‘large n & large p’ Bayesian sparse regression. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2057859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Marc A. Suchard
  - Department of Biomathematics, Biostatistics, and Human Genetics, University of California - Los Angeles
9. Holbrook AJ, Ji X, Suchard MA. Bayesian mitigation of spatial coarsening for a Hawkes model applied to gunfire, wildfire and viral contagion. Ann Appl Stat 2022; 16:573-595. [PMID: 36211254 PMCID: PMC9536472 DOI: 10.1214/21-aoas1517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Self-exciting spatiotemporal Hawkes processes have found increasing use in the study of large-scale public health threats, ranging from gun violence and earthquakes to wildfires and viral contagion. Whereas many such applications feature locational uncertainty, that is, the exact spatial positions of individual events are unknown, most Hawkes model analyses to date have ignored spatial coarsening present in the data. Three particular 21st century public health crises (urban gun violence, rural wildfires and global viral spread) present qualitatively and quantitatively varying uncertainty regimes that exhibit: (a) different collective magnitudes of spatial coarsening, (b) uniform and mixed magnitude coarsening, (c) differently shaped uncertainty regions and, less orthodox, (d) locational data distributed within the "wrong" effective space. We explicitly model such uncertainties in a Bayesian manner and jointly infer unknown locations together with all parameters of a reasonably flexible Hawkes model, obtaining results that are practically and statistically distinct from those obtained while ignoring spatial coarsening. This work also features two different secondary contributions: first, to facilitate Bayesian inference of locations and background rate parameters, we make a subtle yet crucial change to an established kernel-based rate model, and second, to facilitate the same Bayesian inference at scale, we develop a massively parallel implementation of the model's log-likelihood gradient with respect to locations and thus avoid its quadratic computational cost in the context of Hamiltonian Monte Carlo. Our examples involve thousands of observations and allow us to demonstrate practicality at moderate scales.
Affiliation(s)
- Xiang Ji
  - Department of Mathematics, Tulane University
- Marc A. Suchard
  - Departments of Biostatistics, Human Genetics and Computational Medicine, UCLA
10. Dellicour S, Gill MS, Faria NR, Rambaut A, Pybus OG, Suchard MA, Lemey P. Relax, Keep Walking - A Practical Guide to Continuous Phylogeographic Inference with BEAST. Mol Biol Evol 2021; 38:3486-3493. [PMID: 33528560 PMCID: PMC8321535 DOI: 10.1093/molbev/msab031] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 02/07/2023] Open
Abstract
Spatially explicit phylogeographic analyses can be performed with an inference framework that employs relaxed random walks to reconstruct phylogenetic dispersal histories in continuous space. This core model was first implemented 10 years ago and has opened up new opportunities in the field of phylodynamics, allowing researchers to map and analyze the spatial dissemination of rapidly evolving pathogens. We here provide a detailed and step-by-step guide on how to set up, run, and interpret continuous phylogeographic analyses using the programs BEAUti, BEAST, Tracer, and TreeAnnotator.
Affiliation(s)
- Simon Dellicour
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Bruxelles, Belgium
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Mandev S Gill
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Nuno R Faria
  - MRC Centre for Global Infectious Disease Analysis, J-IDEA, Imperial College London, London, United Kingdom
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
  - Instituto de Medicina Tropical, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
11. Holbrook AJ, Loeffler CE, Flaxman SR, Suchard MA. Scalable Bayesian inference for self-excitatory stochastic processes applied to big American gunfire data. Statistics and Computing 2021; 31:4. [PMID: 34354329 PMCID: PMC8330599 DOI: 10.1007/s11222-020-09980-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/13/2020] [Accepted: 12/02/2020] [Indexed: 06/13/2023]
Abstract
The Hawkes process and its extensions effectively model self-excitatory phenomena including earthquakes, viral pandemics, financial transactions, neural spike trains and the spread of memes through social networks. The usefulness of these stochastic process models within a host of economic sectors and scientific disciplines is undercut by the processes' computational burden: complexity of likelihood evaluations grows quadratically in the number of observations for both the temporal and spatiotemporal Hawkes processes. We show that, with care, one may parallelize these calculations using both central and graphics processing unit implementations to achieve over 100-fold speedups over single-core processing. Using a simple adaptive Metropolis-Hastings scheme, we apply our high-performance computing framework to a Bayesian analysis of big gunshot data generated in Washington D.C. between the years of 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations. To encourage widespread use, we provide hpHawkes, an open-source R package, and discuss high-level implementation and program design for leveraging aspects of computational hardware that become necessary in a big data setting.
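To make the quadratic cost mentioned in this abstract concrete, here is a minimal serial sketch of a temporal Hawkes log-likelihood with an exponential triggering kernel. The kernel choice, the parameter names (`mu`, `alpha`, `beta`) and the function name `hawkes_log_likelihood` are assumptions for illustration; the cited work treats spatiotemporal models with its own kernels and parallelizes the computation, neither of which is reproduced here.

```python
import math


def hawkes_log_likelihood(times, horizon, mu, alpha, beta):
    """Log-likelihood of event times on [0, horizon] for the intensity
    lambda(t) = mu + alpha * beta * sum_{t_j < t} exp(-beta * (t - t_j)).

    `times` must be sorted in increasing order. The nested loop over earlier
    events is what makes a naive evaluation quadratic in the number of events.
    """
    log_lik = 0.0
    for i, t_i in enumerate(times):
        excitation = 0.0
        for t_j in times[:i]:  # contribution of every earlier event
            excitation += alpha * beta * math.exp(-beta * (t_i - t_j))
        log_lik += math.log(mu + excitation)

    # Compensator: integral of the intensity over [0, horizon].
    compensator = mu * horizon
    for t_j in times:
        compensator += alpha * (1.0 - math.exp(-beta * (horizon - t_j)))
    return log_lik - compensator


# Hypothetical event times and parameter values.
events = [0.5, 1.2, 1.3, 2.8, 3.1]
value = hawkes_log_likelihood(events, horizon=4.0, mu=0.8, alpha=0.5, beta=1.5)
print("log-likelihood:", value)
```

Each term of the outer sum depends only on earlier events, so the terms can be evaluated independently of one another, which is the property that makes the computation a natural candidate for multi-core and GPU parallelization.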
Affiliation(s)
- Andrew J. Holbrook
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, USA
- Seth R. Flaxman
  - Department of Mathematics, Imperial College London, London, UK
- Marc A. Suchard
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, USA
  - Department of Biomathematics, University of California, Los Angeles, Los Angeles, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, USA