1
Magee AF, Holbrook AJ, Pekar JE, Caviedes-Solis IW, Matsen Iv FA, Baele G, Wertheim JO, Ji X, Lemey P, Suchard MA. Random-Effects Substitution Models for Phylogenetics via Scalable Gradient Approximations. Syst Biol 2024; 73:562-578. [PMID: 38712512 DOI: 10.1093/sysbio/syae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 02/26/2024] [Accepted: 05/02/2024] [Indexed: 05/08/2024] Open
Abstract
Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
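As an illustration of the idea summarized in this abstract, the sketch below builds HKY rates and perturbs them with per-rate random effects. The log-linear form (log q_ij = log base rate + eps_ij), the function names, and the numeric values are assumptions made for this example and are not taken from the paper. Reversibility is checked with Kolmogorov's criterion on 3-cycles, which is sufficient only when every off-diagonal rate is positive.

```python
import math
from itertools import permutations

STATES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rates(kappa, pi):
    """Off-diagonal HKY rates: q[a, b] = (kappa if transition else 1) * pi[b]."""
    return {
        (a, b): (kappa if (a, b) in TRANSITIONS else 1.0) * pi[b]
        for a, b in permutations(STATES, 2)
    }


def add_random_effects(base, eps):
    """Assumed log-linear form: log q_ij = log q_ij(base) + eps_ij.

    `eps` maps (from_state, to_state) to a random effect; missing pairs get 0.
    Diagonal entries are implied (negative row sums) and not stored.
    """
    return {pair: rate * math.exp(eps.get(pair, 0.0)) for pair, rate in base.items()}


def is_reversible(q, tol=1e-12):
    """Kolmogorov's criterion on every 3-cycle (valid when all rates are positive)."""
    return all(
        abs(q[i, j] * q[j, k] * q[k, i] - q[i, k] * q[k, j] * q[j, i]) <= tol
        for i, j, k in permutations(STATES, 3)
    )


# Hypothetical values chosen only for illustration.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
base = hky_rates(kappa=2.0, pi=pi)
perturbed = add_random_effects(base, {("A", "C"): 0.5})

print(is_reversible(base), is_reversible(perturbed))
```

With all random effects at zero the base HKY model is recovered, while a single nonzero effect is enough to break reversibility in this toy setting.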
Affiliation(s)
- Andrew F Magee
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California - Los Angeles, Los Angeles, CA, USA
- Andrew J Holbrook
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California - Los Angeles, Los Angeles, CA, USA
- Jonathan E Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California - San Diego, La Jolla, CA, USA
  - Department of Biomedical Informatics, University of California - San Diego, La Jolla, CA, USA
- Fredrick A Matsen Iv
  - Howard Hughes Medical Institute, Seattle, Washington, USA
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
  - Department of Statistics, University of Washington, Seattle, Washington, USA
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Joel O Wertheim
  - Department of Medicine, University of California - San Diego, La Jolla, CA, USA
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, LA, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Marc A Suchard
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California - Los Angeles, Los Angeles, CA, USA
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California - Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California - Los Angeles, Los Angeles, CA, USA
2
Chisholm LO, Orlandi KN, Phillips SR, Shavlik MJ, Harms MJ. Ancestral Reconstruction and the Evolution of Protein Energy Landscapes. Annu Rev Biophys 2024; 53:127-146. [PMID: 38134334 PMCID: PMC11192866 DOI: 10.1146/annurev-biophys-030722-125440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
A protein's sequence determines its conformational energy landscape. This, in turn, determines the protein's function. Understanding the evolution of new protein functions therefore requires understanding how mutations alter the protein energy landscape. Ancestral sequence reconstruction (ASR) has proven a valuable tool for tackling this problem. In ASR, one phylogenetically infers the sequences of ancient proteins, allowing characterization of their properties. When coupled to biophysical, biochemical, and functional characterization, ASR can reveal how historical mutations altered the energy landscape of ancient proteins, allowing the evolution of enzyme activity, altered conformations, binding specificity, oligomerization, and many other protein features. In this article, we review how ASR studies have been used to dissect the evolution of energy landscapes. We also discuss ASR studies that reveal how energy landscapes have shaped protein evolution. Finally, we propose that thinking about evolution from the perspective of an energy landscape can improve how we approach and interpret ASR studies.
Affiliation(s)
- Lauren O Chisholm
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Kona N Orlandi
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
  - Department of Biology, University of Oregon, Eugene, Oregon, USA
- Sophia R Phillips
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Michael J Shavlik
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
  - Department of Biology, University of Oregon, Eugene, Oregon, USA
- Michael J Harms
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
3
Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates, and variation in replacement rates with time, signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures, which assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogeneous inference procedures and potential ways to mitigate such effects.
Affiliation(s)
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P. Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
4
Narayanan KK, Procko E. Deep Mutational Scanning of Viral Glycoproteins and Their Host Receptors. Front Mol Biosci 2021; 8:636660. [PMID: 33898517 PMCID: PMC8062978 DOI: 10.3389/fmolb.2021.636660] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/01/2020] [Accepted: 03/18/2021] [Indexed: 11/17/2022] Open
Abstract
Deep mutational scanning or deep mutagenesis is a powerful tool for understanding the sequence diversity available to viruses for adaptation in a laboratory setting. It generally involves tracking an in vitro selection of protein sequence variants with deep sequencing to map mutational effects based on changes in sequence abundance. Coupled with any of a number of selection strategies, deep mutagenesis can explore the mutational diversity available to viral glycoproteins, which mediate critical roles in cell entry and are exposed to the humoral arm of the host immune response. Mutational landscapes of viral glycoproteins for host cell attachment and membrane fusion reveal extensive epistasis and potential escape mutations to neutralizing antibodies or other therapeutics, as well as aiding in the design of optimized immunogens for eliciting broadly protective immunity. While less explored, deep mutational scans of host receptors further assist in understanding virus-host protein interactions. Critical residues on the host receptors for engaging with viral spikes are readily identified and may help with structural modeling. Furthermore, mutations may be found for engineering soluble decoy receptors as neutralizing agents that specifically bind viral targets with tight affinity and limited potential for viral escape. By untangling the complexities of how sequence contributes to viral glycoprotein and host receptor interactions, deep mutational scanning is impacting ideas and strategies at multiple levels for combatting circulating and emergent virus strains.
Affiliation(s)
- Erik Procko
  - Department of Biochemistry and Cancer Center at Illinois, University of Illinois, Urbana, IL, United States
5
Puller V, Sagulenko P, Neher RA. Efficient inference, potential, and limitations of site-specific substitution models. Virus Evol 2020; 6:veaa066. [PMID: 33343922 PMCID: PMC7733610 DOI: 10.1093/ve/veaa066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
Abstract
Natural selection imposes a complex filter on which variants persist in a population, resulting in evolutionary patterns that vary greatly along the genome. Some sites evolve close to neutrally, while others are highly conserved, allow only specific states, or only change in concert with other sites. On the one hand, such constraints on sequence evolution can be used to infer biological function; on the other hand, they need to be accounted for in phylogenetic reconstruction. Phylogenetic models often account for this complexity by partitioning sites into a small number of discrete classes with different rates and/or state preferences. Appropriate model complexity is typically determined by model selection procedures. Here, we present an efficient algorithm to estimate more complex models that allow for different preferences at every site and explore the accuracy at which such models can be estimated from simulated data. Our iterative approximate maximum likelihood scheme uses information in the data efficiently and accurately estimates site-specific preferences from large data sets with moderately diverged sequences and known topology. However, the joint estimation of site-specific rates, site-specific preferences, and phylogenetic branch lengths can suffer from identifiability problems, while ignoring variation in preferences across sites results in branch length underestimates. Site-specific preferences estimated from large HIV pol alignments show qualitative concordance with intra-host estimates of fitness costs. Analysis of these substitution models suggests near saturation of divergence after a few hundred years. Such saturation can explain the inability to infer deep divergence times of HIV and SIVs using molecular clock approaches and time-dependent rate estimates.
Affiliation(s)
- Vadim Puller
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 61, Basel, Switzerland
- Pavel Sagulenko
  - Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Richard A Neher
  - Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 61, Basel, Switzerland
6
Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, Navarro MJ, Bowen JE, Tortorici MA, Walls AC, King NP, Veesler D, Bloom JD. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell 2020; 182:1295-1310.e20. [PMID: 32841599 PMCID: PMC7418704 DOI: 10.1016/j.cell.2020.08.012] [Citation(s) in RCA: 1372] [Impact Index Per Article: 343.0] [Received: 06/17/2020] [Revised: 07/31/2020] [Accepted: 08/06/2020] [Indexed: 02/07/2023]
Abstract
The receptor binding domain (RBD) of the SARS-CoV-2 spike glycoprotein mediates viral attachment to ACE2 receptor and is a major determinant of host range and a dominant target of neutralizing antibodies. Here, we experimentally measure how all amino acid mutations to the RBD affect expression of folded protein and its affinity for ACE2. Most mutations are deleterious for RBD expression and ACE2 binding, and we identify constrained regions on the RBD's surface that may be desirable targets for vaccines and antibody-based therapeutics. But a substantial number of mutations are well tolerated or even enhance ACE2 binding, including at ACE2 interface residues that vary across SARS-related coronaviruses. However, we find no evidence that these ACE2-affinity-enhancing mutations have been selected in current SARS-CoV-2 pandemic isolates. We present an interactive visualization and open analysis pipeline to facilitate use of our dataset for vaccine design and functional annotation of mutations observed during viral surveillance.
Affiliation(s)
- Tyler N Starr
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Allison J Greaney
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Medical Scientist Training Program, University of Washington, Seattle, WA 98195, USA
- Sarah K Hilton
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Daniel Ellis
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Graduate Program in Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA
- Katharine H D Crawford
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Medical Scientist Training Program, University of Washington, Seattle, WA 98195, USA
- Adam S Dingens
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Mary Jane Navarro
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- John E Bowen
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Alexandra C Walls
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Neil P King
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- David Veesler
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Jesse D Bloom
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
7
Starr TN, Greaney AJ, Hilton SK, Crawford KH, Navarro MJ, Bowen JE, Tortorici MA, Walls AC, Veesler D, Bloom JD. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. bioRxiv 2020:2020.06.17.157982. [PMID: 32587970 PMCID: PMC7310626 DOI: 10.1101/2020.06.17.157982] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/16/2022]
Abstract
The receptor binding domain (RBD) of the SARS-CoV-2 spike glycoprotein mediates viral attachment to ACE2 receptor, and is a major determinant of host range and a dominant target of neutralizing antibodies. Here we experimentally measure how all amino-acid mutations to the RBD affect expression of folded protein and its affinity for ACE2. Most mutations are deleterious for RBD expression and ACE2 binding, and we identify constrained regions on the RBD's surface that may be desirable targets for vaccines and antibody-based therapeutics. But a substantial number of mutations are well tolerated or even enhance ACE2 binding, including at ACE2 interface residues that vary across SARS-related coronaviruses. However, we find no evidence that these ACE2-affinity enhancing mutations have been selected in current SARS-CoV-2 pandemic isolates. We present an interactive visualization and open analysis pipeline to facilitate use of our dataset for vaccine design and functional annotation of mutations observed during viral surveillance.
Affiliation(s)
- Tyler N. Starr
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Co-first authors
- Allison J. Greaney
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Medical Scientist Training Program, University of Washington, Seattle, WA 98195, USA
  - Co-first authors
- Sarah K. Hilton
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Katharine H.D. Crawford
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Medical Scientist Training Program, University of Washington, Seattle, WA 98195, USA
- Mary Jane Navarro
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- John E. Bowen
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Alexandra C. Walls
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- David Veesler
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Jesse D. Bloom
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
  - Lead Contact
8
Rasmussen DA, Stadler T. Coupling adaptive molecular evolution to phylodynamics using fitness-dependent birth-death models. eLife 2019; 8:45562. [PMID: 31411558 PMCID: PMC6715349 DOI: 10.7554/elife.45562] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/27/2019] [Accepted: 07/26/2019] [Indexed: 12/25/2022] Open
Abstract
Beneficial and deleterious mutations cause the fitness of lineages to vary across a phylogeny and thereby shape its branching structure. While standard phylogenetic models do not allow mutations to feed back and shape trees, birth-death models can account for this feedback by letting the fitness of lineages depend on their type. To date, these multi-type birth-death models have only been applied to cases where a lineage’s fitness is determined by a single character state. We extend these models to track sequence evolution at multiple sites. This approach remains computationally tractable by tracking the genotype and fitness of lineages probabilistically in an approximate manner. Although the approach is approximate, we show that we can accurately estimate the fitness of lineages and site-specific mutational fitness effects from phylogenies. We apply this approach to estimate the population-level fitness effects of mutations in Ebola and influenza virus, and compare our estimates with in vitro fitness measurements for these mutations.
Affiliation(s)
- David A Rasmussen
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, United States
  - Bioinformatics Research Center, North Carolina State University, Raleigh, United States
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
9
Bletsa M, Suchard MA, Ji X, Gryseels S, Vrancken B, Baele G, Worobey M, Lemey P. Divergence dating using mixed effects clock modelling: An application to HIV-1. Virus Evol 2019; 5:vez036. [PMID: 31720009 PMCID: PMC6830409 DOI: 10.1093/ve/vez036] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/14/2023] Open
Abstract
The need to estimate divergence times in evolutionary histories in the presence of various sources of substitution rate variation has stimulated a rich development of relaxed molecular clock models. Viral evolutionary studies frequently adopt an uncorrelated clock model as a generic relaxed molecular clock process, but this may impose considerable estimation bias if discrete rate variation exists among clades or lineages. For HIV-1 group M, rate variation among subtypes has been shown to result in inconsistencies in time to the most recent common ancestor estimation. Although this calls into question the adequacy of available molecular dating methods, no solution to this problem has been offered so far. Here, we investigate the use of mixed effects molecular clock models, which combine both fixed and random effects in the evolutionary rate, to estimate divergence times. Using simulation, we demonstrate that this model outperforms existing molecular clock models in a Bayesian framework for estimating time-measured phylogenies in the presence of mixed sources of rate variation, while also maintaining good performance in simpler scenarios. By analysing a comprehensive HIV-1 group M complete genome data set we confirm considerable rate variation among subtypes that is not adequately modelled by uncorrelated relaxed clock models. The mixed effects clock model can accommodate this rate variation and results in a time to the most recent common ancestor of HIV-1 group M of 1920 (1915-25), which is only slightly earlier than the uncorrelated relaxed clock estimate for the same data set. The use of complete genome data appears to have a more profound impact than the molecular clock model because it reduces the credible intervals by 50 per cent relative to similar estimates based on short envelope gene sequences.
Affiliation(s)
- Magda Bletsa
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, Leuven, Belgium
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, CA, USA
- Xiang Ji
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
- Sophie Gryseels
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, Leuven, Belgium
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Bram Vrancken
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, Leuven, Belgium
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, Leuven, Belgium
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, Leuven, Belgium
10
Chen W, Kenney T, Bielawski J, Gu H. Testing adequacy for DNA substitution models. BMC Bioinformatics 2019; 20:349. [PMID: 31221105 PMCID: PMC6585133 DOI: 10.1186/s12859-019-2905-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/25/2018] [Accepted: 05/17/2019] [Indexed: 12/22/2022] Open
Abstract
Background: Testing model adequacy is important before a DNA substitution model is chosen for phylogenetic inference. Using a mis-specified model can negatively impact phylogenetic inference; for example, the maximum likelihood method can be inconsistent when the DNA sequences are generated under a tree topology that is in the Felsenstein zone and analyzed with a mis-specified or inadequate model. However, model adequacy testing in phylogenetics is underdeveloped. Results: Here we develop a simple, general, powerful and robust model test based on Pearson’s goodness-of-fit test and binning of site patterns. We demonstrate through simulation that this test is robust in its high power to reject inadequate models for a large range of different ways of binning site patterns, while the Type I error is controlled well. In the real data analysis we discovered many cases where models chosen by another method can be rejected by this new test; in particular, our proposed test rejects the most complex DNA model (GTR+I+Γ) while the Goldman-Cox test fails to reject the commonly used simple models. Conclusions: Model adequacy testing and bootstrap should be used together to assess reliability of conclusions after model selection and model fitting have already been applied to choose the model and fit it. The new goodness-of-fit test proposed in this paper is a simple and powerful model adequacy testing method serving such a regular model checking purpose. We caution against deriving strong conclusions from analyses based on inadequate models. At a minimum, those results derived from inadequate models can now be readily flagged using the new test, and reported as such.
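To make the "Pearson's goodness-of-fit test and binning of site patterns" idea concrete, the following minimal sketch computes a Pearson statistic over binned site patterns. The function name, the two-taxon toy model, and the binning rule are hypothetical choices for this example. The abstract does not say how the statistic is calibrated (degrees of freedom, adjustment for fitted parameters), so that step is deliberately left out.

```python
from collections import Counter
from itertools import product


def pearson_binned_statistic(columns, pattern_prob, bin_of):
    """Pearson X^2 = sum over bins of (observed - expected)^2 / expected.

    columns      : observed site patterns, one string per alignment column
    pattern_prob : model probability of every possible site pattern (sums to 1)
    bin_of       : function mapping a site pattern to a bin label (assumed rule)
    Bins whose expected count is zero are ignored.
    """
    n = len(columns)
    observed = Counter(bin_of(c) for c in columns)
    expected = Counter()
    for pattern, p in pattern_prob.items():
        expected[bin_of(pattern)] += n * p
    return sum(
        (observed.get(b, 0) - e) ** 2 / e for b, e in expected.items() if e > 0
    )


# Toy two-taxon model (hypothetical numbers): identical pairs are more likely.
pattern_prob = {
    "".join(p): (0.175 if p[0] == p[1] else 0.025)
    for p in product("ACGT", repeat=2)
}


def same_or_different(pattern):
    """Example binning rule: constant columns versus variable columns."""
    return "same" if len(set(pattern)) == 1 else "different"


columns = ["AA"] * 60 + ["AC"] * 40
stat = pearson_binned_statistic(columns, pattern_prob, same_or_different)
print(stat)
```

A larger statistic indicates a larger gap between the binned pattern counts and what the fitted model predicts; other binning rules can be swapped in through `bin_of`.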
Affiliation(s)
- Wei Chen
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Toby Kenney
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Joseph Bielawski
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
  - Department of Biology, Dalhousie University, Halifax, Canada
- Hong Gu
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
11
Lakdawala SS, Lee N, Brooke CB. Teaching an Old Virus New Tricks: A Review on New Approaches to Study Age-Old Questions in Influenza Biology. J Mol Biol 2019; 431:4247-4258. [PMID: 31051174 DOI: 10.1016/j.jmb.2019.04.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/24/2019] [Revised: 04/12/2019] [Accepted: 04/23/2019] [Indexed: 01/31/2023]
Abstract
Influenza viruses have been studied for over 80 years, yet much about the basic viral lifecycle remains unknown. However, new imaging, biochemical, and sequencing techniques have yielded significant insight into many age-old questions of influenza virus biology. In this review, we will cover the role of imaging techniques to describe unique aspects of influenza virus assembly, biochemical techniques to study viral genomic organization, and next-generation sequencing to explore influenza genomic evolution. Our goal is to provide a brief overview of how emerging techniques are being used to answer basic questions about influenza viruses. This is not a comprehensive list of emerging techniques, but rather a selection of those that we feel will continue to make significant contributions to the field of influenza biology.
Affiliation(s)
- Seema S Lakdawala
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Nara Lee
  - Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Christopher B Brooke
  - Department of Microbiology, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA