1
Csűrös M. Gain-loss-duplication models for copy number evolution on a phylogeny: Exact algorithms for computing the likelihood and its gradient. Theor Popul Biol 2022; 145:80-94. [DOI: 10.1016/j.tpb.2022.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 03/07/2022] [Accepted: 03/10/2022] [Indexed: 10/18/2022]
2
Crawford FW, Ho LST, Suchard MA. Computational methods for birth-death processes. Wiley Interdiscip Rev Comput Stat 2018; 10:e1423. [PMID: 29942419 PMCID: PMC6014701 DOI: 10.1002/wics.1423] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 02/01/2023]
Abstract
Many important stochastic counting models can be written as general birth-death processes (BDPs). BDPs are continuous-time Markov chains on the non-negative integers in which only jumps to adjacent states are allowed. BDPs can be used to easily parameterize a rich variety of probability distributions on the non-negative integers, and straightforward conditions guarantee that these distributions are proper. BDPs also provide a mechanistic interpretation - birth and death of actual particles or organisms - that has proven useful in evolution, ecology, physics, and chemistry. Although the theoretical properties of general BDPs are well understood, traditionally statistical work on BDPs has been limited to the simple linear (Kendall) process. Aside from a few simple cases, it remains impossible to find analytic expressions for the likelihood of a discretely-observed BDP, and computational difficulties have hindered development of tools for statistical inference. But the gap between BDP theory and practical methods for estimation has narrowed in recent years. There are now robust methods for evaluating likelihoods for realizations of BDPs: finite-time transition, first passage, equilibrium probabilities, and distributions of summary statistics that arise commonly in applications. Recent work has also exploited the connection between continuously- and discretely-observed BDPs to derive EM algorithms for maximum likelihood estimation. Likelihood-based inference for previously intractable BDPs is much easier than previously thought and regression approaches analogous to Poisson regression are straightforward to derive. In this review, we outline the basic mathematical theory for BDPs and demonstrate new tools for statistical inference using data from BDPs.
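As an illustration of the definition in the abstract above (a continuous-time Markov chain on the non-negative integers with jumps only to adjacent states), the following is a minimal simulation sketch. It is not taken from the cited review; the function name `simulate_bdp`, its parameters, and the rate values are hypothetical and chosen only for demonstration.

```python
import random


def simulate_bdp(x0, birth_rate, death_rate, t_max, rng):
    """Simulate one path of a general birth-death process up to time t_max.

    birth_rate and death_rate are functions of the current state n
    (illustrative names, not from the cited review). Only the jumps
    n -> n + 1 and n -> n - 1 are possible.
    Returns a list of (time, state) pairs starting with (0.0, x0).
    """
    t, n = 0.0, x0
    path = [(t, n)]
    while True:
        lam = birth_rate(n)
        mu = death_rate(n) if n > 0 else 0.0  # no deaths from state 0
        total = lam + mu
        if total <= 0.0:  # absorbing state: no further jumps
            break
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_max:
            break
        # Birth with probability lam / total, otherwise death.
        n += 1 if rng.random() < lam / total else -1
        path.append((t, n))
    return path


# Simple linear (Kendall) process: constant per-particle rates (assumed values).
rng = random.Random(0)
path = simulate_bdp(10, lambda n: 0.5 * n, lambda n: 0.3 * n, 5.0, rng)
```

Passing other rate functions (for example, a logistic birth rate) gives a general BDP with the same adjacent-jump structure.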
Affiliation(s)
- Forrest W Crawford
- Departments of Biostatistics, Ecology & Evolutionary Biology, and School of Management, Yale University
- Lam Si Tung Ho
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Marc A Suchard
- Departments of Biomathematics, Biostatistics and Human Genetics, University of California, Los Angeles
3
Ho LST, Xu J, Crawford FW, Minin VN, Suchard MA. Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol 2018; 76:911-944. [PMID: 28741177 PMCID: PMC5783825 DOI: 10.1007/s00285-017-1160-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 08/03/2016] [Revised: 04/04/2017] [Indexed: 01/20/2023]
Abstract
Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationally expensive methods such as matrix exponentiation or Monte Carlo approximation, restricting likelihood-based inference to small systems, or indirect methods such as approximate Bayesian computation. In this paper, we introduce the birth/birth-death process, a tractable bivariate extension of the birth-death process, where rates are allowed to be nonlinear. We develop an efficient algorithm to calculate its transition probabilities using a continued fraction representation of their Laplace transforms. Next, we identify several exemplary models arising in molecular epidemiology, macro-parasite evolution, and infectious disease modeling that fall within this class, and demonstrate advantages of our proposed method over existing approaches to inference in these models. Notably, the ubiquitous stochastic susceptible-infectious-removed (SIR) model falls within this class, and we emphasize that computable transition probabilities newly enable direct inference of parameters in the SIR model. We also propose a very fast method for approximating the transition probabilities under the SIR model via a novel branching process simplification, and compare it to the continued fraction representation method with application to the 17th century plague in Eyam. Although the two methods produce similar maximum a posteriori estimates, the branching process approximation fails to capture the correlation structure in the joint posterior distribution.
Affiliation(s)
- Lam Si Tung Ho
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, USA.
- Jason Xu
- Department of Biomathematics, University of California, Los Angeles, Los Angeles, CA, USA
- Vladimir N Minin
- Departments of Statistics and Biology, University of Washington, Seattle, WA, USA
- Marc A Suchard
- Departments of Biomathematics, Biostatistics and Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
4
The Evolution of Strain Typing in the Mycobacterium tuberculosis Complex. Adv Exp Med Biol 2017; 1019:43-78. [PMID: 29116629 DOI: 10.1007/978-3-319-64371-7_3] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 12/24/2022]
Abstract
Tuberculosis (TB) is a contagious disease with a complex epidemiology. Therefore, molecular typing (genotyping) of Mycobacterium tuberculosis complex (MTBC) strains is of primary importance to effectively guide outbreak investigations, define transmission dynamics and assist global epidemiological surveillance of the disease. Large-scale genotyping is also needed to get better insights into the biological diversity and the evolution of the pathogen. Thanks to its shorter turnaround and simple numerical nomenclature system, mycobacterial interspersed repetitive unit-variable-number tandem repeat (MIRU-VNTR) typing, based on 24 standardized plus 4 hypervariable loci, optionally combined with spoligotyping, has replaced IS6110 DNA fingerprinting over the last decade as a gold standard among classical strain typing methods for many applications. With the continuous progress and decreasing costs of next-generation sequencing (NGS) technologies, typing based on whole genome sequencing (WGS) is now increasingly performed for near complete exploitation of the available genetic information. However, some important challenges remain such as the lack of standardization of WGS analysis pipelines, the need of databases for sharing WGS data at a global level, and a better understanding of the relevant genomic distances for defining clusters of recent TB transmission in different epidemiological contexts. This chapter provides an overview of the evolution of genotyping methods over the last three decades, which culminated with the development of WGS-based methods. It addresses the relative advantages and limitations of these techniques, indicates current challenges and potential directions for facilitating standardization of WGS-based typing, and provides suggestions on what method to use depending on the specific research question.
5
Xue C, Goldenfeld N. Stochastic Predator-Prey Dynamics of Transposons in the Human Genome. Phys Rev Lett 2016; 117:208101. [PMID: 27886494 DOI: 10.1103/physrevlett.117.208101] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/13/2016] [Indexed: 06/06/2023]
Abstract
Transposable elements, or transposons, are DNA sequences that can jump from site to site in the genome during the life cycle of a cell, usually encoding the very enzymes which perform their excision. However, some transposons are parasitic, relying on the enzymes produced by the regular transposons. In this case, we show that a stochastic model, which takes into account the small copy numbers of the active transposons in a cell, predicts noise-induced predator-prey oscillations with a characteristic time scale that is much longer than the cell replication time, indicating that the state of the predator-prey oscillator is stored in the genome and transmitted to successive generations. Our work demonstrates the important role of the number fluctuations in the expression of mobile genetic elements, and shows explicitly how ecological concepts can be applied to the dynamics and fluctuations of living genomes.
Affiliation(s)
- Chi Xue
- Department of Physics, and Center for the Physics of Living Cells, University of Illinois at Urbana-Champaign, Loomis Laboratory of Physics, 1110 West Green Street, Urbana, Illinois 61801-3080, USA
- Institute for Universal Biology, and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, Illinois 61801, USA
- Nigel Goldenfeld
- Department of Physics, and Center for the Physics of Living Cells, University of Illinois at Urbana-Champaign, Loomis Laboratory of Physics, 1110 West Green Street, Urbana, Illinois 61801-3080, USA
- Institute for Universal Biology, and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, Illinois 61801, USA
6
Xu J, Guttorp P, Kato-Maeda M, Minin VN. Likelihood-based inference for discretely observed birth-death-shift processes, with applications to evolution of mobile genetic elements. Biometrics 2015; 71:1009-21. [PMID: 26148963 DOI: 10.1111/biom.12352] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/01/2015] [Revised: 05/01/2015] [Accepted: 05/01/2015] [Indexed: 11/28/2022]
Abstract
Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of IS6110 transposable element, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections.
Affiliation(s)
- Jason Xu
- Department of Statistics, University of Washington, Seattle, WA, U.S.A
- Peter Guttorp
- Department of Statistics, University of Washington, Seattle, WA, U.S.A
- Midori Kato-Maeda
- School of Medicine, University of California, San Francisco, CA, U.S.A
- Vladimir N Minin
- Department of Statistics, University of Washington, Seattle, WA, U.S.A; Department of Biology, University of Washington, Seattle, WA, U.S.A
7
Abstract
Birth-death processes (BDPs) are continuous-time Markov chains that track the number of "particles" in a system over time. While widely used in population biology, genetics and ecology, statistical inference of the instantaneous particle birth and death rates remains largely limited to restrictive linear BDPs in which per-particle birth and death rates are constant. Researchers often observe the number of particles at discrete times, necessitating data augmentation procedures such as expectation-maximization (EM) to find maximum likelihood estimates. For BDPs on finite state-spaces, there are powerful matrix methods for computing the conditional expectations needed for the E-step of the EM algorithm. For BDPs on infinite state-spaces, closed-form solutions for the E-step are available for some linear models, but most previous work has resorted to time-consuming simulation. Remarkably, we show that the E-step conditional expectations can be expressed as convolutions of computable transition probabilities for any general BDP with arbitrary rates. This important observation, along with a convenient continued fraction representation of the Laplace transforms of the transition probabilities, allows for novel and efficient computation of the conditional expectations for all BDPs, eliminating the need for truncation of the state-space or costly simulation. We use this insight to derive EM algorithms that yield maximum likelihood estimation for general BDPs characterized by various rate models, including generalized linear models. We show that our Laplace convolution technique outperforms competing methods when they are available and demonstrate a technique to accelerate EM algorithm convergence. We validate our approach using synthetic data and then apply our methods to cancer cell growth and estimation of mutation parameters in microsatellite evolution.
Affiliation(s)
- Forrest W Crawford
- Department of Biostatistics, Yale University, 60 College Street, Box 208034, New Haven, CT 06510 USA
- Vladimir N Minin
- Department of Statistics, University of Washington, Padelford Hall C-315, Box 354322, Seattle, WA 98195-4322 USA
- Marc A Suchard
- Department of Biomathematics, University of California Los Angeles, 6558 Gonda Building, Los Angeles, CA 90095-1766 USA; Department of Biostatistics, University of California Los Angeles, 6558 Gonda Building, Los Angeles, CA 90095-1766 USA; Department of Human Genetics, University of California Los Angeles, 6558 Gonda Building, Los Angeles, CA 90095-1766 USA
8
Doss CR, Suchard MA, Holmes I, Kato-Maeda M, Minin VN. Fitting Birth-Death Processes to Panel Data with Applications to Bacterial DNA Fingerprinting. Ann Appl Stat 2013; 7:2315-2335. [PMID: 26702330 DOI: 10.1214/13-aoas673] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
Continuous-time linear birth-death-immigration (BDI) processes are frequently used in ecology and epidemiology to model stochastic dynamics of the population of interest. In clinical settings, multiple birth-death processes can describe disease trajectories of individual patients, allowing for estimation of the effects of individual covariates on the birth and death rates of the process. Such estimation is usually accomplished by analyzing patient data collected at unevenly spaced time points, referred to as panel data in the biostatistics literature. Fitting linear BDI processes to panel data is a nontrivial optimization problem because birth and death rates can be functions of many parameters related to the covariates of interest. We propose a novel expectation-maximization (EM) algorithm for fitting linear BDI models with covariates to panel data. We derive a closed-form expression for the joint generating function of some of the BDI process statistics and use this generating function to reduce the E-step of the EM algorithm, as well as calculation of the Fisher information, to one-dimensional integration. This analytical technique yields a computationally efficient and robust optimization algorithm that we implemented in an open-source R package. We apply our method to DNA fingerprinting of Mycobacterium tuberculosis, the causative agent of tuberculosis, to study intrapatient time evolution of IS6110 copy number, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. Our analysis reveals previously undocumented differences in IS6110 birth-death rates among three major lineages of Mycobacterium tuberculosis, which has important implications for epidemiologists that use IS6110 for DNA fingerprinting of Mycobacterium tuberculosis.
9
Abstract
In this article, I develop a methodology for inferring the transmission rate and reproductive value of an epidemic on the basis of genotype data from a sample of infected hosts. The epidemic is modeled by a birth-death process describing the transmission dynamics in combination with an infinite-allele model describing the evolution of alleles. I provide a recursive formulation for the probability of the allele frequencies in a sample of hosts and a Bayesian framework for estimating transmission rates and reproductive values on the basis of observed allele frequencies. Using the Bayesian method, I reanalyze tuberculosis data from the United States. I estimate a net transmission rate of 0.19/year [0.13, 0.24] and a reproductive value of 1.02 [1.01, 1.04]. I demonstrate that the allele frequency probability under the birth-death model does not follow the well-known Ewens' sampling formula that holds under Kingman's coalescent.
10
Reyes JF, Tanaka MM. Mutation rates of spoligotypes and variable numbers of tandem repeat loci in Mycobacterium tuberculosis. Infect Genet Evol 2010; 10:1046-51. [DOI: 10.1016/j.meegid.2010.06.016] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 05/21/2010] [Revised: 06/24/2010] [Accepted: 06/24/2010] [Indexed: 01/14/2023]
11
Mycobacterium bovis BCG-Russia clinical isolate with noncanonical spoligotyping profile. J Clin Microbiol 2010; 48:4686-7. [PMID: 20881181 DOI: 10.1128/jcm.01368-10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
12
Benedetti A, Menzies D, Behr MA, Schwartzman K, Jin Y. How close is close enough? Exploring matching criteria in the estimation of recent transmission of tuberculosis. Am J Epidemiol 2010; 172:318-26. [PMID: 20576754 DOI: 10.1093/aje/kwq124] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
If Mycobacterium tuberculosis isolates from 2 people have the same genotype, transmission may have occurred between them. Genotyping based on the insertion sequence IS6110 uses identical restriction fragment length polymorphisms ("fingerprints") to infer transmission. However, once transmission has occurred, the genotypes may mutate, resulting in divergent fingerprints. Estimation of the proportion of tuberculosis (TB) cases due to recent transmission includes 3 approaches to determine if genotypes match: exact matching (assumes no fingerprint change); band-addition, band-loss, band-shift matching (ad hoc attempt to account for fingerprint changes); and genetic distance (directly accounts for fingerprint changes). Via simulation study, the authors varied the fingerprint change rate, level of recent transmission, and background genetic heterogeneity and estimated sensitivity, specificity, and bias of the recent transmission index by matching method. For exact matching, specificity was always high, but sensitivity decreased as the change rate increased. For band-addition, band-loss, band-shift matching, specificity decreased as genetic diversity decreased, and sensitivity remained high as the change rate increased. Genetic distance offered a compromise between the 2. Results from this study suggest that interpretation of the recent transmission index and the resulting necessary public health interventions will vary according to how researchers account for spontaneous mutation when estimating transmission from genotyping data.
Affiliation(s)
- Andrea Benedetti
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada.
13
Variation among genome sequences of H37Rv strains of Mycobacterium tuberculosis from multiple laboratories. J Bacteriol 2010; 192:3645-53. [PMID: 20472797 DOI: 10.1128/jb.00166-10] [Citation(s) in RCA: 176] [Impact Index Per Article: 12.6] [Indexed: 11/20/2022]
Abstract
The publication of the complete genome sequence for Mycobacterium tuberculosis H37Rv in 1998 has had a great impact on the research community. Nonetheless, it is suspected that genetic differences have arisen in stocks of H37Rv that are maintained in different laboratories. In order to assess the consistency of the genome sequences among H37Rv strains in use and the extent to which they have diverged from the original strain sequenced, we carried out whole-genome sequencing on six strains of H37Rv from different laboratories. Polymorphisms at 73 sites were observed, which were shared among the lab strains, though 72 of these were also shared with H37Ra and are likely to be due to sequencing errors in the original H37Rv reference sequence. An updated H37Rv genome sequence should be valuable to the tuberculosis research community as well as the broader microbial research community. In addition, several polymorphisms unique to individual strains and several shared polymorphisms were identified and shown to be consistent with the known provenance of these strains. Aside from nucleotide substitutions and insertion/deletions, multiple IS6110 transposition events were observed, supporting the theory that they play a significant role in plasticity of the M. tuberculosis genome. This genome-wide catalog of genetic differences can help explain any phenotypic differences that might be found, including a frameshift mutation in the mycocerosic acid synthase gene which causes two of the strains to be deficient in biosynthesis of the surface glycolipid phthiocerol dimycocerosate (PDIM). The resequencing of these six lab strains represents a fortuitous "in vitro evolution" experiment that demonstrates how the M. tuberculosis genome continues to evolve even in a controlled environment.
14
Borrell S, Thorne N, Español M, Mortimer C, Orcau À, Coll P, Gharbia S, González-Martín J, Arnold C. Comparison of four-colour IS6110-fAFLP with the classic IS6110-RFLP on the ability to detect recent transmission in the city of Barcelona, Spain. Tuberculosis (Edinb) 2009; 89:233-7. [DOI: 10.1016/j.tube.2009.03.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/09/2008] [Revised: 03/16/2009] [Accepted: 03/17/2009] [Indexed: 11/27/2022]
15
Guernier V, Sola C, Brudey K, Guégan JF, Rastogi N. Use of cluster-graphs from spoligotyping data to study genotype similarities and a comparison of three indices to quantify recent tuberculosis transmission among culture positive cases in French Guiana during a eight year period. BMC Infect Dis 2008; 8:46. [PMID: 18410681 PMCID: PMC2375894 DOI: 10.1186/1471-2334-8-46] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 11/06/2007] [Accepted: 04/14/2008] [Indexed: 11/30/2022]
Abstract
Background: French Guiana has the highest tuberculosis (TB) burden among all French departments, with a strong increase in the TB incidence over the last few years. It is now uncertain how best to explain this incidence. The objective of this study was to compare three different methods evaluating the extent of recent TB transmission in French Guiana. Methods: We conducted a population-based molecular epidemiology study of tuberculosis in French Guiana based on culture-positive TB strains (1996 to 2003, n = 344) to define molecular relatedness between isolates, i.e. potential transmission events. Phylogenetic relationships were inferred by comparing two methods: a "cluster-graph" method based on spoligotyping results, and a minimum spanning tree method based on both spoligotyping and variable number of tandem DNA repeats (VNTR). Furthermore, three indices attempting to reflect the extent of recent TB transmission (RTIn, RTIn-1 and TMI) were compared. Results: Molecular analyses showed a total of 120 different spoligotyping patterns and 273 clinical isolates (79.4%) that were grouped in 49 clusters. The comparison of spoligotypes from French Guiana with an international spoligotype database (SpolDB4) showed that the majority of isolates belonged to major clades of M. tuberculosis (Haarlem, 22.6%; Latin American-Mediterranean, 23.3%; and T, 32.6%). Indices designed to quantify transmission of tuberculosis gave the following values: RTIn = 0.794, RTIn-1 = 0.651, and TMI = 0.146. Conclusion: Our data showed a high number of Mycobacterium tuberculosis clusters, suggesting a high level of recent TB transmission; nonetheless, an estimation of transmission rate taking into account cluster size and mutation rate of genetic markers showed a low ongoing transmission rate (14.6%). Our results indicate an endemic mode of TB transmission in French Guiana, with both resurgence of old spatially restricted genotypes, and a significant importation of new TB genotypes by migration of TB infected persons from neighbouring high-incidence countries.
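The two cluster-based indices reported in the abstract above can be checked against its counts. The sketch below assumes the common "n" and "n-1" definitions (clustered isolates over all isolates, and clustered isolates minus the number of clusters over all isolates); the abstract does not spell these out, so treat the formulas as an assumption. The TMI is not recomputed here because it depends on quantities the abstract does not give.

```python
# Counts reported in the abstract.
n_isolates = 344   # culture-positive strains, 1996 to 2003
n_clustered = 273  # isolates grouped in clusters
n_clusters = 49    # number of clusters

# Assumed definitions (not stated in the abstract):
rti_n = n_clustered / n_isolates                         # "n" method
rti_n_minus_1 = (n_clustered - n_clusters) / n_isolates  # "n-1" method

print(round(rti_n, 3), round(rti_n_minus_1, 3))  # prints 0.794 0.651
```

Under these assumed definitions both values agree with the reported RTIn = 0.794 and RTIn-1 = 0.651.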
Affiliation(s)
- Vanina Guernier
- UMR 2724 IRD-CNRS, Génétique et Evolution des Maladies Infectieuses, Equipe Dynamique des Systèmes & Maladies Infectieuses, 911 avenue Agropolis, BP 64501, 34394 Montpellier Cedex 05, France.
16
Mathema B, Kurepina NE, Bifani PJ, Kreiswirth BN. Molecular epidemiology of tuberculosis: current insights. Clin Microbiol Rev 2006; 19:658-85. [PMID: 17041139 PMCID: PMC1592690 DOI: 10.1128/cmr.00061-05] [Citation(s) in RCA: 236] [Impact Index Per Article: 13.1] [Indexed: 11/20/2022]
Abstract
Molecular epidemiologic studies of tuberculosis (TB) have focused largely on utilizing molecular techniques to address short- and long-term epidemiologic questions, such as in outbreak investigations and in assessing the global dissemination of strains, respectively. This is done primarily by examining the extent of genetic diversity of clinical strains of Mycobacterium tuberculosis. When molecular methods are used in conjunction with classical epidemiology, their utility for TB control has been realized. For instance, molecular epidemiologic studies have added much-needed accuracy and precision in describing transmission dynamics, and they have facilitated investigation of previously unresolved issues, such as estimates of recent-versus-reactive disease and the extent of exogenous reinfection. In addition, there is mounting evidence to suggest that specific strains of M. tuberculosis belonging to discrete phylogenetic clusters (lineages) may differ in virulence, pathogenesis, and epidemiologic characteristics, all of which may significantly impact TB control and vaccine development strategies. Here, we review the current methods, concepts, and applications of molecular approaches used to better understand the epidemiology of TB.
Affiliation(s)
- Barun Mathema
- Tuberculosis Center, Public Health Research Institute, Newark, NJ 07103, USA.
17
Tanaka MM, Phong R, Francis AR. An evaluation of indices for quantifying tuberculosis transmission using genotypes of pathogen isolates. BMC Infect Dis 2006; 6:92. [PMID: 16756684 PMCID: PMC1538606 DOI: 10.1186/1471-2334-6-92] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/22/2005] [Accepted: 06/07/2006] [Indexed: 11/23/2022]
Abstract
Background: Infectious diseases are often studied by characterising the population structure of the pathogen using genetic markers. An unresolved problem is the effective quantification of the extent of transmission using genetic variation data from such pathogen isolates. Methods: It is important that transmission indices reflect the growth of the infectious population as well as account for the mutation rate of the marker and the effects of sampling. That is, while responding to this growth rate, indices should be unresponsive to the sample size and the mutation rate. We use simulation methods taking into account both the mutation and sampling processes to evaluate indices designed to quantify transmission of tuberculosis. Results: Previously proposed indices generally perform inadequately according to the above criteria, with the partial exception of the recently proposed Transmission-Mutation Index. Conclusion: Any transmission index needs to take into account mutation of the marker and the effects of sampling. Simple indices are unlikely to capture the full complexity of the underlying processes.
Affiliation(s)
- Mark M Tanaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Australia
- Renault Phong
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Australia
- Andrew R Francis
- School of Computing and Mathematics, University of Western Sydney, Australia
18
Tanaka MM, Francis AR, Luciani F, Sisson SA. Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics 2006; 173:1511-20. [PMID: 16624908 PMCID: PMC1526704 DOI: 10.1534/genetics.106.055574] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022]
Abstract
Tuberculosis can be studied at the population level by genotyping strains of Mycobacterium tuberculosis isolated from patients. We use an approximate Bayesian computational method in combination with a stochastic model of tuberculosis transmission and mutation of a molecular marker to estimate the net transmission rate, the doubling time, and the reproductive value of the pathogen. This method is applied to a published data set from San Francisco of tuberculosis genotypes based on the marker IS6110. The mutation rate of this marker has previously been studied, and we use those estimates to form a prior distribution of mutation rates in the inference procedure. The posterior point estimates of the key parameters of interest for these data are as follows: net transmission rate, 0.69/year [95% credibility interval (C.I.) 0.38, 1.08]; doubling time, 1.08 years (95% C.I. 0.64, 1.82); and reproductive value 3.4 (95% C.I. 1.4, 79.7). These figures suggest a rapidly spreading epidemic, consistent with observations of the resurgence of tuberculosis in the United States in the 1980s and 1990s.
Affiliation(s)
- Mark M Tanaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia.
|
19
|
Cave MD, Yang ZH, Stefanova R, Fomukong N, Ijaz K, Bates J, Eisenach KD. Epidemiologic import of tuberculosis cases whose isolates have similar but not identical IS6110 restriction fragment length polymorphism patterns. J Clin Microbiol 2005; 43:1228-33. [PMID: 15750088 PMCID: PMC1081265 DOI: 10.1128/jcm.43.3.1228-1233.2005] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/20/2022]
Abstract
Isolates of Mycobacterium tuberculosis from patients with epidemiologic links frequently demonstrate identical IS6110 restriction fragment length polymorphism (RFLP) patterns (i.e., RFLP clustering) because they are infected with the same strain. Uncertainty arises with isolates that differ from one another by a few IS6110 hybridizing bands. During the period from 1 January 1996 to 31 December 1999, isolates from 585 tuberculosis (TB) cases were analyzed by RFLP, representing 98.2% of the 596 culture-positive TB cases reported in Arkansas during the study period. Of the 585 cases for which RFLP was available, 419 (71.6%) had an RFLP pattern with more than five copies of IS6110. Of the total 74 clusters, 48 comprised isolates with more than five copies of IS6110 and included 164 cases. Sixty-nine isolates with more than five copies of IS6110 comprising 16 clusters and 60 unique isolates were found to be similar to at least 1 other isolate (differing from it by one or two hybridizing bands). Among the 129 cases whose isolates were similar to other clustered or unique isolates, 16 cases were discovered with epidemiologic links: 14 (15.2%) were among the 92 cases with IS6110 RFLP patterns similar to those in clusters, and 2 (5.2%) were among the 37 unique cases that were similar to another unique case. The isolates from the epidemiologically linked patients shared common spoligotypes; all except one case shared common polymorphic GC-rich sequence (PGRS) patterns. Of the 129 patients whose isolates differed from another by one or two hybridizing IS6110 bands, 101 (78.3%) shared common spoligotypes and 87 (67.4%) shared common PGRS RFLP patterns.
Affiliation(s)
- M D Cave
- Department of Neurobiology and Developmental Sciences, Slot 510, University of Arkansas for Medical Sciences, and Central Arkansas Veterans' Healthcare System, 4301 West Markham Street, Little Rock, AR 72205, USA.
|
20
|
Tanaka MM. Evidence for positive selection on Mycobacterium tuberculosis within patients. BMC Evol Biol 2004; 4:31. [PMID: 15355550 PMCID: PMC518962 DOI: 10.1186/1471-2148-4-31] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 03/29/2004] [Accepted: 09/09/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND: While the pathogenesis and epidemiology of tuberculosis are well studied, relatively little is known about the evolution of the infectious agent Mycobacterium tuberculosis, especially at the within-host level. The insertion sequence IS6110 is a genetic marker that is widely used to track the transmission of tuberculosis between individuals. This and other markers may also facilitate our understanding of the disease within patients. RESULTS: This article presents three lines of evidence supporting the action of positive selection on M. tuberculosis within patients. The arguments are based on a comparison between empirical findings from molecular epidemiology, and population genetic models of evolution. Under the hypothesis of neutrality of genotypes, 1) the mutation rate of the marker IS6110 is unusually high, 2) the time it takes for substitutions to occur within patients is too short, and 3) the amount of polymorphism within patients is too low. CONCLUSIONS: Empirical observations are explained by the action of positive selection during infection, or alternatively by very low effective population sizes. I discuss the possible roles of antibiotic treatment, the host immune system and extrapulmonary dissemination in creating opportunities for positive selection.
Affiliation(s)
- Mark M Tanaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, NSW 2052, Australia.
|