1
Batisti Biffignandi G, Chindelevitch L, Corbella M, Feil EJ, Sassera D, Lees JA. Optimising machine learning prediction of minimum inhibitory concentrations in Klebsiella pneumoniae. Microb Genom 2024; 10:001222. [PMID: 38529944 PMCID: PMC10995625 DOI: 10.1099/mgen.0.001222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 03/07/2024] [Indexed: 03/27/2024]
Abstract
Minimum Inhibitory Concentrations (MICs) are the gold standard for quantitatively measuring antibiotic resistance. However, lab-based MIC determination can be time-consuming and suffers from low reproducibility, and interpretation as sensitive or resistant relies on guidelines which change over time. Genome sequencing and machine learning promise to allow in silico MIC prediction as an alternative approach which overcomes some of these difficulties, although the predicted MICs still need to be interpreted. Nevertheless, precisely how we should handle MIC data when dealing with predictive models remains unclear, since they are measured semi-quantitatively, with varying resolution, and are typically also left- and right-censored within varying ranges. We therefore investigated genome-based prediction of MICs in the pathogen Klebsiella pneumoniae using 4367 genomes with both simulated semi-quantitative traits and real MICs. As we were focused on clinical interpretation, we used interpretable rather than black-box machine learning models, namely, Elastic Net, Random Forests, and linear mixed models. Simulated traits were generated accounting for oligogenic, polygenic, and homoplastic genetic effects with different levels of heritability. We then assessed how model prediction accuracy was affected when MIC prediction was framed as regression and as classification. Our results showed that treating the MICs differently depending on the number of concentration levels of antibiotic available was the most promising learning strategy. Specifically, to optimise both prediction accuracy and inference of the correct causal variants, we recommend considering the MICs as continuous and framing the learning problem as a regression when the number of observed antibiotic concentration levels is large, whereas with a smaller number of concentration levels they should be treated as a categorical variable and the learning problem should be framed as a classification. Our findings also underline how predictive models can be improved when prior biological knowledge is taken into account, due to the varying genetic architecture of each antibiotic resistance trait. Finally, we emphasise that expanding the population database is pivotal for the future clinical implementation of these models to support routine machine learning-based diagnostics.
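The recommendation in this abstract (regression when many concentration levels are observed, classification when few) can be sketched as a small helper. This is an illustration only: the abstract gives no numeric cut-off, so `LEVEL_THRESHOLD = 5` is a hypothetical value, and the label format (`<=0.5`, `>32`) and the log2 transform are assumptions about how MICs are commonly recorded, not details taken from the paper.

```python
import math

# Hypothetical cut-off: the abstract does not state a number.
LEVEL_THRESHOLD = 5

def parse_mic(label):
    """Return (log2 MIC, censoring flag) for a label such as '<=0.5', '4' or '>32'."""
    label = label.strip()
    censor = "none"
    # Check two-character prefixes before one-character ones.
    for prefix, flag in (("<=", "left"), (">=", "right"), ("<", "left"), (">", "right")):
        if label.startswith(prefix):
            censor = flag
            label = label[len(prefix):]
            break
    return math.log2(float(label)), censor

def choose_framing(labels, threshold=LEVEL_THRESHOLD):
    """Pick 'regression' or 'classification' from the number of distinct levels."""
    # Censored and uncensored labels at the same concentration count as one level.
    levels = {parse_mic(label)[0] for label in labels}
    return "regression" if len(levels) > threshold else "classification"

if __name__ == "__main__":
    print(choose_framing(["0.25", "0.5", "1", "2", "4", "8", "16"]))
    print(choose_framing(["<=1", "2", ">2", "2"]))
```

The censoring flag is returned but not used by `choose_framing`; a real pipeline would need to decide how left- and right-censored values enter the model, which the abstract identifies as an open issue.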
Affiliation(s)
- Gherard Batisti Biffignandi
  - Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, UK
- Marta Corbella
  - Microbiology and Virology Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Edward J. Feil
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
- Davide Sassera
  - Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
  - Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- John A. Lees
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
2
Derelle R, Lees J, Phelan J, Lalvani A, Arinaminpathy N, Chindelevitch L. fastlin: an ultra-fast program for Mycobacterium tuberculosis complex lineage typing. Bioinformatics 2023; 39:btad648. [PMID: 37871178 PMCID: PMC10627351 DOI: 10.1093/bioinformatics/btad648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/25/2023]
Abstract
SUMMARY Fastlin is a bioinformatics tool designed for rapid Mycobacterium tuberculosis complex (MTBC) lineage typing. It utilizes an ultra-fast alignment-free approach to detect previously identified barcode single nucleotide polymorphisms associated with specific MTBC lineages. In a comprehensive benchmarking against existing tools, fastlin demonstrated high accuracy and significantly faster running times. AVAILABILITY AND IMPLEMENTATION fastlin is freely available at https://github.com/rderelle/fastlin and can easily be installed using Conda.
Affiliation(s)
- Romain Derelle
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, United Kingdom
- John Lees
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Jody Phelan
  - Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom
- Ajit Lalvani
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, United Kingdom
- Nimalan Arinaminpathy
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, United Kingdom
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
3
Chindelevitch L, van Dongen M, Graz H, Pedrotta A, Suresh A, Uplekar S, Jauneikaite E, Wheeler N. Ten simple rules for the sharing of bacterial genotype-phenotype data on antimicrobial resistance. PLoS Comput Biol 2023; 19:e1011129. [PMID: 37347768 PMCID: PMC10286994 DOI: 10.1371/journal.pcbi.1011129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
The increasing availability of high-throughput sequencing (frequently termed next-generation sequencing (NGS)) data has created opportunities to gain deeper insights into the mechanisms of a number of diseases and is already impacting many areas of medicine and public health. The area of infectious diseases stands somewhat apart from other human diseases insofar as the relevant genomic data comes from the microbes rather than their human hosts. A particular concern about the threat of antimicrobial resistance (AMR) has driven the collection and reporting of large-scale datasets containing information from microbial genomes together with antimicrobial susceptibility test (AST) results. Unfortunately, the lack of clear standards or guiding principles for the reporting of such data is hampering the field's advancement. We therefore present our recommendations for the publication and sharing of genotype and phenotype data on AMR, in the form of 10 simple rules. The adoption of these recommendations will enhance AMR data interoperability and help enable its large-scale analyses using computational biology tools, including mathematical modelling and machine learning. We hope that these rules can shed light on often overlooked but nonetheless very necessary aspects of AMR data sharing and enhance the field's ability to address the problems of understanding AMR mechanisms, tracking their emergence and spread in populations, and predicting microbial susceptibility to antimicrobials for diagnostic purposes.
Affiliation(s)
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, United Kingdom
- Anita Suresh
  - FIND, the global alliance for diagnostics, Geneva, Switzerland
- Swapna Uplekar
  - FIND, the global alliance for diagnostics, Geneva, Switzerland
- Elita Jauneikaite
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, United Kingdom
  - NIHR HPRU in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College, London, England, United Kingdom
- Nicole Wheeler
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, England, United Kingdom
4
Sedaghat N, Stephen T, Chindelevitch L. Speeding Up the Structural Analysis of Metabolic Network Models Using the Fredman-Khachiyan Algorithm B. J Comput Biol 2023; 30:678-694. [PMID: 37327036 DOI: 10.1089/cmb.2022.0319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
The problem of computing the Elementary Flux Modes (EFMs) and Minimal Cut Sets (MCSs) of a metabolic network is fundamental to the analysis of metabolic networks. A key insight is that they can be understood as a dual pair of monotone Boolean functions (MBFs). Using this insight, the computation reduces to the question of generating a dual pair of MBFs from an oracle. If one of the two sets (functions) is known, then the other can be computed through a process known as dualization. Fredman and Khachiyan provided two algorithms, which they called simply A and B, that can serve as an engine for oracle-based generation or dualization of MBFs. We look at efficiencies available in implementing their algorithm B, which we will refer to as FK-B. Like their algorithm A, FK-B certifies whether two given MBFs in the form of Conjunctive Normal Form and Disjunctive Normal Form are dual or not, and if they are not dual, it returns a conflicting assignment (CA), that is, an assignment that makes one of the given Boolean functions True and the other one False. The FK-B algorithm is a recursive algorithm that searches through the tree of assignments to find a CA. If it does not find any CA, the given Boolean functions are dual. In this article, we propose six techniques applicable to FK-B and hence to the dualization process. Although these techniques do not reduce the time complexity, they considerably reduce the running time in practice. We evaluate the proposed improvements by applying them to compute the MCSs from the EFMs in the 19 small- and medium-sized models from the BioModels database along with 4 models of biomass synthesis in Escherichia coli that were used in an earlier computational survey by Haus et al. (2008).
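To make the notion of a conflicting assignment concrete, here is a minimal sketch that checks a monotone CNF against a monotone DNF by exhaustive enumeration. It is not FK-B: it tries all 2^n assignments and is only usable on toy inputs. The function names and the encoding (each clause or term is a set of variable identifiers) are assumptions made for this illustration.

```python
from itertools import product

def eval_cnf(clauses, true_vars):
    """A monotone CNF is True when every clause contains a True variable."""
    return all(bool(clause & true_vars) for clause in clauses)

def eval_dnf(terms, true_vars):
    """A monotone DNF is True when all variables of some term are True."""
    return any(term <= true_vars for term in terms)

def find_conflicting_assignment(clauses, terms):
    """Return a set of True variables on which the CNF and the DNF disagree,
    or None if they agree on every assignment (exhaustive search)."""
    clauses = [frozenset(c) for c in clauses]
    terms = [frozenset(t) for t in terms]
    variables = sorted(set().union(*clauses, *terms))
    for bits in product((False, True), repeat=len(variables)):
        true_vars = frozenset(v for v, b in zip(variables, bits) if b)
        if eval_cnf(clauses, true_vars) != eval_dnf(terms, true_vars):
            return true_vars
    return None

if __name__ == "__main__":
    clauses = [{1, 2}, {2, 3}]      # toy sets that must each be "hit"
    complete = [{2}, {1, 3}]        # all minimal hitting sets of the clauses
    incomplete = [{2}]              # one minimal hitting set is missing
    print(find_conflicting_assignment(clauses, complete))
    print(find_conflicting_assignment(clauses, incomplete))
```

In the toy input the terms are the minimal hitting sets of the clauses, so the complete list yields no conflict, while the incomplete list yields an assignment that satisfies the CNF but not the DNF. In a dualization loop, such an assignment would point to a missing term.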
Affiliation(s)
- Nafiseh Sedaghat
  - School of Computing Science, Simon Fraser University, Burnaby, Canada
- Tamon Stephen
  - Department of Mathematics, Simon Fraser University, Burnaby, Canada
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College, London, United Kingdom
5
Zanetti JPP, Oliveira LP, Meidanis J, Chindelevitch L. Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance. IEEE/ACM Trans Comput Biol Bioinform 2023; PP:1-15. [PMID: 37200133 DOI: 10.1109/tcbb.2023.3277733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
An important problem in genome comparison is the genome sorting problem, that is, the problem of finding a sequence of basic operations that transforms one genome into another whose length (possibly weighted) equals the distance between them. These sequences are called optimal sorting scenarios. However, there is usually a large number of such scenarios, and a naïve algorithm is very likely to be biased towards a specific type of scenario, impairing its usefulness in real-world applications. One way to go beyond the traditional sorting algorithms is to explore all possible solutions, looking at all the optimal sorting scenarios instead of just an arbitrary one. Another related approach is to analyze all the intermediate genomes, that is, all the genomes that can occur in an optimal sorting scenario. In this paper, we show how to enumerate the optimal sorting scenarios and the intermediate genomes between any two given genomes, under the rank distance.
6
Pereira Zanetti JP, Peres Oliveira L, Chindelevitch L, Meidanis J. Generalizations of the genomic rank distance to indels. Bioinformatics 2023; 39:7039678. [PMID: 36790056 PMCID: PMC9985151 DOI: 10.1093/bioinformatics/btad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 12/25/2022] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
MOTIVATION The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications. RESULTS We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree. AVAILABILITY AND IMPLEMENTATION Code and instructions are available at https://github.com/meidanis-lab/rank-indel. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College, London, UK
- João Meidanis
  - Institute of Computing, University of Campinas, Campinas, Brazil
7
Xue ZP, Chindelevitch L, Guichard F. Supply-driven evolution: Mutation bias and trait-fitness distributions can drive macro-evolutionary dynamics. Front Ecol Evol 2023. [DOI: 10.3389/fevo.2022.1048752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Many well-documented macro-evolutionary phenomena still challenge current evolutionary theory. Examples include long-term evolutionary trends, major transitions in evolution, conservation of certain biological features such as hox genes, and the episodic creation of new taxa. Here, we present a framework that may explain these phenomena. We do so by introducing a probabilistic relationship between trait value and reproductive fitness. This integration allows mutation bias to become a robust driver of long-term evolutionary trends against environmental bias, in a way that is consistent with all current evolutionary theories. In cases where mutation bias is strong, such as when detrimental mutations are more common than beneficial mutations, a regime called “supply-driven” evolution can arise. This regime can explain the irreversible persistence of higher structural hierarchies, which happens in the major transitions in evolution. We further generalize this result in the long-term dynamics of phenotype spaces. We show how mutations that open new phenotype spaces can become frozen in time. At the same time, new possibilities may be observed as a burst in the creation of new taxa.
8
Abstract
The field of genomic epidemiology is rapidly growing as many jurisdictions begin to deploy whole-genome sequencing (WGS) in their national or regional pathogen surveillance programmes. WGS data offer a rich view of the shared ancestry of a set of taxa, typically visualized with phylogenetic trees illustrating the clusters or subtypes present in a group of taxa, their relatedness and the extent of diversification within and between them. When methicillin-resistant Staphylococcus aureus (MRSA) arose and disseminated widely, phylogenetic trees of MRSA-containing types of S. aureus had a distinctive ‘comet’ shape, with a ‘comet head’ of recently adapted drug-resistant isolates in the context of a ‘comet tail’ that was predominantly drug-sensitive. Placing an S. aureus isolate in the context of such a ‘comet’ helped public health laboratories interpret local data within the broader setting of S. aureus evolution. In this work, we ask what other tree shapes, analogous to the MRSA comet, are present in bacterial WGS datasets. We extract trees from large bacterial genomic datasets, visualize them as images and cluster the images. We find nine major groups of tree images, including the ‘comets’, star-like phylogenies, ‘barbell’ phylogenies and other shapes, and comment on the evolutionary and epidemiological stories these shapes might illustrate. This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.
Affiliation(s)
- Maryam Hayati
  - School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Leonid Chindelevitch
  - Department of Infectious Disease Epidemiology, Imperial College, Praed Street, London W2 1NY, UK
- David Aanensen
  - Big Data Institute, University of Oxford, Old Road Campus, Oxford OX3 7LF, UK
- Caroline Colijn
  - Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
9
Hemez C, Clarelli F, Palmer AC, Bleis C, Abel S, Chindelevitch L, Cohen T, Abel zur Wiesch P. Mechanisms of antibiotic action shape the fitness landscapes of resistance mutations. Comput Struct Biotechnol J 2022; 20:4688-4703. [PMID: 36147681 PMCID: PMC9463365 DOI: 10.1016/j.csbj.2022.08.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 08/12/2022] [Accepted: 08/12/2022] [Indexed: 11/15/2022]
Abstract
Antibiotic-resistant pathogens are a major public health threat. A deeper understanding of how an antibiotic's mechanism of action influences the emergence of resistance would aid in the design of new drugs and help to preserve the effectiveness of existing ones. To this end, we developed a model that links bacterial population dynamics with antibiotic-target binding kinetics. Our approach allows us to derive mechanistic insights on drug activity from population-scale experimental data and to quantify the interplay between drug mechanism and resistance selection. We find that both bacteriostatic and bactericidal agents can be equally effective at suppressing the selection of resistant mutants, but that key determinants of resistance selection are the relationships between the number of drug-inactivated targets within a cell and the rates of cellular growth and death. We also show that heterogeneous drug-target binding within a population enables resistant bacteria to evolve fitness-improving secondary mutations even when drug doses remain above the resistant strain's minimum inhibitory concentration. Our work suggests that antibiotic doses beyond this "secondary mutation selection window" could safeguard against the emergence of high-fitness resistant strains during treatment.
Affiliation(s)
- Colin Hemez
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Graduate Program in Biophysics, Harvard University, Boston, MA 02115, USA
  - Corresponding authors at: Broad Institute, 75 Ames St, Room 3035, Cambridge, MA 02142, USA (C. Hemez). Department of Pharmacy, UiT – The Arctic University of Norway, 9019 Tromsø, Norway (P. Abel zur Wiesch).
- Fabrizio Clarelli
  - Department of Pharmacy, UiT – The Arctic University of Norway, 9019 Tromsø, Norway
  - Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Adam C. Palmer
  - Department of Pharmacology, Computational Medicine Program, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Christina Bleis
  - Department of Pharmacy, UiT – The Arctic University of Norway, 9019 Tromsø, Norway
  - Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Sören Abel
  - Department of Pharmacy, UiT – The Arctic University of Norway, 9019 Tromsø, Norway
  - Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Division of Infection Control, Norwegian Institute of Public Health, Oslo 0318, Norway
- Leonid Chindelevitch
  - Department of Infectious Disease Epidemiology, Imperial College, London SW7 2AZ, UK
- Theodore Cohen
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT 06520, USA
- Pia Abel zur Wiesch
  - Department of Pharmacy, UiT – The Arctic University of Norway, 9019 Tromsø, Norway
  - Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Division of Infection Control, Norwegian Institute of Public Health, Oslo 0318, Norway
  - Corresponding authors at: Broad Institute, 75 Ames St, Room 3035, Cambridge, MA 02142, USA (C. Hemez). Department of Pharmacy, UiT – The Arctic University of Norway, 9019 Tromsø, Norway (P. Abel zur Wiesch).
10
Walker TM, Miotto P, Köser CU, Fowler PW, Knaggs J, Iqbal Z, Hunt M, Chindelevitch L, Farhat MR, Cirillo DM, Comas I, Posey J, Omar SV, Peto TEA, Suresh A, Uplekar S, Laurent S, Colman RE, Nathanson CM, Zignol M, Walker AS, Crook DW, Ismail N, Rodwell TC. The 2021 WHO catalogue of Mycobacterium tuberculosis complex mutations associated with drug resistance: A genotypic analysis. Lancet Microbe 2022; 3:e265-e273. [PMID: 35373160 PMCID: PMC7612554 DOI: 10.1016/s2666-5247(21)00301-3] [Citation(s) in RCA: 80] [Impact Index Per Article: 40.0] [Indexed: 02/02/2023]
Abstract
Background Molecular diagnostics are considered the most promising route to achieving rapid, universal drug susceptibility testing for Mycobacterium tuberculosis complex (MTBC). We aimed to generate a WHO-endorsed catalogue of mutations to serve as a global standard for interpreting molecular information for drug resistance prediction. Methods A candidate gene approach was used to identify mutations as associated with resistance, or consistent with susceptibility, for 13 WHO-endorsed anti-tuberculosis drugs. 38,215 MTBC isolates with paired whole-genome sequencing and phenotypic drug susceptibility testing data were amassed from 45 countries. For each mutation, a contingency table of binary phenotypes and presence or absence of the mutation was used to compute the positive predictive value, and Fisher's exact tests generated odds ratios and Benjamini-Hochberg corrected p-values. Mutations were graded as Associated with Resistance if present in at least 5 isolates, if the odds ratio was >1 with a statistically significant corrected p-value, and if the lower bound of the 95% confidence interval on the positive predictive value for phenotypic resistance was >25%. A series of expert rules were applied for final confidence grading of each mutation. Findings 15,667 associations were computed for 13,211 unique mutations linked to one or more drugs. 1,149/15,667 (7·3%) mutations were classified as associated with phenotypic resistance and 107/15,667 (0·7%) were deemed consistent with susceptibility. For rifampicin, isoniazid, ethambutol, fluoroquinolones, and streptomycin, the mutations' pooled sensitivity was >80%. Specificity was over 95% for all drugs except ethionamide (91·4%), moxifloxacin (91·6%) and ethambutol (93·3%). Only two resistance mutations were classified for bedaquiline, delamanid, clofazimine, and linezolid as prevalence of phenotypic resistance was low for these drugs. 
Interpretation This first WHO endorsed catalogue of molecular targets for MTBC drug susceptibility testing provides a global standard for resistance interpretation. Its existence should encourage the implementation of molecular diagnostics by National Tuberculosis Programmes. Funding UNITAID, Wellcome, MRC, BMGF.
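The statistical criteria listed in the Methods can be sketched with the Python standard library. This is a rough illustration under stated assumptions, not the catalogue's pipeline: the abstract does not say which confidence interval method or significance level was used, so the Wilson score interval and `alpha=0.05` are assumptions, all function names are hypothetical, and the expert rules applied for final grading are not modelled.

```python
from math import comb, sqrt

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table
    [[mutation & resistant, mutation & susceptible],
     [no mutation & resistant, no mutation & susceptible]] = [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = [comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
             for x in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # Sum the tables that are no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))

def wilson_lower_bound(successes, n, z=1.96):
    """Lower end of the Wilson score interval (assumed CI method)."""
    if n == 0:
        return 0.0
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def meets_resistance_criteria(a, b, c, d, corrected_p, alpha=0.05):
    """Apply the three numeric criteria from the abstract (alpha is assumed)."""
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    ppv_lower = wilson_lower_bound(a, a + b)   # PPV = a / (a + b)
    return a + b >= 5 and odds_ratio > 1 and corrected_p < alpha and ppv_lower > 0.25
```

In use, `fisher_exact_two_sided` would be run once per mutation and drug, the resulting p-values corrected together with `benjamini_hochberg`, and each corrected value passed to `meets_resistance_criteria` along with its table counts.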
Affiliation(s)
- Timothy M Walker
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Paolo Miotto
  - IRCCS San Raffaele Scientific Institute, Milano, Italy
- Claudio U Köser
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Philip W Fowler
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Jeff Knaggs
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - European Bioinformatics Institute, Hinxton, UK
- Zamin Iqbal
  - European Bioinformatics Institute, Hinxton, UK
- Martin Hunt
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - European Bioinformatics Institute, Hinxton, UK
- Iñaki Comas
  - Biomedicine Institute of Valencia IBV-CSIC, Valencia, Spain
  - CIBER Epidemiology and Public Health, Madrid, Spain
- James Posey
  - Centers for Disease Control and Prevention, Atlanta, GA, USA
- Shaheed V Omar
  - National Institute for Communicable Diseases, Johannesburg, South Africa
- Timothy EA Peto
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - National Institutes for Health Research Oxford Biomedical Research Centre, Oxford, UK
- Matteo Zignol
  - Global Tuberculosis Programme, WHO, Geneva, Switzerland
- Ann Sarah Walker
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - National Institutes for Health Research Oxford Biomedical Research Centre, Oxford, UK
- Derrick W Crook
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - National Institutes for Health Research Oxford Biomedical Research Centre, Oxford, UK
- Nazir Ismail
  - Global Tuberculosis Programme, WHO, Geneva, Switzerland
- Timothy C Rodwell
  - FIND, Geneva, Switzerland
  - Division of Pulmonary, Critical Care and Sleep Medicine, University of California, San Diego, CA, USA
11
Mansouri M, Khakabimamaghani S, Chindelevitch L, Ester M. Aristotle: stratified causal discovery for omics data. BMC Bioinformatics 2022; 23:42. [PMID: 35033007 PMCID: PMC8760642 DOI: 10.1186/s12859-021-04521-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 12/08/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND There has been a simultaneous increase in demand and accessibility across genomics, transcriptomics, proteomics and metabolomics data, known as omics data. This has encouraged widespread application of omics data in life sciences, from personalized medicine to the discovery of underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms. Existing causal analysis methods yield promising results when identifying potential general causes of an observed outcome based on omics data. However, they may fail to discover the causes specific to a particular stratum of individuals and missing from others. METHODS To fill this gap, we introduce the problem of stratified causal discovery and propose a method, Aristotle, for solving it. Aristotle addresses the two challenges intrinsic to omics data: high dimensionality and hidden stratification. It employs existing biological knowledge and a state-of-the-art patient stratification method to tackle the above challenges and applies a quasi-experimental design method to each stratum to find stratum-specific potential causes. RESULTS Evaluation based on synthetic data shows better performance for Aristotle in discovering true causes under different conditions compared to existing causal discovery methods. Experiments on a real dataset on Anthracycline Cardiotoxicity indicate that Aristotle's predictions are consistent with the existing literature. Moreover, Aristotle makes additional predictions that suggest further investigations.
Affiliation(s)
- Mehrdad Mansouri
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, Canada
- Sahand Khakabimamaghani
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, Canada
- Leonid Chindelevitch
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, Canada
- Martin Ester
  - School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, Canada
12
Abstract
The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
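For reference, the two conventional imbalance measures named in this abstract can be computed in a single traversal. The sketch below assumes a rooted binary tree written as nested 2-tuples with leaves as strings; that encoding and the function name are choices made for this illustration and are unrelated to the treeCentrality R package. It covers only the Colless and Sackin indices, not the network-science statistics the paper proposes.

```python
def tree_imbalance(tree):
    """Return (colless, sackin, n_leaves) for a rooted binary tree.

    Colless: sum over internal nodes of |leaves on the left - leaves on the right|.
    Sackin: sum over leaves of their depth (edges between root and leaf).
    """
    def visit(node, depth):
        if not isinstance(node, tuple):        # a leaf
            return 0, depth, 1
        left, right = node
        colless_l, sackin_l, n_l = visit(left, depth + 1)
        colless_r, sackin_r, n_r = visit(right, depth + 1)
        return (colless_l + colless_r + abs(n_l - n_r),
                sackin_l + sackin_r,
                n_l + n_r)
    return visit(tree, 0)

if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")     # maximally unbalanced, 4 leaves
    balanced = (("a", "b"), ("c", "d"))        # fully balanced, 4 leaves
    print(tree_imbalance(caterpillar))
    print(tree_imbalance(balanced))
```

The recursion is adequate for small examples; for trees with many thousands of tips an iterative traversal would avoid Python's recursion limit.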
Affiliation(s)
- Leonid Chindelevitch
- MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
- * E-mail:
| | - Maryam Hayati
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Art F. Y. Poon
- Department of Pathology & Laboratory Medicine, University of Western Ontario, London, ON, Canada
| | - Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
| |
|
13
|
Sharma M, Mindermann S, Rogers-Smith C, Leech G, Snodin B, Ahuja J, Sandbrink JB, Monrad JT, Altman G, Dhaliwal G, Finnveden L, Norman AJ, Oehm SB, Sandkühler JF, Aitchison L, Gavenčiak T, Mellan T, Kulveit J, Chindelevitch L, Flaxman S, Gal Y, Mishra S, Bhatt S, Brauner JM. Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe. Nat Commun 2021; 12:5820. [PMID: 34611158 PMCID: PMC8492703 DOI: 10.1038/s41467-021-26013-4] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 06/24/2021] [Accepted: 08/23/2021] [Indexed: 12/24/2022] Open
Abstract
European governments use non-pharmaceutical interventions (NPIs) to control resurging waves of COVID-19. However, they only have outdated estimates for how effective individual NPIs were in the first wave. We estimate the effectiveness of 17 NPIs in Europe's second wave from subnational case and death data by introducing a flexible hierarchical Bayesian transmission model and collecting the largest dataset of NPI implementation dates across Europe. Business closures, educational institution closures, and gathering bans reduced transmission, but reduced it less than they did in the first wave. This difference is likely due to organisational safety measures and individual protective behaviours (such as distancing), which made various areas of public life safer and thereby reduced the effect of closing them. Specifically, we find smaller effects for closing educational institutions, suggesting that stringent safety measures made schools safer compared to the first wave. Second-wave estimates outperform previous estimates at predicting transmission in Europe's third wave.
Affiliation(s)
- Mrinank Sharma
- Department of Statistics, University of Oxford, Oxford, UK.
- Department of Engineering Science, University of Oxford, Oxford, UK.
- Future of Humanity Institute, University of Oxford, Oxford, UK.
| | - Sören Mindermann
- Oxford Applied and Theoretical Machine Learning (OATML) Group, Department of Computer Science, University of Oxford, Oxford, UK.
| | - Charlie Rogers-Smith
- OATML Group (work done while at OATML as an external collaborator), Department of Computer Science, University of Oxford, Oxford, UK
| | - Gavin Leech
- Department of Computer Science, University of Bristol, Bristol, UK
| | - Benedict Snodin
- Future of Humanity Institute, University of Oxford, Oxford, UK
| | - Janvi Ahuja
- Future of Humanity Institute, University of Oxford, Oxford, UK
- Medical Sciences Division, University of Oxford, Oxford, UK
| | - Jonas B Sandbrink
- Future of Humanity Institute, University of Oxford, Oxford, UK
- Medical Sciences Division, University of Oxford, Oxford, UK
| | - Joshua Teperowski Monrad
- Future of Humanity Institute, University of Oxford, Oxford, UK
- Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, UK
- Department of Health Policy, London School of Economics and Political Science, London, UK
| | - George Altman
- Manchester University NHS Foundation Trust, Manchester, UK
| | - Gurpreet Dhaliwal
- The Francis Crick Institute, London, UK
- School of Life Sciences, University of Warwick, Coventry, UK
| | - Lukas Finnveden
- Future of Humanity Institute, University of Oxford, Oxford, UK
| | - Alexander John Norman
- Mathematical, Physical and Life Sciences (MPLS) Doctoral Training Centre, University of Oxford, Oxford, UK
| | - Sebastian B Oehm
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- University of Cambridge, Cambridge, UK
| | | | | | | | - Thomas Mellan
- Medical Research Council (MRC) Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
| | - Jan Kulveit
- Future of Humanity Institute, University of Oxford, Oxford, UK
| | - Leonid Chindelevitch
- Medical Research Council (MRC) Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
| | - Seth Flaxman
- Department of Mathematics, Imperial College London, London, UK
| | - Yarin Gal
- Oxford Applied and Theoretical Machine Learning (OATML) Group, Department of Computer Science, University of Oxford, Oxford, UK
| | - Swapnil Mishra
- Medical Research Council (MRC) Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK.
- Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), School of Public Health, Imperial College London, London, UK.
| | - Samir Bhatt
- Medical Research Council (MRC) Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK.
- Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), School of Public Health, Imperial College London, London, UK.
- Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
| | - Jan Markus Brauner
- Future of Humanity Institute, University of Oxford, Oxford, UK.
- Oxford Applied and Theoretical Machine Learning (OATML) Group, Department of Computer Science, University of Oxford, Oxford, UK.
| |
|
14
|
Zabeti H, Dexter N, Safari AH, Sedaghat N, Libbrecht M, Chindelevitch L. INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis. Algorithms Mol Biol 2021; 16:17. [PMID: 34376217 PMCID: PMC8353837 DOI: 10.1186/s13015-021-00198-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/11/2021] [Accepted: 07/23/2021] [Indexed: 12/13/2022] Open
Abstract
Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.
|
15
|
Gabbassov E, Moreno-Molina M, Comas I, Libbrecht M, Chindelevitch L. SplitStrains, a tool to identify and separate mixed Mycobacterium tuberculosis infections from WGS data. Microb Genom 2021; 7. [PMID: 34165419 PMCID: PMC8461467 DOI: 10.1099/mgen.0.000607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022] Open
Abstract
The occurrence of multiple strains of a bacterial pathogen such as M. tuberculosis or C. difficile within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting it, and especially determining the proportion and identities of the underlying strains, from WGS (whole-genome sequencing) data, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges. Grounded in a rigorous statistical model, SplitStrains not only demonstrates superior performance in proportion estimation to other existing methods on both simulated as well as real M. tuberculosis data, but also successfully determines the identity of the underlying strains. We conclude that SplitStrains is a powerful addition to the existing toolkit of analytical methods for data coming from bacterial pathogens and holds the promise of enabling previously inaccessible conclusions to be drawn in the realm of public health microbiology.
Affiliation(s)
- Einar Gabbassov
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- *Correspondence: Einar Gabbassov,
| | | | - Iñaki Comas
- Instituto de Biomedicina de Valencia, Valencia, Spain
| | - Maxwell Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Leonid Chindelevitch
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College, London, UK
- *Correspondence: Leonid Chindelevitch,
| |
|
16
|
Brauner JM, Mindermann S, Sharma M, Johnston D, Salvatier J, Gavenčiak T, Stephenson AB, Leech G, Altman G, Mikulik V, Norman AJ, Monrad JT, Besiroglu T, Ge H, Hartwick MA, Teh YW, Chindelevitch L, Gal Y, Kulveit J. Inferring the effectiveness of government interventions against COVID-19. Science 2021; 371:eabd9338. [PMID: 33323424 PMCID: PMC7877495 DOI: 10.1126/science.abd9338] [Citation(s) in RCA: 505] [Impact Index Per Article: 168.3] [Received: 07/21/2020] [Revised: 09/25/2020] [Accepted: 12/08/2020] [Indexed: 12/14/2022]
Abstract
Governments are attempting to control the COVID-19 pandemic with nonpharmaceutical interventions (NPIs). However, the effectiveness of different NPIs at reducing transmission is poorly understood. We gathered chronological data on the implementation of NPIs for several European and non-European countries between January and the end of May 2020. We estimated the effectiveness of these NPIs, which range from limiting gathering sizes and closing businesses or educational institutions to stay-at-home orders. To do so, we used a Bayesian hierarchical model that links NPI implementation dates to national case and death counts and supported the results with extensive empirical validation. Closing all educational institutions, limiting gatherings to 10 people or less, and closing face-to-face businesses each reduced transmission considerably. The additional effect of stay-at-home orders was comparatively small.
Affiliation(s)
- Jan M Brauner
- Oxford Applied and Theoretical Machine Learning (OATML) Group, Department of Computer Science, University of Oxford, Oxford, UK.
- Future of Humanity Institute, University of Oxford, Oxford, UK
| | - Sören Mindermann
- Oxford Applied and Theoretical Machine Learning (OATML) Group, Department of Computer Science, University of Oxford, Oxford, UK.
| | - Mrinank Sharma
- Future of Humanity Institute, University of Oxford, Oxford, UK.
- Department of Statistics, University of Oxford, Oxford, UK
- Department of Engineering Science, University of Oxford, Oxford, UK
| | - David Johnston
- College of Engineering and Computer Science, Australian National University, Canberra, Australia
- Quantified Uncertainty Research Institute, San Francisco, CA, USA
| | - John Salvatier
- Quantified Uncertainty Research Institute, San Francisco, CA, USA
| | | | - Anna B Stephenson
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Gavin Leech
- School of Computer Science, University of Bristol, Bristol, UK
| | - George Altman
- School of Medical Sciences, University of Manchester, Manchester, UK
| | | | - Alexander John Norman
- Mathematical, Physical and Life Sciences (MPLS) Doctoral Training Centre, University of Oxford, Oxford, UK
| | - Joshua Teperowski Monrad
- Future of Humanity Institute, University of Oxford, Oxford, UK
- Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, UK
- Department of Health Policy, London School of Economics and Political Science, London, UK
| | - Tamay Besiroglu
- Faculty of Economics, University of Cambridge, Cambridge, UK
| | - Hong Ge
- Engineering Department, University of Cambridge, Cambridge, UK
| | - Meghan A Hartwick
- Tufts Initiative for the Forecasting and Modeling of Infectious Diseases, Tufts University, Boston, MA, USA
| | - Yee Whye Teh
- Department of Statistics, University of Oxford, Oxford, UK
| | - Leonid Chindelevitch
- Medical Research Council (MRC) Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
- Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), School of Public Health, Imperial College London, London, UK
| | - Yarin Gal
- Oxford Applied and Theoretical Machine Learning (OATML) Group, Department of Computer Science, University of Oxford, Oxford, UK
| | - Jan Kulveit
- Future of Humanity Institute, University of Oxford, Oxford, UK
| |
|
17
|
Brauner JM, Mindermann S, Sharma M, Johnston D, Salvatier J, Gavenčiak T, Stephenson AB, Leech G, Altman G, Mikulik V, Norman AJ, Monrad JT, Besiroglu T, Ge H, Hartwick MA, Teh YW, Chindelevitch L, Gal Y, Kulveit J. Inferring the effectiveness of government interventions against COVID-19. Science 2021; 371:science.abd9338. [PMID: 33323424 DOI: 10.1101/2020.05.28.20116129] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/21/2020] [Revised: 09/25/2020] [Accepted: 12/08/2020] [Indexed: 05/21/2023]
|
18
|
Miraskarshahi R, Zabeti H, Stephen T, Chindelevitch L. MCS2: minimal coordinated supports for fast enumeration of minimal cut sets in metabolic networks. Bioinformatics 2020; 35:i615-i623. [PMID: 31510702 PMCID: PMC6612898 DOI: 10.1093/bioinformatics/btz393] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022] Open
Abstract
Motivation: Constraint-based modeling of metabolic networks helps researchers gain insight into the metabolic processes of many organisms, both prokaryotic and eukaryotic. Minimal cut sets (MCSs) are minimal sets of reactions whose inhibition blocks a target reaction in a metabolic network. Most approaches for finding the MCSs in constraint-based models require, either as an intermediate step or as a byproduct of the calculation, the computation of the set of elementary flux modes (EFMs), a convex basis for the valid flux vectors in the network. Recently, Ballerstein et al. proposed a method for computing the MCSs of a network without first computing its EFMs, by creating a dual network whose EFMs are a superset of the MCSs of the original network. However, their dual network is always larger than the original network and depends on the target reaction. Here we propose the construction of a different dual network, which is typically smaller than the original network and is independent of the target reaction, for the same purpose. We prove the correctness of our approach, minimal coordinated support (MCS2), and describe how it can be modified to compute the few smallest MCSs for a given target reaction. Results: We compare MCS2 to the method of Ballerstein et al. and two other existing methods. We show that MCS2 succeeds in calculating the full set of MCSs in many models where other approaches cannot finish within a reasonable amount of time. Thus, in addition to its theoretical novelty, our approach provides a practical advantage over existing methods. Availability and implementation: MCS2 is freely available at https://github.com/RezaMash/MCS under the GNU 3.0 license. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Reza Miraskarshahi
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Hooman Zabeti
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Tamon Stephen
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
| | | |
|
19
|
Hayati M, Chindelevitch L. Computing the distribution of the Robinson-Foulds distance. Comput Biol Chem 2020; 87:107284. [PMID: 32599459 DOI: 10.1016/j.compbiolchem.2020.107284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2020] [Accepted: 05/09/2020] [Indexed: 11/22/2022]
Abstract
With the exponential growth of genome databases, the importance of phylogenetics has increased dramatically over the past years. Studying phylogenetic trees enables us not only to understand how genes, genomes, and species evolve, but also helps us predict how they might change in future. One of the crucial aspects of phylogenetics is the comparison of two or more phylogenetic trees. There are different metrics for computing the dissimilarity between a pair of trees. The Robinson-Foulds (RF) distance is one of the widely used metrics on the space of labeled trees. The distribution of the RF distance from a given tree has been studied before, but the fastest known algorithm for computing this distribution is a slow, albeit polynomial-time, $O(l^5)$ algorithm. In this paper, we modify the dynamic programming algorithm for computing the distribution of this distance for a given tree by leveraging the number-theoretic transform (NTT), and improve the running time from $O(l^5)$ to $O(l^3 \log l)$, where $l$ is the number of tips of the tree. In addition to its practical usefulness, our method represents a theoretical novelty, as it is, to our knowledge, one of the rare applications of the number-theoretic transform for solving a computational biology problem.
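For readers unfamiliar with the metric itself, the sketch below computes a Robinson-Foulds-style distance between two trees on the same tips. It is only an illustration of the distance, not of the paper's algorithm for the distribution of the distance: it assumes rooted trees encoded as nested tuples and counts the clades (sets of tips below an internal node) found in one tree but not the other. The encoding and function names are hypothetical.

```python
def clades(tree):
    """Return (set of tips, set of clades) for a nested-tuple rooted tree.

    A tip is any non-tuple label; each internal node contributes the
    frozenset of tip labels below it as one clade.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    tips = frozenset()
    found = set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)
    return tips, found


def rf_distance(tree_a, tree_b):
    """Size of the symmetric difference between the two clade sets."""
    tips_a, clades_a = clades(tree_a)
    tips_b, clades_b = clades(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must be on the same set of tips")
    return len(clades_a ^ clades_b)


# Hypothetical four-tip examples.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ((("A", "B"), "C"), "D")

print(rf_distance(t1, t1))  # 0
print(rf_distance(t1, t2))  # 4
print(rf_distance(t1, t3))  # 2
```

The root clade (all tips) is shared by any two trees on the same tips, so it never contributes to the distance.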
Affiliation(s)
- Maryam Hayati
- Simon Fraser University, Department of Computing Science, 8888 University Avenue, Burnaby, BC V5A 1S6, Canada
| | - Leonid Chindelevitch
- Simon Fraser University, Department of Computing Science, 8888 University Avenue, Burnaby, BC V5A 1S6, Canada.
| |
|
20
|
Abstract
BACKGROUND: Bacterial pathogens exhibit an impressive amount of genomic diversity. This diversity can be informative of evolutionary adaptations, host-pathogen interactions, and disease transmission patterns. However, capturing this diversity directly from biological samples is challenging. RESULTS: We introduce a framework for understanding the within-host diversity of a pathogen using multi-locus sequence types (MLST) from whole-genome sequencing (WGS) data. Our approach consists of two stages. First we process each sample individually by assigning it, for each locus in the MLST scheme, a set of alleles and a proportion for each allele. Next, we associate to each sample a set of strain types using the alleles and the strain proportions obtained in the first step. We achieve this by using the smallest possible number of previously unobserved strains across all samples, while using those unobserved strains which are as close to the observed ones as possible, at the same time respecting the allele proportions as closely as possible. We solve both problems using mixed integer linear programming (MILP). Our method performs accurately on simulated data and generates results on a real data set of Borrelia burgdorferi genomes suggesting a high level of diversity for this pathogen. CONCLUSIONS: Our approach can apply to any bacterial pathogen with an MLST scheme, even though we developed it with Borrelia burgdorferi, the etiological agent of Lyme disease, in mind. Our work paves the way for robust strain typing in the presence of within-host heterogeneity, overcoming an essential challenge currently not addressed by any existing methodology for pathogen genomics.
Affiliation(s)
- Guo Liang Gan
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby (BC), V5A 1S6, Canada
| | - Elijah Willie
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby (BC), V5A 1S6, Canada
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby (BC), V5A 1S6, Canada; LaBRI, Université de Bordeaux, 351 Cours de la Libération, Talence, 33405, France
| | - Leonid Chindelevitch
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby (BC), V5A 1S6, Canada.
| |
|
21
|
Chindelevitch L, La S, Meidanis J. A cubic algorithm for the generalized rank median of three genomes. Algorithms Mol Biol 2019; 14:16. [PMID: 31832081 PMCID: PMC6867026 DOI: 10.1186/s13015-019-0150-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2019] [Accepted: 06/13/2019] [Indexed: 11/13/2022] Open
Abstract
Background: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time $O(n^\omega)$, where $\omega \le 3$, with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix. Results: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes. Conclusions: We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction.
|
22
|
Abstract
Phylogenetic trees are frequently used in biology to study the relationships between a number of species or organisms. The shape of a phylogenetic tree contains useful information about patterns of speciation and extinction, so powerful tools are needed to investigate the shape of a phylogenetic tree. Tree shape statistics are a common approach to quantifying the shape of a phylogenetic tree by encoding it with a single number. In this article, we propose a new resolution function to evaluate the power of different tree shape statistics to distinguish between dissimilar trees. We show that the new resolution function requires less time and space in comparison with the previously proposed resolution function for tree shape statistics. We also introduce a new class of tree shape statistics, which are linear combinations of two existing statistics that are optimal with respect to a resolution function, and show evidence that the statistics in this class converge to a limiting linear combination as the size of the tree increases. Our implementation is freely available at https://github.com/WGS-TB/TreeShapeStats.
Affiliation(s)
- Maryam Hayati
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Bita Shadgar
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | | |
|
23
|
Thain N, Le C, Crossa A, Ahuja SD, Meissner JS, Mathema B, Kreiswirth B, Kurepina N, Cohen T, Chindelevitch L. Towards better prediction of Mycobacterium tuberculosis lineages from MIRU-VNTR data. Infect Genet Evol 2019; 72:59-66. [PMID: 29960078 PMCID: PMC6708508 DOI: 10.1016/j.meegid.2018.06.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/08/2017] [Revised: 06/20/2018] [Accepted: 06/22/2018] [Indexed: 11/30/2022]
Abstract
The determination of lineages from strain-based molecular genotyping information is an important problem in tuberculosis. Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing is a commonly used molecular genotyping approach that uses counts of the number of times pre-specified loci repeat in a strain. There are three main approaches for determining lineage based on MIRU-VNTR data - one based on a direct comparison to the strains in a curated database, and two others, on machine learning algorithms trained on a large collection of labeled data. All existing methods have limitations. The direct approach imposes an arbitrary threshold on how much a database strain can differ from a given one to be informative. On the other hand, the machine learning-based approaches require a substantial amount of labeled data. Notably, all three methods exhibit suboptimal classification accuracy without additional data. We explore several computational approaches to address these limitations. First, we show that eliminating the arbitrary threshold improves the performance of the direct approach. Second, we introduce RuleTB, an alternative direct method that proposes a concise set of rules for determining lineages. Lastly, we propose StackTB, a machine learning approach that requires only a fraction of the training data to outperform the accuracy of both existing machine learning methods. Our approaches demonstrate superior performance on a training dataset collected in New York City over 10 years, and the improvement in performance translates to a held-out testing set. We conclude that our methods provide opportunities for improving the determination of pathogenic lineages based on MIRU-VNTR data.
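As a rough illustration of the "direct" database-comparison idea described in this abstract, the sketch below assigns a query strain the lineage of its closest database strain, with an optional cutoff on how different that strain may be. Everything specific here is an assumption made for the example: the use of Hamming distance over per-locus repeat counts, the function names, the repeat counts, and the lineage labels are invented, and this is not the RuleTB or StackTB method.

```python
def hamming(profile_a, profile_b):
    """Number of loci at which two repeat-count profiles differ."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def predict_lineage(query, database, max_distance=None):
    """Return the lineage of the closest database strain.

    database is a list of (profile, lineage) pairs. If max_distance is
    given and the closest strain differs at more loci than that, no call
    is made and None is returned; max_distance=None removes the cutoff.
    """
    best_profile, best_lineage = min(
        database, key=lambda entry: hamming(query, entry[0])
    )
    if max_distance is not None and hamming(query, best_profile) > max_distance:
        return None
    return best_lineage


# Made-up four-locus profiles with placeholder lineage labels.
database = [
    ((2, 3, 4, 2), "lineage_X"),
    ((5, 1, 4, 3), "lineage_Y"),
]
query = (2, 3, 4, 3)

print(predict_lineage(query, database))                  # lineage_X
print(predict_lineage(query, database, max_distance=0))  # None
```

With the cutoff removed, every query receives a call from its nearest neighbour, whereas a strict cutoff can leave a query unclassified.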
Affiliation(s)
- Nithum Thain
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Christopher Le
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - Aldo Crossa
- New York City Department of Health and Mental Hygiene, Queens, NY, USA
| | - Shama Desai Ahuja
- New York City Department of Health and Mental Hygiene, Queens, NY, USA
| | | | - Barun Mathema
- Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA
| | - Barry Kreiswirth
- Public Health Research Institute TB Center, Rutgers University, Newark, NJ, USA
| | - Natalia Kurepina
- Public Health Research Institute TB Center, Rutgers University, Newark, NJ, USA
| | - Ted Cohen
- Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
|
24
|
Abstract
MOTIVATION: Despite the remarkable advances in sequencing and computational techniques, noise in the data and the complexity of the underlying biological mechanisms make it difficult to deconvolve the phylogenetic relationships between cancer mutations. Moreover, the majority of existing datasets consist of bulk sequencing data from a single tumor sample per individual. Accurate inference of the phylogenetic order of mutations is particularly challenging in these cases, and existing methods face several theoretical limitations. To overcome these limitations, new methods are required for integrating and harnessing the full potential of the existing data. RESULTS: We introduce a method called Hintra for intra-tumor heterogeneity detection. Hintra integrates sequencing data for a cohort of tumors and infers tumor phylogeny for each individual based on the evolutionary information shared between different tumors. Through an iterative process, Hintra learns the repeating evolutionary patterns and uses this information for resolving the phylogenetic ambiguities of individual tumors. The results of synthetic experiments show improved performance compared to two state-of-the-art methods. The experimental results with a recent breast cancer dataset are consistent with existing knowledge and provide potentially interesting findings. AVAILABILITY AND IMPLEMENTATION: The source code for Hintra is available at https://github.com/sahandk/HINTRA.
Affiliation(s)
| | - Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, BC
| | - Jeffrey Tang
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC
| | - Dujian Ding
- School of Computing Science, Simon Fraser University, Burnaby, BC
| | - Ryan Morin
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC
| | | | - Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC
- Vancouver Prostate Centre, Vancouver, BC, Canada
|
25
|
Ezewudo M, Borens A, Chiner-Oms Á, Miotto P, Chindelevitch L, Starks AM, Hanna D, Liwski R, Zignol M, Gilpin C, Niemann S, Kohl TA, Warren RM, Crook D, Gagneux S, Hoffner S, Rodrigues C, Comas I, Engelthaler DM, Alland D, Rigouts L, Lange C, Dheda K, Hasan R, McNerney R, Cirillo DM, Schito M, Rodwell TC, Posey J. Integrating standardized whole genome sequence analysis with a global Mycobacterium tuberculosis antibiotic resistance knowledgebase. Sci Rep 2018; 8:15382. [PMID: 30337678 PMCID: PMC6194142 DOI: 10.1038/s41598-018-33731-1] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 06/06/2018] [Accepted: 09/11/2018] [Indexed: 12/30/2022] Open
Abstract
Drug-resistant tuberculosis poses a persistent public health threat. The ReSeqTB platform is a collaborative, curated knowledgebase, designed to standardize and aggregate global Mycobacterium tuberculosis complex (MTBC) variant data from whole genome sequencing (WGS) with phenotypic drug susceptibility testing (DST) and clinical data. We developed a unified variant pipeline (UVP) ( https://github.com/CPTR-ReSeqTB/UVP ) to identify variants and assign lineage from MTBC sequence data. Stringent thresholds and quality control measures were incorporated in this open source tool. The pipeline was validated using a well-characterized dataset of 90 diverse MTBC isolates with conventional DST and DNA Sanger sequencing data. The UVP exhibited 98.9% agreement with the variants identified using Sanger sequencing and was 100% concordant with conventional methods of assigning lineage. We analyzed 4636 publicly available MTBC isolates in the ReSeqTB platform representing all seven major MTBC lineages. The detected variants predict drug resistance with above 94% accuracy, based on the accompanying DST results in the platform. The aggregation of variants over time in the platform will establish confidence-graded mutations statistically associated with phenotypic drug resistance. These tools serve as critical reference standards for future molecular diagnostic assay developers, researchers, public health agencies and clinicians working towards the control of drug-resistant tuberculosis.
Affiliation(s)
- Matthew Ezewudo
- Critical Path Institute, 1730 E River Rd., Tucson, AZ, 85718, USA
| | - Amanda Borens
- Critical Path Institute, 1730 E River Rd., Tucson, AZ, 85718, USA
| | - Álvaro Chiner-Oms
- Joint unit Infection and Public Health FISABIO-CSISP/University of Valencia, Institute of integrative Systems Biology, Valencia, Spain
| | - Paolo Miotto
- Emerging Bacterial Pathogens Unit, IRCCS San Raffaele Scientific Institute, via Olgettina 58, 20132, Milano, Italy
| | - Leonid Chindelevitch
- School of Computing Science, Simon Fraser University, 8888 University Ave, Burnaby, BC, V5A 1S6, Canada
| | - Angela M Starks
- Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, 1600 Clifton Road MS F08, Atlanta, GA, 30329, USA
| | - Debra Hanna
- Critical Path Institute, 1730 E River Rd., Tucson, AZ, 85718, USA
| | - Richard Liwski
- Critical Path Institute, 1730 E River Rd., Tucson, AZ, 85718, USA
| | - Matteo Zignol
- Global Tuberculosis Program, World Health Organization, Geneva, Switzerland
| | - Christopher Gilpin
- Global Tuberculosis Program, World Health Organization, Geneva, Switzerland
| | - Stefan Niemann
- German Center for Infection Research, Partner Site Borstel, Borstel, Germany
| | - Thomas Andreas Kohl
- Molecular and Experimental Mycobacteriology, Priority area Infections, Research Center Borstel, Borstel, Germany
| | - Robin M Warren
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
| | - Derrick Crook
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DU, United Kingdom
| | | | - Sven Hoffner
- Department of Public Health Sciences, Karolinska Institute, Stockholm, Sweden
| | | | - Iñaki Comas
- Tuberculosis Genomics Unit, Biomedicine Institute of Valencia (IBV-CSIC), Street Jaime Roig 11. P.O., 4010, Valencia, Spain
| | - David M Engelthaler
- Translational Genomics Research Institute, 3051 W. Shamrell Blvd. Ste 106, Flagstaff, AZ, 86005, USA
| | - David Alland
- Center for Emerging Pathogens, Rutgers-New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, USA
| | - Leen Rigouts
- Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
| | - Christoph Lange
- Division of Clinical Infectious Diseases and German Center for Infection Research Tuberculosis Unit, Research Center Borstel, Borstel, Germany
| | - Keertan Dheda
- Lung Infection and Immunity Unit, Department of Medicine, Division of Pulmonology and UCT Lung Institute, University of Cape Town, Old Main Building, Groote Schuur Hospital, Observatory, Cape Town, South Africa
| | - Rumina Hasan
- Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, Karachi, Pakistan
| | - Ruth McNerney
- Department of Medicine, Division of Pulmonology, University of Cape Town, Groote Schuur Hospital, Cape Town, South Africa
| | - Daniela M Cirillo
- Emerging Bacterial Pathogens Unit, IRCCS San Raffaele Scientific Institute, via Olgettina 58, 20132, Milano, Italy
| | - Marco Schito
- Critical Path Institute, 1730 E River Rd., Tucson, AZ, 85718, USA
| | - Timothy C Rodwell
- Department of Medicine, University of California, San Diego, CA, USA
- The Foundation for Innovative New Diagnostics, Geneva, Switzerland
| | - James Posey
- Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, 1600 Clifton Road MS F08, Atlanta, GA, 30329, USA
|
26
|
Nathavitharana RR, Shi CX, Chindelevitch L, Calderon R, Zhang Z, Galea JT, Contreras C, Yataco R, Lecca L, Becerra MC, Murray MB, Cohen T. Polyclonal Pulmonary Tuberculosis Infections and Risk for Multidrug Resistance, Lima, Peru. Emerg Infect Dis 2018; 23:1887-1890. [PMID: 29048297 PMCID: PMC5652442 DOI: 10.3201/eid2311.170077] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022] Open
Abstract
Because within-host Mycobacterium tuberculosis diversity complicates diagnosis and treatment of tuberculosis (TB), we measured diversity prevalence and associated factors among 3,098 pulmonary TB patients in Lima, Peru. The 161 patients with polyclonal infection were more likely than the 115 with clonal or the 2,822 with simple infections to have multidrug-resistant TB.
|
27
|
Abstract
Background: Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper. In their paper, the authors provide an $O(n^{3})$ algorithm for determining three candidate medians, prove the tight approximation ratio $\frac{4}{3}$, and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data.
Results: In this paper, we extend their results and provide the following:
- Three invariants characterizing the problem of finding the median of 3 matrices
- A sufficient condition for uniqueness of medians that can be checked in $O(n)$
- A faster, $O(n^{2})$ algorithm for determining the median under this condition
- A new heuristic algorithm for this problem based on compressed sensing
- An $O(n^{4})$ algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases.
Conclusions: Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2131-4) contains supplementary material, which is available to authorized users.
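To make the rank distance concrete, here is a small sketch under stated assumptions: the distance between two matrices is taken to be the rank of their difference, and the median score of a candidate is the sum of its distances to the three inputs. The helper names and the 3x3 permutation examples are hypothetical, and the code only evaluates candidates. It does not implement any of the algorithms listed in the abstract.

```python
import numpy as np

def perm_matrix(perm):
    """Permutation matrix with a 1 in row i, column perm[i]."""
    n = len(perm)
    m = np.zeros((n, n), dtype=int)
    m[np.arange(n), perm] = 1
    return m

def rank_distance(a, b):
    """Rank distance, assumed here to be d(A, B) = rank(A - B)."""
    return int(np.linalg.matrix_rank(a - b))

def median_score(candidate, inputs):
    """Sum of rank distances from a candidate to each input matrix."""
    return sum(rank_distance(candidate, x) for x in inputs)

identity = perm_matrix([0, 1, 2])
swap = perm_matrix([1, 0, 2])    # transposition of the first two elements
cycle = perm_matrix([1, 2, 0])   # a 3-cycle
inputs = [identity, swap, cycle]

pairwise = (rank_distance(identity, swap)
            + rank_distance(swap, cycle)
            + rank_distance(identity, cycle))
print("lower bound on any median score:", pairwise / 2)
for name, m in [("identity", identity), ("swap", swap), ("cycle", cycle)]:
    print(name, median_score(m, inputs))
```

Because the rank distance satisfies the triangle inequality, half the sum of the pairwise distances is a lower bound on the score of any candidate, so an input that attains it (here `swap`) is a median of this toy instance.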
|
28
|
Santillana M, Tuite A, Nasserie T, Fine P, Champredon D, Chindelevitch L, Dushoff J, Fisman D. Relatedness of the incidence decay with exponential adjustment (IDEA) model, "Farr's law" and SIR compartmental difference equation models. Infect Dis Model 2018; 3:1-12. [PMID: 30839910 PMCID: PMC6326218 DOI: 10.1016/j.idm.2018.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/27/2018] [Accepted: 03/02/2018] [Indexed: 01/14/2023] Open
Abstract
Mathematical models are often regarded as recent innovations in the description and analysis of infectious disease outbreaks and epidemics, but simple mathematical expressions have been in use for projection of epidemic trajectories for more than a century. We recently introduced a single equation model (the incidence decay with exponential adjustment, or IDEA model) that can be used for short-term epidemiological forecasting. In the mid-19th century, Dr. William Farr made the observation that epidemic events rise and fall in a roughly symmetrical pattern that can be approximated by a bell-shaped curve. He noticed that this time-evolution behavior could be captured by a single mathematical formula ("Farr's law") that could be used for epidemic forecasting. We show here that the IDEA model follows Farr's law, and show that for intuitive assumptions, Farr's law can be derived from the IDEA model. Moreover, we show that both mathematical approaches, Farr's law and the IDEA model, resemble solutions of a susceptible-infectious-removed (SIR) compartmental differential-equation model in an asymptotic limit, where the changes of disease transmission respond to control measures, and not only to the depletion of susceptible individuals. This suggests that the concept of the reproduction number ($R_0$) was implicitly captured in Farr's (pre-microbial era) work, and also suggests that control of epidemics, whether via behavior change or intervention, is as integral to the natural history of epidemics as is the dynamics of disease transmission.
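The link between the IDEA model and Farr's law can be checked numerically with a short sketch. The abstract does not state either formula, so two assumptions are made here: the IDEA model is taken in the form I(t) = (R0 / (1 + d)^t)^t, with t counted in generations and d a control (discount) parameter, and Farr's law is read as saying that the ratio of successive incidence ratios is constant. The parameter values are arbitrary.

```python
def idea_incidence(t, r0, d):
    """Assumed IDEA form: I(t) = (R0 / (1 + d)**t)**t, t in generations."""
    return (r0 / (1.0 + d) ** t) ** t

def second_ratio(series, t):
    """Ratio of successive ratios: [I(t+1)/I(t)] / [I(t)/I(t-1)]."""
    return series[t + 1] * series[t - 1] / series[t] ** 2

r0, d = 2.0, 0.05  # arbitrary illustrative values
series = [idea_incidence(t, r0, d) for t in range(12)]
ratios = [second_ratio(series, t) for t in range(1, 11)]

print(ratios[:3])
print("expected constant:", (1.0 + d) ** -2)
```

Under these assumptions log I(t) is quadratic in t, so the second ratio equals (1 + d)^-2 at every step, which is the bell-shaped behaviour the abstract attributes to Farr's law.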
Affiliation(s)
- Mauricio Santillana
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Ashleigh Tuite
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- BlueDot, Toronto, Ontario, Canada
| | - Tahmina Nasserie
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- BlueDot, Toronto, Ontario, Canada
| | - Paul Fine
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - David Champredon
- Agent-Based Modelling Laboratory, York University, Toronto, Ontario, Canada
- Department of Theoretical Biology, McMaster University, Hamilton, Ontario, Canada
| | - Leonid Chindelevitch
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Jonathan Dushoff
- Department of Theoretical Biology, McMaster University, Hamilton, Ontario, Canada
| | - David Fisman
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
|
29
|
Abstract
MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing data, MLST methods have evolved towards larger typing schemes, based on a few hundred genes [core genome MLST (cgMLST)] to a few thousand genes [whole genome MLST (wgMLST)]. Such large-scale MLST schemes have been shown to provide a finer resolution and are increasingly used in various contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller, based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.
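The idea of k-mer voting can be sketched in a few lines. This toy version is not MentaLiST's algorithm, which is written in Julia and described in the paper itself. It only illustrates the general principle that each k-mer found in a read casts a vote for every allele containing it. The scheme, reads, function names, and the choice of k are all made up for illustration.

```python
from collections import Counter

def kmers(seq, k):
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(scheme, k):
    """Map each k-mer to the (locus, allele) pairs that contain it."""
    index = {}
    for locus, alleles in scheme.items():
        for allele_id, seq in alleles.items():
            for km in kmers(seq, k):
                index.setdefault(km, set()).add((locus, allele_id))
    return index

def call_alleles(reads, scheme, k):
    """For each locus, return the allele with the most k-mer votes."""
    index = build_index(scheme, k)
    votes = {locus: Counter() for locus in scheme}
    for read in reads:
        for km in kmers(read, k):
            for locus, allele_id in index.get(km, ()):
                votes[locus][allele_id] += 1
    return {locus: (c.most_common(1)[0][0] if c else None)
            for locus, c in votes.items()}

# Toy scheme: two loci with two alleles each (sequences are invented).
scheme = {
    "geneA": {1: "ACGTACGTAA", 2: "ACGTTCGTAA"},
    "geneB": {1: "GGGCCCAAAT", 2: "GGGCCTAAAT"},
}
reads = ["ACGTTCGT", "GTTCGTAA", "GGGCCCAA", "GCCCAAAT"]
print(call_alleles(reads, scheme, k=5))
```

A real caller would also have to handle reverse complements, sequencing errors, novel alleles, and ties, none of which this sketch attempts.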
Affiliation(s)
- Pedro Feijao
- School of Computing Science, Simon Fraser University, Vancouver, Canada
| | - Hua-Ting Yao
- École Polytechnique, Université Paris-Saclay, Palaiseau, France
| | - Dan Fornika
- BC Centre for Disease Control, Vancouver, Canada
| | - Jennifer Gardy
- School of Population and Public Health, University of British Columbia, Vancouver, Canada
| | - William Hsiao
- Department of Pathology and Laboratory Medicine, University of British Columbia and BC Centre for Disease Control, Vancouver, Canada
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Vancouver, Canada
|
30
|
Miotto P, Tessema B, Tagliani E, Chindelevitch L, Starks AM, Emerson C, Hanna D, Kim PS, Liwski R, Zignol M, Gilpin C, Niemann S, Denkinger CM, Fleming J, Warren RM, Crook D, Posey J, Gagneux S, Hoffner S, Rodrigues C, Comas I, Engelthaler DM, Murray M, Alland D, Rigouts L, Lange C, Dheda K, Hasan R, Ranganathan UDK, McNerney R, Ezewudo M, Cirillo DM, Schito M, Köser CU, Rodwell TC. A standardised method for interpreting the association between mutations and phenotypic drug resistance in Mycobacterium tuberculosis. Eur Respir J 2017; 50:1701354. [PMID: 29284687 PMCID: PMC5898944 DOI: 10.1183/13993003.01354-2017] [Citation(s) in RCA: 216] [Impact Index Per Article: 30.9] [Received: 07/06/2017] [Accepted: 10/13/2017] [Indexed: 11/24/2022]
Abstract
A clear understanding of the genetic basis of antibiotic resistance in Mycobacterium tuberculosis is required to accelerate the development of rapid drug susceptibility testing methods based on genetic sequence. Raw genotype-phenotype correlation data were extracted as part of a comprehensive systematic review to develop a standardised analytical approach for interpreting resistance associated mutations for rifampicin, isoniazid, ofloxacin/levofloxacin, moxifloxacin, amikacin, kanamycin, capreomycin, streptomycin, ethionamide/prothionamide and pyrazinamide. Mutation frequencies in resistant and susceptible isolates were calculated, together with novel statistical measures to classify mutations as high, moderate, minimal or indeterminate confidence for predicting resistance. We identified 286 confidence-graded mutations associated with resistance. Compared to phenotypic methods, sensitivity (95% CI) for rifampicin was 90.3% (89.6-90.9%), while for isoniazid it was 78.2% (77.4-79.0%) and their specificities were 96.3% (95.7-96.8%) and 94.4% (93.1-95.5%), respectively. For second-line drugs, sensitivity varied from 67.4% (64.1-70.6%) for capreomycin to 88.2% (85.1-90.9%) for moxifloxacin, with specificity ranging from 90.0% (87.1-92.5%) for moxifloxacin to 99.5% (99.0-99.8%) for amikacin. This study provides a standardised and comprehensive approach for the interpretation of mutations as predictors of M. tuberculosis drug-resistant phenotypes. These data have implications for the clinical interpretation of molecular diagnostics and next-generation sequencing as well as efficient individualised therapy for patients with drug-resistant tuberculosis.
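For readers unfamiliar with how figures such as "sensitivity (95% CI)" are obtained, the following sketch computes sensitivity and specificity from a 2x2 table of genotypic predictions against phenotypic results, with a Wilson score interval for each proportion. The abstract does not say which interval method the authors used, so the Wilson interval is an assumption, and the counts are hypothetical rather than taken from the study.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: resistant isolates with/without a listed mutation,
# and susceptible isolates without/with one.
tp, fn, tn, fp = 90, 10, 95, 5
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print("sensitivity", sens, wilson_interval(tp, tp + fn))
print("specificity", spec, wilson_interval(tn, tn + fp))
```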
Affiliation(s)
- Paolo Miotto
- Emerging Bacterial Pathogens Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Belay Tessema
- Department of Medical Microbiology, University of Gondar, Gondar, Ethiopia
| | - Elisa Tagliani
- Emerging Bacterial Pathogens Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | | | - Angela M Starks
- Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Claudia Emerson
- Institute on Ethics & Policy for Innovation, Department of Philosophy, McMaster University, Hamilton, ON, Canada
| | | | - Peter S Kim
- Office of AIDS Research, National Institutes of Health, Rockville, MD, USA
| | | | - Matteo Zignol
- Global Tuberculosis Programme, World Health Organization, Geneva, Switzerland
| | - Christopher Gilpin
- Global Tuberculosis Programme, World Health Organization, Geneva, Switzerland
| | - Stefan Niemann
- Molecular and Experimental Mycobacteriology, Priority Area Infections, Research Center Borstel, Borstel, Germany
- German Center for Infection Research, Borstel, Germany
| | - Claudia M Denkinger
- Foundation for Innovative New Diagnostics, Campus Biotech, Geneva, Switzerland
| | - Joy Fleming
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
| | - Robin M Warren
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
| | - Derrick Crook
- Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- National Infection Service, Public Health England, London, UK
| | - James Posey
- Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Sebastien Gagneux
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Sven Hoffner
- Microbiology, Tumour and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Public Health Agency of Sweden, Solna, Sweden
| | | | - Iñaki Comas
- Tuberculosis Genomics Unit, Biomedicine Institute of Valencia (IBV-CSIC), Valencia, Spain
- Foundation for the Promotion of Health and Biomedical Research in the Valencian Community (FISABIO), Valencia, Spain
- CIBER (Centros de Investigación Biomédica en Red) in Epidemiology and Public Health, Madrid, Spain
| | | | - Megan Murray
- Harvard School of Public Health, Department of Epidemiology, Boston, MA, USA
| | - David Alland
- Center for Emerging Pathogens, Rutgers-New Jersey Medical School, Newark, NJ, USA
| | - Leen Rigouts
- Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
| | - Christoph Lange
- Division of Clinical Infectious Diseases and German Center for Infection Research Tuberculosis Unit, Research Center Borstel, Borstel, Germany
- International Health/Infectious Diseases, University of Lübeck, Lübeck, Germany
- Department of Medicine, Karolinska Institute, Stockholm, Sweden
- Department of Internal Medicine, University of Namibia School of Medicine, Windhoek, Namibia
| | - Keertan Dheda
- Lung Infection and Immunity Unit, Department of Medicine, Division of Pulmonology and UCT Lung Institute, University of Cape Town, Groote Schuur Hospital, Cape Town, South Africa
| | - Rumina Hasan
- Department of Pathology and Laboratory Medicine, Aga Khan University, Karachi, Pakistan
| | | | - Ruth McNerney
- Department of Medicine, Division of Pulmonology, University of Cape Town, Groote Schuur Hospital, Cape Town, South Africa
| | | | - Daniela M Cirillo
- Emerging Bacterial Pathogens Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | | | - Claudio U Köser
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Timothy C Rodwell
- Foundation for Innovative New Diagnostics, Campus Biotech, Geneva, Switzerland
- Department of Medicine, University of California, San Diego, CA, USA
|
31
|
Satgunam PN, Chindelevitch L. Vision Screening Results in a Cohort of Bhopal Gas Disaster Survivors. CURR SCI INDIA 2017. [DOI: 10.18520/cs/v112/i10/2085-2088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
|
32
|
Cohen T, Chindelevitch L, Misra R, Kempner ME, Galea J, Moodley P, Wilson D. Reply to Chen et al. J Infect Dis 2016; 214:1287-8. [PMID: 27493235 DOI: 10.1093/infdis/jiw351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2016] [Accepted: 07/28/2016] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ted Cohen
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, Connecticut
| | - Leonid Chindelevitch
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, Connecticut
| | | | | | | | | | - Douglas Wilson
- Department of Internal Medicine, Edendale Hospital, Pietermaritzburg, University of KwaZulu-Natal, South Africa
|
33
|
Chindelevitch L, Menzies NA, Pretorius C, Stover J, Salomon JA, Cohen T. Evaluating the potential impact of enhancing HIV treatment and tuberculosis control programmes on the burden of tuberculosis. J R Soc Interface 2016; 12:rsif.2015.0146. [PMID: 25878131 PMCID: PMC4424692 DOI: 10.1098/rsif.2015.0146] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022] Open
Abstract
HIV has fuelled increasing tuberculosis (TB) incidence in sub-Saharan Africa. Better control of TB in this region may be achieved directly through TB programme improvements and indirectly through expanded use of antiretroviral therapy (ART) among those with HIV. We used a mathematical model of TB and HIV in South Africa to examine the potential epidemiological impact in scenarios involving improvements in three dimensions of TB programmes: coverage, diagnosis and treatment effectiveness, as well as expanded ART use through broadened eligibility. We projected the effect of alternative scenarios on TB prevalence, incidence and TB-related mortality over 20 years. Of the three dimensions of TB programme improvement, expanding coverage would produce the greatest reduction in TB burden. Compared with current performance, combined TB programme improvements were projected to decrease TB incidence by 30% over 5 years and 46% over 20 years, and decrease TB-related mortality by 45% over 5 years and 69% over 20 years. Expanded ART eligibility was projected to decrease TB incidence by 22% over 5 years and 45% over 20 years, and TB-related mortality by 22% over 5 years and 50% over 20 years. We found that over a 20-year horizon, TB-specific and HIV-specific programme changes contribute equally to incidence reductions, whereas the TB-specific changes produce a majority of the mortality benefits. An aggressive expansion of ART alongside traditional TB-specific control measures has the potential to greatly reduce TB burden, with the different elements of a combined approach having a synergistic effect in reducing long-term TB incidence and mortality.
Affiliation(s)
- Leonid Chindelevitch
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
| | - Nicolas A Menzies
- Center for Health Decision Science, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Global Health and Population, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | | | | | - Joshua A Salomon
- Center for Health Decision Science, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Global Health and Population, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Ted Cohen
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA
|
34
|
Chindelevitch L, Colijn C, Moodley P, Wilson D, Cohen T. ClassTR: Classifying Within-Host Heterogeneity Based on Tandem Repeats with Application to Mycobacterium tuberculosis Infections. PLoS Comput Biol 2016; 12:e1004475. [PMID: 26829497 PMCID: PMC4734664 DOI: 10.1371/journal.pcbi.1004475] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/30/2015] [Accepted: 07/22/2015] [Indexed: 11/18/2022] Open
Abstract
Genomic tools have revealed genetically diverse pathogens within some hosts. Within-host pathogen diversity, which we refer to as "complex infection", is increasingly recognized as a determinant of treatment outcome for infections like tuberculosis. Complex infection arises through two mechanisms: within-host mutation (which results in clonal heterogeneity) and reinfection (which results in mixed infections). Estimates of the frequency of within-host mutation and reinfection in populations are critical for understanding the natural history of disease. These estimates influence projections of disease trends and effects of interventions. The genotyping technique MLVA (multiple loci variable-number tandem repeats analysis) can identify complex infections, but the current method to distinguish clonal heterogeneity from mixed infections is based on a rather simple rule. Here we describe ClassTR, a method which leverages MLVA information from isolates collected in a population to distinguish mixed infections from clonal heterogeneity. We formulate the resolution of complex infections into their constituent strains as an optimization problem, and show its NP-completeness. We solve it efficiently by using mixed integer linear programming and graph decomposition. Once the complex infections are resolved into their constituent strains, ClassTR probabilistically classifies isolates as clonally heterogeneous or mixed by using a model of tandem repeat evolution. We first compare ClassTR with the standard rule-based classification on 100 simulated datasets. ClassTR outperforms the standard method, improving classification accuracy from 48% to 80%. We then apply ClassTR to a sample of 436 strains collected from tuberculosis patients in a South African community, of which 92 had complex infections. We find that ClassTR assigns an alternate classification to 18 of the 92 complex infections, suggesting important differences in practice. By explicitly modeling tandem repeat evolution, ClassTR helps to improve our understanding of the mechanisms driving within-host diversity of pathogens like Mycobacterium tuberculosis.
Affiliation(s)
- Leonid Chindelevitch
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, Connecticut, United States of America
- Caroline Colijn
- Department of Mathematics, Imperial College, London, United Kingdom
- Prashini Moodley
- School of Laboratory Medicine and Medical Sciences, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
- Douglas Wilson
- Department of Medicine, Edendale Hospital, Pietermaritzburg, South Africa
- Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
- Ted Cohen
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, Connecticut, United States of America
35
Cohen T, Chindelevitch L, Misra R, Kempner ME, Galea J, Moodley P, Wilson D. Within-Host Heterogeneity of Mycobacterium tuberculosis Infection Is Associated With Poor Early Treatment Response: A Prospective Cohort Study. J Infect Dis 2016; 213:1796-9. [PMID: 26768249 DOI: 10.1093/infdis/jiw014] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 12/23/2015] [Accepted: 01/06/2016] [Indexed: 11/12/2022] Open
Abstract
The clinical management of tuberculosis is a major challenge in southern Africa. The prevalence of within-host genetically heterogeneous Mycobacterium tuberculosis infection and its effect on treatment response are not well understood. We enrolled 500 patients with tuberculosis in KwaZulu-Natal and followed them through 2 months of treatment. Using mycobacterial interspersed repetitive units-variable number of tandem repeats genotyping to identify mycobacterial heterogeneity, we report the prevalence and evaluate the association of heterogeneity with treatment response. Upon initiation of treatment, 21.1% of participants harbored a heterogeneous M. tuberculosis infection; such heterogeneity was independently associated with a nearly 2-fold higher odds of persistent culture positivity after 2 months of treatment (adjusted odds ratio, 1.90; 95% confidence interval, 1.03-3.50).
Affiliation(s)
- Ted Cohen
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, Connecticut
- Leonid Chindelevitch
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, Connecticut
- Reshma Misra
- Infection Prevention and Control, University of KwaZulu-Natal, Durban
- Prashini Moodley
- Infection Prevention and Control, University of KwaZulu-Natal, Durban
- Douglas Wilson
- Department of Internal Medicine, Edendale Hospital, University of KwaZulu-Natal, Pietermaritzburg, South Africa
36
Affiliation(s)
- Jason Trigg
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Aviv Regev
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Bonnie Berger
- Massachusetts Institute of Technology, Cambridge, MA, USA
37
Chindelevitch L, Trigg J, Regev A, Berger B. An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models. Nat Commun 2014; 5:4893. [PMID: 25291352 PMCID: PMC4205847 DOI: 10.1038/ncomms5893] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 06/02/2014] [Accepted: 08/04/2014] [Indexed: 12/03/2022] Open
Abstract
Constraint-based models are currently the only methodology that allows the study of metabolism at the whole-genome scale. Flux balance analysis is commonly used to analyse constraint-based models. Curiously, the results of this analysis vary with the software being run, a situation that we show can be remedied by using exact rather than floating-point arithmetic. Here we introduce MONGOOSE, a toolbox for analysing the structure of constraint-based metabolic models in exact arithmetic. We apply MONGOOSE to the analysis of 98 existing metabolic network models and find that the biomass reaction is surprisingly blocked (unable to sustain non-zero flux) in nearly half of them. We propose a principled approach for unblocking these reactions and extend it to the problems of identifying essential and synthetic lethal reactions and minimal media. Our structural insights enable a systematic study of constraint-based metabolic models, yielding a deeper understanding of their possibilities and limitations. Current tools to analyse constraint-based models of metabolic networks have limited accuracy due to their use of floating-point arithmetic. Here the authors present MONGOOSE, a new computational tool that analyses such models in exact arithmetic, providing improved accuracy and reproducibility.
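The contrast between floating-point and exact arithmetic mentioned in this abstract can be illustrated with a minimal sketch. It does not use MONGOOSE; it relies only on Python's standard `fractions` module, and the stoichiometric row, flux values, and function name are invented for illustration.

```python
from fractions import Fraction


def steady_state_residual(coefficients, fluxes):
    """Return sum(c * v) for one metabolite row of a stoichiometric matrix."""
    return sum(c * v for c, v in zip(coefficients, fluxes))


# Hypothetical metabolite: produced by two reactions, consumed by a third.
float_row = [0.1, 0.2, -0.3]
exact_row = [Fraction(1, 10), Fraction(2, 10), Fraction(-3, 10)]
unit_fluxes = [1, 1, 1]

float_residual = steady_state_residual(float_row, unit_fluxes)
exact_residual = steady_state_residual(exact_row, unit_fluxes)

# In floating point the balance is off by a rounding error, so a strict
# "is this exactly zero?" test fails; with rationals it holds exactly.
print(float_residual == 0)  # False
print(exact_residual == 0)  # True
```

Floating-point tools usually work around this with a tolerance, and the answer can then depend on the tolerance chosen. Exact rational arithmetic avoids that choice at the cost of slower computation.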
Affiliation(s)
- Leonid Chindelevitch
- [1] Mathematics Department, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, USA [2] Broad Institute, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA
- Jason Trigg
- Mathematics Department, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, USA
- Aviv Regev
- [1] Broad Institute, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA [2] Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, MD 20815, USA [3] Department of Biology, MIT, Cambridge, Massachusetts 02139, USA
- Bonnie Berger
- [1] Mathematics Department, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, USA [2] Broad Institute, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA
38
Eaton JW, Menzies NA, Stover J, Cambiano V, Chindelevitch L, Cori A, Hontelez JAC, Humair S, Kerr CC, Klein DJ, Mishra S, Mitchell KM, Nichols BE, Vickerman P, Bakker R, Bärnighausen T, Bershteyn A, Bloom DE, Boily MC, Chang ST, Cohen T, Dodd PJ, Fraser C, Gopalappa C, Lundgren J, Martin NK, Mikkelsen E, Mountain E, Pham QD, Pickles M, Phillips A, Platt L, Pretorius C, Prudden HJ, Salomon JA, van de Vijver DAMC, de Vlas SJ, Wagner BG, White RG, Wilson DP, Zhang L, Blandford J, Meyer-Rath G, Remme M, Revill P, Sangrujee N, Terris-Prestholt F, Doherty M, Shaffer N, Easterbrook PJ, Hirnschall G, Hallett TB. Health benefits, costs, and cost-effectiveness of earlier eligibility for adult antiretroviral therapy and expanded treatment coverage: a combined analysis of 12 mathematical models. Lancet Glob Health 2013; 2:23-34. [PMID: 25083415 PMCID: PMC4114402 DOI: 10.1016/s2214-109x(13)70172-4] [Citation(s) in RCA: 177] [Impact Index Per Article: 16.1] [Indexed: 12/18/2022]
Abstract
BACKGROUND New WHO guidelines recommend ART initiation for HIV-positive persons with CD4 cell counts ≤500 cells/µL, a higher threshold than was previously recommended. Country decision makers must consider whether to further expand ART eligibility accordingly. METHODS We used multiple independent mathematical models in four settings (South Africa, Zambia, India, and Vietnam) to evaluate the potential health impact, costs, and cost-effectiveness of different adult ART eligibility criteria under scenarios of current and expanded treatment coverage, with results projected over 20 years. Analyses considered extending eligibility to include individuals with CD4 ≤500 cells/µL or all HIV-positive adults, compared to the previous recommendation of initiation with CD4 ≤350 cells/µL. We assessed costs from a health system perspective, and calculated the incremental cost per DALY averted ($/DALY) to compare competing strategies. Strategies were considered 'very cost-effective' if the $/DALY was less than the country's per capita gross domestic product (GDP; South Africa: $8040, Zambia: $1425, India: $1489, Vietnam: $1407) and 'cost-effective' if $/DALY was less than three times per capita GDP. FINDINGS In South Africa, the cost per DALY averted of extending ART eligibility to CD4 ≤500 cells/µL ranged from $237 to $1691/DALY compared to 2010 guidelines; in Zambia, expanded eligibility ranged from improving health outcomes while reducing costs (i.e. dominating current guidelines) to $749/DALY. Results were similar in scenarios with substantially expanded treatment access and for expanding eligibility to all HIV-positive adults. Expanding treatment coverage in the general population was therefore found to be cost-effective. In India, eligibility for all HIV-positive persons ranged from $131 to $241/DALY and in Vietnam eligibility for CD4 ≤500 cells/µL cost $290/DALY. In concentrated epidemics, expanded access among key populations was also cost-effective.
INTERPRETATION Earlier ART eligibility is estimated to be very cost-effective in low- and middle-income settings, although these questions should be revisited as further information becomes available. Scaling-up ART should be considered among other high-priority health interventions competing for health budgets. FUNDING The Bill and Melinda Gates Foundation and World Health Organization.
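The threshold rule stated in the METHODS of this abstract can be written out as a short sketch. The GDP figures and the $/DALY values are those quoted in the abstract; the function names, the "dominant" label for cost-saving strategies, and the "not cost-effective" label for values at or above three times GDP are assumptions added here for illustration.

```python
# Per capita GDP (US$) as quoted in the abstract.
GDP_PER_CAPITA = {
    "South Africa": 8040,
    "Zambia": 1425,
    "India": 1489,
    "Vietnam": 1407,
}


def incremental_cost_per_daly(extra_cost, dalys_averted):
    """Incremental cost divided by DALYs averted, relative to a comparator."""
    if dalys_averted <= 0:
        raise ValueError("strategy must avert a positive number of DALYs")
    return extra_cost / dalys_averted


def classify(cost_per_daly, gdp_per_capita):
    """Apply the GDP-based thresholds described in the abstract."""
    if cost_per_daly <= 0:
        # Assumed label: better health outcomes at no extra cost.
        return "dominant"
    if cost_per_daly < gdp_per_capita:
        return "very cost-effective"
    if cost_per_daly < 3 * gdp_per_capita:
        return "cost-effective"
    return "not cost-effective"  # assumed label for the remaining case


# Upper ends of the ranges reported in FINDINGS.
reported = {"South Africa": 1691, "Zambia": 749, "India": 241, "Vietnam": 290}
for setting, value in reported.items():
    print(setting, classify(value, GDP_PER_CAPITA[setting]))
```

Under these thresholds, every reported upper bound falls below the corresponding per capita GDP, which matches the abstract's interpretation of earlier eligibility as very cost-effective.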
Affiliation(s)
- Jeffrey W Eaton
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Nicolas A Menzies
- Center for Health Decision Science, Harvard School of Public Health, Boston, MA, USA
- Valentina Cambiano
- Research Department of Infection and Population Health, University College London, London, UK
- Leonid Chindelevitch
- Department of Global Health and Population, Harvard School of Public Health, Boston, MA, USA
- Anne Cori
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Jan A C Hontelez
- Department of Public Health, Erasmus MC, University Medical Center Rotterdam, Rotterdam, Netherlands
- Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Mtubatuba, South Africa
- Nijmegen International Center for Health System Analysis and Education (NICHE), Department of Primary and Community Care, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands
- Salal Humair
- Harvard School of Public Health, Boston, MA, USA
- Cliff C Kerr
- Kirby Institute, University of New South Wales, Sydney, Australia
- Daniel J Klein
- Epidemiological Modeling Group, Intellectual Ventures Laboratory, Bellevue, WA, USA
- Sharmistha Mishra
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Division of Infectious Diseases, St. Michael’s Hospital, University of Toronto, Canada
- Kate M Mitchell
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- Brooke E Nichols
- Department of Virology, Erasmus Medical Center, Rotterdam, Netherlands
- Peter Vickerman
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- Roel Bakker
- Department of Public Health, Erasmus MC, University Medical Center Rotterdam, Rotterdam, Netherlands
- Till Bärnighausen
- Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Mtubatuba, South Africa
- Harvard School of Public Health, Boston, MA, USA
- Anna Bershteyn
- Epidemiological Modeling Group, Intellectual Ventures Laboratory, Bellevue, WA, USA
- Marie-Claude Boily
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Stewart T Chang
- Epidemiological Modeling Group, Intellectual Ventures Laboratory, Bellevue, WA, USA
- Ted Cohen
- Division of Global Health Equity, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
- Peter J Dodd
- Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK
- Christophe Fraser
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Jens Lundgren
- Copenhagen University Hospital/Rigshospitalet, Copenhagen, Denmark
- University of Copenhagen, Copenhagen, Denmark
- Natasha K Martin
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- School of Social and Community Medicine, University of Bristol, Bristol, UK
- Evelinn Mikkelsen
- Nijmegen International Center for Health System Analysis and Education (NICHE), Department of Primary and Community Care, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands
- Elisa Mountain
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Quang D Pham
- Kirby Institute, University of New South Wales, Sydney, Australia
- Michael Pickles
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Andrew Phillips
- Research Department of Infection and Population Health, University College London, London, UK
- Lucy Platt
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- Holly J Prudden
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- Joshua A Salomon
- Center for Health Decision Science, Harvard School of Public Health, Boston, MA, USA
- Department of Global Health and Population, Harvard School of Public Health, Boston, MA, USA
- Sake J de Vlas
- Department of Public Health, Erasmus MC, University Medical Center Rotterdam, Rotterdam, Netherlands
- Bradley G Wagner
- Epidemiological Modeling Group, Intellectual Ventures Laboratory, Bellevue, WA, USA
- Richard G White
- Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK
- David P Wilson
- Kirby Institute, University of New South Wales, Sydney, Australia
- Lei Zhang
- Kirby Institute, University of New South Wales, Sydney, Australia
- John Blandford
- U.S. Centers for Disease Control and Prevention, Atlanta, GA, USA
- Gesine Meyer-Rath
- Center for Global Health and Development, Boston University, Boston, MA, USA
- Health Economics and Epidemiology Research Office, Department of Medicine, Faculty of Health Sciences, University of Witwatersrand, Johannesburg, South Africa
- Michelle Remme
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- Paul Revill
- Centre for Health Economics, University of York, York, UK
- Fern Terris-Prestholt
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
- Meg Doherty
- Department of HIV/AIDS, World Health Organization, Geneva, Switzerland
- Nathan Shaffer
- Department of HIV/AIDS, World Health Organization, Geneva, Switzerland
- Timothy B Hallett
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
39
Chindelevitch L, Ma CY, Liao CS, Berger B. Optimizing a global alignment of protein interaction networks. Bioinformatics 2013; 29:2765-73. [PMID: 24048352 PMCID: PMC3799479 DOI: 10.1093/bioinformatics/btt486] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 04/08/2013] [Revised: 07/31/2013] [Accepted: 08/15/2013] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION The global alignment of protein interaction networks is a widely studied problem. It is an important first step in understanding the relationship between the proteins in different species and identifying functional orthologs. Furthermore, it can provide useful insights into the species' evolution. RESULTS We propose a novel algorithm, PISwap, for optimizing global pairwise alignments of protein interaction networks, based on a local optimization heuristic that has previously demonstrated its effectiveness for a variety of other intractable problems. PISwap can begin with different types of network alignment approaches and then iteratively adjust the initial alignments by incorporating network topology information, trading it off for sequence information. In practice, our algorithm efficiently refines other well-studied alignment techniques with almost no additional time cost. We also show the robustness of the algorithm to noise in protein interaction data. In addition, the flexible nature of this algorithm makes it suitable for different applications of network alignment. This algorithm can yield interesting insights into the evolutionary dynamics of related species. AVAILABILITY Our software is freely available for non-commercial purposes from our Web site, http://piswap.csail.mit.edu/. CONTACT bab@csail.mit.edu or csliao@ie.nthu.edu.tw. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leonid Chindelevitch
- Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA, Department of Computer Science and Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu 30013, Taiwan
40
Huang CL, Lamb J, Chindelevitch L, Kostrowicki J, Guinney J, DeLisi C, Ziemek D. Correlation set analysis: detecting active regulators in disease populations using prior causal knowledge. BMC Bioinformatics 2012; 13:46. [PMID: 22443377 PMCID: PMC3382432 DOI: 10.1186/1471-2105-13-46] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/09/2011] [Accepted: 03/23/2012] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Identification of active causal regulators is a crucial problem in understanding mechanisms of disease or finding drug targets. Methods that infer causal regulators directly from primary data have been proposed and successfully validated in some cases. These methods necessarily require very large sample sizes or a mix of different data types. Recent studies have shown that prior biological knowledge can successfully boost a method's ability to find regulators. RESULTS We present a simple data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis and a specific type of literature-derived causal relationships. Instead of investigating the co-expression level between regulators and their regulatees, we focus on coherence of regulatees of a regulator. Using simulated datasets we show that our method performs very well at recovering even weak regulatory relationships with a low false discovery rate. Using three separate real biological datasets we were able to recover both well-known and as yet undescribed active regulators for each disease population. The results are represented as a rank-ordered list of regulators and reveal both single and higher-order regulatory relationships. CONCLUSIONS CSA is an intuitive data-driven way of selecting directed perturbation experiments that are relevant to a disease population of interest and represent a starting point for further investigation. Our findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression.
Affiliation(s)
- Chia-Ling Huang
- Bioinformatics Graduate Program, and Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
- John Lamb
- Oncology Research Unit, Worldwide Research & Development, Pfizer, 10646 Science Center Drive, San Diego, CA 92121, USA
- Leonid Chindelevitch
- Computational Sciences Center of Emphasis, Worldwide Research & Development, Pfizer, 35 Cambridgepark Drive, Cambridge, MA 02140, USA
- Jarek Kostrowicki
- Oncology Research Unit, Worldwide Research & Development, Pfizer, 10646 Science Center Drive, San Diego, CA 92121, USA
- Justin Guinney
- Sage Bionetworks, 1100 Fairview Ave North, Seattle, WA 98109, USA
- Charles DeLisi
- Bioinformatics Graduate Program, and Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
- Daniel Ziemek
- Computational Sciences Center of Emphasis, Worldwide Research & Development, Pfizer, 35 Cambridgepark Drive, Cambridge, MA 02140, USA
41
Chindelevitch L, Ziemek D, Enayetallah A, Randhawa R, Sidders B, Brockel C, Huang ES. Causal reasoning on biological networks: interpreting transcriptional changes. Bioinformatics 2012; 28:1114-21. [PMID: 22355083 DOI: 10.1093/bioinformatics/bts090] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The interpretation of high-throughput datasets has remained one of the central challenges of computational biology over the past decade. Furthermore, as the amount of biological knowledge increases, it becomes more and more difficult to integrate this large body of knowledge in a meaningful manner. In this article, we propose a particular solution to both of these challenges. METHODS We integrate available biological knowledge by constructing a network of molecular interactions of a specific kind: causal interactions. The resulting causal graph can be queried to suggest molecular hypotheses that explain the variations observed in a high-throughput gene expression experiment. We show that a simple scoring function can discriminate between a large number of competing molecular hypotheses about the upstream cause of the changes observed in a gene expression profile. We then develop an analytical method for computing the statistical significance of each score. This analytical method also helps assess the effects of random or adversarial noise on the predictive power of our model. RESULTS Our results show that the causal graph we constructed from known biological literature is extremely robust to random noise and to missing or spurious information. We demonstrate the power of our causal reasoning model on two specific examples, one from a cancer dataset and the other from a cardiac hypertrophy experiment. We conclude that causal reasoning models provide a valuable addition to the biologist's toolkit for the interpretation of gene expression data. AVAILABILITY AND IMPLEMENTATION R source code for the method is available upon request.
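A scoring function of the kind this abstract describes can be sketched as follows. The abstract does not give the formula, so this sketch assumes one simple choice: the number of correctly predicted expression changes minus the number of incorrectly predicted ones. The toy graph, gene names, and function names are hypothetical.

```python
# Signed causal graph: regulator -> {target gene: +1 (increases) or -1 (decreases)}.
# The regulators, genes, and edges are invented for illustration.
CAUSAL_GRAPH = {
    "R1": {"g1": +1, "g2": +1, "g3": -1},
    "R2": {"g2": -1, "g3": +1, "g4": +1},
}


def score_hypothesis(regulator, direction, observed, graph=CAUSAL_GRAPH):
    """Score the hypothesis that `regulator` is up (+1) or down (-1).

    Each downstream gene adds +1 if its observed change matches the
    prediction, -1 if it contradicts it, and 0 if it did not change.
    """
    score = 0
    for gene, edge_sign in graph[regulator].items():
        predicted = direction * edge_sign
        score += predicted * observed.get(gene, 0)
    return score


def rank_hypotheses(observed, graph=CAUSAL_GRAPH):
    """Return (score, regulator, direction) tuples, best first."""
    scored = [
        (score_hypothesis(regulator, direction, observed, graph), regulator, direction)
        for regulator in graph
        for direction in (+1, -1)
    ]
    return sorted(scored, reverse=True)


# Observed profile: +1 upregulated, -1 downregulated, 0 unchanged.
observed = {"g1": +1, "g2": +1, "g3": -1, "g4": 0}
best = rank_hypotheses(observed)[0]
print(best)  # (3, 'R1', 1)
```

A raw score like this favors regulators with many targets, which is one reason a significance calculation for each score is needed before hypotheses are compared.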
Affiliation(s)
- Leonid Chindelevitch
- Computational Sciences Center of Emphasis, Pfizer Worldwide Research & Development, Cambridge, MA 02140, USA
42
Chindelevitch L, Loh PR, Enayetallah A, Berger B, Ziemek D. Assessing statistical significance in causal graphs. BMC Bioinformatics 2012; 13:35. [PMID: 22348444 PMCID: PMC3307026 DOI: 10.1186/1471-2105-13-35] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/29/2011] [Accepted: 02/20/2012] [Indexed: 12/20/2022] Open
Abstract
Background Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing. Results First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds. Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. 
We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs. Conclusion We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields.
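The first null distribution in this abstract (agreement between predicted and observed ternary classifications when the observed classification is randomized) can be made concrete with a brute-force sketch. This is not one of the efficient algorithms the paper presents: it enumerates every permutation of the observed vector, takes factorial time, and is only usable on tiny inputs. The function names and example vectors are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def dot(u, v):
    """Dot product of two equal-length vectors with entries in {-1, 0, +1}."""
    return sum(a * b for a, b in zip(u, v))


def ternary_dot_product_pvalue(predicted, observed):
    """Exact one-sided p-value by enumerating all permutations of `observed`.

    Returns the fraction of permutations whose agreement with `predicted`
    is at least as large as the agreement actually observed.
    """
    observed_score = dot(predicted, observed)
    total = 0
    at_least_as_extreme = 0
    for shuffled in permutations(observed):
        total += 1
        if dot(predicted, shuffled) >= observed_score:
            at_least_as_extreme += 1
    return Fraction(at_least_as_extreme, total)


# +1 upregulated, -1 downregulated, 0 neither.
predicted = [+1, +1, -1, 0]
observed = [+1, +1, -1, 0]
p_value = ternary_dot_product_pvalue(predicted, observed)
print(p_value)  # 1/12
```

In this example only 2 of the 24 orderings (the two that swap the identical +1 entries) reproduce the perfect agreement, hence 1/12. Enumeration of this kind becomes infeasible beyond roughly a dozen transcripts, which is the practical reason efficient algorithms for this distribution matter.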
Affiliation(s)
- Leonid Chindelevitch
- Computational Sciences Center of Emphasis, Pfizer Worldwide Research & Development, Cambridge, MA, USA
43
Chindelevitch L, Stanley S, Hung D, Regev A, Berger B. MetaMerge: scaling up genome-scale metabolic reconstructions with application to Mycobacterium tuberculosis. Genome Biol 2012; 13:r6. [PMID: 22292986 PMCID: PMC3488975 DOI: 10.1186/gb-2012-13-1-r6] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 07/02/2011] [Revised: 01/13/2012] [Accepted: 01/31/2012] [Indexed: 12/18/2022] Open
Abstract
Reconstructed models of metabolic networks are widely used for studying metabolism in various organisms. Many different reconstructions of the same organism often exist concurrently, forcing researchers to choose one of them to the exclusion of the others. We describe MetaMerge, an algorithm for semi-automatically reconciling a pair of existing metabolic network reconstructions into a single metabolic network model. We use MetaMerge to combine two published metabolic networks for Mycobacterium tuberculosis into a single network, which allows many reactions that could not be active in the individual models to become active, and predicts essential genes with a higher positive predictive value.
Affiliation(s)
- Leonid Chindelevitch
- Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
44
Chindelevitch L, Li Z, Blais E, Blanchette M. On the inference of parsimonious indel evolutionary scenarios. J Bioinform Comput Biol 2006; 4:721-44. [PMID: 16960972 DOI: 10.1142/s0219720006002168] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 09/24/2005] [Revised: 12/02/2005] [Accepted: 12/31/2005] [Indexed: 11/18/2022]
Abstract
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing a most parsimonious scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, called the Indel Parsimony Problem, is a crucial component of the problem of ancestral genome reconstruction, and its solution provides valuable information to many genome functional annotation approaches. We first show that the problem is NP-complete. Second, we provide an algorithm based on the fractional relaxation of an integer linear programming formulation. The algorithm is fast in practice, and the solutions it produces are, in most cases, provably optimal. We describe a divide-and-conquer approach that makes it possible to solve very large instances on a simple desktop machine, while retaining guaranteed optimality. Our algorithms are tested and shown to be efficient and accurate on a set of 1.8 Mb mammalian orthologous sequences in the CFTR region.
Affiliation(s)
- Leonid Chindelevitch
- School of Computer Science, McGill University, 3480 University Street, Montreal, Quebec, H3A 2A7, Canada.