1
|
Mallawaarachchi S, Tonkin-Hill G, Pöntinen A, Calland J, Gladstone R, Arredondo-Alonso S, MacAlasdair N, Thorpe H, Top J, Sheppard S, Balding D, Croucher N, Corander J. Detecting co-selection through excess linkage disequilibrium in bacterial genomes. NAR Genom Bioinform 2024; 6:lqae061. [PMID: 38846349 PMCID: PMC11155488 DOI: 10.1093/nargab/lqae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/15/2024] [Accepted: 05/14/2024] [Indexed: 06/09/2024] Open
Abstract
Population genomics has revolutionized our ability to study bacterial evolution by enabling data-driven discovery of the genetic architecture of trait variation. Genome-wide association studies (GWAS) have more recently become accompanied by genome-wide epistasis and co-selection (GWES) analysis, which offers a phenotype-free approach to generating hypotheses about selective processes that simultaneously impact multiple loci across the genome. However, existing GWES methods only consider associations between distant pairs of loci within the genome due to the strong impact of linkage disequilibrium (LD) over short distances. Based on the general functional organisation of genomes, it is nevertheless expected that the majority of co-selection and epistasis will act within relatively short genomic proximity, on co-variation occurring within genes and their promoter regions, and within operons. Here, we introduce LDWeaver, which enables an exhaustive GWES across both short- and long-range LD, to disentangle likely neutral co-variation from selection. We demonstrate the ability of LDWeaver to efficiently generate hypotheses about co-selection using large genomic surveys of multiple major human bacterial pathogen species and validate several findings using functional annotation and phenotypic measurements. Our approach will facilitate the study of bacterial evolution in the light of rapidly expanding population genomic data.
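To make the notion of linkage disequilibrium between a pair of loci concrete, the following is a minimal sketch of the textbook r² statistic for two biallelic sites. It is an illustration only, not LDWeaver's method: the function name `ld_r2` and the 0/1 allele coding are assumptions made for this example, and no correction for population structure or genomic distance is attempted.

```python
def ld_r2(a, b):
    """Squared allelic correlation (r^2) between two biallelic loci.

    `a` and `b` are equal-length sequences of 0/1 allele codes, one entry
    per genome (hypothetical encoding chosen for this sketch).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("loci must be non-empty and of equal length")
    n = len(a)
    pa = sum(a) / n                      # frequency of allele 1 at locus a
    pb = sum(b) / n                      # frequency of allele 1 at locus b
    pab = sum(1 for i, j in zip(a, b) if i == 1 and j == 1) / n
    d = pab - pa * pb                    # classical LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


# Two perfectly co-varying loci give r^2 = 1; two independent loci give 0.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```

A pair whose r² is high relative to other pairs at a similar genomic distance is the kind of signal that "excess" LD refers to; how that background is modelled is specific to each tool.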
Affiliation(s)
| | | | - Anna K Pöntinen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
| | - Jessica K Calland
- Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
| | | | | | | | - Harry A Thorpe
- Department of Biostatistics, University of Oslo, Oslo, Norway
| | - Janetta Top
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
| | - Samuel K Sheppard
- Ineos Oxford Institute of Antimicrobial Research, Department of Biology, University of Oxford, Oxford, United Kingdom
| | - David Balding
- Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Parkville, Victoria, Australia
| | - Nicholas J Croucher
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, United Kingdom
| | - Jukka Corander
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, UK
- Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
2
|
Green AG, Vargas R, Marin MG, Freschi L, Xie J, Farhat MR. Analysis of Genome-Wide Mutational Dependence in Naturally Evolving Mycobacterium tuberculosis Populations. Mol Biol Evol 2023; 40:msad131. [PMID: 37352142 PMCID: PMC10292908 DOI: 10.1093/molbev/msad131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 05/12/2023] [Accepted: 05/23/2023] [Indexed: 06/25/2023] Open
Abstract
Pathogenic microorganisms are in a perpetual struggle for survival in changing host environments, where host pressures necessitate changes in pathogen virulence, antibiotic resistance, or transmissibility. The genetic basis of phenotypic adaptation by pathogens is difficult to study in vivo. In this work, we develop a phylogenetic method to detect genetic dependencies that promote pathogen adaptation using 31,428 in vivo sampled Mycobacterium tuberculosis genomes, a globally prevalent bacterial pathogen with increasing levels of antibiotic resistance. We find that dependencies between mutations are enriched in antigenic and antibiotic resistance functions and discover 23 mutations that potentiate the development of antibiotic resistance. Between 11% and 92% of resistant strains harbor a dependent mutation acquired after a resistance-conferring variant. We demonstrate the pervasiveness of genetic dependency in adaptation of naturally evolving populations and the utility of the proposed computational approach.
Affiliation(s)
- Anna G Green
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Roger Vargas
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
| | - Maximillian G Marin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Luca Freschi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Jiaqi Xie
- Department of Genetics, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Maha R Farhat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, USA
|
3
|
Abstract
Horizontal gene transfer (HGT) is arguably the most conspicuous feature of bacterial evolution. Evidence for HGT is found in most bacterial genomes. Although HGT can considerably alter bacterial genomes, not all transfer events may be biologically significant and may instead represent the outcome of an incessant evolutionary process that only occasionally has a beneficial purpose. When adaptive transfers occur, HGT and positive selection may result in specific, detectable signatures in genomes, such as gene-specific sweeps or increased transfer rates for genes that are ecologically relevant. In this Review, we first discuss the various mechanisms whereby HGT occurs, how the genetic signatures shape patterns of genomic variation and the distinct bioinformatic algorithms developed to detect these patterns. We then discuss the evolutionary theory behind HGT and positive selection in bacteria, and discuss the approaches developed over the past decade to detect transferred DNA that may be involved in adaptation to new environments.
|
4
|
Si Y, Zhang Y, Yan C. A reproducibility analysis-based statistical framework for residue-residue evolutionary coupling detection. Brief Bioinform 2022; 23:6509046. [PMID: 35037015 DOI: 10.1093/bib/bbab576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/12/2021] [Revised: 11/26/2021] [Accepted: 12/15/2021] [Indexed: 11/14/2022] Open
Abstract
Direct coupling analysis (DCA) has been widely used to infer evolutionarily coupled residue pairs from the multiple sequence alignment (MSA) of homologous sequences. However, effectively selecting residue pairs with significant evolutionary couplings according to the result of DCA is a non-trivial task. In this study, we developed a general statistical framework for significant evolutionary coupling detection, referred to as irreproducible discovery rate (IDR)-DCA, which is based on reproducibility analysis of the coupling scores obtained from DCA on manually created MSA replicates. IDR-DCA was applied to select residue pairs for contact prediction for monomeric proteins, protein-protein interactions and monomeric RNAs, in which three different versions of DCA were applied. We demonstrated that with the application of IDR-DCA, the residue pairs selected using a universal threshold always yielded stable performance for contact prediction. Compared with the application of carefully tuned coupling score cutoffs, IDR-DCA always showed better performance. The robustness of IDR-DCA was also supported through the MSA downsampling analysis. We further demonstrated the effectiveness of applying constraints obtained from residue pairs selected by IDR-DCA to assist RNA secondary structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Yi Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
|
5
|
Top J, Arredondo-Alonso S, Schürch AC, Puranen S, Pesonen M, Pensar J, Willems RJL, Corander J. Genomic rearrangements uncovered by genome-wide co-evolution analysis of a major nosocomial pathogen, Enterococcus faecium. Microb Genom 2020; 6:mgen000488. [PMID: 33253085 PMCID: PMC8116687 DOI: 10.1099/mgen.0.000488] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/14/2020] [Accepted: 11/16/2020] [Indexed: 11/25/2022] Open
Abstract
Enterococcus faecium is a commensal of the gastro-digestive tract, but is also known as a nosocomial pathogen among hospitalized patients. Population genetics based on whole-genome sequencing has revealed that E. faecium strains from hospitalized patients form a distinct clade, designated clade A1, and that plasmids are major contributors to the emergence of nosocomial E. faecium. Here we further explored the adaptive evolution of E. faecium using a genome-wide co-evolution study (GWES) to identify co-evolving single-nucleotide polymorphisms (SNPs). We identified three genomic regions harbouring large numbers of SNPs in tight linkage that are not proximal to each other based on the completely assembled chromosome of the clade A1 reference hospital isolate AUS0004. Close examination of these regions revealed that they are located at the borders of four different types of large-scale genomic rearrangements, insertion sites of two different genomic islands and an IS30-like transposon. In non-clade A1 isolates, these regions are adjacent to each other and they lack the insertions of the genomic islands and IS30-like transposon. Additionally, among the clade A1 isolates there is one group of pet isolates lacking the genomic rearrangement and insertion of the genomic islands, suggesting a distinct evolutionary trajectory. In silico analysis of the biological functions of the genes encoded in the three regions revealed a common link to a stress response. This suggests that these rearrangements may reflect adaptation to the stringent conditions in the hospital environment, such as antibiotics and detergents, to which bacteria are exposed. In conclusion, to our knowledge, this is the first study using GWES to identify genomic rearrangements, suggesting that there is considerable untapped potential to unravel hidden evolutionary signals from population genomic data.
Affiliation(s)
- Janetta Top
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Sergio Arredondo-Alonso
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Anita C. Schürch
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Santeri Puranen
- Department of Computer Science, Aalto University, FI-00076 Espoo, Finland
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
| | - Maiju Pesonen
- Department of Computer Science, Aalto University, FI-00076 Espoo, Finland
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
- Present address: Oslo Centre for Biostatistics and Epidemiology (OCBE), Oslo University Hospital Research Support Services, Oslo, Norway
| | - Johan Pensar
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
- Present address: Department of Mathematics, University of Oslo, 0316 Oslo, Norway
| | - Rob J. L. Willems
- Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Jukka Corander
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland
- Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
- Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
|
6
|
Lees JA, Mai TT, Galardini M, Wheeler NE, Horsfield ST, Parkhill J, Corander J. Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions. mBio 2020; 11:e01344-20. [PMID: 32636251 PMCID: PMC7343994 DOI: 10.1128/mbio.01344-20] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/22/2020] [Accepted: 06/05/2020] [Indexed: 12/19/2022] Open
Abstract
Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially.
IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
Affiliation(s)
- John A Lees
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - T Tien Mai
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
| | - Marco Galardini
- Biological Design Center, Boston University, Boston, Massachusetts, USA
| | - Nicole E Wheeler
- Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Samuel T Horsfield
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - Julian Parkhill
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
| | - Jukka Corander
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
7
|
Cui Y, Yang C, Qiu H, Wang H, Yang R, Falush D. The landscape of coadaptation in Vibrio parahaemolyticus. eLife 2020; 9:54136. [PMID: 32195663 PMCID: PMC7101233 DOI: 10.7554/elife.54136] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/03/2019] [Accepted: 03/19/2020] [Indexed: 11/13/2022] Open
Abstract
Investigating fitness interactions in natural populations remains a considerable challenge. We take advantage of the unique population structure of Vibrio parahaemolyticus, a bacterial pathogen of humans and shrimp, to perform a genome-wide screen for coadapted genetic elements. We identified 90 interaction groups (IGs) involving 1,560 coding genes. 82 IGs are between accessory genes, many of which have functions related to carbohydrate transport and metabolism. Only 8 involve both core and accessory genomes. The largest includes 1,540 SNPs in 82 genes and 338 accessory genome elements, many involved in lateral flagella and cell wall biogenesis. The interactions have a complex hierarchical structure encoding at least four distinct ecological strategies. One strategy involves a divergent profile in multiple genome regions, while the others involve fewer genes and are more plastic. Our results imply that most genetic alliances are ephemeral but that increasingly complex strategies can evolve and eventually cause speciation.
Affiliation(s)
- Yujun Cui
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
| | - Chao Yang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Shenzhen Centre for Disease Control and Prevention, Shenzhen, China
| | - Hongling Qiu
- School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Institute for Nutritional Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Hui Wang
- School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Institute for Nutritional Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Ruifu Yang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
| | - Daniel Falush
- The Center for Microbes, Development and Health, Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China
|
8
|
Pensar J, Puranen S, Arnold B, MacAlasdair N, Kuronen J, Tonkin-Hill G, Pesonen M, Xu Y, Sipola A, Sánchez-Busó L, Lees JA, Chewapreecha C, Bentley SD, Harris SR, Parkhill J, Croucher NJ, Corander J. Genome-wide epistasis and co-selection study using mutual information. Nucleic Acids Res 2019; 47:e112. [PMID: 31361894 PMCID: PMC6765119 DOI: 10.1093/nar/gkz656] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/04/2019] [Revised: 07/09/2019] [Accepted: 07/19/2019] [Indexed: 01/19/2023] Open
Abstract
Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight to when this type of analysis is most likely to be successful. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.
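As a rough illustration of model-free pairwise dependency testing, the sketch below computes the plain plug-in mutual information between two alignment columns. It is an assumption-laden toy, not SpydrPick's implementation: the function name `mutual_information` is hypothetical, and the sketch applies no population-structure correction, no pseudocounts, and no filtering of indirect associations.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two alignment columns.

    `x` and `y` are equal-length sequences of allele symbols, one entry per
    genome. Empirical frequencies are used directly (no pseudocounts).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)            # marginal allele counts at locus x
    count_y = Counter(y)            # marginal allele counts at locus y
    count_xy = Counter(zip(x, y))   # joint allele-pair counts
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log(c * n / (count_x[a] * count_y[b]))
    return mi


# Perfectly co-varying columns reach log(2) nats; independent columns give 0.
print(mutual_information("AACC", "AACC"))
print(mutual_information("AACC", "ACAC"))
```

In a genome-wide scan such a score would be evaluated for every pair of polymorphic sites, which is why computational efficiency matters at pan-genome scale.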
Affiliation(s)
- Johan Pensar
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland
| | - Santeri Puranen
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland
- Department of Computer Science, Aalto University, Espoo, FI-00014, Finland
| | - Brian Arnold
- Division of Informatics, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
| | - Neil MacAlasdair
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - Juri Kuronen
- Department of Biostatistics, University of Oslo, Oslo, 0317, Norway
| | - Gerry Tonkin-Hill
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - Maiju Pesonen
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland
- Department of Computer Science, Aalto University, Espoo, FI-00014, Finland
| | - Yingying Xu
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland
- Department of Computer Science, Aalto University, Espoo, FI-00014, Finland
| | - Aleksi Sipola
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland
| | | | - John A Lees
- Department of Microbiology, New York University School of Medicine, New York, NY 10016, USA
| | - Claire Chewapreecha
- Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Bioinformatics & Systems Biology program, King Mongkut's University of Technology Thonburi, Bangkok 10150, Thailand
| | - Stephen D Bentley
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - Simon R Harris
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - Julian Parkhill
- Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, UK
| | - Nicholas J Croucher
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, W2 1PG, UK
| | - Jukka Corander
- Department of Mathematics and Statistics, Helsinki Institute for Information Technology (HIIT), Faculty of Science, University of Helsinki, FI-00014 Helsinki, Finland
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Department of Biostatistics, University of Oslo, Oslo, 0317, Norway
|
9
|
Dewé TCM, D'Aeth JC, Croucher NJ. Genomic epidemiology of penicillin-non-susceptible Streptococcus pneumoniae. Microb Genom 2019; 5. [PMID: 31609685 PMCID: PMC6861860 DOI: 10.1099/mgen.0.000305] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/21/2022] Open
Abstract
Penicillin-non-susceptible Streptococcus pneumoniae (PNSP) were first detected in the 1960s, and are now common worldwide, predominantly through the international spread of a limited number of strains. Extant PNSP are characterized by mosaic pbp2x, pbp2b and pbp1a genes generated by interspecies recombinations, with the extent of these alterations determining the range and concentrations of β-lactams to which the genotype is non-susceptible. The complexity of the genetics underlying these phenotypes has been the subject of both molecular microbiology and genome-wide association and epistasis analyses. Such studies can aid our understanding of PNSP evolution and help improve the already highly-performing bioinformatic methods capable of identifying PNSP from genomic surveillance data.
Affiliation(s)
- Tamsin C M Dewé
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, W2 1PG, UK
| | - Joshua C D'Aeth
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, W2 1PG, UK
| | - Nicholas J Croucher
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, St. Mary's Campus, Imperial College London, London, W2 1PG, UK
|
10
|
Gao CY, Cecconi F, Vulpiani A, Zhou HJ, Aurell E. DCA for genome-wide epistasis analysis: the statistical genetics perspective. Phys Biol 2019; 16:026002. [PMID: 30605896 DOI: 10.1088/1478-3975/aafbe0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Direct coupling analysis (DCA) is a now widely used method to leverage statistical information from many similar biological systems to draw meaningful conclusions on each system separately. DCA has been applied with great success to sequences of homologous proteins, and also more recently to whole-genome population-wide sequencing data. We here argue that the use of DCA on the genome scale is contingent on fundamental issues of population genetics. DCA can be expected to yield meaningful results when a population is in the quasi-linkage equilibrium (QLE) phase studied by Kimura and others, but not, for instance, in a phase of clonal competition. We discuss how the exponential (Potts model) distributions emerge in QLE, and compare couplings to correlations obtained in a study of about 3000 genomes of the human pathogen Streptococcus pneumoniae.
Affiliation(s)
- Chen-Yi Gao
- Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China. School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
|
11
|
Schubert B, Maddamsetti R, Nyman J, Farhat MR, Marks DS. Genome-wide discovery of epistatic loci affecting antibiotic resistance in Neisseria gonorrhoeae using evolutionary couplings. Nat Microbiol 2018; 4:328-338. [PMID: 30510172 DOI: 10.1038/s41564-018-0309-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 06/04/2018] [Accepted: 10/26/2018] [Indexed: 11/09/2022]
Abstract
Genome analysis should allow the discovery of interdependent loci that together cause antibiotic resistance. In practice, however, the vast number of possible epistatic interactions erodes statistical power. Here, we extend an approach that has been successfully used to identify epistatic residues in proteins to infer genomic loci that are strongly coupled. This approach reduces the number of tests required for an epistatic genome-wide association study of antibiotic resistance and increases the likelihood of identifying causal epistasis. We discovered 38 loci and 240 epistatic pairs that influence the minimum inhibitory concentrations of 5 different antibiotics in 1,102 isolates of Neisseria gonorrhoeae that were confirmed in a second dataset of 495 isolates. Many known resistance-affecting loci were recovered; however, the majority of associations occurred in unreported genes, such as murE. About half of the discovered epistasis involved at least one locus previously associated with antibiotic resistance, including interactions between gyrA and parC. Still, many combinations involved unreported loci and genes. While most variation in minimum inhibitory concentrations could be explained by identified loci, epistasis substantially increased explained phenotypic variance. Our work provides a systematic identification of epistasis affecting antibiotic resistance in N. gonorrhoeae and a generalizable approach for epistatic genome-wide association studies.
Affiliation(s)
- Benjamin Schubert
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Rohan Maddamsetti
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Biological Sciences, Old Dominion University, Norfolk, VA, USA
| | - Jackson Nyman
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Maha R Farhat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
|
12
|
Xu Y, Puranen S, Corander J, Kabashima Y. Inverse finite-size scaling for high-dimensional significance analysis. Phys Rev E 2018; 97:062112. [PMID: 30011500 DOI: 10.1103/physreve.97.062112] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/14/2017] [Indexed: 11/07/2022]
Abstract
We propose an efficient procedure for significance determination in high-dimensional dependence learning based on surrogate data testing, termed inverse finite-size scaling (IFSS). The IFSS method is based on our discovery of a universal scaling property of random matrices which enables inference about signal behavior from much smaller scale surrogate data than the dimensionality of the original data. As a motivating example, we demonstrate the procedure for ultra-high-dimensional Potts models with order of 10^{10} parameters. IFSS reduces the computational effort of the data-testing procedure by several orders of magnitude, making it very efficient for practical purposes. This approach thus holds considerable potential for generalization to other types of complex models.
Affiliation(s)
- Yingying Xu
- Department of Computer Science, School of Science, Aalto University, 00076 Espoo, Finland
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
| | - Santeri Puranen
- Department of Computer Science, School of Science, Aalto University, 00076 Espoo, Finland
- Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
- Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
| | - Jukka Corander
- Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
- Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
| | - Yoshiyuki Kabashima
- Department of Mathematical and Computing Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8552, Japan
|