1
Zhang Y, Liu M, Zhang J, Wu J, Hong L, Zhu L, Long J. Large-scale comparative analysis reveals phylogenomic preference of blaNDM-1 and blaKPC-2 transmission among Klebsiella pneumoniae. Int J Antimicrob Agents 2024; 64:107225. [PMID: 38810941 DOI: 10.1016/j.ijantimicag.2024.107225] [Received: 02/05/2024] [Revised: 04/23/2024] [Accepted: 05/20/2024] [Indexed: 05/31/2024]
Abstract
blaNDM-1 and blaKPC-2 are responsible for the global increase in carbapenem-resistant Klebsiella pneumoniae, posing a great challenge to public health. However, the impact of phylogenetic factors on the dissemination of blaNDM-1 and blaKPC-2 is not yet fully understood. This study established a global dataset of 4051 blaNDM-1+ and 10,223 blaKPC-2+ K. pneumoniae genomes, and compared their transmission modes on a global scale. The results showed that blaNDM-1+ K. pneumoniae genomes exhibited a broader geographical distribution and higher sequence type (ST) richness than blaKPC-2+ genomes, indicating higher transmissibility of the blaNDM-1 gene. Furthermore, blaNDM-1+ genomes displayed significant differences in ST lineage, antibiotic resistance gene composition, virulence gene composition and genetic environments compared with blaKPC-2+ genomes, suggesting distinct dissemination mechanisms. blaNDM-1+ genomes were predominantly associated with ST147 and ST16, whereas blaKPC-2+ genomes were mainly found in ST11 and ST258. Significantly different accessory genes were identified between blaNDM-1+ and blaKPC-2+ genomes. The preference for blaKPC-2 distribution across certain countries, ST lineages and genetic environments underscores vertical spread as the primary mechanism driving the expansion of blaKPC-2. In contrast, blaNDM-1+ genomes did not display such a strong preference, confirming that the dissemination of blaNDM-1 mainly depends on horizontal gene transfer. Overall, this study demonstrates different phylogenetic drivers for the dissemination of blaNDM-1 and blaKPC-2, providing new insights into their global transmission dynamics.
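Observed ST richness (the number of distinct STs) grows with the number of genomes sampled, and the two groups compared here differ in size (4051 versus 10,223 genomes). The Python sketch below shows one standard way to compare richness at a common sample size, Hurlbert-style rarefaction. The abstract does not say whether the authors rarefied, so treat this as an illustrative assumption; the ST lists are invented toy data.

```python
from collections import Counter
from math import comb


def observed_richness(sts):
    """Number of distinct sequence types (STs) in a list of per-genome ST labels."""
    return len(set(sts))


def rarefied_richness(sts, n):
    """Expected number of distinct STs in a random subsample of n genomes drawn
    without replacement (Hurlbert rarefaction):
        E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n))
    where N is the total number of genomes and N_i the count of ST i."""
    counts = Counter(sts)
    total = sum(counts.values())
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of genomes")
    denominator = comb(total, n)
    # Each term is the probability that ST i appears at least once in the subsample.
    return sum(1 - comb(total - c, n) / denominator for c in counts.values())


# Hypothetical toy data: one ST label per genome.
ndm_sts = ["ST147", "ST147", "ST16", "ST11", "ST15"]
kpc_sts = ["ST11", "ST11", "ST258", "ST258", "ST258", "ST512"]

print(observed_richness(ndm_sts), observed_richness(kpc_sts))
print(round(rarefied_richness(ndm_sts, 3), 3), round(rarefied_richness(kpc_sts, 3), 3))
```

Setting n to the size of the smaller group makes the two rarefied values directly comparable, which raw ST counts from groups of different sizes are not.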
Affiliation(s)
- Yali Zhang
- Department of Clinical Laboratory, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Mengyue Liu
- College of Public Health, Zhengzhou University, Zhengzhou, Henan, China
- Jiangfeng Zhang
- Department of Clinical Laboratory, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University and People's Hospital of Henan University, Zhengzhou, Henan, China
- Jie Wu
- College of Public Health, Zhengzhou University, Zhengzhou, Henan, China
- Lijuan Hong
- Department of Hospital-Acquired Infection Control, The First Affiliated Hospital of Hainan Medical University, Haikou, Hainan, China
- LiQiang Zhu
- Department of Clinical Laboratory, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Jinzhao Long
- College of Public Health, Zhengzhou University, Zhengzhou, Henan, China

2
Sajid S, Mashkoor M, Jørgensen MG, Christensen LP, Hansen PR, Franzyk H, Mirza O, Prabhala BK. The Y-ome Conundrum: Insights into Uncharacterized Genes and Approaches for Functional Annotation. Mol Cell Biochem 2024; 479:1957-1968. [PMID: 37610616 DOI: 10.1007/s11010-023-04827-8] [Received: 07/07/2023] [Accepted: 08/09/2023] [Indexed: 08/24/2023]
Abstract
The ever-increasing availability of genome sequencing data has revealed a substantial number of uncharacterized genes without known functions across various organisms. The first complete genome sequence of E. coli K-12 revealed that more than 50% of its open reading frames corresponded to transcripts with no known function. Protein-coding genes that lack a functional description and/or a recognized pathway, and whose names begin with the letter "Y", are collectively classified as the "y-ome". Several efforts have been made to elucidate the functions of these genes and to recognize their roles in biological processes. This review provides a brief update on various strategies employed to study the y-ome, such as high-throughput experimental approaches, comparative omics, metabolic engineering, gene expression analysis, and data integration techniques. Additionally, we highlight recent advances in functional annotation methods, including the use of machine learning, network analysis, and functional genomics approaches. Novel approaches are required to produce more precise functional annotations across the genome and to reduce the number of genes with unknown functions.
Affiliation(s)
- Salvia Sajid
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Maliha Mashkoor
- Department of Surgery, Center for Surgical Sciences, Zealand University Hospital, Lykkebækvej 1, 4600, Køge, Denmark
- Mikkel Girke Jørgensen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Lars Porskjær Christensen
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Paul Robert Hansen
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Henrik Franzyk
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Osman Mirza
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Bala Krishna Prabhala
- Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark

3
Gong Y, Li Y, Liu X, Ma Y, Jiang L. A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals? J Anim Sci Biotechnol 2023; 14:73. [PMID: 37143156 PMCID: PMC10161434 DOI: 10.1186/s40104-023-00860-1] [Received: 10/24/2022] [Accepted: 03/01/2023] [Indexed: 05/06/2023]
Abstract
As large-scale genomic studies have progressed, it has become clear that a single reference genome cannot represent genetic diversity at the species level. Domestic animals, in particular, tend to have complex routes of origin and migration, suggesting that some population-specific sequences may be missing from the current reference genomes. By contrast, the pangenome is the collection of all DNA sequences of a species: it contains the sequences shared by all individuals (core genome) and also captures the sequence information unique to each individual (variable genome). Progress in pangenome research in humans, plants and domestic animals has shown that pangenomic studies can recover missing genetic components and identify large structural variants (SVs). Many individual-specific sequences have been shown to be related to biological adaptability, phenotypes and important economic traits. Maturing technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graph genomes, and reference-free assembly will further promote pangenome research. In the future, pangenomes combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variation in domestic animals (sheep, cattle, pigs, chickens), its impact on phenotypes, and how this can contribute to the understanding of species diversity. Additionally, we discuss potential issues and future perspectives of pangenome research in livestock and poultry.
Affiliation(s)
- Ying Gong
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Yefang Li
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Xuexue Liu
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Centre d'Anthropobiologie et de Génomique de Toulouse, Université Paul Sabatier, 37 allées Jules Guesde, Toulouse, 31000, France
- Yuehui Ma
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Lin Jiang
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- National Germplasm Center of Domestic Animal Resources, Ministry of Technology, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China

4
Roh H. A genome-wide association study of the occurrence of genetic variations in Edwardsiella piscicida, Vibrio harveyi, and Streptococcus parauberis under stressed environments. J Fish Dis 2022; 45:1373-1388. [PMID: 35735095 PMCID: PMC9541752 DOI: 10.1111/jfd.13668] [Received: 04/13/2022] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
Bacterial mutation and genetic diversity in aquaculture have led to increasing phenotypic variance, which can weaken or invalidate disease-control strategies. However, few studies have monitored the degree of mutation in fish bacterial pathogens caused by environmental pressure within a short period. In this study, transcriptomic sequences from Edwardsiella piscicida, Vibrio harveyi and Streptococcus parauberis grown under stressed environments were used to investigate the emergence of variants. Specifically, a sub-inhibitory concentration of formalin and phenol for E. piscicida, sea water at 30°C for V. harveyi, and flounder serum for S. parauberis were used as stressed environments, and significant single-nucleotide polymorphisms (SNPs) and/or mutation sites were investigated through a genome-wide association study after culture in ordinary liquid media (control) and in the stressed environment. As a result, several SNPs or mutations arising during incubation were observed in E. piscicida and/or V. harveyi under the different environments, in genes related to flagella, fimbriae, type 3 secretion systems, and outer and inner membranes, structures that are directly exposed to the external environment. In particular, given that flagella and fimbriae are considered important factors in differentiating serotypes in some bacterial pathogens, it can be speculated that different environmental pressures are a source of phenotypic or serotypic differentiation from the same origin. On the other hand, S. parauberis did not exhibit notable changes over 4 h of incubation in serum from olive flounder. The results presented in this study provide examples of possible molecular evolution in pathogens relevant to the aquaculture industry in response to different environmental pressures.
Affiliation(s)
- HyeongJin Roh
- Pathogens and Disease Transfer, Institute of Marine Research, Bergen, Norway

5
Tantoso E, Eisenhaber B, Eisenhaber F. Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes. Methods Mol Biol 2022; 2449:299-324. [PMID: 35507269 DOI: 10.1007/978-1-0716-2095-3_13] [Indexed: 06/14/2023]
Abstract
The paradigm shift associated with the introduction of the pan-genome concept has moved attention away from single reference genomes toward the actual sequence diversity within organism populations, strain collections, clades, etc. A single genome is no longer sufficient to describe a bacterium of interest; instead, the genomic repertoire of all existing strains is the key to the metabolic, evolutionary, or pathogenic potential of a species. The classification of orthologous genes derived from a collection of taxonomically related genome sequences is central to bacterial pan-genome computational analysis. In this work, we review methods for computing pan-genome gene clusters, including their comparative analysis for the case of Streptococcus pyogenes strain genomes. We exhaustively scanned the parameter space of the homologue-searching procedures and found optimal parameters (sequence identity of 60% and coverage of 50-60% in the pairwise alignment) for the orthologous clustering of gene sequences. We find that the sequence identity threshold influences the number of gene families about three times more strongly than the sequence coverage threshold.
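To make the role of the two thresholds concrete, here is a minimal Python sketch that links two genes only when their pairwise alignment passes both an identity and a coverage cutoff, then takes connected components (single-linkage clustering via union-find). This is an illustrative assumption, not the procedure evaluated in the chapter: real pan-genome pipelines use their own clustering algorithms and coverage definitions. The defaults mirror the 60% identity and 50% coverage values quoted above; the gene names and alignment statistics are invented.

```python
from collections import defaultdict


def cluster_gene_families(genes, hits, min_identity=0.60, min_coverage=0.50):
    """Group genes into families by single linkage over thresholded alignment hits.

    genes: iterable of gene identifiers.
    hits: iterable of (gene_a, gene_b, identity, coverage) tuples, where identity
          and coverage are fractions (0-1) taken from a pairwise alignment.
    Returns a sorted list of families, each a sorted list of gene identifiers.
    """
    parent = {g: g for g in genes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # Only alignments passing BOTH thresholds link two genes.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    families = defaultdict(list)
    for g in parent:
        families[find(g)].append(g)
    return sorted(sorted(members) for members in families.values())


# Hypothetical genes from three strains and invented alignment statistics.
genes = ["s1_g1", "s2_g1", "s3_g1", "s3_g2", "s1_g2"]
hits = [
    ("s1_g1", "s2_g1", 0.85, 0.90),
    ("s2_g1", "s3_g1", 0.62, 0.55),
    ("s3_g1", "s3_g2", 0.58, 0.95),  # fails the 60% identity threshold
    ("s1_g2", "s1_g1", 0.70, 0.30),  # fails the 50% coverage threshold
]

for identity in (0.50, 0.60, 0.70):
    print(identity, len(cluster_gene_families(genes, hits, min_identity=identity)))
```

Raising the identity cutoff splits families apart, which is the direction of the effect the abstract describes; a toy example like this says nothing about its magnitude.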
Affiliation(s)
- Erwin Tantoso
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Genome Institute Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Frank Eisenhaber
- Genome Institute and Bioinformatics Institute, Singapore, Singapore

6
Guo J, Pang E, Song H, Lin K. A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes. BMC Bioinformatics 2021; 22:282. [PMID: 34044757 PMCID: PMC8161984 DOI: 10.1186/s12859-021-04149-w] [Received: 01/31/2021] [Accepted: 04/25/2021] [Indexed: 11/25/2022]
Abstract
Background: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have been produced, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating its information remains a substantial challenge. Results: We developed a new method, the colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that uses complete topological information and efficiently detects small indels (< 50 bp) among highly similar samples, as validated on simulated datasets. Moreover, we demonstrated that cSupB can adapt to complex cycle structures. Conclusions: Although the solution is made suitable for increasingly complex genome graphs by relaxing the directed-acyclic-graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, available at https://github.com/eggleader/cSupB. Supplementary information: The online version contains supplementary material available at 10.1186/s12859-021-04149-w.
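cSupB extends the notion of a superbubble to colored graphs, but the abstract does not spell out its algorithm. As rough orientation only, the Python sketch below shows (1) a simple per-entrance search for a plain, uncolored superbubble in a small acyclic graph and (2) how comparing each sample's (color's) path length through that bubble exposes a small indel. It is not the cSupB method or its tri-tuple coordinate system; the graph, node sequences and sample names are all hypothetical.

```python
from collections import defaultdict


def find_superbubble_exit(children, parents, s):
    """Return the exit vertex t of a superbubble entered at s, or None.

    Per-entrance search for plain (uncolored) superbubbles. A vertex is pushed
    only after all of its parents have been visited; the bubble closes when
    exactly one vertex is pending and it is the only seen-but-unvisited vertex.
    """
    stack = [s]
    visited, seen = set(), set()
    while stack:
        v = stack.pop()
        visited.add(v)
        seen.discard(v)
        if not children[v]:
            return None  # a tip: the search dead-ends before reconverging
        for u in children[v]:
            if u == s:
                return None  # a cycle back to the entrance
            seen.add(u)
            if all(p in visited for p in parents[u]):
                stack.append(u)
        if len(stack) == 1 and len(seen) == 1:
            t = stack.pop()
            return None if s in children[t] else t
    return None


def span_length(path, seqs, s, t):
    """Total sequence length a sample's path spells strictly between s and t."""
    i, j = path.index(s), path.index(t)
    return sum(len(seqs[node]) for node in path[i + 1:j])


# Hypothetical graph: sample_B skips node "ins", i.e. carries a 3-bp deletion.
seqs = {"s": "GATTACA", "ins": "ACG", "t": "TTGCA"}
edges = [("s", "ins"), ("ins", "t"), ("s", "t")]
paths = {"sample_A": ["s", "ins", "t"], "sample_B": ["s", "t"]}

children, parents = defaultdict(list), defaultdict(list)
for a, b in edges:
    children[a].append(b)
    parents[b].append(a)

t = find_superbubble_exit(children, parents, "s")
lengths = {name: span_length(p, seqs, "s", t) for name, p in paths.items()}
indel = max(lengths.values()) - min(lengths.values())
print(t, lengths, "small indel" if 0 < indel < 50 else "no small indel")
```

The 50-bp cutoff in the last line only echoes the "< 50 bp" range quoted in the abstract; the bubble search itself assumes a directed acyclic region and gives up on tips and on cycles through the entrance.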
Affiliation(s)
- Jindan Guo
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Erli Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Hongtao Song
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Kui Lin
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China

7
Zhong C, Chen C, Wang L, Ning K. Integrating pan-genome with metagenome for microbial community profiling. Comput Struct Biotechnol J 2021; 19:1458-1466. [PMID: 33841754 PMCID: PMC8010324 DOI: 10.1016/j.csbj.2021.02.021] [Received: 09/02/2020] [Revised: 02/24/2021] [Accepted: 02/27/2021] [Indexed: 02/07/2023]
Abstract
Advances in sequencing technology have led to the increased availability of genomes and metagenomes, which has greatly facilitated microbial pan-genome and metagenome analysis in the community. In line with this trend, studies on microbial genomes and phenotypes have gradually shifted from individuals to environmental communities. Pan-genomics and metagenomics are powerful strategies for in-depth profiling of microbial communities. Pan-genomics focuses on genetic diversity, dynamics, and phylogeny at the multi-genome level, while metagenomics profiles the distribution and function of culture-free microbial communities in specific environments. Combining pan-genome and metagenome analysis can reveal complex microbial connections, from an individual complete genome to a mixture of genomes, thereby extending the traditional catalog of individual genomic profiles to community microbial profiles. Therefore, the combination of pan-genome and metagenome approaches has become a promising way to track the sources of various microbes and to decipher population-level evolution and ecosystem functions. This review summarizes the pan-genome and metagenome approaches, the strategies that combine them, and applications of these combined strategies in studies of microbial dynamics, evolution, and function in communities. We discuss emerging strategies for the study of microbial communities that integrate information from both pan-genomes and metagenomes, and we emphasize studies in which integrating pan-genome with metagenome approaches improved the understanding of microbial community profiles, both structural and functional. Finally, we outline future perspectives on microbial community profiling: more advanced analytical techniques, including big-data-based artificial intelligence, will lead to an even better understanding of the patterns of microbial communities.
Affiliation(s)
- Chaofang Zhong
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
- Chaoyun Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Lusheng Wang
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
- City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
- Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China

8
Holley G, Melsted P. Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs. Genome Biol 2020; 21:249. [PMID: 32943081 PMCID: PMC7499882 DOI: 10.1186/s13059-020-02135-8] [Received: 08/13/2019] [Accepted: 08/06/2020] [Indexed: 02/07/2023]
Abstract
Memory consumption of de Bruijn graphs is often prohibitive. Most de Bruijn graph-based assemblers reduce the complexity by compacting paths into single vertices, but this is challenging as it requires the uncompacted de Bruijn graph to be available in memory. We present a parallel and memory-efficient algorithm enabling the direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted graph. Bifrost features a broad range of functions, such as indexing, editing, and querying the graph, and includes a graph coloring method that maps each k-mer of the graph to the genomes it occurs in. Availability: https://github.com/pmelsted/bifrost.
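The coloring described above maps each k-mer to the genomes it occurs in. The Python sketch below shows only that mapping, as a plain dictionary from canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, a common convention assumed here) to sets of genome names. It deliberately ignores what makes Bifrost efficient: it materializes every k-mer in memory, builds no compacted graph, and does not reflect Bifrost's data structures or API. The genomes are toy examples.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_colored_kmer_index(genomes, k):
    """Map each canonical k-mer to the set of genome names (colors) it occurs in.

    genomes: dict of genome name -> DNA sequence.
    """
    index = defaultdict(set)
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                index[canonical(kmer)].add(name)
    return dict(index)


# Hypothetical toy genomes.
index = build_colored_kmer_index({"g1": "ACGTAC", "g2": "ACGTTT"}, k=3)
for kmer, colors in sorted(index.items()):
    print(kmer, sorted(colors))
```

In this toy form, querying a sequence amounts to looking up its canonical k-mers and intersecting or counting their color sets.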
Affiliation(s)
- Guillaume Holley
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland
- Páll Melsted
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland

9
Bonnici V, Maresi E, Giugno R. Challenges in gene-oriented approaches for pangenome content discovery. Brief Bioinform 2020; 22:5901976. [PMID: 32893299 DOI: 10.1093/bib/bbaa198] [Received: 03/28/2020] [Revised: 05/14/2020] [Accepted: 08/04/2020] [Indexed: 01/17/2023]
Abstract
Given a group of genomes, each represented as the set of genes that belong to it, the discovery of pangenomic content is based on searching for genetic homology among the genes in order to cluster them into families. Pangenomic analyses then investigate the membership of these families in the given genomes. This approach is referred to as the gene-oriented approach, in contrast to other definitions of the problem that take different genomic features into account. In recent years, several tools have been developed to discover and analyse pangenomic content. Because the problem is hard, each tool applies a different strategy for discovering the pangenomic content, and as a result each tool's performance depends on the composition of the input genomes. This review reports the main analysis instruments provided by current state-of-the-art tools for the discovery of pangenomic content. Moreover, unlike previous works, the present study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform other tools. The analysis is performed on different bacterial populations that are synthetically generated by varying evolutionary parameters. The benchmarks used to compare the pangenomic tools, together with the computational pipeline developed for this purpose, are available at https://github.com/InfOmics/pangenes-review. Contact: V. Bonnici, R. Giugno. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
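In the gene-oriented formulation, once homology search has grouped genes into families, family membership reduces to a family-by-genome presence/absence matrix. The Python sketch below assumes the families have already been computed; the core/accessory/unique labels are a common coarse convention used here for illustration rather than the review's own definitions, and all identifiers are hypothetical.

```python
def family_membership(families, gene_to_genome, genomes):
    """Presence/absence row and coarse class for each gene family.

    families: dict of family id -> list of member gene ids.
    gene_to_genome: dict of gene id -> genome name (every genome is assumed
                    to appear in `genomes`).
    genomes: ordered list of genome names (the matrix columns).
    """
    table = {}
    for family_id, members in families.items():
        present = {gene_to_genome[g] for g in members}
        row = [int(genome in present) for genome in genomes]
        n_present = sum(row)
        if n_present == len(genomes):
            label = "core"
        elif n_present == 1:
            label = "unique"
        else:
            label = "accessory"
        table[family_id] = (row, label)
    return table


# Hypothetical families produced by a homology-clustering step.
genomes = ["G1", "G2", "G3"]
gene_to_genome = {"a1": "G1", "a2": "G2", "a3": "G3", "b1": "G1", "b2": "G2", "c3": "G3"}
families = {"F1": ["a1", "a2", "a3"], "F2": ["b1", "b2"], "F3": ["c3"]}

for family_id, (row, label) in family_membership(families, gene_to_genome, genomes).items():
    print(family_id, row, label)
```

Because every downstream label is derived from the clustering, two tools that cluster differently will report different membership matrices for the same input genomes.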
Affiliation(s)
- Emiliano Maresi
- The Microsoft Research, University of Trento Centre for Computational and Systems Biology
- Rosalba Giugno
- Computer Science and Bioinformatics, referent of the Master Degree in Medical Bioinformatics

10
Anani H, Zgheib R, Hasni I, Raoult D, Fournier PE. Interest of bacterial pangenome analyses in clinical microbiology. Microb Pathog 2020; 149:104275. [PMID: 32562810 DOI: 10.1016/j.micpath.2020.104275] [Received: 04/01/2020] [Revised: 05/22/2020] [Accepted: 05/25/2020] [Indexed: 12/12/2022]
Abstract
Thanks to the progress and decreasing costs of genome sequencing technologies, more than 250,000 bacterial genomes are currently available in public databases, covering most, if not all, of the major human-associated phylogenetic groups of these microorganisms, pathogenic or not. In addition, for many of them, sequences from several strains of a given species are available, making it possible to evaluate their genetic diversity and study their evolution. Moreover, the significant cost reduction of bacterial whole-genome sequencing, together with the rapid increase in the number of available bacterial genomes, has prompted the development of pangenomic software tools. The study of bacterial pangenomes has many applications in clinical microbiology. It can unveil the pathogenic potential of bacteria and their ability to resist antimicrobials, as well as identify specific sequences and predict antigenic epitopes that allow molecular or serologic assays and vaccines to be designed. Bacterial pangenome analysis constitutes a powerful method for understanding the history of human-associated bacteria and relating these findings to diagnosis in clinical microbiology laboratories in order to optimize patient management.
Affiliation(s)
- Hussein Anani
- Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Rita Zgheib
- Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Issam Hasni
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), UMR Microbes Evolution Phylogeny and Infections (MEPHI), Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France
- Didier Raoult
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Aix-Marseille Université, Institut de Recherche pour le Développement (IRD), UMR Microbes Evolution Phylogeny and Infections (MEPHI), Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France
- Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Pierre-Edouard Fournier
- Aix Marseille Univ, Institut de Recherche pour le Développement (IRD), Service de Santé des Armées, AP-HM, UMR Vecteurs Infections Tropicales et Méditerranéennes (VITROME), Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France

11
Gautreau G, Bazin A, Gachet M, Planel R, Burlot L, Dubois M, Perrin A, Médigue C, Calteau A, Cruveiller S, Matias C, Ambroise C, Rocha EPC, Vallenet D. PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph. PLoS Comput Biol 2020; 16:e1007732. [PMID: 32191703 PMCID: PMC7108747 DOI: 10.1371/journal.pcbi.1007732] [Received: 11/19/2019] [Revised: 03/31/2020] [Accepted: 02/12/2020] [Indexed: 12/21/2022]
Abstract
The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes, and they do not account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on a multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element to understand genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive adaptation of species, and its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful to depict the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.
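PPanGGOLiN couples an EM-fitted multivariate Bernoulli mixture with a Markov Random Field over the graph. The Python sketch below keeps only the mixture part: each gene family is a 0/1 presence vector across genomes, and EM fits three classes initialized as high-, intermediate- and low-frequency, read here as persistent, shell and cloud. It omits the MRF neighborhood term, fixes the number of classes, uses a deterministic initialization, and is not PPanGGOLiN's implementation; the matrix is invented.

```python
import math


def bernoulli_mixture_em(matrix, init_probs=(0.9, 0.5, 0.1), n_iter=50, eps=1e-6):
    """Assign each row of a 0/1 presence/absence matrix (one row per gene family,
    one column per genome) to a class of a multivariate Bernoulli mixture fitted by EM.

    init_probs sets each class's initial presence probability in every genome;
    with the default, class 0 starts high-frequency (persistent-like), class 1
    intermediate (shell-like) and class 2 low-frequency (cloud-like).
    """
    k, n_genomes = len(init_probs), len(matrix[0])
    weights = [1.0 / k] * k
    mu = [[p] * n_genomes for p in init_probs]  # mu[c][g] = P(present in genome g | class c)
    for _ in range(n_iter):
        # E-step: posterior class responsibilities, computed in log space for stability.
        resp = []
        for row in matrix:
            logs = []
            for c in range(k):
                ll = math.log(weights[c])
                for g, x in enumerate(row):
                    p = min(max(mu[c][g], eps), 1 - eps)
                    ll += math.log(p) if x else math.log(1 - p)
                logs.append(ll)
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: update mixing weights and per-genome presence probabilities.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = max(nc / len(matrix), eps)
            if nc > 0:
                mu[c] = [sum(r[c] * row[g] for r, row in zip(resp, matrix)) / nc
                         for g in range(n_genomes)]
    return [max(range(k), key=lambda c: r[c]) for r in resp]


# Hypothetical presence/absence of 7 gene families across 6 genomes.
matrix = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
names = ["persistent", "shell", "cloud"]
print([names[c] for c in bernoulli_mixture_em(matrix)])
```

On real data the shell can split into several partitions and the graph term matters; this toy only shows how frequency-like classes can emerge from a fitted mixture model instead of fixed frequency cutoffs.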
Affiliation(s)
- Guillaume Gautreau
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Adelme Bazin
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Mathieu Gachet
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Rémi Planel
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Laura Burlot
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Mathieu Dubois
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Amandine Perrin
  - Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
  - Sorbonne Université, Collège doctoral, Paris, France
- Claudine Médigue
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Alexandra Calteau
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Stéphane Cruveiller
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
- Catherine Matias
  - Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université, Université de Paris, Centre National de la Recherche Scientifique, Paris, France
- Christophe Ambroise
  - Laboratoire de Mathématiques et Modélisation d’Evry, UMR CNRS 8071, Université d’Evry Val d’Essonne, Evry, France
- Eduardo P. C. Rocha
  - Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
- David Vallenet
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
12
Chen X, Zhang Y, Zhang Z, Zhao Y, Sun C, Yang M, Wang J, Liu Q, Zhang B, Chen M, Yu J, Wu J, Jin Z, Xiao J. PGAweb: A Web Server for Bacterial Pan-Genome Analysis. Front Microbiol 2018; 9:1910. [PMID: 30186253 PMCID: PMC6110895 DOI: 10.3389/fmicb.2018.01910] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/26/2018] [Accepted: 07/30/2018] [Indexed: 01/22/2023]
Abstract
An astronomical increase in microbial genome data in recent years has led to strong demand for bioinformatic tools for pan-genome analysis within and across species. Here, we present PGAweb, a user-friendly, web-based tool for bacterial pan-genome analysis, which is composed of two main pan-genome analysis modules, PGAP and PGAP-X. PGAweb provides key interactive and customizable functions that include orthologous clustering, pan-genome profiling, sequence variation and evolution analysis, and functional classification. PGAweb presents features of genomic structural dynamics and sequence diversity with different visualization methods that are helpful for intuitively understanding the dynamics and evolution of bacterial genomes. PGAweb has an intuitive interface with one-click setting of parameters and is freely available at http://PGAweb.vlcc.cn/.
Affiliation(s)
- Xinyu Chen
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
- Yadong Zhang
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Zhewen Zhang
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Yongbing Zhao
  - Lymphocyte Nuclear Biology, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, United States
- Chen Sun
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, United States
- Ming Yang
  - Office of General Affairs, Chinese Academy of Sciences, Beijing, China
- Jinyue Wang
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qian Liu
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  - Center of Scientific Computing Applications and Research, Chinese Academy of Sciences, Beijing, China
- Baohua Zhang
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  - Center of Scientific Computing Applications and Research, Chinese Academy of Sciences, Beijing, China
- Meili Chen
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Jun Yu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jiayan Wu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Zhong Jin
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  - Center of Scientific Computing Applications and Research, Chinese Academy of Sciences, Beijing, China
- Jingfa Xiao
  - BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China