1
Nuhamunada M, Mohite OS, Phaneuf P, Palsson B, Weber T. BGCFlow: systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets. Nucleic Acids Res 2024; 52:5478-5495. [PMID: 38686794 PMCID: PMC11162802 DOI: 10.1093/nar/gkae314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 03/22/2024] [Accepted: 04/11/2024] [Indexed: 05/02/2024] Open
Abstract
Genome mining is revolutionizing natural products discovery efforts. The rapid increase in available genomes demands comprehensive computational platforms to effectively extract biosynthetic knowledge encoded across bacterial pangenomes. Here, we present BGCFlow, a novel systematic workflow integrating analytics for large-scale genome mining of bacterial pangenomes. BGCFlow incorporates several genome analytics and mining tools grouped into five common stages of analysis: (i) data selection, (ii) functional annotation, (iii) phylogenetic analysis, (iv) genome mining, and (v) comparative analysis. Furthermore, BGCFlow provides easy configuration of different projects, parallel distribution, scheduled job monitoring, an interactive database to visualize tables, exploratory Jupyter Notebooks, and customized reports. Here, we demonstrate the application of BGCFlow by investigating the phylogenetic distribution of various biosynthetic gene clusters detected across 42 genomes of the Saccharopolyspora genus, known to produce industrially important secondary/specialized metabolites. The BGCFlow-guided analysis predicted more accurate dereplication of BGCs and guided the targeted comparative analysis of selected RiPPs. The scalable, interoperable, adaptable, re-entrant, and reproducible nature of BGCFlow will provide an effective novel way to extract biosynthetic knowledge from the ever-growing genomic datasets of biotechnologically relevant bacterial species.
Affiliation(s)
- Matin Nuhamunada
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Omkar S Mohite
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Patrick V Phaneuf
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Bernhard O Palsson
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Tilmann Weber
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
2
Wang T, Shi Y, Zheng M, Zheng J. Comparative Genomics Unveils Functional Diversity, Pangenome Openness, and Underlying Biological Drivers among Bacillus subtilis Group. Microorganisms 2024; 12:986. [PMID: 38792815 PMCID: PMC11124052 DOI: 10.3390/microorganisms12050986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 05/04/2024] [Accepted: 05/10/2024] [Indexed: 05/26/2024] Open
Abstract
The Bacillus subtilis group (Bs group), with Bacillus subtilis as its core species, holds significant research and economic value in various fields, including science, industrial production, food, and pharmaceuticals. However, most studies have been confined to comparative genomics analyses and exploration within individual genomes at the level of species, with few conducted within groups across different species. This study focused on Bacillus subtilis, the model of Gram-positive bacteria, and 14 other species with significant research value, employing comparative pangenomics as well as population enrichment analysis to ascertain the functional enrichment and diversity. Through the quantification of pangenome openness, this work revealed the underlying biological drivers and significant correlation between pangenome openness and various factors, including the distribution of toxin-antitoxin- and integrase-related genes, as well as the number of endonucleases, recombinases, repair system-related genes, prophages, integrases, and transfer mobile elements. Furthermore, the functional enrichment results indicated the potential for secondary metabolite, probiotic, and antibiotic exploration in Bacillus licheniformis, Bacillus paralicheniformis, and Bacillus spizizenii, respectively. In general, this work systematically exposed the quantification of pangenome openness, biological drivers, the pivotal role of genomic instability factors, and mobile elements, providing targeted exploration guidance for the Bs group.
Affiliation(s)
- Taiquan Wang
  - National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yiling Shi
  - National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Mengzhuo Zheng
  - National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinshui Zheng
  - National Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
3
Rahman MS, Shimul MEK, Parvez MAK. Comprehensive analysis of genomic variation, pan-genome and biosynthetic potential of Corynebacterium glutamicum strains. PLoS One 2024; 19:e0299588. [PMID: 38718091 PMCID: PMC11078359 DOI: 10.1371/journal.pone.0299588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 02/13/2024] [Indexed: 05/12/2024] Open
Abstract
Corynebacterium glutamicum is a non-pathogenic species of the Corynebacteriaceae family. It has been broadly used in industrial biotechnology for the production of valuable products. Though it is widely accepted at the industrial level, knowledge about the genomic diversity of the strains is limited. Here, we investigated the comparative genomic features of the strains and pan-genomic characteristics. We also observed phylogenetic relationships among the strains based on average nucleotide identity (ANI). We found diversity between strains at the genomic and pan-genomic levels. Less than one-third of the C. glutamicum pan-genome consists of core genes and soft-core genes, whereas a large number of strain-specific genes cover about half of the total pan-genome. In addition, the C. glutamicum pan-genome is open and expanding, which indicates the possible addition of new gene families to the pan-genome. We also investigated the distribution of biosynthetic gene clusters (BGCs) among the strains. We discovered slight variations of BGCs at the strain level. Several BGCs with the potential to express novel bioactive secondary metabolites have been identified. Therefore, by utilizing the characteristic advantages of C. glutamicum, different strains can be potential candidates for natural drug discovery.
Affiliation(s)
- Md. Shahedur Rahman
  - Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh
  - Department of Genetic Engineering and Biotechnology, Bioinformatics and Microbial Biotechnology Laboratory, Jashore University of Science and Technology, Jashore, Bangladesh
- Md. Ebrahim Khalil Shimul
  - Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh
  - Department of Genetic Engineering and Biotechnology, Bioinformatics and Microbial Biotechnology Laboratory, Jashore University of Science and Technology, Jashore, Bangladesh
4
Abondio P, Bruno F, Passarino G, Montesanto A, Luiselli D. Pangenomics: A new era in the field of neurodegenerative diseases. Ageing Res Rev 2024; 94:102180. [PMID: 38163518 DOI: 10.1016/j.arr.2023.102180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
A pangenome is composed of all the genetic variability of a group of individuals, and its application to the study of neurodegenerative diseases may provide valuable insights into the underlying aspects of genetic heterogeneity for these complex ailments, including gene expression, epigenetics, and translation mechanisms. Furthermore, a reference pangenome allows for the identification of previously undetected structural commonalities and differences among individuals, which may help in the diagnosis of a disease, support the prediction of what will happen over time (prognosis) and aid in developing novel treatments in the perspective of personalized medicine. Therefore, in the present review, the application of the pangenome concept to the study of neurodegenerative diseases will be discussed and analyzed for its potential to enable an improvement in diagnosis and prognosis for these illnesses, leading to the development of tailored treatments for individual patients from the knowledge of the genomic composition of a whole population.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy.
- Francesco Bruno
  - Academy of Cognitive Behavioral Sciences of Calabria (ASCoC), Lamezia Terme, Italy; Regional Neurogenetic Centre (CRN), Department of Primary Care, Azienda Sanitaria Provinciale Di Catanzaro, Viale A. Perugini, 88046 Lamezia Terme, CZ, Italy; Association for Neurogenetic Research (ARN), Lamezia Terme, CZ, Italy
- Giuseppe Passarino
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Alberto Montesanto
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
5
Garcia JF, Morales-Cruz A, Cochetel N, Minio A, Figueroa-Balderas R, Rolshausen PE, Baumgartner K, Cantu D. Comparative Pangenomic Insights into the Distinct Evolution of Virulence Factors Among Grapevine Trunk Pathogens. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2024; 37:127-142. [PMID: 37934016 DOI: 10.1094/mpmi-09-23-0129-r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
The permanent organs of grapevines (Vitis vinifera L.), like those of other woody perennials, are colonized by various unrelated pathogenic ascomycete fungi secreting cell wall-degrading enzymes and phytotoxic secondary metabolites that contribute to host damage and disease symptoms. Trunk pathogens differ in the symptoms they induce and the extent and speed of damage. Isolates of the same species often display a wide virulence range, even within the same vineyard. This study focuses on Eutypa lata, Neofusicoccum parvum, and Phaeoacremonium minimum, causal agents of Eutypa dieback, Botryosphaeria dieback, and Esca, respectively. We sequenced 50 isolates from viticulture regions worldwide and built nucleotide-level, reference-free pangenomes for each species. Through examination of genomic diversity and pangenome structure, we analyzed intraspecific conservation and variability of putative virulence factors, focusing on functions under positive selection and recent gene family dynamics of contraction and expansion. Our findings reveal contrasting distributions of putative virulence factors in the core, dispensable, and private genomes of each pangenome. For example, carbohydrate active enzymes (CAZymes) were prevalent in the core genomes of each pangenome, whereas biosynthetic gene clusters were prevalent in the dispensable genomes of E. lata and P. minimum. The dispensable fractions were also enriched in Gypsy transposable elements and virulence factors under positive selection (polyketide synthase genes in E. lata and P. minimum, glycosyltransferases in N. parvum). Our findings underscore the complexity of the genomic architecture in each species and provide insights into their adaptive strategies, enhancing our understanding of the underlying mechanisms of virulence. [Formula: see text] Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Jadran F Garcia
  - Department of Viticulture and Enology, University of California, Davis, Davis, CA, U.S.A
- Abraham Morales-Cruz
  - Department of Viticulture and Enology, University of California, Davis, Davis, CA, U.S.A
  - U.S. Department of Energy, Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, U.S.A
- Noé Cochetel
  - Department of Viticulture and Enology, University of California, Davis, Davis, CA, U.S.A
- Andrea Minio
  - Department of Viticulture and Enology, University of California, Davis, Davis, CA, U.S.A
- Rosa Figueroa-Balderas
  - Department of Viticulture and Enology, University of California, Davis, Davis, CA, U.S.A
- Philippe E Rolshausen
  - Department of Botany and Plant Sciences, University of California, Riverside, Riverside, CA, U.S.A
- Kendra Baumgartner
  - Crops Pathology and Genetics Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Davis, CA, U.S.A
- Dario Cantu
  - Department of Viticulture and Enology, University of California, Davis, Davis, CA, U.S.A
  - Genome Center, University of California, Davis, Davis, CA, U.S.A
6
Schulz T, Parmigiani L, Rempel A, Stoye J. Methods for Pangenomic Core Detection. Methods Mol Biol 2024; 2802:73-106. [PMID: 38819557 DOI: 10.1007/978-1-0716-3838-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Computational pangenomics deals with the joint analysis of all genomic sequences of a species. It has already been successfully applied to various tasks in many research areas. Further advances in DNA sequencing technologies constantly let more and more genomic sequences become available for many species, leading to an increasing attractiveness of pangenomic studies. At the same time, larger datasets also pose new challenges for data structures and algorithms that are needed to handle the data. Efficient methods oftentimes make use of the concept of k-mers. Core detection is a common way of analyzing a pangenome. The pangenome's core is defined as the subset of genomic information shared among all individual members. Classically, it is not only determined on the abstract level of genes but can also be described on the sequence level. In this chapter, we provide an overview of k-mer-based methods in the context of pangenomics studies. We first revisit existing software solutions for k-mer counting and k-mer set representation. Afterward, we describe the usage of two k-mer-based approaches, Pangrowth and Corer, for pangenomic core detection.
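The sequence-level notion of a core described in this abstract can be illustrated with a minimal Python sketch. It assumes a strict rule (a k-mer is core only if it occurs in every genome), ignores reverse complements, and uses invented toy sequences. The function names `kmers` and `core_kmers` are hypothetical, and this is not the method implemented by Pangrowth or Corer.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def core_kmers(genomes, k):
    """Return k-mers present in every genome (strict core, an assumed rule)."""
    kmer_sets = [kmers(genome, k) for genome in genomes]
    if not kmer_sets:
        return set()
    return set.intersection(*kmer_sets)


# Toy sequences invented for illustration only.
toy_genomes = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
shared = core_kmers(toy_genomes, k=4)
```

Real tools typically relax the strict intersection (for example, by tolerating absence from a small fraction of genomes) and use compact k-mer set representations rather than plain Python sets.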
Affiliation(s)
- Tizian Schulz
  - Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Luca Parmigiani
  - Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Andreas Rempel
  - Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Jens Stoye
  - Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
7
Hyun JC, Monk JM, Szubin R, Hefner Y, Palsson BO. Global pathogenomic analysis identifies known and candidate genetic antimicrobial resistance determinants in twelve species. Nat Commun 2023; 14:7690. [PMID: 38001096 PMCID: PMC10673929 DOI: 10.1038/s41467-023-43549-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Open
Abstract
Surveillance programs for managing antimicrobial resistance (AMR) have yielded thousands of genomes suited for data-driven mechanism discovery. We present a workflow integrating pangenomics, gene annotation, and machine learning to identify AMR genes at scale. When applied to 12 species, 27,155 genomes, and 69 drugs, we 1) find AMR gene transfer mostly confined within related species, with 925 genes in multiple species but just eight in multiple phylogenetic classes, 2) demonstrate that discovery-oriented support vector machines outperform contemporary methods at recovering known AMR genes, recovering 263 genes compared to 145 by Pyseer, and 3) identify 142 AMR gene candidates. Validation of two candidates in E. coli BW25113 reveals cases of conditional resistance: ΔcycA confers ciprofloxacin resistance in minimal media with D-serine, and frdD V111D confers ampicillin resistance in the presence of ampC by modifying the overlapping promoter. We expect this approach to be adaptable to other species and phenotypes.
Affiliation(s)
- Jason C Hyun
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Jonathan M Monk
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Richard Szubin
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Ying Hefner
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Bernhard O Palsson
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA.
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
  - Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA.
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800, Kongens, Lyngby, Denmark.
8
Rajput A, Chauhan SM, Mohite OS, Hyun JC, Ardalani O, Jahn LJ, Sommer MO, Palsson BO. Pangenome analysis reveals the genetic basis for taxonomic classification of the Lactobacillaceae family. Food Microbiol 2023; 115:104334. [PMID: 37567624 DOI: 10.1016/j.fm.2023.104334] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/14/2023] [Revised: 06/29/2023] [Accepted: 07/05/2023] [Indexed: 08/13/2023]
Abstract
Lactobacillaceae represent a large family of important microbes that are foundational to the food industry. Many genome sequences of Lactobacillaceae strains are now available, enabling us to conduct a comprehensive pangenome analysis of this family. We collected 3591 high-quality genomes from public sources and found that: 1) they contained enough genomes for 26 species to perform a pangenomic analysis, 2) the normalized Heap's coefficient λ (a measure of pangenome openness) was found to have an average value of 0.27 (ranging from 0.07 to 0.37), 3) the pangenome openness was correlated with the abundance and genomic location of transposons and mobilomes, 4) the pangenome for each species was divided into core, accessory, and rare genomes, that highlight the species-specific properties (such as motility and restriction-modification systems), 5) the pangenome of Lactiplantibacillus plantarum (which contained the highest number of genomes found amongst the 26 species studied) contained nine distinct phylogroups, and 6) genome mining revealed a richness of detected biosynthetic gene clusters, with functions ranging from antimicrobial and probiotic to food preservation, but ∼93% were of unknown function. This study provides the first in-depth comparative pangenomics analysis of the Lactobacillaceae family.
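Pangenome openness of the kind mentioned in this abstract is commonly estimated by fitting a power law (Heaps' law) to the number of gene families observed as genomes are added. The sketch below assumes the form P(n) = kappa * n^gamma and fits it by ordinary least squares in log-log space. It does not reproduce the normalized coefficient λ reported in the abstract, whose exact definition is not given here. The function name `fit_heaps_law` and the synthetic data are hypothetical.

```python
import math


def fit_heaps_law(pangenome_sizes):
    """Fit P(n) = kappa * n**gamma by least squares on log-transformed data.

    pangenome_sizes[i] is the gene-family count after adding i + 1 genomes.
    Returns (kappa, gamma); under this assumed form, a larger gamma means
    the pangenome keeps growing faster as genomes are added.
    """
    if len(pangenome_sizes) < 2:
        raise ValueError("need at least two points to fit a line")
    xs = [math.log(n) for n in range(1, len(pangenome_sizes) + 1)]
    ys = [math.log(size) for size in pangenome_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic accumulation curve generated from known parameters (not real data).
synthetic_sizes = [2000 * n ** 0.3 for n in range(1, 21)]
kappa, gamma = fit_heaps_law(synthetic_sizes)
```

In practice the accumulation curve depends on the order in which genomes are added, so published analyses usually average over many random permutations before fitting.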
Affiliation(s)
- Akanksha Rajput
  - Department of Bioengineering, University of California, San Diego, La Jolla, USA
- Siddharth M Chauhan
  - Department of Bioengineering, University of California, San Diego, La Jolla, USA
- Omkar S Mohite
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kongens, Lyngby, Denmark
- Jason C Hyun
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, USA
- Omid Ardalani
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kongens, Lyngby, Denmark
- Leonie J Jahn
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kongens, Lyngby, Denmark
- Morten Oa Sommer
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kongens, Lyngby, Denmark
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, CA 92093, USA; Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kongens, Lyngby, Denmark.
9
Nageeb WM, Hetta HF. Pangenome analysis of Corynebacterium striatum: insights into a neglected multidrug-resistant pathogen. BMC Microbiol 2023; 23:252. [PMID: 37684624 PMCID: PMC10486106 DOI: 10.1186/s12866-023-02996-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023] Open
Abstract
BACKGROUND Over the past two decades, Corynebacterium striatum has been increasingly isolated from clinical cultures with most isolates showing increased antimicrobial resistance (AMR) to last resort agents. Advances in the field of pan genomics would facilitate the understanding of the clinical significance of such bacterial species previously thought to be among commensals paving the way for identifying new drug targets and control strategies. METHODS We constructed a pan-genome using 310 genome sequences of C. striatum. Pan-genome analysis was performed using three tools including Roary, PIRATE, and PEPPAN. AMR genes and virulence factors have been studied in relation to core genome phylogeny. Genomic Islands (GIs), Integrons, and Prophage regions have been explored in detail. RESULTS The pan-genome ranges between a total of 5253-5857 genes with 2070 - 1899 core gene clusters. Some antimicrobial resistance genes have been identified in the core genome portion, but most of them were located in the dispensable genome. In addition, some well-known virulence factors described in pathogenic Corynebacterium species were located in the dispensable genome. A total of 115 phage species have been identified with only 44 intact prophage regions. CONCLUSION This study presents a detailed comparative pangenome report of C. striatum. The species show a very slowly growing pangenome with relatively high number of genes in the core genome contributing to lower genomic variation. Prophage elements carrying AMR and virulence elements appear to be infrequent in the species. GIs appear to offer a prominent role in mobilizing antibiotic resistance genes in the species and integrons occur at a frequency of 50% in the species. Control strategies should be directed against virulence and resistance determinants carried on the core genome and those frequently occurring in the accessory genome.
Affiliation(s)
- Wedad M Nageeb
  - Department of Medical Microbiology and Immunology, Faculty of Medicine, Suez Canal University, Ismailia, 41111, Egypt.
- Helal F Hetta
  - Department of Medical Microbiology and Immunology, Faculty of Medicine, Assiut University, Assiut, 71515, Egypt.
10
Hyun JC, Palsson BO. Reconstruction of the last bacterial common ancestor from 183 pangenomes reveals a versatile ancient core genome. Genome Biol 2023; 24:183. [PMID: 37553643 PMCID: PMC10411014 DOI: 10.1186/s13059-023-03028-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Accepted: 07/28/2023] [Indexed: 08/10/2023] Open
Abstract
BACKGROUND Cumulative sequencing efforts have yielded enough genomes to construct pangenomes for dozens of bacterial species and elucidate intraspecies gene conservation. Given the diversity of organisms for which this is achievable, similar analyses for ancestral species are feasible through the integration of pangenomics and phylogenetics, promising deeper insights into the nature of ancient life. RESULTS We construct pangenomes for 183 bacterial species from 54,085 genomes and identify their core genomes using a novel statistical model to estimate genome-specific error rates and underlying gene frequencies. The core genomes are then integrated into a phylogenetic tree to reconstruct the core genome of the last bacterial common ancestor (LBCA), yielding three main results: First, the gene content of modern and ancestral core genomes are diverse at the level of individual genes but are similarly distributed by functional category and share several poorly characterized genes. Second, the LBCA core genome is distinct from any individual modern core genome but has many fundamental biological systems intact, especially those involving translation machinery and biosynthetic pathways to all major nucleotides and amino acids. Third, despite this metabolic versatility, the LBCA core genome likely requires additional non-core genes for viability, based on comparisons with the minimal organism, JCVI-Syn3A. CONCLUSIONS These results suggest that many cellular systems commonly conserved in modern bacteria were not just present in ancient bacteria but were nearly immutable with respect to short-term intraspecies variation. Extending this analysis to other domains of life will likely provide similar insights into more distant ancestral species.
Affiliation(s)
- Jason C Hyun
  - Bioinformatics and Systems Biology Program, University of California, La Jolla, San Diego, CA, USA
- Bernhard O Palsson
  - Bioinformatics and Systems Biology Program, University of California, La Jolla, San Diego, CA, USA.
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA, USA.
11
Morales-Olavarría M, Nuñez-Belmar J, González D, Vicencio E, Rivas-Pardo JA, Cortez C, Cárdenas JP. Phylogenomic analysis of the Porphyromonas gingivalis - Porphyromonas gulae duo: approaches to the origin of periodontitis. Front Microbiol 2023; 14:1226166. [PMID: 37538845 PMCID: PMC10394638 DOI: 10.3389/fmicb.2023.1226166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 07/04/2023] [Indexed: 08/05/2023] Open
Abstract
Porphyromonas gingivalis is an oral human pathogen associated with the onset and progression of periodontitis, a chronic immune-inflammatory disease characterized by the destruction of the teeth-supporting tissue. P. gingivalis belongs to the genus Porphyromonas, which is composed of Gram-negative, asaccharolytic, non-spore-forming, non-motile, obligately anaerobic species, inhabiting niches such as the oral cavity, urogenital tract, gastrointestinal tract and infected wounds of different mammals including humans. Among the Porphyromonas genus, P. gingivalis stands out for its specificity in colonizing the human oral cavity and its keystone pathogen role in periodontitis pathogenesis. To understand the evolutionary process behind P. gingivalis in the context of the Porphyromonas genus, in this study, we performed a comparative genomics study with publicly available Porphyromonas genomes, focused on four main objectives: (A) to confirm the phylogenetic position of P. gingivalis in the Porphyromonas genus by phylogenomic analysis; (B) the definition and comparison of the pangenomes of P. gingivalis and its relative P. gulae; (C) the evaluation of the gene family gain/loss events during the divergence of P. gingivalis and P. gulae; and (D) the evaluation of the evolutionary pressure (represented by the calculation of Tajima-D values and dN/dS ratios) comparing gene families of P. gingivalis and P. gulae. Our analysis found 84 high-quality assemblies representing P. gingivalis and 14 P. gulae strains (from a total of 233 Porphyromonas genomes). Phylogenomic analysis confirmed that P. gingivalis and P. gulae are highly related lineages, close to P. loveana. Both organisms harbored open pangenomes, with a strong core-to-accessory ratio for housekeeping genes and a negative ratio for unknown function genes. Our analyses also characterized the gene set differentiating P. gulae from P. gingivalis, mainly associated with unknown functions. Relevant virulence factors, such as the FimA, Mfa1, and the hemagglutinins, are conserved in P. gulae, P. gingivalis, and P. loveana, suggesting that the origin of those factors occurred prior to the P. gulae - P. gingivalis divergence. These results suggest an unexpected evolutionary relationship between the P. gulae - P. gingivalis duo and P. loveana, showing more clues about the origin of the role of those organisms in periodontitis.
Affiliation(s)
- Mauricio Morales-Olavarría
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
- Josefa Nuñez-Belmar
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
- Dámariz González
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
- Emiliano Vicencio
  - Escuela de Tecnología Médica, Facultad de Ciencias, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
- Jaime Andres Rivas-Pardo
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
  - Escuela de Biotecnología, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
- Cristian Cortez
  - Escuela de Tecnología Médica, Facultad de Ciencias, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
- Juan P. Cárdenas
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
  - Escuela de Biotecnología, Facultad de Ciencias, Ingeniería y Tecnología, Universidad Mayor, Santiago, Chile
12
Wang Z, Kim W, Wang YW, Yakubovich E, Dong C, Trail F, Townsend JP, Yarden O. The Sordariomycetes: an expanding resource with Big Data for mining in evolutionary genomics and transcriptomics. Frontiers in Fungal Biology 2023; 4:1214537. [PMID: 37746130 PMCID: PMC10512317 DOI: 10.3389/ffunb.2023.1214537] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/30/2023] [Accepted: 06/06/2023] [Indexed: 09/26/2023]
Abstract
Advances in genomics and transcriptomics accompanying the rapid accumulation of omics data have provided new tools that have transformed and expanded the traditional concepts of model fungi. Evolutionary genomics and transcriptomics have flourished with the use of classical and newer fungal models that facilitate the study of diverse topics encompassing fungal biology and development. Technological advances have also created the opportunity to obtain and mine large datasets. One such continuously growing dataset is that of the Sordariomycetes, which exhibit a richness of species, ecological diversity, economic importance, and a profound research history on amenable models. Currently, 3,574 species of this class have been sequenced, comprising nearly one-third of the available ascomycete genomes. Among these genomes, multiple representatives of the model genera Fusarium, Neurospora, and Trichoderma are present. In this review, we examine recently published studies and data on the Sordariomycetes that have contributed novel insights to the field of fungal evolution via integrative analyses of the genetic, pathogenic, and other biological characteristics of the fungi. Some of these studies applied ancestral state analysis of gene expression among divergent lineages to infer regulatory network models, identify key genetic elements in fungal sexual development, and investigate the regulation of conidial germination and secondary metabolism. Such multispecies investigations address challenges in the study of fungal evolutionary genomics derived from studies that are often based on limited model genomes and that primarily focus on the aspects of biology driven by knowledge drawn from a few model species. 
Rapidly accumulating information and expanding capabilities for systems biological analysis of Big Data are setting the stage for the expansion of the concept of model systems from unitary taxonomic species/genera to inclusive clusters of well-studied models that can facilitate both the in-depth study of specific lineages and also investigation of trait diversity across lineages. The Sordariomycetes class, in particular, offers abundant omics data and a large and active global research community. As such, the Sordariomycetes can form a core omics clade, providing a blueprint for the expansion of our knowledge of evolution at the genomic scale in the exciting era of Big Data and artificial intelligence, and serving as a reference for the future analysis of different taxonomic levels within the fungal kingdom.
Affiliation(s)
- Zheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Wonyong Kim
  - Korean Lichen Research Institute, Sunchon National University, Suncheon, Republic of Korea
- Yen-Wen Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Elizabeta Yakubovich
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Caihong Dong
  - Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Frances Trail
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States
- Jeffrey P. Townsend
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
  - Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Oded Yarden
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
13
Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141 DOI: 10.3390/life13061360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the introduction of the concept of pangenome, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review aims at discussing the advantages of the pangenome around human genetic variation, which are then framed around how pangenomic data can inform population genetics, phylogenetics, and public health policy by providing insights into the genetic basis of diseases or determining personalized treatments, targeting the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Elisabetta Cilli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
14
Blázquez B, San León D, Rojas A, Tortajada M, Nogales J. New Insights on Metabolic Features of Bacillus subtilis Based on Multistrain Genome-Scale Metabolic Modeling. Int J Mol Sci 2023; 24:7091. [PMID: 37108252 PMCID: PMC10138676 DOI: 10.3390/ijms24087091] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/10/2023] [Revised: 04/01/2023] [Accepted: 04/10/2023] [Indexed: 04/29/2023] Open
Abstract
Bacillus subtilis is an effective workhorse for the production of many industrial products. The high interest aroused by B. subtilis has guided a large metabolic modeling effort of this species. Genome-scale metabolic models (GEMs) are powerful tools for predicting the metabolic capabilities of a given organism. However, high-quality GEMs are required in order to provide accurate predictions. In this work, we construct a high-quality, mostly manually curated genome-scale model for B. subtilis (iBB1018). The model was validated by means of growth performance and carbon flux distribution and provided significantly more accurate predictions than previous models. iBB1018 was able to predict carbon source utilization with great accuracy while identifying up to 28 metabolites as potential novel carbon sources. The constructed model was further used as a tool for the construction of the panphenome of B. subtilis as a species, by means of multistrain genome-scale reconstruction. The panphenome space was defined in the context of 183 GEMs representative of 183 B. subtilis strains and the array of carbon sources sustaining growth. Our analysis highlights the large metabolic versatility of the species and the important role of the accessory metabolism as a driver of the panphenome, at a species level.
Affiliation(s)
- Blas Blázquez
  - Department of Systems Biology, Centro Nacional de Biotecnología, CSIC, 28049 Madrid, Spain
  - Interdisciplinary Platform for Sustainable Plastics towards a Circular Economy-Spanish National Research Council (SusPlast-CSIC), 28040 Madrid, Spain
- David San León
  - Department of Systems Biology, Centro Nacional de Biotecnología, CSIC, 28049 Madrid, Spain
  - Interdisciplinary Platform for Sustainable Plastics towards a Circular Economy-Spanish National Research Council (SusPlast-CSIC), 28040 Madrid, Spain
- Antonia Rojas
  - Archer Daniels Midland, Nutrition, Biopolis S.L. Parc Científic Universitat de València, Carrer del Catedrático Agustín Escardino Benlloch, 9, 46980 Paterna, Spain
- Marta Tortajada
  - Archer Daniels Midland, Nutrition, Biopolis S.L. Parc Científic Universitat de València, Carrer del Catedrático Agustín Escardino Benlloch, 9, 46980 Paterna, Spain
- Juan Nogales
  - Department of Systems Biology, Centro Nacional de Biotecnología, CSIC, 28049 Madrid, Spain
  - Interdisciplinary Platform for Sustainable Plastics towards a Circular Economy-Spanish National Research Council (SusPlast-CSIC), 28040 Madrid, Spain
15
Comparative genomic analysis of Stenotrophomonas maltophilia unravels their genetic variations and versatility trait. J Appl Genet 2023; 64:351-360. [PMID: 36892794 DOI: 10.1007/s13353-023-00752-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 12/26/2022] [Accepted: 02/10/2023] [Indexed: 03/10/2023]
Abstract
Stenotrophomonas maltophilia is a species with immensely broad phenotypic and genotypic diversity that is widely distributed in natural and clinical environments. However, little attention has been paid to its genome plasticity in response to diverse environments. In the present study, a comparative genomic analysis of 42 sequenced genomes of S. maltophilia isolated from clinical and natural sources was performed to systematically explore the genetic diversity of the species. The results showed that S. maltophilia has an open pan-genome and strong adaptability to different environments. A total of 1612 core genes were identified, accounting for an average of 39.43% of each genome, and the shared core genes might be necessary to maintain the basic characteristics of those S. maltophilia strains. Based on the phylogenetic tree, the ANI values, and the distribution of accessory genes, genes associated with the fundamental processes of strains from the same habitat were found to be mostly conserved in evolution. Isolates from the same habitat had a high degree of similarity in COG categories, and the most significant KEGG pathways were mainly involved in carbohydrate and amino acid metabolism, indicating that genes related to essential processes were mostly conserved in evolution in both the clinical and environmental settings. Meanwhile, the number of resistance and efflux pump genes was significantly higher in the clinical setting than in the environmental setting. Collectively, this study highlights the evolutionary relationships of S. maltophilia isolated from clinical and environmental sources, shedding new light on its genomic diversity.
16
Nielsen FD, Møller-Jensen J, Jørgensen MG. Adding context to the pneumococcal core genes using bioinformatic analysis of the intergenic pangenome of Streptococcus pneumoniae. Frontiers in Bioinformatics 2023; 3:1074212. [PMID: 36844929 PMCID: PMC9944727 DOI: 10.3389/fbinf.2023.1074212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Accepted: 01/24/2023] [Indexed: 02/10/2023] Open
Abstract
Introduction: Whole genome sequencing offers great opportunities for linking genotypes to phenotypes aiding in our understanding of human disease and bacterial pathogenicity. However, these analyses often overlook non-coding intergenic regions (IGRs). By disregarding the IGRs, crucial information is lost, as genes have little biological function without expression. Methods/Results: In this study, we present the first complete pangenome of the important human pathogen Streptococcus pneumoniae (pneumococcus), spanning both the genes and IGRs. We show that the pneumococcus species retains a small core genome of IGRs that are present across all isolates. Gene expression is highly dependent on these core IGRs, and often several copies of these core IGRs are found across each genome. Core genes and core IGRs show a clear linkage as 81% of core genes are associated with core IGRs. Additionally, we identify a single IGR within the core genome that is always occupied by one of two highly distinct sequences, scattered across the phylogenetic tree. Discussion: Their distribution indicates that this IGR is transferred between isolates through horizontal regulatory transfer independent of the flanking genes and that each type likely serves different regulatory roles depending on their genetic context.
Affiliation(s)
- Flemming Damgaard Nielsen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
  - Department of Clinical Microbiology, Odense University Hospital, Odense, Denmark
- Jakob Møller-Jensen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Mikkel Girke Jørgensen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
  - Correspondence: Mikkel Girke Jørgensen
17
Chromosome-scale haplotype-resolved pangenomics. Trends Genet 2022; 38:1103-1107. [PMID: 35817620 DOI: 10.1016/j.tig.2022.06.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/21/2022] [Revised: 06/14/2022] [Accepted: 06/16/2022] [Indexed: 01/24/2023]
Abstract
Complete pangenomics is crucial for understanding genetic diversity and evolution across the tree of life. Chromosome-scale, haplotype-resolved pangenomics allows complex structural variations, long-range interactions, and associated functions to be discerned in species populations. We explore the need for high-resolution pangenomes, discuss computational strategies for their development, and describe applications in biodiversity and human health.
18
Dereeper A, Summo M, Meyer DF. PanExplorer: a web-based tool for exploratory analysis and visualization of bacterial pan-genomes. Bioinformatics 2022; 38:4412-4414. [PMID: 35916725 PMCID: PMC9477528 DOI: 10.1093/bioinformatics/btac504] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/26/2021] [Revised: 07/09/2022] [Accepted: 07/29/2022] [Indexed: 12/24/2022] Open
Abstract
Motivation: As pan-genome approaches are widely employed for bacterial comparative genomics and evolution analyses, but are still difficult for non-bioinformatician biologists to carry out, there is a need for an innovative tool facilitating the exploration of bacterial pan-genomes. Results: PanExplorer is a web application providing various genomic analyses and reports, giving intuitive views that enable a better understanding of bacterial pan-genomes. As an example, we produced the pan-genome for 121 Anaplasmataceae strains (including 30 Ehrlichia, 15 Anaplasma, 68 Wolbachia). Availability and implementation: PanExplorer is written in Perl CGI and relies on several JavaScript libraries for visualization (hotmap.js, MauveViewer, CircosJS). It is freely available at http://panexplorer.southgreen.fr. The source code has been released in a GitHub repository https://github.com/SouthGreenPlatform/PanExplorer. A documentation section is available on the PanExplorer website.
Affiliation(s)
- Marilyne Summo
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
  - CIRAD, UMR AGAP, F-34398 Montpellier, France