1
Lu Y, Li Q, Li T. PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment. Front Genet 2022; 13:839453. [PMID: 35444686 PMCID: PMC9013948 DOI: 10.3389/fgene.2022.839453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 03/17/2022] [Indexed: 11/17/2022] Open
Abstract
With the rapid development of sequencing technology, the number of completed microbial genomes has grown explosively. For a newly sequenced prokaryotic genome, gene functional annotation and metabolic pathway assignment are important foundations for all subsequent research. However, the overall assignment rate for gene metabolic pathways is lower than 48%. It is even lower for newly sequenced prokaryotic genomes, which has become a bottleneck for subsequent research. Thus, a high-precision metabolic pathway assignment framework is urgently needed. Here, we developed PPA-GCN, a prokaryotic pathway assignment framework based on a graph convolutional network, to assist functional pathway assignment using KEGG information and genomic characteristics. In the framework, gene synteny information was used to construct a network, and ideas from self-supervised learning were used to enhance the framework's learning ability. Our framework is applicable to microbial genera with sufficient whole-genome sequences. To evaluate the assignment rate, genomes from three different genera (Flavobacterium (65 genomes), Pseudomonas (100 genomes), and Staphylococcus (500 genomes)) were used. The initial functional pathway assignment rates of the three test genera were 27.7% (Flavobacterium), 49.5% (Pseudomonas) and 30.1% (Staphylococcus). PPA-GCN achieved excellent assignment rates of 84.8% (Flavobacterium), 77.0% (Pseudomonas) and 71.0% (Staphylococcus). PPA-GCN was also shown to have strong fault tolerance. The framework provides novel insights into metabolic pathway assignment, is likely to inform future deep learning applications for interpreting functional annotations, and extends to all prokaryotic genera with sufficient genomes.
Affiliation(s)
- Yuntao Lu
  - Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
  - College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qi Li
  - Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Tao Li
  - Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
2
Unravelling the role of hub genes associated with cardio renal syndrome through an integrated bioinformatics approach. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101382] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/14/2023]
3
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
  - School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
  - Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
- Tomer Ventura
  - Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
- J. Sook Chung
  - Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
- Woo-Jin Kim
  - Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
- Bo-Hye Nam
  - Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Hee Jeong Kong
  - Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Young-Ok Kim
  - Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Min-Seung Jeon
  - Department of Life Science, Chung-Ang University, Seoul, Korea
- Seong-il Eyun
  - Department of Life Science, Chung-Ang University, Seoul, Korea
4
Araújo CL, Blanco I, Souza L, Tiwari S, Pereira LC, Ghosh P, Azevedo V, Silva A, Folador A. In silico functional prediction of hypothetical proteins from the core genome of Corynebacterium pseudotuberculosis biovar ovis. PeerJ 2020; 8:e9643. [PMID: 32913672 PMCID: PMC7456259 DOI: 10.7717/peerj.9643] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/29/2020] [Accepted: 07/10/2020] [Indexed: 12/30/2022] Open
Abstract
Corynebacterium pseudotuberculosis is a pathogen of veterinary relevance, divided into two biovars, equi and ovis, which cause ulcerative lymphangitis and caseous lymphadenitis, respectively. The isolation and sequencing of C. pseudotuberculosis biovar ovis strains in the Northern and Northeastern regions of Brazil revealed the emergence of this pathogen, which causes economic losses to small ruminant producers and condemnation of animal carcasses and skins. Through the pan-genomic approach, it is possible to determine and analyze genes that are shared by all strains of a species: the core genome. However, many of these genes do not have any predicted function and are characterized as hypothetical proteins (HP). In this study, we considered 32 C. pseudotuberculosis biovar ovis genomes for the pan-genomic analysis, in which 172 HP were identified in a core genome composed of 1255 genes. We were able to functionally annotate 80 sequences previously characterized as HP through the identification of structural features such as conserved domains and families. Furthermore, we analyzed the physicochemical properties, subcellular localization and molecular function. Additionally, using RNA-seq data, we investigated the differential gene expression of the annotated HP. Genes inserted in pathogenicity islands had their virulence potential evaluated. We also analyzed the existence of functional associations for their products based on protein–protein interaction networks, and performed the structural prediction of three targets. Because it integrates different strategies, this study can underpin deeper in vitro research into the characterization of these HP and the search for new solutions to combat this pathogen.
Affiliation(s)
- Carlos Leonardo Araújo
  - Laboratory of Genomics and Bioinformatics, Center of Genomics and Systems Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Iago Blanco
  - Laboratory of Genomics and Bioinformatics, Center of Genomics and Systems Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Luciana Souza
  - Laboratory of Genomics and Bioinformatics, Center of Genomics and Systems Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Sandeep Tiwari
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Lino César Pereira
  - Laboratory of Genomics and Bioinformatics, Center of Genomics and Systems Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Vasco Azevedo
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Artur Silva
  - Laboratory of Genomics and Bioinformatics, Center of Genomics and Systems Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Adriana Folador
  - Laboratory of Genomics and Bioinformatics, Center of Genomics and Systems Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
5
Amoako DG, Somboro AM, Abia ALK, Allam M, Ismail A, Bester LA, Essack SY. Genome Mining and Comparative Pathogenomic Analysis of An Endemic Methicillin-Resistant Staphylococcus Aureus (MRSA) Clone, ST612-CC8-t1257-SCCmec_IVd(2B), Isolated in South Africa. Pathogens 2019; 8:E166. [PMID: 31569754 PMCID: PMC6963616 DOI: 10.3390/pathogens8040166] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/29/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 12/19/2022] Open
Abstract
This study undertook genome mining and comparative genomics to gain genetic insights into the dominance of the methicillin-resistant Staphylococcus aureus (MRSA) endemic clone ST612-CC8-t1257-SCCmec_IVd(2B), obtained from the poultry food chain in South Africa. Functional annotation of the genome revealed a vast array of similar central metabolic, cellular and biochemical networks within the endemic clone that are crucial for its survival in the microbial community. In silico analysis of the clone revealed the possession of uniform defense systems, restriction-modification system (type I and IV), accessory gene regulator (type I), arginine catabolic mobile element (type II), and type 1 clustered regularly interspaced short palindromic repeat (CRISPR)-Cas array (N = 7 ± 1), which offer protection against exogenous attacks. The estimated pathogenic potential predicted a higher probability (average Pscore ≈ 0.927) of the clone being pathogenic to its host. The clone carried a battery of putative virulence determinants whose expression is critical for establishing infection. However, there was a slight difference in their possession of adherence factors (biofilm operon system) and toxins (hemolysins and enterotoxins). Further analysis revealed conserved environmental tolerance and persistence mechanisms related to stress (oxidative and osmotic), heat shock, sporulation, bacteriocins, and detoxification, which enable it to withstand lethal threats and contribute to its success in diverse ecological niches. Phylogenomic analysis with close sister lineages revealed that the clone was closely related to the MRSA isolate SHV713 from Australia. The results of this bioinformatic analysis provide valuable insights into the biology of this endemic clone.
Affiliation(s)
- Daniel Gyamfi Amoako
  - Infection Genomics and Applied Bioinformatics Division, Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
  - Biomedical Resource Unit, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Anou M Somboro
  - Biomedical Resource Unit, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
  - Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Akebe Luther King Abia
  - Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Mushal Allam
  - Sequencing Core Facility, National Institute for Communicable Diseases, National Health Laboratory Service, Johannesburg 2131, South Africa
- Arshad Ismail
  - Sequencing Core Facility, National Institute for Communicable Diseases, National Health Laboratory Service, Johannesburg 2131, South Africa
- Linda A Bester
  - Biomedical Resource Unit, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Sabiha Y Essack
  - Antimicrobial Research Unit, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa