1
Price MN, Arkin AP. Interactive tools for functional annotation of bacterial genomes. Database (Oxford) 2024; 2024:baae089. [PMID: 39241109 DOI: 10.1093/database/baae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 07/29/2024] [Accepted: 08/09/2024] [Indexed: 09/08/2024]
Abstract
Automated annotations of protein functions are error-prone because of our lack of knowledge of protein functions. For example, it is often impossible to predict the correct substrate for an enzyme or a transporter. Furthermore, much of the knowledge that we do have about the functions of proteins is missing from the underlying databases. We discuss how to use interactive tools to quickly find different kinds of information relevant to a protein's function. Many of these tools are available via PaperBLAST (http://papers.genomics.lbl.gov). Combining these tools often allows us to infer a protein's function. Ideally, accurate annotations would allow us to predict a bacterium's capabilities from its genome sequence, but in practice, this remains challenging. We describe interactive tools that infer potential capabilities from a genome sequence or that search a genome to find proteins that might perform a specific function of interest. Database URL: http://papers.genomics.lbl.gov.
Affiliation(s)
- Morgan N Price
- Environmental Genomics & Systems Biology, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, CA 94720, United States
- Adam P Arkin
- Environmental Genomics & Systems Biology, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, CA 94720, United States
2
Volzhenin K, Bittner L, Carbone A. SENSE-PPI reconstructs interactomes within, across, and between species at the genome scale. iScience 2024; 27:110371. [PMID: 39055916 PMCID: PMC11269938 DOI: 10.1016/j.isci.2024.110371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 05/04/2024] [Accepted: 06/21/2024] [Indexed: 07/28/2024] Open
Abstract
Ab initio computational reconstructions of protein-protein interaction (PPI) networks will provide invaluable insights into cellular systems, enabling the discovery of novel molecular interactions and elucidating biological mechanisms within and between organisms. Leveraging the latest generation protein language models and recurrent neural networks, we present SENSE-PPI, a sequence-based deep learning model that efficiently reconstructs ab initio PPIs, distinguishing partners among tens of thousands of proteins and identifying specific interactions within functionally similar proteins. SENSE-PPI demonstrates high accuracy, limited training requirements, and versatility in cross-species predictions, even with non-model organisms and human-virus interactions. Its performance decreases for phylogenetically more distant model and non-model organisms, but signal alteration is very slow. In this regard, it demonstrates the important role of parameters in protein language models. SENSE-PPI is very fast and can test 10,000 proteins against themselves in a matter of hours, enabling the reconstruction of genome-wide proteomes.
Affiliation(s)
- Konstantin Volzhenin
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Lucie Bittner
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Institut Universitaire de France, Paris, France
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Institut Universitaire de France, Paris, France
3
Padalko A, Nair G, Sousa FL. Fusion/fission protein family identification in Archaea. mSystems 2024; 9:e0094823. [PMID: 38700364 PMCID: PMC11237513 DOI: 10.1128/msystems.00948-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 04/02/2024] [Indexed: 05/05/2024] Open
Abstract
The majority of newly discovered archaeal lineages remain without a cultivated representative, but scarce experimental data from the cultivated organisms show that they harbor distinct functional repertoires. To unveil the ecological as well as evolutionary impact of Archaea from metagenomics, new computational methods need to be developed, followed by in-depth analysis. Among them is the genome-wide protein fusion screening performed here. Natural fusions and fissions of genes not only contribute to microbial evolution but also complicate the correct identification and functional annotation of sequences. The products of these processes can be defined as fusion (or composite) proteins, the ones consisting of two or more domains originally encoded by different genes and split proteins, and the ones originating from the separation of a gene in two (fission). Fusion identifications are required for proper phylogenetic reconstructions and metabolic pathway completeness assessments, while mappings between fused and unfused proteins can fill some of the existing gaps in metabolic models. In the archaeal genome-wide screening, more than 1,900 fusion/fission protein clusters were identified, belonging to both newly sequenced and well-studied lineages. These protein families are mainly associated with different types of metabolism, genetic, and cellular processes. Moreover, 162 of the identified fusion/fission protein families are archaeal specific, having no identified fused homolog within the bacterial domain. Our approach was validated by the identification of experimentally characterized fusion/fission cases. However, around 25% of the identified fusion/fission families lack functional annotations for both composite and split states, showing the need for experimental characterization in Archaea.
IMPORTANCE: Genome-wide fusion screening has never been performed in Archaea on a broad taxonomic scale. The overlay of multiple computational techniques allows the detection of a fine-grained set of predicted fusion/fission families, instead of rough estimations based on conserved domain annotations only. The exhaustive mapping of fused proteins to bacterial organisms allows us to capture fusion/fission families that are specific to archaeal biology, as well as to identify links between bacterial and archaeal lineages based on cooccurrence of taxonomically restricted proteins and their sequence features. Furthermore, the identification of poorly characterized lineage-specific fusion proteins opens up possibilities for future experimental and computational investigations. This approach enhances our understanding of Archaea in general and provides potential candidates for in-depth studies in the future.
Affiliation(s)
- Anastasiia Padalko
- Genome Evolution and Ecology Group, Department of Functional and Evolutionary Ecology, University of Vienna, Vienna, Austria
- Vienna Doctoral School of Ecology and Evolution, University of Vienna, Vienna, Austria
- Govind Nair
- Genome Evolution and Ecology Group, Department of Functional and Evolutionary Ecology, University of Vienna, Vienna, Austria
- Filipa L. Sousa
- Genome Evolution and Ecology Group, Department of Functional and Evolutionary Ecology, University of Vienna, Vienna, Austria
4
Gao Y, Ma B, Xu Q, Peng Y, Gong H, Guan A, Hua K, Langford PR, Jin H, Luo R. Spatial proximity and gene function: a new dimension in prokaryotic gene association network analysis with 3D-GeneNet. Brief Bioinform 2024; 25:bbae320. [PMID: 38975892 PMCID: PMC11229033 DOI: 10.1093/bib/bbae320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 05/22/2024] [Accepted: 06/18/2024] [Indexed: 07/09/2024] Open
Abstract
Understanding the biological functions and processes of genes, particularly those not yet characterized, is crucial for advancing molecular biology and identifying therapeutic targets. The hypothesis guiding this study is that the 3D proximity of genes correlates with their functional interactions and relevance in prokaryotes. We introduced 3D-GeneNet, an innovative software tool that utilizes high-throughput sequencing data from chromosome conformation capture techniques and integrates topological metrics to construct gene association networks. Through a series of comparative analyses focused on spatial versus linear distances, we explored various dimensions such as topological structure, functional enrichment levels, distribution patterns of linear distances among gene pairs, and the area under the receiver operating characteristic curve by utilizing model organism Escherichia coli K-12. Furthermore, 3D-GeneNet was shown to maintain good accuracy compared to multiple algorithms (neighbourhood, co-occurrence, coexpression, and fusion) across multiple bacteria, including E. coli, Brucella abortus, and Vibrio cholerae. In addition, the accuracy of 3D-GeneNet's prediction of long-distance gene interactions was identified by bacterial two-hybrid assays on E. coli K-12 MG1655, where 3D-GeneNet not only increased the accuracy of linear genomic distance tripled but also achieved 60% accuracy by running alone. Finally, it can be concluded that the applicability of 3D-GeneNet will extend to various bacterial forms, including Gram-negative, Gram-positive, single-, and multi-chromosomal bacteria through Hi-C sequencing and analysis. Such findings highlight the broad applicability and significant promise of this method in the realm of gene association network. 3D-GeneNet is freely accessible at https://github.com/gaoyuanccc/3D-GeneNet.
Affiliation(s)
- Yuan Gao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Bin Ma
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Qianshuai Xu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Yuna Peng
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Huimin Gong
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Aohan Guan
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Kexin Hua
- Swine Genome and Breeding Team, Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou District, Sanya City, Hainan Province 572024, China
- Paul R Langford
- Section of Paediatric Infectious Disease, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom
- Hui Jin
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Rui Luo
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- College of Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
- Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei, China
5
Price MN, Arkin AP. A fast comparative genome browser for diverse bacteria and archaea. PLoS One 2024; 19:e0301871. [PMID: 38593165 PMCID: PMC11003636 DOI: 10.1371/journal.pone.0301871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/22/2024] [Indexed: 04/11/2024] Open
Abstract
Genome sequencing has revealed an incredible diversity of bacteria and archaea, but there are no fast and convenient tools for browsing across these genomes. It is cumbersome to view the prevalence of homologs for a protein of interest, or the gene neighborhoods of those homologs, across the diversity of the prokaryotes. We developed a web-based tool, fast.genomics, that uses two strategies to support fast browsing across the diversity of prokaryotes. First, the database of genomes is split up. The main database contains one representative from each of the 6,377 genera that have a high-quality genome, and additional databases for each taxonomic order contain up to 10 representatives of each species. Second, homologs of proteins of interest are identified quickly by using accelerated searches, usually in a few seconds. Once homologs are identified, fast.genomics can quickly show their prevalence across taxa, view their neighboring genes, or compare the prevalence of two different proteins. Fast.genomics is available at https://fast.genomics.lbl.gov.
Affiliation(s)
- Morgan N. Price
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Adam P. Arkin
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, United States of America
6
Wei X, Tan H, Lobb B, Zhen W, Wu Z, Parks DH, Neufeld JD, Moreno-Hagelsieb G, Doxey AC. AnnoView enables large-scale analysis, comparison, and visualization of microbial gene neighborhoods. Brief Bioinform 2024; 25:bbae229. [PMID: 38747283 PMCID: PMC11094555 DOI: 10.1093/bib/bbae229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 04/02/2024] [Accepted: 04/26/2024] [Indexed: 05/19/2024] Open
Abstract
The analysis and comparison of gene neighborhoods is a powerful approach for exploring microbial genome structure, function, and evolution. Although numerous tools exist for genome visualization and comparison, genome exploration across large genomic databases or user-generated datasets remains a challenge. Here, we introduce AnnoView, a web server designed for interactive exploration of gene neighborhoods across the bacterial and archaeal tree of life. Our server offers users the ability to identify, compare, and visualize gene neighborhoods of interest from 30 238 bacterial genomes and 1672 archaeal genomes, through integration with the comprehensive Genome Taxonomy Database and AnnoTree databases. Identified gene neighborhoods can be visualized using pre-computed functional annotations from different sources such as KEGG, Pfam and TIGRFAM, or clustered based on similarity. Alternatively, users can upload and explore their own custom genomic datasets in GBK, GFF or CSV format, or use AnnoView as a genome browser for relatively small genomes (e.g. viruses and plasmids). Ultimately, we anticipate that AnnoView will catalyze biological discovery by enabling user-friendly search, comparison, and visualization of genomic data. AnnoView is available at http://annoview.uwaterloo.ca.
Affiliation(s)
- Xin Wei
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Huagang Tan
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Briallen Lobb
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- William Zhen
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Zijing Wu
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Donovan H Parks
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Brisbane, Australia
- Josh D Neufeld
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Gabriel Moreno-Hagelsieb
- Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada
- Andrew C Doxey
- Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
7
Tavis S, Hettich RL. Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome. BMC Genomics 2024; 25:267. [PMID: 38468234 PMCID: PMC10926591 DOI: 10.1186/s12864-024-10082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/02/2024] [Indexed: 03/13/2024] Open
Abstract
In every omics experiment, genes or their products are identified for which even state of the art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses since these proteins can carry out functions which impact the engineering of organisms. As a consequence of predicting protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of Alphafold predictions for all Uniprot proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data, which annotated 1079 terms to 213 proteins. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.
Affiliation(s)
- Steven Tavis
- Genome Science and Technology Graduate Program, University of Tennessee Knoxville, Knoxville, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
8
Rodríguez Del Río Á, Giner-Lamia J, Cantalapiedra CP, Botas J, Deng Z, Hernández-Plaza A, Munar-Palmer M, Santamaría-Hernando S, Rodríguez-Herva JJ, Ruscheweyh HJ, Paoli L, Schmidt TSB, Sunagawa S, Bork P, López-Solanilla E, Coelho LP, Huerta-Cepas J. Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature 2024; 626:377-384. [PMID: 38109938 PMCID: PMC10849945 DOI: 10.1038/s41586-023-06955-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 12/08/2023] [Indexed: 12/20/2023]
Abstract
Many of the Earth's microbes remain uncultured and understudied, limiting our understanding of the functional and evolutionary aspects of their genetic material, which remain largely overlooked in most metagenomic studies. Here we analysed 149,842 environmental genomes from multiple habitats and compiled a curated catalogue of 404,085 functionally and evolutionarily significant novel (FESNov) gene families exclusive to uncultivated prokaryotic taxa. All FESNov families span multiple species, exhibit strong signals of purifying selection and qualify as new orthologous groups, thus nearly tripling the number of bacterial and archaeal gene families described to date. The FESNov catalogue is enriched in clade-specific traits, including 1,034 novel families that can distinguish entire uncultivated phyla, classes and orders, probably representing synapomorphies that facilitated their evolutionary divergence. Using genomic context analysis and structural alignments we predicted functional associations for 32.4% of FESNov families, including 4,349 high-confidence associations with important biological processes. These predictions provide a valuable hypothesis-driven framework that we used for experimental validation of a new gene family involved in cell motility and a novel set of antimicrobial peptides. We also demonstrate that the relative abundance profiles of novel families can discriminate between environments and clinical conditions, leading to the discovery of potentially new biomarkers associated with colorectal cancer. We expect this work to enhance future metagenomics studies and expand our knowledge of the genetic repertory of uncultivated organisms.
Affiliation(s)
- Álvaro Rodríguez Del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Departamento de Bioquímica Vegetal y Biología Molecular, Facultad de Biología, Instituto de Bioquímica Vegetal y Fotosíntesis (IBVF), Universidad de Sevilla-CSIC, Seville, Spain
- Carlos P Cantalapiedra
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Ziqi Deng
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Ana Hernández-Plaza
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Martí Munar-Palmer
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Saray Santamaría-Hernando
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- José J Rodríguez-Herva
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Thomas S B Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Emilia López-Solanilla
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
9
Kolan D, Cattan-Tsaushu E, Enav H, Freiman Z, Malinsky-Rushansky N, Ninio S, Avrani S. Tradeoffs between phage resistance and nitrogen fixation drive the evolution of genes essential for cyanobacterial heterocyst functionality. ISME J 2024; 18:wrad008. [PMID: 38365231 PMCID: PMC10811720 DOI: 10.1093/ismejo/wrad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 10/26/2023] [Accepted: 11/13/2023] [Indexed: 02/18/2024]
Abstract
Harmful blooms caused by diazotrophic (nitrogen-fixing) Cyanobacteria are becoming increasingly frequent and negatively impact aquatic environments worldwide. Cyanophages (viruses infecting Cyanobacteria) can potentially regulate cyanobacterial blooms, yet Cyanobacteria can rapidly acquire mutations that provide protection against phage infection. Here, we provide novel insights into cyanophage:Cyanobacteria interactions by characterizing the resistance to phages in two species of diazotrophic Cyanobacteria: Nostoc sp. and Cylindrospermopsis raciborskii. Our results demonstrate that phage resistance is associated with a fitness tradeoff by which resistant Cyanobacteria have reduced ability to fix nitrogen and/or to survive nitrogen starvation. Furthermore, we use whole-genome sequence analysis of 58 Nostoc-resistant strains to identify several mutations associated with phage resistance, including in cell surface-related genes and regulatory genes involved in the development and function of heterocysts (cells specialized in nitrogen fixation). Finally, we employ phylogenetic analyses to show that most of these resistance genes are accessory genes whose evolution is impacted by lateral gene transfer events. Together, these results further our understanding of the interplay between diazotrophic Cyanobacteria and their phages and suggest that a tradeoff between phage resistance and nitrogen fixation affects the evolution of cell surface-related genes and of genes involved in heterocyst differentiation and nitrogen fixation.
Affiliation(s)
- Dikla Kolan
- Department of Evolutionary and Environmental Biology, The Institute of Evolution, University of Haifa, Mount Carmel, Haifa 3103301, Israel
- Esther Cattan-Tsaushu
- Department of Evolutionary and Environmental Biology, The Institute of Evolution, University of Haifa, Mount Carmel, Haifa 3103301, Israel
- Hagay Enav
- Department of Evolutionary and Environmental Biology, The Institute of Evolution, University of Haifa, Mount Carmel, Haifa 3103301, Israel
- Zohar Freiman
- Kinneret Limnological Laboratory (KLL) Israel Oceanographic and Limnological Research (IOLR), Migdal 1495000, Israel
- Nechama Malinsky-Rushansky
- Kinneret Limnological Laboratory (KLL) Israel Oceanographic and Limnological Research (IOLR), Migdal 1495000, Israel
- Shira Ninio
- Kinneret Limnological Laboratory (KLL) Israel Oceanographic and Limnological Research (IOLR), Migdal 1495000, Israel
- Sarit Avrani
- Department of Evolutionary and Environmental Biology, The Institute of Evolution, University of Haifa, Mount Carmel, Haifa 3103301, Israel
| |
|
10
|
Sutton JAF, Cooke M, Tinajero-Trejo M, Wacnik K, Salamaga B, Portman-Ross C, Lund VA, Hobbs JK, Foster SJ. The roles of GpsB and DivIVA in Staphylococcus aureus growth and division. Front Microbiol 2023; 14:1241249. [PMID: 37711690 PMCID: PMC10498921 DOI: 10.3389/fmicb.2023.1241249] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/16/2023] [Accepted: 08/04/2023] [Indexed: 09/16/2023] Open
Abstract
The spheroid bacterium Staphylococcus aureus is often used as a model of morphogenesis due to its apparently simple cell cycle. S. aureus has many cell division proteins that are conserved across bacteria alluding to common functions. However, despite intensive study, we still do not know the roles of many of these components. Here, we have examined the functions of the paralogues DivIVA and GpsB in the S. aureus cell cycle. Cells lacking gpsB display a more spherical phenotype than the wild-type cells, which is associated with a decrease in peripheral cell wall peptidoglycan synthesis. This correlates with increased localization of penicillin-binding proteins at the developing septum, notably PBPs 2 and 3. Our results highlight the role of GpsB as an apparent regulator of cell morphogenesis in S. aureus.
Affiliation(s)
- Joshua A. F. Sutton
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| | - Mark Cooke
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
| | - Mariana Tinajero-Trejo
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| | - Katarzyna Wacnik
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| | - Bartłomiej Salamaga
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| | - Callum Portman-Ross
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| | - Victoria A. Lund
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| | - Jamie K. Hobbs
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
- Department of Physics and Astronomy, University of Sheffield, Sheffield, United Kingdom
| | - Simon J. Foster
- School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- The Florey Institute for Host-Pathogen Interactions, University of Sheffield, Sheffield, United Kingdom
| |
|
11
|
Genetic and Structural Diversity of Prokaryotic Ice-Binding Proteins from the Central Arctic Ocean. Genes (Basel) 2023; 14:genes14020363. [PMID: 36833289 PMCID: PMC9957290 DOI: 10.3390/genes14020363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/01/2023] Open
Abstract
Ice-binding proteins (IBPs) are a group of ecologically and biotechnologically relevant enzymes produced by psychrophilic organisms. Although putative IBPs containing the domain of unknown function (DUF) 3494 have been identified in many taxa of polar microbes, our knowledge of their genetic and structural diversity in natural microbial communities is limited. Here, we used samples from sea ice and sea water collected in the central Arctic Ocean as part of the MOSAiC expedition for metagenome sequencing and the subsequent analyses of metagenome-assembled genomes (MAGs). By linking structurally diverse IBPs to particular environments and potential functions, we reveal that IBP sequences are enriched in interior ice, have diverse genomic contexts and cluster taxonomically. Their diverse protein structures may be a consequence of domain shuffling, leading to variable combinations of protein domains in IBPs and probably reflecting the functional versatility required to thrive in the extreme and variable environment of the central Arctic Ocean.
|
12
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| |
|
13
|
Santorelli L, Caterino M, Costanzo M. Dynamic Interactomics by Cross-Linking Mass Spectrometry: Mapping the Daily Cell Life in Postgenomic Era. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2022; 26:633-649. [PMID: 36445175 DOI: 10.1089/omi.2022.0137] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/03/2022]
Abstract
The majority of processes that occur in daily cell life are modulated by hundreds to thousands of dynamic protein-protein interactions (PPI). The resulting protein complexes constitute a tangled network that, with its continuous remodeling, builds up highly organized functional units. Thus, defining the dynamic interactome of one or more proteins allows determining the full range of biological activities these proteins are capable of. This conceptual approach is poised to gain further traction and significance in the current postgenomic era wherein the treatment of severe diseases needs to be tackled at both genomic and PPI levels. This also holds true for COVID-19, a multisystemic disease affecting biological networks across the biological hierarchy from genome to proteome to metabolome. In this overarching context and the current historical moment of the COVID-19 pandemic where systems biology increasingly comes to the fore, cross-linking mass spectrometry (XL-MS) has become highly relevant, emerging as a powerful tool for PPI discovery and characterization. This expert review highlights the advanced XL-MS approaches that provide in vivo insights into the three-dimensional protein complexes, overcoming the static nature of common interactomics data and embracing the dynamics of the cell proteome landscape. Many XL-MS applications based on the use of diverse cross-linkers, MS detection methods, and predictive bioinformatic tools for single proteins or proteome-wide interactions were shown. We conclude with a future outlook on XL-MS applications in the field of structural proteomics and ways to sustain the remarkable flexibility of XL-MS for dynamic interactomics and structural studies in systems biology and planetary health.
Affiliation(s)
- Lucia Santorelli
- Department of Oncology and Hematology-Oncology, University of Milano, Milan, Italy; IFOM ETS, The AIRC Institute of Molecular Oncology, Milan, Italy
| | - Marianna Caterino
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy; CEINGE-Biotecnologie Avanzate s.c.ar.l., Naples, Italy
| | - Michele Costanzo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy; CEINGE-Biotecnologie Avanzate s.c.ar.l., Naples, Italy
| |
|
14
|
Hernández-Plaza A, Szklarczyk D, Botas J, Cantalapiedra C, Giner-Lamia J, Mende DR, Kirsch R, Rattei T, Letunic I, Jensen L, Bork P, von Mering C, Huerta-Cepas J. eggNOG 6.0: enabling comparative genomics across 12 535 organisms. Nucleic Acids Res 2022; 51:D389-D394. [PMID: 36399505 PMCID: PMC9825578 DOI: 10.1093/nar/gkac1022] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 09/15/2022] [Revised: 10/17/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022] Open
Abstract
The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 535 reference species, expands functional annotations, and implements new functionality. In total, eggNOG 6.0 provides a hierarchy of over 17M orthologous groups (OGs) computed at 1601 taxonomic levels, spanning 10 756 bacterial, 457 archaeal and 1322 eukaryotic organisms. OGs have been thoroughly annotated using recent knowledge from functional databases, including KEGG, Gene Ontology, UniProtKB, BiGG, CAZy, CARD, PFAM and SMART. eggNOG also offers phylogenetic trees for all OGs, maximising utility and versatility for end users while allowing researchers to investigate the evolutionary history of speciation and duplication events as well as the phylogenetic distribution of functional terms within each OG. Furthermore, the eggNOG 6.0 website contains new functionality to mine orthology and functional data with ease, including the possibility of generating phylogenetic profiles for multiple OGs across species or identifying single-copy OGs at custom taxonomic levels. eggNOG 6.0 is available at http://eggnog6.embl.de.
Affiliation(s)
- Ana Hernández-Plaza
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Carlos P Cantalapiedra
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain; Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid 28040, Spain
| | - Daniel R Mende
- Department of Medical Microbiology, Amsterdam University Medical Centers, Amsterdam, The Netherlands
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Thomas Rattei
- University of Vienna, Centre for Microbiology and Environmental Systems Science, Djerassiplatz 11030, Vienna, Austria
| | - Ivica Letunic
- Biobyte solutions GmbH, Bothestr. 142, 69117 Heidelberg, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Peer Bork
- Correspondence may also be addressed to Peer Bork. Tel: +49 62213878526
| | - Christian von Mering
- Correspondence may also be addressed to Christian von Mering. Tel: +41 446353147
| |
|
15
|
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2022; 51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Citation(s) in RCA: 1296] [Impact Index Per Article: 648.0] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022] Open
Abstract
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions as well as functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Farrokh Mehryary
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Sampo Pyysalo
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Peer Bork
- Correspondence may also be addressed to Peer Bork. Tel: +49 6221 387 8526; Fax: +49 6221 387 517
| | - Lars J Jensen
- Correspondence may also be addressed to Lars J. Jensen. Tel: +45 3 532 5025
| | - Christian von Mering
- To whom correspondence should be addressed. Tel: +41 44 6353147; Fax: +41 44 6356864
| |
|
16
|
Mihelčić M. Redescription mining on data with background network information. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
17
|
Miller D, Stern A, Burstein D. Deciphering microbial gene function using natural language processing. Nat Commun 2022; 13:5731. [PMID: 36175448 PMCID: PMC9523054 DOI: 10.1038/s41467-022-33397-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/18/2022] [Accepted: 09/16/2022] [Indexed: 11/08/2022] Open
Abstract
Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model "gene semantics" based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find that out of 1369 genes associated with recently discovered defense systems, 98% are inferred correctly. We then systematically evaluate the "discovery potential" of different functional categories, pinpointing those with the most genes yet to be characterized. Finally, we demonstrate our method's ability to discover systems associated with microbial interaction and defense. Our results highlight that combining microbial genomics and language models is a promising avenue for revealing gene functions in microbes.
Affiliation(s)
- Danielle Miller
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, 6997801, Israel
| | - Adi Stern
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, 6997801, Israel
| | - David Burstein
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, 6997801, Israel
| |
|
18
|
Liu C, Kenney T, Beiko RG, Gu H. The Community Coevolution Model with Application to the Study of Evolutionary Relationships between Genes based on Phylogenetic Profiles. Syst Biol 2022:6651862. [PMID: 35904761 DOI: 10.1093/sysbio/syac052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Revised: 07/15/2022] [Accepted: 07/19/2022] [Indexed: 11/13/2022] Open
Abstract
Organismal traits can evolve in a coordinated way, with correlated patterns of gains and losses reflecting important evolutionary associations. Discovering these associations can reveal important information about the functional and ecological linkages among traits. Phylogenetic profiles treat individual genes as traits distributed across sets of genomes and can provide a fine-grained view of the genetic underpinnings of evolutionary processes in a set of genomes. Phylogenetic profiling has been used to identify genes that are functionally linked, and to identify common patterns of lateral gene transfer in microorganisms. However, comparative analysis of phylogenetic profiles and other trait distributions should take into account the phylogenetic relationships among the organisms under consideration. Here we propose the Community Coevolution Model (CCM), a new coevolutionary model to analyze the evolutionary associations among traits, with a focus on phylogenetic profiles. In the CCM, traits are considered to evolve as a community with interactions, and the transition rate for each trait depends on the current states of other traits. Surpassing other comparative methods for pairwise trait analysis, CCM has the additional advantage of being able to examine multiple traits as a community to reveal more dependency relationships. We also develop a simulation procedure to generate phylogenetic profiles with correlated evolutionary patterns that can be used as benchmark data for evaluation purposes. A simulation study demonstrates that CCM is more accurate than other methods including the Jaccard Index and three tree-aware methods. The parameterization of CCM makes the interpretation of the relations between genes more direct, which leads to Darwin's scenario being identified easily based on the estimated parameters. We show that CCM is more efficient and fits real data better than other methods resulting in higher likelihood scores with fewer parameters. 
An examination of 3786 phylogenetic profiles across a set of 659 bacterial genomes highlights linkages between genes with common functions, including many patterns that would not have been identified under a non-phylogenetic model of common distribution. We also applied the CCM to 44 proteins in the well-studied Mitochondrial Respiratory Complex I and recovered associations that mapped well onto the structural associations that exist in the complex.
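As a point of reference for the comparison above, the Jaccard Index baseline can be sketched in a few lines of Python. This is not the CCM itself, only the simple non-phylogenetic score that the abstract names as a comparison method; the function name and the toy presence/absence profiles are hypothetical and chosen purely for illustration.

```python
def jaccard_index(profile_a, profile_b):
    """Jaccard index between two binary presence/absence profiles.

    Each profile lists, genome by genome, whether the gene is present (1)
    or absent (0). The score ignores the phylogeny relating the genomes.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two genes absent from every genome share no evidence; score them 0.
    return both / either if either else 0.0


# Hypothetical profiles over six genomes (illustrative data only).
gene_x = [1, 1, 0, 1, 0, 0]
gene_y = [1, 1, 0, 0, 0, 1]
gene_z = [0, 0, 1, 0, 1, 0]

print(jaccard_index(gene_x, gene_y))  # 0.5
print(jaccard_index(gene_x, gene_z))  # 0.0
```

Because the score treats every genome as an independent observation, closely related genomes that share a gene by descent inflate it, which is the limitation the tree-aware methods and the CCM are meant to address.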
Affiliation(s)
- Chaoyue Liu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada; Faculty of Computer Science, Dalhousie University, Halifax, B3H 4R2, Canada
| | - Toby Kenney
- Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
| | - Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, B3H 4R2, Canada
| | - Hong Gu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
| |
|
19
|
Pazos Obregón F, Silvera D, Soto P, Yankilevich P, Guerberoff G, Cantera R. Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning. Sci Rep 2022; 12:11655. [PMID: 35803984 PMCID: PMC9270439 DOI: 10.1038/s41598-022-15329-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/23/2021] [Accepted: 06/22/2022] [Indexed: 12/13/2022] Open
Abstract
The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence derived features, protein structure and interaction data. Even though there is ample evidence showing that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subjected to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models exclusively trained with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best performing method to automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from Biological Process and Cellular Component Ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.
Affiliation(s)
- Flavio Pazos Obregón
- Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay; Unidad de Bioquímica y Proteómica Analíticas, Instituto Pasteur de Montevideo, Montevideo, Uruguay
| | - Diego Silvera
- Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
| | - Pablo Soto
- Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
| | - Patricio Yankilevich
- Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA), CONICET-Partner Institute of the Max Planck Society, Buenos Aires, Argentina
| | - Gustavo Guerberoff
- Instituto de Matemática y Estadística "Prof. Ing. Rafael Laguardia", Facultad de Ingeniería, UDELAR, Montevideo, Uruguay
| | - Rafael Cantera
- Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay
| |
|
20
|
Botas J, Rodríguez Del Río Á, Giner-Lamia J, Huerta-Cepas J. GeCoViz: genomic context visualisation of prokaryotic genes from a functional and evolutionary perspective. Nucleic Acids Res 2022; 50:W352-W357. [PMID: 35639770 PMCID: PMC9252766 DOI: 10.1093/nar/gkac367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 04/14/2022] [Accepted: 05/05/2022] [Indexed: 11/14/2022] Open
Abstract
Synteny conservation analysis is a well-established methodology to investigate the potential functional role of unknown prokaryotic genes. However, bioinformatic tools to reconstruct and visualise genomic contexts usually depend on slow computations, are restricted to narrow taxonomic ranges, and/or do not allow for the functional and interactive exploration of neighbouring genes across different species. Here, we present GeCoViz, an online resource built upon 12 221 reference prokaryotic genomes that provides fast and interactive visualisation of custom genomic regions anchored by any target gene, which can be sought by either name, orthologous group (KEGGs, eggNOGs), protein domain (PFAM) or sequence. To facilitate functional and evolutionary interpretation, GeCoViz allows to customise the taxonomic scope of each analysis and provides comprehensive annotations of the neighbouring genes. Interactive visualisation options include, among others, the scaled representations of gene lengths and genomic distances, and on the fly calculation of synteny conservation of neighbouring genes, which can be highlighted based on custom thresholds. The resulting plots can be downloaded as high-quality images for publishing purposes. Overall, GeCoViz offers an easy-to-use, comprehensive, fast and interactive web-based tool for investigating the genomic context of prokaryotic genes, and is freely available at https://gecoviz.cgmlab.org.
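The idea of scoring synteny conservation for neighbouring genes and highlighting those above a custom threshold can be illustrated with a minimal Python sketch. The abstract does not specify how GeCoViz computes this score, so the definition used here is an assumption: a neighbouring gene family's conservation is the fraction of anchor-containing genomes in which that family lies within a fixed number of genes of the anchor. The function name, family labels, and gene orders are all hypothetical.

```python
from collections import Counter


def synteny_conservation(contexts, anchor, window=2):
    """Score each neighbouring family by how often it sits near the anchor.

    contexts maps a genome name to its ordered list of gene-family labels.
    Assumptions of this sketch: linear gene order, strand ignored, and only
    the first occurrence of the anchor in each genome is considered.
    """
    counts = Counter()
    genomes_with_anchor = 0
    for genes in contexts.values():
        if anchor not in genes:
            continue
        genomes_with_anchor += 1
        i = genes.index(anchor)
        # Families within `window` positions on either side of the anchor.
        neighbours = set(genes[max(0, i - window): i + window + 1]) - {anchor}
        counts.update(neighbours)
    if genomes_with_anchor == 0:
        return {}
    return {fam: n / genomes_with_anchor for fam, n in counts.items()}


# Hypothetical gene orders around an anchor gene in three genomes.
contexts = {
    "genome_1": ["famA", "famB", "anchor", "famC", "famD"],
    "genome_2": ["famB", "anchor", "famC", "famE"],
    "genome_3": ["famF", "anchor", "famB", "famG"],
}

scores = synteny_conservation(contexts, "anchor", window=1)
# Keep only neighbours conserved in at least 60% of the genomes.
conserved = {fam for fam, score in scores.items() if score >= 0.6}
print(sorted(conserved))  # ['famB', 'famC']
```

Restricting the input to genomes from one clade before scoring would correspond to narrowing the taxonomic scope of the analysis.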
Affiliation(s)
- Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
| | - Álvaro Rodríguez Del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
| | - Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain; Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, 28040, Spain
| | - Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, 28223, Spain
| |
|
21
|
Ji F, Bonilla G, Krykbaev R, Ruvkun G, Tabach Y, Sadreyev RI. DEPCOD: a tool to detect and visualize co-evolution of protein domains. Nucleic Acids Res 2022; 50:W246-W253. [PMID: 35536332 PMCID: PMC9252791 DOI: 10.1093/nar/gkac349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/26/2022] [Indexed: 11/14/2022] Open
Abstract
Proteins with similar phylogenetic patterns of conservation or loss across evolutionary taxa are strong candidates to work in the same cellular pathways or engage in physical or functional interactions. Our previously published tools implemented our method of normalized phylogenetic sequence profiling to detect functional associations between non-homologous proteins. However, many proteins consist of multiple protein domains subjected to different selective pressures, so using protein domain as the unit of analysis improves the detection of similar phylogenetic patterns. Here we analyze sequence conservation patterns across the whole tree of life for every protein domain from a set of widely studied organisms. The resulting new interactive webserver, DEPCOD (DEtection of Phylogenetically COrrelated Domains), performs searches with either a selected pre-defined protein domain or a user-supplied sequence as a query to detect other domains from the same organism that have similar conservation patterns. Top similarities on two evolutionary scales (the whole tree of life or eukaryotic genomes) are displayed along with known protein interactions and shared complexes, pathway enrichment among the hits, and detailed visualization of sources of detected similarities. DEPCOD reveals functional relationships between often non-homologous domains that could not be detected using whole-protein sequences. The web server is accessible at http://genetics.mgh.harvard.edu/DEPCOD.
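The idea of comparing phylogenetic patterns, as described in the abstract above, can be sketched in a few lines of Python. This is a simplified stand-in, assuming binary presence/absence profiles and plain Pearson correlation; DEPCOD's normalized phylogenetic profiling uses its own scoring, which is not reproduced here. The domain names and profiles are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / sqrt(var_x * var_y)


def rank_similar(query, profiles):
    """Rank all other domains by profile correlation with the query."""
    q = profiles[query]
    scores = [
        (name, pearson(q, profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical presence (1) / absence (0) of four domains in six genomes.
profiles = {
    "domA": [1, 1, 0, 0, 1, 0],
    "domB": [1, 1, 0, 0, 1, 0],  # same pattern as domA
    "domC": [0, 0, 1, 1, 0, 1],  # exact complement of domA
    "domD": [1, 0, 1, 0, 1, 1],  # unrelated pattern
}

ranking = rank_similar("domA", profiles)
```

Here the domain with the identical conservation pattern ranks first and the complementary one last, which is the behaviour a profile-based search relies on.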
Affiliation(s)
- Fei Ji
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA.,Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Gracia Bonilla
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA.,Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Rustem Krykbaev
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
| | - Gary Ruvkun
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA.,Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Yuval Tabach
- Department of Developmental Biology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem 9112102, Israel
| | - Ruslan I Sadreyev
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA.,Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| |
|
22
|
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
|
23
|
Chavez JD, Park SG, Mohr JP, Bruce JE. Applications and advancements of FT-ICR-MS for interactome studies. MASS SPECTROMETRY REVIEWS 2022; 41:248-261. [PMID: 33289940 PMCID: PMC8184889 DOI: 10.1002/mas.21675] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/12/2020] [Revised: 10/16/2020] [Accepted: 10/16/2020] [Indexed: 05/05/2023]
Abstract
The set of all intra- and intermolecular interactions, collectively known as the interactome, is currently an unmet challenge for any analytical method, but if measured, could provide unparalleled insight on molecular function in living systems. Developments and applications of chemical cross-linking and high-performance mass spectrometry technologies are beginning to reveal details on how proteins interact in cells and how protein conformations and interactions inside cells change with phenotype or during drug treatment or other perturbations. A major contributor to these advances is Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) technology and its implementation with accurate mass measurements on cross-linked peptide-pair precursor and fragment ions to enable improved identification methods. However, these applications place increased demands on mass spectrometer performance in terms of high-resolution spectral acquisition rates for on-line MSn experiments. Moreover, FT-ICR-MS also offers unique opportunities to develop and implement parallel ICR cells for multiplexed signal acquisition and the potential to greatly advance accurate mass acquisition rates for interactome studies. This review highlights our efforts to exploit accurate mass FT-ICR-MS technologies with chemical cross-linking and developments being pursued to realize parallel MS array capabilities that will further advance visualization of the interactome.
Affiliation(s)
- Juan D. Chavez
- Department of Genome Sciences, University of Washington, Seattle, WA 98109
| | - Sung-Gun Park
- Department of Genome Sciences, University of Washington, Seattle, WA 98109
| | - Jared P. Mohr
- Department of Genome Sciences, University of Washington, Seattle, WA 98109
| | - James E. Bruce
- Department of Genome Sciences, University of Washington, Seattle, WA 98109
| |
|
24
|
Network Pharmacology- and Molecular Docking-Based Identification of Potential Phytocompounds from Argyreia capitiformis in the Treatment of Inflammation. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2022; 2022:8037488. [PMID: 35140801 PMCID: PMC8820870 DOI: 10.1155/2022/8037488] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/18/2021] [Revised: 01/03/2022] [Accepted: 01/15/2022] [Indexed: 12/16/2022]
Abstract
The methanolic extract of Argyreia capitiformis stem was examined for anti-inflammatory activities following network pharmacology analysis and molecular docking study. Based on gas chromatography-mass spectrometry (GC-MS) analysis, 49 compounds were identified from the methanolic extract of A. capitiformis stem. A network pharmacology analysis was conducted against the identified compounds, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and Gene Ontology analysis of biological processes and molecular functions were performed. Six proteins (IL1R1, IRAK4, MYD88, TIRAP, TLR4, and TRAF6) were identified from the KEGG pathway analysis and subjected to molecular docking study. Additionally, six best ligand efficiency compounds and positive control (aspirin) from each protein were evaluated for their stability using the molecular dynamics simulation study. Our study suggested that IL1R1, IRAK4, MYD88, TIRAP, TLR4, and TRAF6 proteins may be targeted by compounds in the methanolic extract of A. capitiformis stem to provide anti-inflammatory effects.
|
25
|
Elhabashy H, Merino F, Alva V, Kohlbacher O, Lupas AN. Exploring protein-protein interactions at the proteome level. Structure 2022; 30:462-475. [DOI: 10.1016/j.str.2022.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Revised: 10/26/2021] [Accepted: 02/02/2022] [Indexed: 02/08/2023]
|
26
|
Tsoy O, Mushegian A. Florigen and its homologs of FT/CETS/PEBP/RKIP/YbhB family may be the enzymes of small molecule metabolism: review of the evidence. BMC PLANT BIOLOGY 2022; 22:56. [PMID: 35086479 PMCID: PMC8793217 DOI: 10.1186/s12870-022-03432-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Flowering signals are sensed in plant leaves and transmitted to the shoot apical meristems, where the formation of flowers is initiated. Searches for a diffusible hormone-like signaling entity ("florigen") went on for many decades, until a product of plant gene FT was identified as the key component of florigen in the 1990s, based on the analysis of mutants, genetic complementation evidence, and protein and RNA localization studies. Sequence homologs of FT protein are found throughout prokaryotes and eukaryotes; some eukaryotic family members appear to bind phospholipids or interact with the components of the signal transduction cascades. Most FT homologs are known to share a constellation of five charged residues, three of which, i.e., two histidines and an aspartic acid, are located at the rim of a well-defined cavity on the protein surface. RESULTS We studied molecular features of the FT homologs in prokaryotes and analyzed their genome context, to find tentative evidence connecting the bacterial FT homologs with small molecule metabolism, often involving substrates that contain sugar or ribonucleoside moieties. We argue that the unifying feature of this protein family, i.e., a set of charged residues conserved at the sequence and structural levels, is more likely to be an enzymatic active center than a catalytically inert ligand-binding site. CONCLUSIONS We propose that most FT-related proteins are enzymes operating on small diffusible molecules. Those metabolites may constitute an overlooked essential ingredient of the florigen signal.
Affiliation(s)
- Olga Tsoy
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich (TUM), 3, Maximus-von-Imhof-Forum, 85354, Freising, Germany
- Current address: Chair of Computational Systems Biology, University of Hamburg, Notkestrasse, 9, 22607, Hamburg, Germany
| | - Arcady Mushegian
- Molecular and Cellular Biology Division, National Science Foundation, 2415 Eisenhower Avenue, Alexandria, Virginia, 22314, USA.
- Clare Hall College, University of Cambridge, Cambridge, CB3 9AL, UK.
| |
|
27
|
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
|
28
|
Abstract
Rhodopsins are light-activated proteins displaying an enormous versatility of function as cation/anion pumps or sensing environmental stimuli and are widely distributed across all domains of life. Even with wide sequence divergence and uncertain evolutionary linkages between microbial (type 1) and animal (type 2) rhodopsins, the membrane orientation of the core structural scaffold of both was presumed universal. This was recently amended through the discovery of heliorhodopsins (HeRs; type 3), that, in contrast to known rhodopsins, display an inverted membrane topology and yet retain similarities in sequence, structure, and the light-activated response. While no ion-pumping activity has been demonstrated for HeRs and multiple crystal structures are available, fundamental questions regarding their cellular and ecological function or even their taxonomic distribution remain unresolved. Here, we investigated HeR function and distribution using genomic/metagenomic data with protein domain fusions, contextual genomic information, and gene coexpression analysis with strand-specific metatranscriptomics. We bring to resolution the debated monoderm/diderm occurrence patterns and show that HeRs are restricted to monoderms. Moreover, we provide compelling evidence that HeRs are a novel type of sensory rhodopsins linked to histidine kinases and other two-component system genes across phyla. In addition, we also describe two novel putative signal-transducing domains fused to some HeRs. We posit that HeRs likely function as generalized light-dependent switches involved in the mitigation of light-induced oxidative stress and metabolic circuitry regulation. Their role as sensory rhodopsins is corroborated by their photocycle dynamics and their presence/function in monoderms is likely connected to the higher sensitivity of these organisms to light-induced damage. 
IMPORTANCE Heliorhodopsins are enigmatic, novel rhodopsins with a membrane orientation that is opposite to all known rhodopsins. However, their cellular and ecological functions are unknown, and even their taxonomic distribution remains a subject of debate. We provide evidence that HeRs are a novel type of sensory rhodopsins linked to histidine kinases and other two-component system genes across phyla boundaries. In support of this, we also identify two novel putative signal transducing domains in HeRs that are fused with them. We also observe linkages of HeRs to genes involved in mitigation of light-induced oxidative stress and increased carbon and nitrogen metabolism. Finally, we synthesize these findings into a framework that connects HeRs with the cellular response to light in monoderms, activating light-induced oxidative stress defenses along with carbon/nitrogen metabolic circuitries. These findings are consistent with the evolutionary, taxonomic, structural, and genomic data available so far.
|
29
|
Filho JAF, Rosolen RR, Almeida DA, de Azevedo PHC, Motta MLL, Aono AH, dos Santos CA, Horta MAC, de Souza AP. Trends in biological data integration for the selection of enzymes and transcription factors related to cellulose and hemicellulose degradation in fungi. 3 Biotech 2021; 11:475. [PMID: 34777932 PMCID: PMC8548487 DOI: 10.1007/s13205-021-03032-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/11/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022]
Abstract
Fungi are key players in biotechnological applications. Although several studies focusing on fungal diversity and genetics have been performed, many details of fungal biology remain unknown, including how cellulolytic enzymes are modulated within these organisms to allow changes in main plant cell wall compounds, cellulose and hemicellulose, and subsequent biomass conversion. With the advent and consolidation of DNA/RNA sequencing technology, different types of information can be generated at the genomic, structural and functional levels, including the gene expression profiles and regulatory mechanisms of these organisms, during degradation-induced conditions. This increase in data generation made rapid computational development necessary to deal with the large amounts of data generated. In this context, the origination of bioinformatics, a hybrid science integrating biological data with various techniques for information storage, distribution and analysis, was a fundamental step toward the current state-of-the-art in the postgenomic era. The possibility of integrating biological big data has facilitated exciting discoveries, including identifying novel mechanisms and more efficient enzymes, increasing yields, reducing costs and expanding opportunities in the bioprocess field. In this review, we summarize the current status and trends of the integration of different types of biological data through bioinformatics approaches for biological data analysis and enzyme selection.
Affiliation(s)
- Jaire A. Ferreira Filho
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Rafaela R. Rosolen
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Deborah A. Almeida
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Paulo Henrique C. de Azevedo
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Maria Lorenza L. Motta
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Alexandre H. Aono
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
| | - Clelton A. dos Santos
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Brazilian Biorenewables National Laboratory (LNBR), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, SP Brazil
| | - Maria Augusta C. Horta
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Faculty of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP Brazil
| | - Anete P. de Souza
- Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, SP Brazil
- Department of Plant Biology, Institute of Biology, UNICAMP, Universidade Estadual de Campinas, Campinas, SP 13083-875 Brazil
| |
|
30
|
Fang Y, Li M, Li X, Yang Y. GFICLEE: ultrafast tree-based phylogenetic profile method inferring gene function at the genomic-wide level. BMC Genomics 2021; 22:774. [PMID: 34715785 PMCID: PMC8557005 DOI: 10.1186/s12864-021-08070-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 10/10/2021] [Indexed: 11/25/2022]
Abstract
Background Phylogenetic profiling is widely used to predict novel members of large protein complexes and biological pathways. Although methods combined with phylogenetic trees have significantly improved prediction accuracy, computational efficiency is still an issue that limits their genome-wide application. Results Here we introduce a new tree-based phylogenetic profiling algorithm named GFICLEE, which infers common single and continuous loss (SCL) events in the evolutionary patterns. We validated our algorithm with human pathways from three databases and compared its computational efficiency with that of current tree-based methods on genome datasets of 10 different scales. Our algorithm has better predictive performance with high computational efficiency. Conclusions GFICLEE is a new method to infer genome-wide gene function. The accuracy and computational efficiency of GFICLEE make it possible to explore gene functions at the genome-wide level on a personal computer. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08070-7.
Affiliation(s)
- Yang Fang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, People's Republic of China
| | - Xufeng Li
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
| | - Yi Yang
- Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China.
| |
|
31
|
Stable-Isotope-Informed, Genome-Resolved Metagenomics Uncovers Potential Cross-Kingdom Interactions in Rhizosphere Soil. mSphere 2021; 6:e0008521. [PMID: 34468166 PMCID: PMC8550312 DOI: 10.1128/msphere.00085-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/11/2022]
Abstract
The functioning, health, and productivity of soil are intimately tied to a complex network of interactions, particularly in plant root-associated rhizosphere soil. We conducted a stable-isotope-informed, genome-resolved metagenomic study to trace carbon from Avena fatua grown in a 13CO2 atmosphere into soil. We collected paired rhizosphere and nonrhizosphere soil at 6 and 9 weeks of plant growth and extracted DNA that was then separated by density using ultracentrifugation. Thirty-two fractions from each of five samples were grouped by density, sequenced, assembled, and binned to generate 55 unique bacterial genomes that were ≥70% complete. We also identified complete 18S rRNA sequences of several 13C-enriched microeukaryotic bacterivores and fungi. We generated 10 circularized bacteriophage (phage) genomes, some of which were the most labeled entities in the rhizosphere, suggesting that phage may be important agents of turnover of plant-derived C in soil. CRISPR locus targeting connected one of these phage to a Burkholderiales host predicted to be a plant pathogen. Another highly labeled phage is predicted to replicate in a Catenulispora sp., a possible plant growth-promoting bacterium. We searched the genome bins for traits known to be used in interactions involving bacteria, microeukaryotes, and plant roots and found DNA from heavily 13C-labeled bacterial genes thought to be involved in modulating plant signaling hormones, plant pathogenicity, and defense against microeukaryote grazing. Stable-isotope-informed, genome-resolved metagenomics indicated that phage can be important agents of turnover of plant-derived carbon in soil. IMPORTANCE Plants grow in intimate association with soil microbial communities; these microbes can facilitate the availability of essential resources to plants. Thus, plant productivity commonly depends on interactions with rhizosphere bacteria, viruses, and eukaryotes. Our work is significant because we identified the organisms that took up plant-derived organic C in rhizosphere soil and determined that many of the active bacteria are plant pathogens or can impact plant growth via hormone modulation. Further, by showing that bacteriophage accumulate CO2-derived carbon, we demonstrated their vital roles in redistribution of plant-derived C into the soil environment through bacterial cell lysis. The use of stable-isotope probing (SIP) to identify consumption (or lack thereof) of root-derived C by key microbial community members within highly complex microbial communities opens the way for assessing manipulations of bacteria and phage with potentially beneficial and detrimental traits, ultimately providing a path to improved plant health and soil carbon storage.
|
32
|
Finding functional associations between prokaryotic virus orthologous groups: a proof of concept. BMC Bioinformatics 2021; 22:438. [PMID: 34525942 PMCID: PMC8442406 DOI: 10.1186/s12859-021-04343-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2021] [Accepted: 08/27/2021] [Indexed: 02/02/2023]
Abstract
Background The field of viromics has greatly benefited from recent developments in metagenomics, with significant efforts focusing on viral discovery. However, functional annotation of the increasing number of viral genomes is lagging behind. This is highlighted by the degree of annotation of the protein clusters in the prokaryotic Virus Orthologous Groups (pVOGs) database, with 83% of its current 9518 pVOGs having an unknown function. Results In this study we describe a machine learning approach to explore potential functional associations between pVOGs. We measure seven genomic features and use them as input to a Random Forest classifier to predict protein–protein interactions between pairs of pVOGs. After systematic evaluation of the model’s performance on 10 different datasets, we obtained a predictor with a mean accuracy of 0.77 and an area under the receiver operating characteristic curve (AUROC) of 0.83. Its application to a set of 2,133,027 pVOG-pVOG interactions allowed us to predict 267,265 putative interactions with a reported probability greater than 0.65. At an expected false discovery rate of 0.27, we placed 95.6% of the previously unannotated pVOGs in a functional context, by predicting their interaction with a pVOG that is functionally annotated. Conclusions We believe that this proof-of-concept methodology, wrapped in a reproducible and automated workflow, can represent a significant step towards obtaining a more complete picture of bacteriophage biology. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04343-w.
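To put the figures quoted in the abstract above into proportion, the short Python sketch below works through the arithmetic: the share of candidate pairs that pass the probability threshold, and how many of the predicted interactions would be expected to be false at the stated false discovery rate. Only the three input numbers come from the abstract; the function name and the output keys are hypothetical.

```python
def summarise_predictions(n_candidates, n_predicted, expected_fdr):
    """Derive simple summary figures from a thresholded prediction set."""
    fraction_predicted = n_predicted / n_candidates
    # By definition of the false discovery rate, this share of the
    # predicted interactions is expected to be false positives.
    expected_false = expected_fdr * n_predicted
    expected_true = n_predicted - expected_false
    return {
        "fraction_predicted": fraction_predicted,
        "expected_false": expected_false,
        "expected_true": expected_true,
    }


# Figures from the abstract: 2,133,027 candidate pairs, 267,265 predicted
# at probability > 0.65, expected false discovery rate 0.27.
summary = summarise_predictions(2_133_027, 267_265, 0.27)
```

With these inputs roughly one in eight candidate pairs passes the threshold, and about 72,000 of the predicted interactions would be expected to be false positives.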
|
33
|
Zhang D, Kabuka MR. Protein Family Classification from Scratch: A CNN Based Deep Learning Approach. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1996-2007. [PMID: 31944984 DOI: 10.1109/tcbb.2020.2966633] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Next-generation sequencing techniques provide us with an opportunity for generating sequenced proteins and identifying the biological families and functions of these proteins. However, compared with identified proteins, uncharacterized proteins make up a notable percentage of the overall proteins in the bioinformatics research field. Traditional family classification methods often devote themselves to extracting N-Gram features from sequences while ignoring motif information as well as affinity information between motifs and adjacent amino acids. Previous clustering-based algorithms have typically been used to define protein features with domain knowledge and annotate protein families based on extensive data samples. In this paper, we apply CNN-based amino acid representation learning with limited characterized proteins to explore the performance of annotated protein families by taking into account the amino acid location information. Additionally, we apply the method to all reviewed protein sequences with their families retrieved from the UniProt database to evaluate our approach. Last but not least, we verify our model using unreviewed protein records, which are typically ignored by other methods.
|
34
|
Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021; 61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The vast majority of proteins are annotated based on sequence conservation with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which causes individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing that is inherently slow in current biological research.
Affiliation(s)
- Kimberly A Reynolds
- The Green Center for Systems Biology and the Department of Biophysics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Eduardo Rosa-Molinar
- Department of Pharmacology & Toxicology, The University of Kansas, Lawrence, KS 66047, USA
| | - Robert E Ward
- Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Hongbin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
| | - Breeanna R Urbanowicz
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
| | - A Mark Settles
- Bioengineering Branch, NASA Ames Research Center, Moffett Field, CA USA
| |
|
35
|
Pathogenic Determinants of the Mycobacterium kansasii Complex: An Unsuspected Role for Distributive Conjugal Transfer. Microorganisms 2021; 9:microorganisms9020348. [PMID: 33578772 PMCID: PMC7916490 DOI: 10.3390/microorganisms9020348] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/14/2021] [Revised: 02/02/2021] [Accepted: 02/05/2021] [Indexed: 01/15/2023]
Abstract
The Mycobacterium kansasii species comprises six subtypes that were recently classified into six closely related species: Mycobacterium kansasii (formerly M. kansasii subtype 1), Mycobacterium persicum (subtype 2), Mycobacterium pseudokansasii (subtype 3), Mycobacterium ostraviense (subtype 4), Mycobacterium innocens (subtype 5) and Mycobacterium attenuatum (subtype 6). Together with Mycobacterium gastri, they form the M. kansasii complex. M. kansasii is the most frequent and most pathogenic species of the complex. M. persicum is classically associated with diseases in immunosuppressed patients, and the other species are mostly colonizers, and are only very rarely reported in ill patients. Comparative genomics was used to assess the genetic determinants leading to the pathogenicity of members of the M. kansasii complex. The genomes of 51 isolates collected from patients with and without disease were sequenced and compared with 24 publicly available genomes. The pathogenicity of each isolate was determined based on the clinical records or public metadata. A comparative genomic analysis showed that all M. persicum, M. ostraviense, M. innocens and M. gastri isolates lacked the ESX-1-associated EspACD locus that is thought to play a crucial role in the pathogenicity of M. tuberculosis and other non-tuberculous mycobacteria. Furthermore, M. kansasii was the only species exhibiting a 25-kb genomic island encoding 17 type-VII secretion system-associated proteins. Finally, a genome-wide association analysis revealed that two consecutive genes encoding a hemerythrin-like protein and a nitroreductase-like protein were significantly associated with pathogenicity. These two genes may be involved in the resistance to reactive oxygen and nitrogen species, a required mechanism for the intracellular survival of bacteria. Three non-pathogenic M. kansasii lacked these genes, likely due to two distinct distributive conjugal transfers (DCTs) between M. attenuatum and M. kansasii, and one DCT between M. persicum and M. kansasii. To our knowledge, this is the first study linking DCT to reduced pathogenicity.
|
36
|
Ding D, Wu M, Liu Y. Genome-scale mutant fitness reveals versatile c-type cytochromes in Shewanella oneidensis MR-1. Mol Omics 2021; 17:288-295. [PMID: 33554980 DOI: 10.1039/d0mo00107d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
Abstract
Shewanella has been widely investigated for its metabolic versatility and use of a large number of extracellular electron acceptors. Many c-type cytochromes are responsible for this diversity, mainly in condition-specific fashions. By using genome-scale mutant fitness data, we studied which genes (particularly c-type cytochromes) were used to coordinate various electron transfer processes in the present work. First, by integrating fitness profiles with protein-protein interaction (PPI) networks, we showed that the genes with a high total fitness value were generally more important in PPI networks than those with low fitness values. Then, we identified genes that are important across many experiments, and further fitness analysis confirmed five versatile c-type cytochromes: ScyA (SO0264), PetC (SO0610), CcoP (SO2361), CcoO (SO2363) and CytcB (SO4666), which are considered to be crucial in most experimental conditions. Finally, we demonstrated a mediating role in the periplasm for the less-reported CytcB by combining protein structure, subcellular localization and disordered region analysis. Comparative genome analysis further revealed that it is distinctive in Shewanella species. Collectively, these results suggest that periplasmic electron transfer processes are more diverse and flexible than previously reported, giving insight for further experimental studies of Shewanella oneidensis MR-1.
Affiliation(s)
- Dewu Ding
- School of Mathematics and Computer Science, Yichun University, Yichun, 336000, P. R. China.
| | | | | |
|
37
|
Sibbald SJ, Lawton M, Archibald JM. Mitochondrial Genome Evolution in Pelagophyte Algae. Genome Biol Evol 2021; 13:6126422. [PMID: 33675661 PMCID: PMC7936722 DOI: 10.1093/gbe/evab018] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 01/27/2021] [Indexed: 11/19/2022] Open
Abstract
The Pelagophyceae are marine stramenopile algae that include Aureoumbra lagunensis and Aureococcus anophagefferens, two microbial species notorious for causing harmful algal blooms. Despite their ecological significance, relatively few genomic studies of pelagophytes have been carried out. To improve understanding of the biology and evolution of pelagophyte algae, we sequenced complete mitochondrial genomes for A. lagunensis (CCMP1510), Pelagomonas calceolata (CCMP1756), and five strains of Aureoc. anophagefferens (CCMP1707, CCMP1708, CCMP1850, CCMP1984, and CCMP3368) using Nanopore long-read sequencing. All pelagophyte mitochondrial genomes assembled into single, circular mapping contigs between 39,376 bp (P. calceolata) and 55,968 bp (A. lagunensis) in size. Mitochondrial genomes for the five Aureoc. anophagefferens strains varied slightly in length (42,401–42,621 bp) and were 99.4–100.0% identical. Gene content and order were highly conserved between the Aureoc. anophagefferens and P. calceolata genomes, with the only major difference being a unique region in Aureoc. anophagefferens containing DNA adenine and cytosine methyltransferase (dam/dcm) genes that appear to be the product of lateral gene transfer from a prokaryotic or viral donor. Although the A. lagunensis mitochondrial genome shares seven distinct syntenic blocks with the other pelagophyte genomes, it has a tandem repeat expansion comprising ∼40% of its length, and lacks identifiable rps19 and glycine tRNA genes. Laterally acquired self-splicing introns were also found in the 23S rRNA (rnl) gene of P. calceolata and the coxI gene of the five Aureoc. anophagefferens genomes. Overall, these data provide baseline knowledge about the genetic diversity of bloom-forming pelagophytes relative to nonbloom-forming species.
Affiliation(s)
- Shannon J Sibbald
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.,Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Maggie Lawton
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.,Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - John M Archibald
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.,Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| |
|
38
|
Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 2021; 49:D605-D612. [PMID: 33237311 PMCID: PMC7779004 DOI: 10.1093/nar/gkaa1074] [Citation(s) in RCA: 3846] [Impact Index Per Article: 1282.0] [Received: 09/14/2020] [Revised: 10/20/2020] [Accepted: 11/23/2020] [Indexed: 12/19/2022] Open
Abstract
Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Annika L Gable
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Katerina C Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - David Lyon
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Sampo Pyysalo
- TurkuNLP Group, Department of Future Technologies, University of Turku, 20014 Turun Yliopisto, Finland
| | - Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Marc Legeay
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Tao Fang
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany.,Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69117 Heidelberg, Germany.,Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany.,Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Christian von Mering
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| |
|
39
|
Jiang K, Ma Z, Wang Z, Li H, Wang Y, Tian Y, Li D, Liu X. Evolution, Expression Profile, Regulatory Mechanism, and Functional Verification of EBP-Like Gene in Cholesterol Biosynthetic Process in Chickens (Gallus Gallus). Front Genet 2021; 11:587546. [PMID: 33519893 PMCID: PMC7841431 DOI: 10.3389/fgene.2020.587546] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/26/2020] [Accepted: 12/14/2020] [Indexed: 12/30/2022] Open
Abstract
The emopamil binding protein (EBP) is an important enzyme participating in the final steps of cholesterol biosynthesis in mammals. A predicted gene, EBP-like, which encodes a protein with high identity to human EBP, was found in the chicken genome. No regulatory mechanisms or biological functions of EBP-like have been characterized in chickens. In the present study, the coding sequence of EBP-like was cloned, the phylogenetic trees of EBP/EBP-like were constructed and the genomic synteny of EBP-like was analyzed. The regulatory mechanisms of EBP-like were explored with in vivo and in vitro experiments. The biological functions of EBP-like in liver cholesterol biosynthesis were examined by using gain- or loss-of-function strategies. The results showed that the chicken EBP-like gene originated from a common ancestor with the Japanese quail EBP gene, and was relatively conserved with the EBP gene among different species. The EBP-like gene was highly expressed in liver, its expression level was significantly increased in the peak-laying stage, and it was upregulated by estrogen. Inhibition of EBP-like mRNA expression could restrain the expression of EBP-like downstream genes (SC5D, DHCR24, and DHCR7) in the cholesterol synthetic pathway, thereby downregulating the liver intracellular T-CHO level. In conclusion, as a substitute for the EBP gene in chickens, EBP-like plays a vital role in the process of chicken liver cholesterol synthesis. This research provides a basis for revealing the molecular regulatory mechanism of cholesterol synthesis in birds, and contributes insights into the improvement of growth and development, laying performance and egg quality in poultry.
Affiliation(s)
- Keren Jiang
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
| | - Zheng Ma
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
- School of Life Sciences and Engineering, Foshan University, Foshan, China
| | - Zhang Wang
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
| | - Hong Li
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
- Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Zhengzhou, China
- International Joint Research Laboratory for Poultry Breeding of Henan, Zhengzhou, China
| | - Yanbin Wang
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
- Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Zhengzhou, China
- International Joint Research Laboratory for Poultry Breeding of Henan, Zhengzhou, China
| | - Yadong Tian
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
- Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Zhengzhou, China
- International Joint Research Laboratory for Poultry Breeding of Henan, Zhengzhou, China
| | - Donghua Li
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
- Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Zhengzhou, China
- International Joint Research Laboratory for Poultry Breeding of Henan, Zhengzhou, China
| | - Xiaojun Liu
- College of Animal Science, Henan Agricultural University, Zhengzhou, China
- Henan Innovative Engineering Research Center of Poultry Germplasm Resource, Zhengzhou, China
- International Joint Research Laboratory for Poultry Breeding of Henan, Zhengzhou, China
| |
|
40
|
Tremblay BJM, Lobb B, Doxey AC. PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling. Bioinformatics 2021; 37:17-22. [PMID: 33416870 DOI: 10.1093/bioinformatics/btaa1105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/23/2020] [Revised: 12/26/2020] [Accepted: 12/29/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Statistical detection of co-occurring genes across genomes, known as "phylogenetic profiling", is a powerful bioinformatic technique for inferring gene-gene functional associations. However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation, and substantial computational requirements. RESULTS: We introduce PhyloCorrelate, a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154,217,052 comparisons for 28,315 genes across 27,372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM, or KEGG query. In total, PhyloCorrelate detected 29,762 high confidence associations between bacterial gene/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function. AVAILABILITY: PhyloCorrelate is available as a web-server at phylocorrelate.uwaterloo.ca as well as an R package for analysis of custom datasets. We anticipate that PhyloCorrelate will be broadly useful as a tool for predicting function and interactions for gene families. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
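As a rough illustration of the "standard correlation metrics" that phylogenetic profiling relies on, the following minimal Python sketch scores pairs of gene families by the similarity of their presence/absence profiles across genomes. It is not the PhyloCorrelate implementation or its optimized score: the family names, the toy profiles and the helper functions `jaccard`, `pearson` and `rank_pairs` are all hypothetical, and these simple metrics do not correct for phylogenetic structure, which the abstract identifies as a key difficulty.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pearson(a, b):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / sqrt(var_a * var_b)


def rank_pairs(profiles):
    """Score every pair of families and sort by Jaccard index, highest first."""
    scored = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        scored.append((g1, g2, jaccard(p1, p2), pearson(p1, p2)))
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Hypothetical toy data: one column per genome, 1 = family present.
PROFILES = {
    "famA": [1, 1, 0, 1, 0, 0],
    "famB": [1, 1, 0, 1, 0, 0],
    "famC": [0, 0, 1, 0, 1, 1],
    "famD": [1, 0, 1, 1, 0, 1],
}

RANKED = rank_pairs(PROFILES)
```

In this toy set, famA and famB always occur together and rank first, while famA and famC never co-occur. At the scale described in the abstract, an all-by-all comparison of this kind would need vectorized or sparse data structures rather than pure-Python loops.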
Affiliation(s)
| | - Briallen Lobb
- Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
| | - Andrew C Doxey
- Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
| |
|
41
|
Simultaneous feature selection and clustering of micro-array and RNA-sequence gene expression data using multiobjective optimization. INT J MACH LEARN CYB 2020. [DOI: 10.1007/s13042-020-01139-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
|
42
|
Makrodimitris S, van Ham RCHJ, Reinders MJT. Automatic Gene Function Prediction in the 2020's. Genes (Basel) 2020; 11:E1264. [PMID: 33120976 PMCID: PMC7692357 DOI: 10.3390/genes11111264] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/28/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 02/06/2023] Open
Abstract
The current rate at which new DNA and protein sequences are being generated is too fast to experimentally discover the functions of those sequences, emphasizing the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time. However, it is certainly not solved. In this paper, we describe challenges that the AFP field still has to overcome in the future to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotation, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) maximally exploit the GO to obtain performance gains. We also provide recommendations for addressing those challenges, by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that perform the prediction from gene to function. Together, we show that AFP is still a vibrant research area that can benefit from continuing advances in machine learning with which AFP in the 2020s can again take a large step forward reinforcing the power of computational biology.
Affiliation(s)
- Stavros Makrodimitris
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands; (R.C.H.J.v.H.); (M.J.T.R.)
- Keygene N.V., 6708PW Wageningen, The Netherlands
| | - Roeland C. H. J. van Ham
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands; (R.C.H.J.v.H.); (M.J.T.R.)
- Keygene N.V., 6708PW Wageningen, The Netherlands
| | - Marcel J. T. Reinders
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands; (R.C.H.J.v.H.); (M.J.T.R.)
- Leiden Computational Biology Center, Leiden University Medical Center, 2333ZC Leiden, The Netherlands
| |
|
43
|
Sinha S, Lynn AM, Desai DK. Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study. BMC Bioinformatics 2020; 21:466. [PMID: 33076816 PMCID: PMC7574302 DOI: 10.1186/s12859-020-03794-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/23/2019] [Accepted: 10/01/2020] [Indexed: 02/06/2023] Open
Abstract
Background: Homology-based methods are among the most important and widely used approaches for functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. Non-homology methods based on the context and the interactions of a protein are very useful for identifying missing metabolic activities and for functional annotation in the absence of significant sequence similarity. In the current work, we employ both homology and context-based methods, incrementally, to identify local holes and chokepoints whose presence in the Mycobacterium tuberculosis genome is indicated by their interactions with known proteins in a metabolic network context, but which have not been annotated. We have developed two computational procedures using network theory to identify orphan enzymes (‘Hole-finding protocol’) coupled with the identification of candidate proteins for the predicted orphan enzyme (‘Hole-filling protocol’). We propose an integrated interaction score based on scores from the STRING database to identify candidate protein sequences for the orphan enzymes from M. tuberculosis, as a case study, which are most likely to perform the missing function. Results: The application of an automated homology-based enzyme identification protocol, ModEnzA, on the M. tuberculosis genome yielded 56 novel enzyme predictions. We further predicted 74 putative local holes, 6 choke points, and 3 high confidence local holes in the genome using the ‘Hole-finding protocol’. The ‘Hole-filling protocol’ was validated on the E. coli genome using artificial in-silico enzyme knockouts, where our method showed 25% increased accuracy, compared to other methods, in assigning the correct sequence for the knocked-out enzyme amongst the top 10 ranks. The method was further validated on 8 additional genomes.
Conclusions: We have developed methods that can be generalized to augment homology-based annotation to identify missing enzyme coding genes and to predict a candidate protein for them. For pathogens such as M. tuberculosis, this work holds significance in terms of increasing the protein repertoire and thereby, the potential for identifying novel drug targets.
Affiliation(s)
- Swati Sinha
- Bioinformatics Institute, Agency for Science, Technology, and Research (A*Star), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore
| | - Andrew M Lynn
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Dhwani K Desai
- Department of Biology and Department of Pharmacology, Dalhousie University, Halifax, NS, B3H4R2, Canada.,School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| |
|
44
|
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at protein or gene levels can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By efficiently integrating experimental findings on PPIs with computational prediction techniques, researchers have gained a great deal of valuable data on PPIs, including some advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| |
|
45
|
Schober AF, Mathis AD, Ingle C, Park JO, Chen L, Rabinowitz JD, Junier I, Rivoire O, Reynolds KA. A Two-Enzyme Adaptive Unit within Bacterial Folate Metabolism. Cell Rep 2020; 27:3359-3370.e7. [PMID: 31189117 DOI: 10.1016/j.celrep.2019.05.030] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/05/2017] [Revised: 04/05/2019] [Accepted: 05/09/2019] [Indexed: 11/29/2022] Open
Abstract
Enzyme function and evolution are influenced by the larger context of a metabolic pathway. Deleterious mutations or perturbations in one enzyme can often be compensated by mutations to others. We used comparative genomics and experiments to examine evolutionary interactions with the essential metabolic enzyme dihydrofolate reductase (DHFR). Analyses of synteny and co-occurrence across bacterial species indicate that DHFR is coupled to thymidylate synthase (TYMS) but relatively independent from the rest of folate metabolism. Using quantitative growth rate measurements and forward evolution in Escherichia coli, we demonstrate that the two enzymes adapt as a relatively independent unit in response to antibiotic stress. Metabolomic profiling revealed that TYMS activity must not exceed DHFR activity to prevent the depletion of reduced folates and the accumulation of the intermediate dihydrofolate. Comparative genomics analyses identified >200 gene pairs with similar statistical signatures of modular co-evolution, suggesting that cellular pathways may be decomposable into small adaptive units.
Collapse
Affiliation(s)
- Andrew F Schober
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Andrew D Mathis
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Junyoung O Park
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
| | - Li Chen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Joshua D Rabinowitz
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Ivan Junier
- Centre National de la Recherche Scientifique, Université Grenoble Alpes, TIMC-IMAG, F-38000 Grenoble, France
| | - Olivier Rivoire
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, F-75005 Paris, France
| | - Kimberly A Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
| |
|
46
|
Bundalovic-Torma C, Whitfield GB, Marmont LS, Howell PL, Parkinson J. A systematic pipeline for classifying bacterial operons reveals the evolutionary landscape of biofilm machineries. PLoS Comput Biol 2020; 16:e1007721. [PMID: 32236097 PMCID: PMC7112194 DOI: 10.1371/journal.pcbi.1007721] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/18/2019] [Accepted: 02/11/2020] [Indexed: 12/20/2022] Open
Abstract
In bacteria, functionally related genes comprising metabolic pathways and protein complexes are frequently encoded in operons and are widely conserved across phylogenetically diverse species. The evolution of these operon-encoded processes is affected by diverse mechanisms such as gene duplication, loss, rearrangement, and horizontal transfer. These mechanisms can result in functional diversification, increasing the potential evolution of novel biological pathways, and enabling pre-existing pathways to adapt to the requirements of particular environments. Despite the fundamental importance that these mechanisms play in bacterial environmental adaptation, a systematic approach for studying the evolution of operon organization is lacking. Herein, we present a novel method to study the evolution of operons based on phylogenetic clustering of operon-encoded protein families and genomic-proximity network visualizations of operon architectures. We applied this approach to study the evolution of the synthase dependent exopolysaccharide (EPS) biosynthetic systems: cellulose, acetylated cellulose, poly-β-1,6-N-acetyl-D-glucosamine (PNAG), Pel, and alginate. These polymers have important roles in biofilm formation, antibiotic tolerance, and as virulence factors in opportunistic pathogens. Our approach revealed the complex evolutionary landscape of EPS machineries, and enabled operons to be classified into evolutionarily distinct lineages. Cellulose operons show phylum-specific operon lineages resulting from gene loss, rearrangement, and the acquisition of accessory loci, and the occurrence of whole-operon duplications arising through horizontal gene transfer. Our evolution-based classification also distinguishes between PNAG production from Gram-negative and Gram-positive bacteria on the basis of structural and functional evolution of the acetylation modification domains shared by PgaB and IcaB loci, respectively. We also predict several pel-like operon lineages in Gram-positive bacteria and demonstrate in our companion paper (Whitfield et al., PLoS Pathogens, in press) that Bacillus cereus produces a Pel-dependent biofilm that is regulated by cyclic-3',5'-dimeric guanosine monophosphate (c-di-GMP).
Affiliation(s)
- Cedoljub Bundalovic-Torma
- Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
| | - Gregory B. Whitfield
- Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
| | - Lindsey S. Marmont
- Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
| | - P. Lynne Howell
- Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
| | - John Parkinson
- Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
|
47
|
Rosana ARR, Whitford DS, Migur A, Steglich C, Kujat-Choy SL, Hess WR, Owttrim GW. RNA helicase-regulated processing of the Synechocystis rimO-crhR operon results in differential cistron expression and accumulation of two sRNAs. J Biol Chem 2020; 295:6372-6386. [PMID: 32209657 DOI: 10.1074/jbc.ra120.013148] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/25/2020] [Revised: 03/19/2020] [Indexed: 12/21/2022] Open
Abstract
The arrangement of functionally-related genes in operons is a fundamental element of how genetic information is organized in prokaryotes. This organization ensures coordinated gene expression by co-transcription. Often, however, alternative genetic responses to specific stress conditions demand the discoordination of operon expression. During cold temperature stress, accumulation of the gene encoding the sole Asp-Glu-Ala-Asp (DEAD)-box RNA helicase in Synechocystis sp. PCC 6803, crhR (slr0083), increases 15-fold. Here, we show that crhR is expressed from a dicistronic operon with the methylthiotransferase rimO/miaB (slr0082) gene, followed by rapid processing of the operon transcript into two monocistronic mRNAs. This cleavage event is required for and results in destabilization of the rimO transcript. Results from secondary structure modeling and analysis of RNase E cleavage of the rimO-crhR transcript in vitro suggested that CrhR plays a role in enhancing the rate of the processing in an auto-regulatory manner. Moreover, two putative small RNAs are generated from additional processing, degradation, or both of the rimO transcript. These results suggest a role for the bacterial RNA helicase CrhR in RNase E-dependent mRNA processing in Synechocystis and expand the known range of organisms possessing small RNAs derived from processing of mRNA transcripts.
Affiliation(s)
- Albert Remus R Rosana
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
| | - Denise S Whitford
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
| | - Anzhela Migur
- Faculty of Biology, University of Freiburg, Schänzlestrasse 1, D-79104 Freiburg, Germany
| | - Claudia Steglich
- Faculty of Biology, University of Freiburg, Schänzlestrasse 1, D-79104 Freiburg, Germany
| | - Sonya L Kujat-Choy
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
| | - Wolfgang R Hess
- Faculty of Biology, University of Freiburg, Schänzlestrasse 1, D-79104 Freiburg, Germany; Freiburg Institute for Advanced Studies, University of Freiburg, Albertstrasse 19, D-79104 Freiburg, Germany
| | - George W Owttrim
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
| |
|
48
|
Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype. Sci Rep 2019; 9:19537. [PMID: 31863070 PMCID: PMC6925100 DOI: 10.1038/s41598-019-55984-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/26/2019] [Accepted: 12/02/2019] [Indexed: 01/01/2023] Open
Abstract
Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions – corresponding to semantically distant Gene Ontology terms – are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict phenotype of individuals from their genome sequence.
|
49
|
Evaluation of specificity determinants in Mycobacterium tuberculosis σ/anti-σ factor interactions. Biochem Biophys Res Commun 2019; 521:900-906. [PMID: 31711645 DOI: 10.1016/j.bbrc.2019.10.198] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/18/2019] [Accepted: 10/31/2019] [Indexed: 01/11/2023]
Abstract
Extra Cytoplasmic Function (ECF) σ factor/regulatory protein (anti-σ factor) pairs govern environment-mediated changes in gene expression in bacteria. The release of the ECF σ factor from an inactive σ/anti-σ factor complex is triggered by specific environmental stimuli. The free σ factor then associates with the RNA polymerase and drives the expression of genes in its target regulon. Multiple ECF σ/anti-σ pairs ensure calibrated changes in the expression profile by correlating diverse environmental stimuli with changes in the intracellular levels of different ECF σ factors. Specificity in σ/anti-σ factor interaction is thus essential for accurate signal transduction. Here we describe experiments to evaluate interactions between different M. tuberculosis σ and anti-σ proteins in vitro. The interaction parameters suggest that cross-talk between non-cognate σ/anti-σ pairs is likely. The sequence and conformational determinants that govern interaction specificity in a σ/anti-σ complex are not immediately evident due to substantial structural conservation. Sequence-structure analysis of all σ/anti-σ pairs suggests that conserved residues are not the primary determinants of σ/anti-σ interactions, a finding that suggests a potential route to set tolerance limits in interaction specificity. Non-specific σ/anti-σ interactions are likely to be biologically significant, as they can contribute to heterogeneity in cellular responses in a bacterial population under less stringent requirements. This finding is relevant for synthetic biology approaches to engineer bacteria using σ/anti-σ transcription initiation modules for diverse applications in biotechnology.
|
50
|
Lee T, Lee S, Yang S, Lee I. MaizeNet: a co-functional network for network-assisted systems genetics in Zea mays. Plant J 2019; 99:571-582. [PMID: 31006149 DOI: 10.1111/tpj.14341] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/24/2019] [Revised: 03/21/2019] [Accepted: 03/28/2019] [Indexed: 05/27/2023]
Abstract
Maize (Zea mays) has multiple uses in human food, animal fodder, starch and sweetener production and as a biofuel, and is accordingly the most extensively cultivated cereal worldwide. To enhance maize production, genetic factors underlying important agricultural traits, including stress tolerance and flowering, have been explored through forward and reverse genetics approaches. Co-functional gene networks are systems biology resources useful in identifying trait-associated genes in plants by prioritizing candidate genes. Here, we present MaizeNet (http://www.inetbio.org/maizenet/), a genome-scale co-functional network of Z. mays genes, and a companion web server for network-assisted systems genetics. We describe the validation of MaizeNet network quality and its ability to functionally predict molecular pathways and complex traits in maize. Furthermore, we demonstrate that MaizeNet-based prioritization of candidate genes can facilitate the identification of cell wall biosynthesis genes and detect network communities associated with flowering-time candidate genes derived from genome-wide association studies. The demonstrated gene prioritization and subnetwork analysis can be conducted by simply submitting maize gene models based on the commonly used B73 RefGen_v3 and the latest B73 RefGen_v4 reference genomes on the MaizeNet web server. MaizeNet-based network-assisted systems genetics will substantially accelerate the discovery of trait-associated genes for crop improvement.
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
| | - Sungho Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
| | - Sunmo Yang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
| | - Insuk Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
| |
|