1
Allert MJ, Kumar S, Wang Y, Beese LS, Hellinga HW. Accurate Identification of Periplasmic Urea-binding Proteins by Structure- and Genome Context-assisted Functional Analysis. J Mol Biol 2024; 436:168780. [PMID: 39241982 DOI: 10.1016/j.jmb.2024.168780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 08/29/2024] [Accepted: 08/31/2024] [Indexed: 09/09/2024]
Abstract
ABC transporters are ancient and ubiquitous nutrient transport systems in bacteria and play a central role in defining lifestyles. Periplasmic solute-binding proteins (SBPs) are components that deliver ligands to their translocation machinery. SBPs have diversified to bind a wide range of ligands with high specificity and affinity. However, accurate assignment of cognate ligands remains a challenging problem in SBPs. Urea metabolism plays an important role in the nitrogen cycle; anthropogenic sources account for more than half of global nitrogen fertilizer. We report identification of urea-binding proteins within a large SBP sequence family that encodes diverse functions. By combining genetic linkage between SBPs, ABC transporter components, enzymes or transcription factors, we accurately identified cognate ligands, as we verified experimentally by biophysical characterization of ligand binding and crystallographic determination of the urea complex of a thermostable urea-binding homolog. Using three-dimensional structure information, these functional assignments were extrapolated to other members in the sequence family lacking genetic linkage information, which revealed that only a fraction bind urea. Using the same combined approaches, we also inferred that other family members bind various short-chain amides, aliphatic amino acids (leucine, isoleucine, valine), γ-aminobutyrate, and as yet unknown ligands. Comparative structural analysis revealed structural adaptations that encode diversification in these SBPs. Systematic assignment of ligands to SBP sequence families is key to understanding bacterial lifestyles, and also provides a rich source of biosensors for clinical and environmental analysis, such as the thermostable urea-binding protein identified here.
Affiliation(s)
- Malin J Allert
  - Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA.
- Shivesh Kumar
  - Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
  - Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, MO 63110, USA.
- You Wang
  - Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA.
- Lorena S Beese
  - Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA.
- Homme W Hellinga
  - Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA.
2
Wei X, Tan H, Lobb B, Zhen W, Wu Z, Parks DH, Neufeld JD, Moreno-Hagelsieb G, Doxey AC. AnnoView enables large-scale analysis, comparison, and visualization of microbial gene neighborhoods. Brief Bioinform 2024; 25:bbae229. [PMID: 38747283 PMCID: PMC11094555 DOI: 10.1093/bib/bbae229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 04/02/2024] [Accepted: 04/26/2024] [Indexed: 05/19/2024] [Open Access]
Abstract
The analysis and comparison of gene neighborhoods is a powerful approach for exploring microbial genome structure, function, and evolution. Although numerous tools exist for genome visualization and comparison, genome exploration across large genomic databases or user-generated datasets remains a challenge. Here, we introduce AnnoView, a web server designed for interactive exploration of gene neighborhoods across the bacterial and archaeal tree of life. Our server offers users the ability to identify, compare, and visualize gene neighborhoods of interest from 30 238 bacterial genomes and 1672 archaeal genomes, through integration with the comprehensive Genome Taxonomy Database and AnnoTree databases. Identified gene neighborhoods can be visualized using pre-computed functional annotations from different sources such as KEGG, Pfam and TIGRFAM, or clustered based on similarity. Alternatively, users can upload and explore their own custom genomic datasets in GBK, GFF or CSV format, or use AnnoView as a genome browser for relatively small genomes (e.g. viruses and plasmids). Ultimately, we anticipate that AnnoView will catalyze biological discovery by enabling user-friendly search, comparison, and visualization of genomic data. AnnoView is available at http://annoview.uwaterloo.ca.
Affiliation(s)
- Xin Wei
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Huagang Tan
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Briallen Lobb
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- William Zhen
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Zijing Wu
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Donovan H Parks
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Brisbane, Australia
- Josh D Neufeld
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Gabriel Moreno-Hagelsieb
  - Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada
- Andrew C Doxey
  - Department of Biology and Waterloo Centre for Microbial Research, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
3
Zhang Y, Lang M, Jiang J, Gao Z, Xu F, Litfin T, Chen K, Singh J, Huang X, Song G, Tian Y, Zhan J, Chen J, Zhou Y. Multiple sequence alignment-based RNA language model and its application to structural inference. Nucleic Acids Res 2024; 52:e3. [PMID: 37941140 PMCID: PMC10783488 DOI: 10.1093/nar/gkad1031] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/10/2023] [Accepted: 10/21/2023] [Indexed: 11/10/2023] [Open Access]
Abstract
Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to significantly improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.
Affiliation(s)
- Yikun Zhang
  - School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
  - AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Mei Lang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Jiuhong Jiang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Zhiqiang Gao
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Fan Xu
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Thomas Litfin
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4215, Australia
- Ke Chen
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Jaswinder Singh
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Guoli Song
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Jian Zhan
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Jie Chen
  - School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
  - Peng Cheng Laboratory, Shenzhen 518066, China
- Yaoqi Zhou
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
  - Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4215, Australia
4
Yu Y, Rué Casamajo A, Finnigan W, Schnepel C, Barker R, Morrill C, Heath RS, De Maria L, Turner NJ, Scrutton NS. Structure-Based Design of Small Imine Reductase Panels for Target Substrates. ACS Catal 2023; 13:12310-12321. [PMID: 37736118 PMCID: PMC10510103 DOI: 10.1021/acscatal.3c02278] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/19/2023] [Revised: 08/20/2023] [Indexed: 09/23/2023]
Abstract
Biocatalysis is important in the discovery, development, and manufacture of pharmaceuticals. However, the identification of enzymes for target transformations of interest requires major screening efforts. Here, we report a structure-based computational workflow to prioritize protein sequences by a score based on predicted activities on substrates, thereby reducing a resource-intensive laboratory-based biocatalyst screening. We selected imine reductases (IREDs) as a class of biocatalysts to illustrate the application of the computational workflow termed IREDFisher. Validation by using published data showed that IREDFisher can retrieve the best enzymes and increase the hit rate by identifying the top 20 ranked sequences. The power of IREDFisher is confirmed by computationally screening 1400 sequences for chosen reductive amination reactions with different levels of complexity. Highly active IREDs were identified by only testing 20 samples in vitro. Our speed test shows that it only takes 90 min to rank 85 sequences from user input and 30 min for the established IREDFisher database containing 591 IRED sequences. IREDFisher is available as a user-friendly web interface (https://enzymeevolver.com/IREDFisher). IREDFisher enables the rapid discovery of IREDs for applications in synthesis and directed evolution studies, with minimal time and resource expenditure. Future use of the workflow with other enzyme families could be implemented following the modification of the workflow scoring function.
Affiliation(s)
- Yuqi Yu
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
  - Augmented Biologics Discovery & Design, Department of Biologics Engineering, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB21 6GH, U.K.
- Arnau Rué Casamajo
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- William Finnigan
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Christian Schnepel
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Rhys Barker
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Charlotte Morrill
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Rachel S. Heath
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Leonardo De Maria
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (RI), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 43150, Sweden
- Nicholas J. Turner
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Nigel S. Scrutton
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
5
Zheng Y, Young ND, Song J, Gasser RB. Genome-Wide Analysis of Haemonchus contortus Proteases and Protease Inhibitors Using Advanced Informatics Provides Insights into Parasite Biology and Host-Parasite Interactions. Int J Mol Sci 2023; 24:12320. [PMID: 37569696 PMCID: PMC10418638 DOI: 10.3390/ijms241512320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] [Open Access]
Abstract
Biodiversity within the animal kingdom is associated with extensive molecular diversity. The expansion of genomic, transcriptomic and proteomic data sets for invertebrate groups and species with unique biological traits necessitates reliable in silico tools for the accurate identification and annotation of molecules and molecular groups. However, conventional tools are inadequate for lesser-known organismal groups, such as eukaryotic pathogens (parasites), so that improved approaches are urgently needed. Here, we established a combined sequence- and structure-based workflow system to harness well-curated publicly available data sets and resources to identify, classify and annotate proteases and protease inhibitors of a highly pathogenic parasitic roundworm (nematode) of global relevance, called Haemonchus contortus (barber's pole worm). This workflow performed markedly better than conventional, sequence-based classification and annotation alone and allowed the first genome-wide characterisation of protease and protease inhibitor genes and gene products in this worm. In total, we identified 790 genes encoding 860 proteases and protease inhibitors representing 83 gene families. The proteins inferred included 280 metallo-, 145 cysteine, 142 serine, 121 aspartic and 81 "mixed" proteases as well as 91 protease inhibitors, all of which had marked physicochemical diversity and inferred involvements in >400 biological processes or pathways. A detailed investigation revealed a remarkable expansion of some protease or inhibitor gene families, which are likely linked to parasitism (e.g., host-parasite interactions, immunomodulation and blood-feeding) and exhibit stage- or sex-specific transcription profiles. This investigation provides a solid foundation for detailed explorations of the structures and functions of proteases and protease inhibitors of H. contortus and related nematodes, and it could assist in the discovery of new drug or vaccine targets against infections or diseases.
Affiliation(s)
- Yuanting Zheng
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Neil D. Young
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Jiangning Song
  - Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, VIC 3800, Australia
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Robin B. Gasser
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
6
Saikat ASM. Computational approaches for molecular characterization and structure-based functional elucidation of a hypothetical protein from Mycobacterium tuberculosis. Genomics Inform 2023; 21:e25. [PMID: 37415455 PMCID: PMC10326535 DOI: 10.5808/gi.23001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 07/08/2023] [Open Access]
Abstract
Adaptation of infections and hosts has resulted in several metabolic mechanisms adopted by intracellular pathogens to combat the defense responses and the lack of fuel during infection. Human tuberculosis, caused by Mycobacterium tuberculosis (MTB), is the world's leading cause of mortality tied to a single disease. This study aims to characterize a hypothetical protein of MTB and to anticipate its potential antigen characteristics as a promising vaccine candidate through computational strategies. The protein is associated with the catalysis of dithiol oxidation and/or disulfide reduction because of its anticipated disulfide oxidoreductase properties. This investigation analyzed the protein's physicochemical characteristics, protein-protein interactions, subcellular locations, anticipated active sites, secondary and tertiary structures, allergenicity, antigenicity, and toxicity properties. The protein has significant active amino acid residues with no allergenicity, elevated antigenicity, and no toxicity.
Affiliation(s)
- Abu Saim Mohammad Saikat
  - Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
7
Discovery and Biotechnological Exploitation of Glycoside-Phosphorylases. Int J Mol Sci 2022; 23:3043. [PMID: 35328479 PMCID: PMC8950772 DOI: 10.3390/ijms23063043] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/11/2022] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 02/04/2023] [Open Access]
Abstract
Among carbohydrate active enzymes, glycoside phosphorylases (GPs) are valuable catalysts for white biotechnologies, due to their exquisite capacity to efficiently re-modulate oligo- and poly-saccharides, without the need for costly activated sugars as substrates. The reversibility of the phosphorolysis reaction, indeed, makes them attractive tools for glycodiversification. However, discovery of new GP functions is hindered by the difficulty in identifying them in sequence databases, and, rather, relies on extensive and tedious biochemical characterization studies. Nevertheless, recent advances in automated tools have led to major improvements in GP mining, activity predictions, and functional screening. Implementation of GPs into innovative in vitro and in cellulo bioproduction strategies has also made substantial advances. Herein, we propose to discuss the latest developments in the strategies employed to efficiently discover GPs and make the best use of their exceptional catalytic properties for glycoside bioproduction.
8
Kondo R, Kasahara K, Takahashi T. Information quantity for secondary structure propensities of protein subsequences in the Protein Data Bank. Biophys Physicobiol 2022; 19:1-12. [PMID: 35532457 PMCID: PMC8926306 DOI: 10.2142/biophysico.bppb-v19.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 02/02/2022] [Indexed: 12/05/2022] [Open Access]
Abstract
Elucidating the principles of sequence-structure relationships of proteins is a long-standing issue in biology. The nature of a short segment of a protein is determined by both the subsequence of the segment itself and its environment. For example, a type of subsequence, the so-called chameleon sequences, can form different secondary structures depending on its environments. Chameleon sequences are considered to have a weak tendency to form a specific structure. Although many chameleon sequences have been identified, they are only a small part of all possible subsequences in the proteome. The strength of the tendency to take a specific structure for each subsequence has not been fully quantified. In this study, we comprehensively analyzed subsequences consisting of four to nine amino acid residues, or N-gram (4≤N≤9), observed in non-redundant sequences in the Protein Data Bank (PDB). Tendencies to form a specific structure in terms of the secondary structure and accessible surface area are quantified as information quantities for each N-gram. Although the majority of observed subsequences have low information quantity due to lack of samples in the current PDB, thousands of N-grams with strong tendencies, including known structural motifs, were found. In addition, machine learning partially predicted the tendency of unknown N-grams, and thus, this technique helps to extract knowledge from the limited number of samples in the PDB.
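The idea of scoring each N-gram by how strongly it favors one secondary structure can be illustrated with a minimal sketch. The abstract does not give the authors' exact formula, so this example assumes the information quantity is the Kullback-Leibler divergence (in bits) between the structure distribution observed for an N-gram and the background distribution; the function name `ngram_information` and the choice of the window-centre residue as the structure label are also assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def ngram_information(sequences, structures, n=4):
    """Return {ngram: KL divergence in bits} between the secondary-structure
    distribution seen for that n-gram and the background distribution.
    Assumption: each window is labelled by the state of its centre residue."""
    background = Counter()
    per_ngram = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        for i in range(len(seq) - n + 1):
            label = ss[i + n // 2]
            per_ngram[seq[i:i + n]][label] += 1
            background[label] += 1
    total = sum(background.values())
    info = {}
    for gram, counts in per_ngram.items():
        k = sum(counts.values())
        # Sum over observed states only; unobserved states contribute zero.
        info[gram] = sum(
            (c / k) * math.log2((c / k) / (background[s] / total))
            for s, c in counts.items()
        )
    return info
```

An N-gram seen only a few times can still score high by chance, which matches the abstract's caution that sample counts in the PDB limit how much can be concluded for most subsequences.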
Affiliation(s)
- Ryohei Kondo
  - Graduate School of Life Sciences, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
- Kota Kasahara
  - College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
- Takuya Takahashi
  - College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
9
Cordas CM, Nguyen GS, Valério GN, Jønsson M, Söllner K, Aune IH, Wentzel A, Moura JJG. Discovery and characterization of a novel Dyp-type peroxidase from a marine actinobacterium isolated from Trondheim fjord, Norway. J Inorg Biochem 2021; 226:111651. [PMID: 34740038 DOI: 10.1016/j.jinorgbio.2021.111651] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/30/2021] [Revised: 10/12/2021] [Accepted: 10/20/2021] [Indexed: 12/21/2022]
Abstract
A new dye-decolorizing peroxidase (DyP) was discovered through a data mining workflow based on HMMER software and a profile hidden Markov model (HMM), using a dataset of 1200 genomes originating from an Actinobacteria strain collection isolated from Trondheim fjord. Instead of the conserved GXXDG motif known for Dyp-type peroxidases, the enzyme contains a new conserved motif, EXXDG, which has not been reported before. The enzyme can oxidize the anthraquinone dye Remazol Brilliant Blue R (Reactive Blue 19) and other phenolic compounds such as ferulic acid, sinapic acid, caffeic acid, 3-methylcatechol, dopamine hydrochloride, and tannic acid. The acidic pH optimum (3 to 4) and the low temperature optimum (25 °C) were confirmed using both biochemical and electrochemical assays. Kinetic and thermodynamic parameters associated with the catalytic redox center were obtained by electrochemistry.
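The difference between the two motifs mentioned in the abstract can be made concrete with a small scan. This is only an illustrative sketch using regular expressions, not the HMMER-based workflow the authors describe; the function name `find_motifs` and the example sequence are hypothetical.

```python
import re

# X in the motif names stands for any residue; lookaheads allow overlapping hits.
MOTIFS = {
    "GXXDG": re.compile(r"(?=(G[A-Z]{2}DG))"),
    "EXXDG": re.compile(r"(?=(E[A-Z]{2}DG))"),
}

def find_motifs(sequence):
    """Return (motif name, 1-based start, matched residues) sorted by position."""
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical peptide, not taken from the paper.
print(find_motifs("MKEAADGLLGTTDGQ"))
```

A five-residue pattern like this will match many unrelated proteins, so it is useful for checking a candidate that a profile HMM has already retrieved rather than as a search method on its own.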
Affiliation(s)
- Cristina M Cordas
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal.
- Giang-Son Nguyen
  - Sustainable Biotechnology and Bioprospecting, Department of Biotechnology and Nanomedicine, SINTEF Industry, Norway.
- Gabriel N Valério
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
- Malene Jønsson
  - Sustainable Biotechnology and Bioprospecting, Department of Biotechnology and Nanomedicine, SINTEF Industry, Norway
- Katharina Söllner
  - Sustainable Biotechnology and Bioprospecting, Department of Biotechnology and Nanomedicine, SINTEF Industry, Norway
- Ingvild H Aune
  - Sustainable Biotechnology and Bioprospecting, Department of Biotechnology and Nanomedicine, SINTEF Industry, Norway
- Alexander Wentzel
  - Sustainable Biotechnology and Bioprospecting, Department of Biotechnology and Nanomedicine, SINTEF Industry, Norway
- José J G Moura
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, FCT NOVA, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal.
10
Lobb B, Tremblay BJM, Moreno-Hagelsieb G, Doxey AC. PathFams: statistical detection of pathogen-associated protein domains. BMC Genomics 2021; 22:663. [PMID: 34521345 PMCID: PMC8442362 DOI: 10.1186/s12864-021-07982-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Accepted: 09/01/2021] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: A substantial fraction of genes identified within bacterial genomes encode proteins of unknown function. Identifying which of these proteins represent potential virulence factors, and mapping their key virulence determinants, is a challenging but important goal. Results: To facilitate virulence factor discovery, we performed a comprehensive analysis of 17,929 protein domain families within the Pfam database, and scored them based on their overrepresentation in pathogenic versus non-pathogenic species, taxonomic distribution, relative abundance in metagenomic datasets, and other factors. Conclusions: We identify pathogen-associated domain families, candidate virulence factors in the human gut, and eukaryotic-like mimicry domains with likely roles in virulence. Furthermore, we provide an interactive database called PathFams to allow users to explore pathogen-associated domains as well as identify pathogen-associated domains and domain architectures in user-uploaded sequences of interest. PathFams is freely available at https://pathfams.uwaterloo.ca. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07982-8.
Affiliation(s)
- Briallen Lobb
  - Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
- Andrew C Doxey
  - Department of Biology, University of Waterloo, Waterloo, Ontario, Canada.
11
Discovery and mining of enzymes from the human gut microbiome. Trends Biotechnol 2021; 40:240-254. [PMID: 34304905 DOI: 10.1016/j.tibtech.2021.06.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/28/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 12/19/2022]
Abstract
Advances in technological and bioinformatics approaches have led to the generation of a plethora of human gut metagenomic datasets. Metabolomics has also provided substantial data regarding the small metabolites produced and modified by the microbiota. Comparatively, the microbial enzymes mediating the transformation of metabolites have not been intensively investigated. Here, we discuss the recent efforts and technologies used for discovering and mining enzymes from the human gut microbiota. The wealth of knowledge on metabolites, reactions, genome sequences, and structures of proteins, may drive the development of strategies for enzyme mining. Ongoing efforts to annotate gut microbiota enzymes will explain catalytic mechanisms that may guide the clinical applications of the gut microbiome for diagnostic and therapeutic purposes.
12
Li S, Cai C, Gong J, Liu X, Li H. A fast protein binding site comparison algorithm for proteome-wide protein function prediction and drug repurposing. Proteins 2021; 89:1541-1556. [PMID: 34245187 DOI: 10.1002/prot.26176] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/09/2021] [Revised: 06/26/2021] [Accepted: 06/30/2021] [Indexed: 01/18/2023]
Abstract
The expansion of three-dimensional protein structures and enhanced computing power have significantly facilitated our understanding of protein sequence/structure/function relationships. A challenge in structural genomics is to predict the function of uncharacterized proteins. Protein function deconvolution based on global sequence or structural homology is impracticable when a protein relates to no other proteins with known function, and in such cases, functional relationships can be established by detecting their local ligand binding site similarity. Here, we introduce a sequence order-independent comparison algorithm, PocketShape, for structural proteome-wide exploration of protein functional sites by fully considering the geometry of the backbones, orientation of the sidechains, and physicochemical properties of the pocket-lining residues. PocketShape is efficient in distinguishing similar from dissimilar ligand binding site pairs, retrieving 99.3% of the similar pairs while rejecting 100% of the dissimilar pairs on a dataset containing 1538 binding site pairs. This method successfully classifies 83 enzyme structures with diverse functions into 12 clusters, in close accordance with the Structural Classification of Proteins (SCOP) classification. PocketShape also outperforms other methods in protein profiling based on experimental data. Potential new applications for representative SARS-CoV-2 drugs Remdesivir and 11a are predicted. The high accuracy and time efficiency of PocketShape make it a promising complementary tool for proteome-wide protein function inference and drug repurposing studies.
Affiliation(s)
- Shiliang Li
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Chaoqian Cai
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
- Jiayu Gong
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
- Xiaofeng Liu
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Honglin Li
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
  - Research and Development Department, Jiangzhong Pharmaceutical Co., Ltd., Nanchang, China
13
|
Tremblay BJM, Lobb B, Doxey AC. PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling. Bioinformatics 2021; 37:17-22. [PMID: 33416870 DOI: 10.1093/bioinformatics/btaa1105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/23/2020] [Revised: 12/26/2020] [Accepted: 12/29/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Statistical detection of co-occurring genes across genomes, known as "phylogenetic profiling", is a powerful bioinformatic technique for inferring gene-gene functional associations. However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation, and substantial computational requirements. RESULTS: We introduce PhyloCorrelate, a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154,217,052 comparisons for 28,315 genes across 27,372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM, or KEGG query. In total, PhyloCorrelate detected 29,762 high confidence associations between bacterial gene/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function. AVAILABILITY: PhyloCorrelate is available as a web server at phylocorrelate.uwaterloo.ca as well as an R package for analysis of custom datasets. We anticipate that PhyloCorrelate will be broadly useful as a tool for predicting function and interactions for gene families. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
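The idea of phylogenetic profiling described in this abstract can be illustrated with a minimal sketch. It assumes each gene is represented by a binary presence/absence vector over the same ordered set of genomes and scores gene pairs with a plain Pearson correlation. This is only one of the "standard correlation metrics" the abstract mentions; it is not PhyloCorrelate's optimized score and it does not correct for phylogenetic history. The function names and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length presence/absence vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A gene present in all genomes (or none) carries no co-occurrence signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_correlated(profiles, query, k=3):
    """Return the k genes whose profiles correlate best with the query gene."""
    query_profile = profiles[query]
    scores = [
        (gene, pearson(query_profile, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy data (hypothetical): rows are genes, columns are six genomes.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0],  # always co-occurs with geneA
    "geneC": [0, 0, 1, 1, 0, 1],  # never co-occurs with geneA
    "geneD": [1, 0, 1, 0, 1, 0],  # partially overlaps with geneA
}

for gene, score in top_correlated(profiles, "geneA", k=2):
    print(gene, round(score, 3))
```

On real data, closely related genomes would inflate such correlations, which is the motivation for the model-based metrics that the abstract says account for phylogenetic history.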
Affiliation(s)
- Briallen Lobb
- Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
- Andrew C Doxey
- Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
|
14
|
Lobb B, Tremblay BJM, Moreno-Hagelsieb G, Doxey AC. An assessment of genome annotation coverage across the bacterial tree of life. Microb Genom 2020; 6. [PMID: 32124724 PMCID: PMC7200070 DOI: 10.1099/mgen.0.000341] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/13/2022]
Abstract
Although gene-finding in bacterial genomes is relatively straightforward, the automated assignment of gene function is still challenging, resulting in a vast quantity of hypothetical sequences of unknown function. But how prevalent are hypothetical sequences across bacteria, what proportion of genes in different bacterial genomes remain unannotated, and what factors affect annotation completeness? To address these questions, we surveyed over 27 000 bacterial genomes from the Genome Taxonomy Database, and measured genome annotation completeness as a function of annotation method, taxonomy, genome size, 'research bias' and publication date. Our analysis revealed that 52 and 79 % of the average bacterial proteome could be functionally annotated based on protein and domain-based homology searches, respectively. Annotation coverage using protein homology search varied significantly from as low as 14 % in some species to as high as 98 % in others. We found that taxonomy is a major factor influencing annotation completeness, with distinct trends observed across the microbial tree (e.g. the lowest level of completeness was found in the Patescibacteria lineage). Most lineages showed a significant association between genome size and annotation incompleteness, likely reflecting a greater degree of uncharacterized sequences in 'accessory' proteomes than in 'core' proteomes. Finally, research bias, as measured by publication volume, was also an important factor influencing genome annotation completeness, with early model organisms showing high completeness levels relative to other genomes in their own taxonomic lineages. Our work highlights the disparity in annotation coverage across the bacterial tree of life and emphasizes a need for more experimental characterization of accessory proteomes as well as understudied lineages.
Affiliation(s)
- Briallen Lobb
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
- Gabriel Moreno-Hagelsieb
- Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada
- Andrew C Doxey
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
|
15
|
Levine TP. Structural bioinformatics predicts that the Retinitis Pigmentosa-28 protein of unknown function FAM161A is a homologue of the microtubule nucleation factor Tpx2. F1000Res 2020; 9:1052. [PMID: 33093951 PMCID: PMC7551519 DOI: 10.12688/f1000research.25870.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 08/13/2020] [Indexed: 11/20/2022]
Abstract
Background: FAM161A is a microtubule-associated protein conserved widely across eukaryotes, which is mutated in the inherited blinding disease Retinitis Pigmentosa-28. FAM161A is also a centrosomal protein, being a core component of a complex that forms an internal skeleton of centrioles. Despite these observations about the importance of FAM161A, current techniques used to examine its sequence reveal no homologies to other proteins. Methods: Sequence profiles derived from multiple sequence alignments of FAM161A homologues were constructed by PSI-BLAST and HHblits, and then used by the profile-profile search tool HHsearch, implemented online as HHpred, to identify homologues. These in turn were used to create profiles for reverse searches and pair-wise searches. Multiple sequence alignments were also used to identify amino acid usage in functional elements. Results: FAM161A has a single homologue: the targeting protein for Xenopus kinesin-like protein-2 (Tpx2), which is a strong hit across more than 200 residues. Tpx2 is also a microtubule-associated protein, and it has been shown previously by a cryo-EM molecular structure to nucleate microtubules through two small elements: an extended loop and a short helix. The homology between FAM161A and Tpx2 includes these elements, as FAM161A has three copies of the loop, and one helix that has many, but not all, properties of the one in Tpx2. Conclusions: FAM161A and its homologues are predicted to be a previously unknown variant of Tpx2, and hence bind microtubules in the same way. This prediction allows precise, testable molecular models to be made of FAM161A-microtubule complexes.
Affiliation(s)
- Timothy P Levine
- UCL Institute of Ophthalmology, University College London, London, EC1V 9EL, UK
|
16
|
Li T, Cui X, Cui Y, Sun J, Chen Y, Zhu T, Li C, Li R, Wu B. Exploration of Transaminase Diversity for the Oxidative Conversion of Natural Amino Acids into 2-Ketoacids and High-Value Chemicals. ACS Catal 2020. [DOI: 10.1021/acscatal.0c01895] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Tao Li
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100101, PR China
- Xuexian Cui
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100101, PR China
- Yinglu Cui
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- Jinyuan Sun
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- Yanchun Chen
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100101, PR China
- Tong Zhu
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100101, PR China
- Chuijian Li
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- Ruifeng Li
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100101, PR China
- Bian Wu
- CAS Key Laboratory of Microbial Physiological and Metabolic Engineering, State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, PR China
|
17
|
Rajkovic A, Jovanovic J, Monteiro S, Decleer M, Andjelkovic M, Foubert A, Beloglazova N, Tsilla V, Sas B, Madder A, De Saeger S, Uyttendaele M. Detection of toxins involved in foodborne diseases caused by Gram‐positive bacteria. Compr Rev Food Sci Food Saf 2020; 19:1605-1657. [DOI: 10.1111/1541-4337.12571] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/13/2019] [Revised: 04/10/2020] [Accepted: 04/14/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Andreja Rajkovic
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Jelena Jovanovic
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Silvia Monteiro
- Laboratorio Analises, Instituto Superior Tecnico, Universidade de Lisboa, Lisbon, Portugal
- Marlies Decleer
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Laboratory of Food Analysis, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
- Mirjana Andjelkovic
- Operational Directorate Food, Medicines and Consumer Safety, Service for Chemical Residues and Contaminants, Brussels, Belgium
- Astrid Foubert
- Laboratory of Food Analysis, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
- Natalia Beloglazova
- Laboratory of Food Analysis, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
- Nanotechnology Education and Research Center, South Ural State University, Chelyabinsk, Russia
- Varvara Tsilla
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Benedikt Sas
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Annemieke Madder
- Laboratorium for Organic and Biomimetic Chemistry, Department of Organic and Macromolecular Chemistry, Ghent University, Ghent, Belgium
- Sarah De Saeger
- Laboratory of Food Analysis, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
- Mieke Uyttendaele
- Laboratory of Food Microbiology and Food Preservation, Department of Food Technology, Safety and Health, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
|
18
|
Kasahara K, Terazawa H, Takahashi T, Higo J. Studies on Molecular Dynamics of Intrinsically Disordered Proteins and Their Fuzzy Complexes: A Mini-Review. Comput Struct Biotechnol J 2019; 17:712-720. [PMID: 31303975 PMCID: PMC6603302 DOI: 10.1016/j.csbj.2019.06.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 03/01/2019] [Revised: 05/29/2019] [Accepted: 06/11/2019] [Indexed: 11/19/2022]
Abstract
The molecular dynamics (MD) method is a promising approach toward elucidating the molecular mechanisms of intrinsically disordered regions (IDRs) of proteins and their fuzzy complexes. This mini-review introduces recent studies that apply MD simulations to investigate the molecular recognition of IDRs. First, methodological issues in treating IDRs with MD simulations, such as developing force fields, treating periodic boundary conditions, and enhanced sampling approaches, are discussed. Then, several examples of applications of MD to the molecular interactions of IDRs are presented in terms of two kinds of complex formation: coupled folding and binding, and fuzzy complexes. MD simulations provide insight into the molecular mechanisms of these binding processes by sampling conformational ensembles of flexible IDRs. In particular, we focused on all-atom explicit-solvent MD simulations except for studies of higher-order assembly of IDRs. Recent advances in MD methods and computational power make it possible to dissect the molecular details of realistic molecular systems involving the dynamic behavior of IDRs.
Affiliation(s)
- Kota Kasahara
- College of Life Sciences, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan
- Corresponding author.
- Hiroki Terazawa
- Graduate School of Life Sciences, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan
- Takuya Takahashi
- College of Life Sciences, Ritsumeikan University, 1-1-1 Noji-higashi, Kusatsu, Shiga 525-8577, Japan
- Junichi Higo
- Graduate School of Simulation Studies, University of Hyogo, 7-1-28 Minatojima-minamimachi, Chuo-ku, Kobe 650-0047, Japan
|
19
|
A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction. Genes (Basel) 2019; 10:genes10010057. [PMID: 30658497 PMCID: PMC6356783 DOI: 10.3390/genes10010057] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/30/2018] [Revised: 01/01/2019] [Accepted: 01/10/2019] [Indexed: 11/25/2022]
Abstract
With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet allocation (LLDA) has been applied to gene function prediction, and obtained more accurate and explainable predictions than conventional methods. Nonetheless, the LLDA model is only able to construct a bag of amino acid words as a classification feature, and does not support any other features, such as hydrophobicity, which has a profound impact on gene function. To achieve more accurate probabilistic modeling of gene function, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet multinomial regression LLDA (DMR-LLDA), for introducing multiple types of features into the process of topic modeling. Based on the DMR framework, DMR-LLDA applies an exponential a priori construction, previously with weighted features, on the hyper-parameters of the gene-topic distribution, so as to reflect the effects of extra features on the function probability distribution. In a five-fold cross-validation experiment on a yeast dataset, DMR-LLDA significantly outperforms the compared model. These experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function.
|
20
|
Kasahara K, Minami S, Aizawa Y. Characteristics of interactions at protein segments without non-local intramolecular contacts in the Protein Data Bank. PLoS One 2018; 13:e0205052. [PMID: 30537764 PMCID: PMC6289587 DOI: 10.1371/journal.pone.0205052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/12/2018] [Accepted: 11/23/2018] [Indexed: 12/20/2022]
Abstract
The principle of three-dimensional protein structure formation is a long-standing conundrum in structural biology. A globular domain of a soluble protein is formed by a network of atomic contacts among amino acid residues, but regions without intramolecular non-local contacts are often observed in the protein structure, especially in loop, linker, and peripheral segments with secondary structures. Although these regions can play key roles for protein function as interfaces for intermolecular interactions, their nature remains unclear. Here, we termed protein segments without non-local contacts as floating segments and sought them in tens of thousands of entries in the Protein Data Bank. As a result, we found that 0.72% of residues are in floating segments. Regarding secondary structural elements, coil structures are enriched in floating segments, especially for long segments. Interactions with polypeptides and polynucleotides, but not chemical compounds, are enriched in floating segments. The amino acid preferences of floating segments are similar to those of surface residues, with exceptions; the small side chain amino acids, Gly and Ala, are preferred, and some charged side chains, Arg and His, are disfavored for floating segments compared to surface residues. Our comprehensive characterization of floating segments may provide insights into understanding protein sequence-structure-function relationships.
Affiliation(s)
- Kota Kasahara
- College of Life Sciences, Ritsumeikan University, Noji-higashi, Kusatsu, Shiga, Japan
- Shintaro Minami
- Exploratory Research Center on Life and Living Systems, National Institutes for Natural Sciences, Myodaiji, Okazaki, Aichi, Japan
- Yasunori Aizawa
- School of Life Science and Technology, Tokyo Institute of Technology, Nagatsuda-cho, Midori-ku, Yokohama, Kanagawa, Japan
|
21
|
Wyman SK, Avila-Herrera A, Nayfach S, Pollard KS. A most wanted list of conserved microbial protein families with no known domains. PLoS One 2018; 13:e0205749. [PMID: 30332487 PMCID: PMC6192648 DOI: 10.1371/journal.pone.0205749] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/15/2018] [Accepted: 10/01/2018] [Indexed: 02/07/2023]
Abstract
The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a "most wanted" list of genes to prioritize for further characterization.
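The prioritization rule stated in this abstract (no annotated domain, at least three sequences, organisms from at least two distinct classes) can be sketched as a simple filter. This is an illustrative reading of the abstract only, not the authors' pipeline: the `Family` record, its field names, the example families, and the use of distinct-class count as a stand-in for "phylogenetic breadth" are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    """Hypothetical record for one protein family."""
    name: str
    member_classes: list  # taxonomic class of the source organism, one per sequence
    domains: list = field(default_factory=list)  # annotated domains, if any


def is_priority(family, min_sequences=3, min_classes=2):
    """Apply the thresholds quoted in the abstract to one family."""
    return (
        not family.domains
        and len(family.member_classes) >= min_sequences
        and len(set(family.member_classes)) >= min_classes
    )


def prioritize(families):
    """Keep qualifying families, broadest (most distinct classes) first."""
    kept = [f for f in families if is_priority(f)]
    kept.sort(key=lambda f: (-len(set(f.member_classes)), f.name))
    return kept


# Toy input (hypothetical family names and annotations).
families = [
    Family("F1", ["Bacilli", "Clostridia", "Bacilli"]),
    Family("F2", ["Bacilli", "Bacilli", "Bacilli"]),           # only one class
    Family("F3", ["Bacilli", "Clostridia"]),                   # too few sequences
    Family("F4", ["Bacilli", "Clostridia", "Actinomycetia"],
           domains=["PF00005"]),                               # has a known domain
    Family("F5", ["Bacilli", "Clostridia", "Actinomycetia"]),
]

for fam in prioritize(families):
    print(fam.name, len(set(fam.member_classes)))
```

The thresholds are exposed as parameters so that a stricter or looser definition of "conserved" can be tried without changing the filter itself.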
Affiliation(s)
- Stacia K. Wyman
- Gladstone Institutes, San Francisco, CA, United States of America
- University of California, Berkeley, CA, United States of America
- Aram Avila-Herrera
- Gladstone Institutes, San Francisco, CA, United States of America
- Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Stephen Nayfach
- Gladstone Institutes, San Francisco, CA, United States of America
- University of California, San Francisco, CA, United States of America
- DOE Joint Genome Institute, Walnut Creek, CA, United States of America
- Katherine S. Pollard
- Gladstone Institutes, San Francisco, CA, United States of America
- University of California, San Francisco, CA, United States of America
- Chan-Zuckerberg Biohub, San Francisco, CA, United States of America
|
22
|
Pienaar R, Neitz AWH, Mans BJ. Tick Paralysis: Solving an Enigma. Vet Sci 2018; 5:E53. [PMID: 29757990 PMCID: PMC6024606 DOI: 10.3390/vetsci5020053] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 04/13/2018] [Revised: 05/04/2018] [Accepted: 05/09/2018] [Indexed: 11/17/2022]
Abstract
In comparison to other arachnids, ticks are major vectors of disease, but less than 8% of the known species are capable of inducing paralysis, as compared to the ~99-100% of arachnids that belong to venomous classes. When considering the potential monophyly of venomous Arachnida, this review reflects on the implications regarding the classification of ticks as venomous animals and the possible origin of toxins. The origin of tick toxins is compared with scorpion and spider toxins and venoms based on their significance, functionality, and structure in the search to find homologous venomous characters. Phenotypic evaluation of paralysis, as caused by different ticks, demonstrated the need for expansion on existing molecular data of pure isolated tick toxins because of differences and discrepancies in available data. The use of in vivo, in vitro, and in silico assays for the purification and characterization of paralysis toxins was critically considered, in view of what may be considered to be a paralysis toxin. Purified toxins should exhibit physiologically relevant activity to distinguish them from other tick-derived proteins. A reductionist approach to identify defined tick proteins will remain paramount in the search for defined anti-paralysis vaccines.
Affiliation(s)
- Ronel Pienaar
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, Pretoria 0110, South Africa
- Department of Veterinary Tropical Diseases, Faculty of Veterinary Science, University of Pretoria, Onderstepoort, Pretoria 0110, South Africa
- Albert W H Neitz
- Division of Biochemistry, University of Pretoria, Hatfield, Pretoria 0028, South Africa
- Ben J Mans
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, Pretoria 0110, South Africa
- Department of Veterinary Tropical Diseases, Faculty of Veterinary Science, University of Pretoria, Onderstepoort, Pretoria 0110, South Africa
- Department of Life and Consumer Sciences, University of South Africa, Florida, Johannesburg 1710, South Africa
|
23
|
Vanacek P, Sebestova E, Babkova P, Bidmanova S, Daniel L, Dvorak P, Stepankova V, Chaloupkova R, Brezovsky J, Prokop Z, Damborsky J. Exploration of Enzyme Diversity by Integrating Bioinformatics with Expression Analysis and Biochemical Characterization. ACS Catal 2018. [DOI: 10.1021/acscatal.7b03523] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 01/04/2023]
Affiliation(s)
- Pavel Vanacek
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Eva Sebestova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Petra Babkova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Sarka Bidmanova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Lukas Daniel
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Pavel Dvorak
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Veronika Stepankova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Enantis Ltd., Biotechnology Incubator INBIT, Kamenice 34, 625 00 Brno, Czech Republic
- Radka Chaloupkova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Enantis Ltd., Biotechnology Incubator INBIT, Kamenice 34, 625 00 Brno, Czech Republic
- Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Enantis Ltd., Biotechnology Incubator INBIT, Kamenice 34, 625 00 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
|
24
|
Discovery of novel bacterial toxins by genomics and computational biology. Toxicon 2018; 147:2-12. [PMID: 29438679 DOI: 10.1016/j.toxicon.2018.02.002] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 08/29/2017] [Revised: 12/23/2017] [Accepted: 02/07/2018] [Indexed: 12/13/2022]
Abstract
Hundreds and hundreds of bacterial protein toxins are presently known. Traditionally, toxin identification begins with pathological studies of bacterial infectious disease. Following identification and cultivation of a bacterial pathogen, the protein toxin is purified from the culture medium and its pathogenic activity is studied using the methods of biochemistry and structural biology, cell biology, tissue and organ biology, and appropriate animal models, supplemented by bioimaging techniques. The ongoing and explosive development of high-throughput DNA sequencing and bioinformatic approaches has set in motion a revolution in many fields of biology, including microbiology. One consequence is that genes encoding novel bacterial toxins can be identified by bioinformatic and computational methods based on previous knowledge accumulated from studies of the biology and pathology of thousands of known bacterial protein toxins. Starting from the paradigmatic cases of diphtheria toxin, tetanus and botulinum neurotoxins, this review discusses traditional experimental approaches as well as bioinformatics and genomics-driven approaches that facilitate the discovery of novel bacterial toxins. We discuss recent work on the identification of novel botulinum-like toxins from genera such as Weissella, Chryseobacterium, and Enterococcus, and the implications of these computationally identified toxins in the field. Finally, we discuss the promise of metagenomics in the discovery of novel toxins and their ecological niches, and present data suggesting the existence of uncharacterized, botulinum-like toxin genes in insect gut metagenomes.
|
25
|
Discovery of a proteolytic flagellin family in diverse bacterial phyla that assembles enzymatically active flagella. Nat Commun 2017; 8:521. [PMID: 28900095 PMCID: PMC5595980 DOI: 10.1038/s41467-017-00599-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 02/25/2017] [Accepted: 07/12/2017] [Indexed: 01/01/2023]
Abstract
Bacterial flagella are cell locomotion and occasional adhesion organelles composed primarily of the polymeric protein flagellin, but to date have not been associated with any enzymatic function. Here, we report the bioinformatics-driven discovery of a class of enzymatic flagellins that assemble to form proteolytically active flagella. Originating by a metallopeptidase insertion into the central flagellin hypervariable region, this flagellin family has expanded to at least 74 bacterial species. In the pathogen Clostridium haemolyticum, metallopeptidase-containing flagellin (which we termed flagellinolysin) is the second most abundant protein in the flagella and is localized to the extracellular flagellar surface. Purified flagellar filaments and recombinant flagellin exhibit proteolytic activity, cleaving nearly 1000 different peptides. With ~20,000 flagellin copies per ~10-μm flagellum, this assembles the largest proteolytic complex known. Flagellum-mediated extracellular proteolysis expands our understanding of the functional plasticity of bacterial flagella, revealing this family as enzymatic biopolymers that mediate interactions with diverse peptide substrates. So far no enzymatic activity has been attributed to flagellin, the major component of bacterial flagella. Here the authors use bioinformatic analysis and identify a metallopeptidase insertion in flagellins from 74 bacterial species and show that recombinant flagellin and flagellar filaments have proteolytic activity.
|
26
|
Marcu O, Dodson EJ, Alam N, Sperber M, Kozakov D, Lensink MF, Schueler-Furman O. FlexPepDock lessons from CAPRI peptide-protein rounds and suggested new criteria for assessment of model quality and utility. Proteins 2017; 85:445-462. [PMID: 28002624 PMCID: PMC6618814 DOI: 10.1002/prot.25230] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/13/2016] [Revised: 11/15/2016] [Accepted: 11/23/2016] [Indexed: 12/21/2022]
Abstract
CAPRI rounds 28 and 29 included, for the first time, peptide-receptor targets of three different systems, reflecting increased appreciation of the importance of peptide-protein interactions. The CAPRI rounds allowed us to objectively assess the performance of Rosetta FlexPepDock, one of the first protocols to explicitly include peptide flexibility in docking, accounting for peptide conformational changes upon binding. We discuss here successes and challenges in modeling these targets: we obtain top-performing, high-resolution models of the peptide motif for cases with known binding sites, but there is a need for better modeling of flanking regions, as well as better selection criteria, in particular for unknown binding sites. These rounds have also provided us with the opportunity to reassess the success criteria, to better reflect the quality of a peptide-protein complex model. Using all models submitted to CAPRI, we analyze the correlation between current classification criteria and the ability to retrieve critical interface features, such as hydrogen bonds and hotspots. We find that loosening the backbone (and ligand) RMSD threshold, together with a restriction on the side chain RMSD measure, allows us to improve the selection of high-accuracy models. We also suggest a new measure to assess interface hydrogen bond recovery, which is not assessed by the current CAPRI criteria. Finally, we find that surprisingly much can be learned about binding hotspots from rather inaccurate models, suggesting that the current status of peptide-protein docking methods, as reflected by the submitted CAPRI models, can already have a significant impact on our understanding of protein interactions.
Affiliation(s)
- Orly Marcu
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, the Hebrew University of Jerusalem, Israel
- Emma-Joy Dodson
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, the Hebrew University of Jerusalem, Israel
- Nawsad Alam
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, the Hebrew University of Jerusalem, Israel
- Michal Sperber
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, the Hebrew University of Jerusalem, Israel
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, 11794
- Marc F Lensink
- University of Lille, CNRS UMR8576 UGSF, Lille, 59000, France
- Ora Schueler-Furman
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, the Hebrew University of Jerusalem, Israel
|
27
|
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022] Open
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. One way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
28
|
Using natural sequences and modularity to design common and novel protein topologies. Curr Opin Struct Biol 2016; 38:26-36. [PMID: 27270240 DOI: 10.1016/j.sbi.2016.05.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/01/2016] [Revised: 05/13/2016] [Accepted: 05/18/2016] [Indexed: 02/07/2023]
Abstract
Protein design is still a challenging undertaking, often requiring multiple attempts or iterations for success. Typically, the source of failure is unclear, and scoring metrics appear similar between successful and failed cases. Nevertheless, the use of sequence statistics, modularity and symmetry from natural proteins, combined with computational design both at the coarse-grained and atomistic levels is propelling a new wave of design efforts to success. Here we highlight recent examples of design, showing how the wealth of natural protein sequence and topology data may be leveraged to reduce the search space and increase the likelihood of achieving desired outcomes.
|