1
Li J, Xia R, Huang WC, Gu J, Li M. DUF99 family proteins are novel endonucleases that cleave deoxyuridine on DNA substrates. J Biol Chem 2024; 300:107901. [PMID: 39426726 PMCID: PMC11585767 DOI: 10.1016/j.jbc.2024.107901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 10/09/2024] [Accepted: 10/12/2024] [Indexed: 10/21/2024]
Abstract
DNA deamination occurs constantly in a cell and causes DNA damage. As this damage can be deleterious, organisms have evolved many systems to eliminate it, such as Endonuclease V (Endo V). DUF99 family proteins contain a domain of unknown function similar to Endo V but have not been experimentally characterized to date. Here, we show that DUF99 family proteins cleave DNA substrates on the 3' side of deoxyuridine (dU). Based on phylogenetic analysis, we designated this new protein family Endonuclease dU (Endo_dU). We also observed that the Endo_dU-coding gene frequently colocalizes with that of uracil-DNA glycosylase (UDG) in halophilic archaea, and we further knocked out the Endo_dU gene in Haloferax volcanii. In the Endo_dU knockout strain, transcription of the UDG gene increased upon induction with sodium bisulfite. Thus, we hypothesize that Endo_dU establishes a new endonuclease family with broad phylogenetic distribution and may participate in DNA repair.
Affiliation(s)
- Jinquan Li
- Archaeal Biology Centre, Synthetic Biology Research Center, Shenzhen Key Laboratory of Marine Microbiome Engineering, Key Laboratory of Marine Microbiome Engineering of Guangdong Higher Education Institutes, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Runyue Xia
- Archaeal Biology Centre, Synthetic Biology Research Center, Shenzhen Key Laboratory of Marine Microbiome Engineering, Key Laboratory of Marine Microbiome Engineering of Guangdong Higher Education Institutes, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Wen-Cong Huang
- Archaeal Biology Centre, Synthetic Biology Research Center, Shenzhen Key Laboratory of Marine Microbiome Engineering, Key Laboratory of Marine Microbiome Engineering of Guangdong Higher Education Institutes, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Jiazheng Gu
- Archaeal Biology Centre, Synthetic Biology Research Center, Shenzhen Key Laboratory of Marine Microbiome Engineering, Key Laboratory of Marine Microbiome Engineering of Guangdong Higher Education Institutes, Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Meng Li
- Archaeal Biology Centre, Synthetic Biology Research Center, Shenzhen Key Laboratory of Marine Microbiome Engineering, Key Laboratory of Marine Microbiome Engineering of Guangdong Higher Education Institutes, Institute for Advanced Study, Shenzhen University, Shenzhen, China.
2
Vishwakarma A, Padmashali N, Thiyagarajan S. AnnoDUF: A Web-Based Tool for Annotating Functions of Proteins Having Domains of Unknown Function. J Proteome Res 2024; 23:4296-4302. [PMID: 39215721 DOI: 10.1021/acs.jproteome.4c00251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
The rapid expansion of biological sequence databases due to high-throughput genomic and proteomic sequencing methods has left a considerable number of identified protein sequences with unclear or incomplete functional annotations. Domains of unknown function (DUFs) are protein domains that lack functional annotations but are present in numerous proteins. To address the challenge of finding functional annotations for DUFs, we have developed a computational method that efficiently identifies and annotates these enigmatic protein domains using the position-specific iterative basic local alignment search tool (PSI-BLAST) and data mining techniques. Our pipeline identifies putative functions of DUFs, thereby narrowing the gap between known sequences and known functions. The tool can also annotate user-supplied sequences. We executed our pipeline on 5111 unique DUF sequences obtained from Pfam, yielding putative annotations for 2007 of them. These annotations were subsequently incorporated into a comprehensive database served through a web interface named "AnnoDUF". AnnoDUF is freely accessible to both academic and industrial users at http://bts.ibab.ac.in/annoduf.php. All scripts used in this study are available in the GitHub repository at https://github.com/BioToolSuite/AnnoDUF.
Affiliation(s)
- Aman Vishwakarma
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Electronic City Phase 1, Bengaluru 560100, KA, India
- Namrata Padmashali
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Electronic City Phase 1, Bengaluru 560100, KA, India
- Saravanamuthu Thiyagarajan
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Electronic City Phase 1, Bengaluru 560100, KA, India
3
Mamun TI, Bourhia M, Neoaj T, Akash S, Azad MAK, Hossain MS, Rahman MM, Bin Jardan YA, Ibenmoussa S, Sitotaw B. Structure based functional identification of an uncharacterized protein from Coxiella burnetii involved in adipogenesis. Sci Rep 2024; 14:16789. [PMID: 39039093 PMCID: PMC11263603 DOI: 10.1038/s41598-024-66072-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 06/26/2024] [Indexed: 07/24/2024]
Abstract
Coxiella burnetii, the causative agent of Q fever, is an intracellular pathogen posing a significant global public health threat. There is a pressing need for dependable and effective treatments, alongside an urgency for further research into the molecular characterization of its genome. Within the genomic landscape of Coxiella burnetii, numerous hypothetical proteins remain uncharacterized, underscoring the need for in-depth study. In this study, we conducted comprehensive in silico analyses to identify and prioritize a hypothetical protein of Coxiella burnetii and to elucidate the structure and function of this uncharacterized protein. We also examined its physicochemical properties and subcellular localization, performed molecular dynamics simulations, and assessed its primary, secondary, and tertiary structures using a variety of bioinformatics tools. The in silico analysis revealed that the uncharacterized protein contains a conserved Mth938-like domain, suggesting a role in preadipocyte differentiation and adipogenesis. Subcellular localization predictions placed it in the cytoplasm, suggesting a significant role in cellular processes. Virtual screening identified ligands with high binding affinities, suggesting the protein's potential as a drug target against Q fever. Molecular dynamics simulations confirmed the stability of these complexes, indicating their therapeutic relevance. The findings provide a structural and functional overview of an uncharacterized protein from C. burnetii, implicating it in adipogenesis. This study underscores the power of in silico approaches in uncovering the biological roles of uncharacterized proteins and facilitating the discovery of new therapeutic strategies. The findings provide valuable preliminary data for further investigation into the protein's role in adipogenesis.
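The abstract lists physicochemical properties among the analyses without naming them. One property routinely reported in this kind of in silico characterization is the grand average of hydropathy (GRAVY), the mean Kyte-Doolittle hydropathy value over all residues. The Python sketch below computes it under that assumption; it is not the authors' pipeline, and the example peptide is hypothetical. Positive values indicate an overall hydrophobic sequence and negative values a hydrophilic one.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: the mean Kyte-Doolittle value over all residues."""
    residues = sequence.strip().upper()
    if not residues:
        raise ValueError("empty sequence")
    # Reject ambiguity codes (X, B, Z, ...) rather than silently skipping them
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Hypothetical toy peptide, not the C. burnetii protein
print(f"GRAVY = {gravy('MKLVAAGI'):.3f}")
```

Dedicated tools (for example ProtParam-style calculators) report GRAVY alongside molecular weight, isoelectric point and instability index; this sketch covers only the hydropathy average.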
Affiliation(s)
- Tajul Islam Mamun
- Department of Epidemiology and Public Health, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Mohammed Bourhia
- Laboratory of Biotechnology and Natural Resources Valorization, Faculty of Sciences, Ibn Zohr University, 80060, Agadir, Morocco.
- Taufiq Neoaj
- Department of Pharmacology and Toxicology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Shopnil Akash
- Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Birulia, Ashulia, Dhaka, 1216, Bangladesh
- Md A K Azad
- Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Birulia, Ashulia, Dhaka, 1216, Bangladesh
- Md Sarowar Hossain
- Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Birulia, Ashulia, Dhaka, 1216, Bangladesh
- Faculty of Pharmaceutical Science, Assam Down Town University, Guwahati, Assam, India
- Md Masudur Rahman
- Department of Pathology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Yousef A Bin Jardan
- Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box 11451, Riyadh, Saudi Arabia
- Samir Ibenmoussa
- Laboratory of Therapeutic and Organic Chemistry, Faculty of Pharmacy, University of Montpellier, 34000, Montpellier, France
- Baye Sitotaw
- Department of Biology, Bahir Dar University, P.O. Box 79, Bahir Dar, Ethiopia.
4
Luo C, Akhtar M, Min W, Bai X, Ma T, Liu C. Domain of unknown function (DUF) proteins in plants: function and perspective. Protoplasma 2024; 261:397-410. [PMID: 38158398 DOI: 10.1007/s00709-023-01917-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/20/2023] [Accepted: 12/08/2023] [Indexed: 01/03/2024]
Abstract
Domains of unknown function (DUFs), which are deposited in the protein family database (Pfam), are protein domains with conserved amino acid sequences and uncharacterized functions. Proteins sharing the same DUF are grouped into DUF families. Although DUF families are generally not essential for plant survival, they play roles in plant development and adaptation. Characterizing the functions of DUFs is important for deciphering biological puzzles. DUFs have generally been studied through forward and reverse genetics. Some novel approaches, especially determining the crystal structures and interaction partners of DUFs, deserve more attention. This review describes the identification of DUF genes by genome-wide and transcriptome-wide analyses, summarizes the functions of DUF-containing proteins, and discusses prospects for future studies of DUFs in plants.
Affiliation(s)
- Chengke Luo
- School of Agriculture, Ningxia University, Yinchuan, 750021, China
- Maryam Akhtar
- College of Life Sciences, Northwest Normal University, Lanzhou, 730070, China
- Weifang Min
- School of Agriculture, Ningxia University, Yinchuan, 750021, China
- Xiaorong Bai
- School of Agriculture, Ningxia University, Yinchuan, 750021, China
- Tianli Ma
- School of Agriculture, Ningxia University, Yinchuan, 750021, China
- Caixia Liu
- School of Agriculture, Ningxia University, Yinchuan, 750021, China.
5
McKay CE, Cheng J, Tanner JJ. Crystal structure of domain of unknown function 507 (DUF507) reveals a new protein fold. Sci Rep 2023; 13:13496. [PMID: 37596303 PMCID: PMC10439177 DOI: 10.1038/s41598-023-40558-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 08/12/2023] [Indexed: 08/20/2023]
Abstract
The crystal structure of the domain of unknown function family 507 protein from Aquifex aeolicus is reported (AaDUF507, UniProt O67633, 183 residues). The structure was determined in two space groups (C222₁ and P3₂21) at 1.9 Å resolution. The phase problem was solved by molecular replacement using an AlphaFold model as the search model. AaDUF507 is a Y-shaped α-helical protein consisting of an anti-parallel 4-helix bundle base and two helical arms that extend 30 Å from the base. The two crystal structures differ by a 25° rigid-body rotation of the C-terminal arm. The tertiary structure exhibits pseudo-twofold symmetry. The structural symmetry mirrors internal sequence similarity: residues 11-57 and 102-148 are 30% identical and 53% similar with an E-value of 0.002. In one of the structures, electron density for an unknown ligand, consistent with nicotinamide or a similar molecule, may indicate a functional site. Docking calculations suggest potential ligand-binding hot spots in the region between the helical arms. A structure-based query of the Protein Data Bank revealed no other protein with a similar tertiary structure, leading us to propose that AaDUF507 represents a new protein fold.
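The internal-repeat statistic above (30% identical, 53% similar) comes from counting aligned positions. A minimal Python sketch of that counting over an already-computed pairwise alignment follows. It assumes gaps are written as '-', divides by the full alignment length (one of several conventions), and uses a small hypothetical set of residue groups to decide similarity; alignment tools such as BLAST instead count positions with a positive substitution-matrix score, so their numbers can differ.

```python
# Hypothetical similarity groups; BLAST-style tools instead use positive BLOSUM scores
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _similar(a: str, b: str) -> bool:
    """Identical residues, or residues in the same (assumed) physicochemical group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Percent identity and similarity over the columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count only in the denominator
        if a == b:
            identical += 1
        if _similar(a, b):
            similar += 1
    n = len(aln_a)
    return 100.0 * identical / n, 100.0 * similar / n


# Hypothetical aligned segments, not the AaDUF507 repeats
pct_id, pct_sim = identity_and_similarity("MKLV-E", "MRIVAD")
print(f"{pct_id:.1f}% identical, {pct_sim:.1f}% similar")  # 33.3% identical, 83.3% similar
```

Because identical pairs also count as similar, percent similarity is always at least percent identity, which matches the ordering of the two figures quoted for AaDUF507.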
Affiliation(s)
- Cole E McKay
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
- John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA.
- Department of Chemistry, University of Missouri, Columbia, MO, 65211, USA.
6
Rizos I, Debeljak P, Finet T, Klein D, Ayata SD, Not F, Bittner L. Beyond the limits of the unassigned protist microbiome: inferring large-scale spatio-temporal patterns of Syndiniales marine parasites. ISME Communications 2023; 3:16. [PMID: 36854980 PMCID: PMC9975217 DOI: 10.1038/s43705-022-00203-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/27/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 03/02/2023]
Abstract
Marine protists are major components of the oceanic microbiome that remain largely unrepresented in culture collections and genomic reference databases. The exploration of this uncharted protist diversity in oceanic communities relies essentially on studying genetic markers from the environment as taxonomic barcodes. Here we report that across six large-scale spatio-temporal planktonic surveys, half of the genetic barcodes remain taxonomically unassigned at the genus level, preventing a fine ecological understanding of numerous protist lineages. Among them, parasitic Syndiniales (Dinoflagellata) appear to be the least described protist group. We have developed a computational workflow, integrating diverse 18S rDNA gene metabarcoding datasets, to infer large-scale ecological patterns at 100% similarity of the genetic marker, overcoming the limitation of taxonomic assignment. From a spatial perspective, we identified 2171 unassigned clusters (i.e., Syndiniales sequences with 100% similarity) shared exclusively between the Tropical/Subtropical Ocean and the Mediterranean Sea across all Syndiniales orders, as well as 25 ubiquitous clusters shared across all the studied marine regions. From a temporal perspective, across three time series, we highlighted 39 unassigned clusters that follow rhythmic patterns of recurrence and are the best indicators of variation in the parasite community. These clusters hold potential as ecosystem change indicators, mirroring their associated host community responses. Our results underline the importance of Syndiniales in structuring planktonic communities through space and time, raising questions regarding host-parasite association specificity and the trophic mode of persistent Syndiniales, while providing an innovative framework for prioritizing unassigned protist taxa for further description.
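Working at 100% similarity of the marker means that each cluster is simply the set of identical barcode sequences, so clusters shared exclusively between regions, or present in all of them, reduce to set comparisons. The Python sketch below illustrates only that step, assuming input as (region, sequence) pairs; the sequences and the "Polar" region are hypothetical, and the authors' workflow, which integrates several metabarcoding datasets, involves far more than this.

```python
from collections import defaultdict


def exact_clusters(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group (region, sequence) records into 100%-identity clusters: sequence -> regions."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for region, seq in records:
        clusters[seq.upper()].add(region)  # case-insensitive exact match
    return dict(clusters)


def shared_exclusively(clusters: dict[str, set[str]], regions: list[str]) -> list[str]:
    """Clusters present in every listed region and in no other region."""
    target = set(regions)
    return sorted(seq for seq, found_in in clusters.items() if found_in == target)


def ubiquitous(clusters: dict[str, set[str]], all_regions: list[str]) -> list[str]:
    """Clusters present in every studied region."""
    required = set(all_regions)
    return sorted(seq for seq, found_in in clusters.items() if found_in >= required)


# Hypothetical toy records
records = [
    ("Tropical/Subtropical", "ACGTTGCA"),
    ("Mediterranean", "acgttgca"),
    ("Tropical/Subtropical", "ACGTTGCC"),
    ("Mediterranean", "ACGTTGCC"),
    ("Polar", "ACGTTGCC"),
    ("Polar", "TTGACCAA"),
]
clusters = exact_clusters(records)
print(shared_exclusively(clusters, ["Tropical/Subtropical", "Mediterranean"]))  # ['ACGTTGCA']
print(ubiquitous(clusters, ["Tropical/Subtropical", "Mediterranean", "Polar"]))  # ['ACGTTGCC']
```

Keying clusters on the exact sequence string is what makes the approach independent of taxonomic assignment: an unassigned barcode can still be tracked across regions and time points.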
Affiliation(s)
- Iris Rizos
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France.
- Sorbonne Université, CNRS, AD2M-UMR7144 Station Biologique de Roscoff, 29680, Roscoff, France.
- Pavla Debeljak
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Thomas Finet
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Dylan Klein
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Sakina-Dorothée Ayata
- Sorbonne Université, Laboratoire d'Océanographie et du Climat: Expérimentation et Analyses Numériques (LOCEAN, SU/CNRS/IRD/MNHN), 75252, Paris Cedex 05, France
- Fabrice Not
- Sorbonne Université, CNRS, AD2M-UMR7144 Station Biologique de Roscoff, 29680, Roscoff, France
- Lucie Bittner
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Institut Universitaire de France, Paris, France
7
Linsky TW, Noble K, Tobin AR, Crow R, Carter L, Urbauer JL, Baker D, Strauch EM. Sampling of structure and sequence space of small protein folds. Nat Commun 2022; 13:7151. [PMID: 36418330 PMCID: PMC9684540 DOI: 10.1038/s41467-022-34937-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/16/2021] [Accepted: 11/10/2022] [Indexed: 11/24/2022]
Abstract
Nature only samples a small fraction of the sequence space that can fold into stable proteins. Furthermore, small structural variations in a single fold, sometimes only a few amino acids, can define a protein's molecular function. Hence, to design proteins with novel functionalities, such as molecular recognition, methods to control and sample shape diversity are necessary. To explore this space, we developed and experimentally validated a computational platform that can design a wide variety of small protein folds while sampling shape diversity. We designed and evaluated stability of about 30,000 de novo protein designs of eight different folds. Among these designs, about 6,200 stable proteins were identified, including some predicted to have a first-of-its-kind minimalized thioredoxin fold. Obtained data revealed protein folding rules for structural features such as helix-connecting loops. Beyond serving as a resource for protein engineering, this massive and diverse dataset also provides training data for machine learning. We developed an accurate classifier to predict the stability of our designed proteins. The methods and the wide range of protein shapes provide a basis for designing new protein functions without compromising stability.
Affiliation(s)
- Thomas W Linsky
- Department of Biochemistry, University of Washington, Seattle, WA, 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA
- Kyle Noble
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA, 30602, USA
- Autumn R Tobin
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA, 30602, USA
- Rachel Crow
- Department of Microbiology, University of Washington, Seattle, WA, 98195, USA
- Lauren Carter
- Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA
- Jeffrey L Urbauer
- Department of Chemistry, University of Georgia, Athens, GA, 30602, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA, 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA
- Eva-Maria Strauch
- Department of Pharmaceutical and Biomedical Sciences, University of Georgia, Athens, GA, 30602, USA.
- Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA.
8
Aptekmann AA, Buongiorno J, Giovannelli D, Glamoclija M, Ferreiro DU, Bromberg Y. mebipred: identifying metal binding potential in protein sequence. Bioinformatics 2022; 38:3532-3540. [PMID: 35639953 PMCID: PMC9272798 DOI: 10.1093/bioinformatics/btac358] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/06/2021] [Revised: 03/27/2022] [Accepted: 05/22/2022] [Indexed: 11/23/2022]
Abstract
Motivation: Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.
Results: We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and, thus, may be useful for the annotation of metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediments and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred’s utility in analyzing microbiome metal requirements.
Availability and implementation: mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.
Supplementary information: Supplementary data are available at Bioinformatics online.
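The abstract describes mebipred as reference-free: features are derived from the sequence itself, without alignments, which is what lets it score short stretches such as translated reads. Its actual feature set is not given here, so the Python sketch below shows a generic alignment-free representation (amino-acid composition plus dipeptide frequencies) as an assumption; it is not mebipred's implementation or the API of the mymetal package.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_STANDARD = set(AMINO_ACIDS)
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def composition_features(seq: str) -> list[float]:
    """Alignment-free vector: 20 residue frequencies followed by 400 dipeptide frequencies."""
    seq = seq.upper()
    residues = [aa for aa in seq if aa in _STANDARD]
    # Count only adjacent pairs where both residues are standard amino acids
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in _STANDARD and seq[i + 1] in _STANDARD]
    aa_counts = Counter(residues)
    dp_counts = Counter(pairs)
    n_res = len(residues) or 1    # avoid division by zero on empty input
    n_pairs = len(pairs) or 1
    aa_freq = [aa_counts[aa] / n_res for aa in AMINO_ACIDS]
    dp_freq = [dp_counts[d] / n_pairs for d in DIPEPTIDES]
    return aa_freq + dp_freq


# Hypothetical short peptide, e.g. a translated read
vec = composition_features("MKTAYIAKQR")
print(len(vec))  # 420
```

Every input, however short, maps to the same fixed-length vector, so a 10-residue read and a full-length protein can be passed to one downstream classifier without any alignment step.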
Affiliation(s)
- A A Aptekmann
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ, 08873, USA
- Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA
- D Giovannelli
- Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA
- Department of Biology, University of Naples Federico II, Naples, Italy
- Institute for Marine Biological Resources and Biotechnology-IRBIM, National Research Council of Italy, CNR, Ancona, Italy
- M Glamoclija
- Department of Earth and Environmental Sciences, Rutgers University, New Brunswick, NJ, 07102, USA
- D U Ferreiro
- Protein Physiology Lab, Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, 1428, Argentina
- Y Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ, 08873, USA
9
Rivera-Araya J, Riveros M, Ferrer A, Chávez R, Levicán G. A novel gene from the acidophilic bacterium Leptospirillum sp. CF-1 and its role in oxidative stress and chromate tolerance. Biol Res 2022; 55:19. [PMID: 35525996 PMCID: PMC9080137 DOI: 10.1186/s40659-022-00388-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 04/15/2022] [Indexed: 11/16/2022]
Abstract
Background: Acidophilic microorganisms like Leptospirillum sp. CF-1 thrive in environments with extremely low pH and high concentrations of dissolved heavy metals that can induce the generation of reactive oxygen species (ROS). Several hypothetical genes and proteins from Leptospirillum sp. CF-1 are known to be up-regulated under oxidative stress conditions.
Results: In the present work, the function of the hypothetical gene ABH19_09590 from Leptospirillum sp. CF-1 was studied. Heterologous expression of this gene in Escherichia coli increased the ability of E. coli to grow under oxidizing conditions with 5 mM K₂CrO₄ or 5 mM H₂O₂. Similarly, a significant reduction in ROS production was observed in E. coli transformed with a plasmid carrying ABH19_09590 after 30 min of exposure to these oxidative stress elicitors, compared to a strain complemented with the empty vector. A co-transcriptional study using RT-PCR showed that ABH19_09590 is contained in an operon, here named the “och” operon, that also contains the ABH19_09585, ABH19_09595 and ABH19_09600 genes. The expression of the och operon was significantly up-regulated in Leptospirillum sp. CF-1 exposed to 5 mM K₂CrO₄ for 15 and 30 min. Genes of this operon potentially encode an NADH:ubiquinone oxidoreductase, a CXXC motif-containing protein likely involved in thiol/disulfide exchange, a hypothetical protein, and a di-hydroxy-acid dehydratase. A comparative genomic analysis revealed that the och operon is a characteristic genetic determinant of the Leptospirillum genus that is not present in other acidophiles.
Conclusions: Altogether, these results suggest that the och operon plays a protective role against chromate and hydrogen peroxide and is an important mechanism required to face polyextremophilic conditions in acid environments.
Affiliation(s)
- Javier Rivera-Araya
- Biology Department, Faculty of Chemistry and Biology, University of Santiago of Chile (USACH), Santiago, Chile
- Matías Riveros
- Biology Department, Faculty of Chemistry and Biology, University of Santiago of Chile (USACH), Santiago, Chile
- Alonso Ferrer
- Núcleo de Química y Bioquímica, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Renato Chávez
- Biology Department, Faculty of Chemistry and Biology, University of Santiago of Chile (USACH), Santiago, Chile
- Gloria Levicán
- Biology Department, Faculty of Chemistry and Biology, University of Santiago of Chile (USACH), Santiago, Chile.
10
Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, Kottmann R, Mitchell A, Sánchez P, Siren K, Steinegger M, Gloeckner FO, Fernàndez-Guerra A. Unifying the known and unknown microbial coding sequence space. eLife 2022; 11:e67667. [PMID: 35356891 PMCID: PMC9132574 DOI: 10.7554/elife.67667] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 02/18/2021] [Accepted: 03/30/2022] [Indexed: 12/02/2022]
Abstract
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction in analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS, and a demonstration of how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 million genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
Affiliation(s)
- Chiara Vanni
- Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Jacobs University Bremen, Bremen, Germany
- Matthew S Schechter
- Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Department of Medicine, University of Chicago, Chicago, United States
- Silvia G Acinas
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Spain
- Albert Barberán
- Department of Environmental Science, University of Arizona, Tucson, United States
- Pier Luigi Buttigieg
- Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
- Emilio O Casamayor
- Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for Research, Blanes, Spain
- Tom O Delmont
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
- Carlos M Duarte
- Red Sea Research Centre and Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- A Murat Eren
- Department of Medicine, University of Chicago, Chicago, United States
- Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, United States
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Renzo Kottmann
- Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Alex Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Pablo Sánchez
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC), Barcelona, Spain
- Kimmo Siren
- Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
- Frank Oliver Gloeckner
- Jacobs University Bremen, Bremen, Germany
- University of Bremen and Life Sciences and Chemistry, Bremen, Germany
- Computing Center, Helmholtz Center for Polar and Marine Research, Bremerhaven, Germany
- Antonio Fernàndez-Guerra
- Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
11
Kader MA, Ahammed A, Khan MS, Ashik SAA, Islam MS, Hossain MU. Hypothetical protein predicted to be tumor suppressor: a protein functional analysis. Genomics Inform 2022; 20:e6. [PMID: 35399005 PMCID: PMC9002001 DOI: 10.5808/gi.21073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/26/2021] [Accepted: 01/08/2022] [Indexed: 12/22/2022]
Abstract
Litorilituus sediminis, a novel Gram-negative, aerobic bacterium of the family Colwelliaceae, has a striking hypothetical protein containing a von Hippel-Lindau domain, which has significant tumor suppressor activity. Therefore, this study was designed to elucidate the structure and function of the biologically important hypothetical protein EMK97_00595 (QBG34344.1) using several bioinformatics tools. The functional annotation revealed that the hypothetical protein is a soluble, extracellular secretory protein with a signal peptide and contains the von Hippel-Lindau (VHL; VHL beta) domain, which plays a significant role in tumor suppression. This domain is conserved throughout evolution, as its homologs are found in diverse organisms such as mammals, insects, and nematodes. The VHL gene product has a critical regulatory activity in the ubiquitous oxygen-sensing pathway. This domain plays a significant role in inhibiting cell proliferation, angiogenesis, and the progression of kidney, breast, and colon cancers. Finally, the current study suggests that the annotated hypothetical protein is linked with tumor suppressor activity, which may be of great interest for future research in higher organisms.
Affiliation(s)
- Md Abdul Kader
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
- Akash Ahammed
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
- Md Sharif Khan
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
- Sheikh Abdullah Al Ashik
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
12
Thuy-Boun PS, Wang AY, Crissien-Martinez A, Xu JH, Chatterjee S, Stupp GS, Su AI, Coyle WJ, Wolan DW. Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiome identifies host and microbial serine-type endopeptidase activity associated with ulcerative colitis. Mol Cell Proteomics 2022; 21:100197. [PMID: 35033677 PMCID: PMC8941213 DOI: 10.1016/j.mcpro.2022.100197] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
Abstract
The gut microbiota plays an important yet incompletely understood role in the induction and propagation of ulcerative colitis (UC). Organism-level efforts to identify UC-associated microbes have revealed the importance of community structure, but less is known about the molecular effectors of disease. We performed 16S rRNA gene sequencing in parallel with label-free data-dependent LC-MS/MS proteomics to characterize the stool microbiomes of healthy (n = 8) and UC (n = 10) patients. Comparisons of taxonomic composition between techniques revealed major differences in community structure partially attributable to the additional detection of host, fungal, viral, and food peptides by metaproteomics. Differential expression analysis of metaproteomic data identified 176 significantly enriched protein groups between healthy and UC patients. Gene ontology analysis revealed several enriched functions, with serine-type endopeptidase activity overrepresented in UC patients. Using a biotinylated fluorophosphonate probe and streptavidin-based enrichment, we show that serine endopeptidases are active in patient fecal samples and that additional putative serine hydrolases are detectable by this approach compared with unenriched profiling. Finally, as metaproteomic databases expand, they are expected to asymptotically approach completeness. Using ComPIL and de novo peptide sequencing, we estimate the size of the probable peptide space left unidentified (“dark peptidome”) by our large-database approach to establish a rough benchmark for database sufficiency. Despite high variability inherent in patient samples, our analysis yielded a catalog of differentially enriched proteins between healthy and UC fecal proteomes. This catalog provides a clinically relevant jumping-off point for further molecular-level studies aimed at identifying the microbial underpinnings of UC.
Highlights:
- Identified 176 significantly altered protein groups between healthy and UC patients.
- Serine-type endopeptidase activity is overrepresented in UC patients.
- Fluorophosphonate ABPP shows that endopeptidases are active in fecal samples.
- ABPP enrichment helps identify additional putative serine hydrolases in samples.
- De novo sequencing was used to estimate the number of MS2 spectra unidentified by ComPIL.
Affiliation(s)
- Peter S Thuy-Boun
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Ana Y Wang
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Janice H Xu
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Sandip Chatterjee
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Gregory S Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037
- Walter J Coyle
- Scripps Clinic Gastroenterology Division, La Jolla, CA 92037
- Dennis W Wolan
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037
13
Mazumder L, Hasan M, Rus'd AA, Islam MA. In-silico characterization and structure-based functional annotation of a hypothetical protein from Campylobacter jejuni involved in propionate catabolism. Genomics Inform 2022; 19:e43. [PMID: 35012287 PMCID: PMC8752978 DOI: 10.5808/gi.21043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/05/2021] [Accepted: 12/09/2021] [Indexed: 11/20/2022]
Abstract
Campylobacter jejuni is one of the most prevalent organisms associated with foodborne illness worldwide, causing campylobacteriosis and gastritis. Many proteins of C. jejuni remain uncharacterized. The purpose of this study was to determine the structure and function of a non-annotated hypothetical protein (HP) from C. jejuni. Properties including the physicochemical characteristics, 3D structure, and functional annotation of the HP (accession No. CAG2129885.1) were predicted using various bioinformatics tools, followed by validation and quality assessment. Protein-protein interactions and the active site were obtained from the STRING and CASTp servers, respectively. The hypothetical protein is predicted to be acidic, thermally stable, water-soluble, and cytoplasmic. Alpha-helices and random coils are its predominant structural components. The 3D model was of the expected quality and was found to be novel. This study identified a potential role of the HP in the 2-methylcitric acid cycle and propionate catabolism. Furthermore, protein-protein interaction analysis revealed several significant functional partners. The in-silico characterization of this protein should help to better understand its molecular mechanism of action. The methodology of this study could also serve as a basis for further research into proteomic and genomic data for identifying functional potential.
Affiliation(s)
- Lincon Mazumder
- Department of Microbiology, Jagannath University, Dhaka 1100, Bangladesh
- Ahmed Abu Rus'd
- Department of Microbiology, Jagannath University, Dhaka 1100, Bangladesh
14
Robinson SL, Piel J, Sunagawa S. A roadmap for metagenomic enzyme discovery. Nat Prod Rep 2021; 38:1994-2023. [PMID: 34821235 PMCID: PMC8597712 DOI: 10.1039/d1np00006c] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 01/31/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to 2021
Metagenomics has yielded massive amounts of sequencing data offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on earth is now available, the ability to accurately predict biocatalytic functions directly from sequencing data remains challenging. Compared to primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is similar to searching for a 'needle in a haystack' without direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery including phylogenetics, sequence similarity networks, genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss various experimental strategies to test computational predictions including heterologous expression and screening.
Finally, we provide an outlook for future directions in the field with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.
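The review names sequence similarity networks (SSNs) among the in silico methods for enzyme discovery. As a rough illustration only, the sketch below builds an SSN in which each sequence is a node, draws an edge when a pairwise similarity score meets a threshold, and reports connected components as candidate families. It assumes the sequences are already aligned to equal length and uses simple fractional identity as the score; the function names, toy sequences, and threshold values are hypothetical, and dedicated SSN tools typically score pairs with alignment statistics such as BLAST E-values or bit scores instead.

```python
from itertools import combinations


def fractional_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def ssn_clusters(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Build a threshold-based similarity network and return its connected components."""
    parent = {name: name for name in seqs}  # union-find forest, one node per sequence

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Every pair scoring at or above the threshold becomes an edge; merge its endpoints.
    for a, b in combinations(seqs, 2):
        if fractional_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    components: dict[str, set[str]] = {}
    for name in seqs:
        components.setdefault(find(name), set()).add(name)
    # Deterministic order: sort components by their alphabetically sorted member lists.
    return sorted(components.values(), key=sorted)


if __name__ == "__main__":
    # Hypothetical toy sequences, already aligned to equal length.
    toy = {"A": "MKTAYIAK", "B": "MKTAYIAR", "C": "MKSAYLAR", "D": "GGGGWWWW"}
    print(ssn_clusters(toy, 0.7))
    print(ssn_clusters(toy, 0.8))
```

Raising the threshold splits the network into smaller, tighter clusters, which is why SSNs are commonly inspected at several thresholds rather than one.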
Affiliation(s)
- Jörn Piel
- Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland.
15
Leng ZX, Liu Y, Chen ZY, Guo J, Chen J, Zhou YB, Chen M, Ma YZ, Xu ZS, Cui XY. Genome-Wide Analysis of the DUF4228 Family in Soybean and Functional Identification of GmDUF4228-70 in Response to Drought and Salt Stresses. Front Plant Sci 2021; 12:628299. [PMID: 34079564 PMCID: PMC8166234 DOI: 10.3389/fpls.2021.628299] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/11/2020] [Accepted: 03/26/2021] [Indexed: 05/24/2023]
Abstract
Domain of unknown function 4228 (DUF4228) proteins are a class of proteins widely found in plants that play an important role in responses to abiotic stresses. However, studies on the DUF4228 family in soybean (Glycine max L.) are sparse. In this study, we identified a total of 81 DUF4228 genes in the soybean genome and named them systematically based on their chromosomal distribution. These genes were unevenly distributed across the 20 soybean chromosomes. The predicted soybean DUF4228 proteins were classified into three groups (Groups I-III) based on a maximum likelihood phylogenetic tree. Gene structure analysis showed that most GmDUF4228 genes contain no introns. Expression profiling showed that GmDUF4228 genes were widely expressed in different soybean organs and tissues. RNA-seq data were used to characterize the expression profiles of GmDUF4228 genes under drought and salt stress treatments; nine genes significantly up-regulated under both stresses were further verified by promoter (cis-acting element) analysis and quantitative real-time PCR (qRT-PCR). Because it was up-regulated under drought and salt stresses in both RNA-seq and qRT-PCR analyses, GmDUF4228-70 was selected for further functional analysis in transgenic plants. Under drought stress, leaf curling and wilting were less severe in the GmDUF4228-70-overexpressing (GmDUF4228-70-OE) line than in the empty vector (EV) line. GmDUF4228-70-OE lines also showed increased proline content, relative water content (RWC), and chlorophyll content, and decreased contents of malondialdehyde (MDA), H2O2, and O2-. Under salt stress, the phenotypic and physiological indicators of transgenic plants changed in the same way as under drought stress. In addition, overexpression of the GmDUF4228-70 gene promoted the expression of marker genes under both drought and salt stresses.
Taken together, the results indicated that GmDUF4228 genes play important roles in response to abiotic stresses in soybean.
Affiliation(s)
- Zhi-Xin Leng
- College of Life Sciences/College of Agronomy, Jilin Agricultural University, Changchun, China
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Ying Liu
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Zhan-Yu Chen
- College of Life Sciences/College of Agronomy, Jilin Agricultural University, Changchun, China
- Jun Guo
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
- Jun Chen
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Yong-Bin Zhou
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Ming Chen
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- You-Zhi Ma
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Zhao-Shi Xu
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
- Xi-Yan Cui
- College of Life Sciences/College of Agronomy, Jilin Agricultural University, Changchun, China
16
Life-history strategies of soil microbial communities in an arid ecosystem. ISME J 2021; 15:649-657. [PMID: 33051582 PMCID: PMC8027408 DOI: 10.1038/s41396-020-00803-y] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 05/29/2020] [Revised: 09/28/2020] [Accepted: 10/01/2020] [Indexed: 01/30/2023]
Abstract
The overwhelming taxonomic diversity and metabolic complexity of microorganisms can be simplified by a life-history classification: copiotrophs grow faster and rely on resource availability, whereas oligotrophs efficiently exploit resources at the expense of growth rate. Here, we hypothesize that community-level traits inferred from metagenomic data can distinguish copiotrophic and oligotrophic microbial communities. Moreover, we hypothesize that oligotrophic microbial communities harbor more unannotated genes. To test these hypotheses, we conducted metagenomic analyses of soil samples collected from copiotrophic vegetated areas and from oligotrophic bare ground devoid of vegetation in an arid-hyperarid region of the Sonoran Desert, Arizona, USA. The results supported our hypotheses: multiple ecologically informed life-history traits, including average 16S ribosomal RNA gene copy number, codon usage bias in ribosomal genes, and predicted maximum growth rate, were higher for microbial communities in vegetated than in bare soils, and oligotrophic microbial communities in bare soils harbored a higher proportion of genes that are absent from public reference databases. Collectively, our work demonstrates that life-history traits can distill complex microbial communities into ecologically coherent units and highlights that oligotrophic microbial communities serve as a rich source of novel functions.
17
TIM29 is required for enhanced stem cell activity during regeneration in the flatworm Macrostomum lignano. Sci Rep 2021; 11:1166. [PMID: 33441924 PMCID: PMC7806878 DOI: 10.1038/s41598-020-80682-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/01/2020] [Accepted: 12/10/2020] [Indexed: 01/29/2023]
Abstract
TIM29 is a mitochondrial inner membrane protein that interacts with the protein import complex TIM22. TIM29 was shown to stabilize the TIM22 complex, but its biological function remains largely unknown. Until recently, it was classified as a Domain of Unknown Function (DUF) gene, with a conserved protein domain, DUF2366, of unclear function. Since characterizing DUF genes can provide novel biological insight, we used previously established transcriptional profiles of the germline and stem cells of the flatworm Macrostomum lignano to probe conserved DUFs for their potential role in germline biology, stem cell function, regeneration, and development. Here, we demonstrate that DUF2366/TIM29 knockdown in M. lignano has a very limited effect under normal homeostatic conditions but prevents worms from adapting to the highly proliferative state required for regeneration.
18
Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020; 60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 01/04/2023]
Abstract
Although massive amounts of protein sequence and structure data are quickly accumulating, the number of protein architectural types (or structural folds) is small and limited. This study addresses the following question: how well can one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? To answer it, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design quality, we build our models on a conditional Wasserstein GAN (WGAN) that uses the Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one that is not even among the basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign) by generating design seeds and tailoring the design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data.
Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
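The abstract states that the conditional WGAN uses the Wasserstein distance in its loss function. The sketch below only illustrates that distance in the simplest case: two one-dimensional empirical distributions with the same number n of equally weighted samples. There, the Wasserstein-1 (earth mover's) distance reduces to W1 = (1/n) Σ |x_(i) − y_(i)|, the mean absolute difference between the i-th smallest values of each sample. This is not the gcWGAN training objective; WGANs estimate the distance between high-dimensional distributions through a learned critic network, which this example does not attempt. The function name and inputs are illustrative assumptions.

```python
def wasserstein_1d(xs: list[float], ys: list[float]) -> float:
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Assumes both samples have the same size and equal weights, so the optimal
    transport plan simply pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Shifting every sample by 1.0 moves each unit of mass a distance of 1.0.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
    # The same multiset of values in a different order has distance 0.0.
    print(wasserstein_1d([2.0, 0.0, 1.0], [0.0, 1.0, 2.0]))  # 0.0
```

For unequal sample sizes or weighted samples, SciPy's `scipy.stats.wasserstein_distance` handles the general one-dimensional case.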
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
- Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
19
Rahman A, Susmi TF, Yasmin F, Karim ME, Hossain MU. Functional annotation of an ecologically important protein from Chloroflexus aurantiacus involved in polyhydroxyalkanoates (PHA) biosynthetic pathway. SN Appl Sci 2020. [DOI: 10.1007/s42452-020-03598-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
20
Didelon M, Khafif M, Godiard L, Barbacci A, Raffaele S. Patterns of Sequence and Expression Diversification Associate Members of the PADRE Gene Family With Response to Fungal Pathogens. Front Genet 2020; 11:491. [PMID: 32547597 PMCID: PMC7272662 DOI: 10.3389/fgene.2020.00491] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/15/2020] [Accepted: 04/20/2020] [Indexed: 12/31/2022]
Abstract
Pathogen infection triggers extensive reprogramming of the plant transcriptome, including numerous genes of unknown function. Due to their wide taxonomic distribution, genes encoding proteins with Domains of Unknown Function (DUFs) that are activated upon pathogen challenge likely play important roles in disease. In Arabidopsis thaliana, we identified thirteen genes harboring a DUF4228 domain among the top 10% most induced genes after infection by the fungal pathogen Sclerotinia sclerotiorum. Based on functional information collected through homology and contextual searches, we propose to refer to this domain as the pathogen and abiotic stress response, cadmium tolerance, disordered region-containing (PADRE) domain. Genome-wide and phylogenetic analyses indicated that PADRE is specific to plants and diversified into 10 subfamilies early in the evolution of Angiosperms. PADRE typically occurs in small single-domain proteins with a bipartite architecture. The PADRE N-terminus harbors conserved sequence motifs, while its C-terminus includes an intrinsically disordered region with multiple phosphorylation sites. A pangenomic survey of PADRE gene expression upon S. sclerotiorum inoculation in Arabidopsis, castor bean, and tomato indicated consistent expression across species within phylogenetic groups. Multi-stress expression profiling and co-expression network analyses associated AtPADRE genes with the induction of anthocyanin biosynthesis and responses to chitin and to hypoxia. Our analyses reveal patterns of sequence and expression diversification consistent with the evolution of a role in disease resistance for an uncharacterized family of plant genes. These findings highlight PADRE genes as prime candidates for the functional dissection of mechanisms underlying plant disease resistance to fungi.
Affiliation(s)
- Sylvain Raffaele
- Université de Toulouse, Laboratoire des Interactions Plantes Micro-organismes (LIPM), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE) – Centre National de la Recherche Scientifique (CNRS), Castanet-Tolosan, France
21
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022]
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this aspect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Affiliation(s)
- László G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
- Zsolt Merényi
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
- Botond Hegedüs
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
- Balázs Bálint
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
22
Adam S, Klein A, Surup F, Koehnke J. The structure of CgnJ, a domain of unknown function protein from the crocagin gene cluster. Acta Crystallogr F Struct Biol Commun 2019; 75:205-211. [PMID: 30839296 PMCID: PMC6404859 DOI: 10.1107/s2053230x19000712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Accepted: 01/16/2019] [Indexed: 12/28/2022]
Abstract
Natural products often contain interesting new chemical entities that are introduced into the structure of a compound by the enzymatic machinery of the producing organism. The recently described crocagins are novel polycyclic peptides that belong to the class of ribosomally synthesized and post-translationally modified peptide natural products. They have been shown to bind to the conserved prokaryotic carbon-storage regulator A in vitro. In efforts to understand crocagin biosynthesis, the proteins encoded by the putative biosynthetic genes were expressed and purified. Here, the first crystal structure of a protein from the crocagin biosynthetic gene cluster, CgnJ, a domain of unknown function protein, is reported. Possible functions of this protein were explored by structural and sequence homology analyses. Even though its sequence homology to proteins in the Protein Data Bank is low, the protein shows significant structural homology to ComJ, a protein of known function within the competence system of Bacillus subtilis, leading to the hypothesis that CgnJ plays a similar role within the producing organism.
Affiliation(s)
- Sebastian Adam
- Structural Biology of Biosynthetic Enzymes, Helmholtz Institute for Pharmaceutical Research Saarland, Universität des Saarlandes Gebäude E8.1, 66123 Saarbrücken, Germany
- Andreas Klein
- Structural Biology of Biosynthetic Enzymes, Helmholtz Institute for Pharmaceutical Research Saarland, Universität des Saarlandes Gebäude E8.1, 66123 Saarbrücken, Germany
- Frank Surup
- Microbial Drugs, Helmholtz Centre for Infection Research, Inhoffenstrasse 7, 38124 Braunschweig, Germany
- Jesko Koehnke
- Structural Biology of Biosynthetic Enzymes, Helmholtz Institute for Pharmaceutical Research Saarland, Universität des Saarlandes Gebäude E8.1, 66123 Saarbrücken, Germany
23
Tahir RA, Wu H, Javed N, Khalique A, Khan SAF, Mir A, Ahmed MS, Barreto GE, Qing H, Ashraf GM, Sehgal SA. Pharmacoinformatics and molecular docking reveal potential drug candidates against Schizophrenia to target TAAR6. J Cell Physiol 2018; 234:13263-13276. [PMID: 30569503 DOI: 10.1002/jcp.27999] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/12/2018] [Accepted: 11/30/2018] [Indexed: 11/10/2022]
Abstract
Schizophrenia (SZ) is a complex, disabling mental disorder that affects about 1% of the world's population and ranks among the top ten medical disorders. In the current work, bioinformatics analyses were carried out on trace amine (TA)-associated receptor 6 (TAAR6) to identify potential drugs and compounds against SZ. Comparative modeling and threading-based approaches were used to predict the structure of TAAR6. Fifty-nine predicted structures were evaluated by various model assessment techniques, and the final model, with only eight amino acids in the outlier region and an overall quality factor of 98.5%, was chosen for further pharmacoinformatics and molecular docking analyses. From an extensive literature review, 11 Food and Drug Administration (FDA)-approved drugs were analyzed computationally, and aripiprazole was found to be the most effective drug against SZ by targeting TAAR6. Here, we report five novel molecules that exhibited the highest binding affinity and effective drug properties and, interestingly, performed better than the selected approved drugs against SZ by targeting TAAR6. The docking analyses identified Arg-92, Trp-98, Gln-191, Thr-192, Ala-290, Cys-291, Tyr-293, and Glu-294 as critical interacting residues in receptor-ligand interactions. Absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties, the Lipinski rule of five, the highest binding affinity coupled with virtual screening (VS), and a pharmacophore modeling approach indicated that aripiprazole (-8.6 kcal/mol) and TAAR6_0094 (-9.3 kcal/mol) are potential inhibitors targeting TAAR6. It is suggested that aripiprazole be used to treat SZ by targeting TAAR6 and that the scrutinized novel compound be used to develop effective therapies.
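The abstract lists the Lipinski rule of five among the filters applied to candidate compounds. A minimal sketch of that filter follows, assuming the four molecular descriptors have already been computed elsewhere (for example with a cheminformatics toolkit); the class name, function names, and example values are hypothetical and are not the descriptors reported in the study. Following Lipinski's formulation, a compound is flagged when it violates more than one of the four criteria: molecular weight above 500 Da, log P above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (values supplied by the caller)."""
    mol_weight: float      # molecular weight in Da
    log_p: float           # octanol-water partition coefficient (log P)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that the compound fails."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("log P > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Pass if the compound has at most one violation (Lipinski's convention)."""
    return len(rule_of_five_violations(d)) <= max_violations


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    small_lipophilic = Descriptors(mol_weight=450.0, log_p=5.4, h_bond_donors=1, h_bond_acceptors=4)
    large_polar = Descriptors(mol_weight=620.0, log_p=2.0, h_bond_donors=7, h_bond_acceptors=12)
    for name, d in [("small_lipophilic", small_lipophilic), ("large_polar", large_polar)]:
        print(name, passes_rule_of_five(d), rule_of_five_violations(d))
```

The boundary values themselves (500 Da, log P 5, 5 donors, 10 acceptors) count as compliant, since the rule penalizes only values above those limits.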
Affiliation(s)
- Rana Adnan Tahir
  - Key Laboratory of Molecular Medicine and Biotherapy in the Ministry of Industry and Information Technology, Department of Biology, School of Life Sciences, Beijing Institute of Technology, Beijing, China; Department of Biosciences, COMSATS University Islamabad, Sahiwal Campus, Islamabad, Pakistan
- Hao Wu
  - State Key Laboratory of Membrane Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Naima Javed
  - Department of Biosciences, COMSATS University Islamabad, Sahiwal Campus, Islamabad, Pakistan
- Anila Khalique
  - State Key Laboratory of Medicinal Chemical Biology, Key Laboratory of Bioactive Materials of Ministry of Education, College of Life Sciences, Nankai University, Tianjin, China
- Asif Mir
  - Department of Bioinformatics and Biotechnology, International Islamic University Islamabad, Islamabad, Pakistan
- Muhammad Saad Ahmed
  - Department of Biological Engineering/Institute of Biotransformation and Synthetic Biosystem, School of Life Sciences, Beijing Institute of Technology, Beijing, China
- George E Barreto
  - Departamento de Nutrición y Bioquímica, Facultad de Ciencias, Pontificia Universidad Javeriana, Bogotá, Colombia
- Hong Qing
  - Key Laboratory of Molecular Medicine and Biotherapy in the Ministry of Industry and Information Technology, Department of Biology, School of Life Sciences, Beijing Institute of Technology, Beijing, China
- Ghulam Md Ashraf
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Sheikh Arslan Sehgal
  - State Key Laboratory of Membrane Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; Department of Biosciences, COMSATS University Islamabad, Sahiwal Campus, Islamabad, Pakistan; University of Chinese Academy of Sciences, Beijing, China
24
Engqvist MKM. Correlating enzyme annotations with a large set of microbial growth temperatures reveals metabolic adaptations to growth at diverse temperatures. BMC Microbiol 2018; 18:177. [PMID: 30400856 PMCID: PMC6219164 DOI: 10.1186/s12866-018-1320-7] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 06/06/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022]
Abstract
Background: The ambient temperature of all habitats is a key physical property that shapes the biology of the microbes inhabiting them. The optimal growth temperature (OGT) of a microbe is therefore a key piece of data needed to understand evolutionary adaptations manifested in its genome sequence. Unfortunately, there is no growth temperature database or easily downloadable dataset encompassing the majority of cultured microorganisms, which limits our ability to interpret genomic data to identify temperature adaptations in microbes. Results: In this work I significantly contribute to closing this gap by mining data from major culture collection centres to obtain growth temperature data for a nonredundant set of 21,498 microbes. The dataset (10.5281/zenodo.1175608) contains mainly bacteria and archaea and spans psychrophiles, mesophiles, thermophiles and hyperthermophiles. Using these data, a full 43% of all protein entries in the UniProt database can be annotated with the growth temperature of the species from which they originate. I validate the dataset by showing a Pearson correlation of up to 0.89 between growth temperature and mean enzyme optima, a physiological property directly influenced by the growth temperature. Using the temperature dataset, I correlate the genomic occurrence of enzyme functional annotations with growth temperature. I identify 319 enzyme functions that either increase or decrease in occurrence with temperature. Eight metabolic pathways were statistically enriched for these enzyme functions. Furthermore, I establish a correlation of 33 domains of unknown function (DUFs) with growth temperature in microbes, four of which (DUF438, DUF1524, DUF1957 and DUF3458_C) were significant in both archaea and bacteria. Conclusions: The growth temperature dataset enables large-scale correlation analysis with enzyme function- and domain-level annotations. Growth-temperature-dependent changes in their occurrence highlight potential evolutionary adaptations.
A few of the identified changes were previously known, such as the preference for menaquinone biosynthesis through the futalosine pathway in bacteria growing at high temperatures. Others represent important starting points for future studies, such as DUFs whose occurrence changes with temperature. The growth temperature dataset should become a valuable community resource and will find additional important uses in correlating genomic, transcriptomic, proteomic, metabolomic, phenotypic or taxonomic properties with temperature in future studies. Electronic supplementary material: The online version of this article (10.1186/s12866-018-1320-7) contains supplementary material, which is available to authorized users.
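The validation step in this abstract correlates each species' growth temperature with the mean of its enzymes' temperature optima. A minimal sketch of that calculation follows, assuming a dictionary of OGTs and per-species lists of enzyme optima are already available; the function names and toy numbers are hypothetical and are not taken from the dataset. It computes a plain Pearson coefficient over the species present in both inputs.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


def ogt_vs_mean_optimum(ogt: dict[str, float],
                        enzyme_optima: dict[str, list[float]]) -> float:
    """Correlate OGT with the mean enzyme optimum of each species found in both inputs."""
    # Species without any recorded enzyme optimum are skipped.
    species = sorted(s for s in ogt if enzyme_optima.get(s))
    xs = [ogt[s] for s in species]
    ys = [sum(enzyme_optima[s]) / len(enzyme_optima[s]) for s in species]
    return pearson_r(xs, ys)


if __name__ == "__main__":
    # Toy values in degrees Celsius, for illustration only.
    ogt = {"sp1": 30.0, "sp2": 55.0, "sp3": 85.0}
    optima = {"sp1": [30.0, 40.0], "sp2": [60.0], "sp3": [80.0, 90.0]}
    print(round(ogt_vs_mean_optimum(ogt, optima), 3))
```

The reported value of up to 0.89 comes from the real dataset; the toy numbers here only exercise the arithmetic.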
Affiliation(s)
- Martin K M Engqvist
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
25
Chu ZJ, Sun HH, Zhu XG, Ying SH, Feng MG. Discovery of a new intravacuolar protein required for the autophagy, development and virulence of Beauveria bassiana. Environ Microbiol 2017; 19:2806-2818. [DOI: 10.1111/1462-2920.13803] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/08/2017] [Revised: 05/20/2017] [Accepted: 05/20/2017] [Indexed: 11/27/2022]
Affiliation(s)
- Zhen-Jian Chu
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
- Huan-Huan Sun
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
- Xiao-Guan Zhu
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
- Sheng-Hua Ying
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
- Ming-Guang Feng
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
26
ProBiS tools (algorithm, database, and web servers) for predicting and modeling of biologically interesting proteins. Prog Biophys Mol Biol 2017; 128:24-32. [PMID: 28212856 DOI: 10.1016/j.pbiomolbio.2017.02.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/26/2016] [Revised: 12/14/2016] [Accepted: 02/07/2017] [Indexed: 01/30/2023]
Abstract
ProBiS (Protein Binding Sites) Tools consist of an algorithm, a database, and web servers for the prediction of binding sites and protein ligands based on the detection of structurally similar binding sites in the Protein Data Bank. In this article, we review the operations that ProBiS Tools perform, comment on the evolution of the tools, and give some implementation details. We also review some of their applications to biologically interesting proteins. ProBiS Tools are freely available at http://probis.cmm.ki.si and http://probis.nih.gov.
27
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022]
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. One way to explore it is to consider the domain content of proteins. However, parts of some sequences, and many entire sequences, remain unannotated despite a converging number of domain families. The unannotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These unannotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led to the definition of different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The unannotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, a higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
28
Fuhrer T, Zampieri M, Sévin DC, Sauer U, Zamboni N. Genomewide landscape of gene-metabolome associations in Escherichia coli. Mol Syst Biol 2017; 13:907. [PMID: 28093455 PMCID: PMC5293155 DOI: 10.15252/msb.20167150] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Indexed: 12/15/2022]
Abstract
Metabolism is one of the best-understood cellular processes whose network topology of enzymatic reactions is determined by an organism's genome. The influence of genes on metabolite levels, however, remains largely unknown, particularly for the many genes encoding non-enzymatic proteins. Serendipitously, genomewide association studies explore the relationship between genetic variants and metabolite levels, but a comprehensive interaction network has remained elusive even for the simplest single-celled organisms. Here, we systematically mapped the association between > 3,800 single-gene deletions in the bacterium Escherichia coli and relative concentrations of > 7,000 intracellular metabolite ions. Beyond expected metabolic changes in the proximity to abolished enzyme activities, the association map reveals a largely unknown landscape of gene-metabolite interactions that are not represented in metabolic models. Therefore, the map provides a unique resource for assessing the genetic basis of metabolic changes and conversely hypothesizing metabolic consequences of genetic alterations. We illustrate this by predicting metabolism-related functions of 72 so far not annotated genes and by identifying key genes mediating the cellular response to environmental perturbations.
Affiliation(s)
- Tobias Fuhrer
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Mattia Zampieri
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Daniel C Sévin
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Uwe Sauer
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Nicola Zamboni
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
29
Sévin DC, Fuhrer T, Zamboni N, Sauer U. Nontargeted in vitro metabolomics for high-throughput identification of novel enzymes in Escherichia coli. Nat Methods 2016; 14:187-194. [DOI: 10.1038/nmeth.4103] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 04/22/2016] [Accepted: 10/19/2016] [Indexed: 12/14/2022]
30
Fidler DR, Murphy SE, Courtis K, Antonoudiou P, El-Tohamy R, Ient J, Levine TP. Using HHsearch to tackle proteins of unknown function: A pilot study with PH domains. Traffic 2016; 17:1214-1226. [PMID: 27601190 PMCID: PMC5091641 DOI: 10.1111/tra.12432] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 04/27/2015] [Revised: 08/30/2016] [Accepted: 08/30/2016] [Indexed: 01/08/2023]
Abstract
Advances in membrane cell biology are hampered by the relatively high proportion of proteins with no known function. Such proteins are largely or entirely devoid of structurally significant domain annotations. Structural bioinformaticians have developed profile‐profile tools such as HHsearch (whose online version is HHpred), which can detect remote homologies that are missed by tools used to annotate databases. Here we have applied HHsearch to study a single structural fold in a single model organism as proof of principle. In the entire clan of protein domains sharing the pleckstrin homology domain fold in yeast, systematic application of HHsearch accurately identified known PH‐like domains. It also predicted 16 new domains in 13 yeast proteins, many of which are implicated in intracellular traffic. One of these was Vps13p, for which we confirmed the functional importance of the predicted PH‐like domain. Even though such predictions require considerable work to be corroborated, they are useful first steps. HHsearch should be applied more widely, particularly across entire proteomes of model organisms, to significantly improve database annotations.
Affiliation(s)
- David R Fidler
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Sarah E Murphy
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Katherine Courtis
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Rana El-Tohamy
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Jonathan Ient
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Timothy P Levine
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
31
Serrano P, Dutta SK, Proudfoot A, Mohanty B, Susac L, Martin B, Geralt M, Jaroszewski L, Godzik A, Elsliger M, Wilson IA, Wüthrich K. NMR in structural genomics to increase structural coverage of the protein universe: Delivered by Prof. Kurt Wüthrich on 7 July 2013 at the 38th FEBS Congress in St. Petersburg, Russia. FEBS J 2016; 283:3870-3881. [PMID: 27154589 DOI: 10.1111/febs.13751] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/23/2016] [Revised: 04/12/2016] [Accepted: 05/04/2016] [Indexed: 12/12/2022]
Abstract
For more than a decade, the Joint Center for Structural Genomics (JCSG; www.jcsg.org) worked toward increased three-dimensional structure coverage of the protein universe. This coordinated quest was one of the main goals of the four high-throughput (HT) structure determination centers of the Protein Structure Initiative (PSI; www.nigms.nih.gov/Research/specificareas/PSI). To achieve the goals of the PSI, the JCSG made use of the complementarity of structure determination by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to increase and diversify the range of targets entering the HT structure determination pipeline. The overall strategy, for both techniques, was to determine atomic-resolution structures for representatives of large protein families, as defined by the Pfam database, which had no structural coverage and could make significant contributions to biological and biomedical research. Furthermore, the experimental structures could be leveraged by homology modeling to further expand the structural coverage of the protein universe and increase biological insights. Here, we describe what could be achieved by this structural genomics approach, using as an illustration the contributions from 20 NMR structure determinations out of a total of 98 JCSG NMR structures, which were selected because they are the first three-dimensional structure representations of the respective Pfam protein families. The information from this small sample is representative of the overall results of crystal and NMR structure determination in the JCSG. There are five new folds, which were classified as domains of unknown function (DUFs); three of the proteins could be functionally annotated based on three-dimensional structural similarity to previously characterized proteins; and 12 proteins showed only limited similarity to previous deposits in the Protein Data Bank (PDB) and were classified as DUFs.
Affiliation(s)
- Pedro Serrano
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Samit K Dutta
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Andrew Proudfoot
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Biswaranjan Mohanty
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Lukas Susac
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Bryan Martin
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Michael Geralt
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Lukasz Jaroszewski
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Adam Godzik
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Marc Elsliger
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Ian A Wilson
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Kurt Wüthrich
  - Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
32
Lobb B, Doxey AC. Novel function discovery through sequence and structural data mining. Curr Opin Struct Biol 2016; 38:53-61. [DOI: 10.1016/j.sbi.2016.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/19/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 01/30/2023]
33
Tong SM, Chen Y, Ying SH, Feng MG. Three DUF1996 Proteins Localize in Vacuoles and Function in Fungal Responses to Multiple Stresses and Metal Ions. Sci Rep 2016; 6:20566. [PMID: 26839279 PMCID: PMC4738358 DOI: 10.1038/srep20566] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/06/2015] [Accepted: 01/07/2016] [Indexed: 12/22/2022]
Abstract
Many annotated fungal genomes harbour high proportions of hypothetical proteins with or without domains of unknown function (DUF). Here, three novel proteins (342–497 amino acids), each containing only a single large DUF1996 (231–250 residues) region with highly conserved head (DPIXXP) and tail (HXDXXXGW) signatures, were expressed as eGFP-tagged fusion proteins and shown to specifically localize in the vacuoles of Beauveria bassiana, a filamentous fungal entomopathogen; therefore, these proteins were named vacuole-localized proteins (VLPs). The VLPs have one to three homologues in other entomopathogenic or non-entomopathogenic filamentous fungi but no homologues in yeasts. The large DUF1996 regions can be formulated as D-X4-P-X5–6-H-X-H-X3-G-X25–26-D-X-S-X-YW-X-P-X123–203-CP-X39–48-H-X-D-X3-GW; the identical residues are likely involved in a proton antiport system for intracellular homeostasis. Single deletions of three VLP-coding genes (vlp1–3) increased fungal sensitivities to cell wall perturbation, high osmolarity, oxidation, and several metal ions. Conidial thermotolerance decreased by ~11% in two Δvlp mutants, and UV-B resistance decreased by 41–57% in three Δvlp mutants. All the changes were restored by targeted gene complementation. However, the deletions did not influence fungal growth, conidiation, virulence or Cu2+ sensitivity. Our findings unveiled a role for the DUF1996 regions of three B. bassiana VLPs in the regulation of multiple stress responses and environmental adaptation.
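The DUF1996 consensus quoted in this abstract is written in a hyphen-separated notation in which X stands for any residue, Xn for a run of exactly n residues, and Xm–n (with an en dash) for a run of m to n residues. A minimal sketch, assuming that reading of the notation, converts it into a Python regular expression that can be passed to `re.search` to scan candidate sequences. The helper `motif_to_regex` and the constant `DUF1996_PATTERN` are hypothetical names, not code from the paper.

```python
import re

# Gap element: "X" (one residue), "Xn" (exactly n) or "Xm–n" (m to n; en dash as in the abstract).
_GAP = re.compile(r"X(?:(\d+)(?:–(\d+))?)?")

DUF1996_PATTERN = (
    "D-X4-P-X5–6-H-X-H-X3-G-X25–26-D-X-S-X-YW-X-P-"
    "X123–203-CP-X39–48-H-X-D-X3-GW"
)


def motif_to_regex(motif: str) -> str:
    """Translate the hyphen-separated motif notation into a Python regex string."""
    pieces = []
    for element in motif.split("-"):
        gap = _GAP.fullmatch(element)
        if gap:
            low, high = gap.groups()
            if low is None:
                pieces.append(".")
            elif high is None:
                pieces.append(f".{{{low}}}")
            else:
                pieces.append(f".{{{low},{high}}}")
        elif re.fullmatch(r"[A-Z]+", element):
            # Literal residues; an embedded X (as in "DPIXXP") is a one-residue wildcard.
            pieces.append(element.replace("X", "."))
        else:
            raise ValueError(f"unrecognised motif element: {element!r}")
    return "".join(pieces)


if __name__ == "__main__":
    print(motif_to_regex("DPIXXP"))    # DPI..P (head signature)
    print(motif_to_regex("HXDXXXGW"))  # H.D...GW (tail signature)
    print(motif_to_regex(DUF1996_PATTERN))
```

A match only shows that a sequence fits the spacing of the conserved residues; it says nothing by itself about vacuolar localization or the stress-response phenotypes reported above.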
Affiliation(s)
- Sen-Miao Tong
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Ying Chen
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Sheng-Hua Ying
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
- Ming-Guang Feng
  - Institute of Microbiology, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China
34
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770 PMCID: PMC4592975 DOI: 10.1038/srep14717] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2015] [Accepted: 09/07/2015] [Indexed: 11/14/2022]
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and to a large extent remains, heavily biased toward culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, and iii) the discovery rate of novel folds among sequences not covered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained tertiary structure level novelty.
35
Vallat B, Madrid-Aliste C, Fiser A. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures. PLoS Comput Biol 2015; 11:e1004419. [PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/10/2015] [Accepted: 06/30/2015] [Indexed: 12/25/2022]
Abstract
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, modeling the remaining protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger than those used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when the sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling. Each protein folds into a unique three-dimensional structure that enables it to carry out its biological function. Knowledge of the atomic details of protein structures is therefore key to understanding their function.
Advances in high-throughput experimental technologies have led to an exponential increase in the availability of known protein sequences. Although strong progress has been made in experimental protein structure determination, more than 99% of structural information is still provided by computational modeling methods. We describe here a novel structure prediction method, SmotifTF, which uses a unique library of known protein fragments to assemble the three-dimensional structure of a sequence. The fragment library has saturated over time and therefore provides a complete set of building blocks required for model building. The method performs competitively compared to existing methods of structure prediction.
Affiliation(s)
- Brinda Vallat
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Carlos Madrid-Aliste
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Andras Fiser
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
36
Mudgal R, Sandhya S, Chandra N, Srinivasan N. De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods. Biol Direct 2015; 10:38. [PMID: 26228684 PMCID: PMC4520260 DOI: 10.1186/s13062-015-0069-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 03/10/2015] [Accepted: 07/20/2015] [Indexed: 12/23/2022]
Abstract
Background: In the post-genomic era, where sequences are being determined at a rapid rate, we are highly reliant on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Family” (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families in search of both structure and function. Results: We applied a ‘computational structural genomics’ approach using five state-of-the-art remote similarity detection methods to detect the relationship between uncharacterized DUFs and domain families of known structures. The association with a structural domain family could serve as a starting point in elucidating the function of a DUF. Amongst these five methods, searches in the SCOP-NrichD database have been applied for the first time. Predictions were classified into high-, medium- and low-confidence based on the consensus of results from various approaches and also annotated with enzyme and Gene Ontology terms. In total, 614 uncharacterized DUFs could be associated with a known structural domain, of which high-confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed online at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and the extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659. Conclusions: This study provides insights into the structure and potential function of nearly 20% of the DUFs.
Use of different computational approaches enables us to reliably recognize distant relationships, especially when they converge to a common assignment, because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role is still ‘non-trivial’, with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners. Reviewers: This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.
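This abstract grades each DUF-to-structure assignment by how many of the five detection methods agree, with high-confidence predictions requiring at least four. A minimal sketch of such a consensus step follows. Only the four-method cutoff for "high" comes from the abstract; the two-method cutoff for "medium", the function name `consensus_call`, and the method and fold labels are hypothetical placeholders, not the authors' procedure.

```python
from collections import Counter
from typing import Optional


def consensus_call(predictions: dict[str, Optional[str]],
                   high_min: int = 4,
                   medium_min: int = 2) -> tuple[Optional[str], str]:
    """Return (consensus fold, confidence label) for one DUF family.

    `predictions` maps each detection method to the structural fold it assigned,
    or None when that method found no significant hit.
    """
    votes = Counter(fold for fold in predictions.values() if fold is not None)
    if not votes:
        return None, "none"
    # most_common lists equal counts in first-seen order, so ties go to the fold seen first.
    fold, support = votes.most_common(1)[0]
    if support >= high_min:
        return fold, "high"
    if support >= medium_min:
        return fold, "medium"
    return fold, "low"


if __name__ == "__main__":
    example = {"method_1": "fold_A", "method_2": "fold_A", "method_3": "fold_A",
               "method_4": "fold_A", "method_5": None}
    print(consensus_call(example))  # ('fold_A', 'high')
```

Counting agreement on the top-ranked fold is the simplest possible consensus; a real pipeline would also weigh each method's scores and check, as the authors did for enzymes, whether functional residues are conserved.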
Affiliation(s)
- Richa Mudgal
- IISc Mathematics Initiative, Indian Institute of Science, Bangalore, 560 012, India.
- Sankaran Sandhya
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560 012, India.
- Nagasuma Chandra
- Department of Biochemistry, Indian Institute of Science, Bangalore, 560 012, India.
37
Lobb B, Kurtz DA, Moreno-Hagelsieb G, Doxey AC. Remote homology and the functions of metagenomic dark matter. Front Genet 2015; 6:234. [PMID: 26257768 PMCID: PMC4508852 DOI: 10.3389/fgene.2015.00234] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/05/2015] [Accepted: 06/22/2015] [Indexed: 01/26/2023]
Abstract
Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. 
All remote homology predictions are available at http://doxey.uwaterloo.ca/ORFans.
Affiliation(s)
- Briallen Lobb
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Daniel A Kurtz
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Andrew C Doxey
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
38
Andersson DI, Jerlström-Hultqvist J, Näsvall J. Evolution of new functions de novo and from preexisting genes. Cold Spring Harb Perspect Biol 2015; 7:a017996. [PMID: 26032716 DOI: 10.1101/cshperspect.a017996] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Indexed: 11/25/2022]
Abstract
How the enormous structural and functional diversity of new genes and proteins was generated (estimated to be 10^10 to 10^12 different proteins in all organisms on earth [Choi I-G, Kim S-H. 2006. Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci 103: 14056-14061]) is a central biological question with a long and rich history. Extensive work during the last 80 years has shown that new genes that play important roles in lineage-specific phenotypes and adaptation can originate through a multitude of different mechanisms, including duplication, lateral gene transfer, gene fusion/fission, and de novo origination. In this review, we focus on two main processes as generators of new functions: evolution of new genes by duplication and divergence of pre-existing genes, and de novo gene origination, in which a whole protein-coding gene evolves from a noncoding sequence.
Affiliation(s)
- Dan I Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
- Jon Jerlström-Hultqvist
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
- Joakim Näsvall
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
39
Lu Y, Lu Y, Deng J, Peng H, Lu H, Lu LJ. A novel essential domain perspective for exploring gene essentiality. Bioinformatics 2015; 31:2921-9. [PMID: 26002906 DOI: 10.1093/bioinformatics/btv312] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/11/2015] [Accepted: 05/13/2015] [Indexed: 02/05/2023]
Abstract
MOTIVATION Genes with indispensable functions are identified as essential; however, traditional gene-level studies of essentiality have several limitations. In this study, we characterized gene essentiality from a new perspective, that of protein domains, the independent structural or functional units of a polypeptide chain. RESULTS To identify such essential domains, we developed an Expectation-Maximization (EM) algorithm-based Essential Domain Prediction (EDP) model. With simulated datasets, the model converged to consistent results from different initial values and offered accurate predictions even with noise. We then applied the EDP model to six microbial species and predicted 1879 domains to be essential in at least one species, ranging from 10% to 23% per species. The predicted essential domains were more conserved than either non-essential domains or essential genes. Comparing essential domains in prokaryotes and eukaryotes revealed an evolutionary distance consistent with that inferred from ribosomal RNA. When we used these essential domains to reproduce the annotation of essential genes, we obtained accurate results, suggesting that protein domains are more basic units of gene essentiality. Furthermore, we present several examples illustrating how the combination of essential and non-essential domains can lead to genes with divergent essentiality. In summary, we have described the first systematic analysis of gene essentiality at the level of domains. CONTACT huilu.bioinfo@gmail.com or Long.Lu@cchmc.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
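A toy sketch can make the expectation-maximization idea concrete. It uses a deliberately simplified model that is an assumption and not necessarily the EDP model of the paper. Each occurrence of domain d in a gene is independently "essential" with probability θ_d. A gene is essential if at least one of its domain occurrences is essential. Label noise is ignored, and the function name, gene data and domain names are hypothetical.

```python
from typing import Dict, List, Tuple

# A gene is (its domain identifiers, observed essentiality label).
Gene = Tuple[List[str], bool]


def em_domain_essentiality(
    genes: List[Gene], n_iter: int = 200, init: float = 0.5
) -> Dict[str, float]:
    """Estimate theta[d], the probability that an occurrence of domain d is essential.

    Simplified toy model (an assumption, not the published EDP model):
    domain occurrences are independent, and a gene is essential if and only if
    at least one of its domain occurrences is essential. Label noise is ignored.
    """
    domains = {d for doms, _ in genes for d in doms}
    theta = {d: init for d in domains}
    for _ in range(n_iter):
        posterior_sum = {d: 0.0 for d in domains}
        occurrences = {d: 0 for d in domains}
        for doms, essential in genes:
            # P(gene essential) = 1 - product over its domains of (1 - theta_d).
            p_none = 1.0
            for d in doms:
                p_none *= 1.0 - theta[d]
            p_gene = 1.0 - p_none
            for d in doms:
                occurrences[d] += 1
                # E-step: a non-essential gene implies no essential occurrence
                # (posterior 0); for an essential gene the posterior is
                # theta_d / P(gene essential), clamped against rounding error.
                if essential and p_gene > 0.0:
                    posterior_sum[d] += min(1.0, theta[d] / p_gene)
        # M-step: new theta_d is the mean posterior over all occurrences of d.
        theta = {d: posterior_sum[d] / occurrences[d] for d in domains}
    return theta


# Hypothetical toy data: domain A occurs only in essential genes;
# B and C each occur once in a non-essential gene and once alongside A.
genes: List[Gene] = [
    (["A"], True),
    (["B"], False),
    (["A", "B"], True),
    (["C"], False),
    (["A", "C"], True),
]
theta = em_domain_essentiality(genes)
print({d: round(t, 3) for d, t in sorted(theta.items())})
```

On this toy input, the estimate for A approaches 1 while B and C shrink toward 0. Once A explains every essential gene it occurs in, the shared genes give little evidence for B or C, and their non-essential single-domain genes pull them down.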
Affiliation(s)
- Yao Lu
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, People's Republic of China
- Yulan Lu
- State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Science, Fudan University, Shanghai 200433, People's Republic of China
- Jingyuan Deng
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Hai Peng
- Institute for Systems Biology, Jianghan University, Wuhan, Hubei, People's Republic of China
- Hui Lu
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, People's Republic of China, Department of Bioengineering (MC 063), University of Illinois at Chicago, Chicago, IL 60607-7052, USA and Collaborative Innovation Center for Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Long Jason Lu
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA, Institute for Systems Biology, Jianghan University, Wuhan, Hubei, People's Republic of China
40
Chiang Z, Vastermark A, Punta M, Coggill PC, Mistry J, Finn RD, Saier MH. The complexity, challenges and benefits of comparing two transporter classification systems in TCDB and Pfam. Brief Bioinform 2015; 16:865-72. [PMID: 25614388 PMCID: PMC4570203 DOI: 10.1093/bib/bbu053] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/17/2014] [Indexed: 01/04/2023]
Abstract
Transport systems comprise roughly 10% of all proteins in a cell and play critical roles in many processes. Improving and expanding their classification is an important goal that can affect studies ranging from comparative genomics to searches for potential drug targets. It is not surprising that different classification systems for transport proteins have arisen, whether within a specialized database focused on this functional class of proteins or as part of a broader classification system for all proteins. Two such databases are the Transporter Classification Database (TCDB) and the Protein family (Pfam) database. As part of a long-term endeavor to improve consistency between the two classification systems, we have compared transporter annotations in the two databases to understand the rationale for differences and to improve both systems. Differences sometimes reflect the fact that one database has a particular transporter family while the other does not. Differing family definitions and hierarchical organizations were reconciled, resulting in the recognition that 69 Pfam ‘Domains of Unknown Function’ are transport protein families, which are to be renamed using TCDB annotations. Of over 400 potential new Pfam families identified from TCDB, 10% have already been added to Pfam, and TCDB has created 60 new entries based on Pfam data. This work reveals, for the first time, the benefits of comprehensive database comparisons and explains the differences between Pfam and TCDB.
41
Martín-Galiano AJ, Yuste J, Cercenado MI, de la Campa AG. Inspecting the potential physiological and biomedical value of 44 conserved uncharacterised proteins of Streptococcus pneumoniae. BMC Genomics 2014; 15:652. [PMID: 25096389 PMCID: PMC4143570 DOI: 10.1186/1471-2164-15-652] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/23/2013] [Accepted: 07/21/2014] [Indexed: 12/15/2022]
Abstract
BACKGROUND The major Gram-positive coccoid pathogens cause similar invasive diseases and show high rates of antimicrobial resistance. Uncharacterised proteins shared by these organisms may be involved in virulence or be targets for antimicrobial therapy. RESULTS Forty-four uncharacterised proteins from Streptococcus pneumoniae with homologues in Enterococcus faecalis and/or Staphylococcus aureus were selected for analysis. These proteins differed in sequence conservation and number of interacting partners. Twenty-eight of these proteins were monodomain proteins and 16 were modular, involving domain combinations and, in many cases, predicted unstructured regions. The genes coding for four of these 44 proteins were essential. Genomic and structural studies showed that one of the four essential genes codes for a promising antibacterial target. Gene removal had the strongest impact for monodomain proteins showing high sequence conservation and/or interactions with many other proteins. Eleven of 40 knockouts (one for each gene) showed growth delay, and 10 knockouts presented a chaining phenotype. Five of these chaining mutants lacked putative DNA-binding proteins, suggesting that this phenotype results from a loss of overall transcription regulation. Five knockouts showed defective autolysis in response to penicillin and vancomycin, and attenuated virulence in an animal model of sepsis. CONCLUSIONS Uncharacterised proteins make up a reservoir of polypeptides of varying physiological importance and biomedical potential. A promising antibacterial target was identified. Five of the 44 examined proteins seemed to be virulence factors.
Affiliation(s)
- Antonio J Martín-Galiano
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- José Yuste
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- María I Cercenado
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Adela G de la Campa
- Centro Nacional de Microbiología and CIBERES (CIBER de Enfermedades Respiratorias), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain
- Presidencia, Consejo Superior de Investigaciones Científicas, Madrid, Spain
42
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones D, Kim PM, Kriwacki R, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright P, Babu MM. Classification of intrinsically disordered regions and proteins. Chem Rev 2014; 114:6589-631. [PMID: 24773235 PMCID: PMC4095912 DOI: 10.1021/cr400525m] [Citation(s) in RCA: 1508] [Impact Index Per Article: 137.1] [Received: 09/23/2013] [Indexed: 12/11/2022]
Affiliation(s)
- Robin van der Lee
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands
- Marija Buljan
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Benjamin Lang
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Robert J. Weatheritt
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Gary W. Daughdrill
- Department of Cell Biology, Microbiology, and Molecular Biology, University of South Florida, 3720 Spectrum Boulevard, Suite 321, Tampa, Florida 33612, United States
- A. Keith Dunker
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Monika Fuxreiter
- MTA-DE Momentum Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4032 Debrecen, Nagyerdei krt 98, Hungary
- Julian Gough
- Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, United Kingdom
- Joerg Gsponer
- Department of Biochemistry and Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Philip M. Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Richard W. Kriwacki
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Christopher J. Oldfield
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Rohit V. Pappu
- Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Peter Tompa
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, United States
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Peter E. Wright
- Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
43
K MJ, Laxmi A. DUF581 is plant specific FCS-like zinc finger involved in protein-protein interaction. PLoS One 2014; 9:e99074. [PMID: 24901469 PMCID: PMC4047054 DOI: 10.1371/journal.pone.0099074] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 12/19/2013] [Accepted: 05/11/2014] [Indexed: 11/18/2022]
Abstract
Zinc fingers are a ubiquitous class of protein domain with considerable variation in structure and function. Zf-FCS is a highly diverged group of C2-C2 zinc fingers that is present in animals, prokaryotes and viruses, but not in plants. In this study, we found that a plant-specific domain of unknown function, DUF581, is a zf-FCS type zinc finger. Based on HMM-HMM comparison and signature motif similarity, we named this domain the FCS-Like Zinc finger (FLZ) domain. A genome-wide survey showed that FLZ domain-containing genes are bryophytic in origin and that this gene family is expanded in spermatophytes. Expression analysis of selected FLZ gene family members of A. thaliana revealed an overlapping expression pattern, suggesting possible functional redundancy. Unlike the zf-FCS domain, the FLZ domain was found to be highly conserved in sequence and structure. Using a combination of bioinformatic and protein-protein interaction tools, we found that the FLZ domain is involved in protein-protein interaction.
Affiliation(s)
- Ashverya Laxmi
- National Institute of Plant Genome Research, New Delhi, India
44
Grzymski JJ, Marsh AG. Protein languages differ depending on microorganism lifestyle. PLoS One 2014; 9:e96910. [PMID: 24828817 PMCID: PMC4020791 DOI: 10.1371/journal.pone.0096910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/10/2013] [Accepted: 04/14/2014] [Indexed: 11/19/2022]
Abstract
Few quantitative measures of genome architecture or organization exist to support assumptions of differences between microorganisms that are broadly defined as being free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes that are independent of size and gene content. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, utilizing linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living versus 421 pathogen genomes across broad ranges of genome size, G+C content and phylogenetic ancestry. This imprint of repetitive word usage indicates that free-living microorganisms have a bias toward repetitive sequence usage compared to pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function.
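A small sketch can illustrate the "peptide word" idea. Each overlapping k-residue substring of a proteome is treated as a word, and the score measures how often words repeat. The word length k = 3, the repetition score (1 minus distinct words over total words) and the function names are illustrative assumptions. The abstract does not specify the authors' exact metric.

```python
from collections import Counter
from typing import Iterable, Iterator


def peptide_words(seq: str, k: int = 3) -> Iterator[str]:
    """Yield overlapping k-residue 'words' from a protein sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


def word_repetition(proteins: Iterable[str], k: int = 3) -> float:
    """Hypothetical repetition score: 1 - (distinct words / total words).

    0 means every word occurs once; values near 1 mean heavy reuse of the
    same words across the proteome.
    """
    counts: Counter = Counter()
    for seq in proteins:
        counts.update(peptide_words(seq, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


print(word_repetition(["ACDACD"]))  # 4 words, 3 distinct -> 0.25
```

A raw score like this tends to grow with proteome size, so any comparison between free-living and pathogen genomes would need some normalization for size.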
Affiliation(s)
- Joseph J. Grzymski
- Division of Earth and Ecosystem Sciences, Desert Research Institute, Reno, Nevada, United States of America
- Adam G. Marsh
- Center for Bioinformatics and Computational Biology, Marine Biological Sciences, University of Delaware, Lewes, Delaware, United States of America
45
Sheydina A, Eberhardt RY, Rigden DJ, Chang Y, Li Z, Zmasek CC, Axelrod HL, Godzik A. Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase. BMC Bioinformatics 2014; 15:112. [PMID: 24742328 PMCID: PMC4032388 DOI: 10.1186/1471-2105-15-112] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/09/2013] [Accepted: 03/31/2014] [Indexed: 12/03/2022]
Abstract
Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one N-terminal and one C-terminal. A PSI-BLAST search found over 150 full-length and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases; the C-terminal domain has a beta-sandwich fold typically found in the C-terminal domains of other glycosyl hydrolases, where such domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.
Affiliation(s)
- Adam Godzik
- Joint Center for Structural Genomics, 10550 North Torrey Pines Road, BCC-206, La Jolla, California 92037, USA.
46
Molecular architecture and the structural basis for anion interaction in prestin and SLC26 transporters. Nat Commun 2014; 5:3622. [PMID: 24710176 PMCID: PMC3988826 DOI: 10.1038/ncomms4622] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 10/23/2013] [Accepted: 03/11/2014] [Indexed: 12/15/2022]
Abstract
Prestin (SLC26A5) is a member of the SLC26/SulP anion transporter family. Its unique quasi-piezoelectric mechanical activity generates fast cellular motility of cochlear outer hair cells, a key process underlying active amplification in the mammalian ear. Despite its established physiological role, how prestin generates mechanical force is essentially unknown, since structural information on SLC26/SulP proteins is lacking. Here we derive a structural model of prestin and related transporters by combining homology modelling, MD simulations and cysteine accessibility scanning. Prestin’s transmembrane core region is organized in a 7+7 inverted repeat architecture. The model suggests a central cavity, located midway along the anion permeation pathway, as the substrate-binding site; this is supported by experimental solute accessibility and mutational analysis. Anion binding to this site also controls the electromotile activity of prestin. The combined structural and functional data provide a framework for understanding electromotility and anion transport by SLC26 transporters. Prestin is an anion transporter-like protein in the mammalian inner ear that amplifies sound-induced vibration by voltage-driven structural rearrangements. Here, Gorbunov et al. show that this electromechanical activity is controlled by the binding of anions to a central cavity within the protein core.
47
Mining the bacterial unknown proteome: identification and characterization of a novel family of highly conserved protective antigens in Staphylococcus aureus. Biochem J 2014; 455:273-84. [PMID: 23895222 DOI: 10.1042/bj20130540] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022]
Abstract
In the human pathogen Staphylococcus aureus, there is an enormous diversity of proteins containing DUFs (domains of unknown function). In the present study, we characterized the family of conserved staphylococcal antigens (Csa), classified as DUF576 and taxonomically restricted to Staphylococci. The 18 Csa paralogues in S. aureus Newman are highly similar at the sequence level, yet were found to be expressed in multiple cellular locations. Extracellular Csa1A was shown to be post-translationally processed and released. Molecular interaction studies revealed that Csa1A interacts with other Csa paralogues, suggesting that these proteins are involved in the same cellular process. The structures of Csa1A and Csa1B were determined by X-ray crystallography, unveiling a peculiar structure with limited structural similarity to other known proteins. Our results provide the first detailed biological characterization of this family and confirm its uniqueness at the structural level as well. We also provide evidence that Csa family members elicit protective immunity in in vivo animal models of staphylococcal infection, indicating a possibly important role for these proteins in S. aureus biology and pathogenesis. These findings identify the Csa family as new potential vaccine candidates and underline the importance of mining the bacterial unknown proteome to identify new targets for preventive vaccines.
48
In silico prediction of structure and functions for some proteins of male-specific region of the human Y chromosome. Interdiscip Sci 2014; 5:258-69. [PMID: 24402818 DOI: 10.1007/s12539-013-0178-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2012] [Revised: 09/03/2012] [Accepted: 11/08/2012] [Indexed: 10/25/2022]
Abstract
The male-specific region of the human Y chromosome (MSY) comprises 95% of the chromosome's length and is functionally active. This portion is inherited as a block from father to male offspring. Most of the genes in the MSY are involved in male-specific functions, such as sex determination and spermatogenesis; the region also contains genes probably involved in other cellular functions. However, many MSY-encoded proteins still await detailed characterization. In this study, 12 uncharacterized proteins of the MSY were analyzed with bioinformatics tools for structural and functional characterization. Within these 12 proteins, a total of 55 domains were found, with the DnaJ domain signature being the most frequent (11%), followed by both the FAD-dependent pyridine nucleotide reductase signature and the fumarate lyase superfamily signature (9%). The 3D structures of the selected proteins were built using homology modeling and protein threading approaches. Detailed stereochemical checks of these predicted structures indicated reasonably good model quality. Furthermore, the predicted functions and interacting partners of these proteins established their biological roles and mechanisms of action at the molecular level. The results of these structure-function annotations provide a comprehensive view of the proteins encoded by the MSY, shedding light on their biological functions and molecular mechanisms. The data presented in this study may assist in the future prognosis of several human diseases, such as Turner syndrome, gonadal sex reversal, spermatogenic failure, and gonadoblastoma.
49
Konc J, Janežič D. Binding site comparison for function prediction and pharmaceutical discovery. Curr Opin Struct Biol 2013; 25:34-9. [PMID: 24878342 DOI: 10.1016/j.sbi.2013.11.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 10/22/2013] [Revised: 11/26/2013] [Accepted: 11/27/2013] [Indexed: 11/30/2022]
Abstract
While structural genomics has produced thousands of new protein crystal structures, we still do not know the functions of most of these proteins. One reason for this shortcoming is their unique sequences or folds, which leave them assigned as proteins of 'unknown function'. Recent advances in, and applications of, cutting-edge binding site comparison algorithms for binding site detection and function prediction have begun to shed light on this problem. Here, we review these algorithms and their use in function prediction and pharmaceutical discovery. Finding common binding sites in weakly related proteins may lead to the discovery of new protein functions and to novel approaches to drug discovery.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Ljubljana, Slovenia
- Dušanka Janežič
- University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia.
50
Hwang WC, Bakolitsa C, Punta M, Coggill PC, Bateman A, Axelrod HL, Rawlings ND, Sedova M, Peterson SN, Eberhardt RY, Aravind L, Pascual J, Godzik A. LUD, a new protein domain associated with lactate utilization. BMC Bioinformatics 2013; 14:341. [PMID: 24274019 PMCID: PMC3924224 DOI: 10.1186/1471-2105-14-341] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/03/2013] [Accepted: 11/19/2013] [Indexed: 11/24/2022]
Abstract
Background A novel, highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by the highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence and structure, and on recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family. Results The Joint Center for Structural Genomics (JCSG) solved the first crystal structure [PDB:2G40] from the LUD domain family: the LutC protein, encoded by ORF DR_1909 of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has increased abundance in the human gut microbiome. Conclusions We propose a model for substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation resistance of Deinococcus radiodurans, are discussed.
Affiliation(s)
- William C Hwang
- Joint Center for Structural Genomics, La Jolla, CA 92037, USA.