1
Kumar K, Pazare M, Ratnaparkhi GS, Kamat SS. CG17192 is a Phospholipase That Regulates Signaling Lipids in the Drosophila Gut upon Infection. Biochemistry 2024. [PMID: 39442931 DOI: 10.1021/acs.biochem.4c00579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2024]
Abstract
The chemoproteomics technique, activity-based protein profiling (ABPP), has proven to be an invaluable tool in assigning functions to enzymes. The serine hydrolase (SH) enzyme superfamily, in particular, has served as an excellent example in displaying the versatility of various ABPP platforms and has resulted in a comprehensive cataloging of the biochemical activities associated within this superfamily. Besides SHs, in mammals, several other enzyme classes have been thoroughly investigated using ABPP platforms. However, the utility of ABPP platforms in fly models remains underexplored. Realizing this knowledge gap, leveraging complementary ABPP platforms, we reported the full array of SH activities during various developmental stages and adult tissues in the fruit fly (Drosophila melanogaster). Following up on this study, using ABPP, we mapped SH activities in adult fruit flies in an infection model and found that a gut-resident lipase CG17192 showed increased activity during infection. To assign a biological function to this uncharacterized lipase, we performed an untargeted lipidomics analysis and found that phosphatidylinositols were significantly elevated when CG17192 was depleted in the adult fruit fly gut. Next, we overexpressed this lipase in insect cells, and using biochemical assays, we show that CG17192 is a secreted enzyme that has phospholipase C (PLC) type activity, with phosphatidylinositol being a preferred substrate. Finally, we show during infection that heightened CG17192 regulates phosphatidylinositol levels and, by doing so, likely modulates signaling pathways in the adult fruit fly gut that might be involved in the resolution of this pathophysiological condition.
Affiliation(s)
- Kundan Kumar
- Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
- Mrunal Pazare
- Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
- Girish S Ratnaparkhi
- Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India
- Siddhesh S Kamat
- Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pashan, Pune 411008, Maharashtra, India

2
Ahmed MH, Samia NSN, Singh G, Gupta V, Mishal MFM, Hossain A, Suman KH, Raza A, Dutta AK, Labony MA, Sultana J, Faysal EH, Alnasser SM, Alam P, Azam F. An immuno-informatics approach for annotation of hypothetical proteins and multi-epitope vaccine designed against the Mpox virus. J Biomol Struct Dyn 2024; 42:5288-5307. [PMID: 37519185 DOI: 10.1080/07391102.2023.2239921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 06/09/2023] [Indexed: 08/01/2023]
Abstract
A worrying new outbreak of Monkeypox (Mpox) in humans is caused by the Mpox virus (MpoxV). The pathogen has roughly 28 hypothetical proteins of unknown structure, function, and pathogenicity. Using reliable bioinformatics tools, we attempted to analyze the MpoxV genome, identify the role of hypothetical proteins (HPs), and design a potential candidate vaccine. Out of 28, we identified seven hypothetical proteins using multi-server validation with high confidence for the occurrence of conserved domains. Their physical, chemical, and functional characterizations, including molecular weight, theoretical isoelectric point, 3D structures, GRAVY value, subcellular localization, functional motifs, antigenicity, and virulence factors, were performed. We predicted possible cytotoxic T cell (CTL), helper T cell (HTL) and linear and conformational B cell epitopes, which were combined in a 219 amino acid multiepitope vaccine with human β defensin as a linker. This multi-epitopic vaccine was structurally modelled and docked with toll-like receptor-3 (TLR-3). The dynamical stability of the vaccine-TLR-3 docked complexes exhibited stable interactions based on RMSD and RMSF tests. Additionally, the modelled vaccine was cloned in-silico in an E. coli host to check the appropriate expression of the final vaccine built. Our results might conform to an immunogenic and safe vaccine, which would require further experimental validation. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Md Hridoy Ahmed
- Department of Genetic Engineering and Biotechnology, University of Chittagong, Chittagong, Bangladesh
- Nure Sharaf Nower Samia
- Department of Life Sciences (DLS), School of Environment and Life Sciences (SELS), Independent University, Dhaka, Bangladesh
- Gagandeep Singh
- Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi, India
- Section of Microbiology, Central Ayurveda Research Institute, Jhansi CCRAS, Ministry of Ayush, India
- Vandana Gupta
- Department of Microbiology, Ram Lal Anand College, University of Delhi, New Delhi, India
- Alomgir Hossain
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi, Bangladesh
- Adnan Raza
- Bioscience department, COMSATS University of Islamabad, Islamabad, Pakistan
- Amit Kumar Dutta
- Department of Microbiology, University of Rajshahi, Rajshahi, Bangladesh
- Moriom Akhter Labony
- Department of Genetic Engineering and Biotechnology, University of Chittagong, Chittagong, Bangladesh
- Jakia Sultana
- Department of Botany, University of Rajshahi, Rajshahi, Bangladesh
- Sulaiman Mohammed Alnasser
- Department of Pharmacology and Toxicology, Unaizah College of Pharmacy, Qassim University, Buraydah, Saudi Arabia
- Prawez Alam
- Department of Pharmacognosy, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al Kharj, Saudi Arabia
- Faizul Azam
- Department of Pharmaceutical Chemistry and Pharmacognosy, Unaizah College of Pharmacy, Qassim University, Buraydah, Saudi Arabia

3
Vaidya K, Rodrigues G, Gupta S, Devarajan A, Yeolekar M, Madhusudhan MS, Kamat SS. Identification of sequence determinants for the ABHD14 enzymes. Proteins 2023. [PMID: 37974539 DOI: 10.1002/prot.26632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/14/2023] [Accepted: 10/24/2023] [Indexed: 11/19/2023]
Abstract
Over the course of evolution, enzymes have developed remarkable functional diversity in catalyzing important chemical reactions across various organisms, and understanding how new enzyme functions might have evolved remains an important question in modern enzymology. To systematically annotate functions, based on their protein sequences and available biochemical studies, enzymes with similar catalytic mechanisms have been clustered together into an enzyme superfamily. Typically, enzymes within a superfamily have similar overall three-dimensional structures, conserved catalytic residues, but large variations in substrate recognition sites and residues to accommodate the diverse biochemical reactions that are catalyzed within the superfamily. The serine hydrolases are an excellent example of such an enzyme superfamily. Based on known enzymatic activities and protein sequences, they are split almost equally into the serine proteases and metabolic serine hydrolases. Within the metabolic serine hydrolases, there are two outlying members, ABHD14A and ABHD14B, that have high sequence similarity, but their biological functions remained cryptic till recently. While ABHD14A still lacks any functional annotation to date, we recently showed that ABHD14B functions as a lysine deacetylase in mammals. Given their high sequence similarity, automated databases often wrongly assign ABHD14A and ABHD14B as the same enzyme, and therefore, annotating functions to them in various organisms has been problematic. In this article, we present a bioinformatics study coupled with biochemical experiments, which identifies key sequence determinants for both ABHD14A and ABHD14B, and enables better classification for them. In addition, we map these enzymes on an evolutionary timescale and provide a much-wanted resource for studying these interesting enzymes in different organisms.
Affiliation(s)
- Kaveri Vaidya
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Golding Rodrigues
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Sonali Gupta
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Archit Devarajan
- Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, Madhya Pradesh, India
- Mihika Yeolekar
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- M S Madhusudhan
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India
- Siddhesh S Kamat
- Department of Biology, Indian Institute of Science Education and Research Pune, Pune, Maharashtra, India

4
Spiers AJ, Dorfmueller HC, Jerdan R, McGregor J, Nicoll A, Steel K, Cameron S. Bioinformatics characterization of BcsA-like orphan proteins suggest they form a novel family of pseudomonad cyclic-β-glucan synthases. PLoS One 2023; 18:e0286540. [PMID: 37267309 PMCID: PMC10237404 DOI: 10.1371/journal.pone.0286540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/17/2022] [Accepted: 05/18/2023] [Indexed: 06/04/2023] Open
Abstract
Bacteria produce a variety of polysaccharides with functional roles in cell surface coating, surface and host interactions, and biofilms. We have identified an 'Orphan' bacterial cellulose synthase catalytic subunit (BcsA)-like protein found in four model pseudomonads, P. aeruginosa PA01, P. fluorescens SBW25, P. putida KT2440 and P. syringae pv. tomato DC3000. Pairwise alignments indicated that the Orphan and BcsA proteins shared less than 41% sequence identity, suggesting they may not have the same structural folds or function. We identified 112 Orphans among soil and plant-associated pseudomonads as well as in phytopathogenic and human opportunistic pathogenic strains. The wide distribution of these highly conserved proteins suggests they form a novel family of synthases producing a different polysaccharide. In silico analysis, including sequence comparisons, secondary structure and topology predictions, and protein structural modelling, revealed a two-domain transmembrane ovoid-like structure for the Orphan protein with a periplasmic glycosyl hydrolase family GH17 domain linked via a transmembrane region to a cytoplasmic glycosyltransferase family GT2 domain. We suggest the GT2 domain synthesises β-(1,3)-glucan that is transferred to the GH17 domain where it is cleaved and cyclised to produce cyclic-β-(1,3)-glucan (CβG). Our structural models are consistent with enzymatic characterisation and recent molecular simulations of the PaPA01 and PpKT2440 GH17 domains. It also provides a functional explanation linking PaPAK and PaPA14 Orphan (also known as NdvB) transposon mutants with CβG production and biofilm-associated antibiotic resistance. Importantly, cyclic glucans are also involved in osmoregulation, plant infection and induced systemic suppression, and our findings suggest this novel family of CβG synthases may provide a similar range of adaptive responses for pseudomonads.
Affiliation(s)
- Andrew J. Spiers
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Helge C. Dorfmueller
- Division of Molecular Microbiology, School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Robyn Jerdan
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Jessica McGregor
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Abbie Nicoll
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Kenzie Steel
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Scott Cameron
- School of Applied Sciences, Abertay University, Dundee, United Kingdom

5
Bacteria.guru: Comparative Transcriptomics and Co-Expression Database for Bacterial Pathogens. J Mol Biol 2021; 434:167380. [PMID: 34838806 DOI: 10.1016/j.jmb.2021.167380] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/21/2021] [Revised: 11/11/2021] [Accepted: 11/21/2021] [Indexed: 12/12/2022]
Abstract
While bacteria can be beneficial to our health, their deadly pathogenic potential has been an ever-present concern exacerbated by the emergence of drug-resistant strains. As such, there is a pressing urgency for an enhanced understanding of their gene function and regulation, which could mediate the development of novel antimicrobials. Transcriptomic analyses have been established as insightful and indispensable to the functional characterization of genes and identification of new biological pathways, but in the context of bacterial studies, they remain limited to species-specific datasets. To address this, we integrated the genomic and transcriptomic data of the 17 most notorious and researched bacterial pathogens, creating bacteria.guru, an interactive database that can identify, visualize, and compare gene expression profiles, coexpression networks, functionally enriched clusters, and gene families across species. Through illustrating antibiotic resistance mechanisms in P. aeruginosa, we demonstrate that bacteria.guru could potentially aid in discovering multi-faceted antibiotic targets and, overall, facilitate future bacterial research. AVAILABILITY: The database and coexpression networks are freely available from https://bacteria.guru/. Sample annotations can be found in the supplemental data.

6
Ezaj MMA, Haque MS, Syed SB, Khan MSA, Ahmed KR, Khatun MT, Nayeem SMA, Rizvi GR, Al-Forkan M, Khaleda L. Comparative proteomic analysis to annotate the structural and functional association of the hypothetical proteins of S. maltophilia k279a and predict potential T and B cell targets for vaccination. PLoS One 2021; 16:e0252295. [PMID: 34043709 PMCID: PMC8159010 DOI: 10.1371/journal.pone.0252295] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2020] [Accepted: 05/07/2021] [Indexed: 11/18/2022] Open
Abstract
Stenotrophomonas maltophilia is a multidrug-resistant bacterium with no precise clinical treatment. This bacterium can be a vital cause for death and different organ failures in immune-compromised, immune-competent, and long-time hospitalized patients. Extensive quorum sensing capability has become a challenge to develop new drugs against this pathogen. Moreover, the organism possesses about 789 proteins whose function, structure, and pathogenesis remain obscured. In this piece of work, we tried to enlighten the aforementioned sectors using highly reliable bioinformatics tools validated by the scientific community. At first, the whole proteome sequence of the organism was retrieved and stored. Then we separated the hypothetical proteins and searched for the conserved domain with a high confidence level and multi-server validation, which resulted in 24 such proteins. Furthermore, all of their physical and chemical characterizations were performed, such as theoretical isoelectric point, molecular weight, GRAVY value, and many more. Besides, the subcellular localization, protein-protein interactions, functional motifs, 3D structures, antigenicity, and virulence factors were also evaluated. As an extension of this work, 'RTFAMSSER' and 'PAAPQPSAS' were predicted as potential T and B cell epitopes, respectively. We hope our findings will help in better understanding the pathogenesis and smoothen the way to the cure.
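For readers unfamiliar with the GRAVY value listed among the physicochemical characterizations, it is conventionally the mean Kyte-Doolittle hydropathy over all residues of a sequence. The following is a minimal sketch under that assumption; the function name `gravy` is hypothetical, and this is not the authors' code or the tool they used.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathy (GRAVY) of a protein sequence.

    Hypothetical helper for illustration: sums the per-residue
    Kyte-Doolittle values and divides by the sequence length.
    Raises KeyError for non-standard residue letters.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Example call on one of the epitope strings quoted in the abstract.
score = gravy("RTFAMSSER")
```

A negative score indicates a predominantly hydrophilic peptide and a positive score a predominantly hydrophobic one.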
Affiliation(s)
- Md. Muzahid Ahmed Ezaj
- Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Reverse Vaccinology Research Division, Advanced Bioinformatics, Computational Biology and Data Science Laboratory, Chittagong, Bangladesh
- Md. Sajedul Haque
- Department of Chemistry, Faculty of Science, University of Chittagong, Chattogram, Bangladesh
- Shifath Bin Syed
- Department of Biotechnology and Genetic Engineering, Faculty of Biological Sciences, Islamic University, Kushtia, Bangladesh
- Md. Shakil Ahmed Khan
- Department of Biotechnology and Genetic Engineering, Faculty of Biological Sciences, Islamic University, Kushtia, Bangladesh
- Kazi Rejvee Ahmed
- Department of Biotechnology and Genetic Engineering, Faculty of Biological Sciences, Islamic University, Kushtia, Bangladesh
- Mst. Tania Khatun
- Department of Biotechnology and Genetic Engineering, Faculty of Biological Sciences, Islamic University, Kushtia, Bangladesh
- S. M. Abdul Nayeem
- Reverse Vaccinology Research Division, Advanced Bioinformatics, Computational Biology and Data Science Laboratory, Chittagong, Bangladesh
- Department of Chemistry, Faculty of Science, University of Chittagong, Chattogram, Bangladesh
- Golam Rosul Rizvi
- Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Mohammad Al-Forkan
- Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Laila Khaleda
- Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh

7
de Oliveira Almeida R, Valente GT. Predicting metabolic pathways of plant enzymes without using sequence similarity: Models from machine learning. Plant Genome 2020; 13:e20043. [PMID: 33217216 DOI: 10.1002/tpg2.20043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/08/2019] [Revised: 06/03/2020] [Accepted: 06/10/2020] [Indexed: 06/11/2023]
Abstract
Most of the bioinformatics tools for enzyme annotation focus on enzymatic function assignments. Sequence similarity to well-characterized enzymes is often used for functional annotation and to assign metabolic pathways. However, these approaches are not feasible for all sequences, leading to inaccurate annotations or lack of metabolic pathway information. Here we present the mApLe (metabolic pathway predictor of plant enzymes), a high-performance machine learning-based tool with models to label the metabolic pathway of enzymes rather than specifying enzymes' reactions. The mApLe uses molecular descriptors of the enzyme sequences to perform predictions without considering sequence similarities with reference sequences. Hence, mApLe can classify a diversity of enzymes, even the ones without any homolog or with incomplete EC numbers. This tool can be used to improve the quality of genomic annotation of plants or to narrow down the number of candidate genes for metabolic engineering research. The mApLe tool is available online, and the GUI can be locally installed.
Affiliation(s)
- Rodrigo de Oliveira Almeida
- Instituto Federal de Educação, Ciência e Tecnologia do Sudeste de Minas Gerais, Muriaé, Brazil
- Department of Bioprocess and Biotechnology, School of Agriculture, São Paulo State University (Unesp), Botucatu, Brazil
- Guilherme Targino Valente
- Department of Bioprocess and Biotechnology, School of Agriculture, São Paulo State University (Unesp), Botucatu, Brazil
- Department of Developmental Genetics, Max Planck Institut für Herz- und Lungenforschung, Bad Nauheim, Germany

8
Gysi DM, Nowick K. Construction, comparison and evolution of networks in life sciences and other disciplines. J R Soc Interface 2020; 17:20190610. [PMID: 32370689 PMCID: PMC7276545 DOI: 10.1098/rsif.2019.0610] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/03/2019] [Accepted: 04/09/2020] [Indexed: 12/12/2022] Open
Abstract
Network approaches have become pervasive in many research fields. They allow for a more comprehensive understanding of complex relationships between entities as well as their group-level properties and dynamics. Many networks change over time, be it within seconds or millions of years, depending on the nature of the network. Our focus will be on comparative network analyses in life sciences, where deciphering temporal network changes is a core interest of molecular, ecological, neuropsychological and evolutionary biologists. Further, we will take a journey through different disciplines, such as social sciences, finance and computational gastronomy, to present commonalities and differences in how networks change and can be analysed. Finally, we envision how borrowing ideas from these disciplines could enrich the future of life science research.
Affiliation(s)
- Deisy Morselli Gysi
- Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, 04109 Leipzig, Germany
- Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, University of Leipzig, 04109 Leipzig, Germany
- Center for Complex Networks Research, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA
- Katja Nowick
- Human Biology Group, Institute for Biology, Faculty of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Königin-Luise-Straße 1-3, 14195 Berlin, Germany

9
Koutsandreas T, Ladoukakis E, Pilalis E, Zarafeta D, Kolisis FN, Skretas G, Chatziioannou AA. ANASTASIA: An Automated Metagenomic Analysis Pipeline for Novel Enzyme Discovery Exploiting Next Generation Sequencing Data. Front Genet 2019; 10:469. [PMID: 31178894 PMCID: PMC6543708 DOI: 10.3389/fgene.2019.00469] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/01/2018] [Accepted: 05/01/2019] [Indexed: 01/27/2023] Open
Abstract
Metagenomic analysis of environmental samples provides deep insight into the enzymatic mixture of the corresponding niches, capable of revealing peptide sequences with novel functional properties exploiting the high performance of next-generation sequencing (NGS) technologies. At the same time, due to their ever-increasing complexity, there is a compelling need for ever larger computational configurations to ensure proper bioinformatic analysis and fine annotation. Aiming to address the challenges of such an endeavor, we have developed a novel web-based application named ANASTASIA (automated nucleotide aminoacid sequences translational plAtform for systemic interpretation and analysis). ANASTASIA provides a rich environment of bioinformatic tools, either publicly available or novel, proprietary algorithms, integrated within numerous automated algorithmic workflows, and which enables versatile data processing tasks for (meta)genomic sequence datasets. ANASTASIA was initially developed in the framework of the European FP7 project HotZyme, whose aim was to perform exhaustive analysis of metagenomes derived from thermal springs around the globe and to discover new enzymes of industrial interest. ANASTASIA has evolved to become a stable and extensible environment for diversified, metagenomic, functional analyses for a range of applications overarching industrial biotechnology to biomedicine, within the frames of the ELIXIR-GR project. As a showcase, we report the successful in silico mining of a novel thermostable esterase termed “EstDZ4” from a metagenomic sample collected from a hot spring located in Krisuvik, Iceland.
Affiliation(s)
- Theodoros Koutsandreas
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- e-NIOS Applications PC, Athens, Greece
- Efthymios Ladoukakis
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- Laboratory of Biotechnology, School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- Eleftherios Pilalis
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- e-NIOS Applications PC, Athens, Greece
- Dimitra Zarafeta
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- Fragiskos N Kolisis
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- Laboratory of Biotechnology, School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- Georgios Skretas
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- Aristotelis A Chatziioannou
- Institute of Chemical Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
- e-NIOS Applications PC, Athens, Greece

10
Gupta C, Pereira A. Recent advances in gene function prediction using context-specific coexpression networks in plants. F1000Res 2019; 8:F1000 Faculty Rev-153. [PMID: 30800290 PMCID: PMC6364378 DOI: 10.12688/f1000research.17207.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 01/30/2019] [Indexed: 12/11/2022] Open
Abstract
Predicting gene functions from genome sequence alone has been difficult, and the functions of a large fraction of plant genes remain unknown. However, leveraging the vast amount of currently available gene expression data has the potential to facilitate our understanding of plant gene functions, especially in determining complex traits. Gene coexpression networks, created by integrating multiple expression datasets, connect genes with similar patterns of expression across multiple conditions. Dense gene communities in such networks, commonly referred to as modules, often indicate that the member genes are functionally related. As such, these modules serve as tools for generating new testable hypotheses, including the prediction of gene function and importance. Recently, we have seen a paradigm shift from the traditional "global" to more defined, context-specific coexpression networks. Such coexpression networks imply genetic correlations in specific biological contexts such as during development or in response to a stress. In this short review, we highlight a few recent studies that attempt to fill the large gaps in our knowledge about cellular functions of plant genes using context-specific coexpression networks.
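The idea of connecting genes with similar expression patterns across conditions can be sketched as follows. This is a minimal illustration that assumes Pearson correlation as the similarity measure and a fixed cutoff of 0.9, neither of which is taken from the review; the gene names, expression values, and the functions `pearson` and `coexpression_edges` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold.

    `expression` maps a gene name to its expression values, one value
    per condition, with the same condition order for every gene.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy data (assumed): three genes measured under four conditions.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = coexpression_edges(toy)
```

In this toy data only geneA and geneB rise together across the conditions, so they are the only pair linked. Restricting the input to samples from one biological context, such as a single stress treatment, would give the context-specific variant the abstract describes.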
Affiliation(s)
- Chirag Gupta
- Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA
- Andy Pereira
- Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA

11
Abstract
This chapter covers the theory and practice of ortholog gene set computation. In the theoretical part we give detailed and formal descriptions of the relevant concepts. We also cover the topic of graph-based clustering as a tool to compute ortholog gene sets. In the second part we provide an overview of practical considerations intended for researchers who need to determine orthologous genes from a collection of annotated genomes, briefly describing some of the most popular programs and resources currently available for this task.

12
Niu C, Payne GA, Woloshuk CP. Involvement of FST1 from Fusarium verticillioides in virulence and transport of inositol. Mol Plant Pathol 2017; 18:695-707. [PMID: 27195938 PMCID: PMC6638204 DOI: 10.1111/mpp.12430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2016] [Revised: 05/11/2016] [Accepted: 05/13/2016] [Indexed: 06/05/2023]
Abstract
Fumonisin B1 (FB1), a polyketide mycotoxin produced by Fusarium verticillioides during the colonization of maize kernels, is detrimental to human and animal health. FST1 encodes a putative protein with 12 transmembrane domains; however, its function remains unknown. The FST1 gene is highly expressed by the fungus in the endosperm of maize kernels compared with the levels of expression in germ tissues. Previous research has shown that FST1 affects FB1 production, virulence, hydrogen peroxide resistance, hydrophobicity and macroconidia production. Here, we examine the phylogeny of FST1, its expression in a Saccharomyces cerevisiae strain lacking a functional myo-inositol transporter (ITR1) and the effect of amino acid changes in the central loop and C-terminus regions of FST1 on functionality. The results indicate that expression of FST1 in an ITR1 mutant strain restores growth on myo-inositol medium to wild-type levels and restores the inhibitory effects of FB1, suggesting that FST1 can transport both myo-inositol and FB1 into yeast cells. Our results with engineered FST1 also indicate that amino acids in the central loop and C-terminus regions are important for FST1 functionality in both S. cerevisiae and F. verticillioides. Overall, this research has established the first characterized inositol transporter in filamentous fungi and has advanced our knowledge about the global regulatory functions of FST1.
Affiliation(s)
- Chenxing Niu
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907-2054, USA
- Gary A. Payne
- Department of Plant Pathology, North Carolina State University, Raleigh, NC 27695-7567, USA
- Charles P. Woloshuk
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907-2054, USA

13
Wang H, Yan L, Huang H, Ding C. From Protein Sequence to Protein Function via Multi-Label Linear Discriminant Analysis. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:503-513. [PMID: 27429445 DOI: 10.1109/tcbb.2016.2591529] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 06/06/2023]
Abstract
Sequence describes the primary structure of a protein, which contains important structural, characteristic, and genetic information and thereby motivates many sequence-based computational approaches to infer protein function. Among them, feature-based approaches attract increased attention because they make predictions from a set of transformed and more biologically meaningful sequence features. However, original features extracted from sequence are usually of high dimensionality and often compromised by irrelevant patterns; therefore, dimension reduction is necessary prior to classification for efficient and effective protein function prediction. A protein usually performs several different functions within an organism, which makes protein function prediction a multi-label classification problem. In machine learning, multi-label classification deals with problems where each object may belong to more than one class. As a well-known feature reduction method, linear discriminant analysis (LDA) has been successfully applied in many practical applications. It is, however, by nature designed for single-label classification, in which each object can belong to exactly one class. Because directly applying LDA in multi-label classification causes ambiguity when computing scatter matrices, we apply a new Multi-label Linear Discriminant Analysis (MLDA) approach to address this problem while preserving the powerful classification capability inherited from classical LDA. We further extend MLDA by l1-normalization to overcome the problem of over-counting data points with multiple labels. In addition, we incorporate biological network data using Laplacian embedding into our method, and assess the reliability of predicted putative functions. Extensive empirical evaluations demonstrate promising results of our methods.
|
14
|
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) 2016; 6:life6030039. [PMID: 27618105 PMCID: PMC5041015 DOI: 10.3390/life6030039] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2016] [Revised: 08/29/2016] [Accepted: 09/02/2016] [Indexed: 12/15/2022] Open
Abstract
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines.
Affiliation(s)
- Rémi Zallot
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| | - Katherine J Harrison
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| | - Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| | - Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
|
15
|
Gazi MA, Kibria MG, Mahfuz M, Islam MR, Ghosh P, Afsar MNA, Khan MA, Ahmed T. Functional, structural and epitopic prediction of hypothetical proteins of Mycobacterium tuberculosis H37Rv: An in silico approach for prioritizing the targets. Gene 2016; 591:442-55. [PMID: 27374154 DOI: 10.1016/j.gene.2016.06.057] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2016] [Revised: 04/27/2016] [Accepted: 06/28/2016] [Indexed: 01/11/2023]
Abstract
The global control of tuberculosis (TB) remains a great challenge from the standpoint of diagnosis, detection of drug resistance, and treatment. Major serodiagnostic limitations include low sensitivity and high cost in detecting TB. On the other hand, treatment measures are often hindered by low efficacies of commonly used drugs and resistance developed by the bacteria. Hence, there is a need to look into newer diagnostic and therapeutic targets. The proteome information available suggests that among the 3906 proteins in Mycobacterium tuberculosis H37Rv, about a quarter remain classified as a hypothetical, uncharacterized set. This study involves a combination of a number of bioinformatics tools to analyze those hypothetical proteins (HPs). An entire set of 999 proteins was primarily screened for protein sequences having conserved domains with high confidence using a combination of the latest versions of protein family databases. Subsequently, 98 of such potential target proteins were extensively analyzed by means of physicochemical characteristics, protein-protein interaction, sub-cellular localization, structural similarity and functional classification. Next, we predicted antigenic proteins from the entire set and identified B and T cell epitopes of these proteins in M. tuberculosis H37Rv. We predicted that these HPs belong to various classes of proteins such as enzymes, transporters, receptors, structural proteins, transcription regulators and other proteins. However, the structural similarity prediction of the annotated proteins substantiated the functional classification of those proteins. Consequently, based on higher antigenicity score and sub-cellular localization, we chose two (NP_216420.1, NP_216903.1) of the antigenic proteins to exemplify the B and T cell epitope prediction approach. Finally, we found 15 epitopes located partially or fully in the linear epitope region. We also found 21 conformational epitopes by using the ElliPro server. The in silico methodology used in this study and the data thus generated for HPs of M. tuberculosis H37Rv may facilitate swift experimental identification of potential serodiagnostic and therapeutic targets for treatment and control.
Affiliation(s)
- Md Amran Gazi
- Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Bangladesh.
| | - Mohammad Golam Kibria
- Parasitology Laboratory, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Bangladesh.
| | - Mustafa Mahfuz
- Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Bangladesh.
| | - Md Rezaul Islam
- International Max Planck Research School, Grisebachstraße 5, 37077 Göttingen, Germany.
| | - Prakash Ghosh
- Parasitology Laboratory, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Bangladesh.
| | - Md Nure Alam Afsar
- Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Bangladesh.
| | - Md Arif Khan
- Bio-Bio-1 Research Foundation, Sangskriti Bikash Kendra Bhaban, 1/E/1, Poribag, Dhaka 1000, Bangladesh.
| | - Tahmeed Ahmed
- Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Bangladesh.
|
16
|
Payoungkiattikun W, Okazaki S, Nakano S, Ina A, H-Kittikun A, Asano Y. In Silico Identification for α-Amino-ε-Caprolactam Racemases by Using Information on the Structure and Function Relationship. Appl Biochem Biotechnol 2015. [DOI: 10.1007/s12010-015-1647-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
17
|
Mukherjee S, Lapidus A, Shapiro N, Cheng JF, Han J, Reddy TBK, Huntemann M, Ivanova N, Mikhailova N, Chen A, Palaniappan K, Spring S, Göker M, Markowitz V, Woyke T, Tindall BJ, Klenk HP, Kyrpides NC, Pati A. High quality draft genome sequence and analysis of Pontibacter roseus type strain SRC-1(T) (DSM 17521(T)) isolated from muddy waters of a drainage system in Chandigarh, India. Stand Genomic Sci 2015; 10:8. [PMID: 26203325 PMCID: PMC4511580 DOI: 10.1186/1944-3277-10-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2014] [Accepted: 11/24/2014] [Indexed: 12/21/2022] Open
Abstract
Pontibacter roseus is a member of the genus Pontibacter, family Cytophagaceae, class Cytophagia. While the type species of the genus, Pontibacter actiniarum, was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats ranging from seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1(T) along with its complete genome sequence and annotation from a culture of DSM 17521(T). The 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50 RNA genes and is part of the Genomic Encyclopedia of Type Strains: KMG-I project.
Affiliation(s)
| | - Alla Lapidus
- T. Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
- Algorithmic Biology Lab, St. Petersburg Academic University, St. Petersburg, Russia
| | - Nicole Shapiro
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Jan-Fang Cheng
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - James Han
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - TBK Reddy
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Amy Chen
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Krishna Palaniappan
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Stefan Spring
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Markus Göker
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Victor Markowitz
- Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Tanja Woyke
- DOE Joint Genome Institute, Walnut Creek, California, USA
| | - Brian J Tindall
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Hans-Peter Klenk
- Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- DOE Joint Genome Institute, Walnut Creek, California, USA
- King Abdulaziz University, Jeddah, Saudi Arabia
| | - Amrita Pati
- DOE Joint Genome Institute, Walnut Creek, California, USA
|
18
|
Elshahawi SI, Ramelot TA, Seetharaman J, Chen J, Singh S, Yang Y, Pederson K, Kharel MK, Xiao R, Lew S, Yennamalli RM, Miller MD, Wang F, Tong L, Montelione GT, Kennedy MA, Bingman CA, Zhu H, Phillips GN, Thorson JS. Structure-guided functional characterization of enediyne self-sacrifice resistance proteins, CalU16 and CalU19. ACS Chem Biol 2014; 9:2347-58. [PMID: 25079510 PMCID: PMC4201346 DOI: 10.1021/cb500327m] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Calicheamicin γ1I (1) is an enediyne antitumor compound produced by Micromonospora echinospora spp. calichensis, and its biosynthetic gene cluster has been previously reported. Despite extensive analysis and biochemical study, several genes in the biosynthetic gene cluster of 1 remain functionally unassigned. Using a structural genomics approach and biochemical characterization, two proteins encoded by genes from the 1 biosynthetic gene cluster assigned as “unknowns”, CalU16 and CalU19, were characterized. Structure analysis revealed that they possess the STeroidogenic Acute Regulatory protein related lipid Transfer (START) domain known mainly to bind and transport lipids and previously identified as the structural signature of the enediyne self-resistance protein CalC. Subsequent study revealed calU16 and calU19 to confer resistance to 1, and reminiscent of the prototype CalC, both CalU16 and CalU19 were cleaved by 1 in vitro. Through site-directed mutagenesis and mass spectrometry, we identified the site of cleavage in each protein and characterized their function in conferring resistance against 1. This report emphasizes the importance of structural genomics as a powerful tool for the functional annotation of unknown proteins.
Affiliation(s)
- Sherif I. Elshahawi
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
- Center for Pharmaceutical Research and Innovation (CPRI), College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
| | - Theresa A. Ramelot
- Department of Chemistry and Biochemistry, Northeast Structural Genomics Consortium, Miami University, Oxford, Ohio 45056, United States
| | - Jayaraman Seetharaman
- Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, New York 10027, United States
| | - Jing Chen
- Department of Molecular and Cellular Biochemistry & Center for Structural Biology, College of Medicine, University of Kentucky, Lexington, Kentucky 40536, United States
| | - Shanteri Singh
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
- Center for Pharmaceutical Research and Innovation (CPRI), College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
| | - Yunhuang Yang
- Department of Chemistry and Biochemistry, Northeast Structural Genomics Consortium, Miami University, Oxford, Ohio 45056, United States
| | - Kari Pederson
- Complex Carbohydrate Research Center, Northeast Structural Genomics Consortium, University of Georgia, Athens, Georgia 30602, United States
| | - Madan K. Kharel
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
- Center for Pharmaceutical Research and Innovation (CPRI), College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
| | - Rong Xiao
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
| | - Scott Lew
- Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, New York 10027, United States
| | - Ragothaman M. Yennamalli
- Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, United States
| | - Mitchell D. Miller
- Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, United States
| | - Fengbin Wang
- Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, United States
| | - Liang Tong
- Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, New York 10027, United States
| | - Gaetano T. Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, and Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
- Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, United States
| | - Michael A. Kennedy
- Department of Chemistry and Biochemistry, Northeast Structural Genomics Consortium, Miami University, Oxford, Ohio 45056, United States
| | - Craig A. Bingman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Haining Zhu
- Department of Molecular and Cellular Biochemistry & Center for Structural Biology, College of Medicine, University of Kentucky, Lexington, Kentucky 40536, United States
| | - George N. Phillips
- Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, United States
| | - Jon S. Thorson
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
- Center for Pharmaceutical Research and Innovation (CPRI), College of Pharmacy, University of Kentucky, Lexington, Kentucky 40536, United States
|
19
|
How to learn about gene function: text-mining or ontologies? Methods 2014; 74:3-15. [PMID: 25088781 DOI: 10.1016/j.ymeth.2014.07.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2014] [Revised: 07/01/2014] [Accepted: 07/09/2014] [Indexed: 12/31/2022] Open
Abstract
As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies--e.g., next-generation sequencing--are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene-lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene-lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene-lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or--perhaps more adventurously--on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) provide better follow-up experimental testing and treatment options; and (3) better manage gene lists derived from organisms that are not well-studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.
|
20
|
Puggioni V, Dondi A, Folli C, Shin I, Rhee S, Percudani R. Gene Context Analysis Reveals Functional Divergence between Hypothetically Equivalent Enzymes of the Purine–Ureide Pathway. Biochemistry 2014; 53:735-45. [DOI: 10.1021/bi4010107] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Vincenzo Puggioni
- Laboratory of Biochemistry, Molecular Biology, and Bioinformatics, Department of Life Sciences, University of Parma, Italy
| | - Ambra Dondi
- Laboratory of Biochemistry, Molecular Biology, and Bioinformatics, Department of Life Sciences, University of Parma, Italy
| | - Claudia Folli
- Department of Food Science, University of Parma, Italy
| | - Inchul Shin
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
| | - Sangkee Rhee
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
| | - Riccardo Percudani
- Laboratory of Biochemistry, Molecular Biology, and Bioinformatics, Department of Life Sciences, University of Parma, Italy
|
21
|
Structure-based functional site recognition for p21-activated kinase 4. Arch Pharm Res 2013; 36:1494-9. [DOI: 10.1007/s12272-013-0165-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2011] [Accepted: 07/25/2011] [Indexed: 01/15/2023]
|
22
|
Pearson WR. An introduction to sequence similarity ("homology") searching. CURRENT PROTOCOLS IN BIOINFORMATICS 2013; Chapter 3:3.1.1-3.1.8. [PMID: 23749753 PMCID: PMC3820096 DOI: 10.1002/0471250953.bi0301s42] [Citation(s) in RCA: 435] [Impact Index Per Article: 39.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
Sequence similarity searching, typically with BLAST, is the most widely used and most reliable strategy for characterizing newly determined sequences. Sequence similarity searches can identify "homologous" proteins or genes by detecting excess similarity, that is, statistically significant similarity that reflects common ancestry. This unit provides an overview of the inference of homology from significant similarity, and introduces other units in this chapter that provide more details on effective strategies for identifying homologs.
|
23
|
Rubinstein R, Ramagopal UA, Nathenson SG, Almo SC, Fiser A. Functional classification of immune regulatory proteins. Structure 2013; 21:766-76. [PMID: 23583034 PMCID: PMC3654037 DOI: 10.1016/j.str.2013.02.022] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2012] [Revised: 01/29/2013] [Accepted: 02/16/2013] [Indexed: 11/29/2022]
Abstract
The members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. Guided by the Brotherhood approach, we present the high-resolution structural characterization of a homophilic interaction involving the class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.
Affiliation(s)
- Rotem Rubinstein
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Udupi A. Ramagopal
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Stanley G. Nathenson
- Department of Immunology and Microbiology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Department of Cell Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Steven C. Almo
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Department of Physiology and Biophysics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Andras Fiser
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
24
|
Bitra A, Hussain B, Tanwar AS, Anand R. Identification of Function and Mechanistic Insights of Guanine Deaminase from Nitrosomonas europaea: Role of the C-Terminal Loop in Catalysis. Biochemistry 2013; 52:3512-22. [DOI: 10.1021/bi400068g] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Affiliation(s)
- Aruna Bitra
- Department of Chemistry, IIT Bombay, Mumbai, India 400076
| | - Bhukya Hussain
- Department of Chemistry, IIT Bombay, Mumbai, India 400076
| | - Ruchi Anand
- Department of Chemistry, IIT Bombay, Mumbai, India 400076
|
25
|
Banerjee A. Novel targets in drug design: enzymes in the protein ubiquitylation pathway. Expert Opin Drug Discov 2013; 1:151-60. [PMID: 23495798 DOI: 10.1517/17460441.1.2.151] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Protein ubiquitylation is a pathway by which many proteins are selectively degraded. Its role has been shown in processes such as cell division and differentiation, oncogenesis, apoptosis, DNA repair, membrane transport and the removal of abnormal proteins. The ubiquitylation pathway enzymes are an insufficiently researched area for drug development. A genetic method has been developed (supported by computational biology) to identify potentially useful small molecules that will have a positive impact on our battle against cancer and other diseases. In silico screening is used for initial selection of drug-like compounds. This method is based on docking three-dimensional chemical libraries onto the target enzyme's functional site for initial screens using a computational scheme, followed by genetic and in vivo methods for hit optimisation. Focus has been on using the ubiquitin conjugation pathway as target for therapeutic intervention against cancer and potent inhibitors of ubiquitylation subpathways have been obtained (including those that are vital for the survival of aggressive cancer cells/tumours). Leads from the development of in vitro inhibitors provided a direction for the development of in vivo inhibitors as investigational tools, and as promising therapeutic agents.
Affiliation(s)
- Amit Banerjee
- Wayne State University, Department of Pharmaceutical Sciences, Eugene Applebaum College of Pharmacy & Health Sciences and Karmanos Cancer Institute, 259 Mack Avenue, Room 3142, Detroit, Michigan 48201, USA.
|
26
|
Fan H, Hitchcock DS, Seidel RD, Hillerich B, Lin H, Almo SC, Sali A, Shoichet BK, Raushel FM. Assignment of pterin deaminase activity to an enzyme of unknown function guided by homology modeling and docking. J Am Chem Soc 2013; 135:795-803. [PMID: 23256477 PMCID: PMC3557803 DOI: 10.1021/ja309680b] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Of the over 22 million protein sequences in the nonredundant TrEMBL database, fewer than 1% have experimentally confirmed functions. Structure-based methods have been used to predict enzyme activities from experimentally determined structures; however, for the vast majority of proteins, no such structures are available. Here, homology models of a functionally uncharacterized amidohydrolase from Agrobacterium radiobacter K84 (Arad3529) were computed on the basis of a remote template structure. The protein backbone of two loops near the active site was remodeled, resulting in four distinct active site conformations. Substrates of Arad3529 were predicted by docking of 57,672 high-energy intermediate (HEI) forms of 6440 metabolites against these four homology models. On the basis of docking ranks and geometries, a set of modified pterins were suggested as candidate substrates for Arad3529. The predictions were tested by enzymology experiments, and Arad3529 deaminated many pterin metabolites (substrate, kcat/Km [M⁻¹ s⁻¹]): formylpterin, 5.2 × 10⁶; pterin-6-carboxylate, 4.0 × 10⁶; pterin-7-carboxylate, 3.7 × 10⁶; pterin, 3.3 × 10⁶; hydroxymethylpterin, 1.2 × 10⁶; biopterin, 1.0 × 10⁶; d-(+)-neopterin, 3.1 × 10⁵; isoxanthopterin, 2.8 × 10⁵; sepiapterin, 1.3 × 10⁵; folate, 1.3 × 10⁵; xanthopterin, 1.17 × 10⁵; and 7,8-dihydrohydroxymethylpterin, 3.3 × 10⁴. While pterin is a ubiquitous oxidative product of folate degradation, genomic analysis suggests that the first step of an undescribed pterin degradation pathway is catalyzed by Arad3529. Homology model-based virtual screening, especially with modeling of protein backbone flexibility, may be broadly useful for enzyme function annotation and discovering new pathways and drug targets.
Affiliation(s)
- Hao Fan
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco
- Department of Pharmaceutical Chemistry, University of California, San Francisco
- California Institute for Quantitative Biosciences, University of California, San Francisco
- Daniel S. Hitchcock
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas 77843
- Ronald D. Seidel
- Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461
- Brandan Hillerich
- Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461
- Henry Lin
- Department of Pharmaceutical Chemistry, University of California, San Francisco
- Steven C. Almo
- Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461
- Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco
- Department of Pharmaceutical Chemistry, University of California, San Francisco
- California Institute for Quantitative Biosciences, University of California, San Francisco
- Brian K. Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco
- Frank M. Raushel
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas 77843
- Department of Chemistry, Texas A&M University, College Station, Texas 77843
|
27
|
Silva LL, Marcet-Houben M, Nahum LA, Zerlotini A, Gabaldón T, Oliveira G. The Schistosoma mansoni phylome: using evolutionary genomics to gain insight into a parasite's biology. BMC Genomics 2012; 13:617. [PMID: 23148687 PMCID: PMC3534613 DOI: 10.1186/1471-2164-13-617] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 09/30/2012] [Accepted: 10/22/2012] [Indexed: 01/10/2023]
Abstract
BACKGROUND Schistosoma mansoni is one of the causative agents of schistosomiasis, a neglected tropical disease that affects about 237 million people worldwide. Despite recent efforts, we still lack a general understanding of the relevant host-parasite interactions, and the possible treatments are limited by the emergence of resistant strains and the absence of a vaccine. The S. mansoni genome has been completely sequenced and is still under continuous annotation. Nevertheless, more than 45% of the encoded proteins remain without experimental characterization or even functional prediction. To improve our knowledge regarding the biology of this parasite, we conducted a proteome-wide evolutionary analysis to provide a broad view of S. mansoni proteome evolution and to improve its functional annotation. RESULTS Using a phylogenomic approach, we reconstructed the S. mansoni phylome, which comprises the evolutionary histories of all parasite proteins and their homologs across 12 other organisms. The analysis of a total of 7,964 phylogenies allowed a deeper understanding of genomic complexity and evolutionary adaptations to a parasitic lifestyle. In particular, the identification of lineage-specific gene duplications pointed to the diversification of several protein families that are relevant for host-parasite interaction, including proteases, tetraspanins, fucosyltransferases, venom allergen-like proteins, and tegumental-allergen-like proteins. In addition to the evolutionary knowledge, the phylome data enabled us to automatically re-annotate 3,451 proteins through a phylogenetic-based approach rather than solely sequence similarity searches. To allow further exploitation of this valuable data, all information has been made available at PhylomeDB (http://www.phylomedb.org). CONCLUSIONS In this study, we used an evolutionary approach to assess S. mansoni parasite biology, improve genome/proteome functional annotation, and provide insights into host-parasite interactions. Taking advantage of a proteome-wide perspective rather than focusing on individual proteins, we identified that this parasite has experienced specific gene duplication events, particularly affecting genes that are potentially related to the parasitic lifestyle. These innovations may be related to the mechanisms that protect S. mansoni against host immune responses and may be important adaptations for the parasite's survival in a potentially hostile environment. Continuing this work, a comparative analysis involving genomic, transcriptomic, and proteomic data from other helminth parasites, other parasites, and vectors will supply more information regarding the parasite's biology as well as host-parasite interactions.
Affiliation(s)
- Larissa Lopes Silva
- Grupo de Genômica e Biologia Computacional, Centro de Pesquisas René Rachou. Instituto Nacional de Ciência e Tecnologia em Doenças Tropicais. Fundação Oswaldo Cruz - FIOCRUZ, Belo Horizonte, MG, 30190-002, Brazil
- Centro de Excelência em Bioinformática, Fundação Oswaldo Cruz – FIOCRUZ, Belo Horizonte, MG, Brazil
- Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais – UFMG, Belo Horizonte, MG, Brazil
- Marina Marcet-Houben
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
- Laila Alves Nahum
- Grupo de Genômica e Biologia Computacional, Centro de Pesquisas René Rachou. Instituto Nacional de Ciência e Tecnologia em Doenças Tropicais. Fundação Oswaldo Cruz - FIOCRUZ, Belo Horizonte, MG, 30190-002, Brazil
- Centro de Excelência em Bioinformática, Fundação Oswaldo Cruz – FIOCRUZ, Belo Horizonte, MG, Brazil
- Faculdade Infórium de Tecnologia, Belo Horizonte, MG, 30130-180, Brazil
- Adhemar Zerlotini
- Centro de Excelência em Bioinformática, Fundação Oswaldo Cruz – FIOCRUZ, Belo Horizonte, MG, Brazil
- Laboratório Multiusuário de Bioinformática, Embrapa Informática Agropecuária, Campinas, São Paulo, Brazil
- Toni Gabaldón
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
- Guilherme Oliveira
- Grupo de Genômica e Biologia Computacional, Centro de Pesquisas René Rachou. Instituto Nacional de Ciência e Tecnologia em Doenças Tropicais. Fundação Oswaldo Cruz - FIOCRUZ, Belo Horizonte, MG, 30190-002, Brazil
- Centro de Excelência em Bioinformática, Fundação Oswaldo Cruz – FIOCRUZ, Belo Horizonte, MG, Brazil
|
28
|
Ashkenazi S, Snir R, Ofran Y. Assessing the relationship between conservation of function and conservation of sequence using photosynthetic proteins. Bioinformatics 2012; 28:3203-10. [PMID: 23080118 DOI: 10.1093/bioinformatics/bts608] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION Assessing the false positive rate of function prediction methods is difficult, as it is hard to establish that a protein does not have a certain function. To determine to what extent proteins with similar sequences have a common function, we focused on photosynthesis-related proteins. A protein that comes from a non-photosynthetic organism is, undoubtedly, not involved in photosynthesis. RESULTS We show that function diverges very rapidly: 70% of the close homologs of photosynthetic proteins come from non-photosynthetic organisms. Therefore, high sequence similarity, in most cases, is not tantamount to similar function. However, we found that many functionally similar proteins often share short sequence elements, which may correspond to a functional site and could reveal functional similarities more accurately than sequence similarity. CONCLUSIONS These results shed light on the way biological function is conserved in evolution and may help improve large-scale analysis of protein function.
Affiliation(s)
- Shaul Ashkenazi
- The Goodman faculty of life sciences, Bar Ilan University, Ramat Gan 52900, Israel
|
29
|
Klie S, Mutwil M, Persson S, Nikoloski Z. Inferring gene functions through dissection of relevance networks: interleaving the intra- and inter-species views. Mol Biosyst 2012; 8:2233-41. [PMID: 22744313 DOI: 10.1039/c2mb25089f] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023]
Abstract
Inference of accurate gene annotations requires integration of existing biological knowledge, structured in the form of an ontology, with data from high-throughput transcriptomics technologies. This undertaking requires developing algorithms that integrate genome-scale data, even for model organisms. Gene relevance networks have emerged as a powerful representation of the structure of the data. Such networks can be used for intra-species transfer of gene annotations following the guilt-by-association principle. An analogous principle can serve as a basis for inter-species transfer of gene annotations by comparing well-defined subnetworks. In this review, we compare and contrast the concepts of relevance and proximity networks and briefly review the concept of semantic similarity. We then provide a detailed account of quantitative guilt-by-association inference in the setting of genome-scale relevance networks. Moreover, we systematically survey the existing network-based approaches for automated gene function annotation and categorize them under one umbrella in terms of employed methodology. Furthermore, we discuss suitable data selection strategies required for deriving meaningful and unbiased genome-scale networks from large transcriptomics compendia. Lastly, by simulating gene function prediction with a classical network-based algorithm, we show how the number of genes of unknown function influences prediction within a species and pinpoint the need and the requirements for inter-species knowledge transfer.
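As a rough illustration of the guilt-by-association principle mentioned in this abstract, the sketch below predicts a label for an unannotated gene by a weighted vote over its annotated network neighbours. It is a minimal sketch, not the algorithm used in the review: the function name `guilt_by_association`, the gene names, the edge weights, and the labels are all hypothetical.

```python
from collections import Counter

def guilt_by_association(network, annotations, gene):
    """Predict a label for `gene` by weighted vote over its annotated neighbours.

    `network` maps a gene to {neighbour: edge_weight}; `annotations` maps
    annotated genes to a single functional label. Both are hypothetical inputs.
    """
    votes = Counter()
    for neighbour, weight in network.get(gene, {}).items():
        label = annotations.get(neighbour)
        if label is not None:  # unannotated neighbours carry no vote
            votes[label] += weight
    if not votes:
        return None  # no annotated neighbour, so no prediction is possible
    return votes.most_common(1)[0][0]

# Toy relevance network (assumed data, for illustration only).
network = {
    "geneX": {"geneA": 0.9, "geneB": 0.8, "geneC": 0.4},
    "geneY": {"geneD": 0.7},
}
annotations = {
    "geneA": "cell wall biosynthesis",
    "geneB": "cell wall biosynthesis",
    "geneC": "photosynthesis",
}

prediction_x = guilt_by_association(network, annotations, "geneX")
prediction_y = guilt_by_association(network, annotations, "geneY")
```

In this toy setting `geneY` receives no prediction because its only neighbour is itself unannotated, which mirrors the abstract's point that the number of genes of unknown function limits within-species prediction.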
Affiliation(s)
- Sebastian Klie
- Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany
|
30
|
Mendes V, Maranha A, Alarico S, da Costa MS, Empadinhas N. Mycobacterium tuberculosis Rv2419c, the missing glucosyl-3-phosphoglycerate phosphatase for the second step in methylglucose lipopolysaccharide biosynthesis. Sci Rep 2011; 1:177. [PMID: 22355692 PMCID: PMC3240985 DOI: 10.1038/srep00177] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 10/03/2011] [Accepted: 11/15/2011] [Indexed: 11/13/2022]
Abstract
Mycobacteria synthesize intracellular methylglucose lipopolysaccharides (MGLP) proposed to regulate fatty acid synthesis. Although their structures have been elucidated, the identity of most biosynthetic genes remains unknown. The first step in MGLP biosynthesis is catalyzed by a glucosyl-3-phosphoglycerate synthase (GpgS, Rv1208 in Mycobacterium tuberculosis H37Rv). However, a typical glucosyl-3-phosphoglycerate phosphatase (GpgP, EC 3.1.3.70) for dephosphorylation of glucosyl-3-phosphoglycerate to glucosylglycerate was absent from mycobacterial genomes. We purified the native GpgP from Mycobacterium vanbaalenii and identified the corresponding gene deduced from amino acid sequences by mass spectrometry. The M. tuberculosis ortholog (Rv2419c), annotated as a putative phosphoglycerate mutase (PGM, EC 5.4.2.1), was expressed and functionally characterized as a new GpgP. Despite its high specificity for glucosyl-3-phosphoglycerate, the mycobacterial GpgP is not a sequence homolog of known isofunctional GpgPs. The assignment of a new function in the M. tuberculosis genome expands our understanding of this organism's genetic repertoire and of the early events in MGLP biosynthesis.
Affiliation(s)
- Vítor Mendes
- CNC-Center for Neuroscience and Cell Biology, University of Coimbra, 3004-517 Coimbra, Portugal
|
31
|
Rodrigues TDS, Cardoso FC, Teixeira SMR, Oliveira SC, Braga AP. Protein classification with Extended-Sequence Coding by sliding window. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1721-1726. [PMID: 21519118 DOI: 10.1109/tcbb.2011.78] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
A large number of unclassified sequences are still found in public databases, which suggests that there is still a need for new investigations in the area. In this contribution, we present a methodology based on Artificial Neural Networks for protein functional classification. A new protein coding scheme, called here Extended-Sequence Coding by Sliding Windows, is presented with the goal of overcoming some of the difficulties of the well-known method Sequence Coding by Sliding Window. The new protein coding scheme uses more than one sliding window length with a weight factor that is proportional to the window length, avoiding the ambiguity problem without ignoring the identity of small subsequences. Accuracy for Sequence Coding by Sliding Windows ranged from 60.1 to 77.7 percent for the first bacterium protein set and from 61.9 to 76.7 percent for the second one, whereas the accuracy for the proposed Extended-Sequence Coding by Sliding Windows scheme ranged from 70.7 to 97.1 percent for the first bacterium protein set and from 61.1 to 93.3 percent for the second one. Additionally, protein sequences classified inconsistently by the Artificial Neural Networks were analyzed by CD-Search, revealing that there is some disagreement in public repositories and calling attention to the relevant issue of error propagation in annotated databases due to incorrectly transferred annotations.
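The idea of combining several sliding window lengths, each weighted in proportion to its length, can be sketched as follows. The abstract does not give the exact encoding, so this is only an assumed reading of it: the function name `extended_sliding_window_features`, the choice of window lengths, and the use of the window length itself as the weight are all illustrative assumptions.

```python
from collections import Counter

def extended_sliding_window_features(sequence, window_lengths=(2, 3)):
    """Count subsequences for several window lengths, weighting each hit by its length.

    Assumption: the weight factor is taken to be the window length k itself,
    which is one simple way to make it proportional to the window length.
    """
    features = Counter()
    for k in window_lengths:
        # Slide a window of length k over the sequence, one residue at a time.
        for start in range(len(sequence) - k + 1):
            features[sequence[start:start + k]] += k
    return features

# Toy sequence (hypothetical); a real input would be a full protein sequence.
features = extended_sliding_window_features("MKMK", window_lengths=(2, 3))
```

A feature vector of this kind could then be passed to a classifier such as a neural network; that step is omitted here.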
Affiliation(s)
- Thiago de Souza Rodrigues
- Computer Department, Federal Center of Technological Education of Minas Gerais, Av. Amazonas 5253, Nova Suiça, Belo Horizonte 30421-169, MG, Brazil
|
32
|
Tang GW, Altman RB. Remote thioredoxin recognition using evolutionary conservation and structural dynamics. Structure 2011; 19:461-70. [PMID: 21481770 DOI: 10.1016/j.str.2011.02.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/28/2010] [Revised: 02/06/2011] [Accepted: 02/16/2011] [Indexed: 12/25/2022]
Abstract
The thioredoxin family of oxidoreductases plays an important role in redox signaling and control of protein function. Not only are thioredoxins linked to a variety of disorders, but their stable structure has also seen application in protein engineering. Both sequence-based and structure-based tools exist for thioredoxin identification, but remote homolog detection remains a challenge. We developed a thioredoxin predictor using the approach of integrating sequence with structural information. We combined a sequence-based Hidden Markov Model (HMM) with a molecular dynamics enhanced structure-based recognition method (dynamic FEATURE, DF). This hybrid method (HMMDF) has high precision and recall (0.90 and 0.95, respectively) compared with HMM (0.92 and 0.87, respectively) and DF (0.82 and 0.97, respectively). Dynamic FEATURE is sensitive but struggles to resolve closely related protein families, while HMM identifies these evolutionary differences by compromising sensitivity. Our method applied to structural genomics targets makes a strong prediction of a novel thioredoxin.
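One way to compare the three precision and recall pairs quoted above on a single scale is the F1 score, the harmonic mean of precision and recall. The abstract does not report F1, so the values computed below are a derived summary of its reported numbers, not results from the paper; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported in the abstract.
reported = {
    "HMMDF": (0.90, 0.95),
    "HMM": (0.92, 0.87),
    "DF": (0.82, 0.97),
}

# Derived F1 values, rounded to three decimals.
f1 = {name: round(f1_score(p, r), 3) for name, (p, r) in reported.items()}
```

Under this summary the hybrid method ranks highest (about 0.924, against about 0.894 for HMM and 0.889 for DF), consistent with the abstract's description of HMM trading sensitivity for precision and DF doing the reverse.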
Affiliation(s)
- Grace W Tang
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
|
33
|
Searching the protein structure database for ligand-binding site similarities using CPASS v.2. BMC Res Notes 2011; 4:17. [PMID: 21269480 PMCID: PMC3057182 DOI: 10.1186/1756-0500-4-17] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 10/04/2010] [Accepted: 01/26/2011] [Indexed: 11/17/2022]
Abstract
Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores for false positives by ~30%, while leaving true positives unaffected. Importantly, receiver operating characteristics (ROC) curves demonstrate the high correlation between CPASS similarity scores and an accurate functional assignment. As indicated by distribution curves, scores ≥ 30% imply a functional similarity. Software URL: http://cpass.unl.edu.
|
34
|
Stark JL, Powers R. Application of NMR and molecular docking in structure-based drug discovery. Top Curr Chem (Cham) 2011; 326:1-34. [PMID: 21915777 DOI: 10.1007/128_2011_213] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 01/06/2023]
Abstract
Drug discovery is a complex and costly endeavor, where few drugs that reach the clinical testing phase make it to market. High-throughput screening (HTS) is the primary method used by the pharmaceutical industry to identify initial lead compounds. Unfortunately, HTS has a high failure rate and is not particularly efficient at identifying viable drug leads. These shortcomings have encouraged the development of alternative methods to drive the drug discovery process. Specifically, nuclear magnetic resonance (NMR) spectroscopy and molecular docking are routinely being employed as important components of drug discovery research. Molecular docking provides an extremely rapid way to evaluate likely binders from a large chemical library with minimal cost. NMR ligand-affinity screens can directly detect a protein-ligand interaction, can measure a corresponding dissociation constant, and can reliably identify the ligand binding site and generate a co-structure. Furthermore, NMR ligand affinity screens and molecular docking are perfectly complementary techniques, where the combination of the two has the potential to improve the efficiency and success rate of drug discovery. This review will highlight the use of NMR ligand affinity screens and molecular docking in drug discovery and describe recent examples where the two techniques were combined to identify new and effective therapeutic drugs.
Affiliation(s)
- Jaime L Stark
- Department of Chemistry, University of Nebraska, Lincoln, NE 68588-0304, USA
|
35
|
Schröder A, Eichner J, Supper J, Eichner J, Wanke D, Henneges C, Zell A. Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One 2010; 5:e13876. [PMID: 21152420 PMCID: PMC2994704 DOI: 10.1371/journal.pone.0013876] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/11/2010] [Accepted: 10/14/2010] [Indexed: 11/18/2022]
Abstract
Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, are experimentally characterized only for a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure for the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.
Affiliation(s)
- Adrian Schröder
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany.
|
36
|
Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T. Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010; 11:365. [PMID: 20594344 PMCID: PMC2909224 DOI: 10.1186/1471-2105-11-365] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/16/2010] [Accepted: 07/01/2010] [Indexed: 11/16/2022]
Abstract
Background The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity. Results We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones. 
Conclusions A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
Affiliation(s)
- Ratna R Thangudu
- National Center for Biotechnology Information, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
|
37
|
Pérez AJ, Rodríguez A, Trelles O, Thode G. A computational strategy for protein function assignment which addresses the multidomain problem. Comp Funct Genomics 2010; 3:423-40. [PMID: 18629055 PMCID: PMC2447339 DOI: 10.1002/cfg.208] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/15/2002] [Accepted: 08/12/2002] [Indexed: 11/25/2022]
Abstract
A method for assigning functions to unknown sequences based on finding correlations between short signals and functional annotations in a protein database is presented. This approach is based on keyword (KW) and feature (FT) information stored in the SWISS-PROT database. The former refers to particular protein characteristics and the latter locates these characteristics at a specific sequence position. In this way, a certain keyword is only assigned to a sequence if sequence similarity is found in the position described by the FT field. Exhaustive tests performed over sequences with homologues (cluster set) and without homologues (singleton set) in the database show that assigning functions is much 'cleaner' when information about domains (FT field) is used than when only the keywords are used.
Affiliation(s)
- A J Pérez
- Genetics Department, University of Málaga, Málaga 29071, Spain.
|
38
|
Amthauer HA, Tsatsoulis C. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning. BMC Genomics 2010; 11:340. [PMID: 20509921 PMCID: PMC2890565 DOI: 10.1186/1471-2164-11-340] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/12/2008] [Accepted: 05/28/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. RESULTS We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. CONCLUSIONS We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.
Affiliation(s)
- Heather A Amthauer
- Department of Computer Science, Frostburg State University, Frostburg, Maryland, USA.
|
39
|
Silla CN, Freitas AA. A survey of hierarchical classification across different application domains. Data Min Knowl Discov 2010. [DOI: 10.1007/s10618-010-0175-9] [Citation(s) in RCA: 200] [Impact Index Per Article: 14.3] [Indexed: 10/19/2022]
|
40
|
Freitas AA, Wieser DC, Apweiler R. On the importance of comprehensible classification models for protein function prediction. IEEE/ACM Trans Comput Biol Bioinform 2010; 7:172-182. [PMID: 20150679 DOI: 10.1109/tcbb.2008.47] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 05/28/2023]
Abstract
The literature on protein function prediction is currently dominated by works aimed at maximizing predictive accuracy, ignoring the important issues of validation and interpretation of discovered knowledge, which can lead to new insights and hypotheses that are biologically meaningful and advance the understanding of protein functions by biologists. The overall goal of this paper is to critically evaluate this approach, offering a refreshing new perspective on this issue, focusing not only on predictive accuracy but also on the comprehensibility of the induced protein function prediction models. More specifically, this paper aims to offer two main contributions to the area of protein function prediction. First, it presents the case for discovering comprehensible protein function prediction models from data, discussing in detail the advantages of such models, namely, increasing the confidence of the biologist in the system's predictions, leading to new insights about the data and the formulation of new biological hypotheses, and detecting errors in the data. Second, it presents a critical review of the pros and cons of several different knowledge representations that can be used in order to support the discovery of comprehensible protein function prediction models.
Affiliation(s)
- Alex A Freitas
- Computing Laboratory, University of Kent, Canterbury, UK.
|
41
|
Erdin S, Ward RM, Venner E, Lichtarge O. Evolutionary trace annotation of protein function in the structural proteome. J Mol Biol 2009; 396:1451-73. [PMID: 20036248 DOI: 10.1016/j.jmb.2009.12.037] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 05/06/2009] [Revised: 12/05/2009] [Accepted: 12/18/2009] [Indexed: 11/16/2022]
Abstract
By design, structural genomics (SG) solves many structures that cannot be assigned function based on homology to known proteins. Alternative function annotation methods are therefore needed and this study focuses on function prediction with three-dimensional (3D) templates: small structural motifs built of just a few functionally critical residues. Although experimentally proven functional residues are scarce, we show here that Evolutionary Trace (ET) rankings of residue importance are sufficient to build 3D templates, match them, and then assign Gene Ontology (GO) functions in enzymes and non-enzymes alike. In a high-specificity mode, this Evolutionary Trace Annotation (ETA) method covered half (53%) of the 2384 annotated SG protein controls. Three-quarters (76%) of predictions were both correct and complete. The positive predictive value for all GO depths (all-depth PPV) was 84%, and it rose to 94% over GO depths 1-3 (depth 3 PPV). In a high-sensitivity mode, coverage rose significantly (84%), while accuracy fell moderately: 68% of predictions were both correct and complete, all-depth PPV was 75%, and depth 3 PPV was 86%. These data concur with prior mutational experiments showing that ET rank information identifies key functional determinants in proteins. In practice, ETA predicted functions in 42% of 3461 unannotated SG proteins. In 529 cases--including 280 non-enzymes and 21 for metal ion ligands--the expected accuracy is 84% at any GO depth and 94% down to GO depth 3, while for the remaining 931 the expected accuracies are 60% and 71%, respectively. Thus, local structural comparisons of evolutionarily important residues can help decipher protein functions to known reliability levels and without prior assumption on functional mechanisms. ETA is available at http://mammoth.bcm.tmc.edu/eta.
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
|
42
|
Shoemaker BA, Zhang D, Thangudu RR, Tyagi M, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR. Inferred Biomolecular Interaction Server--a web server to analyze and predict protein interacting partners and binding sites. Nucleic Acids Res 2009; 38:D518-24. [PMID: 19843613 PMCID: PMC2808861 DOI: 10.1093/nar/gkp842] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022] Open
Abstract
IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.
Affiliation(s)
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
43
|
Xiang DF, Xu C, Kumaran D, Brown AC, Sauder JM, Burley SK, Swaminathan S, Raushel FM. Functional annotation of two new carboxypeptidases from the amidohydrolase superfamily of enzymes. Biochemistry 2009; 48:4567-76. [PMID: 19358546 DOI: 10.1021/bi900453u] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Two proteins from the amidohydrolase superfamily of enzymes were cloned, expressed, and purified to homogeneity. The first protein, Cc0300, was from Caulobacter crescentus CB-15, while the second one (Sgx9355e) was derived from an environmental DNA sequence originally isolated from the Sargasso Sea (gi|44371129). The catalytic functions and the substrate profiles for the two enzymes were determined with the aid of combinatorial dipeptide libraries. Both enzymes were shown to catalyze the hydrolysis of L-Xaa-L-Xaa dipeptides in which the amino acid at the N-terminus was relatively unimportant. These enzymes were specific for hydrophobic amino acids at the C-terminus. With Cc0300, substrates terminating in isoleucine, leucine, phenylalanine, tyrosine, valine, methionine, and tryptophan were hydrolyzed. The same specificity was observed with Sgx9355e, but this protein was also able to hydrolyze peptides terminating in threonine. Both enzymes were able to hydrolyze N-acetyl and N-formyl derivatives of the hydrophobic amino acids and tripeptides. The best substrates identified for Cc0300 were L-Ala-L-Leu with kcat and kcat/Km values of 37 s⁻¹ and 1.1 × 10⁵ M⁻¹ s⁻¹, respectively, and N-formyl-L-Tyr with kcat and kcat/Km values of 33 s⁻¹ and 3.9 × 10⁵ M⁻¹ s⁻¹, respectively. The best substrate identified for Sgx9355e was L-Ala-L-Phe with kcat and kcat/Km values of 0.41 s⁻¹ and 5.8 × 10³ M⁻¹ s⁻¹, respectively. The three-dimensional structure of Sgx9355e was determined to a resolution of 2.33 Å with L-methionine bound in the active site. The α-carboxylate of the methionine is ion-paired to His-237 and also hydrogen bonded to the backbone amide groups of Val-201 and Leu-202. The α-amino group of the bound methionine interacts with Asp-328. The structural determinants for substrate recognition were identified and compared with other enzymes in this superfamily that hydrolyze dipeptides with different specificities.
Affiliation(s)
- Dao Feng Xiang
- Department of Chemistry, P.O. Box 30012, Texas A&M University, College Station, Texas 77845, USA
|
44
|
Wang Y, Rekaya R. A comprehensive analysis of gene expression evolution between humans and mice. Evol Bioinform Online 2009; 5:81-90. [PMID: 19812728 PMCID: PMC2747126 DOI: 10.4137/ebo.s2874] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/29/2022] Open
Abstract
Evolutionary changes in gene expression account for most phenotypic differences between species. Advances in microarray technology have made the systematic study of gene expression evolution possible. In this study, gene expression patterns were compared between human and mouse genomes using two published methods. Specifically, we studied how gene expression evolution was related to GO terms and tried to decode the relationship between promoter evolution and gene expression evolution. The results showed that (1) the significant enrichment of biological processes in orthologs of expression conservation reveals functional significance of gene expression conservation. The more conserved gene expression in some biological processes than is expected in a purely neutral model reveals negative selection on gene expression. However, fast evolving genes mainly support the neutrality of gene expression evolution, and (2) gene expression conservation is positively but only slightly correlated with promoter conservation based on a motif-count score of the promoter alignment. Our results suggest a neutral model with negative selection for gene expression evolution between humans and mice, and promoter evolution could have some effects on gene expression evolution.
Affiliation(s)
- Yupeng Wang
- Department of Animal and Dairy Science
- Institute of Bioinformatics
- Romdhane Rekaya
- Department of Animal and Dairy Science
- Institute of Bioinformatics
- Department of Statistics, University of Georgia Athens, GA 30602, USA.
|
45
|
Arakaki AK, Huang Y, Skolnick J. EFICAz2: enzyme function inference by a combined approach enhanced by machine learning. BMC Bioinformatics 2009; 10:107. [PMID: 19361344 PMCID: PMC2670841 DOI: 10.1186/1471-2105-10-107] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 11/18/2008] [Accepted: 04/13/2009] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. RESULTS We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz2, exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall.
A comparative analysis of enzyme function annotation of the human proteome by EFICAz2 and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz2 generates considerably more unique assignments than KEGG. CONCLUSION Performance benchmarks and the comparison with KEGG demonstrate that EFICAz2 is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz2 web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html.
Affiliation(s)
- Adrian K Arakaki
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, 30318, USA
- Ying Huang
- California Institute for Telecommunications and Information Technology, University of California, San Diego, La Jolla, CA, 92093, USA
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, 30318, USA
|
46
|
Skolnick J, Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009; 10:378-91. [PMID: 19324930 DOI: 10.1093/bib/bbp017] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
Abstract
A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionary distant proteins identified by threading. Importantly, FINDSITE gives comparable results when high-resolution experimental structures as well as predicted protein models are used.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology 250 14th St NW, Atlanta, GA 30318, USA.
|
47
|
Ward RM, Venner E, Daines B, Murray S, Erdin S, Kristensen DM, Lichtarge O. Evolutionary Trace Annotation Server: automated enzyme function prediction in protein structures using 3D templates. Bioinformatics 2009; 25:1426-7. [PMID: 19307237 PMCID: PMC2682511 DOI: 10.1093/bioinformatics/btp160] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
SUMMARY The Evolutionary Trace Annotation (ETA) Server predicts enzymatic activity. ETA starts with a structure of unknown function, such as those from structural genomics, and with no prior knowledge of its mechanism uses the phylogenetic Evolutionary Trace (ET) method to extract key functional residues and propose a function-associated 3D motif, called a 3D template. ETA then searches previously annotated structures for geometric template matches that suggest molecular and thus functional mimicry. In order to maximize the predictive value of these matches, ETA next applies distinctive specificity filters -- evolutionary similarity, function plurality and match reciprocity. In large scale controls on enzymes, prediction coverage is 43% but the positive predictive value rises to 92%, thus minimizing false annotations. Users may modify any search parameter, including the template. ETA thus expands the ET suite for protein structure annotation, and can contribute to the annotation efforts of metaservers. AVAILABILITY The ETA Server is a web application available at (http://mammoth.bcm.tmc.edu/eta/).
Affiliation(s)
- R Matthew Ward
- Department of Molecular and Human Genetics, Program in Structural and Computational Biology and Molecular, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
|
48
|
Li Z, Luo RT, Mi S, Sun M, Chen P, Bao J, Neilly MB, Jayathilaka N, Johnson DS, Wang L, Lavau C, Zhang Y, Tseng C, Zhang X, Wang J, Yu J, Yang H, Wang SM, Rowley JD, Chen J, Thirman MJ. Consistent deregulation of gene expression between human and murine MLL rearrangement leukemias. Cancer Res 2009; 69:1109-16. [PMID: 19155294 DOI: 10.1158/0008-5472.can-08-3381] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 01/07/2023]
Abstract
Important biological and pathologic properties are often conserved across species. Although several mouse leukemia models have been well established, the genes deregulated in both human and murine leukemia cells have not been studied systematically. We performed a serial analysis of gene expression in both human and murine MLL-ELL or MLL-ENL leukemia cells and identified 88 genes that seemed to be significantly deregulated in both types of leukemia cells, including 57 genes not reported previously as being deregulated in MLL-associated leukemias. These changes were validated by quantitative PCR. The most up-regulated genes include several HOX genes (e.g., HOXA5, HOXA9, and HOXA10) and MEIS1, which are the typical hallmark of MLL rearrangement leukemia. The most down-regulated genes include LTF, LCN2, MMP9, S100A8, S100A9, PADI4, TGFBI, and CYBB. Notably, the up-regulated genes are enriched in gene ontology terms, such as gene expression and transcription, whereas the down-regulated genes are enriched in signal transduction and apoptosis. We showed that the CpG islands of the down-regulated genes are hypermethylated. We also showed that seven individual microRNAs (miRNA) from the mir-17-92 cluster, which are overexpressed in human MLL rearrangement leukemias, are also consistently overexpressed in mouse MLL rearrangement leukemia cells. Nineteen possible targets of these miRNAs were identified, and two of them (i.e., APP and RASSF2) were confirmed further by luciferase reporter and mutagenesis assays. The identification and validation of consistent changes of gene expression in human and murine MLL rearrangement leukemias provide important insights into the genetic basis for MLL-associated leukemogenesis.
Affiliation(s)
- Zejuan Li
- Department of Medicine, University of Chicago, Chicago, Illinois, USA
|
49
|
Lelandais G, Tanty V, Geneix C, Etchebest C, Jacq C, Devaux F. Genome adaptation to chemical stress: clues from comparative transcriptomics in Saccharomyces cerevisiae and Candida glabrata. Genome Biol 2008; 9:R164. [PMID: 19025642 PMCID: PMC2614496 DOI: 10.1186/gb-2008-9-11-r164] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 10/06/2008] [Accepted: 11/24/2008] [Indexed: 12/21/2022] Open
Abstract
Comparative transcriptomics of Saccharomyces cerevisiae and Candida glabrata revealed a remarkable conservation of response to drug-induced stress, despite underlying differences in the regulatory networks. Background Recent technical and methodological advances have placed microbial models at the forefront of evolutionary and environmental genomics. To better understand the logic of genetic network evolution, we combined comparative transcriptomics, a differential clustering algorithm and promoter analyses in a study of the evolution of transcriptional networks responding to an antifungal agent in two yeast species: the free-living model organism Saccharomyces cerevisiae and the human pathogen Candida glabrata. Results We found that although the gene expression patterns characterizing the response to drugs were remarkably conserved between the two species, part of the underlying regulatory networks differed. In particular, the roles of the oxidative stress response transcription factors ScYap1p (in S. cerevisiae) and Cgap1p (in C. glabrata) had diverged. The sets of genes whose benomyl response depends on these factors are significantly different. Also, the DNA motifs targeted by ScYap1p and Cgap1p are differently represented in the promoters of these genes, suggesting that the DNA binding properties of the two proteins are slightly different. Experimental assays of ScYap1p and Cgap1p activities in vivo were in accordance with this last observation. Conclusions Based on these results and recently published data, we suggest that the robustness of environmental stress responses among related species contrasts with the rapid evolution of regulatory sequences, and depends on both the coevolution of transcription factor binding properties and the versatility of regulatory associations within transcriptional networks.
Affiliation(s)
- Gaëlle Lelandais
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM UMR S726, Université Paris 7, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France.
|
50
|
Lacroix V, Cottret L, Thébault P, Sagot MF. An introduction to metabolic networks and their structural analysis. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:594-617. [PMID: 18989046 DOI: 10.1109/tcbb.2008.79] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Indexed: 05/27/2023]
Abstract
There has been a renewed interest in metabolism in the computational biology community, leading to an avalanche of papers coming from methodological network analysis as well as experimental and theoretical biology. This paper is meant to serve as an initial guide for both the biologists interested in formal approaches and the mathematicians or computer scientists wishing to inject more realism into their models. The paper is focused on the structural aspects of metabolism only. The literature is vast enough already, and the thread through it difficult to follow even for the more experienced worker in the field. We explain methods for acquiring data and reconstructing metabolic networks, and review the various models that have been used for their structural analysis. Several concepts such as modularity are introduced, as are the controversies that have beset the field these past few years, for instance, on whether metabolic networks are small-world or scale-free, and on which model better explains the evolution of metabolism. Clarifying the work that has been done also helps in identifying open questions and in proposing relevant future directions in the field, which we do throughout the paper and in the conclusion.
Affiliation(s)
- Vincent Lacroix
- Genome Bioinformatics Research Group, Centre de Regulacio Genomica (CRG), PRBB, Aiguader 88, 08003 Barcelona, Spain.
|