1
Kohl MP, Chane-Woon-Ming B, Bahena-Ceron R, Jaramillo-Ponce J, Antoine L, Herrgott L, Romby P, Marzi S. Ribosome Profiling Methods Adapted to the Study of RNA-Dependent Translation Regulation in Staphylococcus aureus. Methods Mol Biol 2024; 2741:73-100. [PMID: 38217649 DOI: 10.1007/978-1-0716-3565-0_5]
Abstract
Noncoding RNAs, including regulatory RNAs (sRNAs), are instrumental in regulating gene expression in pathogenic bacteria, allowing them to adapt to the various stresses encountered in their host environments. Staphylococcus aureus is a well-studied model for RNA-mediated regulation of virulence and pathogenicity, with sRNAs playing significant roles in shaping S. aureus interactions with human and animal hosts. By modulating the translation and/or stability of target mRNAs, sRNAs regulate the synthesis of virulence factors and regulatory proteins required for pathogenesis. Moreover, perturbation of the levels of RNA modifications in two other classes of noncoding RNAs, rRNAs and tRNAs, has been proposed to contribute to stress adaptation. However, studies of how these various factors affect translation regulation have often been restricted to specific genes, using in vivo reporters and/or in vitro translation systems. Genome-wide sequencing approaches offer new perspectives for studying RNA-dependent regulation. In particular, ribosome profiling methods provide a powerful resource for characterizing the overall landscape of translational regulation, contributing to a better understanding of S. aureus physiopathology. Here, we describe protocols that we have adapted to perform ribosome profiling in S. aureus.
Affiliation(s)
- Maximilian P Kohl
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France
- Roberto Bahena-Ceron
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France
- Jose Jaramillo-Ponce
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France
- Laura Antoine
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France
- Lucas Herrgott
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France
- Pascale Romby
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France
- Stefano Marzi
- Architecture et Réactivité de l'ARN, CNRS 9002, Université de Strasbourg, Strasbourg, France

2
Quan L, Chu X, Sun X, Wu T, Lyu Q. How Deepbics Quantifies Intensities of Transcription Factor-DNA Binding and Facilitates Prediction of Single Nucleotide Variant Pathogenicity With a Deep Learning Model Trained On ChIP-Seq Data Sets. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1594-1599. [PMID: 35471887 DOI: 10.1109/tcbb.2022.3170343]
Abstract
The binding of cell type-specific transcription factors to DNA sequences is essential for regulating gene expression in all organisms. Many variants occurring in these binding regions play crucial roles in human disease by disrupting the cis-regulation of gene expression. We first implemented a sequence-based deep learning model called deepBICS to quantify the intensity of transcription factor-DNA binding. The experimental results not only showed the superiority of deepBICS on ChIP-seq data sets but also suggested that deepBICS, as a language model, could help classify disease-related and neutral variants. We then built a language model-based method called deepBICS4SNV to predict the pathogenicity of single nucleotide variants. The good performance of deepBICS4SNV on two tests related to Mendelian disorders and viral diseases shows that sequence contextual information derived from language models can improve prediction accuracy and generalization capability.
3
Heterologous Expression of Recombinant Human Cytochrome P450 (CYP) in Escherichia coli: N-Terminal Modification, Expression, Isolation, Purification, and Reconstitution. BioTech (Basel) 2023; 12:biotech12010017. [PMID: 36810444 PMCID: PMC9944785 DOI: 10.3390/biotech12010017] [Received: 01/03/2023] [Revised: 02/02/2023] [Accepted: 02/03/2023]
Abstract
Cytochrome P450 (CYP) enzymes play important roles in metabolising endogenous and xenobiotic substances. Characterisation of human CYP proteins has advanced with the rapid development of molecular technologies that allow heterologous expression of human CYPs. Among several hosts, bacterial systems such as Escherichia coli (E. coli) have been widely used thanks to their ease of use, high protein yields, and affordable maintenance costs. However, the expression levels in E. coli reported in the literature sometimes differ significantly. This paper reviews several contributing factors, including N-terminal modifications, co-expression with a chaperone, selection of vectors and E. coli strains, bacterial culture and protein expression conditions, bacterial membrane preparation, CYP protein solubilisation, CYP protein purification, and reconstitution of CYP catalytic systems. The common factors most likely to lead to high expression of CYPs were identified and summarised. Nevertheless, each factor may still require careful evaluation for individual CYP isoforms to achieve maximal expression level and catalytic activity. Recombinant E. coli systems have proven to be a useful tool for obtaining the desired levels of human CYP proteins, which ultimately allows subsequent characterisation of their structures and functions.
4
Quan L, Sun X, Wu J, Mei J, Huang L, He R, Nie L, Chen Y, Lyu Q. Learning Useful Representations of DNA Sequences From ChIP-Seq Datasets for Exploring Transcription Factor Binding Specificities. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:998-1008. [PMID: 32976105 DOI: 10.1109/tcbb.2020.3026787]
Abstract
Deep learning has been successfully applied to a surprisingly wide range of domains, and researchers and practitioners are using trained deep learning models to enrich our knowledge. Transcription factors (TFs) are essential for regulating gene expression in all organisms by binding to specific DNA sequences. Here, we designed a deep learning model named SemanticCS (Semantic ChIP-seq) to predict TF binding specificities. We trained our model on an ensemble of ChIP-seq datasets (Multi-TF-cell) to learn useful intermediate features across multiple TFs and cells. To interpret these feature vectors, we used visualization analysis. Our results indicate that these learned representations can be used to train shallow machines for other tasks. Using diverse experimental data and evaluation metrics, we show that SemanticCS outperforms other popular methods. In addition, SemanticCS can help to identify, from experimental data, the substitutions that cause regulatory abnormalities and to evaluate the effect of substitutions on the binding affinity of the RXR transcription factor. The online server for SemanticCS is freely available at http://qianglab.scst.suda.edu.cn/semanticCS/.
5
Gemayel K, Lomsadze A, Borodovsky M. StartLink and StartLink+: Prediction of Gene Starts in Prokaryotic Genomes. Front Bioinform 2021; 1:704157. [PMID: 36303749 PMCID: PMC9581028 DOI: 10.3389/fbinf.2021.704157] [Received: 05/01/2021] [Accepted: 11/04/2021]
Abstract
State-of-the-art algorithms for ab initio gene prediction in prokaryotic genomes have been shown to be sufficiently accurate: a pair of algorithms would typically agree on predictions of gene 3' ends. Nonetheless, their predictions of gene starts would not match for 15-25% of genes in a genome. This discrepancy is a serious issue that is difficult to resolve because sufficiently large sets of genes with experimentally verified starts are lacking. We have introduced StartLink, which infers gene starts from conservation patterns revealed by multiple alignments of homologous nucleotide sequences. We have also introduced StartLink+, which combines the ab initio and alignment-based methods. The ability of StartLink to predict the start of a given gene is restricted by the availability of homologs in a database; on average, StartLink made predictions for 85% of genes per genome. The accuracy of StartLink+ was shown to be 98-99% on sets of genes with experimentally verified starts. In comparison with database annotations, the annotated gene starts deviated from the StartLink+ predictions for ∼5% of genes in AT-rich genomes and for 10-15% of genes in GC-rich genomes on average. StartLink+ has the potential to significantly improve gene start annotation in genomic databases.
Affiliation(s)
- Karl Gemayel
- School of Computational Science and Engineering, Georgia Tech, Atlanta, GA, United States
- Alexandre Lomsadze
- Wallace H Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, United States
- Mark Borodovsky
- School of Computational Science and Engineering, Georgia Tech, Atlanta, GA, United States
- Wallace H Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, United States
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow, Russia

6
Sudo N, Lee K, Sekine Y, Ohnishi M, Iyoda S. RNA-binding protein Hfq downregulates locus of enterocyte effacement-encoded regulators independent of small regulatory RNA in enterohemorrhagic Escherichia coli. Mol Microbiol 2021; 117:86-101. [PMID: 34411346 DOI: 10.1111/mmi.14799] [Received: 04/16/2020] [Revised: 08/16/2021] [Accepted: 08/16/2021]
Abstract
Enterohemorrhagic Escherichia coli (EHEC) causes severe human diseases worldwide. The type 3 secretion system and effector proteins, which are essential for EHEC infection, are encoded by the locus of enterocyte effacement (LEE). The RNA-binding protein Hfq is essential for small regulatory RNA (sRNA)-mediated posttranscriptional regulation and for the full virulence of many pathogenic bacteria. Although two early studies indicated that Hfq represses LEE expression by posttranscriptionally controlling the expression of grlRA and/or ler, both of which encode LEE regulators that mediate a positive regulatory loop, the detailed molecular mechanism and its biological significance remain unclear. Herein, we show that LEE overexpression was caused by defective RNA-binding activity of the Hfq distal face, which posttranscriptionally represses grlA and ler expression. In vitro analyses revealed that the Hfq distal face binds directly near the translation initiation sites of grlA and ler mRNAs and inhibits their translation. Taken together, we conclude that Hfq inhibits grlA and ler translation by binding their mRNAs through its distal face in an sRNA-independent manner. Additionally, we show that Hfq-mediated repression of LEE is critical for normal EHEC growth, because all suppressor mutations that rescued the growth defect of the hfq mutant abolished the LEE overexpression induced by hfq deletion.
Affiliation(s)
- Naoki Sudo
- Department of Bacteriology I, National Institute of Infectious Diseases, Tokyo, Japan
- Kenichi Lee
- Department of Bacteriology I, National Institute of Infectious Diseases, Tokyo, Japan
- Yasuhiko Sekine
- Department of Life Science, College of Science, Rikkyo University, Tokyo, Japan
- Makoto Ohnishi
- Department of Bacteriology I, National Institute of Infectious Diseases, Tokyo, Japan
- Sunao Iyoda
- Department of Bacteriology I, National Institute of Infectious Diseases, Tokyo, Japan

7
Droste J, Rückert C, Kalinowski J, Hamed MB, Anné J, Simoens K, Bernaerts K, Economou A, Busche T. Extensive Reannotation of the Genome of the Model Streptomycete Streptomyces lividans TK24 Based on Transcriptome and Proteome Information. Front Microbiol 2021; 12:604034. [PMID: 33935985 PMCID: PMC8079986 DOI: 10.3389/fmicb.2021.604034] [Received: 09/08/2020] [Accepted: 03/12/2021]
Abstract
Streptomyces lividans TK24 is a relevant Gram-positive, soil-inhabiting bacterium and one of the model organisms of the genus Streptomyces. It is known for its potential to produce secondary metabolites, antibiotics, and other industrially relevant products. S. lividans TK24 is the plasmid-free derivative of S. lividans 66 and a close genetic relative of the strain Streptomyces coelicolor A3(2). In this study, we used transcriptome and proteome data to improve the annotation of the S. lividans TK24 genome. RNA-seq data on the primary 5' ends of transcripts were used to determine transcription start sites (TSS) in the genome. We identified 5,424 TSS, of which 4,664 were assigned to annotated CDS and ncRNAs, 687 to antisense transcripts distributed between 606 CDS and their UTRs, 67 to tRNAs, and 108 to novel transcripts and CDS. Using the TSS data, the promoter regions and their motifs were analyzed in detail, revealing a conserved -10 region (TAnnnT) and a weakly conserved -35 region (nTGACn). The analysis of the 5' untranslated regions (5'-UTRs) of S. lividans TK24 revealed that 17% of transcripts are leaderless. Several cis-regulatory elements, such as riboswitches or attenuator structures, could be detected in the 5'-UTRs. The S. lividans TK24 transcriptome contains at least 929 operons. The genome harbors 27 secondary metabolite gene clusters, 26 of which were shown to be transcribed under at least one of the applied conditions. Comparison of the reannotated genome with that of the strain Streptomyces coelicolor A3(2) revealed a high degree of similarity. This study presents an extensive reannotation of the S. lividans TK24 genome based on transcriptome and proteome analyses. The TSS data provided insights into the promoter structure, 5'-UTRs, cis-regulatory elements, attenuator structures and novel transcripts, such as small RNAs. Finally, the repertoire of secondary metabolite gene clusters was examined.
These data provide a basis for future studies of gene characterization and transcriptional regulatory networks, and for the use of S. lividans TK24 as a secondary metabolite-producing strain.
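The promoter analysis above reports a conserved -10 region with the consensus TAnnnT. As a minimal sketch of how such a consensus might be searched for upstream of a mapped TSS, the Python below scans the plus strand for TAnnnT matches. The function name find_minus10, the default search window of -20 to -5 nt relative to the TSS, and the plus-strand-only scan are illustrative assumptions, not the authors' pipeline.

```python
import re

# TAnnnT consensus for the -10 region reported in the abstract; n = A, C, G or T.
# The zero-width lookahead lets overlapping candidates be reported.
MINUS10 = re.compile(r"(?=TA[ACGT]{3}T)")


def find_minus10(genome, tss, window=(-20, -5)):
    """Return candidate -10 box start offsets relative to a plus-strand TSS.

    `tss` is the 0-based index of the transcription start site in `genome`.
    `window` (a hypothetical default) is given relative to the TSS; a motif
    must lie entirely inside the window to be reported.
    """
    start = max(0, tss + window[0])
    end = max(start, tss + window[1])
    region = genome[start:end].upper()
    return [start + m.start() - tss for m in MINUS10.finditer(region)]


if __name__ == "__main__":
    # Toy sequence: a TATAAT box placed 13 nt upstream of the TSS at index 23.
    seq = "C" * 10 + "TATAAT" + "C" * 7 + "ATGAAA"
    print(find_minus10(seq, 23))  # [-13]
```

A fuller analysis would also scan the minus strand and tolerate mismatches, since promoter consensus sequences are rarely matched exactly.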
Affiliation(s)
- Julian Droste
- Microbial Genomics and Biotechnology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Christian Rückert
- Microbial Genomics and Biotechnology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Jörn Kalinowski
- Microbial Genomics and Biotechnology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Mohamed Belal Hamed
- Laboratory of Molecular Bacteriology, Department of Microbiology and Immunology, KU Leuven, Rega Institute, Leuven, Belgium; Molecular Biology Department, National Research Centre, Dokki, Egypt
- Jozef Anné
- Laboratory of Molecular Bacteriology, Department of Microbiology and Immunology, KU Leuven, Rega Institute, Leuven, Belgium
- Kenneth Simoens
- Bio- and Chemical Systems Technology, Reactor Engineering, and Safety (CREaS) Section, Department of Chemical Engineering, KU Leuven, Leuven, Belgium
- Kristel Bernaerts
- Bio- and Chemical Systems Technology, Reactor Engineering, and Safety (CREaS) Section, Department of Chemical Engineering, KU Leuven, Leuven, Belgium
- Anastassios Economou
- Laboratory of Molecular Bacteriology, Department of Microbiology and Immunology, KU Leuven, Rega Institute, Leuven, Belgium
- Tobias Busche
- Microbial Genomics and Biotechnology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany

8
Eggeling R. Disentangling transcription factor binding site complexity. Nucleic Acids Res 2018; 46:e121. [PMID: 30085218 PMCID: PMC6237759 DOI: 10.1093/nar/gky683] [Received: 05/04/2018] [Accepted: 07/17/2018]
Abstract
The binding motifs of many transcription factors (TFs) exhibit a higher degree of complexity than a single position weight matrix model permits. This additional complexity is typically taken into account either as intra-motif dependencies, via more sophisticated probabilistic models, or as heterogeneities, via multiple weight matrices. However, both orthogonal approaches have limitations when learning from in vivo data, where binding sites of other factors in close proximity can interfere with motif discovery for the protein of interest. In this work, we demonstrate how intra-motif complexity can, purely by analyzing the statistical properties of a given set of TF-binding sites, be distinguished from complexity arising from an intermix with motifs of co-binding TFs or other artifacts. In addition, we study the related question of whether intra-motif complexity is represented more effectively by dependencies, heterogeneities or variants in between. Benchmarks demonstrate the effectiveness of both methods for their respective tasks, and applying them to the output of recent motif discovery tools detects and corrects many undesirable artifacts. These results further suggest that the prevalence of intra-motif dependencies may have been overestimated in previous studies on in vivo data and should thus be reassessed.
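For reference, the single position weight matrix baseline discussed above can be sketched in a few lines of Python. This is a generic log-odds PWM built from aligned, equal-length sites, with a pseudocount of 0.5 and a uniform background as stated assumptions; it is not the author's method. Each position is scored independently, which is exactly the simplification the abstract questions.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds position weight matrix from aligned binding sites.

    Only A, C, G and T are supported; the pseudocount is an assumption.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    bg = background or {b: 0.25 for b in ALPHABET}  # uniform background (assumption)
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / bg[b])
                    for b in ALPHABET})
    return pwm


def score(pwm, window):
    """Sum per-position log-odds; positions are treated as independent."""
    return sum(col[b] for col, b in zip(pwm, window.upper()))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in `sequence`."""
    k = len(pwm)
    return max((score(pwm, sequence[i:i + k]), i)
               for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
    print(best_hit(pwm, "TTACGTTT")[1])  # 2
```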
Affiliation(s)
- Ralf Eggeling
- Department of Computer Science, University of Helsinki, Gustaf-Hällströmin katu 2b, FIN-00140 Helsinki, Finland
9
Lai X, Stigliani A, Vachon G, Carles C, Smaczniak C, Zubieta C, Kaufmann K, Parcy F. Building Transcription Factor Binding Site Models to Understand Gene Regulation in Plants. Mol Plant 2019; 12:743-763. [PMID: 30447332 DOI: 10.1016/j.molp.2018.10.010] [Received: 06/29/2018] [Revised: 09/20/2018] [Accepted: 10/30/2018]
Abstract
Transcription factors (TFs) are key cellular components that control gene expression. They recognize specific DNA sequences, the TF binding sites (TFBSs), and thus are targeted to specific regions of the genome where they can recruit transcriptional co-factors and/or chromatin regulators to fine-tune spatiotemporal gene regulation. Therefore, the identification of TFBSs in genomic sequences and their subsequent quantitative modeling is of crucial importance for understanding and predicting gene expression. Here, we review how TFBSs can be determined experimentally, how the TFBS models can be constructed in silico, and how they can be optimized by taking into account features such as position interdependence within TFBSs, DNA shape, and/or by introducing state-of-the-art computational algorithms such as deep learning methods. In addition, we discuss the integration of context variables into the TFBS modeling, including nucleosome positioning, chromatin states, methylation patterns, 3D genome architectures, and TF cooperative binding, in order to better predict TF binding under cellular contexts. Finally, we explore the possibilities of combining the optimized TFBS model with technological advances, such as targeted TFBS perturbation by CRISPR, to better understand gene regulation, evolution, and plant diversity.
Affiliation(s)
- Xuelei Lai
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Arnaud Stigliani
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Gilles Vachon
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Cristel Carles
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Cezary Smaczniak
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Chloe Zubieta
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Kerstin Kaufmann
- Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- François Parcy
- CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France

10
Abstract
Sequence-specific nucleic acid binding proteins do not recognize one sequence out of all possibilities; rather, they bind to many sequences with a range of affinities. In this issue of Cell Chemical Biology, Lin et al. (2016) describe the entire landscape of affinities between different RNA molecules and an RNA-binding protein, thus providing a comprehensive description of the factors affecting specificity.
Affiliation(s)
- Adrian R Ferré-D'Amaré
- National Heart, Lung and Blood Institute, 50 South Drive, MSC 8012, Bethesda, MD 20892, USA.
11
Anti-inflammatory activity of a serine protease produced from Bacillus pumilus SG2. Biocatal Agric Biotechnol 2019. [DOI: 10.1016/j.bcab.2019.101162]
12
Castro GT, Zárate LE, Nobre CN, Freitas HC. A Fast Parallel K-Modes Algorithm for Clustering Nucleotide Sequences to Predict Translation Initiation Sites. J Comput Biol 2019; 26:442-456. [PMID: 30785342 DOI: 10.1089/cmb.2018.0245]
Abstract
Predicting the location of translation initiation sites (TIS) is an important problem in molecular biology. In this field, the computational cost of balancing non-TIS sequences is substantial and demands high-performance computing. In this article, we present an optimized version of the K-modes algorithm for clustering TIS sequences and compare it with standard K-means clustering. The adapted algorithm uses simple instructions and fewer computational resources to deliver a significant speedup without compromising the sequence clustering results. We also implemented two optimized parallel versions of the algorithm, one for graphics processing units (GPUs) and the other for general-purpose multicore processors. In our experiments, the GPU version of K-modes was up to 203 times faster than the corresponding sequential version when processing Arabidopsis thaliana sequences.
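To make the clustering step concrete, the Python below is a minimal, sequential K-modes sketch for equal-length nucleotide sequences: Hamming distance serves as the dissimilarity, and each cluster's mode is the per-position majority base. The farthest-first initialization and the function names are assumptions for illustration; the paper's optimized variant and its GPU and multicore versions are not reproduced here.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(seqs, k, max_iter=100):
    """Cluster equal-length sequences; return (labels, modes)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    # Farthest-first initialization (an assumption; random seeding is also common).
    modes = [seqs[0]]
    while len(modes) < k:
        modes.append(max(seqs, key=lambda s: min(hamming(s, m) for m in modes)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sequence joins the cluster of its nearest mode.
        new_labels = [min(range(k), key=lambda j: hamming(s, modes[j])) for s in seqs]
        if new_labels == labels:
            break  # converged: no sequence changed cluster
        labels = new_labels
        # Update step: the new mode is the most frequent base at each position.
        for j in range(k):
            members = [s for s, lab in zip(seqs, labels) if lab == j]
            if members:
                modes[j] = "".join(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members))
    return labels, modes


if __name__ == "__main__":
    labels, modes = k_modes(["AAAA", "AAAT", "TTTT", "TTTA"], k=2)
    print(labels, modes)  # [0, 0, 1, 1] ['AAAA', 'TTTT']
```

Unlike K-means, no averaging is needed: the mode of categorical data is always itself a valid sequence, which is what makes K-modes a natural fit for nucleotide strings.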
Affiliation(s)
- Guilherme Torres Castro
- Department of Computer Science, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil
- Luis Enrique Zárate
- Department of Computer Science, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil
- Cristiane Neri Nobre
- Department of Computer Science, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil
- Henrique Cota Freitas
- Department of Computer Science, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil

13
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors, and stop codons by tRNAs; in rare cases, such misreading also contributes to codon usage. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (the Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
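As a point of comparison for the new index, the classic codon adaptation index can be sketched as follows: each codon's relative adaptiveness w is its count divided by the count of the most used synonymous codon in a reference set of highly expressed genes, and the CAI of a gene is the geometric mean of w over its codons (the standard Sharp and Li definition). In this Python sketch, the substitute count of 0.5 for codons absent from the reference set, the exclusion of Met, Trp and stop codons, and the function names are assumptions. The Index of Translation Elongation, which additionally models background mutation bias, is not implemented here.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI translation table 1); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))
EXCLUDED = {"*", "M", "W"}  # stops and single-codon amino acids offer no choice


def codons(seq):
    """Split a coding sequence into in-frame codons (trailing bases dropped)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes, absent=0.5):
    """w = codon count / count of the most used synonymous codon."""
    counts = Counter(c for g in reference_genes for c in codons(g))
    best = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    w = {}
    for codon, aa in CODE.items():
        if aa in EXCLUDED or best[aa] == 0:
            continue  # amino acid unobserved in the reference set: skipped
        w[codon] = max(counts[codon], absent) / best[aa]
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's informative codons."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("no informative codons in gene")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    w = relative_adaptiveness(["CTGCTGCTGCTA"])  # toy reference set
    print(round(cai("CTGCTA", w), 3))  # 0.577
```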
14
Boshtam M, Khanahmad Shahreza H, Feizollahzadeh S, Rahimmanesh I, Asgary S. Expression and purification of biologically active recombinant rabbit monocyte chemoattractant protein1 in Escherichia coli. FEMS Microbiol Lett 2018; 365:4955552. [PMID: 29596634 DOI: 10.1093/femsle/fny070] [Received: 01/29/2018] [Accepted: 03/26/2018]
Abstract
Monocyte chemoattractant protein 1 (MCP1), by recruiting monocytes, is an important factor in the onset of inflammatory disorders such as atherosclerosis; blocking it may prevent this process and help improve related diseases. Clinical research in this field requires MCP1 protein, but animal studies must be done first. As the rabbit is a suitable model for many inflammatory disorders, and Escherichia coli BL21(DE3) (BL21) cells are a high-efficiency host for protein expression, we decided to produce recombinant rabbit MCP1 (rRMCP1) in a BL21/pET28a system. After codon optimization, a construct containing the RMCP1 sequence was synthesized, cloned into the pET28a plasmid, and overexpressed in BL21 cells. rRMCP1 expression was then optimized by varying expression conditions such as cell density before induction, expression time, temperature, shaking rate and inducer (IPTG) concentration, and the protein was purified by Ni-NTA chromatography. The biological activity of the expressed protein was verified using a monocyte migration assay. Using this expression system, nearly 28 mg/mL rRMCP1 was produced at 26°C and 180 rpm over 24 h in LB broth with 1 mM IPTG. Thus, this method achieved an intermediate level of rRMCP1 expression, and this amount of protein is sufficient for laboratory research.
Affiliation(s)
- Maryam Boshtam
- Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan 8174643446, Iran
- Hossein Khanahmad Shahreza
- Genetic and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan 8174643446, Iran
- Sadegh Feizollahzadeh
- Faculty of Paramedical, Urmia University of Medical Sciences, Urmia 5756115198, Iran
- Ilnaz Rahimmanesh
- Genetic and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan 8174643446, Iran
- Sedigheh Asgary
- Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan 8174643446, Iran

15
Eggeling R, Grosse I, Grau J. InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites. Bioinformatics 2017; 33:580-582. [PMID: 28035026 PMCID: PMC5408807 DOI: 10.1093/bioinformatics/btw689] [Received: 08/26/2016] [Accepted: 10/27/2016]
Abstract
Summary: Recent studies have shown that the traditional position weight matrix model is often insufficient for modeling transcription factor binding sites, as intra-motif dependencies play a significant role in accurately describing binding motifs. Here, we present the Java application InMoDe, a collection of tools for learning, leveraging and visualizing such dependencies of putative higher order. The distinguishing feature of InMoDe is robust model selection from a class of parsimonious models, taking dependencies into account only if justified by the data and favoring simplicity otherwise. Availability and Implementation: InMoDe is implemented in Java and is available as a command-line application, as an application with a graphical user interface, and as an integration into Galaxy on the project website at http://www.jstacs.de/index.php/InMoDe.
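To illustrate what an intra-motif dependency is, the Python sketch below fits a first-order inhomogeneous Markov model, in which each position is conditioned on the preceding nucleotide, and shows it separating two sites that a position weight matrix trained on the same data would score identically. This deliberately simple fixed-order model is chosen for illustration only; InMoDe's selection among parsimonious models of variable order is not reproduced, and the pseudocount and uniform background are assumptions.

```python
import math

ALPHABET = "ACGT"


def train_markov1(sites, pseudocount=0.5):
    """Fit P(x_0) and P(x_i | x_{i-1}) for each position of aligned sites."""
    sites = [s.upper() for s in sites]
    model = []
    for i in range(len(sites[0])):
        # Position 0 has no predecessor, so its only context is None.
        contexts = [None] if i == 0 else list(ALPHABET)
        probs = {}
        for prev in contexts:
            nexts = [s[i] for s in sites if i == 0 or s[i - 1] == prev]
            total = len(nexts) + pseudocount * len(ALPHABET)
            for b in ALPHABET:
                probs[(prev, b)] = (nexts.count(b) + pseudocount) / total
        model.append(probs)
    return model


def log_odds(model, window, background=0.25):
    """Log2 ratio of model likelihood to a uniform background."""
    score, prev = 0.0, None
    for i, b in enumerate(window.upper()):
        score += math.log2(model[i][(prev, b)] / background)
        prev = b
    return score


if __name__ == "__main__":
    # In these training sites the second base depends on the first.
    model = train_markov1(["AC", "CA"])
    print(log_odds(model, "AC") > log_odds(model, "AA"))  # True
```

With the sites AC and CA, each column contains one A and one C, so a PWM gives AC and AA the same score; only the conditional model captures that C follows A.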
Affiliation(s)
- Ralf Eggeling
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Helsinki, Finland
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany

16
Hecht A, Glasgow J, Jaschke PR, Bawazer LA, Munson MS, Cochran JR, Endy D, Salit M. Measurements of translation initiation from all 64 codons in E. coli. Nucleic Acids Res 2017; 45:3615-3626. [PMID: 28334756 PMCID: PMC5397182 DOI: 10.1093/nar/gkx070] [Received: 07/15/2016] [Accepted: 01/25/2017]
Abstract
Our understanding of translation underpins our capacity to engineer living systems. The canonical start codon (AUG) and a few near-cognates (GUG, UUG) are considered the ‘start codons’ for translation initiation in Escherichia coli, and translation is typically not thought to initiate from the 61 remaining codons. Here, we quantified translation initiation of green fluorescent protein and nanoluciferase in E. coli from all 64 triplet codons and across a range of DNA copy numbers. We detected initiation of protein synthesis above measurement background for 47 codons. Translation from non-canonical start codons ranged from 0.007 to 3% relative to translation from AUG. Translation from 17 non-AUG codons exceeded the highest reported rates of non-cognate codon recognition. Translation initiation from non-canonical start codons may contribute to the synthesis of peptides in both natural and synthetic biological systems.
Affiliation(s)
- Ariel Hecht
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Genome-scale Measurements Group, National Institute of Standards and Technology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
- Jeff Glasgow
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Genome-scale Measurements Group, National Institute of Standards and Technology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
- Paul R Jaschke
- Department of Bioengineering, Stanford, CA 94305, USA; Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Lukmaan A Bawazer
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Genome-scale Measurements Group, National Institute of Standards and Technology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
- Matthew S Munson
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Genome-scale Measurements Group, National Institute of Standards and Technology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
- Jennifer R Cochran
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
- Drew Endy
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
- Marc Salit
- Joint Initiative for Metrology in Biology, Stanford, CA 94305, USA; Genome-scale Measurements Group, National Institute of Standards and Technology, Stanford, CA 94305, USA; Department of Bioengineering, Stanford, CA 94305, USA
17
Nunes Pinto CL, Nobre CN, Zárate LE. Transductive learning as an alternative to translation initiation site identification. BMC Bioinformatics 2017; 18:81. [PMID: 28152994 PMCID: PMC5290616 DOI: 10.1186/s12859-017-1502-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/25/2016] [Accepted: 01/28/2017] [Indexed: 11/23/2022]
Abstract
Background Correct identification of protein-coding regions is an important and persistent problem in molecular biology. It is challenging because knowledge of the underlying biological systems is limited and the conserved characteristics of messenger RNA (mRNA) are poorly understood. Computational methods that help discover patterns for identifying Translation Initiation Sites (TIS) are therefore essential. In bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. In contrast, less attention has been given to methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM). Transductive inference performs well on problems in which unlabeled sequences considerably outnumber labeled ones. TIS prediction may likewise benefit from transductive methods, because the number of new sequences grows rapidly as genome projects enable the study of new organisms. This work therefore investigates transductive learning for TIS identification and compares the results with those obtained by the inductive method. Results The transductive method outperforms the inductive method in both F-measure and sensitivity for TIS prediction. It also has the lowest failure rate for TIS identification, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated with molecules from the most representative organisms in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity above 90%, higher than the results obtained with the ISVM.
The ISVM and TSVM approaches were implemented in the TransduTIS tool, as TransduTIS-I and TransduTIS-T respectively, and are available through a web interface. These approaches were compared with the TISHunter, TIS Miner and NetStart tools, with satisfactory results. Conclusions In terms of precision, the results are similar for the ISVM and TSVM classifiers. However, the TSVM approach yielded an improvement, especially in F-measure and sensitivity. Moreover, the results point to a promising application of TSVM to organisms in an early phase of study, with few identified sequences in the databases. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1502-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Cristiane Neri Nobre
- Pontifical Catholic University of Minas Gerais - PUC-MG, 255, Walter Ianni Street, Belo Horizonte, 31980-110, Brazil
- Luis Enrique Zárate
- Pontifical Catholic University of Minas Gerais - PUC-MG, 255, Walter Ianni Street, Belo Horizonte, 31980-110, Brazil
18
Meyer MM. The role of mRNA structure in bacterial translational regulation. Wiley Interdiscip Rev RNA 2016; 8:e1370. [PMID: 27301829 DOI: 10.1002/wrna.1370] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 02/01/2016] [Revised: 05/12/2016] [Accepted: 05/16/2016] [Indexed: 01/08/2023]
Abstract
The characteristics of bacterial messenger RNAs (mRNAs) that influence translation efficiency provide many convenient handles for regulation of gene expression, especially when coupled with the processes of transcription termination and mRNA degradation. An mRNA's structure, especially near the site of initiation, has profound consequences for how readily it is translated. This property allows bacterial gene expression to be altered by changes to mRNA structure induced by temperature, or interactions with a wide variety of cellular components including small molecules, other RNAs (such as sRNAs and tRNAs), and RNA-binding proteins. This review discusses the links between mRNA structure and translation efficiency, and how mRNA structure is manipulated by conditions and signals within the cell to regulate gene expression. The range of RNA regulators discussed follows a continuum from very complex tertiary structures such as riboswitch aptamers and ribosomal protein-binding sites to thermosensors and mRNA:sRNA interactions that involve only base-pairing interactions. Furthermore, the high degree of diversity observed in both mRNA structures and the mechanisms by which inhibition of translation occurs has significant consequences for understanding the evolution of bacterial translational regulation. WIREs RNA 2017, 8:e1370. doi: 10.1002/wrna.1370
19
Alkhateeb RS, Vorhölter FJ, Rückert C, Mentz A, Wibberg D, Hublik G, Niehaus K, Pühler A. Genome wide transcription start sites analysis of Xanthomonas campestris pv. campestris B100 with insights into the gum gene cluster directing the biosynthesis of the exopolysaccharide xanthan. J Biotechnol 2016; 225:18-28. [PMID: 26975844 DOI: 10.1016/j.jbiotec.2016.03.020] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 12/20/2015] [Revised: 03/08/2016] [Accepted: 03/10/2016] [Indexed: 01/18/2023]
Abstract
Xanthomonas campestris pv. campestris (Xcc) is the major producer of the exopolysaccharide xanthan, the commercially most important natural polysaccharide of microbial origin. The current work provides deeper insights into the yet uncharacterized transcriptomic features of the xanthan producing strain Xcc-B100. Towards this goal, RNA sequencing of a library based on the selective enrichment of the 5' ends of native transcripts was performed. This approach resulted in the genome wide identification of 3067 transcription start sites (TSSs) that were further classified based on their genomic positions. Among them, 1545 mapped upstream of an actively transcribed CDS and 1363 were classified as novel TSSs representing antisense, internal, and TSSs belonging to previously unidentified genomic features. Analyzing the transcriptional strength of primary and antisense TSSs revealed that in some instances antisense transcription seemed to be initiated at a higher level than its sense counterpart. Mapping the exact positions of TSSs aided in the identification of promoter consensus motifs, ribosomal binding sites, and enhanced the genome annotation of 159 in silico predicted translational start (TLS) sites. The global view on length distribution of the 5' untranslated regions (5'-UTRs) deduced from the data pointed to the occurrence of leaderless transcripts and transcripts with unusually long 5'-UTRs, in addition to identifying seven putative riboswitch elements for Xcc-B100. Concerning the biosynthesis of xanthan, we focused on the transcriptional organization of the gum gene cluster. Under the conditions tested, we present evidence for a complex transcription pattern of the gum genes with multiple TSSs and an obvious considerable role of antisense transcription. The gene gumB, encoding an outer membrane xanthan exporter, is presented here as an example for genes that possessed a strong antisense TSS.
Affiliation(s)
- Rabeaa S Alkhateeb
- Abteilung für Proteom und Metabolomforschung, Fakultät für Biologie, Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- Frank-Jörg Vorhölter
- Abteilung für Proteom und Metabolomforschung, Fakultät für Biologie, Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany; Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- Christian Rückert
- Technologie Platform Genomics, Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- Almut Mentz
- Technologie Platform Genomics, Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- Daniel Wibberg
- Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- Gerd Hublik
- Jungbunzlauer Austria AG, Pernhofen 1, 2064 Wulzeshofen, Austria
- Karsten Niehaus
- Abteilung für Proteom und Metabolomforschung, Fakultät für Biologie, Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- Alfred Pühler
- Centrum für Biotechnologie (CeBiTec), Universität Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
20
Jennings MJ, Barrios AF, Tan S. Elimination of truncated recombinant protein expressed in Escherichia coli by removing cryptic translation initiation site. Protein Expr Purif 2015; 121:17-21. [PMID: 26739786 DOI: 10.1016/j.pep.2015.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/23/2015] [Revised: 11/25/2015] [Accepted: 12/03/2015] [Indexed: 10/22/2022]
Abstract
Undesirable truncated recombinant protein products pose a special expression and purification challenge because such products often share chromatographic properties similar to those of the desired full-length protein. We describe here our observation of both full-length and truncated forms of a yeast protein (Gcn5) expressed in Escherichia coli, and the reduction or elimination of the truncated form by mutating a cryptic Shine-Dalgarno or START codon within the Gcn5 coding region. Unsuccessful attempts to engineer a cryptic translation initiation site into other recombinant proteins suggest that cryptic Shine-Dalgarno or START codon sequences are necessary but not sufficient for cryptic translation in E. coli.
Affiliation(s)
- Matthew J Jennings
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Adam F Barrios
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Song Tan
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
21
Eggeling R, Roos T, Myllymäki P, Grosse I. Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data. BMC Bioinformatics 2015; 16:375. [PMID: 26552868 PMCID: PMC4640111 DOI: 10.1186/s12859-015-0797-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 11/29/2022]
Abstract
Background Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, used to be the standard model for this task for more than three decades but its simple assumptions are increasingly put into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and while model classes that dynamically adapt the model complexity to data have been developed, effective model selection is to date only possible for fully observable data, but not, e.g., within de novo motif discovery. Results To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter-tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice. Conclusions The traditional PWM model appears to be indeed insufficient to infer realistic sequence motifs, as it is on average outperformed by more complex models that take into account intra-motif dependencies. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. 
Hence, we find it worthwhile to recommend that any modern motif discovery algorithm should attempt to take into account intra-motif dependencies. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0797-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ralf Eggeling
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany; Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Teemu Roos
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Petri Myllymäki
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany; German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
22
Schneider TD. Twenty Years of Delila and Molecular Information Theory: The Altenberg-Austin Workshop in Theoretical Biology. Biological Information, Beyond Metaphor: Causality, Explanation, and Unification, Altenberg, Austria, 11-14 July 2002. Biol Theory 2006; 1:250-260. [PMID: 18084638 DOI: 10.1162/biot.2006.1.3.250] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
A brief personal history is given about how information theory can be applied to binding sites of genetic control molecules on nucleic acids. The primary example used is ribosome binding sites in Escherichia coli. Once the sites are aligned, the information needed to describe the sites can be computed using Claude Shannon's method. This is displayed by a computer graphic called a sequence logo. The logo represents an average binding site, and the mathematics easily allows one to determine the components of this average. That is, given a set of binding sites, the information for individual binding sites can also be computed. One can go further and predict the information of sites that are not in the original data set. Information theory also allows one to model the flexibility of ribosome binding sites, and this led us to a simple model for ribosome translational initiation in which the molecular components fit together only when the ribosome is at a good ribosome binding site. Since information theory is general, the same mathematics applies to human splice junctions, where we can predict the effect of sequence changes that cause human genetic diseases and cancer. The second example given is the Pribnow 'box' which, when viewed by the information theory method, reveals a mechanism for initiation of both transcription and DNA replication. Replication, transcription, splicing, and translation into protein represent the central dogma, so these examples show how molecular information theory is contributing to our knowledge of basic biology.
Affiliation(s)
- Thomas D Schneider
- National Cancer Institute at Frederick, Laboratory of Experimental and Computational Biology, P. O. Box B, Frederick, MD 21702-1201. (301) 846-5581 (-5532 for messages), fax: (301) 846-5598, http://www.lecb.ncifcrf.gov/ toms/
23
24
Eggeling R, Gohr A, Keilwagen J, Mohr M, Posch S, Smith AD, Grosse I. On the value of intra-motif dependencies of human insulator protein CTCF. PLoS One 2014; 9:e85629. [PMID: 24465627 PMCID: PMC3899044 DOI: 10.1371/journal.pone.0085629] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 08/09/2013] [Accepted: 12/05/2013] [Indexed: 01/08/2023]
Abstract
The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3' end.
Affiliation(s)
- Ralf Eggeling
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
- André Gohr
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
- Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Quedlinburg, Germany
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
- Michaela Mohr
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
- Stefan Posch
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
- Andrew D. Smith
- Molecular and Computational Biology, University of Southern California, Los Angeles, United States of America
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
- German Center of Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
25
McDougle DR, Palaria A, Magnetta E, Meling DD, Das A. Functional studies of N-terminally modified CYP2J2 epoxygenase in model lipid bilayers. Protein Sci 2014; 22:964-79. [PMID: 23661295 DOI: 10.1002/pro.2280] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 03/15/2013] [Revised: 04/18/2013] [Accepted: 05/04/2013] [Indexed: 01/14/2023]
Abstract
CYP2J2 epoxygenase is a membrane-bound cytochrome P450 that converts omega-3 and omega-6 fatty acids into physiologically active epoxides. In this work, we present a comprehensive comparison of the effects of N-terminal modifications on the properties of CYP2J2 with respect to the activity of the protein in model lipid bilayers using Nanodiscs. We demonstrate that the complete truncation of the N-terminus changes the association of this protein with the E. coli membrane but does not disrupt incorporation in the lipid bilayers of Nanodiscs. Notably, the introduction of silent mutations at the N-terminus was used to express full length CYP2J2 in E. coli while maintaining wild-type functionality. We further show that lipid bilayers are essential for the productive use of NADPH for ebastine hydroxylation by CYP2J2. Taken together, it was determined that the presence of the N-terminus is not as critical as the presence of a membrane environment for efficient electron transfer from cytochrome P450 reductase to CYP2J2 for ebastine hydroxylation in Nanodiscs. This suggests that adopting the native-like conformation of CYP2J2 and cytochrome P450 reductase in lipid bilayers is essential for effective use of reducing equivalents from NADPH for ebastine hydroxylation.
Affiliation(s)
- Daniel R McDougle
- Department of Comparative Biosciences, University of Illinois Urbana-Champaign, Urbana, Illinois 61802, USA
26
Pfeifer-Sancar K, Mentz A, Rückert C, Kalinowski J. Comprehensive analysis of the Corynebacterium glutamicum transcriptome using an improved RNAseq technique. BMC Genomics 2013; 14:888. [PMID: 24341750 PMCID: PMC3890552 DOI: 10.1186/1471-2164-14-888] [Citation(s) in RCA: 135] [Impact Index Per Article: 12.3] [Received: 08/16/2013] [Accepted: 12/03/2013] [Indexed: 01/16/2023]
Abstract
Background The use of RNAseq to resolve the transcriptional organization of an organism has been established in recent years and has also revealed the complexity and dynamics of bacterial transcriptomes. The aim of this study was to comprehensively investigate the transcriptome of the industrially relevant amino acid producer and model organism Corynebacterium glutamicum by RNAseq in order to improve its genome annotation and to describe important features for transcription and translation. Results RNAseq data sets were obtained by two methods, one that focuses on 5′-ends of primary transcripts and another that provides the overall transcriptome with an improved resolution of 3′-ends of transcripts. Subsequent data analysis led to the identification of more than 2,000 transcription start sites (TSSs), the definition of 5′-UTRs (untranslated regions) for annotated protein-coding genes, operon structures and many novel transcripts located between or in antisense orientation to protein-coding regions. Interestingly, a high proportion of mRNAs (33%) are transcribed as leaderless transcripts. From the data, consensus promoter and ribosome binding site (RBS) motifs were identified and it was shown that the majority of genes in C. glutamicum are transcribed monocistronically, but operons containing up to 16 genes are also present. Conclusions The comprehensive transcriptome map of C. glutamicum established in this study represents a major step forward towards a complete definition of genetic elements (e.g. promoter regions, gene starts and stops, 5′-UTRs, RBSs, transcript starts and ends) and provides the ideal basis for further analyses on transcriptional regulatory networks in this organism. The methods developed are easily applicable for other bacteria and have the potential to be used also for quantification of transcriptomes, replacing microarrays in the near future.
Affiliation(s)
- Jörn Kalinowski
- Microbial Genomics and Biotechnology, Center for Biotechnology, Bielefeld University, Universitätsstraße 27, 33615, Bielefeld, Germany.
27
Zelasko S, Palaria A, Das A. Optimizations to achieve high-level expression of cytochrome P450 proteins using Escherichia coli expression systems. Protein Expr Purif 2013; 92:77-87. [DOI: 10.1016/j.pep.2013.07.017] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 06/11/2013] [Revised: 07/26/2013] [Accepted: 07/30/2013] [Indexed: 12/18/2022]
28
Abstract
The specificity of protein-DNA interactions is most commonly modeled using position weight matrices (PWMs). First introduced in 1982, they have been adapted to many new types of data and many different approaches have been developed to determine the parameters of the PWM. New high-throughput technologies provide a large amount of data rapidly and offer an unprecedented opportunity to determine accurately the specificities of many transcription factors (TFs). But taking full advantage of the new data requires advanced algorithms that take into account the biophysical processes involved in generating the data. The new large datasets can also aid in determining when the PWM model is inadequate and must be extended to provide accurate predictions of binding sites. This article provides a general mathematical description of a PWM and how it is used to score potential binding sites, a brief history of the approaches that have been developed and the types of data that are used with an emphasis on algorithms that we have developed for analyzing high-throughput datasets from several new technologies. It also describes extensions that can be added when the simple PWM model is inadequate and further enhancements that may be necessary. It briefly describes some applications of PWMs in the discovery and modeling of in vivo regulatory networks.
29
Fakruddin M, Mohammad Mazumdar R, Bin Mannan KS, Chowdhury A, Hossain MN. Critical Factors Affecting the Success of Cloning, Expression, and Mass Production of Enzymes by Recombinant E. coli. ISRN Biotechnol 2012; 2013:590587. [PMID: 25969776 PMCID: PMC4403561 DOI: 10.5402/2013/590587] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 07/01/2012] [Accepted: 08/07/2012] [Indexed: 11/23/2022]
Abstract
E. coli is the most frequently used host for production of enzymes and other proteins by recombinant DNA technology. E. coli is preferred for its relative simplicity, inexpensive and fast high-density cultivation, well-known genetics, and the large number of compatible molecular tools available. Despite all these advantages, expression and production of recombinant enzymes are not always successful and often result in insoluble and nonfunctional proteins. There are many factors that affect the success of cloning, expression, and mass production of enzymes by recombinant E. coli. In this paper, these critical factors and approaches to overcome these obstacles are summarized, focusing on controlled expression of the target protein/enzyme in an unmodified form at the industrial level.
Affiliation(s)
- Md Fakruddin
- Industrial Microbiology Laboratory, Institute of Food Science and Technology (IFST), Bangladesh Council of Scientific and Industrial Research (BCSIR), Dhaka 1205, Bangladesh
- Abhijit Chowdhury
- Industrial Microbiology Laboratory, Institute of Food Science and Technology (IFST), Bangladesh Council of Scientific and Industrial Research (BCSIR), Dhaka 1205, Bangladesh
- Md Nur Hossain
- Industrial Microbiology Laboratory, Institute of Food Science and Technology (IFST), Bangladesh Council of Scientific and Industrial Research (BCSIR), Dhaka 1205, Bangladesh
30
Gfeller D. Uncovering new aspects of protein interactions through analysis of specificity landscapes in peptide recognition domains. FEBS Lett 2012; 586:2764-72. [PMID: 22710167 DOI: 10.1016/j.febslet.2012.03.054] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 02/29/2012] [Revised: 03/27/2012] [Accepted: 03/27/2012] [Indexed: 12/20/2022]
Abstract
Protein interactions underlie all biological processes. An important class of protein interactions, often observed in signaling pathways, consists of peptide recognition domains binding short protein segments on the surface of their target proteins. Recent developments in experimental techniques have uncovered many such interactions and shed new light on their specificity. To analyze these data, novel computational methods have been introduced that can accurately describe the specificity landscape of peptide recognition domains and predict new interactions. Combining large-scale analysis of binding specificity data with structure-based modeling can further reveal new biological insights into the molecular recognition events underlying signaling pathways.
Affiliation(s)
- David Gfeller
- Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, CH-1015 Lausanne, Switzerland.
31
Silva LM, Teixeira FCDS, Ortega JM, Zárate LE, Nobre CN. Improvement in the prediction of the translation initiation site through balancing methods, inclusion of acquired knowledge and addition of features to sequences of mRNA. BMC Genomics 2011; 12 Suppl 4:S9. [PMID: 22369295 PMCID: PMC3287592 DOI: 10.1186/1471-2164-12-s4-s9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
Abstract
BACKGROUND The accurate prediction of the translation initiation site in mRNA sequences is an important activity for genome annotation. However, obtaining an accurate prediction is not always a simple task; it can be modeled as a classification problem between positive sequences (protein-coding) and negative sequences (non-coding). The problem is highly imbalanced because each mRNA molecule has a single translation initiation site and many other candidate sites that are not initiators. Therefore, this study approaches the problem from the perspective of class balancing, and we present an undersampling balancing method, M-clus, which is based on clustering. The method also adds features to sequences and improves the performance of the classifier through the inclusion of knowledge obtained by the model, called InAKnow. RESULTS With this methodology, all performance measures used (accuracy, sensitivity, specificity and adjusted accuracy) exceed 93% for Mus musculus and Rattus norvegicus, and range between 72.97% and 97.43% for the other organisms evaluated: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Nasonia vitripennis. The precision increases significantly, by 39% and 22.9% for Mus musculus and Rattus norvegicus, respectively, when the knowledge obtained by the model is included. For the other organisms, the precision increases by between 37.10% and 59.49%. The inclusion of certain features during training, for example the presence of ATG in the region upstream of the translation initiation site, improves sensitivity by approximately 7%. The M-clus balancing method significantly increases sensitivity, from 51.39% to 91.55% (Mus musculus) and from 47.45% to 88.09% (Rattus norvegicus).
CONCLUSIONS In order to solve the problem of TIS prediction, the results indicate that the methodology proposed in this work is adequate, particularly when using the concept of acquired knowledge which increased the accuracy in all databases evaluated.
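The class-balancing idea above can be illustrated with cluster-based undersampling: group the abundant negative (non-initiator) examples into as many clusters as there are positive examples and keep one real representative per cluster. This is a minimal sketch under stated assumptions, not the M-clus algorithm itself, whose details the abstract does not give: plain k-means, Euclidean distance, numeric feature vectors, and the helper names `kmeans` and `cluster_undersample` are all illustrative choices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (an assumption); returns k centroids as lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Reduce the majority class to at most len(minority) representatives.

    Each centroid is replaced by the closest real majority example, so the
    kept negatives are actual training examples rather than synthetic points.
    """
    centroids = kmeans(majority, len(minority), seed=seed)
    kept = []
    for c in centroids:
        representative = min(majority, key=lambda p: squared_distance(p, c))
        if representative not in kept:
            kept.append(representative)
    return kept, minority


# Toy 2-D feature vectors, invented for illustration only.
negatives = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
positives = [(2.5, 2.5), (2.6, 2.4)]
balanced_negatives, _ = cluster_undersample(negatives, positives)
print(balanced_negatives)
```

Keeping a real example nearest each centroid, rather than the centroid itself, is one common design choice; it keeps every training input a genuine sequence's features.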
Affiliation(s)
- Lívia Márcia Silva
- Departamento de Ciência da Computação - Universidade Federal de São João del-Rei, Brazil
32
Intron identification approaches based on weighted features and fuzzy decision trees. Comput Biol Med 2011; 42:112-22. [PMID: 22099702 DOI: 10.1016/j.compbiomed.2011.10.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/25/2010] [Revised: 04/11/2011] [Accepted: 10/13/2011] [Indexed: 11/22/2022]
Abstract
Current computational predictions of splice sites largely depend on the sequence patterns of known intronic sequence features (ISFs) described in the classical intron definition model (IDM). The computation-oriented IDM (CO-IDM) clearly provides more specific and concrete information for describing intron flanks of splice sites (IFSSs). In this paper, we propose a novel approach based on fuzzy decision trees (FDTs) that uses (1) weighted ISFs of twelve uni-frame patterns (UFPs) and forty-five multi-frame patterns (MFPs) and (2) gain ratios to improve intron identification. First, we fuzzified features extracted from genomic sequences using membership functions derived with an unsupervised self-organizing map (SOM) technique. Then, we applied global weighting and cross-referencing when generating fuzzy rules, which are interpretable and allow biologists to verify whether a sequence is an intron. Finally, the experimental results showed that the proposed method improves identification accuracy. We also implemented an online intron identifier for classifying unknown genomic sequences.
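Fuzzification, the first step described above, maps a numeric feature to degrees of membership in overlapping fuzzy sets. The sketch below assumes triangular membership functions whose peaks are given as sorted `centers` (standing in for the SOM-derived prototypes mentioned in the abstract); the function name `fuzzify` and the triangular shape are hypothetical choices, not the authors' implementation.

```python
def fuzzify(x, centers):
    """Return the membership of x in one fuzzy set per sorted center.

    Each set is a triangle peaking (membership 1.0) at its center and falling
    to 0 at the neighboring centers; the outermost sets are open shoulders,
    so values beyond the first or last center get membership 1.0 there.
    """
    memberships = []
    last = len(centers) - 1
    for k, c in enumerate(centers):
        if x == c:
            memberships.append(1.0)
        elif x < c:
            if k == 0:
                memberships.append(1.0)  # left shoulder
            else:
                left = centers[k - 1]
                memberships.append(max(0.0, (x - left) / (c - left)))
        else:
            if k == last:
                memberships.append(1.0)  # right shoulder
            else:
                right = centers[k + 1]
                memberships.append(max(0.0, (right - x) / (right - c)))
    return memberships


# Hypothetical centers for one feature, e.g. "low", "medium", "high".
print(fuzzify(0.35, [0.2, 0.5, 0.8]))
```

With adjacent triangles meeting at each other's peaks, memberships between two centers sum to 1, which keeps the fuzzy rules that consume them easy to interpret.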
33
Douillard FP, O'Connell-Motherway M, Cambillau C, van Sinderen D. Expanding the molecular toolbox for Lactococcus lactis: construction of an inducible thioredoxin gene fusion expression system. Microb Cell Fact 2011; 10:66. [PMID: 21827702 PMCID: PMC3162883 DOI: 10.1186/1475-2859-10-66] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 06/06/2011] [Accepted: 08/09/2011] [Indexed: 12/30/2022]
Abstract
BACKGROUND The development of the Nisin Inducible Controlled Expression (NICE) system in the food-grade bacterium Lactococcus lactis subsp. cremoris represents a cornerstone in the use of Gram-positive bacterial expression systems for biotechnological purposes. However, proteins that are subjected to such over-expression in L. lactis may suffer from improper folding, inclusion body formation and/or protein degradation, thereby significantly reducing the yield of soluble target protein. Although such drawbacks are not specific to L. lactis, no molecular tools have been developed to prevent or circumvent these recurrent problems of protein expression in L. lactis. RESULTS Mimicking thioredoxin gene fusion systems available for E. coli, two nisin-inducible expression vectors were constructed to over-produce various proteins in L. lactis as thioredoxin fusion proteins. In this study, we demonstrate that our novel L. lactis fusion partner expression vectors allow high-level expression of soluble heterologous proteins Tuc2009 ORF40, Bbr_0140 and Tuc2009 BppU/BppL that were previously insoluble or not expressed using existing L. lactis expression vectors. Over-expressed proteins were subsequently purified by Ni-TED affinity chromatography. Intact heterologous proteins were detected by immunoblotting analyses. We also show that the thioredoxin moiety of the purified fusion protein was specifically and efficiently cleaved off by enterokinase treatment. CONCLUSIONS This study is the first description of a thioredoxin gene fusion expression system, purposely developed to circumvent problems associated with protein over-expression in L. lactis. It was shown to prevent protein insolubility and degradation, allowing sufficient production of soluble proteins for further structural and functional characterization.
Affiliation(s)
- François P Douillard
- Department of Microbiology, University College Cork, Cork, Ireland
- Department of Veterinary Sciences, University of Helsinki, Agnes Sjöbergin katu 2, 00790 Helsinki, Finland
- Mary O'Connell-Motherway
- Department of Microbiology, University College Cork, Cork, Ireland
- Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland
- Christian Cambillau
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 Centre National de la Recherche Scientifique and Universités d'Aix-Marseille I & II, Campus de Luminy, Case 932, 13288 Marseille Cedex 09, France
- Douwe van Sinderen
- Department of Microbiology, University College Cork, Cork, Ireland
- Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland
34
Hansted JG, Pietikäinen L, Hög F, Sperling-Petersen HU, Mortensen KK. Expressivity tag: a novel tool for increased expression in Escherichia coli. J Biotechnol 2011; 155:275-83. [PMID: 21801766 DOI: 10.1016/j.jbiotec.2011.07.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/22/2011] [Revised: 07/07/2011] [Accepted: 07/11/2011] [Indexed: 11/18/2022]
Abstract
Protein expression in Escherichia coli is rarely trivial, as low expression and insolubility are common problems. In this work we define a fusion partner that increases expression levels, a function analogous to, but distinct from, that of solubility and affinity tags. We term this type of fusion tag an expressivity tag. Our work is based on earlier observations that 3' deletions of the InfB gene display strongly increased expression levels. We constructed progressively shortened fragments of the InfB(1-471) gene and fused the gene fragments to a gfp reporter gene. A 5-fold increase in GFP expression was seen for an optimal 21-nucleotide InfB(1-21) sequence compared with gfp alone. We defined the InfB(1-21) sequence as an expressivity tag. The tag was tested for improved expression of two biotechnologically important proteins, streptavidin and a single-chain antibody (scFv). Expression of both streptavidin and scFv(L32) was improved, as evaluated by SDS-PAGE. Calculations of folding energies in the translation initiation region gave higher free energies for gfp, L32 and streptavidin when linked to InfB(1-21) than alone. InfB(1-21) did not, however, improve codon usage or the codon adaptation index. The expressivity tag is an important addition to the toolbox available for optimizing heterologous protein expression.
Affiliation(s)
- Jon Gade Hansted
- Department of Molecular Biology, Aarhus University, Gustav Wieds Vej 10C, DK-8000 Aarhus C, Denmark
35
Abstract
We present the complete genome sequence and proteogenomic map for Acholeplasma laidlawii PG-8A (class Mollicutes, order Acholeplasmatales, family Acholeplasmataceae). The genome of A. laidlawii is represented by a single 1,496,992-bp circular chromosome with an average G+C content of 31 mol%. This is the longest genome among the Mollicutes with a known nucleotide sequence. It contains genes of polymerase type I, SOS response, and signal transduction systems, as well as RNA regulatory elements, riboswitches, and T boxes. This demonstrates a significant capability for the regulation of gene expression and mutagenic response to stress. Acholeplasma laidlawii and phytoplasmas are the only Mollicutes known to use the universal genetic code, in which UGA is a stop codon. Within the Mollicutes group, only the sterol-nonrequiring Acholeplasma has the capacity to synthesize saturated fatty acids de novo. Proteomic data were used in the primary annotation of the genome, validating expression of many predicted proteins. We also detected posttranslational modifications of A. laidlawii proteins: phosphorylation and acylation. Seventy-four candidate phosphorylated proteins were found: 16 candidates are proteins unique to A. laidlawii, and 11 of them are surface-anchored or integral membrane proteins, which implies the presence of active signaling pathways. Among 20 acylated proteins, 14 contained palmitic chains, and six contained stearic chains. No residue of linoleic or oleic acid was observed. Acylated proteins were components of mainly sugar and inorganic ion transport systems and were surface-anchored proteins with unknown functions.
36
Vernet E, Kotzsch A, Voldborg B, Sundström M. Screening of genetic parameters for soluble protein expression in Escherichia coli. Protein Expr Purif 2011; 77:104-11. [DOI: 10.1016/j.pep.2010.11.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 11/16/2010] [Revised: 11/24/2010] [Accepted: 11/24/2010] [Indexed: 11/17/2022]
37
Asada M, Hirakawa H, Kuhara S. Classification of Bacteria Based on the Biases of Terminal Amino Acid Residues. Protein J 2011; 30:290-7. [DOI: 10.1007/s10930-011-9332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
38
Shemesh R, Novik A, Cohen Y. Follow the leader: preference for specific amino acids directly following the initial methionine in proteins of different organisms. Genomics Proteomics Bioinformatics 2011; 8:180-9. [PMID: 20970746 PMCID: PMC5054127 DOI: 10.1016/s1672-0229(10)60020-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
It is well established that the vast majority of proteins of all taxonomical groups and species are initiated by an AUG codon, translated into the amino acid methionine (Met). Many attempts were made to evaluate the importance of the sequences surrounding the initiation codon, mostly focusing on the RNA sequence. However, the role and importance of the amino acids following the initiating Met residue were rarely investigated, mostly in bacteria and fungi. Herein, we computationally examined the protein sequences of all major taxonomical groups represented in the Swiss-Prot database, and evaluated the preference of each group to specific amino acids at the positions directly following the initial Met. The results indicate that there is a species-specific preference for the second amino acid of the majority of protein sequences. Interestingly, the preference for a certain amino acid at the second position changes throughout evolution from lysine in prokaryotes, through serine in lower eukaryotes, to alanine in higher plants and animals.
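The per-position preference described above reduces to counting which residue follows an initial Met. A minimal sketch, assuming sequences are plain one-letter strings; the function name is illustrative and the toy sequences are invented, not Swiss-Prot data:

```python
from collections import Counter


def second_residue_preference(proteins):
    """Relative frequency of each residue directly after an initial Met.

    Sequences that do not start with 'M' or are shorter than two residues
    are skipped; the result is ordered from most to least frequent.
    """
    counts = Counter(seq[1] for seq in proteins if len(seq) > 1 and seq[0] == "M")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


toy_proteins = ["MKVLA", "MKTAY", "MSAGE", "AKVLL", "M"]  # invented sequences
print(second_residue_preference(toy_proteins))  # → {'K': 0.6666666666666666, 'S': 0.3333333333333333}
```

Comparing these frequencies across taxonomic groups is what reveals the lysine, serine and alanine preferences reported in the abstract.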
39
Bivona L, Zou Z, Stutzman N, Sun PD. Influence of the second amino acid on recombinant protein expression. Protein Expr Purif 2010; 74:248-56. [PMID: 20600944 DOI: 10.1016/j.pep.2010.06.005] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 05/12/2010] [Revised: 06/15/2010] [Accepted: 06/15/2010] [Indexed: 11/25/2022]
Abstract
Factors affecting protein expression have been intensely studied to the benefit of recombinant protein production. Through mutational analysis at the +2 amino acid position of recombinant Igα, we examined the effect of all 20 amino acids on protein expression. The results showed that the amino acid at the +2 position caused up to a 10-fold difference in recombinant protein expression. Specifically, Ala, Cys, Pro, Ser, Thr, and Lys at the +2 position resulted in significantly higher expression of recombinant Igα than other amino acids, while Met, His and Glu resulted in greatly reduced protein expression. This expression difference depended on the amino acid itself rather than on its codon usage. Consistent with the mutational results, a statistically significant enrichment of Ala and Ser at the +2 position was observed among highly expressed Escherichia coli genes. This work suggests a general approach to enhancing protein expression by incorporating an Ala or Ser after the initiation codon.
Affiliation(s)
- Louis Bivona
- Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852, USA
40
Mechanisms of the initiation of protein synthesis: in reading frame binding of ribosomes to mRNA. Mol Biol Rep 2010; 38:847-55. [PMID: 20467902 DOI: 10.1007/s11033-010-0176-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 04/05/2010] [Accepted: 04/12/2010] [Indexed: 12/21/2022]
Abstract
The various mechanisms proposed to describe the initiation of protein synthesis are reviewed with a focus on their initiation signals. A characteristic feature of the various mechanisms is that each one of them postulates a distinct initiation signal. The signals of the Shine-Dalgarno (SD), the scanning and the internal ribosome entry site (IRES) mechanisms are all located exclusively in the 5' leader sequence, whereas the signal of the cumulative specificity (CS) mechanism includes the entire initiation site (IS). Computer analysis of known E. coli IS sequences showed signal characteristics in the entire model IS consisting of 47 bases, in segments of the 5' leader and of the protein-coding regions. The proposal that eukaryotic translation actually occurs in two steps is scrutinized. In a first step, initiation factors (eIF4F) interact with the cap of the mRNA, thereby enhancing the accessibility of the IS. In the second step, initiation is by the conserved prokaryotic mechanism in which the ribosomes bind directly to the mRNA without ribosomal scanning. This binding occurs by the proposed process of in reading frame binding of ribosomes to mRNA, which is consistent with the CS mechanism. The basic CS mechanism is able to account for the initiation of translation of leaderless mRNAs, as well as for that of canonical mRNAs. The SD, the scanning and the IRES mechanisms, on the other hand, are inconsistent with the initiation of translation of leaderless mRNAs. Based on these and other observations, it is deemed that the CS mechanism is the universal initiation mechanism.
41
Zhang L, Liu X, Wang C, Liu X, Cheng G, Wu Y. Expression, purification and direct eletrochemistry of cytochrome P450 6A1 from the house fly, Musca domestica. Protein Expr Purif 2010; 71:74-8. [DOI: 10.1016/j.pep.2009.12.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/06/2009] [Revised: 12/14/2009] [Accepted: 12/14/2009] [Indexed: 10/20/2022]
42
Signal sequence non-optimal codons are required for the correct folding of mature maltose binding protein. Biochim Biophys Acta 2010; 1798:1244-9. [PMID: 20230779 DOI: 10.1016/j.bbamem.2010.03.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 12/13/2009] [Revised: 03/07/2010] [Accepted: 03/09/2010] [Indexed: 11/22/2022]
Abstract
Non-optimal codons are generally characterised by a low concentration of isoaccepting tRNA and a slower translation rate compared to optimal codons. In a previous study, we reported a 20-fold reduction in maltose binding protein (MBP) level when the non-optimal codons in the signal sequence were optimised. In this study, we report that the 20-fold reduction is rescued when MBP is expressed at 28 °C instead of 37 °C, suggesting that the signal-sequence-optimised MBP protein (MBP-opt) may be misfolded and degraded at 37 °C. Consistent with this idea, transient induction of the heat shock proteases prior to MBP expression at 28 °C restores the 20-fold difference, demonstrating that the difference in production levels is due to post-translational degradation of MBP-opt by the heat-shock proteases. Analysis of the structure of purified MBP-wt and MBP-opt grown at 28 °C showed that although they have similar secondary structure content, MBP-opt is more resistant to thermal unfolding than is MBP-wt. The two proteins also exhibit different tryptic fragment profiles, further confirming that they are folded into conformationally different states. This is the first study to demonstrate that signal sequence non-optimal codons can influence the folding of the mature exported protein.
43
Vandenbon A, Nakai K. Modeling tissue-specific structural patterns in human and mouse promoters. Nucleic Acids Res 2009; 38:17-25. [PMID: 19850720 PMCID: PMC2800225 DOI: 10.1093/nar/gkp866] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022]
Abstract
Sets of genes expressed in the same tissue are believed to be under the regulation of a similar set of transcription factors, and can thus be assumed to contain similar structural patterns in their regulatory regions. Here we present a study of the structural patterns in promoters of genes expressed specifically in 26 human and 34 mouse tissues. For each tissue we constructed promoter structure models, taking into account the presence of motifs, their positions relative to the transcription start site, and the pairwise positioning of motifs. We found that 35 out of 60 models (58%) were able to distinguish positive test promoter sequences from control promoter sequences with statistical significance. Models with high performance include those for liver, skeletal muscle, kidney and tongue. Many of the important structural patterns in these models involve transcription factors of known importance in the tissues in question, and structural patterns tend to be conserved between human and mouse. In addition, promoter models for related tissues tend to have high inter-tissue performance, indicating that their promoters share common structural patterns. Together, these results illustrate the validity of our models, but also indicate that the promoter structures of some tissues are easier to model than those of others.
Affiliation(s)
- Alexis Vandenbon
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
44
Zhang S, Xu M, Li S, Su Z. Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes. Nucleic Acids Res 2009; 37:e72. [PMID: 19383880 PMCID: PMC2691844 DOI: 10.1093/nar/gkp248] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 01/17/2023]
Abstract
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.
Affiliation(s)
- Shaoqiang Zhang
- Department of Bioinformatics and Genomics, Bioinformatics Research Center, the University of North Carolina at Charlotte, Charlotte, NC 28223, USA
45
Dawy Z, Morcos F, Weindl J, Mueller JC. Translation initiation modeling and mutational analysis based on the 3'-end of the Escherichia coli 16S rRNA sequence. Biosystems 2009; 96:58-64. [DOI: 10.1016/j.biosystems.2008.11.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/14/2008] [Revised: 11/10/2008] [Accepted: 11/14/2008] [Indexed: 01/11/2023]
46
Zare-Mirakabad F, Ahrabian H, Sadeghi M, Nowzari-Dalini A, Goliaei B. New scoring schema for finding motifs in DNA Sequences. BMC Bioinformatics 2009; 10:93. [PMID: 19302709 PMCID: PMC2679735 DOI: 10.1186/1471-2105-10-93] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/14/2008] [Accepted: 03/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. To do so, every subsequence of the given DNA sequence is scored with a scoring function, and the prediction is made by selecting the best score. Most available tools for known binding site prediction are designed on the assumption that binding site base positions are independent. Recently, Tomovic and Oakeley investigated the statistical basis for claims of dependence or independence, to determine whether either claim is generally true, and presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate scoring functions that can be used in known binding site prediction under the assumption of either dependency or independency between binding site base positions. RESULTS We propose a new scoring function based on the dependency between all binding site base positions. This scoring function uses joint information content and mutual information as measures of dependency between positions in a transcription factor binding site (TFBS). Our method for modeling dependencies is simply an extension of position-independent methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases and compare the results with two other well-known scoring functions. CONCLUSION The results demonstrate that the new approach improves known binding site discovery and show that joint information content and mutual information provide a better and more general criterion for investigating the relationships between positions in the TFBS. Our scoring function is formulated with simple mathematical calculations.
Applying our method to several biological data sets suggests that it performs better than methods that do not consider dependencies.
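Mutual information between two columns of an alignment of binding sites is the dependency measure named above. This sketch computes it, in bits, for one pair of positions from aligned sites of equal length; it is not the authors' full scoring function (which also uses joint information content), and the function name and toy sites are illustrative.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Mutual information (bits) between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(s[i] for s in sites)
    col_j = Counter(s[j] for s in sites)
    joint = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a, p_b = col_i[a] / n, col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy aligned sites (invented): positions 0 and 3 co-vary, 1 and 2 are constant.
sites = ["ACGT", "ACGT", "TCGA", "TCGA"]
print(mutual_information(sites, 0, 3))  # → 1.0
print(mutual_information(sites, 0, 1))  # → 0.0
```

A position-independent score treats every pair like positions 0 and 1 here; a dependency-aware score can reward the co-variation that the value of 1.0 for positions 0 and 3 exposes.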
Affiliation(s)
- Fatemeh Zare-Mirakabad
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Hayedeh Ahrabian
- Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
- Mehdei Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran
- Abbas Nowzari-Dalini
- Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
- Bahram Goliaei
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
47
Zhang J, Zhao J, Li D, Liu S, Li L, Sun Q, Huang M, Yang Z. Cloning of the gene encoding an insecticidal protein in Pseudomonas pseudoalcaligenes. Ann Microbiol 2009. [DOI: 10.1007/bf03175597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
48
McCoy J, Lavallie E. Expression and purification of thioredoxin fusion proteins. Curr Protoc Mol Biol 2008; Chapter 16:Unit16.8. [PMID: 18265135 DOI: 10.1002/0471142727.mb1608s28] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
This unit describes a gene fusion expression system that uses thioredoxin, the product of the Escherichia coli trxA gene, as the fusion partner. The system is particularly useful for high-level production of soluble fusion proteins in the E. coli cytoplasm; in many cases heterologous proteins produced as thioredoxin fusion proteins are correctly folded and display full biological activity. Protein fusions to His-patch Trx can usually be purified in a single step from cell lysates. Additional protocols describe E. coli cell lysis using a French pressure cell and fractionation, osmotic release of thioredoxin fusion proteins from the E. coli cytoplasm, and heat treatment to purify some thioredoxin fusion proteins.
Affiliation(s)
- J McCoy
- Genetics Institute, Cambridge, Massachusetts, USA
49
Laing E, Sidhu K, Hubbard SJ. Predicted transcription factor binding sites as predictors of operons in Escherichia coli and Streptomyces coelicolor. BMC Genomics 2008; 9:79. [PMID: 18269733 PMCID: PMC2276206 DOI: 10.1186/1471-2164-9-79] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/09/2007] [Accepted: 02/12/2008] [Indexed: 11/18/2022]
Abstract
BACKGROUND As polycistronic transcriptional units of one or more adjacent genes, operons play a key role in regulation and function in prokaryotic biology, and a better understanding of how they are constituted and controlled is needed. Recent efforts have attempted to predict operonic status in sequenced genomes using a variety of techniques and data sources. To date, non-homology based operon prediction strategies have mainly used predicted promoters and terminators present at the extremities of transcriptional units as predictors, with reasonable success. However, transcription factor binding sites (TFBSs), typically found upstream of the first gene in an operon, have not yet been evaluated. RESULTS Here we apply a method originally developed for the prediction of TFBSs in Escherichia coli that minimises the need for prior knowledge, and test its ability to predict operons in E. coli and the 'more complex', pharmaceutically important, Streptomyces coelicolor. We demonstrate that by building genome-specific TFBS position-specific weight matrices (PSWMs) it is possible to predict operons in E. coli and S. coelicolor with 83% and 93% accuracy respectively, using only TFBSs as delimiters of operons. Additionally, the 'palindromicity' of TFBS footprint data of E. coli is characterised. CONCLUSION TFBSs are proposed as novel independent features for use in prokaryotic operon prediction (whether alone or as part of a set of features) given their efficacy as operon predictors in E. coli and S. coelicolor. We also show that TFBS footprint data in E. coli generally contain inverted repeats with significantly (p < 0.05) greater palindromicity than random sequences. Consequently, the palindromicity of predicted putative TFBSs can also enhance operon predictions.
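A position-specific weight matrix and a palindromicity score can be sketched as follows. These are assumptions for illustration, not details from the study: a uniform 0.25 background, a pseudocount of 1, log2 odds, palindromicity defined as the fraction of positions at which a site matches its own reverse complement, and the helper names `build_pswm`, `best_hit` and `palindromicity`.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_pswm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weight matrix: one {base: score} dict per site position."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + len(BASES) * pseudocount
        matrix.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def best_hit(matrix, sequence):
    """Score every window; return (best score, start index) of the top window."""
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[k][sequence[start + k]] for k in range(width))
        best = max(best, (score, start))
    return best


def palindromicity(site):
    """Fraction of positions where the site equals its reverse complement."""
    n = len(site)
    matches = sum(site[i] == COMPLEMENT[site[n - 1 - i]] for i in range(n))
    return matches / n


matrix = build_pswm(["TTGACA", "TTGACA", "TTGACT"])  # invented training sites
print(best_hit(matrix, "GGGTTGACAGG"))
print(palindromicity("GAATTC"), palindromicity("AAAA"))  # → 1.0 0.0
```

In an operon-prediction setting, a high-scoring window upstream of a gene would be taken as a putative TFBS, and its palindromicity could serve as an additional feature.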
Affiliation(s)
- Emma Laing
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
- School of Biomedical and Molecular Sciences, University of Surrey, Guildford, GU2 7XH, UK
- Khushwant Sidhu
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
- Simon J Hubbard
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
50
Vandenbon A, Miyamoto Y, Takimoto N, Kusakabe T, Nakai K. Markov chain-based promoter structure modeling for tissue-specific expression pattern prediction. DNA Res 2008; 15:3-11. [PMID: 18258700 PMCID: PMC2650632 DOI: 10.1093/dnares/dsm034] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
Abstract
Transcriptional regulation is the first level of regulation of gene expression and is therefore a major topic in computational biology. Genes with similar expression patterns can be assumed to be co-regulated at the transcriptional level by promoter sequences with a similar structure. Current approaches for modeling shared regulatory features tend to focus mainly on clustering of cis-regulatory sites. Here we introduce a Markov chain-based promoter structure model that uses both shared motifs and shared features from an input set of promoter sequences to predict candidate genes with similar expression. The model uses the positional preference, order, and orientation of motifs. The trained model is used to score a genomic set of promoter sequences: high-scoring promoters are assumed to have a structure similar to the input sequences and are thus expected to drive similar expression patterns. We applied our model to two datasets, from Caenorhabditis elegans and from Ciona intestinalis. Both computational and experimental verifications indicate that this model is capable of predicting candidate promoters that drive expression patterns similar to those of the input regulatory sequences. This model can be useful for finding promising candidate genes for wet-lab experiments and for increasing our understanding of transcriptional regulation.
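The motif-order component of such a model can be illustrated with a first-order Markov chain. This is a simplified sketch under assumptions, not the authors' model: each promoter is reduced to the left-to-right list of motif labels it contains, transitions are smoothed with a pseudocount, and motif position and orientation, which the described model also uses, are ignored. The names `train_motif_chain` and `log_score` are hypothetical.

```python
import math
from collections import defaultdict

START, END = "<start>", "<end>"


def train_motif_chain(promoters, alphabet, pseudocount=1.0):
    """Estimate smoothed transition probabilities between motif labels.

    `promoters` is a list of motif-label lists ordered along the promoter;
    every label must belong to `alphabet`.
    """
    states = list(alphabet) + [END]
    counts = defaultdict(lambda: defaultdict(float))
    for motifs in promoters:
        path = [START] + list(motifs) + [END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in [START] + list(alphabet):
        total = sum(counts[prev].values()) + pseudocount * len(states)
        model[prev] = {s: (counts[prev][s] + pseudocount) / total for s in states}
    return model


def log_score(model, motifs):
    """Natural-log probability of a motif order under the trained chain."""
    path = [START] + list(motifs) + [END]
    return sum(math.log(model[p][n]) for p, n in zip(path, path[1:]))


training = [["A", "B"], ["A", "B"], ["A", "B", "C"]]  # invented motif orders
model = train_motif_chain(training, alphabet=["A", "B", "C"])
print(log_score(model, ["A", "B"]), log_score(model, ["B", "A"]))
```

Ranking genomic promoters by this score puts those whose motif order resembles the training set first, which mirrors, in simplified form, the screening step described in the abstract.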
Affiliation(s)
- Alexis Vandenbon
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan