1
Mir MH, Parmar S, Singh C, Kalia D. Location-agnostic site-specific protein bioconjugation via Baylis Hillman adducts. Nat Commun 2024; 15:859. [PMID: 38286847 PMCID: PMC10825175 DOI: 10.1038/s41467-024-45124-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 01/15/2024] [Indexed: 01/31/2024]
Abstract
Proteins labelled site-specifically with small molecules are valuable assets for chemical biology and drug development. The unique reactivity profile of the 1,2-aminothiol moiety of N-terminal cysteines (N-Cys) of proteins renders it highly attractive for regioselective protein labelling. Herein, we report an ultrafast Z-selective reaction between isatin-derived Baylis Hillman adducts and 1,2-aminothiols to form a bis-heterocyclic scaffold, and employ it for stable protein bioconjugation under both in vitro and live-cell conditions. We refer to our protein bioconjugation technology as Baylis Hillman orchestrated protein aminothiol labelling (BHoPAL). Furthermore, we report a lipoic acid ligase-based technology for introducing the 1,2-aminothiol moiety at any desired site within proteins, rendering BHoPAL location-agnostic (not limited to N-Cys). By using this approach in tandem with BHoPAL, we generate dually labelled protein bioconjugates appended with different labels at two distinct specific sites on a single protein molecule. Taken together, the protein bioconjugation toolkit that we disclose herein will contribute towards the generation of both mono and multi-labelled protein-small molecule bioconjugates for applications as diverse as biophysical assays, cellular imaging, and the production of therapeutic protein-drug conjugates. In addition to protein bioconjugation, the bis-heterocyclic scaffold we report herein will find applications in synthetic and medicinal chemistry.
Affiliation(s)
- Mudassir H Mir
- Department of Chemistry, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal Bypass Road, Bhauri, Bhopal, 462066, Madhya Pradesh, India
- Sangeeta Parmar
- Department of Chemistry, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal Bypass Road, Bhauri, Bhopal, 462066, Madhya Pradesh, India
- Chhaya Singh
- Department of Chemistry, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal Bypass Road, Bhauri, Bhopal, 462066, Madhya Pradesh, India
- Dimpy Kalia
- Department of Chemistry, Indian Institute of Science Education and Research (IISER) Bhopal, Bhopal Bypass Road, Bhauri, Bhopal, 462066, Madhya Pradesh, India
2
Owen MD, Sacks C, Bathina S, Emmins RA, Dickson AJ. Characterising the structural and cellular role of immunoglobulin C-terminal lysine in secretory pathways. J Biotechnol 2023; 374:38-48. [PMID: 37495115 DOI: 10.1016/j.jbiotec.2023.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023]
Abstract
Improved understanding of the expression of recombinant immunoglobulin (IgG)-based therapies can reduce manufacturing costs and, in turn, costs to patients. Deletion of the C-terminal lysine (C-Lys) from IgG molecules has been shown to greatly affect yield. This study set out to characterise the structural components of IgG C-terminal variants that modulate protein expression, by examining the consequences of mutations at the IgG C-terminus on expression and by using fluorescent C-terminal fragment fusion proteins. Cell-based and cell-free experiments were also used to characterise how the C-terminus differentially engages with cellular pathways to modulate expression. IgG variants engineered by removal of the C-terminal Lys were expressed at significantly lower rates than control variants by CHO (and HEK) cells. Engineered constructs of mCherry fused to short C-terminal regions of IgG mimicked the rank order of expressibility observed for the IgG variants. These fluorescent C-terminal fragment fusions offered the potential to profile how sequences (and point mutations) modified expression. Using combinations of cell and cell-free systems, screening across a range of IgG variants and mCherry reporter constructs showed that interactions between specific C-terminal amino acid sequences and the ribosome can regulate the rate and extent of expression. This study highlights the importance of sequence-dependent regulatory events in determining the efficiency of production of desirable recombinant proteins, showing that the wild-type C-terminal lysine is a necessary capping residue for IgG1 expression. From a wider perspective, these data are especially significant for the design of novel entities. The approach has also provided information about novel short C-terminal tags that may be used to provide selective synthesis of specific subunits in the production of multisubunit products. Alternative strategies for removing C-terminal amino acid heterogeneity whilst maintaining efficient rates of expression have been provided.
Affiliation(s)
- Mark D Owen
- Department of Chemical Engineering, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, UK.
- Alan J Dickson
- Department of Chemical Engineering, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, UK.
3
De Rosa L, Di Stasi R, Romanelli A, D’Andrea LD. Exploiting Protein N-Terminus for Site-Specific Bioconjugation. Molecules 2021; 26:3521. [PMID: 34207845 PMCID: PMC8228110 DOI: 10.3390/molecules26123521] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/10/2021] [Revised: 06/07/2021] [Accepted: 06/07/2021] [Indexed: 11/29/2022]
Abstract
Although a plethora of chemistries have been developed to selectively decorate protein molecules, novel strategies continue to be reported with the final aim of improving the selectivity and mildness of the reaction conditions, preserving protein integrity, and fulfilling the increasing requirements of modern applications of protein conjugates. Targeting the protein N-terminal alpha-amine group appears to be a convenient solution, as it is a useful and unique reactive site universally present in every protein molecule. Herein, we provide an updated overview of the methodologies developed to date for the selective modification of proteins through targeting of the N-terminal alpha-amine. Chemical and enzymatic strategies enabling the selective labeling of the protein N-terminal alpha-amine group are described.
Affiliation(s)
- Lucia De Rosa
- Istituto di Biostrutture e Bioimmagini, CNR, Via Mezzocannone 16, 80134 Napoli, Italy
- Rossella Di Stasi
- Istituto di Biostrutture e Bioimmagini, CNR, Via Mezzocannone 16, 80134 Napoli, Italy
- Alessandra Romanelli
- Dipartimento di Scienze Farmaceutiche, Università Degli Studi di Milano, Via Venezian 21, 20133 Milano, Italy
- Luca Domenico D’Andrea
- Istituto di Scienze e Tecnologie Chimiche “Giulio Natta”, CNR, Via M. Bianco 9, 20131 Milano, Italy
4
Influence of nascent polypeptide positive charges on translation dynamics. Biochem J 2021; 477:2921-2934. [PMID: 32797214 DOI: 10.1042/bcj20200303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2020] [Revised: 07/17/2020] [Accepted: 07/23/2020] [Indexed: 01/05/2023]
Abstract
Protein segments with a high concentration of positively charged amino acid residues are often used in reporter constructs designed to activate ribosomal mRNA/protein decay pathways, such as nonstop mRNA decay (NSD), no-go mRNA decay (NGD) and the ribosome quality control (RQC) complex. It has been proposed that the electrostatic interaction of the positively charged nascent peptide with the negatively charged ribosomal exit tunnel leads to translation arrest. When the ribosome stalls for long enough, translation is terminated, and both the transcript and the incomplete protein are degraded. Although early experiments made a strong argument for this mechanism, other features associated with positively charged reporters, such as codon bias and mRNA and protein structure, have emerged as potent inducers of ribosome stalling. We carefully reviewed the published data on the protein and mRNA expression of artificial constructs with diverse compositions, as assessed in different organisms. We concluded that, although polybasic sequences generally lead to lower translation efficiency, an aggravating factor, such as a nonoptimal codon composition, appears to be necessary to cause translation termination events.
5
Weber M, Burgos R, Yus E, Yang J, Lluch‐Senar M, Serrano L. Impact of C-terminal amino acid composition on protein expression in bacteria. Mol Syst Biol 2020; 16:e9208. [PMID: 32449593 PMCID: PMC7246954 DOI: 10.15252/msb.20199208] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/28/2019] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/30/2022]
Abstract
The C-terminal sequence of a protein is involved in processes such as the efficiency of translation termination and protein degradation. However, the general relationship between features of this C-terminal sequence and levels of protein expression remains unknown. Here, we identified C-terminal amino acid biases that are ubiquitous across the bacterial taxonomy (1,582 genomes). We showed that positively charged amino acids (lysine, arginine) occur at higher frequency, whereas hydrophobic amino acids and threonine occur at lower frequency. We then studied the impact of C-terminal composition on protein levels in a library of Mycoplasma pneumoniae mutants covering all possible combinations of the last two codons. We found that charged and polar residues, in particular lysine, led to higher expression, while hydrophobic and aromatic residues led to lower expression, with differences in protein levels of up to fourfold. We further showed that modulation of the protein degradation rate could be one of the main mechanisms driving these differences. Our results demonstrate that the identity of the last amino acids has a strong influence on protein expression levels.
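A C-terminal bias of this kind amounts to comparing how often each residue occurs at the last position(s) with how often it occurs overall. The Python sketch below computes that enrichment ratio for a set of protein sequences. It is a hypothetical illustration, not the authors' pipeline: the function name `c_terminal_enrichment` and the choice of whole-protein composition as the background are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def c_terminal_enrichment(proteins: Iterable[str], k: int = 1) -> Dict[str, float]:
    """Hypothetical helper: enrichment of each amino acid in the last k
    positions relative to its frequency over all positions.

    A ratio > 1 means the residue is more frequent at the C-terminus than
    in the proteins overall; a ratio < 1 means it is depleted there.
    """
    tail: Counter = Counter()
    overall: Counter = Counter()
    for seq in proteins:
        seq = seq.upper().rstrip("*")  # drop a trailing stop symbol, if any
        if not seq:
            continue
        overall.update(seq)
        tail.update(seq[-k:])
    n_tail = sum(tail.values())
    n_all = sum(overall.values())
    if n_tail == 0:
        return {}
    # (tail[aa] / n_tail) / (overall[aa] / n_all), rearranged to one division
    return {aa: tail[aa] * n_all / (n_tail * overall[aa]) for aa in sorted(overall)}


if __name__ == "__main__":
    toy = ["MAAK", "MSAR", "MGLK"]  # invented sequences, for illustration only
    for aa, ratio in c_terminal_enrichment(toy).items():
        print(f"{aa}: {ratio:.2f}")
```

With `k=2` the last two positions are pooled, which loosely mirrors the two-codon library described above; a real analysis would also need a background model that accounts for genome-wide composition.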
Affiliation(s)
- Marc Weber
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Raul Burgos
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Eva Yus
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jae‐Seong Yang
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Maria Lluch‐Senar
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Barcelona, Spain
6
Bao W, Yuan CA, Zhang Y, Han K, Nandi AK, Honig B, Huang DS. Mutli-Features Prediction of Protein Translational Modification Sites. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1453-1460. [PMID: 28961121 DOI: 10.1109/tcbb.2017.2752703] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 06/07/2023]
Abstract
Post-translational modification plays a significant role in biological processes. A potential post-translational modification site comprises the central site and the adjacent amino acid residues, which are fundamental protein sequence residues. Identifying such sites can help explain how proteins perform their biological functions and contribute to understanding the molecular mechanisms that underlie protein design and drug design. Existing algorithms for predicting modified sites often have shortcomings, such as low stability and accuracy. In this paper, a combination of physical, chemical, statistical, and biological properties of a protein is used as features, and a novel framework is proposed to predict a protein's post-translational modification sites. A multi-layer neural network and a support vector machine are used to predict potential modified sites from the selected features, which include the amino acid residue composition, the E-H description of protein segments, and several properties from the AAIndex database. Because the features may contain redundant information, a feature selection step is applied during preprocessing. The experimental results show that the proposed method improves accuracy on this classification task.
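Of the features listed above, amino acid composition is the simplest to compute: the fraction of each of the 20 standard residues in a segment around the candidate site. The Python sketch below assumes a symmetric window of half-width `flank` that is truncated at the sequence ends; the function name, the window size and the truncation rule are illustrative assumptions, and the E-H description and AAIndex properties used in the paper are not reproduced.

```python
from typing import List

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_composition(seq: str, site: int, flank: int = 10) -> List[float]:
    """Hypothetical feature extractor: 20-dimensional amino acid composition
    of the segment seq[site - flank : site + flank + 1] (0-based site index).

    Windows that run past either terminus are truncated rather than padded.
    """
    if not 0 <= site < len(seq):
        raise IndexError("site lies outside the sequence")
    start = max(0, site - flank)
    segment = seq[start:site + flank + 1].upper()
    n = len(segment)
    return [segment.count(aa) / n for aa in AMINO_ACIDS]


if __name__ == "__main__":
    protein = "MSTKAEELLKRAGQ"  # invented sequence, for illustration only
    # Feature vector centred on the lysine at index 3.
    features = window_composition(protein, site=3, flank=3)
    print(len(features), round(sum(features), 6))
```

Vectors like this could be concatenated with other descriptors and passed to a classifier such as an SVM; padding short windows with a placeholder symbol is a common alternative to truncation.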
7
Androsiuk P, Jastrzębski JP, Paukszto Ł, Okorski A, Pszczółkowska A, Chwedorzewska KJ, Koc J, Górecki R, Giełwanowska I. The complete chloroplast genome of Colobanthus apetalus (Labill.) Druce: genome organization and comparison with related species. PeerJ 2018; 6:e4723. [PMID: 29844954 PMCID: PMC5970550 DOI: 10.7717/peerj.4723] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/04/2018] [Accepted: 04/17/2018] [Indexed: 02/02/2023]
Abstract
Colobanthus apetalus is a member of the genus Colobanthus, one of the 86 genera of the large family Caryophyllaceae, which comprises annual and perennial herbs (rarely shrubs) that are widely distributed around the globe, mainly in the Holarctic. The genus Colobanthus consists of 25 species, including Colobanthus quitensis, an extremophile plant native to the maritime Antarctic. Complete chloroplast (cp) genomes are useful for phylogenetic studies and species identification. In this study, next-generation sequencing (NGS) was used to determine the cp genome of C. apetalus. The complete cp genome of C. apetalus is 151,228 bp long, with a GC content of 36.65% and a quadripartite structure in which a large single copy (LSC) region of 83,380 bp and a small single copy (SSC) region of 17,206 bp are separated by inverted repeats (IRs) of 25,321 bp. The cp genome contains 131 genes, including 112 unique genes and 19 genes that are duplicated in the IRs. The 112 unique genes comprise 73 protein-coding genes, 30 tRNA genes, four rRNA genes and five conserved chloroplast open reading frames (ORFs). A total of 12 forward repeats, 10 palindromic repeats, five reverse repeats and three complementary repeats were detected. In addition, a simple sequence repeat (SSR) analysis revealed 41 SSRs (mono-, di-, tri-, tetra-, penta- and hexanucleotide), most of which were AT-rich. A detailed comparison of the C. apetalus and C. quitensis cp genomes revealed identical gene content and order. A phylogenetic tree built from the sequences of 76 protein-coding genes shared by the eleven sequenced representatives of Caryophyllaceae and C. apetalus revealed that C. apetalus and C. quitensis form a clade closely related to Silene species and Agrostemma githago. Moreover, the genus Silene appeared as a polymorphic taxon. The results of this study expand our knowledge of the evolution and molecular biology of Caryophyllaceae.
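The region lengths and gene counts reported for C. apetalus are internally consistent. Assuming, as is usual for quadripartite cp genomes, that the inverted repeat is present as two copies of equal length, a quick arithmetic check using only the figures from the abstract gives:

```latex
\begin{aligned}
L_{\text{total}} &= L_{\text{LSC}} + L_{\text{SSC}} + 2\,L_{\text{IR}}
  = 83{,}380 + 17{,}206 + 2 \times 25{,}321 = 151{,}228\ \text{bp},\\
N_{\text{unique}} &= 73 + 30 + 4 + 5 = 112
  \quad (\text{protein-coding} + \text{tRNA} + \text{rRNA} + \text{ORFs}),\\
N_{\text{total}} &= N_{\text{unique}} + N_{\text{IR-duplicated}} = 112 + 19 = 131.
\end{aligned}
```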
Affiliation(s)
- Piotr Androsiuk
- Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Jan Paweł Jastrzębski
- Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Łukasz Paukszto
- Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Adam Okorski
- Department of Entomology, Phytopathology and Molecular Diagnostics, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Agnieszka Pszczółkowska
- Department of Entomology, Phytopathology and Molecular Diagnostics, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Justyna Koc
- Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Ryszard Górecki
- Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Irena Giełwanowska
- Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
8
Bao W, You ZH, Huang DS. CIPPN: computational identification of protein pupylation sites by using neural network. Oncotarget 2017; 8:108867-108879. [PMID: 29312575 PMCID: PMC5752488 DOI: 10.18632/oncotarget.22335] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/14/2017] [Accepted: 09/03/2017] [Indexed: 11/25/2022]
Abstract
Recent experiments have revealed pupylation to be a signal for the selective regulation of proteins in several serious human diseases. As one of the most significant post-translational modifications in biology and disease, pupylation can play a key role in regulating the biological processes of various diseases. Effective identification of this type of modification will therefore help to explain how proteins perform their biological functions and contribute to understanding the underlying molecular mechanisms, which are the foundation of drug design. Existing algorithms for identifying such modified sites often have defects, such as low accuracy and long running times. In this research, the pupylation site identification model CIPPN demonstrates better performance than other existing approaches in this field. The proposed predictor achieves an accuracy (Acc) of 89.12 and a Matthews correlation coefficient (MCC) of 0.7949 in 10-fold cross-validation tests on the PupDB database (http://cwtung.kmu.edu.tw/pupdb). Notably, the algorithm not only investigates the sequential, structural and evolutionary hallmarks around pupylation sites but also compares pupylated substrates in terms of their environmental, conservation and functional characteristics. The proposed feature description approach and algorithm therefore prove useful for further experimental investigation of this modification.
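The Acc and MCC values quoted for CIPPN are standard binary-classification metrics computed from a confusion matrix of true/false positives and negatives. The Python sketch below uses the textbook definitions; it is not CIPPN's evaluation code, and the convention of returning 0 when a row or column total is zero is an assumption. In 10-fold cross-validation the counts are usually either pooled over folds or the metric is averaged across folds; the abstract does not say which was done.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any row/column total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Invented confusion-matrix counts, for illustration only.
    tp, tn, fp, fn = 45, 45, 5, 5
    print(f"Acc = {100 * accuracy(tp, tn, fp, fn):.2f}%  MCC = {mcc(tp, tn, fp, fn):.4f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when positive and negative sites are imbalanced.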
Affiliation(s)
- Wenzheng Bao
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Zhu-Hong You
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
9
Santiago-Frangos A, Jeliazkov JR, Gray JJ, Woodson SA. Acidic C-terminal domains autoregulate the RNA chaperone Hfq. eLife 2017; 6:27049. [PMID: 28826489 PMCID: PMC5606850 DOI: 10.7554/elife.27049] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 03/22/2017] [Accepted: 08/03/2017] [Indexed: 11/15/2022]
Abstract
The RNA chaperone Hfq is an Sm protein that facilitates base pairing between bacterial small RNAs (sRNAs) and mRNAs involved in stress response and pathogenesis. Hfq possesses an intrinsically disordered C-terminal domain (CTD) that may tune the function of the Sm domain in different organisms. In Escherichia coli, the Hfq CTD increases kinetic competition between sRNAs and recycles Hfq from the sRNA-mRNA duplex. Here, de novo Rosetta modeling and competitive binding experiments show that the acidic tip of the E. coli Hfq CTD transiently binds the basic Sm core residues necessary for RNA annealing. The CTD tip competes against non-specific RNA binding, facilitates dsRNA release, and prevents indiscriminate DNA aggregation, suggesting that this acidic peptide mimics nucleic acid to auto-regulate RNA binding to the Sm ring. The mechanism of CTD auto-inhibition predicts the chaperone function of Hfq in bacterial genera and illuminates how Sm proteins may evolve new functions.
Affiliation(s)
- Andrew Santiago-Frangos
- Cell, Molecular and Developmental Biology and Biophysics Program, Johns Hopkins University, Baltimore, United States
- Jeliazko R Jeliazkov
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, United States
- Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, United States
- Sarah A Woodson
- T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, United States
10
Requião RD, Fernandes L, de Souza HJA, Rossetto S, Domitrovic T, Palhano FL. Protein charge distribution in proteomes and its impact on translation. PLoS Comput Biol 2017; 13:e1005549. [PMID: 28531225 PMCID: PMC5460897 DOI: 10.1371/journal.pcbi.1005549] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 12/13/2016] [Revised: 06/06/2017] [Accepted: 05/02/2017] [Indexed: 11/25/2022]
Abstract
As proteins are synthesized, the nascent polypeptide must pass through a negatively charged exit tunnel. During this stage, positively charged stretches can interact with the ribosome walls and slow translation. Charged polypeptides may therefore be important factors affecting protein expression. To determine the frequency and distribution of positively and negatively charged stretches in different proteomes, the net charge was calculated for every 30 consecutive amino acid residues, which corresponds to the length of the ribosome exit tunnel. The following annotated and reviewed proteins in the UniProt database (Swiss-Prot) were analyzed: 551,705 proteins from different organisms, comprising a total of 180 million protein segments. We observed that there were more negative than positive stretches and that super-charged positive sequences (i.e., net charges ≥ +14) were underrepresented in the proteomes. Overall, the proteins were more positively charged at their N- and C-termini, and this feature was present in most organisms and subcellular localizations. To investigate whether N-terminal charges affect elongation rates, previously published ribosome profiling data from S. cerevisiae, obtained without translation-interfering drugs, were analyzed. We observed a nonlinear effect of charge on ribosome occupancy, in which values ≥ +5 and ≤ -6 showed increased and reduced ribosome densities, respectively. These groups also showed different distributions across 80S monosomes and polysomes. Basic polypeptides are more common within short proteins that are translated by monosomes, whereas negative stretches are more abundant in polysome-translated proteins. These findings suggest that the nascent peptide charge impacts translation and can be one of the factors that regulate translation efficiency and protein expression.
Which factors shape the sequence of amino acids that will form a protein? The biochemical features of amino acids, such as their charge and hydrophobicity, are important drivers of three-dimensional protein folding, which creates interaction sites for binding other molecules and directs proteins to specific cellular compartments. These features all affect the activity of proteins after they are produced. A less obvious factor that may influence a protein's primary structure is how efficiently a given amino acid sequence is produced by the ribosome. It is known that a repetitive stretch of positively charged amino acids may interact with the negative charges in the ribosome exit tunnel, slowing or even halting translation. By analyzing the charge of protein stretches in different organisms, we observed that proteins tend to present positively charged stretches at their extremities, and that high charge values can slow (for positive charges) or speed (for negative charges) translation. An interesting consequence of this trend is that proteins translated in high quantities by several ribosomes on the same RNA (polysomes) tend to have more negatively charged stretches than proteins translated by a single ribosome per RNA (monosomes).
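The sliding-window net-charge calculation described for the Swiss-Prot analysis can be sketched in Python as follows. The charge model (Lys and Arg counted as +1, Asp and Glu as -1, His and all other residues as 0) and the function names are assumptions; the 30-residue window and the +5 threshold come from the abstract. A running sum updates each window in constant time instead of re-summing 30 residues.

```python
from typing import List

# Assumed charge model at neutral pH: basic K/R = +1, acidic D/E = -1,
# all other residues (including H) = 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Window length taken from the abstract (~length of the ribosome exit tunnel).
WINDOW = 30


def window_net_charges(seq: str, window: int = WINDOW) -> List[int]:
    """Net charge of every run of `window` consecutive residues in `seq`.

    Returns len(seq) - window + 1 values, or [] if the protein is shorter
    than one window.
    """
    charges = [CHARGE.get(aa, 0) for aa in seq.upper()]
    if len(charges) < window:
        return []
    current = sum(charges[:window])
    result = [current]
    for i in range(window, len(charges)):
        # Slide by one: add the residue entering, drop the one leaving.
        current += charges[i] - charges[i - window]
        result.append(current)
    return result


def count_windows_at_least(charges: List[int], threshold: int = 5) -> int:
    """Number of windows whose net charge is >= threshold (e.g. +5)."""
    return sum(1 for c in charges if c >= threshold)


if __name__ == "__main__":
    # Invented sequence: a polybasic stretch followed by an acidic one.
    seq = "M" + "K" * 8 + "A" * 30 + "E" * 8
    charges = window_net_charges(seq)
    print(len(charges), max(charges), min(charges), count_windows_at_least(charges))
```

Counting Lys and Arg as fully protonated is a simplification; a pH-dependent model would assign His a fractional charge.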
Affiliation(s)
- Rodrigo D. Requião
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Luiza Fernandes
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Henrique José Araujo de Souza
- Programa de Pós-Graduação em Informática, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Silvana Rossetto
- Programa de Pós-Graduação em Informática, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Tatiana Domitrovic
- Departamento de Virologia, Instituto de Microbiologia Paulo de Góes, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Fernando L. Palhano
- Programa de Biologia Estrutural, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
11
Self-Referential Encoding on Modules of Anticodon Pairs-Roots of the Biological Flow System. Life (Basel) 2017; 7:life7020016. [PMID: 28383509 PMCID: PMC5492138 DOI: 10.3390/life7020016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/13/2017] [Revised: 03/24/2017] [Accepted: 03/26/2017] [Indexed: 12/22/2022]
Abstract
The proposal that the genetic code was formed on the basis of (proto)tRNA Dimer-Directed Protein Synthesis is reviewed and updated. The pairing of tRNAs through their anticodon loops is an indication of this process. Dimers are considered mimics both of the ribosomes (structures that hold tRNAs together and facilitate the transferase reaction) and of the translation process (the anticodons are at the same time codons for each other). The primitive protein synthesis system becomes stabilized when the product peptides are stable and able to bind their producers, thereby establishing a self-stimulating production cycle. The chronology of amino acid encoding starts with glycine and serine, indicating metabolic support from the glycine-serine C1-assimilation pathway, which is also consistent with evidence on the origins of bioenergetic mechanisms. Since it is not possible to reach substrates simpler than C1, and compounds in the identified pathway are apt for generating the other central metabolic routes, protein synthesis is considered the beginning and center of a succession of sink-effective mechanisms that drive the formation and evolution of the metabolic flow system. Plasticity and diversification of proteins construct the cellular system, following and implementing the orientation given by the flow. Nucleic acid monomers participate in bioenergetics, and the polymers are conservative memory systems for the synthesis of proteins. Protoplasmic fission is the final sink-effective mechanism, part of cell reproduction, guaranteeing that proteins do not accumulate to saturation, which would trigger inhibition.
12
Chen D, Disotuar MM, Xiong X, Wang Y, Chou DHC. Selective N-terminal functionalization of native peptides and proteins. Chem Sci 2017; 8:2717-2722. [PMID: 28553506 PMCID: PMC5426342 DOI: 10.1039/c6sc04744k] [Citation(s) in RCA: 113] [Impact Index Per Article: 16.1] [Received: 10/25/2016] [Accepted: 01/06/2017] [Indexed: 12/12/2022]
Abstract
We report an efficient, highly selective modification of the N-terminal amines of peptides and proteins with aldehyde derivatives via reductive alkylation. Modification of a library of unprotected peptides XYSKEASAL (where X varies over the 20 natural amino acids) with benzaldehyde at room temperature and pH 6.1 resulted in excellent N-terminal selectivity (α-amino/ε-amino: >99 : 1) and high reaction conversion for 19 of the 20 peptides. Under similar conditions, highly selective N-terminal modifications were achieved with a variety of aldehydes. Furthermore, the N-termini of native peptides and proteins could be selectively modified under the same conditions to introduce bioorthogonal functional groups. Using human insulin as an example, we further demonstrated that preserving the positive charge at the N-terminus by using reductive alkylation instead of acylation leads to a 5-fold increase in bioactivity. In summary, our method provides a universal strategy for site-selective N-terminal functionalization of native peptides and proteins.
Affiliation(s)
- Diao Chen
- Department of Biochemistry, University of Utah, 15 N. Medical Drive East 4100, Salt Lake City, UT 84112, USA
- Maria M Disotuar
- Department of Biochemistry, University of Utah, 15 N. Medical Drive East 4100, Salt Lake City, UT 84112, USA
- Xiaochun Xiong
- Department of Biochemistry, University of Utah, 15 N. Medical Drive East 4100, Salt Lake City, UT 84112, USA
- Yuanxiang Wang
- Department of Biochemistry, University of Utah, 15 N. Medical Drive East 4100, Salt Lake City, UT 84112, USA
- Danny Hung-Chieh Chou
- Department of Biochemistry, University of Utah, 15 N. Medical Drive East 4100, Salt Lake City, UT 84112, USA
13
Bao W, Jiang Z. Prediction of Lysine Pupylation Sites with Machine Learning Methods. Intelligent Computing Theories and Application 2017. [DOI: 10.1007/978-3-319-63312-1_36] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/10/2023]
14
Charneski CA, Hurst LD. Positive Charge Loading at Protein Termini Is Due to Membrane Protein Topology, Not a Translational Ramp. Mol Biol Evol 2013; 31:70-84. [DOI: 10.1093/molbev/mst169] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
15
Hansted JG, Pietikäinen L, Hög F, Sperling-Petersen HU, Mortensen KK. Expressivity tag: a novel tool for increased expression in Escherichia coli. J Biotechnol 2011; 155:275-83. [PMID: 21801766 DOI: 10.1016/j.jbiotec.2011.07.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/22/2011] [Revised: 07/07/2011] [Accepted: 07/11/2011] [Indexed: 11/18/2022]
Abstract
Protein expression in Escherichia coli is rarely trivial, as low expression and insolubility are common problems. In this work we define a fusion partner that increases expression levels, analogous to the distinct functions of solubility and affinity tags. We term this type of fusion tag an expressivity tag. Our work builds on earlier observations that 3' deletions of the InfB gene display strongly increased expression levels. We constructed progressively shortened fragments of the InfB(1-471) gene and fused the gene fragments to a gfp reporter gene. A 5-fold increase in GFP expression was seen for an optimal 21-nucleotide InfB(1-21) sequence compared with gfp alone. We defined the InfB(1-21) sequence as an expressivity tag. The tag was tested for improved expression of two biotechnologically important proteins, streptavidin and a single-chain antibody (scFv). Expression of both streptavidin and scFv(L32) was improved, as evaluated by SDS-PAGE. Calculation of folding energies in the translation initiation region gave higher free energies for gfp, L32 and streptavidin when linked to InfB(1-21) than alone. InfB(1-21) did not, however, improve the codon usage or codon adaptation index. The expressivity tag is an important addition to the toolbox available for optimizing heterologous protein expression.
Affiliation(s)
- Jon Gade Hansted
- Department of Molecular Biology, Aarhus University, Gustav Wieds Vej 10C, DK-8000 Aarhus C, Denmark
16
Asada M, Hirakawa H, Kuhara S. Classification of Bacteria Based on the Biases of Terminal Amino Acid Residues. Protein J 2011; 30:290-7. [DOI: 10.1007/s10930-011-9332-2]
17
Takahashi H, Yokota A, Takenawa T, Iwakura M. Sequence Perturbation Analysis: Addressing Amino Acid Indices to Elucidate the C-Terminal Role of Escherichia Coli Dihydrofolate Reductase. J Biochem 2009; 145:751-62. [DOI: 10.1093/jb/mvp034]
18
Guimarães RC, Moreira CHC, de Farias ST. A self-referential model for the formation of the genetic code. Theory Biosci 2008; 127:249-70. [PMID: 18493811 DOI: 10.1007/s12064-008-0043-y]
Abstract
A model for the formation of the genetic code is presented in which protein synthesis is directed initially by tRNA dimers. Proteins that are resistant to degradation and efficient RNA-binders protect the RNAs. Replication becomes elongational, producing poly-tRNAs from which the mRNAs and ribosomes are derived. Attributions are successively fixed to tRNAs paired through perfectly palindromic anticodons, with the same bases at the extremities (5'ANA: UNU 3'; GNG: CNC; principal dinucleotides, pDiN). The 5' degeneracy is then developed. The first pairs to be encoded correspond to the hydropathy correlation outliers (Gly-CC: Pro-GG and Ser-GA: Ser-CU) and to the sector of homogeneous pDiN, composed of two pyrimidines or two purines. These amino acids are preferred at protein N-ends, stabilize proteins against catabolism and are strong RNA-binders. The next pairs complete the sector of homogeneous pDiN (Asp, Glu-UC: Leu-AG and Asn, Lys-UU: Phe-AA). This set of nine amino acids forms the protein cores with the predominant aperiodic conformation. Next come the pairs with mixed pDiN (one purine and one pyrimidine), with the RY attributions composing the protein N-ends and the YR attributions the C-ends. The last pair contains the main punctuation signs (Ile, Met, iMet-AU: Tyr, Stop-UA). The model indicates that genetic information emerged during the formation of the coding/decoding system and that genes were defined by the proteins. Stable proteins constructed the nucleoprotein system by binding to the RNAs that produced them. In this circular rationale, genes are memories in a metabolic system for the production of proteins that stabilize it. The simplicity and highly deterministic character of the process suggest that Last Universal Common Ancestor populations could have been composed, in early stages, of lineages bearing similar genetic codes.
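The palindromic pairing rule described above can be made concrete with a short Python sketch: a palindromic anticodon (same base at both ends, read 5'→3') is paired with its Watson-Crick reverse complement, and the principal dinucleotide (pDiN) is what remains after dropping the 5' wobble base. The function names are illustrative, and only standard Watson-Crick complements are used.

```python
"""Palindromic anticodon pairs and their principal dinucleotides (pDiN).

Anticodons are written 5'->3'; the 5'-most base is the wobble position.
"""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA string (5'->3' in and out)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def is_palindromic(anticodon: str) -> bool:
    """True for triplets with the same base at both extremities, e.g. ANA or GNG."""
    return len(anticodon) == 3 and anticodon[0] == anticodon[2]


def principal_dinucleotide(anticodon: str) -> str:
    """pDiN: the anticodon without its 5' wobble base."""
    return anticodon[1:]


def palindromic_partner(anticodon: str) -> str:
    """Return the anticodon that pairs with a palindromic anticodon."""
    if not is_palindromic(anticodon):
        raise ValueError(f"{anticodon!r} is not a palindromic triplet")
    return reverse_complement(anticodon)


if __name__ == "__main__":
    # Pairs mentioned in the text: AGA : UCU (pDiN GA : CU) and CCC : GGG (pDiN CC : GG).
    for anticodon in ("AGA", "CCC"):
        partner = palindromic_partner(anticodon)
        print(anticodon, partner,
              principal_dinucleotide(anticodon), principal_dinucleotide(partner))
```

Because the partner of a palindromic triplet is itself palindromic, the two pDiNs end up as position-by-position complements (GA with CU, CC with GG), which matches the pairs listed in the abstract.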
Affiliation(s)
- Romeu Cardoso Guimarães
- Dept. Biologia Geral, Inst. Ciências Biológicas, Univ. Federal de Minas Gerais, Belo Horizonte, MG , 31270.901, Brazil.
19
C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families. BMC Genomics 2007; 8:191. [PMID: 17594486 PMCID: PMC1929074 DOI: 10.1186/1471-2164-8-191]
Abstract
Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs that act in protein sorting and depend on their proximity to the C-terminus for proper function have already been characterized. Because only a limited number of such motifs have been identified, genome-wide statistical analysis and comparative genomics have the potential to reveal novel peptide signatures that function in a C-terminal-dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs, involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking successfully reduced background noise. Similarly, because C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed before the analysis in order to assess tripeptide over-representation across protein families rather than across all proteins. Finally, comparative genomics was used to identify tripeptides occurring significantly in multiple species. To our knowledge, this approach predicted all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane-bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif.
In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists between species, among kingdoms and across eukaryotes. Motifs of note include a serine-acidic peptide (DSD*) as well as several lysine enriched motifs found in nearly all eukaryotic genomes examined. Conclusion We have successfully generated a high confidence representation of eukaryotic motifs anchored at the C-terminus. A high incidence of true-positives in our results suggests that several previously unidentified tripeptide patterns are strong candidates for representing novel peptide motifs of a widely employed nature in the C-terminal biology of eukaryotes. Our application of comparative genomics, statistical over-representation and the adjustment for protein family homology has generated several hypotheses concerning the C-terminal topology as it pertains to sorting and potential protein interaction signals. This approach to background reduction could be expanded for application to protein motif prediction in the protein interior. A parallel N-terminal analysis is presented as supplementary data.
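The core over-representation test can be sketched in Python as follows. This is a simplified stand-in, not the published pipeline: the expected count of each C-terminal tripeptide assumes independent, position-specific residue frequencies at positions -3, -2 and -1, the score is the binomial z = (O - E) / sqrt(E(1 - p)), and tripeptides seen fewer than `min_count` times (a hypothetical parameter) are skipped because z-scores of very rare events are unstable. Sequence randomization, simple-sequence masking and gene-family clustering are not reproduced.

```python
"""Binomial z-scores for C-terminal tripeptide over-representation (simplified)."""

from collections import Counter
from math import sqrt


def c_terminal_tripeptide_z(proteins: list[str], min_count: int = 3) -> dict[str, float]:
    """Return {tripeptide: z} for C-terminal tripeptides seen >= min_count times.

    Expected probability of tripeptide abc = f_-3(a) * f_-2(b) * f_-1(c), where
    f_k are residue frequencies at that C-terminal position across all proteins.
    """
    tails = [seq[-3:] for seq in proteins if len(seq) >= 3]
    n = len(tails)
    if n == 0:
        return {}
    observed = Counter(tails)
    # Residue counts at positions -3, -2 and -1 (index 0, 1, 2 of each tail).
    position_counts = [Counter(tail[i] for tail in tails) for i in range(3)]

    scores = {}
    for tripeptide, count in observed.items():
        if count < min_count:
            continue
        p = 1.0
        for i, residue in enumerate(tripeptide):
            p *= position_counts[i][residue] / n
        expected = n * p
        variance = n * p * (1.0 - p)
        scores[tripeptide] = (count - expected) / sqrt(variance) if variance > 0 else 0.0
    return scores


if __name__ == "__main__":
    toy = ["MAAASKL"] * 5 + ["MGHTQ", "MPLRW", "MKVEF", "MNDAY", "MSTIC"]
    print(c_terminal_tripeptide_z(toy))
```

In this toy input, SKL (the PTS1 signal named above) is the only tripeptide that passes the count filter, and it scores well above zero.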
20
Li W, Zou H, Tao M. Sequences downstream of the start codon and their relations to G + C content and optimal growth temperature in prokaryotic genomes. Antonie van Leeuwenhoek 2007; 92:417-27. [PMID: 17562217 DOI: 10.1007/s10482-007-9170-6]
Abstract
The mechanism of translation initiation shapes the mRNA sequence downstream of the start codon. However, this region has not been systematically analyzed in prokaryotes. We used sequence logos and statistical methods to analyze the patterns of overrepresented sequences in this region for 125 species of bacteria and 23 species of archaea. The specific positions were compared with the first 33 amino acids of the proteins. At the 2nd amino acid position, Lys, Ser or Thr is highly overrepresented in 68% to 84% of the genomes examined, and Ala is highly overrepresented in 57% of the genomes. Overrepresentation of Lys2 is negatively correlated with the G + C content of genomes, whereas overrepresentation of Ser2 or Thr2 is positively correlated with it. Ile at the 4th to 8th positions was found to be overrepresented in 91% of the genomes analyzed, and this appears to be conserved in both bacteria and archaea. Organisms growing at high temperatures have a relatively low extent of nucleotide bias at the 5' termini of open reading frames (ORFs). The overrepresentation of A and underrepresentation of G at ORF 5' termini are reduced in thermophiles and hyperthermophiles among both archaea and bacteria.
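Sequence logos such as those used in this study plot, at each position, the information content log2(K) - H, where H is the Shannon entropy of the residues observed there. A minimal Python sketch for a single position (for example, the 2nd amino acid) might look like this; it assumes K = 20 amino acids and, as a simplification, omits any small-sample correction.

```python
"""Per-position information content, the stack height in a sequence logo."""

from collections import Counter
from math import log2


def position_information(sequences: list[str], position: int, alphabet_size: int = 20) -> float:
    """Information content in bits at a 1-based position: log2(K) - H.

    H is the Shannon entropy of residues observed at that position. No
    small-sample correction is applied (a simplifying assumption).
    """
    residues = [seq[position - 1] for seq in sequences if len(seq) >= position]
    if not residues:
        raise ValueError("no sequence is long enough for this position")
    n = len(residues)
    entropy = -sum((c / n) * log2(c / n) for c in Counter(residues).values())
    return log2(alphabet_size) - entropy


if __name__ == "__main__":
    seqs = ["MKAI", "MSAI", "MTAI", "MAAI"]
    for pos in (1, 2):
        print(pos, round(position_information(seqs, pos), 3))
```

A fully conserved position (the initiator Met at position 1 here) reaches the maximum log2(20) bits, while four equally frequent residues at position 2 lower it by exactly 2 bits.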
Affiliation(s)
- Wencheng Li
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
21
Farias STD, Moreira CHC, Guimarães RC. Structure of the genetic code suggested by the hydropathy correlation between anticodons and amino acid residues. Orig Life Evol Biosph 2007; 37:83-103. [PMID: 16955335 DOI: 10.1007/s11084-006-9008-7]
Abstract
The correlation between hydropathies of anticodons and amino acids, detected by other authors utilizing scales of amino acid molecules in solution, was improved with the utilization of scales of amino acid residues in proteins. Three partitions were discerned in the correlation plot with the principal dinucleotides of anticodons (pDiN, excluding the wobble position). (a) The set of outliers of the correlation: Gly-CC, Pro-GG, Ser-GA and Ser-CU. The amino acids are consistently small, hydro-apathetic, stabilizers of protein N-ends, preferred in aperiodic protein conformations and belong to synthetases class II. The pDiN sequences are representative of the homogeneous sector (triplets NRR and NYY), distinguished from the mixed sector (triplets NRY and NYR), that depict a 70% correspondence to the synthetases class II and I, respectively. The triplet pairs proposed to be responsible for the coherence in the set of outliers are of the palindromic kind, where the lateral bases are the same, CCC: GGG and AGA: UCU. This suggests that UCU previously belonged to Ser, adding to other indications that the attribution of Arg to YCU was due to an expansion of the Arg-tRNA synthetase specificity. The other attributions produced two correlation sets. (b) One corresponds to the remaining pDiN of the homogeneous sector, containing both synthetase classes; its regression line overlapped the one formed by the remaining attributions to class II. (c) The other contains the pDiN of the mixed sector and produced steeper slopes, especially with the class I attributions. It is suggested that the correlation was established when the amino acid composition of the protein synthetases became progressively enriched and that the set of outliers were the earliest to have been fixed.
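The partition into homogeneous and mixed sectors depends only on the principal dinucleotide. A Python sketch with illustrative names that drops the 5' wobble base of an anticodon (written 5'→3') and classifies the two remaining bases as purine (R) or pyrimidine (Y):

```python
"""Classify anticodons by principal dinucleotide (pDiN) into homogeneous/mixed sectors."""

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def pdin_sector(anticodon: str) -> tuple[str, str, str]:
    """Return (pDiN, R/Y pattern, sector) for an anticodon written 5'->3'."""
    if len(anticodon) != 3 or not set(anticodon) <= PURINES | PYRIMIDINES:
        raise ValueError(f"not an RNA triplet: {anticodon!r}")
    pdin = anticodon[1:]  # drop the 5' wobble base
    pattern = "".join("R" if base in PURINES else "Y" for base in pdin)
    sector = "homogeneous" if pattern in ("RR", "YY") else "mixed"
    return pdin, pattern, sector


if __name__ == "__main__":
    # GCC (Gly), UGA (Ser), CAU (Met), GUA (Tyr)
    for anticodon in ("GCC", "UGA", "CAU", "GUA"):
        print(anticodon, pdin_sector(anticodon))
```

The homogeneous patterns RR and YY correspond to the NRR and NYY triplets above, and the mixed patterns RY and YR to NRY and NYR.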
Affiliation(s)
- Sávio Torres de Farias
- Department Biologia Geral, Institute Ciências Biológicas, University Federal de Minas Gerais, 31270.901 Belo Horizonte, MG, Brazil
22
Kochetov AV. Alternative translation start sites and their significance for eukaryotic proteomes. Mol Biol 2006. [DOI: 10.1134/s0026893306050049]
23
Bogdanov AA, Karpov VL. RNA-protein interactions at the initial and terminal stages of protein biosynthesis as investigated by Lev Kisselev (on the occasion of his 70th anniversary). Biochemistry (Moscow) 2006; 71:915-24. [PMID: 16978156 DOI: 10.1134/s0006297906080141]
Abstract
This review highlights studies by Lev L. Kisselev and his colleagues on the initial and terminal stages of protein biosynthesis over the last 45 years (1961-2006). They investigated the spatial structure of tRNAs, the structure and functions of aminoacyl-tRNA synthetases of higher organisms, and the final step of protein synthesis, translation termination. L. Kisselev and his team made three major contributions to these fields of molecular biology: (i) they proposed the hypothesis that the tRNA anticodon triplet plays a role in recognition by the cognate aminoacyl-tRNA synthetase, which has been experimentally confirmed and is now included in textbooks; (ii) they identified the primary structures and functions of two eukaryotic protein factors (eRF1 and eRF3) that play a pivotal role in translation termination; (iii) they characterized a structural basis for stop codon recognition by eRF1 within the ribosome and discovered the negative structural elements of eRF1 that restrict its recognition of one or two stop codons.
Affiliation(s)
- A A Bogdanov
- Lomonosov Moscow State University, Moscow, 119992, Russia.
24
Abstract
The two ends of each protein are known as the amino (N-) and carboxyl (C-) termini. Short signatures in a protein's termini often carry vital cellular functions. No systematic research has addressed the importance of short signatures (3 to 10 amino acids) in protein termini at the proteomic level. Specifically, it is unknown whether such signatures are evolutionarily conserved and, if so, whether this conservation confers shared biological functions. Current signature detection methods fail to detect such short signatures because of inadequate statistical scores. The findings presented in this study strongly support the notion that the functional significance of protein sets may be captured by short signatures at their termini. A positional search method was applied to over one million proteins from the UniProt database. The result is a collection of about a thousand significant signature groups (SIGs) that include previously identified signatures as well as many novel ones in protein termini. These SIGs represent protein sets with minimal or no overall sequence similarity except at their termini. The most significant SIGs are assigned by their strong correspondence to functional annotations derived from external databases such as Gene Ontology. Each SIG is associated with the statistical significance of its functional association. These SIGs provide a valuable source for testing previously overlooked signatures in protein termini and allow investigation of the role played by such signatures throughout evolution. The SIGs archive and advanced search options are available at http://www.proteus.cs.huji.ac.il.
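The first step of a positional search, collecting proteins that share an identical short terminal string regardless of their overall similarity, can be sketched in Python as follows. The function and its `k`, `end` and `min_size` parameters are hypothetical; the published method additionally scores each group statistically and against functional annotations, which is not shown here.

```python
"""Group proteins by an identical k-residue signature at one terminus (illustrative)."""

from collections import defaultdict


def terminal_signature_groups(
    proteins: dict[str, str], k: int = 3, end: str = "C", min_size: int = 2
) -> dict[str, list[str]]:
    """Map each terminal k-mer shared by >= min_size proteins to their sorted names."""
    if end not in ("N", "C"):
        raise ValueError("end must be 'N' or 'C'")
    groups: dict[str, list[str]] = defaultdict(list)
    for name, seq in proteins.items():
        if len(seq) < k:
            continue  # too short to carry a k-residue signature
        signature = seq[:k] if end == "N" else seq[-k:]
        groups[signature].append(name)
    return {sig: sorted(names) for sig, names in groups.items() if len(names) >= min_size}


if __name__ == "__main__":
    demo = {"p1": "MAAAHDEL", "p2": "MGGKHDEL", "p3": "MSSSKL", "p4": "MTTAKL", "p5": "MQQW"}
    print(terminal_signature_groups(demo, k=4))
```

Grouping by exact strings is the simplest possible choice; a real search would also vary `k` and test each group against a background model before treating it as a signature.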
Affiliation(s)
- Iris Bahir
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
25
Cridge AG, Major LL, Mahagaonkar AA, Poole ES, Isaksson LA, Tate WP. Comparison of characteristics and function of translation termination signals between and within prokaryotic and eukaryotic organisms. Nucleic Acids Res 2006; 34:1959-73. [PMID: 16614446 PMCID: PMC1435984 DOI: 10.1093/nar/gkl074]
Abstract
Six diverse prokaryotic and five eukaryotic genomes were compared to deduce whether the protein synthesis termination signal has common determinants within and across both kingdoms. Four of the six prokaryotic and all of the eukaryotic genomes investigated demonstrated a similar pattern of nucleotide bias both 5′ and 3′ of the stop codon. A preferred core signal of 4 nt was evident, encompassing the stop codon and the following nucleotide. Codons decoded by hyper-modified tRNAs were over-represented in the region 5′ to the stop codon in genes from both kingdoms. The origin of the 3′ bias was more variable particularly among the prokaryotic organisms. In both kingdoms, genes with the highest expression index exhibited a strong bias but genes with the lowest expression showed none. Absence of bias in parasitic prokaryotes may reflect an absence of pressure to evolve more efficient translation. Experiments were undertaken to determine if a correlation existed between bias in signal abundance and termination efficiency. In Escherichia coli signal abundance correlated with termination efficiency for UAA and UGA stop codons, but not in mammalian cells. Termination signals that were highly inefficient could be made more efficient by increasing the concentration of the cognate decoding release factor.
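Extracting the 4-nt core signal (the stop codon plus the following nucleotide) from each gene is the starting point for this kind of comparison. A Python sketch, assuming DNA input that starts at the start codon and the standard stop codons TAA, TAG and TGA, is shown here; the function names are illustrative.

```python
"""Extract and tally 4-nt termination signals (stop codon + next nucleotide)."""

from collections import Counter
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def termination_signal(gene: str) -> Optional[str]:
    """Return the first in-frame stop codon plus one downstream nucleotide.

    `gene` must start at the start codon. Returns None when no in-frame stop
    is found or when no nucleotide follows the stop codon.
    """
    gene = gene.upper()
    for i in range(0, len(gene) - 2, 3):
        if gene[i:i + 3] in STOP_CODONS:
            return gene[i:i + 4] if i + 3 < len(gene) else None
    return None


def tally_signals(genes: list[str]) -> Counter:
    """Count 4-nt signals across genes, skipping genes without one."""
    return Counter(s for s in map(termination_signal, genes) if s is not None)


if __name__ == "__main__":
    print(tally_signals(["ATGAAATAAG", "ATGCCCTAAG", "ATGTGAC", "ATGAAA"]))
```

Comparing such tallies between highly and lowly expressed genes, as the abstract describes, would only require splitting the input list before counting.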
Affiliation(s)
- Leif A. Isaksson
- Department of Genetics, Microbiology and Toxicology, Stockholm University, S-10691 Stockholm, Sweden
- Warren P. Tate
- Corresponding author. Tel: +64 3 479 7864; Fax: +64 3 479 7866
26
Kuznetsov IB, Hwang S. A novel sensitive method for the detection of user-defined compositional bias in biological sequences. Bioinformatics 2006; 22:1055-63. [PMID: 16500936 DOI: 10.1093/bioinformatics/btl049]
Abstract
MOTIVATION Most biological sequences contain compositionally biased segments in which one or more residue types are significantly overrepresented. The function and evolution of these segments are poorly understood. Usually, all types of compositionally biased segments are masked and ignored during sequence analysis. However, it has been shown for a number of proteins that biased segments containing amino acids with similar chemical properties are involved in a variety of molecular functions and human diseases. A detailed large-scale analysis of the functional implications and evolutionary conservation of different compositionally biased segments requires a sensitive method capable of detecting user-specified types of compositional bias. RESULTS We present BIAS, a novel sensitive method for the detection of compositionally biased segments composed of a user-specified set of residue types. BIAS uses discrete scan statistics, which provide a highly accurate correction for multiple tests, to compute analytical estimates of the significance of each compositionally biased segment. The method can take global compositional bias into account when computing analytical estimates of the significance of local clusters. BIAS is benchmarked against the SEG, SAPS and CAST programs. We also use BIAS to show that groups of proteins with the same biological function are significantly associated with particular types of compositionally biased segments.
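For intuition, a naive Python version of segment scanning is sketched below: it slides a fixed window along a sequence, counts residues from a user-specified set, and assigns each window a binomial tail probability, using the set's frequency in the whole sequence as background. This is a deliberately simplified stand-in, not BIAS: the window length and `alpha` threshold are arbitrary assumptions and, unlike the discrete scan statistics described above, no correction is made for the many overlapping windows tested.

```python
"""Naive sliding-window scan for compositionally biased segments (illustrative)."""

from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def biased_windows(seq: str, residue_set: set[str], window: int, alpha: float = 1e-3):
    """Return (start, count, p_value) for windows whose count is improbably high.

    Background frequency p is the fraction of the whole sequence drawn from
    residue_set, so global compositional bias is partly taken into account.
    """
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    p = sum(1 for aa in seq if aa in residue_set) / len(seq)
    hits = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[start:start + window] if aa in residue_set)
        p_value = binomial_tail(count, window, p)
        if p_value < alpha:
            hits.append((start, count, p_value))
    return hits


if __name__ == "__main__":
    demo = "A" * 20 + "Q" * 10 + "A" * 20  # a glutamine run embedded in alanine
    for start, count, p_value in biased_windows(demo, {"Q"}, window=10):
        print(start, count, f"{p_value:.2e}")
```

On this toy sequence the flagged windows overlap the glutamine run; because neighbouring windows share most residues, their p-values are strongly correlated, which is exactly why a proper multiple-testing correction is needed.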
Affiliation(s)
- Igor B Kuznetsov
- Gen*NY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, State University of New York, One Discovery Drive, Rensselaer, NY 12144, USA.
27
Volkova OA, Titov SE, Kochetov AV. Correlation between the contexts of the translation initiation signal and the N-terminal sequence of Arabidopsis, yeast, mouse, and human proteins. Biophysics 2006. [DOI: 10.1134/s0006350906070037]
28
Laio A, Micheletti C. Are structural biases at protein termini a signature of vectorial folding? Proteins 2005; 62:17-23. [PMID: 16281293 DOI: 10.1002/prot.20712]
Abstract
Experimental investigations of the biosynthesis of a number of proteins have pointed out that part of the native structure may be acquired already during translation. We carried out a comprehensive statistical analysis of some average structural properties of proteins that have been put forward as possible signatures of this progressive buildup process. Contrary to a widespread belief, we found that there is no major propensity of the amino acids to form contacts with residues that are closer to the N-terminus. Moreover, we found that the C-terminus is significantly more compact and locally organized than the N-terminus. This bias, though, is unlikely to be related to vectorial effects, since it correlates with subtle differences in the primary sequence. These findings indicate that even if proteins acquire their structure vectorially, no signature of this seems to be detectable in their average structural properties.
Affiliation(s)
- Alessandro Laio
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, Lugano, Switzerland
29
Krishna MMG, Englander SW. The N-terminal to C-terminal motif in protein folding and function. Proc Natl Acad Sci U S A 2005; 102:1053-8. [PMID: 15657118 PMCID: PMC545867 DOI: 10.1073/pnas.0409114102]
Abstract
Essentially all proteins known to fold kinetically in a two-state manner have their N- and C-terminal secondary structural elements in contact, and the terminal elements often dock as part of the experimentally measurable initial folding step. Conversely, all N-C no-contact proteins studied so far fold by non-two-state kinetics. By comparison, about half of the single domain proteins in the Protein Data Bank have their N- and C-terminal elements in contact, more than expected on a random probability basis but not nearly enough to account for the bias in protein folding. Possible reasons for this bias relate to the mechanisms for initial protein folding, native state stability, and final turnover.
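Whether the N- and C-terminal secondary structural elements are in contact can be checked directly from coordinates. The Python sketch below assumes Cα coordinates in ångströms, 0-based inclusive residue ranges for the first and last elements, and an 8 Å Cα-Cα cutoff; both the cutoff and this contact definition are assumptions and may differ from those used in the study.

```python
"""Test for contact between the first and last secondary-structure elements (illustrative)."""

from math import dist


def termini_in_contact(
    ca_coords: list[tuple[float, float, float]],
    first_element: tuple[int, int],
    last_element: tuple[int, int],
    cutoff: float = 8.0,
) -> bool:
    """True if any C-alpha pair between the two elements lies within `cutoff` angstroms.

    Elements are (start, end) residue indices, 0-based and inclusive.
    """
    a_start, a_end = first_element
    b_start, b_end = last_element
    return any(
        dist(ca_coords[i], ca_coords[j]) <= cutoff
        for i in range(a_start, a_end + 1)
        for j in range(b_start, b_end + 1)
    )


if __name__ == "__main__":
    straight = [(i * 3.8, 0.0, 0.0) for i in range(20)]
    hairpin = [(i * 3.8, 0.0, 0.0) for i in range(10)] + [((9 - i) * 3.8, 5.0, 0.0) for i in range(10)]
    print(termini_in_contact(straight, (0, 3), (16, 19)))
    print(termini_in_contact(hairpin, (0, 3), (16, 19)))
```

Applied over a set of structures, the fraction of `True` results is the kind of N-C contact frequency the abstract compares against two-state folding behaviour.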
Affiliation(s)
- Mallela M G Krishna
- Johnson Research Foundation, Department of Biochemistry and Biophysics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104-6059, USA.
30
Abstract
The availability of complete genome sequences enables the statistical analysis of sequence features without significant database-imposed bias. The carboxyl termini of proteins often contain regions associated with protein targeting and enhanced translational termination. We analyzed the frequency of occurrence of C-terminal tripeptides in representative archaeal, bacterial, and eukaryotic genomes. The sequence distribution in prokaryotic genomes nearly matches that generated by the randomization of the observed tripeptide set. In contrast, eukaryotic genomes contain large numbers of overrepresented sequences. Some of these correspond to highly repeated sequences from either duplicated endogenous genes or transposon open reading frames. Gratifyingly, others represent previously known targeting signals or sequences associated with an increase in translational termination efficiency. However, a number of overrepresented tripeptides have not been previously noted and may represent novel functional sequences. For example, the sequence XSS may enhance translational termination efficiency in plants, whereas FWC may be a targeting or processing signal for certain amino acid permeases in yeast.
Affiliation(s)
- Gregory J Gatto
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
31
Abstract
Sequence motifs at the protein carboxyl termini in linear polypeptides are uniquely positioned and functionally capable of serving as recognition signatures for a variety of cellular and biochemical processes. At the proteome level, it is unknown whether and what carboxyl-terminal sequences might be particularly conserved, which may be directly related to specific biological functions shared among certain groups of proteins. To investigate this question, we analyzed the terminal sequences of reported yeast open reading frames, which presumably constitute the predicted, entire proteome of Saccharomyces cerevisiae. The results show that there are both known and novel terminal sequences. They are conserved at a frequency similar to that of functionally important, experimentally confirmed signals such as the HDEL sequence that mediates the endoplasmic reticulum retention and/or retrieval. The findings support the notion that there may be additional carboxyl-terminal signals, and the conserved motifs could be experimentally tested for currently unknown biological functions. Similar analyses were also applied to the limited proteome databases of other organisms with overall consistent findings. Therefore, indexing a proteome according to its carboxyl-terminal sequences may provide a means for functional classification and determination of proteins.
Affiliation(s)
- Jean-Ju Chung
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
32
Scheglmann D, Werner K, Eiselt G, Klinger R. Role of paired basic residues of protein C-termini in phospholipid binding. Protein Eng Des Sel 2002; 15:521-8. [PMID: 12082171 DOI: 10.1093/protein/15.6.521]
Abstract
It is a well-known phenomenon that the occurrence of several distinct amino acids at the C-terminus of proteins is non-random. We have analysed all Saccharomyces cerevisiae proteins predicted by computer databases and found lysine to be the most frequent residue at both the last (-1) and the penultimate (-2) positions. To test the hypothesis that C-terminal basic residues bind efficiently to phospholipids, we randomly expressed GST-fusion proteins from a yeast genomic library. Fifty-four different peptide fragments were found to bind phospholipids, and 40% of them contained lysine/arginine residues at the (-1) or (-2) positions. One peptide showed high sequence similarity to the yeast protein Sip18p. Mutational analysis revealed that both C-terminal lysine residues of Sip18p are essential for phospholipid binding in vitro. We assume that basic amino acid residues at the (-1) and (-2) positions of C-termini are suited to attaching the C-terminus of a given protein to membrane components such as phospholipids, thereby stabilizing the spatial structure of the protein or contributing to its subcellular localization. This mechanism could be an additional explanation for the C-terminal amino acid bias observed in proteins of several species.
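The positional tally behind this kind of analysis is simple to reproduce. A minimal Python sketch, with illustrative function names, counts residues at a given C-terminal offset and computes the fraction of sequences carrying lysine or arginine at position -1 or -2.

```python
"""Tally residues at C-terminal positions (-1, -2, ...) and basic-residue occupancy."""

from collections import Counter


def residue_counts_at(sequences: list[str], offset: int) -> Counter:
    """Count residues at a negative offset, e.g. -1 for the last residue."""
    if offset >= 0:
        raise ValueError("offset must be negative (C-terminal position)")
    return Counter(seq[offset] for seq in sequences if len(seq) >= -offset)


def basic_at_c_terminus_fraction(sequences: list[str], basic: str = "KR") -> float:
    """Fraction of sequences (length >= 2) with a basic residue at -1 or -2."""
    eligible = [seq for seq in sequences if len(seq) >= 2]
    if not eligible:
        raise ValueError("no sequence has at least two residues")
    hits = sum(1 for seq in eligible if seq[-1] in basic or seq[-2] in basic)
    return hits / len(eligible)


if __name__ == "__main__":
    peptides = ["MAKK", "MAAK", "MAKA", "MAAA", "MARE"]
    print(residue_counts_at(peptides, -1).most_common())
    print(basic_at_c_terminus_fraction(peptides))
```

Running the same tally on a proteome and on a set of phospholipid-binding fragments would give the two frequencies that the abstract compares.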
Affiliation(s)
- Dietrich Scheglmann
- Institute for Biochemistry II, Medical Faculty of the Friedrich Schiller University Jena, Nonnenplan 2, D-07743 Jena, Germany
33
Maurer-Stroh S, Eisenhaber B, Eisenhaber F. N-terminal N-myristoylation of proteins: refinement of the sequence motif and its taxon-specific differences. J Mol Biol 2002; 317:523-40. [PMID: 11955007 DOI: 10.1006/jmbi.2002.5425]
Abstract
N-terminal N-myristoylation is a lipid anchor modification of eukaryotic and viral proteins targeting them to membrane locations, thus changing the cellular function of modified proteins. Protein myristoylation is critical in many pathways; e.g. in signal transduction, apoptosis, or alternative extracellular protein export. The myristoyl-CoA:protein N-myristoyltransferase (NMT) recognizes the sequence motif of appropriate substrate proteins at the N terminus and attaches the lipid moiety to the absolutely required N-terminal glycine residue. Reliable recognition of capacity for N-terminal myristoylation from the substrate protein sequence alone is desirable for proteome-wide function annotation projects but the existing PROSITE motif is not practical, since it produces huge numbers of false positive and even some false negative predictions. As a first step towards a new prediction method, it is necessary to refine the sequence motif coding for N-terminal N-myristoylation. Relying on the in-depth study of the amino acid sequence variability of substrate proteins, on binding site analyses in X-ray structures or 3D homology models for NMTs from various taxa, and on consideration of biochemical data extracted from the scientific literature, we found indications that, at least within a complete substrate protein, the N-terminal 17 protein residues experience different types of variability restrictions. We identified three motif regions: region 1 (positions 1-6) fitting the binding pocket; region 2 (positions 7-10) interacting with the NMT's surface at the mouth of the catalytic cavity; and region 3 (positions 11-17) comprising a hydrophilic linker. Each region was characterized by physical requirements to single sequence positions or groups of positions regarding volume, polarity, backbone flexibility and other typical properties of amino acids (http://mendel.imp.univie.ac.at/myristate/). 
These specificity differences are confined partly to taxonomic ranges and are proposed for the design of NMT inhibitors in pathogenic fungal and protozoan systems including Aspergillus fumigatus, Leishmania major, Trypanosoma cruzi, Trypanosoma brucei, Giardia intestinalis, Entamoeba histolytica, Pneumocystis carinii, Strongyloides stercoralis and Schistosoma mansoni. An exhaustive search for NMT-homologues led to the discovery of two putative entomopoxviral NMTs.
34
Stenström CM, Holmgren E, Isaksson LA. Cooperative effects by the initiation codon and its flanking regions on translation initiation. Gene 2001; 273:259-65. [PMID: 11595172 DOI: 10.1016/s0378-1119(01)00584-4]
Abstract
The purine-rich Shine-Dalgarno (SD) sequence located a few bases upstream of the mRNA initiation codon supports translation initiation by complementary binding to the anti-SD in the 16S rRNA, close to its 3' end. AUG is the canonical initiation codon but the weaker UUG and GUG codons are also used for a minority of genes. The codon sequence of the downstream region (DR), including the +2 codon immediately following the initiation codon, is also important for initiation efficiency. We have studied the interplay between these three initiation determinants on gene expression in growing Escherichia coli. One optimal SD sequence (SD(+)) and one lacking any apparent complementarity to the anti-SD in 16S rRNA (SD(-)) were analyzed. The SD(+) and DR sequences affected initiation in a synergistic manner and large differences in the effects were found. The gene expression level associated with the most efficient of these DRs together with SD(-) was comparable to that of other DRs together with SD(+). The otherwise weak initiation codon UUG, but not GUG, was comparable with AUG in strength, if placed in the context of two of the DRs. The +2 codon was one, but not the only, determinant for this unexpectedly high efficiency of UUG.
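The three determinants studied here (the SD sequence, the initiation codon, and the +2 codon of the downstream region) can be extracted from an mRNA with a sketch like the one below. The helper name, the GGAGG core used as a stand-in for SD complementarity, and the 15-nt upstream search window are illustrative assumptions rather than the authors' definitions; a plain substring match is far cruder than scoring base pairing with the anti-SD.

```python
# Canonical AUG plus the weaker GUG and UUG start codons named in the abstract.
START_CODONS = {"AUG", "GUG", "UUG"}

# Assumption: a GGAGG core stands in for an SD sequence complementary to the anti-SD.
SD_CORE = "GGAGG"


def initiation_determinants(mrna: str, start: int, upstream_len: int = 15) -> dict:
    """Report the SD, initiation-codon and +2-codon determinants for one gene.

    start is the 0-based index of the first base of the initiation codon;
    upstream_len (an assumed value) is how far upstream to look for the SD core.
    """
    rna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    upstream = rna[max(0, start - upstream_len):start]
    codon = rna[start:start + 3]
    return {
        "sd_plus": SD_CORE in upstream,  # crude SD(+) / SD(-) call
        "start_codon": codon,
        "recognized_start": codon in START_CODONS,
        "plus2_codon": rna[start + 3:start + 6],  # codon right after the start codon
    }


if __name__ == "__main__":
    # Hypothetical leader + AUG + downstream codons, for illustration only.
    print(initiation_determinants("AAGGAGGAUAUACAAUGAAAGCU", start=14))
```

Keeping the three determinants as separate fields makes it easy to tabulate SD(+)/SD(-), start-codon and +2-codon combinations side by side, which is the kind of factorial comparison the abstract describes.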
Affiliation(s)
- C M Stenström
- Department of Microbiology, Stockholm University, S-106 91, Stockholm, Sweden
35
Abstract
An analysis of amino acid composition of small, naturally occurring peptides ranging in size from 3 to 50 residues has been carried out. The purpose of the study is to determine whether differential trends in amino acid usage exist for small peptides compared to larger polypeptides and proteins. Results indicate that Cys, Trp, and Phe are substantially more frequent in peptides compared to their abundance in proteins at large. Aliphatic hydrophobic residues, particularly Leu and Ile, are somewhat underrepresented, while the frequency of Glu is significantly reduced. The shorter peptides are also more frequently neutral and become increasingly charged as their size increases.
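The comparison described above amounts to pooling residue counts per dataset and taking frequency ratios. The sketch below assumes both datasets are plain lists of one-letter sequences; the function names are hypothetical, and the net-charge estimate is a crude approximation (Lys and Arg count +1, Asp and Glu count -1, His and the termini are ignored).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def is_small_peptide(seq: str) -> bool:
    """Size filter matching the 3-50 residue range studied."""
    return 3 <= len(seq) <= 50


def composition(seqs) -> dict:
    """Pooled frequency of each standard residue across all sequences."""
    counts = Counter(r for s in seqs for r in s.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def enrichment(peptides, proteins) -> dict:
    """Peptide/protein frequency ratio per residue; >1 means enriched in peptides."""
    pep, prot = composition(peptides), composition(proteins)
    return {aa: pep[aa] / prot[aa] for aa in AMINO_ACIDS if prot[aa] > 0}


def net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: (K + R) - (D + E); His and termini ignored."""
    s = seq.upper()
    return s.count("K") + s.count("R") - s.count("D") - s.count("E")
```

Applying `enrichment` to peptides filtered with `is_small_peptide` against a reference protein set, and comparing `net_charge` across length bins, mirrors the two kinds of comparison the abstract reports; any actual trend depends entirely on the datasets chosen.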
Affiliation(s)
- H O Villar
- Telik, Inc., 750 Gateway Blvd., South San Francisco, CA 94080, USA.