1
Mac Donagh J, Marchesini A, Spiga A, Fallico MJ, Arrías PN, Monzon AM, Vagiona AC, Gonçalves-Kulik M, Mier P, Andrade-Navarro MA. Structured Tandem Repeats in Protein Interactions. Int J Mol Sci 2024; 25:2994. [PMID: 38474241 DOI: 10.3390/ijms25052994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 03/14/2024] Open
Abstract
Tandem repeats (TRs) in protein sequences are consecutive, highly similar sequence motifs. Some types of TRs fold into structural units that pack together in ensembles, forming either an (open) elongated domain or a (closed) propeller, where the last unit of the ensemble packs against the first one. Here, we examine TR proteins (TRPs) to see how their sequence, structure, and evolutionary properties favor them for a function as mediators of protein interactions. Our observations suggest that TRPs bind other proteins using large, structured surfaces like globular domains; in particular, open-structured TR ensembles are favored by flexible termini and the possibility to tightly coil against their targets. While, intuitively, open ensembles of TRs seem prone to evolve due to their potential to accommodate insertions and deletions of units, these evolutionary events are unexpectedly rare, suggesting that they are advantageous for the emergence of the ancestral sequence but are early fixed. We hypothesize that their flexibility makes it easier for further proteins to adapt to interact with them, which would explain their large number of protein interactions. We provide insight into the properties of open TR ensembles, which make them scaffolds for alternative protein complexes to organize genes, RNA and proteins.
Affiliation(s)
- Juan Mac Donagh
  - Science and Technology Department, National University of Quilmes, Bernal B1876, Argentina
  - National Scientific and Technical Research Council (CONICET), Buenos Aires C1033AAJ, Argentina
- Abril Marchesini
  - National Scientific and Technical Research Council (CONICET), Buenos Aires C1033AAJ, Argentina
  - Biotechnology and Molecular Biology Institute (IBBM, UNLP-CONICET), Faculty of Exact Sciences, University of La Plata, La Plata 1900, Argentina
- Agostina Spiga
  - Science and Technology Department, National University of Quilmes, Bernal B1876, Argentina
  - National Scientific and Technical Research Council (CONICET), Buenos Aires C1033AAJ, Argentina
- Maximiliano José Fallico
  - Laboratory of Bioactive Compound Research and Development, Faculty of Exact Sciences, University of La Plata, La Plata 1900, Argentina
- Paula Nazarena Arrías
  - Department of Biomedical Sciences, University of Padova, Via U. Bassi 58/b, 35121 Padova, Italy
- Alexander Miguel Monzon
  - Department of Information Engineering, University of Padova, Via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Aimilia-Christina Vagiona
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Mariane Gonçalves-Kulik
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
2
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
  - Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
  - Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
  - Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy.
3
Elena-Real CA, Mier P, Sibille N, Andrade-Navarro MA, Bernadó P. Structure-function relationships in protein homorepeats. Curr Opin Struct Biol 2023; 83:102726. [PMID: 37924569 DOI: 10.1016/j.sbi.2023.102726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 11/06/2023]
Abstract
Homorepeats (or polyX), protein segments containing repetitions of the same amino acid, are abundant in proteomes from all kingdoms of life and are involved in crucial biological functions as well as several neurodegenerative and developmental diseases. Mainly inserted in disordered segments of proteins, the structure/function relationships of homorepeats remain largely unexplored. In this review, we summarize present knowledge for the most abundant homorepeats, highlighting the role of the inherent structure and the conformational influence exerted by their flanking regions. Recent experimental and computational methods enable residue-specific investigations of these regions and promise novel structural and dynamic information for this elusive group of proteins. This information should increase our knowledge about the structural bases of phenomena such as liquid-liquid phase separation and trinucleotide repeat disorders.
Affiliation(s)
- Carlos A Elena-Real
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France. https://twitter.com/carloselenareal
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz. Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Nathalie Sibille
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz. Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Pau Bernadó
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France.
4
Mier P, Andrade-Navarro MA. The nucleotide landscape of polyXY regions. Comput Struct Biotechnol J 2023; 21:5408-5412. [PMID: 38022702 PMCID: PMC10652141 DOI: 10.1016/j.csbj.2023.10.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 10/30/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023] Open
Abstract
PolyXY regions are compositionally biased regions composed of two different amino acids. They are classified according to the arrangement of the two amino acid types 'X' and 'Y' into direpeats (composed of alternating amino acids, e.g. 'XYXYXY'), joined (composed of two consecutive stretches of each amino acid, e.g. 'XXXYYY') and shuffled (other arrangements, e.g., 'XYXXYY'). They have been characterized at the amino acid level in all domains of life, and are described as often found within intrinsically disordered regions. Since DNA replication slippage has been proposed as a driver of repeat variation, and given that some polyXY have a repetitive nature, we hypothesized that characterizing the nucleotide coding of various types of polyXY could give hints about their origin and evolution. To test this, we obtained all polyXY regions in the human transcriptome, categorized them, and studied their coding nucleotide sequences. We observed that polyXY exacerbates the codon biases, and that the similarity between the X and Y codons is higher than in the background proteome. Our results support a general mechanism of emergence and evolution of polyXY from single-codon polyX. PolyXY are revealed as hotspots for replication slippage, particularly those composed of repeats: joined and direpeat polyXY. Inter-conversion to shuffled polyXY disrupts nucleotide repeats and restricts further evolution by replication slippage, a mechanism that we previously observed in polyX. Our results shed light on polyXY composition and should simplify the determination of their functions.
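The three polyXY categories described above can be illustrated with a short sketch. This is not the authors' script: the function name `classify_polyxy` is hypothetical, and it assumes a "pure" region made of exactly two amino acid types with no mismatching residues, which real polyXY definitions may tolerate.

```python
def classify_polyxy(region: str) -> str:
    """Classify a pure two-residue region as 'direpeat', 'joined' or 'shuffled'.

    Assumption for illustration only: the region contains exactly two
    amino acid types and nothing else.
    """
    if len(set(region)) != 2:
        raise ValueError("expected a region made of exactly two amino acid types")
    # Count adjacent positions where the residue type changes.
    switches = sum(a != b for a, b in zip(region, region[1:]))
    if switches == 1:
        return "joined"      # one stretch of X followed by one stretch of Y
    if switches == len(region) - 1:
        return "direpeat"    # strict alternation, e.g. 'XYXYXY'
    return "shuffled"        # any other arrangement


for example in ("XYXYXY", "XXXYYY", "XYXXYY"):
    print(example, classify_polyxy(example))
```

Counting residue-type switches is only one possible way to separate the categories. A two-residue region such as 'XY' is ambiguous under it and is reported here as joined.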
Affiliation(s)
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A. Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
5
Mier P, Andrade-Navarro MA. Evolutionary Study of Protein Short Tandem Repeats in Protein Families. Biomolecules 2023; 13:1116. [PMID: 37509152 PMCID: PMC10377733 DOI: 10.3390/biom13071116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/06/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] Open
Abstract
Tandem repeats in proteins are patterns of residues repeated directly adjacent to each other. The evolution of these repeats can be assessed by using groups of homologous sequences, which can help point to events of unit duplication or deletion. High pressure in a protein family for variation of a given type of repeat might point to their function. Here, we propose the analysis of protein families to calculate protein short tandem repeats (pSTRs) in each protein sequence and assess their variability within the family in terms of number of units. To facilitate this analysis, we developed the pSTR tool, a method to analyze the evolution of protein short tandem repeats in a given protein family by pairwise comparisons between evolutionarily related protein sequences. We evaluated pSTR unit number variation in protein families of 12 complete metazoan proteomes. We hypothesize that families with more dynamic ensembles of repeats could reflect particular roles of these repeats in processes that require more adaptability.
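The idea of comparing repeat unit numbers between related sequences can be sketched as follows. This is a simplified illustration, not the pSTR tool's algorithm: the function names, the fixed unit length, the exact-match requirement and the `min_units` threshold are all assumptions made for the example.

```python
import re


def short_tandem_repeats(seq: str, unit_len: int = 2, min_units: int = 3):
    """Return (unit, n_units, start) for exact tandem repeats of a fixed unit length.

    Assumptions for illustration: units must match exactly, `min_units` is at
    least 2, and matches are non-overlapping and greedy.
    """
    pattern = re.compile(r"([A-Z]{%d})\1{%d,}" % (unit_len, min_units - 1))
    return [
        (m.group(1), len(m.group(0)) // unit_len, m.start())
        for m in pattern.finditer(seq)
    ]


def unit_count_differences(seq_a: str, seq_b: str, unit_len: int = 2, min_units: int = 3):
    """Compare the longest repeat per unit between two related sequences.

    Returns {unit: units_in_b - units_in_a} for units found in both
    sequences with a different number of copies.
    """
    def longest(seq):
        best = {}
        for unit, n_units, _ in short_tandem_repeats(seq, unit_len, min_units):
            best[unit] = max(best.get(unit, 0), n_units)
        return best

    a, b = longest(seq_a), longest(seq_b)
    return {u: b[u] - a[u] for u in a.keys() & b.keys() if a[u] != b[u]}


# Two toy "homologs" differing by one QP unit.
print(unit_count_differences("AAQPQPQPQPGG", "AAQPQPQPGG"))
```

A negative value means the second sequence carries fewer units, which is compatible with either a deletion in it or a duplication in the first. Deciding between the two would require more sequences from the family.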
Affiliation(s)
- Pablo Mier
  - Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
6
Schumbera E, Mier P, Andrade-Navarro MA. Phase separating Rho: a widespread regulatory function of disordered regions in proteins revealed in bacteria. Signal Transduct Target Ther 2023; 8:253. [PMID: 37344523 DOI: 10.1038/s41392-023-01505-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 04/28/2023] [Accepted: 05/16/2023] [Indexed: 06/23/2023] Open
Affiliation(s)
- Eric Schumbera
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128, Mainz, Germany
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128, Mainz, Germany.
7
Erdozain S, Barrionuevo E, Ripoll L, Mier P, Andrade-Navarro MA. Protein repeats evolve and emerge in giant viruses. J Struct Biol 2023; 215:107962. [PMID: 37031868 DOI: 10.1016/j.jsb.2023.107962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 03/21/2023] [Accepted: 04/04/2023] [Indexed: 04/11/2023]
Abstract
Nucleocytoplasmic large DNA viruses (NCLDVs or giant viruses) stand out because of their relatively large genomes encoding hundreds of proteins. These species give us an unprecedented opportunity to study the emergence and evolution of repeats in protein sequences. On the one hand, as viruses, these species have a restricted set of functions, which can help us better define the functional landscape of repeats. On the other hand, given the particular use of the genetic machinery of the host, it is worth asking whether this allows the variations of genetic material that lead to repeats in non-viral species. To support research in the characterization of repeat protein evolution and function, we present here an analysis focused on the repeat proteins of giant viruses, namely tandem repeats (TRs), short repeats (SRs), and homorepeats (polyX). Proteins with large and short repeats are not very frequent in non-eukaryotic organisms because of the difficulties that their folding may entail; however, their presence in giant viruses highlights their advantage for performance in the protein environment of the eukaryotic host. The heterogeneous content of these TRs, SRs and polyX in some viruses hints at diverse needs. Comparisons to homologs suggest that some of these viruses extensively use the mechanisms that generate these repeats, and that they also have the capacity to adopt genes with repeats. Giant viruses could be very good models for the study of the emergence and evolution of protein repeats.
Affiliation(s)
- Sofía Erdozain
  - Instituto de Biotecnología y Biología Molecular, Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, Argentina
- Emilia Barrionuevo
  - Laboratory of Bioactive Research and Development, Faculty of Exact Sciences, National University of La Plata, Argentina
- Lucas Ripoll
  - Laboratory of Genetic Engineering, Cell, and Molecular Biology, National University of Quilmes, Argentina
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
8
Kastano K, Mier P, Dosztányi Z, Promponas VJ, Andrade-Navarro MA. Functional Tuning of Intrinsically Disordered Regions in Human Proteins by Composition Bias. Biomolecules 2022; 12:1486. [PMID: 36291695 PMCID: PMC9599065 DOI: 10.3390/biom12101486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 09/30/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022] Open
Abstract
Intrinsically disordered regions (IDRs) in protein sequences are flexible, have low structural constraints and as a result have faster rates of evolution. This lack of evolutionary conservation greatly limits the use of sequence homology for the classification and functional assessment of IDRs, as opposed to globular domains. The study of IDRs requires other properties for their classification and functional prediction. While composition bias is not a necessary property of IDRs, compositionally biased regions (CBRs) have been noted as a frequent part of IDRs. We hypothesized that to characterize IDRs, it could be helpful to study their overlap with particular types of CBRs. Here, we evaluate this overlap in the human proteome. Two-thirds of residues in IDRs overlap CBRs. Considering CBRs enriched in one type of amino acid, we can distinguish CBRs that tend to be fully included within long IDRs (R, H, N, D, P, G), from those that partially overlap shorter IDRs (S, E, K, T), and others that tend to overlap IDR terminals (Q, A). CBRs overlap IDRs more often in nuclear proteins and in proteins involved in liquid-liquid phase separation (LLPS). Study of protein interaction networks reveals the enrichment of CBRs in IDRs by tandem repetition of short linear motifs (rich in S or P), and the existence of E-rich polar regions that could support specific protein interactions with non-specific interactions. Our results open ways to pin down the function of IDRs from their partial compositional biases.
Affiliation(s)
- Kristina Kastano
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Zsuzsanna Dosztányi
  - Department of Biochemistry, ELTE Eötvös Loránd University, Pázmány Péter stny 1/c, H-1117 Budapest, Hungary
- Vasilis J. Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, 1678 Nicosia, Cyprus
- Miguel A. Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
  - Correspondence:
9
Mier P, Elena-Real CA, Cortés J, Bernadó P, Andrade-Navarro MA. The sequence context in poly-alanine regions: structure, function and conservation. Bioinformatics 2022; 38:4851-4858. [PMID: 36106994 PMCID: PMC9620824 DOI: 10.1093/bioinformatics/btac610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 07/07/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022] Open
Abstract
Motivation: Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association with nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are settled in proteins influences their structure and function.
Results: We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates to their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices; but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure–function relationships.
Availability and implementation: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Mier
  - Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Carlos A Elena-Real
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Pau Bernadó
  - Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Miguel A Andrade-Navarro
  - Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
10
Mier P, Andrade-Navarro MA. Regions with two amino acids in protein sequences: a step forward from homorepeats into the low complexity landscape. Comput Struct Biotechnol J 2022; 20:5516-5523. [PMID: 36249567 PMCID: PMC9550522 DOI: 10.1016/j.csbj.2022.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 09/07/2022] [Accepted: 09/07/2022] [Indexed: 11/17/2022] Open
Abstract
Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed mostly of one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. Beyond homorepeats, the next step in the study of LCRs is the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.
Affiliation(s)
- Pablo Mier
  - Corresponding author at: Hanns-Dieter-Hüsch-Weg 15 55118 Mainz (Germany).
11
Vagiona AC, Mier P, Petrakis S, Andrade-Navarro MA. Analysis of Huntington's Disease Modifiers Using the Hyperbolic Mapping of the Protein Interaction Network. Int J Mol Sci 2022; 23:5853. [PMID: 35628660 PMCID: PMC9144261 DOI: 10.3390/ijms23105853] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/28/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 02/05/2023] Open
Abstract
Huntington's disease (HD) is caused by the production of a mutant huntingtin (HTT) with an abnormally long poly-glutamine (polyQ) tract, forming aggregates and inclusions in neurons. Previous work by us and others has shown that an increase or decrease in polyQ-triggered aggregates can be passive simply due to the interaction of proteins with the aggregates. To search for proteins with active (functional) effects, which might be more effective in finding therapies and mechanisms of HD, we selected among the proteins that interact with HTT a total of 49 pairs of proteins that, while being paralogous to each other (and thus expected to have similar passive interaction with HTT), are located in different regions of the protein interaction network (suggesting participation in different pathways or complexes). Three of these 49 pairs contained members with opposite effects on HD, according to the literature. The negative members of the three pairs, MID1, IKBKG, and IKBKB, interact with PPP2CA and TUBB, which are known negative factors in HD, as well as with HSP90AA1 and RPS3. The positive members of the three pairs interact with HSPA9. Our results provide potential HD modifiers of functional relevance and reveal the dynamic aspect of paralog evolution within the interaction network.
Affiliation(s)
- Aimilia-Christina Vagiona
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (A.-C.V.); (P.M.)
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (A.-C.V.); (P.M.)
- Spyros Petrakis
  - Institute of Applied Biosciences/Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece;
- Miguel A. Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (A.-C.V.); (P.M.)
12
Mier P, Fontaine JF, Stoldt M, Libbrecht R, Martelli C, Foitzik S, Andrade-Navarro MA. Annotation and Analysis of 3902 Odorant Receptor Protein Sequences from 21 Insect Species Provide Insights into the Evolution of Odorant Receptor Gene Families in Solitary and Social Insects. Genes (Basel) 2022; 13:919. [PMID: 35627304 PMCID: PMC9141868 DOI: 10.3390/genes13050919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 05/17/2022] [Accepted: 05/19/2022] [Indexed: 11/26/2022] Open
Abstract
The gene family of insect olfactory receptors (ORs) has expanded greatly over the course of evolution. ORs enable insects to detect volatile chemicals and therefore play an important role in social interactions, enemy and prey recognition, and foraging. The sequences of several thousand ORs are known, but their specific function or their ligands have only been identified for very few of them. To advance the functional characterization of ORs, we have assembled, curated, and aligned the sequences of 3902 ORs from 21 insect species, which we provide as an annotated online resource. Using functionally characterized proteins from the fly Drosophila melanogaster, the mosquito Anopheles gambiae and the ant Harpegnathos saltator, we identified amino acid positions that best predict response to ligands. We examined the conservation of these predicted relevant residues in all OR subfamilies; the results showed that the subfamilies that expanded strongly in social insects had a high degree of conservation in their binding sites. This suggests that the ORs of social insect families are typically finely tuned and exhibit sensitivity to very similar odorants. Our novel approach provides a powerful tool to exploit functional information from a limited number of genes to study the functional evolution of large gene families.
Affiliation(s)
- Pablo Mier
  - Institute of Organismic and Molecular Evolution (iomE), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (J.-F.F.); (M.S.); (R.L.); (S.F.); (M.A.A.-N.)
  - Correspondence:
- Jean-Fred Fontaine
  - Institute of Organismic and Molecular Evolution (iomE), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (J.-F.F.); (M.S.); (R.L.); (S.F.); (M.A.A.-N.)
- Marah Stoldt
  - Institute of Organismic and Molecular Evolution (iomE), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (J.-F.F.); (M.S.); (R.L.); (S.F.); (M.A.A.-N.)
- Romain Libbrecht
  - Institute of Organismic and Molecular Evolution (iomE), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (J.-F.F.); (M.S.); (R.L.); (S.F.); (M.A.A.-N.)
- Carlotta Martelli
  - Institute of Developmental Biology and Neurobiology (iDN), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany;
- Susanne Foitzik
  - Institute of Organismic and Molecular Evolution (iomE), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (J.-F.F.); (M.S.); (R.L.); (S.F.); (M.A.A.-N.)
- Miguel A. Andrade-Navarro
  - Institute of Organismic and Molecular Evolution (iomE), Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany; (J.-F.F.); (M.S.); (R.L.); (S.F.); (M.A.A.-N.)
13
|
Mier P, Andrade-Navarro MA. PolyX2: Fast Detection of Homorepeats in Large Protein Datasets. Genes (Basel) 2022; 13:758. [PMID: 35627143 PMCID: PMC9141109 DOI: 10.3390/genes13050758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 04/22/2022] [Accepted: 04/22/2022] [Indexed: 12/03/2022] Open
Abstract
Homorepeat sequences, consecutive runs of identical amino acids, are prevalent in eukaryotic proteins. It has become necessary to annotate and evaluate this feature in entire proteomes. The definition of what constitutes a homorepeat is not fixed, and different research approaches may require different definitions; therefore, flexible approaches to analyze homorepeats in complete proteomes are needed. Here, we present polyX2, a fast, simple but tunable script to scan protein datasets for all possible homorepeats. The user can modify the length of the window to scan, the minimum number of identical residues that must be found in the window, and the types of homorepeats to be found.
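The window-based scan described in this abstract can be illustrated with a minimal Python sketch. It is not the polyX2 script itself: the function name `find_homorepeats`, its default values, and the rule for merging overlapping windows are assumptions made for illustration. Only the three tunable parameters (window length, minimum number of identical residues, residue types) come from the abstract.

```python
def find_homorepeats(seq, window=10, min_identical=8, residues=None):
    """Scan a protein sequence for homorepeat regions.

    Hypothetical helper, not the polyX2 implementation.
    A window is a hit for residue X if it contains at least
    `min_identical` copies of X. Overlapping or adjacent hit
    windows are merged into one region.
    Returns (residue, start, end) tuples, 0-based, end-exclusive.
    """
    seq = seq.upper()
    # Scan only the requested residue types, or every type present.
    targets = set(residues) if residues else set(seq)
    hits = []
    for aa in sorted(targets):
        start = end = None
        for i in range(len(seq) - window + 1):
            if seq[i:i + window].count(aa) >= min_identical:
                if start is None:
                    start = i
                elif i > end:
                    # Gap between hit windows: close the previous region.
                    hits.append((aa, start, end))
                    start = i
                end = i + window
        if start is not None:
            hits.append((aa, start, end))
    return hits


# Example: a run of ten glutamines flanked by other residues.
regions = find_homorepeats("MK" + "Q" * 10 + "LP", residues="Q")
```

Reported regions here are the union of hit windows, so they may include flanking residues other than the repeated one. Trimming to the first and last occurrence of the residue would be a natural refinement.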
14
Abstract
Polyglutamine (polyQ) regions are highly abundant consecutive runs of glutamine residues. They have been generally studied in relation to the so-called polyQ-associated diseases, characterized by protein aggregation caused by the expansion of the polyQ tract via a CAG-slippage mechanism. However, more than 4,800 human proteins contain a polyQ, and only nine of these regions are known to be associated with disease. Computational sequence studies and experimental structure determinations are completing a more interesting picture in which polyQ emerge as a motif for modulation of protein-protein interactions. But long polyQ regions may lead to an excess of interactions, and produce aggregates. Within this mechanistic perspective of polyQ function and malfunction, we discuss polyQ definition and properties such as variable codon usage, sequence and context structure imposition, functional relevance, evolutionary patterns in species-centered analyses, and open resources.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
15
Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SCE, Dosztanyi Z, Andrade-Navarro MA. Disentangling the complexity of low complexity proteins. Brief Bioinform 2020; 21:458-472. [PMID: 30698641 PMCID: PMC7299295 DOI: 10.1093/bib/bbz007] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 11/12/2018] [Revised: 12/19/2018] [Accepted: 01/07/2019] [Indexed: 12/31/2022] Open
Abstract
There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.
Short abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
- Lisanna Paladin
- Department of Biomedical Science, University of Padova, Padova, Italy
- Stella Tamana
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Sophia Petrosian
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
- Borbála Hajdu-Soltész
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Annika Urbanek
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
- Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
- Dariusz Plewczynski
- Center of New Technologies, University of Warsaw, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
- Zoltán Gáspári
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Christos A Ouzounis
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Andrey V Kajava
- Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Universite de Montpellier, Montpellier, France; Institute of Bioengineering, University ITMO, St. Petersburg, Russia
- John M Hancock
- Earlham Institute, Norwich, UK; ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK
- Silvio C E Tosatto
- Department of Biomedical Science, University of Padova, Padova, Italy; CNR Institute of Neuroscience, Padova, Italy
- Zsuzsanna Dosztanyi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
16
Mier P, Andrade-Navarro MA. Avoided motifs: short amino acid strings missing from protein datasets. Biol Chem 2021; 402:945-951. [PMID: 33660494 DOI: 10.1515/hsz-2020-0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2020] [Accepted: 02/19/2021] [Indexed: 11/15/2022]
Abstract
According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular context, specific short amino acid motifs are missing due to unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
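The idea of listing short motifs that never occur in a dataset can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `avoided_motifs` and the restriction to the 20 standard amino acids are choices made here, and the sketch reports plain absence without testing whether a motif would be expected by chance given the amino acid composition.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def avoided_motifs(sequences, k=3):
    """Return the set of length-k amino acid strings absent from `sequences`.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    observed = set()
    for seq in sequences:
        seq = seq.upper()
        # Collect every overlapping k-mer of this sequence.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    # All 20**k possible k-mers over the standard alphabet.
    all_kmers = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    return all_kmers - observed


# Example on a toy dataset with k=2 (400 possible dipeptides).
missing = avoided_motifs(["ACDA"], k=2)
```

For k=4 the full set holds 160,000 strings, which is still small enough to enumerate in memory.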
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, D-55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, D-55128 Mainz, Germany
17
Kamel M, Kastano K, Mier P, Andrade-Navarro MA. REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences. J Mol Biol 2021; 433:166895. [PMID: 33972020 DOI: 10.1016/j.jmb.2021.166895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 02/01/2021] [Accepted: 02/21/2021] [Indexed: 12/13/2022]
Abstract
Ensembles of tandem repeats (TRs) in protein sequences expand rapidly to form domains well suited for interactions with proteins. For this reason, they are relatively frequent. Some TRs have known structures and therefore it is advantageous to predict their presence in a protein sequence. However, since most TRs diverge quickly, their detection by classical sequence comparison algorithms is not very accurate. Previously, we developed a method and a web server that used curated profiles and thresholds for the detection of 11 common TRs. Here we present a new web server (REP2) that allows the analysis of TRs in both individual and aligned sequences. We provide currently precomputed analyses for a selection of 78 UniProt reference proteomes. We illustrate how these data can be used to study the evolution of TRs using comparative genomics. REP2 can be accessed at http://cbdm-01.zdv.uni-mainz.de/~munoz/rep/.
Affiliation(s)
- Mohamed Kamel
- Department of Computer Science, Faculty of Mathematics and Informatics, University of M'sila, 28000 M'sila, Algeria; Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Kristina Kastano
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
18
Kastano K, Mier P, Andrade-Navarro MA. The Role of Low Complexity Regions in Protein Interaction Modes: An Illustration in Huntingtin. Int J Mol Sci 2021; 22:1727. [PMID: 33572172 PMCID: PMC7915032 DOI: 10.3390/ijms22041727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/06/2021] [Revised: 01/25/2021] [Accepted: 02/04/2021] [Indexed: 12/11/2022] Open
Abstract
Low complexity regions (LCRs) are very frequent in protein sequences, generally having a lower propensity to form structured domains and tending to be much less evolutionarily conserved than globular domains. Their higher abundance in eukaryotes and in species with more cellular types agrees with a growing number of reports on their function in protein interactions regulated by post-translational modifications. LCRs facilitate the increase of regulatory and network complexity required with the emergence of organisms with more complex tissue distribution and development. Although the low conservation and structural flexibility of LCRs complicate their study, evolutionary studies of proteins across species have been used to evaluate their significance and function. To investigate how to apply this evolutionary approach to the study of LCR function in protein-protein interactions, we performed a detailed analysis for Huntingtin (HTT), a large protein that is a hub for interaction with hundreds of proteins, has a variety of LCRs, and for which partial structural information (in complex with HAP40) is available. We hypothesize that proteins RASA1, SYN2, and KAT2B may compete with HAP40 for their attachment to the core of HTT using similar LCRs. Our results illustrate how evolution might favor the interplay of LCRs with domains, and the possibility of detecting multiple modes of LCR-mediated protein-protein interactions with a large hub such as HTT when enough protein interaction data is available.
Affiliation(s)
- Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
19
Abstract
Background: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called ‘low complexity triangle’ as a proof-of-concept to represent the low complexity measured values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions: The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
- Miguel A. Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
20
Jarnot P, Ziemska-Legiecka J, Dobson L, Merski M, Mier P, Andrade-Navarro MA, Hancock JM, Dosztányi Z, Paladin L, Necci M, Piovesan D, Tosatto SCE, Promponas VJ, Grynberg M, Gruca A. PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins. Nucleic Acids Res 2020; 48:W77-W84. [PMID: 32421769 PMCID: PMC7319588 DOI: 10.1093/nar/gkaa339] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 03/07/2020] [Revised: 04/08/2020] [Accepted: 05/01/2020] [Indexed: 12/25/2022] Open
Abstract
Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and currently there is no web-based tool that allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.
Affiliation(s)
- Patryk Jarnot
- Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Laszlo Dobson
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083 Budapest, Hungary; Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, 1117 Budapest, Hungary
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warsaw, Poland
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- John M Hancock
- ELIXIR, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Zsuzsanna Dosztányi
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Pázmány Péter stny 1/c 1117, Budapest, Hungary
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Marco Necci
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, P.O. Box 20537, Nicosia, CY 1678, Cyprus
- Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawinskiego 5A, 02-106 Warsaw, Poland
- Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
21
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio, HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRPs detection and classification.
Affiliation(s)
- Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
22
Abstract
Background: Polyglutamine regions (polyQ) are one of the most studied and prevalent homorepeats in eukaryotes. They have a particular length-dependent codon usage, which relates to a characteristic CAG-slippage mechanism. Pathologically expanded tracts of polyQ are known to form aggregates and are involved in the development of several human neurodegenerative diseases. The non-pathogenic function of polyQ is to mediate protein-protein interactions via a coiled-coil pairing with an interactor. They are usually located in a helical context. Results: Here we study the stability of polyQ regions in evolution, using a set of 60 proteomes from four distinct taxonomic groups (Insecta, Teleostei, Sauria and Mammalia). The polyQ regions can be distinctly grouped in three categories based on their evolutionary stability: stable, unstable by length variation (inserted), and unstable by mutations (mutated). PolyQ regions in these categories can be significantly distinguished by their glutamine codon usage, and we show that the CAG-slippage mechanism is predominant in inserted polyQ of Sauria and Mammalia. The polyQ amino acid context is also influenced by the polyQ stability, with a higher proportion of proline residues around inserted polyQ. By studying the secondary structure of the sequences surrounding polyQ regions, we found that regarding the structural conformation around a polyQ, its stability category is more relevant than its taxonomic information. The protein-protein interaction capacity of a polyQ is also affected by its stability, as stable polyQ have more interactors than unstable polyQ. Conclusions: Our results show that apart from the sequence of a polyQ, information about its orthologous sequences is needed to assess its function. Codon usage, amino acid context, structural conformation and the protein-protein interaction capacity of polyQ from all studied taxa critically depend on the region stability. There are however some taxa-specific polyQ features that override this importance. We conclude that a taxa-driven evolutionary analysis is of the highest importance for the comprehensive study of any feature of polyglutamine regions.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
23
Mier P, Andrade-Navarro MA. MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments. Evol Bioinform Online 2020; 16:1176934320916199. [PMID: 32425492 PMCID: PMC7218316 DOI: 10.1177/1176934320916199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2020] [Accepted: 03/10/2020] [Indexed: 11/17/2022] Open
Abstract
Multiple sequence alignments are usually phylogenetically driven. They are studied in the framework of evolution. But sometimes, it is interesting to study residue conservation at positions unconstrained by evolutionary rules. We present a supervised method to access a layer of information difficult to appreciate visually when many protein sequences are aligned. This new tool (MAGA; http://cbdm-01.zdv.uni-mainz.de/~munoz/maga/) locates positions in multiple sequence alignments differentially conserved in manually defined groups of sequences.
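One simple way to picture "positions differentially conserved in manually defined groups" is the Python sketch below. It is not MAGA's scoring scheme, which the abstract does not describe. The criterion used here (every group fully conserved at a column, with a different residue in each group) and the name `group_specific_positions` are assumptions chosen for illustration.

```python
def group_specific_positions(alignment, groups):
    """Find alignment columns conserved within each group but differing between groups.

    Hypothetical helper; a deliberately strict stand-in criterion.
    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    groups: dict mapping group label -> list of sequence names.
    Returns a list of (column_index, {group: residue}) with 0-based columns.
    """
    length = len(next(iter(alignment.values())))
    positions = []
    for col in range(length):
        consensus = {}
        for group, names in groups.items():
            residues = {alignment[name][col] for name in names}
            # Require one identical, non-gap residue across the group.
            if len(residues) != 1 or "-" in residues:
                break
            consensus[group] = residues.pop()
        else:
            # All groups are conserved; keep the column only if
            # every group carries a different residue.
            if len(set(consensus.values())) == len(groups):
                positions.append((col, consensus))
    return positions


# Example with two manually defined groups of two sequences each.
aln = {"a1": "MKLA", "a2": "MKLS", "b1": "MRLA", "b2": "MRLT"}
grp = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
hits = group_specific_positions(aln, grp)
```

A practical tool would likely tolerate partial conservation through a threshold rather than demand identity. The strict rule just keeps the sketch short.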
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, Mainz 55128, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, Mainz 55128, Germany
24
Urbanek A, Popovic M, Morató A, Estaña A, Elena-Real CA, Mier P, Fournet A, Allemand F, Delbecq S, Andrade-Navarro MA, Cortés J, Sibille N, Bernadó P. Flanking Regions Determine the Structure of the Poly-Glutamine in Huntingtin through Mechanisms Common among Glutamine-Rich Human Proteins. Structure 2020; 28:733-746.e5. [PMID: 32402249 DOI: 10.1016/j.str.2020.04.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/07/2020] [Revised: 02/18/2020] [Accepted: 04/11/2020] [Indexed: 10/24/2022]
Abstract
The causative agent of Huntington's disease, the poly-Q homo-repeat in the N-terminal region of huntingtin (httex1), is flanked by a 17-residue-long fragment (N17) and a proline-rich region (PRR), which promote and inhibit the aggregation propensity of the protein, respectively, by poorly understood mechanisms. Based on experimental data obtained from site-specifically labeled NMR samples, we derived an ensemble model of httex1 that identified both flanking regions as opposing poly-Q secondary structure promoters. While N17 triggers helicity through a promiscuous hydrogen bond network involving the side chains of the first glutamines in the poly-Q tract, the PRR promotes extended conformations in neighboring glutamines. Furthermore, a bioinformatics analysis of the human proteome showed that these structural traits are present in many human glutamine-rich proteins and that they are more prevalent in proteins with longer poly-Q tracts. Taken together, these observations provide the structural bases to understand previous biophysical and functional data on httex1.
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Alejandro Estaña
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France; LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
- Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire (LBCM-EA4558 Vaccination Antiparasitaire), UFR Pharmacie, Université de Montpellier, 34090 Montpellier, France
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
- Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
- Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
25
Leismann J, Spagnuolo M, Pradhan M, Wacheul L, Vu MA, Musheev M, Mier P, Andrade-Navarro MA, Graille M, Niehrs C, Lafontaine DL, Roignant JY. The 18S ribosomal RNA m6A methyltransferase Mettl5 is required for normal walking behavior in Drosophila. EMBO Rep 2020; 21:e49443. [PMID: 32350990 DOI: 10.15252/embr.201949443] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/08/2019] [Revised: 04/02/2020] [Accepted: 04/07/2020] [Indexed: 11/09/2022] Open
Abstract
RNA modifications have recently emerged as an important layer of gene regulation. N6-methyladenosine (m6A) is the most prominent modification on eukaryotic messenger RNA and has also been found on noncoding RNA, including ribosomal and small nuclear RNA. Recently, several m6A methyltransferases were identified, uncovering the specificity of m6A deposition by structurally distinct enzymes. In order to discover additional m6A enzymes, we performed an RNAi screen to deplete annotated orthologs of human methyltransferase-like proteins (METTLs) in Drosophila cells and identified CG9666, the ortholog of human METTL5. We show that CG9666 is required for specific deposition of m6A on 18S ribosomal RNA via direct interaction with the Drosophila ortholog of human TRMT112, CG12975. Depletion of CG9666 yields a subsequent loss of the 18S rRNA m6A modification, which lies in the vicinity of the ribosome decoding center; however, this does not compromise rRNA maturation. Instead, a loss of CG9666-mediated m6A impacts fly behavior, providing an underlying molecular mechanism for the reported human phenotype in intellectual disability. Thus, our work expands the repertoire of m6A methyltransferases, demonstrates the specialization of these enzymes, and further addresses the significance of ribosomal RNA modifications in gene expression and animal behavior.
Affiliation(s)
- Ludivine Wacheul
- RNA Molecular Biology, ULB Cancer Research Center (U-CRC), Centre for Microscopy and Molecular Imaging (CMMI), Fonds de la Recherche Scientifique (F.R.S.-FNRS), Université Libre de Bruxelles (ULB), Charleroi-Gosselies, Belgium
- Minh Anh Vu
- Institute of Molecular Biology (IMB), Mainz, Germany
- Pablo Mier
- Faculty of Biology, Johannes-Gutenberg Universität Mainz, Mainz, Germany
- Marc Graille
- BIOC, CNRS, Ecole Polytechnique, IP Paris, Palaiseau, France
- Christof Niehrs
- Institute of Molecular Biology (IMB), Mainz, Germany; Division of Molecular Embryology, DKFZ-ZMBH Alliance, Heidelberg, Germany
- Denis Lj Lafontaine
- RNA Molecular Biology, ULB Cancer Research Center (U-CRC), Centre for Microscopy and Molecular Imaging (CMMI), Fonds de la Recherche Scientifique (F.R.S.-FNRS), Université Libre de Bruxelles (ULB), Charleroi-Gosselies, Belgium
- Jean-Yves Roignant
- Institute of Molecular Biology (IMB), Mainz, Germany; Center for Integrative Genomics, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
26
Mier P, Elena-Real C, Urbanek A, Bernadó P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J 2020; 18:306-313. [PMID: 32071707 PMCID: PMC7016039 DOI: 10.1016/j.csbj.2020.01.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/23/2019] [Revised: 12/13/2019] [Accepted: 01/30/2020] [Indexed: 12/18/2022] Open
Abstract
Polyglutamine (polyQ) regions are one of the most prevalent homorepeats in eukaryotes. It is, however, difficult to evaluate their prevalence because various studies report different results. The reason is the lack of a consensus definition of what a polyQ region is. We have tackled this issue by studying how the use of different thresholds (i.e., the minimum number of glutamines required in a protein region of a given size) to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. Threshold definition shapes the length distribution of the polyQ dataset and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ. While leucine is enriched at the N-terminus of polyQ, especially at position -1 (the amino acid preceding the polyQ), proline is prevalent at the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summarized overview of which biases are to be expected per threshold when studying polyQ regions.
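To make the idea of a threshold concrete, the following is a minimal sketch of a window-based polyQ detector, assuming a threshold is expressed as "at least `min_q` glutamines in a window of `window` residues". The function name `find_polyq`, its parameters, and the merge-and-trim behavior are illustrative assumptions, not the authors' published method.

```python
def find_polyq(seq, min_q=8, window=10):
    """Return half-open (start, end) spans of candidate polyQ regions.

    Assumption: a region qualifies when some window of `window` residues
    contains at least `min_q` glutamines (Q). Overlapping or adjacent
    qualifying windows are merged, and each merged span is trimmed so
    that it starts and ends with Q.
    """
    spans = []
    for i in range(len(seq) - window + 1):
        if seq[i:i + window].count("Q") >= min_q:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))

    trimmed = []
    for start, end in spans:
        # Every qualifying window holds at least one Q, so both loops stop.
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        trimmed.append((start, end))
    return trimmed


if __name__ == "__main__":
    protein = "MAAQQQQQQQQQQLPPPPA"
    print(find_polyq(protein, min_q=8, window=10))
    # A looser threshold tolerates an impurity inside the region.
    print(find_polyq("AAQQAQQAA", min_q=4, window=5))
```

Changing `min_q` or `window` changes which regions are reported and how many non-glutamine residues they may contain, which is the threshold dependence the abstract describes.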
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Carlos Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
- Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
- Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
27
Rubio A, Mier P, Andrade-Navarro MA, Garzón A, Jiménez J, Pérez-Pulido AJ. CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats. Database (Oxford) 2020; 2020:baaa088. [PMID: 33206958 PMCID: PMC7673337 DOI: 10.1093/database/baaa088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/21/2020] [Revised: 09/07/2020] [Accepted: 09/10/2020] [Indexed: 12/20/2022]
Abstract
The genomics era is resulting in the generation of a plethora of biological sequences that are usually stored in public databases. There are many computational tools that facilitate the annotation of these sequences, but sometimes they produce mistakes that enter the databases and can be propagated when erroneous data are used for secondary analyses, such as gene prediction or homology searching. While developing a computational gene finder based on protein-coding sequences, we discovered that the reference UniProtKB protein database is contaminated with some spurious sequences translated from DNA containing clustered regularly interspaced short palindromic repeats. We therefore encourage developers of prokaryotic computational gene finders and protein database curators to consider this source of error.
Affiliation(s)
- Alejandro Rubio
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128, Mainz, Germany
- Andrés Garzón
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
- Juan Jiménez
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
- Antonio J Pérez-Pulido
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Ctra. Utrera, Km.1, 41013, Sevilla, Spain
28
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Universite Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
29
Mier P, Andrade-Navarro MA. Toward completion of the Earth's proteome: an update a decade later. Brief Bioinform 2019; 20:463-470. [PMID: 29040399 DOI: 10.1093/bib/bbx127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/12/2017] [Revised: 09/08/2017] [Indexed: 12/13/2022] Open
Abstract
Protein databases are growing steadily, driven by the spread of new, more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the inability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth, mostly because of a focus on increasing the level of annotation of its sequences, its counterpart TrEMBL, much less limited by curation steps, is still in a phase of accelerated growth. However, we predict that before 2020, almost all entries deposited in UniProtKB will be homologous to known proteins. We propose that new sequencing projects can be made more useful if they are directed toward sequencing voids, parts of the tree of life far from already sequenced species or model organisms. We show that these voids are present in the Archaea and Eukarya domains of life. The near certainty that new protein sequence entries will be redundant suggests that most of the protein diversity on Earth has already been described; we estimate this diversity at around 3.75 million proteins, revising down the prediction we made a decade ago.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg, Mainz, Germany
30
Rubio A, Casimiro-Soriguer CS, Mier P, Andrade-Navarro MA, Garzón A, Jimenez J, Pérez-Pulido AJ. AnABlast: Re-searching for Protein-Coding Sequences in Genomic Regions. Methods Mol Biol 2019; 1962:207-214. [PMID: 31020562 DOI: 10.1007/978-1-4939-9173-0_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Abstract
AnABlast is a computational tool that highlights protein-coding regions within intergenic and intronic DNA sequences which escape detection by standard gene prediction algorithms. DNA sequences with small protein-coding genes or exons, complex intron-containing genes, or degenerated DNA fragments are efficiently targeted by AnABlast. Furthermore, this algorithm is particularly useful in detecting protein-coding sequences with nonsignificant homologs to sequences in databases. AnABlast can be executed online at http://www.bioinfocabd.upo.es/anablast/ .
Affiliation(s)
- Alejandro Rubio
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Carlos S Casimiro-Soriguer
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany
- Andrés Garzón
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Juan Jimenez
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
- Antonio J Pérez-Pulido
- Facultad de Ciencias Experimentales (Área de Genética), Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC), Universidad Pablo de Olavide, Sevilla, Spain
31
Mier P, Andrade-Navarro MA. Traitpedia: a collaborative effort to gather species traits. Bioinformatics 2019; 35:1079-1081. [PMID: 30165582 PMCID: PMC6419907 DOI: 10.1093/bioinformatics/bty743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2018] [Revised: 08/17/2018] [Accepted: 08/23/2018] [Indexed: 11/25/2022] Open
Abstract
Summary: Traitpedia is a collaborative database that aims to collect binary traits in tabular form for a growing number of species. Availability and implementation: Traitpedia can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/traitpedia. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
32
Mier P, Pérez-Pulido AJ, Andrade-Navarro MA. Automated selection of homologs to track the evolutionary history of proteins. BMC Bioinformatics 2018; 19:431. [PMID: 30453878 PMCID: PMC6245638 DOI: 10.1186/s12859-018-2457-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/25/2018] [Accepted: 10/31/2018] [Indexed: 11/26/2022] Open
Abstract
Background: The selection of distant homologs of a query protein under study is a common and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitates static browsing for orthologs. However, these resources have limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give an idea of the evolutionary path of the protein as a series of consecutive steps, for example gaining or losing domains. However, a step-wise orthology search is a time-consuming task if the number of steps is high. Results: To illustrate a solution to this problem, we present the web tool ProteinPathTracker, which allows users to track the evolutionary history of a query protein by locating homologs in selected proteomes along several evolutionary paths. Additional functionalities include locking a region of interest to follow its evolution in the discovered homologous sequences and studying the evolution of protein function by analyzing the annotations of the homologs. Conclusions: ProteinPathTracker is an easy-to-use web tool that automates the practice of looking for selected homologs in distant species in a way that is straightforward for non-expert users. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2457-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
33
Alanis-Lobato G, Mier P, Andrade-Navarro M. The latent geometry of the human protein interaction network. Bioinformatics 2018; 34:2826-2834. [PMID: 29635317 PMCID: PMC6084611 DOI: 10.1093/bioinformatics/bty206] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/08/2017] [Revised: 02/16/2018] [Accepted: 04/03/2018] [Indexed: 11/21/2022] Open
Abstract
Motivation: A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results: We embedded the hPIN into hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperbolic plane offers a novel and informative way to visualize proteins and their interactions. We then used these coordinates to compute hyperbolic distances between proteins, which served as likelihood scores for the prediction of plausible protein interactions. Finally, we observed that proteins can efficiently communicate with each other via a greedy routing process, guided by the latent geometry of the hPIN. We show that these efficient communication channels can be used to determine the core members of signal transduction pathways and to study how system perturbations impact their efficiency. Availability and implementation: An R implementation of our network embedder is available at https://github.com/galanisl/NetHypGeom. Also, a web tool for the geometric analysis of the hPIN accompanies this text at http://cbdm-01.zdv.uni-mainz.de/~galanisl/gapi. Supplementary information: Supplementary data are available at Bioinformatics online.
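As a rough illustration of how node coordinates in the hyperbolic plane translate into pairwise distances, here is a minimal sketch, assuming each node has native polar coordinates (r, θ) on a plane of curvature -1 and applying the hyperbolic law of cosines. The function name `hyperbolic_distance` is hypothetical, and this is not the NetHypGeom implementation.

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points given in native polar coordinates.

    Assumptions: curvature -1, radii r >= 0, angles in radians within
    [0, 2*pi). Uses the hyperbolic law of cosines:
        cosh d = cosh r1 * cosh r2 - sinh r1 * sinh r2 * cos(dtheta)
    """
    # Angular separation, folded into [0, pi].
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))
    if dtheta == 0.0:
        return abs(r1 - r2)  # points on the same ray
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(x, 1.0))  # guard against rounding below 1


if __name__ == "__main__":
    # Two nodes on opposite sides of the origin: the geodesic passes
    # through the centre, so the distance is r1 + r2.
    print(hyperbolic_distance(1.0, 0.0, 2.0, math.pi))
```

Under this reading, a smaller distance between two proteins would correspond to a higher score for a candidate interaction, in line with the abstract's use of distances as likelihood scores.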
Affiliation(s)
- Gregorio Alanis-Lobato
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg Universität, Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg Universität, Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
- Miguel Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg Universität, Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
34
Abstract
Amino acid usage in a proteome depends mostly on its taxonomy, as does codon usage in transcriptomes. Here, we explore the level of variation in the codon usage of a specific amino acid, glutamine, in relation to the number of consecutive glutamine residues. We show that CAG triplets are consistently more abundant in short glutamine homorepeats (polyQ, four to eight residues) than in shorter glutamine stretches (one to three residues), leading to the evolutionary growth of the repeat region in a CAG-dependent manner. The length of orthologous polyQ regions is mostly stable in primates, particularly for the short ones. Interestingly, for a given short polyQ, CAG usage is higher in orthologous polyQ regions of unstable length. This indicates that CAG triplets produce the necessary instability for a glutamine stretch to grow. Proteins related to polyQ-associated diseases behave in a more extreme way, with longer glutamine stretches in humans and evolutionarily closer nonhuman primates, and an overall higher CAG usage. In light of our results, we suggest an evolutionary model to explain glutamine codon usage in polyQ regions.
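The comparison of CAG usage across glutamine stretches of different lengths can be sketched as follows. This is a minimal illustration, assuming an in-frame coding sequence and counting only the two glutamine codons (CAA and CAG); the function name `cag_fraction_by_run_length` is hypothetical and the code is not the authors' pipeline.

```python
from collections import defaultdict

GLN_CODONS = {"CAA", "CAG"}  # the two codons that encode glutamine


def cag_fraction_by_run_length(cds):
    """Map each glutamine run length to the fraction of CAG codons in it.

    Assumption: `cds` is an in-frame coding DNA sequence; trailing bases
    that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]

    totals = defaultdict(lambda: [0, 0])  # length -> [CAG count, Gln count]
    run = []
    for codon in codons + [""]:  # the empty sentinel flushes a final run
        if codon in GLN_CODONS:
            run.append(codon)
        elif run:
            totals[len(run)][0] += run.count("CAG")
            totals[len(run)][1] += len(run)
            run = []
    return {n: cag / total for n, (cag, total) in totals.items()}


if __name__ == "__main__":
    # One isolated glutamine (CAA) and one run of four (CAG CAG CAG CAA).
    cds = "ATG" + "CAA" + "GCT" + "CAGCAGCAGCAA" + "TAA"
    print(cag_fraction_by_run_length(cds))
```

Pooling the result over many coding sequences and grouping run lengths into classes (for example one to three versus four to eight) would give the kind of comparison the abstract reports.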
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
35
Brüne D, Andrade-Navarro MA, Mier P. Proteome-wide comparison between the amino acid composition of domains and linkers. BMC Res Notes 2018; 11:117. [PMID: 29426365 PMCID: PMC5807739 DOI: 10.1186/s13104-018-3221-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/04/2017] [Accepted: 02/01/2018] [Indexed: 02/01/2023] Open
Abstract
Objective: Amino acid composition is a sequence feature that has been extensively used to characterize proteomes of many species and protein families. Yet the analysis of the amino acid composition of protein domains and the linkers connecting them has received less attention. Here, we perform both a comprehensive full-proteome amino acid composition analysis and a similar analysis focusing on domains and linkers, to uncover domain- or linker-specific differential amino acid usage patterns. Results: The amino acid composition in the 38 proteomes studied showcases the greater variability found in archaeal and bacterial species compared to eukaryotes. When focusing on domains and linkers, we describe the preferential use of polar residues in linkers and hydrophobic residues in domains. To let any user perform this analysis on a given domain (or set of them), we developed a dedicated R script called RACCOON, which is easy to use and can provide interesting insights into the compositional differences between a domain and its surrounding linkers. Electronic supplementary material: The online version of this article (10.1186/s13104-018-3221-0) contains supplementary material, which is available to authorized users.
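A domain-versus-linker composition comparison of the kind described above can be sketched in a few lines. This is a minimal illustration in Python rather than the RACCOON R script, assuming domain boundaries are supplied as half-open (start, end) spans and treating every residue outside a domain as linker (including the termini, which is a simplification). The function names are hypothetical.

```python
from collections import Counter


def composition(seq):
    """Return the relative frequency of each residue in `seq`."""
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def domain_vs_linker(seq, domains):
    """Return (domain composition, linker composition) for one protein.

    Assumption: `domains` is a list of half-open (start, end) spans;
    residues not covered by any span are counted as linker.
    """
    in_domain = [False] * len(seq)
    for start, end in domains:
        for i in range(start, end):
            in_domain[i] = True
    domain_seq = "".join(aa for aa, d in zip(seq, in_domain) if d)
    linker_seq = "".join(aa for aa, d in zip(seq, in_domain) if not d)
    return composition(domain_seq), composition(linker_seq)


if __name__ == "__main__":
    # Toy protein: a hydrophobic "domain" followed by a polar "linker".
    dom, link = domain_vs_linker("LLVVSSGG", [(0, 4)])
    print(dom)
    print(link)
```

Comparing the two dictionaries residue by residue would show, on real data, whether a given amino acid is used preferentially in domains or in linkers.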
Affiliation(s)
- Daniel Brüne
- Institute of Pharmacy and Molecular Biotechnology, Ruprecht Karls University Heidelberg, 69120, Heidelberg, Germany
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128, Mainz, Germany
36
Abstract
Summary: Homorepeats are low complexity regions consisting of repetitions of a single amino acid residue. There is currently no consensus on the minimum number of residues needed to define a functional homorepeat, nor on whether mismatches are allowed. Here we present dAPE, a web server that helps follow the evolution of homorepeats based on orthology information, using a sensitive but tunable cutoff to help in the identification of emerging homorepeats. Availability and implementation: dAPE can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/polyx. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Mainz, Germany
- To whom correspondence should be addressed.
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Mainz, Germany
37
Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins 2017; 85:709-719. [PMID: 28097686 DOI: 10.1002/prot.25250] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/15/2016] [Revised: 01/05/2017] [Accepted: 01/09/2017] [Indexed: 12/21/2022]
Abstract
Amino acid repeats, or homorepeats, are low complexity protein motifs consisting of tandem repetitions of a single amino acid. Their presence and relative number vary in different proteomes, and some studies have tried to address this variation, proteome by proteome. In this work, we present a full characterization of amino acid homorepeats across evolution. We studied the presence and differential usage of each possible homorepeat in proteomes from various taxonomic groups, using clusters of very similar proteins to eliminate redundancy. The position of each amino acid repeat within proteins and the order of co-occurring amino acid repeats were also addressed. As a result, we present evidence of the uneven evolution of homorepeats, as well as the functional implications of their relative position in proteins. We discuss some of these cases in their taxonomic context. Collectively, our results show evolutionary and positional signals that suggest that homorepeats have biological function, likely creating unspecific protein interactions or modulating specific interactions in a context-dependent manner. In conclusion, our work supports the functional importance of homorepeats and establishes a basis for the study of other low complexity repeats.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, Mainz, 55128, Germany; Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128, Germany
- Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, Mainz, 55128, Germany; Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, Mainz, 55128, Germany; Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128, Germany
38
Abstract
Proteins containing glutamine repeats (polyQ) are known to be structurally unstable. Abnormal expansion of polyQ beyond a certain threshold in some proteins leads to neurodegenerative disease, a symptom of which is the formation of protein aggregates. This has led to extensive research on the structure of polyQ stretches. However, the accumulation of contradictory results suggests that protein context might be of importance. Here we aimed to evaluate the structural context of polyQ regions in proteins by analysing the secondary structure of polyQ proteins and their homologs. The results revealed that the secondary structure in the vicinity of polyQ is predominantly random coil or helix. Importantly, the regions surrounding the polyQ are often not solved in 3D structures. In the few cases where the point of insertion of the polyQ was mapped to a full protein, we observed that it is always located on the surface of the protein. The findings support the hypothesis that polyQ might serve to extend coiled coils at their C-terminus in highly disordered regions involved in protein-protein interactions.
Affiliation(s)
- Franziska Totzeck
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, Mainz, Germany
- Miguel A. Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, Mainz, Germany
- Institute of Molecular Biology, Ackermannweg 4, Mainz, Germany
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, Mainz, Germany
- Institute of Molecular Biology, Ackermannweg 4, Mainz, Germany
39
Mier P, Pérez-Pulido AJ, Reynaud EG, Andrade-Navarro MA. Reading the Evolution of Compartmentalization in the Ribosome Assembly Toolbox: The YRG Protein Family. PLoS One 2017; 12:e0169750. [PMID: 28072865 PMCID: PMC5224878 DOI: 10.1371/journal.pone.0169750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/15/2016] [Accepted: 12/21/2016] [Indexed: 01/07/2023] Open
Abstract
Reconstructing the transition from a single-compartment bacterium to a highly compartmentalized eukaryotic cell is one of the most studied problems of evolutionary cell biology. However, the timing and details of the establishment of compartmentalization are unclear and difficult to assess. Here, we propose the use of molecular markers specific to cellular compartments to set up a framework to advance the understanding of this complex intracellular process. Specifically, we use a protein family related to ribosome biogenesis, YRG (YlqF related GTPases), whose evolution is linked to the establishment of cellular compartments, leveraging the current genomic data. We analyzed orthologous proteins of the YRG family in a set of 171 proteomes for a total of 370 proteins. We identified ten YRG protein subfamilies that can be associated with six subcellular compartments (nuclear bodies, nucleolus, nucleus, cytosol, mitochondria, and chloroplast), and which were found in archaeal, bacterial and eukaryotic proteomes. Our analysis reveals events related to organism streamlining in specific taxonomic groups such as Fungi. We conclude that the YRG family could be used as a compartmentalization marker, which could help to trace the evolutionary path relating cellular compartments with ribosome biogenesis.
Affiliation(s)
- Pablo Mier
- Institute of Molecular Biology (IMB), Faculty of Biology, Johannes-Gutenberg University of Mainz, Mainz, Germany
- Antonio J. Pérez-Pulido
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Sevilla, Spain
- Emmanuel G. Reynaud
- School of Biomolecular and Biomedical Science, University College Dublin, Dublin, Ireland
- Miguel A. Andrade-Navarro
- Institute of Molecular Biology (IMB), Faculty of Biology, Johannes-Gutenberg University of Mainz, Mainz, Germany
|
40
|
Alanis-Lobato G, Mier P, Andrade-Navarro MA. Manifold learning and maximum likelihood estimation for hyperbolic network embedding. Appl Netw Sci 2016; 1:10. [PMID: 30533502 PMCID: PMC6245200 DOI: 10.1007/s41109-016-0013-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 09/01/2016] [Accepted: 10/25/2016] [Indexed: 05/23/2023]
Abstract
The Popularity-Similarity (PS) model holds that clustering and hierarchy, properties common to most networks representing complex systems, are the result of an optimisation process in which nodes seek to form ties, not only with the most connected (popular) system components, but also with those that are similar to them. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract popularity-similarity trade-offs and the formation of scale-free and strongly clustered networks can be accurately described. Current methods for mapping networks to hyperbolic space are based on maximum likelihood estimations or manifold learning. The former approach is very accurate but slow; the latter improves efficiency at the cost of accuracy. Here, we analyse the strengths and limitations of both strategies and assess the advantages of combining them to efficiently embed big networks, allowing for their examination from a geometric perspective. Our evaluations in artificial and real networks support the idea that hyperbolic distance constraints play a significant role in the formation of edges between nodes. This means that challenging problems in network science, like link prediction or community detection, could be more easily addressed under this geometric framework.
Affiliation(s)
- Gregorio Alanis-Lobato
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Faculty of Biology, Johannes Gutenberg Universität, Gresemundweg 2, 55128 Mainz, Germany
- Pablo Mier
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Faculty of Biology, Johannes Gutenberg Universität, Gresemundweg 2, 55128 Mainz, Germany
- Miguel A. Andrade-Navarro
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Faculty of Biology, Johannes Gutenberg Universität, Gresemundweg 2, 55128 Mainz, Germany
|
41
|
Mier P, Alanis-Lobato G, Andrade-Navarro MA. Protein-protein interactions can be predicted using coiled coil co-evolution patterns. J Theor Biol 2016; 412:198-203. [PMID: 27832945 DOI: 10.1016/j.jtbi.2016.11.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/12/2016] [Revised: 10/21/2016] [Accepted: 11/04/2016] [Indexed: 12/29/2022]
Abstract
Protein-protein interactions are sometimes mediated by coiled coil structures. The evolutionary conservation of interacting orthologs in different species, along with the presence or absence of coiled coils in them, may help in the prediction of interacting pairs. Here, we illustrate how the presence of coiled coils in a protein can be exploited as a potential indicator for its interaction with another protein with coiled coils. The prediction capability of our strategy improves when restricting our dataset to highly reliable, known protein-protein interactions. Our study of the co-evolution of coiled coils demonstrates that pairs of interacting proteins can be distinguished from non-interacting pairs by means of their structural information. This hints at the potential of our strategy to predict new protein-protein interactions.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128 Mainz, Germany; Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128 Mainz, Germany; Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg 2, 55128 Mainz, Germany; Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
|
42
|
Alanis-Lobato G, Mier P, Andrade-Navarro MA. Efficient embedding of complex networks to hyperbolic space via their Laplacian. Sci Rep 2016; 6:30108. [PMID: 27445157 PMCID: PMC4957117 DOI: 10.1038/srep30108] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 02/16/2016] [Accepted: 06/29/2016] [Indexed: 11/09/2022] Open
Abstract
The different factors involved in the growth process of complex networks imprint valuable information in their observable topologies. How to exploit this information to accurately predict structural network changes is the subject of active research. A recent model of network growth holds that the emergence of properties common to most complex systems is the result of certain trade-offs between node birth-time and similarity. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract this optimisation process. Current methods for network hyperbolic embedding search for node coordinates that maximise the likelihood that the network was produced by the aforementioned model. Here, a different strategy is followed in the form of the Laplacian-based Network Embedding, a simple yet accurate, efficient and data-driven manifold learning approach, which allows for the quick geometric analysis of big networks. Comparisons against existing embedding and prediction techniques highlight its applicability to network evolution and link prediction.
Affiliation(s)
- Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
|
43
|
Mier P, Andrade-Navarro MA. CABRA: Cluster and Annotate Blast Results Algorithm. BMC Res Notes 2016; 9:253. [PMID: 27129717 PMCID: PMC4851773 DOI: 10.1186/s13104-016-2062-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2016] [Accepted: 04/25/2016] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: Basic local alignment search tool (BLAST) searches are frequently used to look for homologous sequences and to annotate a query protein, but the increasing size of protein databases makes it difficult to review all results from a similarity search. FINDINGS: We developed a web tool called Cluster and Annotate Blast Results Algorithm (CABRA), which enables a rapid BLAST search in a variety of updated reference proteomes, and provides a new way to functionally evaluate the results by the subsequent clustering of the hits and annotation of the clusters. The tool can be accessed from the following web-resource: http://cbdm-01.zdv.uni-mainz.de/~munoz/CABRA. CONCLUSIONS: Cluster and Annotate Blast Results Algorithm simplifies the analysis of the results of a BLAST search by providing an overview of the result's annotations organized in clusters that can be iteratively modified by the user.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, JGU Mainz, Gresemundweg 2, 55128 Mainz, Germany
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Miguel A. Andrade-Navarro
- Faculty of Biology, JGU Mainz, Gresemundweg 2, 55128 Mainz, Germany
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
|
44
|
Abstract
The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automated derived annotations useful to understand their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, putting together very similar protein sequences based on near-full-length similarity and/or high threshold of sequence identity. We compressed 50 reference proteomes, along with the SwissProt database, which we could compress by 74.7%. The clustering algorithm was benchmarked using OrthoBench and compared with FASTA HERDER, a previous version of the algorithm, showing that FastaHerder2 can cluster a set of proteins yielding a high compression, with a lower error rate than its predecessor. We illustrate the use of FastaHerder2 to detect biologically relevant functional features in protein families. With our approach we seek to promote a modern view and usage of the protein sequence databases more appropriate to the postgenomic era.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany; Institute of Molecular Biology, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Mainz, Germany; Institute of Molecular Biology, Mainz, Germany
|
45
|
Muñoz-Centeno MC, Martín-Guevara C, Flores A, Pérez-Pulido AJ, Antúnez-Rodríguez C, Castillo AG, Sanchez-Durán M, Mier P, Bejarano ER. Mpg2 interacts and cooperates with Mpg1 to maintain yeast glycosylation. FEMS Yeast Res 2012; 12:511-20. [PMID: 22416758 DOI: 10.1111/j.1567-1364.2012.00801.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/03/2011] [Revised: 02/10/2012] [Accepted: 03/05/2012] [Indexed: 12/01/2022] Open
Abstract
Using a yeast two-hybrid screen, we isolated a gene from Schizosaccharomyces pombe, whose product interacts with Mpg1, a GDP-mannose-1-phosphate guanylyltransferase involved in the maintenance of cell wall integrity and glycosylation. We have designated this gene mpg2 based on its similarity to Mpg1. Mpg2 is evolutionarily conserved in higher eukaryotes. In the absence of Mpg2, defects in cell growth and sensitivity to hygromycin B are observed. When mpg1 is depleted, the lack of mpg2 causes a synthetic enhancement of the growth defect, the sensitivity to hygromycin B and the cell cycle phenotype previously reported for mpg1 mutant. Finally, Mpg1 overexpression complements the Δmpg2 mutant phenotypes. Taken together, these results indicate that mpg1 and mpg2 function together in glycosylation and septum formation.
Affiliation(s)
- M Cruz Muñoz-Centeno
- Instituto de Hortofruticultura Subtropical y Mediterránea La Mayora, Universidad de Málaga-Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Departamento de Biología Celular, Genética y Fisiología, Universidad de Málaga, Campus Teatinos, Málaga, Spain.
|
46
|
Mier P, Pérez-Pulido AJ. Fungal Smn and Spf30 homologues are mainly present in filamentous fungi and genomes with many introns: implications for spinal muscular atrophy. Gene 2011; 491:135-41. [PMID: 22020225 DOI: 10.1016/j.gene.2011.10.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/19/2011] [Revised: 09/24/2011] [Accepted: 10/02/2011] [Indexed: 10/16/2022]
Abstract
Spinal muscular atrophy is an important rare genetic disease characterized by the loss of motor neurons, where the main gene responsible is smn1. Orthologous genes have only been characterized in a single fungal genome: Schizosaccharomyces pombe. We have searched for putative SMN orthologues in publicly available fungal genomes, finding that they are predominantly present in filamentous fungi. SMN binding partners and the SPF30 SMN paralogue, which are all involved in mRNA splicing, were found to be present in a similar but non-identical subset of fungal genomes. The Saccharomyces cerevisiae yeast genome contains neither smn1 orthologues nor paralogues and it has been suggested that this might be related to the low number of introns in this yeast. Here we have tested this hypothesis by looking at other fungal genomes. Significantly, we find that fungal genomes with high numbers of introns also possess an SMN orthologue or at least its paralogue, SPF30.
Affiliation(s)
- Pablo Mier
- Centro Andaluz de Biología del Desarrollo, CSIC-UPO, Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013 Sevilla, Spain
|
47
|
Mier P, Angst R, Kraume M. Anwendung der Euler-Euler-Methode zur Simulation der Flüssigkeit/Feststoff-Strömung im Strahlschlaufenapparat [Application of the Euler-Euler method to the simulation of liquid/solid flow in a jet loop reactor]. CHEM-ING-TECH 2003. [DOI: 10.1002/cite.200390073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/11/2022]
|
48
|
|
49
|
van der Ven AJ, Mier P, Peters WH, Dolstra H, van Erp PE, Koopmans PP, van der Meer JW. Monochlorobimane does not selectively label glutathione in peripheral blood mononuclear cells. Anal Biochem 1994; 217:41-7. [PMID: 7515598 DOI: 10.1006/abio.1994.1081] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.8] [Indexed: 01/25/2023]
Abstract
Monochlorobimane (MCB) has been used by several investigators as a fluorescent label for quantifying glutathione (GSH) levels in human peripheral blood mononuclear cells (PBMC). This paper describes a biochemical evaluation of this approach. PBMC were incubated with MCB (10-100 microM) and the fluorescence in extracellular medium and cell lysates was measured. Nonlinear curves were obtained in both cases and no "plateau" was reached. The majority of the fluorescence was in the medium. Gel permeation (Sephadex G-25) of the lysate indicated a linear increase in protein-bimane adduct formation, reaching about 50% of the intracellular fluorescence after 1 h. Fractionation of the deproteinized samples with Sephadex G-10 showed that only about one-third of the "low-molecular-weight" fluorescence could be ascribed to GSH-bimane, in either the lysate or the medium. Furthermore, about 40% of the free GSH in lysates appeared unbound even after 1 h of incubation. These data are in line with our observation of an extremely low activity in PBMCs of glutathione S-transferase under the conditions employed. Our findings indicate that many variables influence the cellular fluorescence, including the presence of alternative metabolic pathways for MCB and the rapid excretion of GSH-bimane out of the cell. This lack of specificity limits the value of MCB as a GSH probe for PBMC and confirms earlier suggestions that a careful biochemical evaluation is a prerequisite for its application to any particular cell type.
Affiliation(s)
- A J van der Ven
- Department of Internal Medicine, University Hospital Nijmegen St. Radboud, The Netherlands
|