Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Heringa J. Detection of internal repeats: how common are they? Curr Opin Struct Biol 1998;8:338-45. [PMID: 9666330 DOI: 10.1016/s0959-440x(98)80068-7] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.4] [Indexed: 02/08/2023]

Cited by Other Article(s)

Mesdaghi S, Price RM, Madine J, Rigden DJ. Deep Learning-based structure modelling illuminates structure and function in uncharted regions of β-solenoid fold space. J Struct Biol 2023;215:108010. [PMID: 37544372 DOI: 10.1016/j.jsb.2023.108010] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 08/03/2023] [Indexed: 08/08/2023]

Manasra S, Kajava AV. Why does the first protein repeat often become the only one? J Struct Biol 2023;215:108014. [PMID: 37567371 DOI: 10.1016/j.jsb.2023.108014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 08/06/2023] [Accepted: 08/09/2023] [Indexed: 08/13/2023]

Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families. PLoS Comput Biol 2021;17:e1008798. [PMID: 33857128 PMCID: PMC8078820 DOI: 10.1371/journal.pcbi.1008798] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/03/2020] [Revised: 04/27/2021] [Accepted: 02/15/2021] [Indexed: 12/18/2022]

Paladin L, Bevilacqua M, Errigo S, Piovesan D, Mičetić I, Necci M, Monzon AM, Fabre ML, Lopez JL, Nilsson JF, Rios J, Menna PL, Cabrera M, Buitron MG, Kulik MG, Fernandez-Alberti S, Fornasari MS, Parisi G, Lagares A, Hirsh L, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Res 2021;49:D452-D457. [PMID: 33237313 PMCID: PMC7778985 DOI: 10.1093/nar/gkaa1097] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/15/2020] [Revised: 10/17/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022]

Affiliation(s)

Lisanna Paladin Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Martina Bevilacqua Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Sara Errigo Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Damiano Piovesan Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Ivan Mičetić Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Marco Necci Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Alexander Miguel Monzon Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
Maria Laura Fabre IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
Jose Luis Lopez IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
Juliet F Nilsson IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
Javier Rios Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Pablo Lorenzano Menna Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Maia Cabrera Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Martin Gonzalez Buitron Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Mariane Gonçalves Kulik Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
Sebastian Fernandez-Alberti Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Maria Silvina Fornasari Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Gustavo Parisi Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
Antonio Lagares IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
Layla Hirsh Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
Miguel A Andrade-Navarro Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
Andrey V Kajava Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Univ. Montpellier, Montpellier, France
Silvio C E Tosatto Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy

Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019;47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022]

Affiliation(s)

Ole K Tørresen Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
Bastiaan Star Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
Pablo Mier Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
Miguel A Andrade-Navarro Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
Alex Bateman European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton. CB10 1SD, UK
Patryk Jarnot Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
Aleksandra Gruca Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
Marcin Grynberg Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
Andrey V Kajava Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Universite Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France; Institut de Biologie Computationnelle, 34095 Montpellier, France
Vasilis J Promponas Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
Maria Anisimova Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
Kjetill S Jakobsen Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
Dirk Linke Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway

Purcell O, Cao J, Müller IE, Chen YC, Lu TK. Artificial Repeat-Structured siRNA Precursors as Tunable Regulators for Saccharomyces cerevisiae. ACS Synth Biol 2018;7:2403-2412. [PMID: 30176724 DOI: 10.1021/acssynbio.8b00185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/19/2023]

Jelovic AM, Mitic NS, Eshafah S, Beljanski MV. Finding Statistically Significant Repeats in Nucleic Acids and Proteins. J Comput Biol 2017;25:375-387. [PMID: 29272145 DOI: 10.1089/cmb.2017.0046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]

Islam Z, Nagampalli RSK, Fatima MT, Ashraf GM. New paradigm in ankyrin repeats: Beyond protein-protein interaction module. Int J Biol Macromol 2017;109:1164-1173. [PMID: 29157912 DOI: 10.1016/j.ijbiomac.2017.11.101] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/08/2017] [Revised: 11/13/2017] [Accepted: 11/16/2017] [Indexed: 01/06/2023]

Kharrat N, Belmabrouk S, Abdelhedi R, Benmarzoug R, Assidi M, Al Qahtani MH, Rebai A. Screening for clusters of charge in human virus proteomes. BMC Genomics 2016;17:758. [PMID: 27766959 PMCID: PMC5073957 DOI: 10.1186/s12864-016-3086-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]

Abstract

Background

The identification of charge clusters (runs of charged residues) in proteins and their mapping within the protein structure sequence is an important step toward a comprehensive analysis of how these particular motifs mediate, via electrostatic interactions, various molecular processes such as protein sorting, translocation, docking, orientation and binding to DNA and to other proteins. Few algorithms that specifically identify these charge clusters have been designed and described in the literature. In this study, 197 distinctive human viral proteomes were screened for the occurrence of charge clusters (CC) using a new computational approach.

Results

Three hundred and seventy three CC have been identified within the 2549 viral protein sequences screened. The number of protein sequences that are CC-free is 2176 (85.3%), while 150 and 180 proteins contained positive charge clusters (PCC) and negative charge clusters (NCC), respectively. The NCCs (211 detected) were more prevalent than PCCs (162). PCC-containing proteins are significantly longer than those having NCCs (p = 2 × 10⁻¹⁶). The most prevalent virus families having PCC and NCC were Herpesviridae followed by Papillomaviridae. However, the single-strand RNA group has on average three times more NCC than PCC. According to the functional domain classification, a significant difference in distribution was observed between PCC and NCC (p = 2 × 10⁻⁸), with NCCs occurring more frequently in the C-terminal region while PCCs more often fall within functional domains. Only 29 protein sequences contained both NCC and PCC. Moreover, 101 NCC were conserved in 84 proteins while only 62 PCC were conserved in 60 protein sequences. To understand the mechanism by which the membrane translocation functionalities are embedded in viral proteins, we screened our PCC for sequences corresponding to cell-penetrating peptides (CPPs) using two online databases: CellPPd and CPPpred. We found that all our PCCs, with lengths varying from 7 to 30 amino acids, were predicted as CPPs. Experimental validation is required to improve our understanding of the role of these PCCs in the viral infection process.

Conclusions

Screening distinctive charge clusters in viral proteomes suggested a functional role for these protein regions and might provide potential clues to improve the current understanding of viral diseases in order to tailor better preventive and therapeutic approaches.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-3086-3) contains supplementary material, which is available to authorized users.

In search of the boundary between repetitive and non-repetitive protein sequences. Biochem Soc Trans 2016;43:807-11. [PMID: 26517886 DOI: 10.1042/bst20150073] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]

Richard FD, Alves R, Kajava AV. Tally: a scoring tool for boundary determination between repetitive and non-repetitive protein sequences. Bioinformatics 2016;32:1952-8. [PMID: 27153701 DOI: 10.1093/bioinformatics/btw118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/26/2015] [Accepted: 02/25/2016] [Indexed: 12/23/2022]

Do Viet P, Roche DB, Kajava AV. TAPO: A combined method for the identification of tandem repeats in protein structures. FEBS Lett 2015;589:2611-9. [PMID: 26320412 DOI: 10.1016/j.febslet.2015.08.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/29/2015] [Revised: 08/10/2015] [Accepted: 08/13/2015] [Indexed: 10/23/2022]

Jernigan KK, Bordenstein SR. Tandem-repeat protein domains across the tree of life. PeerJ 2015;3:e732. [PMID: 25653910 PMCID: PMC4304861 DOI: 10.7717/peerj.732] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 11/02/2014] [Accepted: 12/29/2014] [Indexed: 12/19/2022]

Richard FD, Kajava AV. TRDistiller: A rapid filter for enrichment of sequence datasets with proteins containing tandem repeats. J Struct Biol 2014;186:386-91. [DOI: 10.1016/j.jsb.2014.03.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/10/2013] [Revised: 03/14/2014] [Accepted: 03/17/2014] [Indexed: 10/25/2022]

Detection, characterization and evolution of internal repeats in Chitinases of known 3-D structure. PLoS One 2014;9:e91915. [PMID: 24637574 PMCID: PMC3956812 DOI: 10.1371/journal.pone.0091915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/11/2013] [Accepted: 02/17/2014] [Indexed: 11/24/2022]

María Velasco A, Becerra A, Hernández-Morales R, Delaye L, Jiménez-Corona ME, Ponce-de-Leon S, Lazcano A. Low complexity regions (LCRs) contribute to the hypervariability of the HIV-1 gp120 protein. J Theor Biol 2013;338:80-6. [PMID: 24021867 DOI: 10.1016/j.jtbi.2013.08.039] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 01/10/2013] [Revised: 08/01/2013] [Accepted: 08/31/2013] [Indexed: 01/27/2023]

Abstract

Low complexity regions (LCRs) are sequences of nucleic acids or proteins defined by a compositional bias. Their occurrence has been confirmed in sequences of the three cellular lineages (Bacteria, Archaea and Eucarya), and has also been reported in viral genomes. We present here the results of a detailed computer analysis of the LCRs present in the HIV-1 glycoprotein 120 (gp120) encoded by the viral gene env. The analysis was performed using a sample of 3637 Env polyprotein sequences derived from 4117 completely sequenced and translated HIV-1 genomes available in public databases as of December 2012. We have identified 1229 LCRs located in four different regions of the gp120 protein that correspond to four of the five regions that have been identified as hypervariable (V1, V2, V4 and V5). The remaining 29 LCRs are found in the signal peptide and in the conserved regions C2, C3, C4 and C5. No LCR has been identified in the hypervariable region V3. The LCRs detected in the V1, V2, V4, and V5 hypervariable regions exhibit a high Asn content in their amino acid composition, which very likely corresponds to glycosylation sites that may contribute to the retroviral ability to avoid the immune system. In sharp contrast with what is observed in gp120 proteins lacking LCRs, the glycosylation sites present in LCRs tend to be clustered towards the center of the region, forming well-defined islands. The results presented here suggest that LCRs represent a hitherto undescribed source of genomic variability in lentivirus, and that these repeats may represent an important source of antigenic variation in HIV-1 populations. The results reported here may exemplify the evolutionary processes that may have increased the size of primitive cellular RNA genomes and the role of LCRs as a source of raw material during the processes of evolutionary acquisition of new functions.

Kajava AV. Tandem repeats in proteins: from sequence to structure. J Struct Biol 2011;179:279-88. [PMID: 21884799 DOI: 10.1016/j.jsb.2011.08.009] [Citation(s) in RCA: 159] [Impact Index Per Article: 12.2] [Received: 06/20/2011] [Revised: 08/15/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]

Babu V, Uthayakumar M, Kirti Vaishnavi M, Senthilkumar R, Shankar M, Archana C, Sathya Priya S, Sekar K. RPS: Repeats in Protein Sequences. J Appl Crystallogr 2011. [DOI: 10.1107/s0021889811009393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Jorda J, Xue B, Uversky VN, Kajava AV. Protein tandem repeats - the more perfect, the less structured. FEBS J 2010;277:2673-82. [PMID: 20553501 DOI: 10.1111/j.1742-464x.2010.07684.x] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 11/30/2022]

Jorda J, Xue B, Uversky VN, Kajava AV. Protein tandem repeats - the more perfect, the less structured. FEBS J 2010. [DOI: 10.1111/j.1742-4658.2010.07684.x] [Citation(s) in RCA: 104] [Impact Index Per Article: 7.4] [Indexed: 01/11/2023]

Naamati G, Fromer M, Linial M. Expansion of tandem repeats in sea anemone Nematostella vectensis proteome: A source for gene novelty? BMC Genomics 2009;10:593. [PMID: 20003297 PMCID: PMC2805694 DOI: 10.1186/1471-2164-10-593] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/15/2009] [Accepted: 12/10/2009] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The complete proteome of the starlet sea anemone, Nematostella vectensis, provides insights into gene invention dating back to the Cnidarian-Bilaterian ancestor. With the addition of the complete proteomes of Hydra magnipapillata and Monosiga brevicollis, the investigation of proteins having unique features in early metazoan life has become practical. We focused on the properties and the evolutionary trends of tandem repeat (TR) sequences in Cnidaria proteomes.

RESULTS

We found that 11-16% of N. vectensis proteins contain tandem repeats. Most TRs cover 150 amino acid segments that are comprised of basic units of 5-20 amino acids. In total, the N. vectensis proteome has about 3300 unique TR-units, but only a small fraction of them are shared with H. magnipapillata, M. brevicollis, or mammalian proteomes. The overall abundance of these TRs stands out relative to that of 14 proteomes representing the diversity among eukaryotes and within the metazoan world. TR-units are characterized by a unique composition of amino acids, with cysteine and histidine being over-represented. Structurally, most TR-segments are associated with coiled and disordered regions. Interestingly, 80% of the TR-segments can be read in more than one open reading frame. For over 100 of them, translation of the alternative frames would result in long proteins. Most domain families that are characterized as repeats in eukaryotes are found in the TR-proteomes from Nematostella and Hydra.

CONCLUSIONS

While most TR-proteins have originated from prediction tools and are still awaiting experimental validations, supportive evidence exists for hundreds of TR-units in Nematostella. The existence of TR-proteins in early metazoan life may have served as a robust mode for novel genes with previously overlooked structural and functional characteristics.

Sandhya S, Rani SS, Pankaj B, Govind MK, Offmann B, Srinivasan N, Sowdhamini R. Length variations amongst protein domain superfamilies and consequences on structure and function. PLoS One 2009;4:e4981. [PMID: 19333395 PMCID: PMC2659687 DOI: 10.1371/journal.pone.0004981] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 12/14/2008] [Accepted: 02/26/2009] [Indexed: 11/24/2022]

Abstract

Background

Related protein domains of a superfamily can be specified by proteins of diverse lengths. The structural and functional implications of indels in a domain scaffold have been examined.

Methodology

In this study, domain superfamilies with large length variations (more than 30% difference from average domain size, referred to as ‘length-deviant’ superfamilies) and ‘length-rigid’ domain superfamilies (<10% length difference from average domain size) were analyzed for the functional impact of such structural differences. Our delineated dataset, derived from an objective algorithm, enables us to address indel roles in the presence of peculiar structural repeats, functional variation, protein-protein interactions and to examine ‘domain contexts’ of proteins tolerant to large length variations. Amongst the top-10 length-deviant superfamilies analyzed, we found that 80% of length-deviant superfamilies possess distant internal structural repeats and nearly half of them acquired diverse biological functions. In general, length-deviant superfamilies have a higher chance than length-rigid superfamilies of being engaged in internal structural repeats. We also found that ∼40% of length-deviant domains exist as multi-domain proteins involving interactions with domains from the same or other superfamilies. Indels, in diverse domain superfamilies, were found to participate in the accretion of structural and functional features amongst related domains. With specific examples, we discuss how indels are involved directly or indirectly in the generation of oligomerization interfaces, introduction of substrate specificity, regulation of protein function and stability.

Conclusions

Our data suggest a multitude of roles for indels that are specialized for domain members of different domain superfamilies. These specialist roles that we observe and trends in the extent of length variation could influence decision making in modeling of new superfamily members. Likewise, the observed limits of length variation, specific for each domain superfamily, would be particularly relevant in the choice of alignment length search filters commonly applied in protein sequence analysis.

Sarani R, Udayaprakash NA, Subashini R, Mridula P, Yamane T, Sekar K. Large cryptic internal sequence repeats in protein structures from Homo sapiens. J Biosci 2009;34:103-12. [DOI: 10.1007/s12038-009-0012-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]

Wagner H, Morgenstern B, Dress A. Stability of multiple alignments and phylogenetic trees: an analysis of ABC-transporter proteins family. Algorithms Mol Biol 2008;3:15. [PMID: 18990223 PMCID: PMC2637874 DOI: 10.1186/1748-7188-3-15] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/30/2008] [Accepted: 11/06/2008] [Indexed: 11/17/2022]

Simossis V, Kleinjung J, Heringa J. An overview of multiple sequence alignment. CURRENT PROTOCOLS IN BIOINFORMATICS 2008;Chapter 3:3.7.1-3.7.26. [PMID: 18428699 DOI: 10.1002/0471250953.bi0307s03] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]

Cheng H, Kim BH, Grishin NV. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs. Proteins 2008;70:1162-6. [PMID: 17932926 DOI: 10.1002/prot.21783] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]

Barney BM. Classification of proteins based on minimal modular repeats: lessons from nature in protein design. J Proteome Res 2007;5:473-82. [PMID: 16512661 DOI: 10.1021/pr050103m] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]

Laskin AA, Skryabin KG, Korotkov EV. Latent Periodicity of Protein Families, Identified with the Indel-Aware Algorithm. J Proteome Res 2007;6:862-8. [PMID: 17269743 DOI: 10.1021/pr0603203] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]

Achmüller C, Werther F, Wechner P, Auer B. Synthesis of genes with multiple identical domains. Biotechniques 2007;42:43-4, 46. [PMID: 17269484 DOI: 10.2144/000112313] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]

Turutina VP, Laskin AA, Kudryashov NA, Skryabin KG, Korotkov EV. Identification of amino acid latent periodicity within 94 protein families. J Comput Biol 2006;13:946-64. [PMID: 16761920 DOI: 10.1089/cmb.2006.13.946] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]

Morgenstern B, Prohaska SJ, Pöhler D, Stadler PF. Multiple sequence alignment with user-defined anchor points. Algorithms Mol Biol 2006;1:6. [PMID: 16722533 PMCID: PMC1481597 DOI: 10.1186/1748-7188-1-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 02/15/2006] [Accepted: 04/19/2006] [Indexed: 11/15/2022]

Turutina VP, Laskin AA, Kudryashov NA, Skryabin KG, Korotkov EV. Identification of latent periodicity in amino acid sequences of protein families. BIOCHEMISTRY. BIOKHIMIIA 2006;71:18-31. [PMID: 16457614 DOI: 10.1134/s0006297906010032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/06/2023]

Fadiel A, Eichenbaum KD, Hamza A. 'Genomemark': detecting word periodicity in biological sequences. J Biomol Struct Dyn 2005;23:457-64. [PMID: 16363880 DOI: 10.1080/07391102.2006.10507071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]

Sivaraja V, Kumar TKS, Leena PST, Chang AN, Vidya C, Goforth RL, Rajalingam D, Arvind K, Ye JL, Chou J, Henry R, Yu C. Three-dimensional solution structures of the chromodomains of cpSRP43. J Biol Chem 2005;280:41465-71. [PMID: 16183644 DOI: 10.1074/jbc.m507077200] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]

Laskin AA, Kudryashov NA, Skryabin KG, Korotkov EV. Latent periodicity of serine-threonine and tyrosine protein kinases and other protein families. Comput Biol Chem 2005;29:229-43. [PMID: 15979043 DOI: 10.1016/j.compbiolchem.2005.04.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/27/2004] [Revised: 04/18/2005] [Accepted: 04/18/2005] [Indexed: 11/22/2022]

Cheng H, Grishin NV. DOM-fold: a structure with crossing loops found in DmpA, ornithine acetyltransferase, and molybdenum cofactor-binding domain. Protein Sci 2005;14:1902-10. [PMID: 15937278 PMCID: PMC2253344 DOI: 10.1110/ps.051364905] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]

Murray KB, Taylor WR, Thornton JM. Toward the detection and validation of repeats in protein structure. Proteins 2005;57:365-80. [PMID: 15340924 DOI: 10.1002/prot.20202] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/11/2022]

Fadiel A, Lithwick S, Ganji G, Scherer SW. Remarkable sequence signatures in archaeal genomes. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2005;1:185-90. [PMID: 15803664 PMCID: PMC2685567 DOI: 10.1155/2003/458235] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]

Laskin AA, Kudryashov NA, Skryabin KG, Korotkov EV. Latent Periodicity of Serine/Threonine and Tyrosine Protein Kinases and Other Protein Families. Mol Biol 2005. [DOI: 10.1007/s11008-005-0052-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

Wong JL, Wessel GM. Major components of a sea urchin block to polyspermy are structurally and functionally conserved. Evol Dev 2005;6:134-53. [PMID: 15099301 DOI: 10.1111/j.1525-142x.2004.04019.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]

Mosavi LK, Cammett TJ, Desrosiers DC, Peng ZY. The ankyrin repeat as molecular architecture for protein recognition. Protein Sci 2005;13:1435-48. [PMID: 15152081 PMCID: PMC2279977 DOI: 10.1110/ps.03554604] [Citation(s) in RCA: 638] [Impact Index Per Article: 33.6] [Indexed: 12/19/2022]

Shih ESC, Hwang MJ. Alternative alignments from comparison of protein structures. Proteins 2004;56:519-27. [PMID: 15229884 DOI: 10.1002/prot.20124] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]

Freiberg A, Machner MP, Pfeil W, Schubert WD, Heinz DW, Seckler R. Folding and stability of the leucine-rich repeat domain of internalin B from Listeria monocytogenes. J Mol Biol 2004;337:453-61. [PMID: 15003459 DOI: 10.1016/j.jmb.2004.01.044] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 09/23/2003] [Revised: 01/08/2004] [Accepted: 01/23/2004] [Indexed: 11/26/2022]

Bondareva AA, Schmidt EE. Early vertebrate evolution of the TATA-binding protein, TBP. Mol Biol Evol 2003;20:1932-9. [PMID: 12885957 PMCID: PMC2577151 DOI: 10.1093/molbev/msg205] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]

Mosavi LK, Williams S, Peng ZY. Equilibrium folding and stability of myotrophin: a model ankyrin repeat protein. J Mol Biol 2002;320:165-70. [PMID: 12079376 DOI: 10.1016/s0022-2836(02)00441-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.2] [Indexed: 10/27/2022]

Heringa J. Local weighting schemes for protein multiple sequence alignment. COMPUTERS & CHEMISTRY 2002;26:459-77. [PMID: 12144176 DOI: 10.1016/s0097-8485(02)00008-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/26/2022]

George RA, Heringa J. SnapDRAGON: a method to delineate protein structural domains from sequence data. J Mol Biol 2002;316:839-51. [PMID: 11866536 DOI: 10.1006/jmbi.2001.5387] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.1] [Indexed: 11/22/2022]

Murray KB, Gorse D, Thornton JM. Wavelet transforms for the characterization and detection of repeating motifs. J Mol Biol 2002;316:341-63. [PMID: 11851343 DOI: 10.1006/jmbi.2001.5332] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.4] [Indexed: 11/22/2022]

Kajava AV. Review: proteins with repeated sequence--structural prediction and modeling. J Struct Biol 2001;134:132-44. [PMID: 11551175 DOI: 10.1006/jsbi.2000.4328] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.1] [Indexed: 11/22/2022]

Heger A, Holm L. Rapid automatic detection and alignment of repeats in protein sequences. Proteins 2000. [DOI: 10.1002/1097-0134(20001101)41:2<224::aid-prot70>3.0.co;2-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 12/24/2022]