Reference Citation Analysis

For: Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR. Long-term trends in evolution of indels in protein sequences. BMC Evol Biol 2007;7:19. [PMID: 17298668 PMCID: PMC1805498 DOI: 10.1186/1471-2148-7-19] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 01/02/2007] [Accepted: 02/13/2007] [Indexed: 01/28/2023]

Cited by Other Article(s)

Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024;41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024]

Tanoz I, Timsit Y. Protein Fold Usages in Ribosomes: Another Glance to the Past. Int J Mol Sci 2024;25:8806. [PMID: 39201491 PMCID: PMC11354259 DOI: 10.3390/ijms25168806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/02/2024]

Abstract

The analysis of protein fold usage, similar to codon usage, offers profound insights into the evolution of biological systems and the origins of modern proteomes. While previous studies have examined fold distribution in modern genomes, our study focuses on the comparative distribution and usage of protein folds in ribosomes across bacteria, archaea, and eukaryotes. We identify the prevalence of certain 'super-ribosome folds,' such as the OB fold in bacteria and the SH3 domain in archaea and eukaryotes. The observed protein fold distribution in the ribosomes foreshadows the later power-law distribution, in which only a few folds are highly prevalent and most are rare. Additionally, we highlight the presence of three copies of proto-Rossmann folds in ribosomes across all kingdoms, indicating their ancient and fundamental role in ribosomal structure and function. Our study also explores early mechanisms of molecular convergence, where different protein folds bind equivalent ribosomal RNA structures in ribosomes across different kingdoms. This comparative analysis enhances our understanding of ribosomal evolution, particularly the distinct evolutionary paths of the large and small subunits, and underscores the complex interplay between RNA and protein components in the transition from the RNA world to modern cellular life. Transcending the concept of folds also makes it possible to group a large number of ribosomal proteins into five categories of urfolds or metafolds, which could attest to their ancestral character and common origins. This work also demonstrates that the gradual acquisition of extensions by simple but ordered folds constitutes an inexorable evolutionary mechanism. This observation supports the idea that simple but structured ribosomal proteins preceded the development of their disordered extensions.

Jilani M, Turcan A, Haspel N, Jagodzinski F. Elucidating the Structural Impacts of Protein InDels. Biomolecules 2022;12:1435. [PMID: 36291643 PMCID: PMC9599607 DOI: 10.3390/biom12101435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/12/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 09/17/2023]

Huded AKC, Jingade P, Mishra MK, Ercisli S, Ilhan G, Marc RA, Vodnar D. Comparative genomic analysis and phylogeny of NAC25 gene from cultivated and wild Coffea species. Front Plant Sci 2022;13:1009733. [PMID: 36186041 PMCID: PMC9523601 DOI: 10.3389/fpls.2022.1009733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 08/30/2022] [Indexed: 06/16/2023]

Abstract

Coffee is a high-value agricultural commodity grown in about 80 countries. Sustainable coffee cultivation is hampered by multiple biotic and abiotic stress conditions predominantly driven by climate change. The NAC proteins are plant-specific transcription factors associated with various physiological functions in plants, which include cell division, secondary wall formation, formation of the shoot apical meristem, leaf senescence, flowering, and embryo and seed development. They are also involved in biotic and abiotic stress regulation. Due to their ubiquitous influence, studies on NAC transcription factors have gained momentum in different crop plant species. In the present study, a NAC25-like transcription factor was isolated and characterized from two cultivated coffee species, Coffea arabica and Coffea canephora, and five Indian wild coffee species for the first time. The full-length NAC25 gene varied from 2,456 bp in Coffea jenkinsii to 2,493 bp in C. arabica. In all seven coffee species, sequencing of the NAC25 gene revealed 3 exons and 2 introns. The NAC25 gene is characterized by a highly conserved 377 bp NAM domain (N-terminus) and a highly variable C-terminal region. The sequence analysis revealed an average of one SNP per 40.92 bp in the coding region and one per 37.7 bp in the intronic region. Further, the non-synonymous SNPs are 8-11 fold higher compared to synonymous SNPs in the non-coding and coding region of the NAC25 gene, respectively. The expression of the NAC25 gene was studied in six different tissue types in C. canephora, and higher expression levels were observed in leaf and flower tissues. Further, the relative expression of NAC25 in comparison with the GAPDH gene revealed four-fold and eight-fold increases in expression levels in green fruit and ripe fruit, respectively. The evolutionary relationship revealed the independent evolution of the NAC25 gene in coffee.

Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021;38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/27/2022]

Ali S, Liu X, Sen L, Lan D, Wang J, Hassan MI, Wang Y. Sequence and structure-based method to predict diacylglycerol lipases in protein sequence. Int J Biol Macromol 2021;182:455-463. [PMID: 33836195 DOI: 10.1016/j.ijbiomac.2021.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 03/30/2021] [Accepted: 04/03/2021] [Indexed: 11/17/2022]

Gangi Setty T, Sarkar A, Coombes D, Dobson RCJ, Subramanian R. Structure and Function of N-Acetylmannosamine Kinases from Pathogenic Bacteria. ACS Omega 2020;5:30923-30936. [PMID: 33324800 PMCID: PMC7726757 DOI: 10.1021/acsomega.0c03699] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/02/2020] [Accepted: 10/20/2020] [Indexed: 06/12/2023]

Abstract

Several pathogenic bacteria import and catabolize sialic acids as a source of carbon and nitrogen. Within the sialic acid catabolic pathway, the enzyme N-acetylmannosamine kinase (NanK) catalyzes the phosphorylation of N-acetylmannosamine to N-acetylmannosamine-6-phosphate. This kinase belongs to the ROK superfamily of enzymes, which generally contain a conserved zinc-finger (ZnF) motif that is important for their structure and function. Previous structural studies have shown that the ZnF motif is absent in NanK of Fusobacterium nucleatum (Fn-NanK), a Gram-negative bacterium that causes the gum disease gingivitis. However, the effect of the loss of the ZnF motif on the kinase activity is unknown. Using kinetic and thermodynamic studies, we have studied the functional properties of Fn-NanK toward its substrates ManNAc and ATP, and compared its activity with other ZnF motif-containing NanK enzymes from closely related Gram-negative pathogenic bacteria Haemophilus influenzae (Hi-NanK), Pasteurella multocida (Pm-NanK), and Vibrio cholerae (Vc-NanK). Our studies show a 10-fold decrease in substrate binding affinity in Fn-NanK (apparent K_M ≈ 700 μM) relative to ZnF motif-containing NanKs (apparent K_M ≈ 60 μM). To understand the structural features that combat the loss of the ZnF motif in Fn-NanK, we solved the crystal structures of functionally homologous ZnF motif-containing NanKs from P. multocida and H. influenzae. Here, we report Pm-NanK:unliganded, Pm-NanK:AMPPNP, Pm-NanK:ManNAc, Hi-NanK:ManNAc, and Hi-NanK:ManNAc-6P:ADP crystal structures. Structural comparisons of Fn-NanK with Hi-NanK, Pm-NanK, and hMNK (human N-acetylmannosamine kinase domain of UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase, GNE) show that even though there is less sequence identity, they have a high degree of structural similarity. Furthermore, our structural analyses highlight that in Fn-NanK the ZnF motif is substituted by a set of hydrophobic residues, which form a hydrophobic cluster that helps the proper orientation of ManNAc in the active site. In summary, ZnF-containing and ZnF-lacking NanK enzymes from different Gram-negative pathogenic bacteria are functionally very similar but differ in their metal requirement. Our structural studies unveil the structural modifications in Fn-NanK that compensate for the loss of the ZnF motif in comparison to other NanK enzymes.
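The fold difference quoted in this abstract follows from the two apparent K_M values it reports. A minimal sketch of that arithmetic is given here; the function name `fold_change` is illustrative and does not come from the paper, and apparent K_M is treated only as the proxy for binding affinity that the abstract itself uses.

```python
def fold_change(km_test_um: float, km_reference_um: float) -> float:
    """Ratio of two apparent K_M values expressed in the same unit (here μM)."""
    if km_reference_um <= 0:
        raise ValueError("reference K_M must be positive")
    return km_test_um / km_reference_um


# Rounded apparent K_M values quoted in the abstract:
# Fn-NanK ≈ 700 μM, ZnF motif-containing NanKs ≈ 60 μM.
ratio = fold_change(700.0, 60.0)
print(f"{ratio:.1f}-fold higher apparent K_M for Fn-NanK")
```

With these rounded inputs the ratio is about 11.7, in line with the order-of-magnitude "10-fold" figure stated in the abstract.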

Wu D, Liu A, Qu X, Liang J, Song M. Genome-wide identification, and phylogenetic and expression profiling analyses, of XTH gene families in Brassica rapa L. and Brassica oleracea L. BMC Genomics 2020;21:782. [PMID: 33176678 PMCID: PMC7656703 DOI: 10.1186/s12864-020-07153-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/18/2020] [Accepted: 10/14/2020] [Indexed: 12/26/2022]

Abstract

BACKGROUND

Xyloglucan endotransglucosylase/hydrolase genes (XTHs) are a multigene family and play key roles in regulating cell wall extensibility in plant growth and development. Brassica rapa and Brassica oleracea contain XTHs, but detailed identification and characterization of the XTH family in these species, and analysis of their tissue expression profiles, have not previously been carried out.

RESULTS

In this study, 53 and 38 XTH genes were identified in B. rapa and B. oleracea respectively, which contained some novel members not observed in previous studies. All XTHs of B. rapa, B. oleracea and Arabidopsis thaliana could be classified into three groups, Group I/II, III and the Early diverging group, based on phylogenetic relationships. Gene structures and motif patterns were similar within each group. All XTHs in this study contained two characteristic conserved domains (Glyco_hydro and XET_C). XTHs are located mainly in the cell wall but some are also located in the cytoplasm. Analyses of the mechanisms of gene family expansion revealed that whole-genome triplication (WGT) events and tandem duplication (TD) may have been the major mechanisms accounting for the expansion of the XTH gene family. Interestingly, TD genes all belonged to Group I/II, suggesting that TD was the main reason for the largest number of genes being in these groups. B. oleracea had lost more of the XTH genes, the conserved domain XET_C and the conserved active-site motif EXDXE compared with B. rapa, consistent with asymmetrical evolution between the two Brassica genomes. A majority of XTH genes exhibited different tissue-specific expression patterns based on RNA-seq data analyses. Moreover, there was differential expression of duplicated XTH genes in the two species, indicating that their functional differentiation occurred after B. rapa and B. oleracea diverged from a common ancestor.

CONCLUSIONS

We carried out the first systematic analysis of XTH gene families in B. rapa and B. oleracea. The results of this investigation can be used for reference in further studies on the functions of XTH genes and the evolution of this multigene family.

Liang Z, Li M, Liu Z, Wang J. Genome-wide identification and characterization of the Hsp70 gene family in allopolyploid rapeseed (Brassica napus L.) compared with its diploid progenitors. PeerJ 2019;7:e7511. [PMID: 31497395 PMCID: PMC6707343 DOI: 10.7717/peerj.7511] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/14/2019] [Accepted: 07/17/2019] [Indexed: 11/27/2022]

Thomas BT, Ogunkanmi LA, Iwalokun BA, Popoola OD. Transition-transversion mutations in the polyketide synthase gene of Aspergillus section Nigri. Heliyon 2019;5:e01881. [PMID: 31338447 PMCID: PMC6579908 DOI: 10.1016/j.heliyon.2019.e01881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/12/2018] [Revised: 02/25/2019] [Accepted: 05/30/2019] [Indexed: 11/21/2022]

Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences. J Mol Evol 2018;86:365-378. [PMID: 29955898 DOI: 10.1007/s00239-018-9853-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/20/2017] [Accepted: 06/21/2018] [Indexed: 10/28/2022]

Ajawatanawong P, Baldauf SL. Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 2013;13:140. [PMID: 23826714 PMCID: PMC3706215 DOI: 10.1186/1471-2148-13-140] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 02/25/2013] [Accepted: 06/24/2013] [Indexed: 11/10/2022]

Guo B, Zou M, Wagner A. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication. Mol Biol Evol 2012;29:3005-22. [PMID: 22490820 DOI: 10.1093/molbev/mss108] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/18/2023]

Zhang Z, Xing C, Wang L, Gong B, Liu H. IndelFR: a database of indels in protein structures and their flanking regions. Nucleic Acids Res 2011;40:D512-8. [PMID: 22127860 PMCID: PMC3245007 DOI: 10.1093/nar/gkr1107] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]

Giacopuzzi E, Barlati S, Preti A, Venerando B, Monti E, Borsani G, Bresciani R. Gallus gallus NEU3 sialidase as model to study protein evolution mechanism based on rapid evolving loops. BMC Biochem 2011;12:45. [PMID: 21861893 PMCID: PMC3179935 DOI: 10.1186/1471-2091-12-45] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/27/2011] [Accepted: 08/23/2011] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Large surface loops contained within compact protein structures and not involved in the catalytic process have been proposed as preferred regions for protein family evolution. These loops are subject to lower sequence constraints and can evolve rapidly into novel structural variants. A good model to study this hypothesis is represented by sialidase enzymes. Indeed, the structure of sialidases is a β-propeller composed of anti-parallel β-sheets connected by loops that suit the rapidly evolving loop hypothesis well. These features prompted us to extend our studies on this protein family to birds, to gain insights into the evolution of this class of glycohydrolases.

RESULTS

The Gallus gallus (Gg) genome contains one NEU3 gene encoding a protein with a unique 188 amino acid sequence, mainly constituted by a peptide motif repeated six times in tandem, with no homology with any other known protein sequence. The repeat region is located at the same position as the roughly 80 amino acid loop characteristic of mammalian NEU4. Based on molecular modeling, all these sequences represent a connecting loop between the first two highly conserved β-strands of the fifth blade of the sialidase β-propeller. Moreover, this loop is highly variable in sequence and size in NEU3 sialidases from other vertebrates. Finally, we found that the general enzymatic properties and subcellular localization of Gg NEU3 are not influenced by the deletion of the repeat sequence.

CONCLUSION

In this study we demonstrated that the sialidase protein structure contains a surface loop, highly variable both in sequence and size, connecting two conserved β-sheets and emerging on the opposite side of the catalytic crevice. These data confirm that the sialidase family can serve as a suitable model for the study of the evolutionary process based on rapidly evolving loops, which may have occurred in sialidases. Given the peculiar organization of the loop region identified in Gg NEU3, this protein can be considered of particular interest in such evolutionary studies and for gaining deeper insights into sialidase evolution.

Paśko Ł, Ericson PGP, Elzanowski A. Phylogenetic utility and evolution of indels: a study in neognathous birds. Mol Phylogenet Evol 2011;61:760-71. [PMID: 21843647 DOI: 10.1016/j.ympev.2011.07.021] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 01/10/2011] [Revised: 07/28/2011] [Accepted: 07/30/2011] [Indexed: 11/25/2022]

Dessailly BH, Redfern OC, Cuff AL, Orengo CA. Detailed analysis of function divergence in a large and diverse domain superfamily: toward a refined protocol of function classification. Structure 2011;18:1522-35. [PMID: 21070951 DOI: 10.1016/j.str.2010.08.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/19/2010] [Revised: 08/06/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]

Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states. Proc Natl Acad Sci U S A 2010;107:20352-7. [PMID: 21048085 DOI: 10.1073/pnas.1012999107] [Citation(s) in RCA: 140] [Impact Index Per Article: 10.0] [Indexed: 11/18/2022]

Höhne M, Schätzle S, Jochens H, Robins K, Bornscheuer UT. Rational assignment of key motifs for function guides in silico enzyme identification. Nat Chem Biol 2010;6:807-13. [PMID: 20871599 DOI: 10.1038/nchembio.447] [Citation(s) in RCA: 284] [Impact Index Per Article: 20.3] [Received: 11/23/2009] [Accepted: 08/23/2010] [Indexed: 11/09/2022]

Zhang Z, Huang J, Wang Z, Wang L, Gao P. Impact of indels on the flanking regions in structural domains. Mol Biol Evol 2010;28:291-301. [PMID: 20671041 DOI: 10.1093/molbev/msq196] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]

Hashimoto K, Madej T, Bryant SH, Panchenko AR. Functional states of homooligomers: insights from the evolution of glycosyltransferases. J Mol Biol 2010;399:196-206. [PMID: 20381499 DOI: 10.1016/j.jmb.2010.03.059] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 02/09/2010] [Revised: 03/29/2010] [Accepted: 03/29/2010] [Indexed: 02/02/2023]

Abstract

Glycosylation is an important aspect of epigenetic regulation. Glycosyltransferase is a key enzyme in the biosynthesis of glycans, which glycosylates more than half of all proteins in eukaryotes and is involved in a wide range of biological processes. It has been suggested previously that homooligomerization in glycosyltransferases and other proteins might be crucial for their function. In this study, we explore functional homooligomeric states of glycosyltransferases in various organisms, trace their evolution, and perform comparative analyses to find structural features that can mediate or disrupt the formation of different homooligomers. First, we make a structure-based classification of the diverse superfamily of glycosyltransferases and confirm that the majority of the structures are indeed clustered into the GT-A or GT-B folds. We find that homooligomeric glycosyltransferases appear to be as ancient as monomeric glycosyltransferases and go back in evolution to the last universal common ancestor (LUCA). Moreover, we show that interface residues have significant bias to be gapped out or unaligned in the monomers, implying that they might represent features crucial for oligomer formation. Structural analysis of these features reveals that the majority of them represent loops, terminal regions, and helices, indicating that these secondary-structure elements mediate the formation of glycosyltransferases' homooligomers and directly contribute to the specific binding. We also observe relatively short protein regions that disrupt the homodimer interactions, although such cases are rare. These results suggest that relatively small structural changes in the nonconserved regions may contribute to the formation of different functional oligomeric states and might be important in regulation of enzyme activity through homooligomerization.

Tyagi M, Bornot A, Offmann B, de Brevern AG. Analysis of loop boundaries using different local structure assignment methods. Protein Sci 2009;18:1869-81. [PMID: 19606500 DOI: 10.1002/pro.198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]

Sandhya S, Rani SS, Pankaj B, Govind MK, Offmann B, Srinivasan N, Sowdhamini R. Length variations amongst protein domain superfamilies and consequences on structure and function. PLoS One 2009;4:e4981. [PMID: 19333395 PMCID: PMC2659687 DOI: 10.1371/journal.pone.0004981] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 12/14/2008] [Accepted: 02/26/2009] [Indexed: 11/24/2022]

Abstract

Background

Related protein domains of a superfamily can be specified by proteins of diverse lengths. The structural and functional implications of indels in a domain scaffold have been examined.

Methodology

In this study, domain superfamilies with large length variations (more than 30% difference from average domain size, referred to as ‘length-deviant’ superfamilies) and ‘length-rigid’ domain superfamilies (<10% length difference from average domain size) were analyzed for the functional impact of such structural differences. Our delineated dataset, derived from an objective algorithm, enables us to address indel roles in the presence of peculiar structural repeats, functional variation, protein-protein interactions and to examine ‘domain contexts’ of proteins tolerant to large length variations. Amongst the top-10 length-deviant superfamilies analyzed, we found that 80% of length-deviant superfamilies possess distant internal structural repeats and nearly half of them acquired diverse biological functions. In general, length-deviant superfamilies have a higher chance than length-rigid superfamilies of being engaged in internal structural repeats. We also found that ∼40% of length-deviant domains exist as multi-domain proteins involving interactions with domains from the same or other superfamilies. Indels, in diverse domain superfamilies, were found to participate in the accretion of structural and functional features amongst related domains. With specific examples, we discuss how indels are involved directly or indirectly in the generation of oligomerization interfaces, introduction of substrate specificity, regulation of protein function and stability.

Conclusions

Our data suggest a multitude of roles for indels that are specialized for domain members of different domain superfamilies. These specialist roles that we observe and trends in the extent of length variation could influence decision making in modeling of new superfamily members. Likewise, the observed limits of length variation, specific for each domain superfamily, would be particularly relevant in the choice of alignment length search filters commonly applied in protein sequence analysis.
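The two length categories named in the Methodology of this abstract (more than 30% versus less than 10% difference from average domain size) can be illustrated with a short sketch. The abstract does not say how the difference is aggregated over a superfamily, so the code assumes the largest relative deviation of any member from the mean length; the function names and the "intermediate" label are hypothetical and not taken from the paper.

```python
from statistics import mean


def max_relative_deviation(lengths):
    """Largest |length - mean| / mean over the domains of one superfamily.

    Assumption: the paper's exact aggregation is not given in the abstract.
    """
    avg = mean(lengths)
    return max(abs(n - avg) / avg for n in lengths)


def classify_superfamily(lengths, deviant_cutoff=0.30, rigid_cutoff=0.10):
    """Label a superfamily using the 30% and 10% cut-offs quoted in the abstract."""
    deviation = max_relative_deviation(lengths)
    if deviation > deviant_cutoff:
        return "length-deviant"
    if deviation < rigid_cutoff:
        return "length-rigid"
    return "intermediate"  # hypothetical label for the range in between


# Made-up domain lengths (in residues), for illustration only.
print(classify_superfamily([100, 102, 98]))
print(classify_superfamily([100, 200, 90]))
```

The first made-up superfamily deviates by at most 2% from its mean length and is labelled length-rigid; in the second, the 200-residue member lies about 54% above the mean of 130, so the superfamily is labelled length-deviant.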

Wang Z, Martin J, Abubucker S, Yin Y, Gasser RB, Mitreva M. Systematic analysis of insertions and deletions specific to nematode proteins and their proposed functional and evolutionary relevance. BMC Evol Biol 2009;9:23. [PMID: 19175938 PMCID: PMC2644674 DOI: 10.1186/1471-2148-9-23] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 09/24/2008] [Accepted: 01/28/2009] [Indexed: 11/25/2022]

Abstract

Background

Amino acid insertions and deletions in proteins are considered relatively rare events, and their associations with the evolution and adaptation of organisms are not yet understood. In this study, we undertook a systematic analysis of over 214,000 polypeptides from 32 nematode species and identified insertions and deletions unique to nematode proteins in more than 1000 families and provided indirect evidence that these alterations are linked to the evolution and adaptation of nematodes.

Results

Amino acid alterations in sequences of nematodes were identified by comparison with homologous sequences from a wide range of eukaryotic (metazoan) organisms. This comparison revealed that the proteins inferred from transcriptomic datasets for nematodes contained more deletions than insertions, and that the deletions tended to be larger in length than insertions, indicating a decreased size of the transcriptome of nematodes compared with other organisms. The present findings showed that this reduction is more pronounced in parasitic nematodes compared with the free-living nematodes of the genus Caenorhabditis. Consistent with a requirement for conservation in proteins involved in the processing of genetic information, fewer insertions and deletions were detected in such proteins. On the other hand, more insertions and deletions were recorded for proteins inferred to be involved in the endocrine and immune systems, suggesting a link with adaptation. Similarly, proteins involved in multiple cellular pathways tended to display more deletions and insertions than those involved in a single pathway. The number of insertions and deletions shared by a range of plant parasitic nematodes was higher for proteins involved in lipid metabolism and electron transport compared with other nematodes, suggesting an association between metabolic adaptation and parasitism in plant hosts. We also identified three sizable deletions from proteins found to be specific to and shared by parasitic nematodes, which, given their uniqueness, might serve as target candidates for drug design.

Conclusion

This study illustrates the significance of using comparative genomics approaches to identify molecular elements unique to parasitic nematodes, which have adapted to a particular host organism and mode of existence during evolution. While the focus of this study was on nematodes, the approach has applicability to a wide range of other groups of organisms.

Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008;18:394-402. [PMID: 18554899 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]

Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Bányai L, Patthy L. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases. BMC Bioinformatics 2008;9:353. [PMID: 18752676 PMCID: PMC2542381 DOI: 10.1186/1471-2105-9-353] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 04/23/2008] [Accepted: 08/27/2008] [Indexed: 01/21/2023]

Abstract

Background

Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes.

Results

Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries.

Conclusion

MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.


Sandhya S, Pankaj B, Govind MK, Offmann B, Srinivasan N, Sowdhamini R. CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations. BMC Struct Biol 2008;8:28. [PMID: 18513436 PMCID: PMC2423364 DOI: 10.1186/1472-6807-8-28] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/15/2008] [Accepted: 05/31/2008] [Indexed: 11/10/2022]

Jiang H, Blouin C. Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions. BMC Bioinformatics 2007;8:444. [PMID: 18005425 PMCID: PMC2225427 DOI: 10.1186/1471-2105-8-444] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2007] [Accepted: 11/15/2007] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In protein evolution, the mechanism by which novel protein domains emerge is still an open question. The incremental growth of protein variable regions, produced by stochastic insertions, has the potential to generate large and complex sub-structures. In this study, a deterministic methodology is proposed to reconstruct phylogenies from protein structures and to infer insertion events in protein evolution. The analysis was performed on a broad range of SCOP domain families.

RESULTS

Phylogenies were reconstructed from protein 3D structural data. The phylogenetic trees were used to infer ancestral structures with a consensus method. In these ancestral reconstructions, 42.7% of the observed insertions are nested insertions, which are located within previous insert regions. The average size of inserts tends to increase with the insert rank or total number of insertions in the variable regions. We found that the structures of some nested inserts show complex or even domain-like fold patterns with helices, strands and loops. Furthermore, a basal level of structural innovation was found in inserts which displayed a significant structural similarity exclusively to themselves. The beta-Lactamase/D-ala carboxypeptidase domain family is provided as an example to illustrate the inference of insertion events and how the incremental growth of a variable region is capable of generating novel structural patterns.

CONCLUSION

Using 3D data, we proposed a method to reconstruct phylogenies. We applied the method to reconstruct the sequences of insertion events leading to the emergence of potentially novel structural elements within existing protein domains. The results suggest that structural innovation is possible via the stochastic process of insertions and rapid evolution within variable regions where inserts tend to be nested. We also demonstrate that the structure-based phylogeny enables the study of new questions relating to the evolution of protein domains and biological function.
