1
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024]
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertion and deletion dynamics. Here, we discuss methods for describing insertion and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertion- and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertion and deletion variation and its effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
- Ian Holmes
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
  - Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Gerton Lunter
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Maria Anisimova
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
2
Tanoz I, Timsit Y. Protein Fold Usages in Ribosomes: Another Glance to the Past. Int J Mol Sci 2024; 25:8806. [PMID: 39201491 PMCID: PMC11354259 DOI: 10.3390/ijms25168806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/02/2024]
Abstract
The analysis of protein fold usage, similar to codon usage, offers profound insights into the evolution of biological systems and the origins of modern proteomes. While previous studies have examined fold distribution in modern genomes, our study focuses on the comparative distribution and usage of protein folds in ribosomes across bacteria, archaea, and eukaryotes. We identify the prevalence of certain 'super-ribosome folds,' such as the OB fold in bacteria and the SH3 domain in archaea and eukaryotes. The observed protein fold distribution in the ribosomes foreshadows the later power-law distribution, in which only a few folds are highly prevalent and most are rare. Additionally, we highlight the presence of three copies of proto-Rossmann folds in ribosomes across all kingdoms, indicating their ancient and fundamental role in ribosomal structure and function. Our study also explores early mechanisms of molecular convergence, where different protein folds bind equivalent ribosomal RNA structures in ribosomes across different kingdoms. This comparative analysis enhances our understanding of ribosomal evolution, particularly the distinct evolutionary paths of the large and small subunits, and underscores the complex interplay between RNA and protein components in the transition from the RNA world to modern cellular life. Transcending the concept of folds also makes it possible to group a large number of ribosomal proteins into five categories of urfolds or metafolds, which could attest to their ancestral character and common origins. This work also demonstrates that the gradual acquisition of extensions by simple but ordered folds constitutes an inexorable evolutionary mechanism. This observation supports the idea that simple but structured ribosomal proteins preceded the development of their disordered extensions.
Affiliation(s)
- Inzhu Tanoz
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France
- Youri Timsit
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 Rue Michel-Ange, 75016 Paris, France
3
Jilani M, Turcan A, Haspel N, Jagodzinski F. Elucidating the Structural Impacts of Protein InDels. Biomolecules 2022; 12:1435. [PMID: 36291643 PMCID: PMC9599607 DOI: 10.3390/biom12101435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/12/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 09/17/2023]
Abstract
The effects of amino acid insertions and deletions (InDels) remain a rather under-explored area of structural biology. These variations oftentimes are the cause of numerous disease phenotypes. In spite of this, research to study InDels and their structural significance remains limited, primarily due to a lack of experimental information and computational methods. In this work, we fill this gap by modeling InDels computationally; we investigate the rigidity differences between the wildtype and a mutant variant with one or more InDels. Further, we compare how structural effects due to InDels differ from the effects of amino acid substitutions, which are another type of amino acid mutation. We finish by performing a correlation analysis between our rigidity-based metrics and wet lab data for their ability to infer the effects of InDels on protein fitness.
Affiliation(s)
- Muneeba Jilani
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Alistair Turcan
  - Department of Computer Science, Western Washington University, Bellingham, WA 98225, USA
- Nurit Haspel
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Filip Jagodzinski
  - Department of Computer Science, Western Washington University, Bellingham, WA 98225, USA
4
Huded AKC, Jingade P, Mishra MK, Ercisli S, Ilhan G, Marc RA, Vodnar D. Comparative genomic analysis and phylogeny of NAC25 gene from cultivated and wild Coffea species. Front Plant Sci 2022; 13:1009733. [PMID: 36186041 PMCID: PMC9523601 DOI: 10.3389/fpls.2022.1009733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 08/30/2022] [Indexed: 06/16/2023]
Abstract
Coffee is a high-value agricultural commodity grown in about 80 countries. Sustainable coffee cultivation is hampered by multiple biotic and abiotic stress conditions predominantly driven by climate change. The NAC proteins are plant-specific transcription factors associated with various physiological functions in plants, which include cell division, secondary wall formation, formation of the shoot apical meristem, leaf senescence, flowering, and embryo and seed development. Besides, they are also involved in biotic and abiotic stress regulation. Due to their ubiquitous influence, studies on NAC transcription factors have gained momentum in different crop plant species. In the present study, a NAC25-like transcription factor was isolated and characterized from two cultivated coffee species, Coffea arabica and Coffea canephora, and five Indian wild coffee species for the first time. The full-length NAC25 gene varied from 2,456 bp in Coffea jenkinsii to 2,493 bp in C. arabica. In all seven coffee species, sequencing of the NAC25 gene revealed 3 exons and 2 introns. The NAC25 gene is characterized by a highly conserved 377 bp NAM domain (N-terminus) and a highly variable C-terminal region. The sequence analysis revealed an average of one SNP per 40.92 bp in the coding region and one per 37.7 bp in the intronic region. Further, the non-synonymous SNPs are 8-11 fold higher compared to synonymous SNPs in the non-coding and coding region of the NAC25 gene, respectively. The expression of the NAC25 gene was studied in six different tissue types in C. canephora, and higher expression levels were observed in leaf and flower tissues. Further, the relative expression of NAC25 in comparison with the GAPDH gene revealed four-fold and eight-fold increases in expression levels in green fruit and ripe fruit, respectively. The evolutionary relationship revealed the independent evolution of the NAC25 gene in coffee.
Affiliation(s)
- Arun Kumar C. Huded
  - Plant Biotechnology Division, Unit of Central Coffee Research Institute, Coffee Board, Mysore, Karnataka, India
- Pavankumar Jingade
  - Plant Biotechnology Division, Unit of Central Coffee Research Institute, Coffee Board, Mysore, Karnataka, India
- Manoj Kumar Mishra
  - Plant Biotechnology Division, Unit of Central Coffee Research Institute, Coffee Board, Mysore, Karnataka, India
- Sezai Ercisli
  - Department of Horticulture, Faculty of Agriculture, Erzurum, Turkey
- Gulce Ilhan
  - Department of Horticulture, Faculty of Agriculture, Erzurum, Turkey
- Romina Alina Marc
  - Food Engineering Department, Faculty of Food Science and Technology, University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Romania
- Dan Vodnar
  - Institute of Life Sciences, Faculty of Food Science and Technology, University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Romania
5
Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021; 38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/27/2022]
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
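The idea of estimating separate insertion and deletion rates from summary statistics can be illustrated with a toy rejection-ABC sketch. This is not the authors' model or software: the Poisson simulator, the uniform priors, the two count-based summary statistics, the tolerance `eps`, and all function names are assumptions chosen only to keep the example small.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(ins_rate, del_rate, n_sites, rng):
    """Toy simulator (assumption): event counts are Poisson in rate * sites."""
    return poisson(ins_rate * n_sites, rng), poisson(del_rate * n_sites, rng)


def abc_rejection(observed, n_sites, n_draws=5000, eps=10, max_rate=0.1, seed=0):
    """Keep prior draws whose simulated counts lie within eps of the observed ones."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # Independent uniform priors on the two rates (hypothetical choice).
        ins_rate = rng.uniform(0.0, max_rate)
        del_rate = rng.uniform(0.0, max_rate)
        sim_ins, sim_del = simulate_counts(ins_rate, del_rate, n_sites, rng)
        distance = abs(sim_ins - observed[0]) + abs(sim_del - observed[1])
        if distance <= eps:
            accepted.append((ins_rate, del_rate))
    return accepted


# Hypothetical observation: 20 insertions and 40 deletions over 1000 sites.
accepted = abc_rejection(observed=(20, 40), n_sites=1000)
mean_ins = sum(r[0] for r in accepted) / len(accepted)
mean_del = sum(r[1] for r in accepted) / len(accepted)
print(f"accepted={len(accepted)} mean_ins={mean_ins:.4f} mean_del={mean_del:.4f}")
```

Allowing the two rates to differ is what lets the accepted draws reflect a deletion bias; a simpler model would tie `ins_rate` and `del_rate` to a single parameter.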
Affiliation(s)
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Rapoport
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Alon Itzkovitch
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Omer Israeli
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Azouri
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
  - School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Reed A Cartwright
  - The Biodesign Institute, Arizona State University, Tempe, Arizona, USA
  - School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Itay Mayrose
  - School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
6
Ali S, Liu X, Sen L, Lan D, Wang J, Hassan MI, Wang Y. Sequence and structure-based method to predict diacylglycerol lipases in protein sequence. Int J Biol Macromol 2021; 182:455-463. [PMID: 33836195 DOI: 10.1016/j.ijbiomac.2021.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 03/30/2021] [Accepted: 04/03/2021] [Indexed: 11/17/2022]
Abstract
Lipase enzymes play a central role in biotechnology and the food industry. Diacylglycerol (DAG) lipases have received considerable attention due to their physiological significance and potential industrial usage. However, compared to the wide application of triacylglycerol (TAG) lipases, DAG lipases have limited application due to their low thermostability and specific activity. The molecular basis of the substrate specificity of DAG lipases remains elusive, making structure-guided engineering of TAG lipases into DAG lipases difficult. Besides, the number of available DAG lipases is limited compared to TAG lipases. In the current study, we identified structural consensus motifs of DAG lipases that contribute to their DAG specificity, based on a structural comparison of DAG and TAG lipases. To find potential DAG lipases, sequence motifs and predicted secondary structures were used to screen millions of protein sequences and predict new DAG lipases. In total, 83 new putative DAG lipases were identified. The predicted DAG lipases were validated by expression of randomly chosen putative DAG lipases followed by a functional assay for their DAG- and TAG-specific activity. The reported method is efficient and cost-effective for discovering new DAG lipases for use in the food industry after the required characterization to meet potential application needs.
Affiliation(s)
- Shahid Ali
  - School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
- Xiaohui Liu
  - School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
- Lin Sen
  - School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
- Dongming Lan
  - School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
- Jiaqi Wang
  - School of Pharmaceutical Sciences (Shenzhen), Sun Yat-sen University, Guangzhou 510275, People's Republic of China
- Md Imtiyaz Hassan
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi, India
- Yonghua Wang
  - School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, People's Republic of China
7
Gangi Setty T, Sarkar A, Coombes D, Dobson RCJ, Subramanian R. Structure and Function of N-Acetylmannosamine Kinases from Pathogenic Bacteria. ACS Omega 2020; 5:30923-30936. [PMID: 33324800 PMCID: PMC7726757 DOI: 10.1021/acsomega.0c03699] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/02/2020] [Accepted: 10/20/2020] [Indexed: 06/12/2023]
Abstract
Several pathogenic bacteria import and catabolize sialic acids as a source of carbon and nitrogen. Within the sialic acid catabolic pathway, the enzyme N-acetylmannosamine kinase (NanK) catalyzes the phosphorylation of N-acetylmannosamine to N-acetylmannosamine-6-phosphate. This kinase belongs to the ROK superfamily of enzymes, which generally contain a conserved zinc-finger (ZnF) motif that is important for their structure and function. Previous structural studies have shown that the ZnF motif is absent in NanK of Fusobacterium nucleatum (Fn-NanK), a Gram-negative bacterium that causes the gum disease gingivitis. However, the effect of the loss of the ZnF motif on the kinase activity is unknown. Using kinetic and thermodynamic studies, we have studied the functional properties of Fn-NanK with respect to its substrates ManNAc and ATP, and compared its activity with other ZnF motif-containing NanK enzymes from closely related Gram-negative pathogenic bacteria Haemophilus influenzae (Hi-NanK), Pasteurella multocida (Pm-NanK), and Vibrio cholerae (Vc-NanK). Our studies show a 10-fold decrease in substrate binding affinity between Fn-NanK (apparent KM ≈ 700 μM) and ZnF motif-containing NanKs (apparent KM ≈ 60 μM). To understand the structural features that combat the loss of the ZnF motif in Fn-NanK, we solved the crystal structures of functionally homologous ZnF motif-containing NanKs from P. multocida and H. influenzae. Here, we report Pm-NanK:unliganded, Pm-NanK:AMPPNP, Pm-NanK:ManNAc, Hi-NanK:ManNAc, and Hi-NanK:ManNAc-6P:ADP crystal structures. Structural comparisons of Fn-NanK with Hi-NanK, Pm-NanK, and hMNK (human N-acetylmannosamine kinase domain of UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase, GNE) show that even though there is less sequence identity, they have a high degree of structural similarity. Furthermore, our structural analyses highlight that the ZnF motif of Fn-NanK is substituted by a set of hydrophobic residues, which forms a hydrophobic cluster that helps the proper orientation of ManNAc in the active site. In summary, ZnF-containing and ZnF-lacking NanK enzymes from different Gram-negative pathogenic bacteria are functionally very similar but differ in their metal requirement. Our structural studies unveil the structural modifications in Fn-NanK that compensate for the loss of the ZnF motif in comparison to other NanK enzymes.
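The "10-fold" figure can be read as the ratio of the two apparent KM values quoted in the abstract. As a rough worked illustration, assuming simple Michaelis-Menten kinetics (an assumption; the abstract does not state the rate law), the rate equation, the ratio, and the fractional velocity of each enzyme at a substrate concentration of 60 μM are:

```latex
v = \frac{V_{\max}[S]}{K_M + [S]}, \qquad
\frac{K_M^{\text{Fn-NanK}}}{K_M^{\text{ZnF NanK}}}
  \approx \frac{700\ \mu\text{M}}{60\ \mu\text{M}} \approx 11.7

\left.\frac{v}{V_{\max}}\right|_{[S]=60\ \mu\text{M}}:\quad
\text{ZnF NanK} \approx \frac{60}{60+60} = 0.50, \qquad
\text{Fn-NanK} \approx \frac{60}{700+60} \approx 0.08
```

Under this assumption the ZnF-lacking enzyme would need a substrate concentration near 700 μM to reach the half-maximal rate that the ZnF-containing enzymes reach near 60 μM.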
Affiliation(s)
- Thanuja Gangi Setty
  - Institute for Stem Cell Science and Regenerative Medicine, GKVK Post, Bangalore, KA 560065, India
  - The University of Trans-Disciplinary Health Sciences & Technology (TDU), Bangalore, KA 560064, India
- Arunabha Sarkar
  - National Centre for Biological Sciences − TIFR, Bangalore 560065, India
- David Coombes
  - Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand
- Renwick C. J. Dobson
  - Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand
  - Bio21 Molecular Science and Biotechnology Institute, Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Victoria 3010, Australia
- Ramaswamy Subramanian
  - Institute for Stem Cell Science and Regenerative Medicine, GKVK Post, Bangalore, KA 560065, India
  - Department of Biological Sciences and Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
8
Wu D, Liu A, Qu X, Liang J, Song M. Genome-wide identification, and phylogenetic and expression profiling analyses, of XTH gene families in Brassica rapa L. and Brassica oleracea L. BMC Genomics 2020; 21:782. [PMID: 33176678 PMCID: PMC7656703 DOI: 10.1186/s12864-020-07153-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/18/2020] [Accepted: 10/14/2020] [Indexed: 12/26/2022]
Abstract
BACKGROUND Xyloglucan endotransglucosylase/hydrolase genes (XTHs) are a multigene family and play key roles in regulating cell wall extensibility in plant growth and development. Brassica rapa and Brassica oleracea contain XTHs, but detailed identification and characterization of the XTH family in these species, and analysis of their tissue expression profiles, have not previously been carried out. RESULTS In this study, 53 and 38 XTH genes were identified in B. rapa and B. oleracea respectively, which contained some novel members not observed in previous studies. All XTHs of B. rapa, B. oleracea and Arabidopsis thaliana could be classified into three groups, Group I/II, III and the Early diverging group, based on phylogenetic relationships. Gene structures and motif patterns were similar within each group. All XTHs in this study contained two characteristic conserved domains (Glyco_hydro and XET_C). XTHs are located mainly in the cell wall but some are also located in the cytoplasm. Analyses of the mechanisms of gene family expansion revealed that whole-genome triplication (WGT) events and tandem duplication (TD) may have been the major mechanisms accounting for the expansion of the XTH gene family. Interestingly, TD genes all belonged to Group I/II, suggesting that TD was the main reason for the largest number of genes being in these groups. B. oleracea had lost more of the XTH genes, the conserved domain XET_C and the conserved active-site motif EXDXE compared with B. rapa, consistent with asymmetrical evolution between the two Brassica genomes. A majority of XTH genes exhibited different tissue-specific expression patterns based on RNA-seq data analyses. Moreover, there was differential expression of duplicated XTH genes in the two species, indicating that their functional differentiation occurred after B. rapa and B. oleracea diverged from a common ancestor. CONCLUSIONS We carried out the first systematic analysis of XTH gene families in B. rapa and B. oleracea. The results of this investigation can be used for reference in further studies on the functions of XTH genes and the evolution of this multigene family.
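Checking whether a protein retains the EXDXE active-site motif can be sketched with a regular expression. This is only an illustration, not the pipeline used in the study: it assumes that X stands for any single residue, and the function name and example sequences are hypothetical.

```python
import re

# E-X-D-X-E with X taken to mean any single residue (assumption).
# The lookahead lets overlapping occurrences be reported as well.
EXDXE = re.compile(r"(?=(E.D.E))")


def find_exdxe(protein):
    """Return (0-based start, matched residues) for every EXDXE occurrence."""
    return [(m.start(), m.group(1)) for m in EXDXE.finditer(protein.upper())]


# Hypothetical sequences: one with the motif, one where the D is replaced.
print(find_exdxe("MAEADKEGG"))   # motif present
print(find_exdxe("MAEAAKEGG"))   # motif lost
```

A sequence returning an empty list would be a candidate for motif loss, subject to checking the alignment, since a match-based scan cannot distinguish true loss from annotation or assembly gaps.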
Affiliation(s)
- Di Wu
  - Qufu Normal University, College of Life Science, Qufu, 273165, P.R. China
- Anqi Liu
  - Qufu Normal University, College of Life Science, Qufu, 273165, P.R. China
- Xiaoyu Qu
  - Qufu Normal University, College of Life Science, Qufu, 273165, P.R. China
- Jiayi Liang
  - Qufu Normal University, College of Life Science, Qufu, 273165, P.R. China
- Min Song
  - Qufu Normal University, College of Life Science, Qufu, 273165, P.R. China
9
Liang Z, Li M, Liu Z, Wang J. Genome-wide identification and characterization of the Hsp70 gene family in allopolyploid rapeseed (Brassica napus L.) compared with its diploid progenitors. PeerJ 2019; 7:e7511. [PMID: 31497395 PMCID: PMC6707343 DOI: 10.7717/peerj.7511] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/14/2019] [Accepted: 07/17/2019] [Indexed: 11/27/2022]
Abstract
Heat shock protein 70 (Hsp70) plays an essential role in plant growth and development, as well as stress response. Rapeseed (Brassica napus L.) originated from recent interspecific hybridization between Brassica rapa and Brassica oleracea. In this study, a total of 47 Hsp70 genes were identified in B. napus (AnAnCnCn genome), including 22 genes from the An subgenome and 25 genes from the Cn subgenome. Meanwhile, 29 and 20 Hsp70 genes were explored in B. rapa (ArAr genome) and B. oleracea (CoCo genome), respectively. Based on phylogenetic analysis, 114 Hsp70 proteins derived from B. napus, B. rapa, B. oleracea and Arabidopsis thaliana were divided into 6 subfamilies containing 16 Ar-An and 11 Co-Cn reliable orthologous pairs. The homology and synteny analysis indicated that whole genome triplication and segmental duplication may be the major contributors to the expansion of the Hsp70 gene family. Intron gain of BnHsp70 genes and domain loss of BnHsp70 proteins were also found in B. napus, associated with intron evolution and module evolution of proteins after allopolyploidization. In addition, transcriptional profile analyses indicated that expression patterns of most BnHsp70 genes were tissue-specific. Moreover, Hsp70 orthologs exhibited different expression patterns in the same tissue, and Cn subgenome-biased expression was observed in leaf. These findings contribute to exploration of the evolutionary adaptation of polyploidy and will facilitate further application of BnHsp70 gene functions.
Affiliation(s)
- Ziwei Liang
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Mengdi Li
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Zhengyi Liu
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
- Jianbo Wang
  - State Key Laboratory of Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan, China
10
Thomas BT, Ogunkanmi LA, Iwalokun BA, Popoola OD. Transition-transversion mutations in the polyketide synthase gene of Aspergillus section Nigri. Heliyon 2019; 5:e01881. [PMID: 31338447 PMCID: PMC6579908 DOI: 10.1016/j.heliyon.2019.e01881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/12/2018] [Revised: 02/25/2019] [Accepted: 05/30/2019] [Indexed: 11/21/2022]
Abstract
This study determined the transition-transversion mutations in the pks gene of Aspergillus section Nigri in order to gain insight into the patterns of nucleotide base substitution and the process of molecular evolution using standard recommended techniques. The results depict a more frequent occurrence of transitions (23 ± 0.96) than transversions (11.37 ± 1.38) (p < 0.05), with C/T being the most frequently observed transitional base substitution and C/A the most frequently occurring transversional base change. The number of single base insertions (56 ± 1.00) was significantly higher than the number of single base deletions (38 ± 2.00) (p < 0.05), while varying degrees of two or more base deletions and insertions were also observed both inside and outside the open reading frame. The maximum likelihood value estimated for the pks gene was calculated to be -9458.80 in 423 positions of the final dataset, while the transition-transversion ratio was estimated to be 0.50. The Tajima's neutrality test approaches seven (7), with the nucleotide diversity estimated to be approximately 65%. The evolutionary test indicates positive selection, as the ratio of nonsynonymous to synonymous divergence was found to be greater than the ratio of the number of nonsynonymous to synonymous polymorphisms. The proportion of substitutions driven by positive selection was calculated to be approximately 96.2%. This research therefore provides insight into pks gene mutation patterns, as some of the observed indels resulted in frameshift mutations.
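The transition/transversion distinction used above can be made concrete with a short sketch that classifies mismatches between two aligned sequences. It is a simple pairwise count, not the maximum-likelihood estimate reported in the abstract; the function name and the example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical aligned pair: A/G and C/T are transitions, T/A and A/C transversions.
ti, tv = count_ti_tv("ACGT-A", "GTGA-C")
print(ti, tv)
```

A raw count like this treats every mismatch as a single change, so it will generally differ from model-based estimates that correct for multiple substitutions at the same site.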
Affiliation(s)
- Benjamin Thoha Thomas
  - Department of Microbiology, Olabisi Onabanjo University, Ago Iwoye, Ogun State, Nigeria
- Bamidele Abiodun Iwalokun
  - Division of Molecular Biology and Biotechnology, Nigeria Institute of Medical Research, Yaba, Lagos, Nigeria
11
Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences. J Mol Evol 2018; 86:365-378. [PMID: 29955898 DOI: 10.1007/s00239-018-9853-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/20/2017] [Accepted: 06/21/2018] [Indexed: 10/28/2022]
Abstract
A low ratio of nonsynonymous and synonymous substitution rates (dN/dS) at a codon is an indicator of functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations on nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.
12
Ajawatanawong P, Baldauf SL. Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 2013; 13:140. [PMID: 23826714 PMCID: PMC3706215 DOI: 10.1186/1471-2148-13-140] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 02/25/2013] [Accepted: 06/24/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes. RESULTS Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7aa) with a most frequent size class of 1aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31 overall, with a maximum observed value of 7.5 fold. CONCLUSIONS We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.
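Extracting indel events and their sizes from an alignment, and then comparing insertion and deletion counts, can be sketched as follows. This is a minimal pairwise illustration and not the automated pipeline described in the abstract: it assumes the first sequence can stand in for the ancestral state (a strong assumption; real polarity needs an outgroup or phylogeny), and the function name and sequences are hypothetical.

```python
def indel_events(ref_aln, qry_aln):
    """List (kind, length) for each gap run in a pairwise alignment.

    A gap in the reference is called an insertion in the query; a gap in the
    query is called a deletion. The reference is treated as ancestral.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    kind, length = None, 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q != "-":
            current = "insertion"
        elif q == "-" and r != "-":
            current = "deletion"
        else:
            current = None
        if current is not None and current == kind:
            length += 1          # extend the open gap run
        else:
            if kind is not None:
                events.append((kind, length))  # close the previous run
            kind, length = current, (1 if current else 0)
    if kind is not None:
        events.append((kind, length))
    return events


# Hypothetical aligned protein fragments.
events = indel_events("MK--LVA-G", "MKTTL-AAG")
n_ins = sum(1 for kind, _ in events if kind == "insertion")
n_del = sum(1 for kind, _ in events if kind == "deletion")
print(events, n_ins / n_del)
```

Counting events rather than gapped columns keeps a 2-residue insertion from being weighted twice, which matters when the quantity of interest is an insertion-to-deletion ratio.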
Affiliation(s)
- Pravech Ajawatanawong
- Department of Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden.
13
Guo B, Zou M, Wagner A. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication. Mol Biol Evol 2012; 29:3005-22. [PMID: 22490820 DOI: 10.1093/molbev/mss108] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/18/2023]
Abstract
Insertions and deletions (indels) in protein-coding genes are important sources of genetic variation. Their role in creating new proteins may be especially important after gene duplication. However, little is known about how indels affect the divergence of duplicate genes. We here study thousands of duplicate genes in five fish (teleost) species with completely sequenced genomes. The ancestor of these species has been subject to a fish-specific genome duplication (FSGD) event that occurred approximately 350 Ma. We find that duplicate genes contain at least 25% more indels than single-copy genes. These indels accumulated preferentially in the first 40 my after the FSGD. A lack of widespread asymmetric indel accumulation indicates that both members of a duplicate gene pair typically experience relaxed selection. Strikingly, we observe a 30-80% excess of deletions over insertions that is consistent for indels of various lengths and across the five genomes. We also find that indels preferentially accumulate inside loop regions of protein secondary structure and in regions where amino acids are exposed to solvent. We show that duplicate genes with high indel density also show high DNA sequence divergence. Indel density, but not amino acid divergence, can explain a large proportion of the tertiary structure divergence between proteins encoded by duplicate genes. Our observations are consistent across all five fish species. Taken together, they suggest a general pattern of duplicate gene evolution in which indels are important driving forces of evolutionary change.
Affiliation(s)
- Baocheng Guo
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
14
Zhang Z, Xing C, Wang L, Gong B, Liu H. IndelFR: a database of indels in protein structures and their flanking regions. Nucleic Acids Res 2011; 40:D512-8. [PMID: 22127860 PMCID: PMC3245007 DOI: 10.1093/nar/gkr1107] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Insertion/deletion (indel) is one of the most common methods of protein sequence variation. Recent studies showed that indels could affect their flanking regions and they are important for protein function and evolution. Here, we describe the Indel Flanking Region Database (IndelFR, http://indel.bioinfo.sdu.edu.cn), which provides sequence and structure information about indels and their flanking regions in known protein domains. The indels were obtained through the pairwise alignment of homologous structures in SCOP superfamilies. The IndelFR database contains 2,925,017 indels with flanking regions extracted from 373,402 structural alignment pairs of 12,573 non-redundant domains from 1053 superfamilies. IndelFR provides access to information about indels and their flanking regions, including amino acid sequences, lengths, locations, secondary structure constitutions, hydrophilicity/hydrophobicity, domain information, 3D structures and so on. IndelFR has already been used for molecular evolution studies and may help to promote future functional studies of indels and their flanking regions.
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
15
Giacopuzzi E, Barlati S, Preti A, Venerando B, Monti E, Borsani G, Bresciani R. Gallus gallus NEU3 sialidase as model to study protein evolution mechanism based on rapid evolving loops. BMC Biochem 2011; 12:45. [PMID: 21861893 PMCID: PMC3179935 DOI: 10.1186/1471-2091-12-45] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/27/2011] [Accepted: 08/23/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Large surface loops contained within compact protein structures and not involved in the catalytic process have been proposed as preferred regions for protein family evolution. These loops are subject to lower sequence constraints and can evolve rapidly into novel structural variants. A good model to study this hypothesis is represented by sialidase enzymes. Indeed, the structure of sialidases is a β-propeller composed of anti-parallel β-sheets connected by loops, which fits well with the rapidly evolving loop hypothesis. These features prompted us to extend our studies of this protein family to birds, to gain insight into the evolution of this class of glycohydrolases. RESULTS The Gallus gallus (Gg) genome contains one NEU3 gene encoding a protein with a unique 188 amino acid sequence mainly constituted by a peptide motif repeated six times in tandem, with no homology to any other known protein sequence. The repeat region is located at the same position as the roughly 80 amino acid loop characteristic of mammalian NEU4. Based on molecular modeling, all these sequences represent a connecting loop between the first two highly conserved β-strands of the fifth blade of the sialidase β-propeller. Moreover, this loop is highly variable in sequence and size in NEU3 sialidases from other vertebrates. Finally, we found that the general enzymatic properties and subcellular localization of Gg NEU3 are not influenced by the deletion of the repeat sequence. CONCLUSION In this study we demonstrated that the sialidase protein structure contains a surface loop, highly variable in both sequence and size, connecting two conserved β-sheets and emerging on the opposite side of the catalytic crevice. These data confirm that the sialidase family can serve as a suitable model for the study of the evolutionary process based on rapidly evolving loops, which may have occurred in sialidases. Given the peculiar organization of the loop region identified in Gg NEU3, this protein can be considered of particular interest in such evolutionary studies and for gaining deeper insight into sialidase evolution.
Affiliation(s)
- Edoardo Giacopuzzi
- Department of Biomedical Sciences and Biotechnology, Unit of Biology and Genetics, University of Brescia, viale Europa 11, Brescia 25123, Italy
- Sergio Barlati
- Department of Biomedical Sciences and Biotechnology, Unit of Biology and Genetics, University of Brescia, viale Europa 11, Brescia 25123, Italy
- Augusto Preti
- Department of Biomedical Sciences and Biotechnology, Unit of Biochemistry and Clinical Chemistry, University of Brescia, viale Europa 11, Brescia 25123, Italy
- Bruno Venerando
- Department of Medical Chemistry, Biochemistry and Biotechnology, L.I.T.A., University of Milano, Via F.lli Cervi 93, Segrate 20090, Italy
- Eugenio Monti
- Department of Biomedical Sciences and Biotechnology, Unit of Biochemistry and Clinical Chemistry, University of Brescia, viale Europa 11, Brescia 25123, Italy
- Giuseppe Borsani
- Department of Biomedical Sciences and Biotechnology, Unit of Biology and Genetics, University of Brescia, viale Europa 11, Brescia 25123, Italy
- Roberto Bresciani
- Department of Biomedical Sciences and Biotechnology, Unit of Biochemistry and Clinical Chemistry, University of Brescia, viale Europa 11, Brescia 25123, Italy
16
Paśko Ł, Ericson PGP, Elzanowski A. Phylogenetic utility and evolution of indels: a study in neognathous birds. Mol Phylogenet Evol 2011; 61:760-71. [PMID: 21843647 DOI: 10.1016/j.ympev.2011.07.021] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 01/10/2011] [Revised: 07/28/2011] [Accepted: 07/30/2011] [Indexed: 11/25/2022]
Abstract
Indels are increasingly used in phylogenetics and play a major role in genome size evolution, and yet both the phylogenetic information content of indels and their evolutionary significance remain to be better assessed. Using three presumably independently evolving nuclear gene fragments (28S rDNA, β-fibrinogen, ornithine decarboxylase) from 29 families of neognathous birds, we have obtained a topology that is in general agreement with the current molecular consensus tree, supports the monophyly of Metaves, and provides evidence for the unresolved relationships within the Charadriiformes. Based on the retrieved topology, we assess the relative impact of indels and nucleotide substitutions and demonstrate that the superposition of the two kinds of data yields a topology that could not be obtained from either data set alone. Although only two out of three gene fragments reveal the deletion bias, the combined nucleotide insertion-to-deletion ratio is 0.22, indicating a rapid decrease of intron length. The average indel fixation rate in the neognaths is 2.5 times faster than that in therian (placental) mammals of similar geologic age. As in mammals, there is a considerable variation of indel fixation rate that is 1.5 times higher in Galloanseres compared to Neoaves, and 2.4 times higher in the Rallidae compared to the average for Neoaves (8.2 times higher compared to the related Gruidae). Our results add to the evidence that indel fixation rates correlate with lineage-specific evolutionary rates.
Affiliation(s)
- Łukasz Paśko
- Institute of Zoology, University of Wrocław, 21 Sienkiewicz Street, PL-50-335 Wrocław, Poland
17
Dessailly BH, Redfern OC, Cuff AL, Orengo CA. Detailed analysis of function divergence in a large and diverse domain superfamily: toward a refined protocol of function classification. Structure 2011; 18:1522-35. [PMID: 21070951 DOI: 10.1016/j.str.2010.08.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/19/2010] [Revised: 08/06/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]
Abstract
Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.
Affiliation(s)
- Benoit H Dessailly
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
18
Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states. Proc Natl Acad Sci U S A 2010; 107:20352-7. [PMID: 21048085 DOI: 10.1073/pnas.1012999107] [Citation(s) in RCA: 140] [Impact Index Per Article: 10.0] [Indexed: 11/18/2022]
Abstract
The main principles of protein-protein recognition are elucidated by the studies of homooligomers which in turn mediate and regulate gene expression, activity of enzymes, ion channels, receptors, and cell-cell adhesion processes. Here we explore oligomeric states of homologous proteins in various organisms to better understand the functional roles and evolutionary mechanisms of homooligomerization. We observe a great diversity in mechanisms controlling oligomerization and focus in our study on insertions and deletions in homologous proteins and how they enable or disable complex formation. We show that insertions and deletions which differentiate monomers and dimers have a significant tendency to be located on the interaction interfaces and about a quarter of all proteins studied and forty percent of enzymes have regions which mediate or disrupt the formation of oligomers. We suggest that relatively small insertions or deletions may have a profound effect on complex stability and/or specificity. Indeed removal of complex enabling regions from protein structures in many cases resulted in the complete or partial loss of stability. Moreover, we find that insertions and deletions modulating oligomerization have a lower aggregation propensity and contain a larger fraction of polar, charged residues, glycine and proline compared to conventional interfaces and protein surface. Most likely, these regions may mediate specific interactions, prevent nonspecific dysfunctional aggregation and preclude undesired interactions between close paralogs therefore separating their functional pathways. Last, we show how the presence or absence of insertions and deletions on interfaces might be of practical value in annotating protein oligomeric states.
19
Höhne M, Schätzle S, Jochens H, Robins K, Bornscheuer UT. Rational assignment of key motifs for function guides in silico enzyme identification. Nat Chem Biol 2010; 6:807-13. [PMID: 20871599 DOI: 10.1038/nchembio.447] [Citation(s) in RCA: 284] [Impact Index Per Article: 20.3] [Received: 11/23/2009] [Accepted: 08/23/2010] [Indexed: 11/09/2022]
Abstract
Biocatalysis has emerged as a powerful alternative to traditional chemistry, especially for asymmetric synthesis. One key requirement during process development is the discovery of a biocatalyst with an appropriate enantiopreference and enantioselectivity, which can be achieved, for instance, by protein engineering or screening of metagenome libraries. We have developed an in silico strategy for a sequence-based prediction of substrate specificity and enantiopreference. First, we used rational protein design to predict key amino acid substitutions that indicate the desired activity. Then, we searched protein databases for proteins already carrying these mutations instead of constructing the corresponding mutants in the laboratory. This methodology exploits the fact that naturally evolved proteins have undergone selection over millions of years, which has resulted in highly optimized catalysts. Using this in silico approach, we have discovered 17 (R)-selective amine transaminases, which catalyzed the synthesis of several (R)-amines with excellent optical purity up to >99% enantiomeric excess.
Affiliation(s)
- Matthias Höhne
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, Greifswald University, Greifswald, Germany
20
Zhang Z, Huang J, Wang Z, Wang L, Gao P. Impact of indels on the flanking regions in structural domains. Mol Biol Evol 2010; 28:291-301. [PMID: 20671041 DOI: 10.1093/molbev/msq196] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract
Amino acid substitution and insertions/deletions (indels) are two common events in protein evolution; however, current knowledge on indels is limited. In this study, we investigated the effects of indels on the flanking regions in protein structure superfamilies. Comprehensive analysis of structural classification of proteins superfamilies revealed that indels lead to a series of changes in the flanking regions, including the following: 1) structural shift in the tertiary structure, with a first-order exponential decay relation between structural shift and the distance to indels, 2) instability of the secondary structure elements in which parts of the α helix and β sheet are destroyed, and 3) an increase in the amino acid substitution rate of the primary structure and the nonsimilar amino acid substitution rate. In general, these quality changes are due to the combined effects of the "regional-inherent effect," "indel-accompanied effect," and "indel-following effect." Furthermore, these quality changes reflect changes in selective pressure. Indels are more likely to be preserved in regions with low selective pressure, and indels can further reduce the selective pressure on the flanking regions. These findings improve our understanding of the role of indels in protein evolution.
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, China
21
Hashimoto K, Madej T, Bryant SH, Panchenko AR. Functional states of homooligomers: insights from the evolution of glycosyltransferases. J Mol Biol 2010; 399:196-206. [PMID: 20381499 DOI: 10.1016/j.jmb.2010.03.059] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 02/09/2010] [Revised: 03/29/2010] [Accepted: 03/29/2010] [Indexed: 02/02/2023]
Abstract
Glycosylation is an important aspect of epigenetic regulation. Glycosyltransferase is a key enzyme in the biosynthesis of glycans, which glycosylates more than half of all proteins in eukaryotes and is involved in a wide range of biological processes. It has been suggested previously that homooligomerization in glycosyltransferases and other proteins might be crucial for their function. In this study, we explore functional homooligomeric states of glycosyltransferases in various organisms, trace their evolution, and perform comparative analyses to find structural features that can mediate or disrupt the formation of different homooligomers. First, we make a structure-based classification of the diverse superfamily of glycosyltransferases and confirm that the majority of the structures are indeed clustered into the GT-A or GT-B folds. We find that homooligomeric glycosyltransferases appear to be as ancient as monomeric glycosyltransferases and go back in evolution to the last universal common ancestor (LUCA). Moreover, we show that interface residues have significant bias to be gapped out or unaligned in the monomers, implying that they might represent features crucial for oligomer formation. Structural analysis of these features reveals that the majority of them represent loops, terminal regions, and helices, indicating that these secondary-structure elements mediate the formation of glycosyltransferases' homooligomers and directly contribute to the specific binding. We also observe relatively short protein regions that disrupt the homodimer interactions, although such cases are rare. These results suggest that relatively small structural changes in the nonconserved regions may contribute to the formation of different functional oligomeric states and might be important in regulation of enzyme activity through homooligomerization.
Affiliation(s)
- Kosuke Hashimoto
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Building 38A 8S814, Bethesda, MD 20894, USA
22
Tyagi M, Bornot A, Offmann B, de Brevern AG. Analysis of loop boundaries using different local structure assignment methods. Protein Sci 2009; 18:1869-81. [PMID: 19606500 DOI: 10.1002/pro.198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, the secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, from both structure and sequence points of view, how the divergence between different SSAMs affects boundary definitions of loops connecting regular secondary structures. The analysis of SSAMs underlines that no clear consensus between the different SSAMs can be easily found. Because the SSAMs greatly influence the loop boundary definitions, important variations are indeed observed, that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions is more stable than expected, and classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field, as (i) various databanks have been used, leading to similar results without implication of protein redundancy, and (ii) this is the first time various SSAMs have been used. This work hence gives new insights into the difficult question of assignment of repetitive structures and addresses the issue of loop boundary definition. Although SSAMs give very different local structure assignments, capping sequence patterns remain stable.
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
23
Sandhya S, Rani SS, Pankaj B, Govind MK, Offmann B, Srinivasan N, Sowdhamini R. Length variations amongst protein domain superfamilies and consequences on structure and function. PLoS One 2009; 4:e4981. [PMID: 19333395 PMCID: PMC2659687 DOI: 10.1371/journal.pone.0004981] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 12/14/2008] [Accepted: 02/26/2009] [Indexed: 11/24/2022]
Abstract
Background Related protein domains of a superfamily can be specified by proteins of diverse lengths. The structural and functional implications of indels in a domain scaffold have been examined. Methodology In this study, domain superfamilies with large length variations (more than 30% difference from average domain size, referred to as ‘length-deviant’ superfamilies) and ‘length-rigid’ domain superfamilies (<10% length difference from average domain size) were analyzed for the functional impact of such structural differences. Our delineated dataset, derived from an objective algorithm, enables us to address indel roles in the presence of peculiar structural repeats, functional variation and protein-protein interactions, and to examine ‘domain contexts’ of proteins tolerant to large length variations. Amongst the top-10 length-deviant superfamilies analyzed, we found that 80% of length-deviant superfamilies possess distant internal structural repeats and nearly half of them acquired diverse biological functions. In general, length-deviant superfamilies have a higher chance than length-rigid superfamilies of being engaged in internal structural repeats. We also found that ∼40% of length-deviant domains exist as multi-domain proteins involving interactions with domains from the same or other superfamilies. Indels, in diverse domain superfamilies, were found to participate in the accretion of structural and functional features amongst related domains. With specific examples, we discuss how indels are involved directly or indirectly in the generation of oligomerization interfaces, introduction of substrate specificity, regulation of protein function and stability. Conclusions Our data suggest a multitude of roles for indels that are specialized for domain members of different domain superfamilies. These specialist roles that we observe and trends in the extent of length variation could influence decision making in modeling of new superfamily members. Likewise, the observed limits of length variation, specific for each domain superfamily, would be particularly relevant in the choice of alignment length search filters commonly applied in protein sequence analysis.
Affiliation(s)
- Sankaran Sandhya
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore, India
- Saane Sudha Rani
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore, India
- Barah Pankaj
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore, India
- Bernard Offmann
- Laboratoire de Biochimie et Génétique Moléculaire BP 7151, Université de La Réunion, La Réunion, France
- Ramanathan Sowdhamini
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bangalore, India
24
Wang Z, Martin J, Abubucker S, Yin Y, Gasser RB, Mitreva M. Systematic analysis of insertions and deletions specific to nematode proteins and their proposed functional and evolutionary relevance. BMC Evol Biol 2009; 9:23. [PMID: 19175938 PMCID: PMC2644674 DOI: 10.1186/1471-2148-9-23] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 09/24/2008] [Accepted: 01/28/2009] [Indexed: 11/25/2022]
Abstract
Background Amino acid insertions and deletions in proteins are considered relatively rare events, and their associations with the evolution and adaptation of organisms are not yet understood. In this study, we undertook a systematic analysis of over 214,000 polypeptides from 32 nematode species and identified insertions and deletions unique to nematode proteins in more than 1000 families and provided indirect evidence that these alterations are linked to the evolution and adaptation of nematodes. Results Amino acid alterations in sequences of nematodes were identified by comparison with homologous sequences from a wide range of eukaryotic (metazoan) organisms. This comparison revealed that the proteins inferred from transcriptomic datasets for nematodes contained more deletions than insertions, and that the deletions tended to be larger in length than insertions, indicating a decreased size of the transcriptome of nematodes compared with other organisms. The present findings showed that this reduction is more pronounced in parasitic nematodes compared with the free-living nematodes of the genus Caenorhabditis. Consistent with a requirement for conservation in proteins involved in the processing of genetic information, fewer insertions and deletions were detected in such proteins. On the other hand, more insertions and deletions were recorded for proteins inferred to be involved in the endocrine and immune systems, suggesting a link with adaptation. Similarly, proteins involved in multiple cellular pathways tended to display more deletions and insertions than those involved in a single pathway. The number of insertions and deletions shared by a range of plant parasitic nematodes was higher for proteins involved in lipid metabolism and electron transport compared with other nematodes, suggesting an association between metabolic adaptation and parasitism in plant hosts. We also identified three sizable deletions from proteins found to be specific to and shared by parasitic nematodes, which, given their uniqueness, might serve as target candidates for drug design. Conclusion This study illustrates the significance of using comparative genomics approaches to identify molecular elements unique to parasitic nematodes, which have adapted to a particular host organism and mode of existence during evolution. While the focus of this study was on nematodes, the approach has applicability to a wide range of other groups of organisms.
Affiliation(s)
- Zhengyuan Wang
- The Genome Center, Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA.
25
Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008; 18:394-402. [PMID: 18554899 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]
Abstract
Advances in protein structure determination, led by the structural genomics initiatives, have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.
Affiliation(s)
- Oliver C Redfern
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
26
Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Bányai L, Patthy L. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases. BMC Bioinformatics 2008; 9:353. [PMID: 18752676 PMCID: PMC2542381 DOI: 10.1186/1471-2105-9-353] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 04/23/2008] [Accepted: 08/27/2008] [Indexed: 01/21/2023]
Abstract
Background Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Results Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. Conclusion MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.
Affiliation(s)
- Alinda Nagy
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, H-1113 Budapest, Hungary.
27
Sandhya S, Pankaj B, Govind MK, Offmann B, Srinivasan N, Sowdhamini R. CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations. BMC Struct Biol 2008; 8:28. [PMID: 18513436 PMCID: PMC2423364 DOI: 10.1186/1472-6807-8-28] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 01/15/2008] [Accepted: 05/31/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Distantly related proteins adopt and retain similar structural scaffolds despite length variations that could be as much as two-fold in some protein superfamilies. In this paper, we describe an analysis of indel regions that accommodate length variations amongst related proteins. We have developed an algorithm, CUSP, to examine multi-membered PASS2 superfamily alignments to identify indel regions in an automated manner. Further, we have used the method to characterize the length, structural type and biochemical features of indels in related protein domains.
RESULTS: CUSP examines protein domain structural alignments to distinguish regions of conserved structure common to related proteins from structurally unconserved regions that vary in length and type of structure. On a non-redundant dataset of 353 domain superfamily alignments from PASS2, we find that 'length-deviant' protein superfamilies show > 30% length variation from their average domain length. 60% of additional lengths that occur in indels are short-length structures (< 5 residues) while 6% of indels are > 15 residues in length. Structural types in indels also show class-specific trends.
CONCLUSION: The extent of length variation varies across different superfamilies and indels show class-specific trends for preferred lengths and structural types. Such indels of different lengths even within a single protein domain superfamily could have structural and functional consequences that drive their selection, underlying their importance in similarity detection and computational modelling. The availability of systematic algorithms, like CUSP, should enable decision making in a domain superfamily-specific manner.
Affiliation(s)
- Sankaran Sandhya
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India.
28
Jiang H, Blouin C. Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions. BMC Bioinformatics 2007; 8:444. [PMID: 18005425 PMCID: PMC2225427 DOI: 10.1186/1471-2105-8-444] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 08/30/2007] [Accepted: 11/15/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In protein evolution, the mechanism of the emergence of novel protein domains is still an open question. The incremental growth of protein variable regions, produced by stochastic insertions, has the potential to generate large and complex sub-structures. In this study, a deterministic methodology is proposed to reconstruct phylogenies from protein structures, and to infer insertion events in protein evolution. The analysis was performed on a broad range of SCOP domain families.
RESULTS: Phylogenies were reconstructed from protein 3D structural data. The phylogenetic trees were used to infer ancestral structures with a consensus method. From these ancestral reconstructions, 42.7% of the observed insertions are nested insertions, which are located in previous insert regions. The average size of inserts tends to increase with the insert rank or total number of insertions in the variable regions. We found that the structures of some nested inserts show complex or even domain-like fold patterns with helices, strands and loops. Furthermore, a basal level of structural innovation was found in inserts which displayed a significant structural similarity exclusively to themselves. The beta-Lactamase/D-ala carboxypeptidase domain family is provided as an example to illustrate the inference of insertion events, and how the incremental growth of a variable region is capable of generating novel structural patterns.
CONCLUSION: Using 3D data, we proposed a method to reconstruct phylogenies. We applied the method to reconstruct the sequences of insertion events leading to the emergence of potentially novel structural elements within existing protein domains. The results suggest that structural innovation is possible via the stochastic process of insertions and rapid evolution within variable regions where inserts tend to be nested. We also demonstrate that the structure-based phylogeny enables the study of new questions relating to the evolution of protein domains and biological function.
Affiliation(s)
- Haiyan Jiang
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, B3H 1W5, Canada.