1
Gao B, Zhu S. The evolutionary novelty of insect defensins: from bacterial killing to toxin neutralization. Cell Mol Life Sci 2024; 81:230. [PMID: 38780625 PMCID: PMC11116330 DOI: 10.1007/s00018-024-05273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 05/05/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
Insect host defense comprises two complementary dimensions, microbial killing-mediated resistance and microbial toxin neutralization-mediated resilience, which jointly provide protection against pathogen infections. Insect defensins are a class of innate immune effectors primarily responsible for resistance to Gram-positive bacteria. Here, we report a gene in Drosophila virilis that originated from an ancestral defensin via genetic deletion following gene duplication and that confers enhanced resilience to Gram-positive bacterial infection. This gene encodes an 18-mer arginine-rich peptide (termed DvirARP) that differs from its parent gene in expression pattern, structure and function. DvirARP is specifically and constitutively expressed in D. virilis adult females. It adopts a novel fold with a 3₁₀ helix and a loop containing two CXC motifs, stabilized by two disulfide bridges. DvirARP is inactive against the majority of microorganisms tested and only weakly active against two Gram-positive bacteria. DvirARP knockout flies are viable and have no obvious reproductive defect, but they are more susceptible than wild-type flies to infection with DvirARP-resistant Staphylococcus aureus, which can be attributed to DvirARP's ability to neutralize toxins secreted by S. aureus. Phylogenetic distribution analysis reveals that DvirARP is restricted to the Drosophila subgenus, but independent deletion variants also occur in defensins from the Sophophora subgenus, supporting the evolvability of this class of immune effectors. Our work illustrates for the first time how a duplicated resistance-mediating gene evolved the ability to increase the resilience of a subset of Drosophila species against bacterial infection.
Affiliation(s)
- Bin Gao
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Shunyi Zhu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
2
Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759 PMCID: PMC11102076 DOI: 10.1093/gbe/evae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024]
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structure. We identified indels across 11,444 sequence alignments of mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning method AlphaFold2. Indels overlapped secondary structures 54% as often as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structure. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection is more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
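To make the observed-versus-expected comparison concrete, here is a minimal Python sketch of one simple way such a ratio can be computed. It assumes a null model in which indels fall uniformly at random along a protein, so the expected number inside secondary structure is the indel count times the fraction of structured residues. The function name, the per-residue mask, and the toy data are hypothetical; this is not the authors' actual pipeline.

```python
def overlap_ratio(structure_mask, indel_positions):
    """Observed / expected number of indels falling in structured residues.

    structure_mask: list of bools, True where a residue lies in predicted
        secondary structure (hypothetical per-residue annotation).
    indel_positions: 0-based residue positions of indel events.
    Null model (assumption): indels land uniformly at random along the protein.
    """
    if not structure_mask:
        raise ValueError("empty structure mask")
    frac_structured = sum(structure_mask) / len(structure_mask)
    observed = sum(1 for p in indel_positions if structure_mask[p])
    expected = len(indel_positions) * frac_structured
    if expected == 0:
        raise ValueError("expected count is zero; ratio undefined")
    return observed / expected


# Toy protein of 10 residues; positions 2-6 are predicted helix/strand.
mask = [False, False, True, True, True, True, True, False, False, False]
# Four indels, only one of which falls inside structure.
print(overlap_ratio(mask, [0, 4, 8, 9]))
```

A ratio below 1 means fewer indels in structured regions than the null predicts; under this definition, "54% as often as expected" corresponds to a ratio of about 0.54.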
Affiliation(s)
- Yi Yang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
3
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024]
Abstract
MOTIVATION The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models, but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. SCS models often consider site-dependent evolution, a process that adds realism but complicates their implementation in the likelihood functions commonly used for substitution model selection. RESULTS We present a method to perform selection among site-dependent SCS models, as well as between empirical and site-dependent SCS models, based on approximate Bayesian computation (ABC), and its implementation in the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it selects among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to a variety of protein families and observed that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel, and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
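The ABC model-selection idea in this abstract can be illustrated with a generic rejection-ABC sketch in Python. It is not ProteinModelerABC's code or interface: the two Gaussian "models", the single summary statistic, the uniform model prior, and the acceptance fraction are all hypothetical stand-ins for substitution models and the alignment summary statistics a real analysis would use. Each simulation draws a model, simulates a summary statistic, and records its distance to the observed value; the closest fraction is accepted, and each model's posterior probability is estimated as its share of the accepted simulations.

```python
import random
import statistics


def simulate_summary(model_mean, n, rng):
    """Toy simulator: the summary statistic is the mean of n Gaussian draws."""
    return statistics.fmean(rng.gauss(model_mean, 1.0) for _ in range(n))


def abc_model_choice(observed, models, n_sims=5000, accept_frac=0.02, n=20, seed=1):
    """Rejection ABC model selection with a uniform prior over models.

    models: dict mapping a model name to its toy parameter (hypothetical).
    Returns the estimated posterior probability of each model.
    """
    rng = random.Random(seed)
    names = list(models)
    sims = []
    for _ in range(n_sims):
        name = rng.choice(names)  # draw a model from the uniform prior
        stat = simulate_summary(models[name], n, rng)
        sims.append((abs(stat - observed), name))  # distance to observed summary
    sims.sort(key=lambda t: t[0])
    k = max(1, int(accept_frac * n_sims))  # accept the k closest simulations
    accepted = [name for _, name in sims[:k]]
    return {name: accepted.count(name) / k for name in names}


post = abc_model_choice(observed=2.9, models={"A": 0.0, "B": 3.0})
print(post)
```

With these toy settings, nearly all accepted simulations come from model B, whose mean lies near the observed statistic. The regression adjustment mentioned in the abstract is deliberately omitted to keep the sketch minimal.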
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
4
Gao B, Zhu S. Mutation-driven parallel evolution in emergence of ACE2-utilizing sarbecoviruses. Front Microbiol 2023; 14:1118025. [PMID: 36910184 PMCID: PMC9996049 DOI: 10.3389/fmicb.2023.1118025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 02/01/2023] [Indexed: 02/25/2023]
Abstract
Mutation and recombination are two major genetic mechanisms that drive the evolution of viruses. The two interact during virus evolution, with mutations providing the initial source of genetic diversity for subsequent recombination. Sarbecoviruses are a group of evolutionarily related β-coronaviruses that includes human severe acute respiratory syndrome coronavirus (SARS-CoV), SARS-CoV-2, and a trove of related animal viruses called SARS-like CoVs (SL-CoVs). Some members of this group use angiotensin-converting enzyme 2 (ACE2) as their entry receptor and others do not, a difference that has been linked to the properties of their spike protein receptor-binding domains (RBDs). This raises an outstanding question regarding how ACE2 binding originated within sarbecoviruses. Using a combination of analyses of phylogenies, ancestral sequences, structures, functions and molecular dynamics, we provide evidence in favor of an evolutionary scenario in which three distinct ancestral RBDs independently acquired the ACE2-binding trait via parallel amino acid mutations. In this process, evolutionary intermediate RBDs might first have formed through loop extensions that supplied key functional residues, accompanied by point mutations that removed energetically unfavorable interactions and changed the dynamics of the functional loops, all of which were required for ACE2 binding. Subsequent optimization in the context of these evolutionary intermediates led to the independent emergence of ACE2-binding RBDs in the SARS-CoV and SARS-CoV-2 clades of Asian origin and in the clade comprising SL-CoVs of European and African descent. These findings will help enhance our understanding of mutation-driven evolution of sarbecoviruses in their early history.
Affiliation(s)
- Bin Gao
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Shunyi Zhu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
5
Gu J, Isozumi N, Gao B, Ohki S, Zhu S. Mutation-driven evolution of antibacterial function in an ancestral antifungal scaffold: Significance for peptide engineering. Front Microbiol 2022; 13:1053078. [PMID: 36532476 PMCID: PMC9751787 DOI: 10.3389/fmicb.2022.1053078] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/25/2022] [Accepted: 11/15/2022] [Indexed: 07/02/2024]
Abstract
Mutation-driven evolution of novel functions in old genes has been documented in many development- and adaptive immunity-related genes but is poorly understood in immune effector molecules. Drosomycin-type antifungal peptides (DTAFPs) are a family of defensin-type effectors found in plants and ecdysozoans. Their primitive function was to control fungal infection, and they were subsequently co-opted to fight bacterial infection in plants, insects, and nematodes. This provides a model to study the structural and evolutionary mechanisms behind such functional diversification. In the present study, we determined the solution structure of mehamycin, a DTAFP with antibacterial activity and an 18-mer insert from the Northern root-knot nematode Meloidogyne hapla, and studied the mutational effect using a mutant with the insert deleted. Mehamycin adopts the expected cysteine-stabilized α-helix and β-sheet fold in its core scaffold, while the inserted region, called the single Disulfide Bridge-linked Domain (sDBD), forms an extended loop protruding from the scaffold. The sDBD folds into an amphipathic architecture stabilized by one disulfide bridge, which likely enables mehamycin to permeabilize bacterial membranes. Deletion of the sDBD markedly decreased this ability but was accompanied by an increase in thermostability, indicative of a structure-function trade-off in mehamycin evolution. Allosteric analysis revealed an internal interaction between the two domains, which might promote point mutations at some key sites of the core domain and ultimately give rise to the emergence of antibacterial function. Our work may be valuable in guiding protein engineering of mehamycin to improve its activity and stability.
Affiliation(s)
- Jing Gu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Noriyoshi Isozumi
- Center for Nano Materials and Technology (CNMT), Japan Advanced Institute of Science and Technology (JAIST), Nomi, Ishikawa, Japan
- Bin Gao
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Shinya Ohki
- Center for Nano Materials and Technology (CNMT), Japan Advanced Institute of Science and Technology (JAIST), Nomi, Ishikawa, Japan
- Shunyi Zhu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
6
Jilani M, Turcan A, Haspel N, Jagodzinski F. Elucidating the Structural Impacts of Protein InDels. Biomolecules 2022; 12:1435. [PMID: 36291643 PMCID: PMC9599607 DOI: 10.3390/biom12101435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/12/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 09/17/2023]
Abstract
The effects of amino acid insertions and deletions (InDels) remain a rather under-explored area of structural biology, even though these variations often underlie numerous disease phenotypes. Research on InDels and their structural significance remains limited, primarily due to a lack of experimental information and computational methods. In this work, we address this gap by modeling InDels computationally; we investigate the rigidity differences between the wild type and a mutant variant with one or more InDels. Further, we compare how the structural effects of InDels differ from those of amino acid substitutions, another type of amino acid mutation. Finally, we perform a correlation analysis between our rigidity-based metrics and wet-lab data to assess their ability to infer the effects of InDels on protein fitness.
Affiliation(s)
- Muneeba Jilani
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Alistair Turcan
- Department of Computer Science, Western Washington University, Bellingham, WA 98225, USA
- Nurit Haspel
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Filip Jagodzinski
- Department of Computer Science, Western Washington University, Bellingham, WA 98225, USA
7
Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods Mol Biol 2022; 2397:85-110. [PMID: 34813061 DOI: 10.1007/978-1-0716-1826-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Analyzing the natural evolution of proteins by ancestral sequence reconstruction (ASR) can provide valuable information about the changes in sequence and structure that drive the development of novel protein functions. ASR has also been used as a protein engineering tool, as it often generates thermostable proteins that can serve as robust and evolvable templates for enzyme engineering. Importantly, ASR can provide insight into the history of insertions and deletions (indels) that have occurred in the evolution of a protein family. Indels are strongly associated with functional change during enzyme evolution and represent a largely unexplored source of genetic diversity for designing proteins with novel or improved properties. Current ASR methods differ in the way they handle indels: inclusion or exclusion of indels is often managed subjectively, based on assumptions the user makes about the likelihood of each recombination event, yet most currently available ASR tools provide limited, if any, opportunities for evaluating indel placement in a reconstructed sequence. Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants. This chapter provides a general protocol for performing a reconstruction using GRASP and using the results to create indel variants. The method addresses protein template selection, sequence curation, alignment refinement, tree building, ancestor reconstruction, evaluation of indel variants and approaches to library development.
8
Li DD, Wang JL, Liu Y, Li YZ, Zhang Z. Expanded analyses of the functional correlations within structural classifications of glycoside hydrolases. Comput Struct Biotechnol J 2021; 19:5931-5942. [PMID: 34849197 PMCID: PMC8602953 DOI: 10.1016/j.csbj.2021.10.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Revised: 10/30/2021] [Accepted: 10/30/2021] [Indexed: 01/01/2023]
Abstract
Glycoside hydrolases (GHs) are highly diverse in sequence and function, but systematic studies of GH relationships based on structural information are lacking. Here, we report that GHs have multiple evolutionary origins and are structurally derived from 27 homologous superfamilies and 16 folds, but that GHs are heavily concentrated in a few of these superfamilies and folds. Six of these superfamilies are widely encoded by archaea, bacteria, and eukaryotes, indicating that they may be the most ancient in origin. Most superfamilies vary in enzyme function, and some, such as the superfamilies of (β/α)₈-barrel and (α/α)₆-barrel structures, exhibit extreme functional diversity, which is strongly positively correlated with sequence diversity. More than one-third of glycosidase activities show evidence of convergent evolution, especially the polysaccharide-degrading functions of GHs. The GHs of most superfamilies have relatively narrow environmental distributions, normally with the highest abundance in host-associated environments and a preference for moderately low-temperature and acidic environments. Overall, our expanded analysis facilitates an understanding of complex GH sequence-structure-function relationships and may guide the screening and engineering of GHs.
Affiliation(s)
- Dan-Dan Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Jin-Lan Wang
- National Administration of Health Data, Jinan 250002, China
- Ya Liu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Yue-Zhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Suzhou Research Institute, Shandong University, Suzhou 215123, China
9
Gangi Setty T, Sarkar A, Coombes D, Dobson RCJ, Subramanian R. Structure and Function of N-Acetylmannosamine Kinases from Pathogenic Bacteria. ACS Omega 2020; 5:30923-30936. [PMID: 33324800 PMCID: PMC7726757 DOI: 10.1021/acsomega.0c03699] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/02/2020] [Accepted: 10/20/2020] [Indexed: 06/12/2023]
Abstract
Several pathogenic bacteria import and catabolize sialic acids as a source of carbon and nitrogen. Within the sialic acid catabolic pathway, the enzyme N-acetylmannosamine kinase (NanK) catalyzes the phosphorylation of N-acetylmannosamine (ManNAc) to N-acetylmannosamine-6-phosphate. This kinase belongs to the ROK superfamily of enzymes, which generally contain a conserved zinc-finger (ZnF) motif that is important for their structure and function. Previous structural studies have shown that the ZnF motif is absent in the NanK of Fusobacterium nucleatum (Fn-NanK), a Gram-negative bacterium that causes the gum disease gingivitis. However, the effect of losing the ZnF motif on kinase activity is unknown. Using kinetic and thermodynamic studies, we characterized the functional properties of Fn-NanK toward its substrates ManNAc and ATP and compared its activity with that of ZnF motif-containing NanK enzymes from the closely related Gram-negative pathogenic bacteria Haemophilus influenzae (Hi-NanK), Pasteurella multocida (Pm-NanK), and Vibrio cholerae (Vc-NanK). Our studies show an approximately 10-fold lower substrate binding affinity for Fn-NanK (apparent KM ≈ 700 μM) than for the ZnF motif-containing NanKs (apparent KM ≈ 60 μM). To understand the structural features that compensate for the loss of the ZnF motif in Fn-NanK, we solved the crystal structures of the functionally homologous ZnF motif-containing NanKs from P. multocida and H. influenzae. Here, we report the Pm-NanK:unliganded, Pm-NanK:AMPPNP, Pm-NanK:ManNAc, Hi-NanK:ManNAc, and Hi-NanK:ManNAc-6P:ADP crystal structures. Structural comparisons of Fn-NanK with Hi-NanK, Pm-NanK, and hMNK (the human N-acetylmannosamine kinase domain of UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase, GNE) show that, despite lower sequence identity, they share a high degree of structural similarity.
Furthermore, our structural analyses highlight that the ZnF motif of Fn-NanK is replaced by a set of hydrophobic residues that form a hydrophobic cluster, which helps orient ManNAc properly in the active site. In summary, ZnF-containing and ZnF-lacking NanK enzymes from different Gram-negative pathogenic bacteria are functionally very similar but differ in their metal requirement. Our structural studies reveal the structural modifications in Fn-NanK that compensate for the loss of the ZnF motif relative to other NanK enzymes.
Affiliation(s)
- Thanuja Gangi Setty
- Institute for Stem Cell Science and Regenerative Medicine, GKVK Post, Bangalore, KA 560065, India
- The University of Trans-Disciplinary Health Sciences & Technology (TDU), Bangalore, KA 560064, India
- Arunabha Sarkar
- National Centre for Biological Sciences, TIFR, Bangalore 560065, India
- David Coombes
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand
- Renwick C. J. Dobson
- Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch 8140, New Zealand
- Bio21 Molecular Science and Biotechnology Institute, Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Victoria 3010, Australia
- Ramaswamy Subramanian
- Institute for Stem Cell Science and Regenerative Medicine, GKVK Post, Bangalore, KA 560065, India
- Department of Biological Sciences and Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
10
Zhan Q, Fu Y, Jiang Q, Liu B, Peng J, Wang Y. SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically. Protein Pept Lett 2020; 27:295-302. [PMID: 31385760 DOI: 10.2174/0929866526666190806143959] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/06/2019] [Revised: 04/26/2019] [Accepted: 06/14/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analyses. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups of sequences and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different degrees of conservation, which may lead to incorrect residue-residue or residue-gap pairs that this strategy cannot correct. OBJECTIVE In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. METHODS Here, we present a novel refinement method based on vertical splitting and splicing, called SpliVert. An alignment is split vertically into 3 parts; the gap characters in the middle part are removed, the middle part is realigned alone, and the realigned middle part is spliced back between the other two original pieces to obtain a refined alignment. During realignment, the aligner focuses only on the middle part, ignoring disturbance from the other parts, which could help fix incorrect pairs. RESULTS We tested our refinement strategy with 2 leading MSA tools on 3 standard benchmarks, using the commonly used average SP (and TC) scores. The results show that, given appropriate proportions for splitting the initial alignment, the average scores remain comparable or increase slightly after applying our method. We also compared alignments refined by our method with alignments refined directly by the original alignment tools. The results suggest that using SpliVert to refine alignments can also outperform direct refinement with the original alignment tools. CONCLUSION The results reveal that splitting vertically and realigning part of the alignment is a good strategy for refining protein multiple sequence alignments.
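The split-realign-splice procedure in METHODS can be sketched in Python as follows. This is an illustration under stated assumptions, not the SpliVert implementation: the split proportions are parameters because the abstract does not give specific values, and `naive_realign` is a hypothetical placeholder that merely pads degapped fragments with trailing gaps, standing in for a call to a real MSA program.

```python
from typing import Callable, List


def naive_realign(fragments: List[str]) -> List[str]:
    """Hypothetical placeholder aligner: right-pad fragments with gaps.
    A real pipeline would call an external MSA program here."""
    width = max((len(f) for f in fragments), default=0)
    return [f.ljust(width, "-") for f in fragments]


def splivert_refine(alignment: List[str], left: float, right: float,
                    realign: Callable[[List[str]], List[str]] = naive_realign) -> List[str]:
    """Sketch of vertical split-realign-splice refinement.

    alignment: equal-length gapped rows.
    left, right: fractions of columns kept in the left and right blocks (assumed).
    """
    if not alignment:
        return []
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("rows must have equal length")
    i = int(ncols * left)           # end of the left block
    j = ncols - int(ncols * right)  # start of the right block
    if i > j:
        raise ValueError("left + right proportions exceed 1")
    # Remove gap characters from the middle block only.
    middles = [row[i:j].replace("-", "") for row in alignment]
    realigned = realign(middles)    # realign the middle block on its own
    # Splice the untouched outer blocks back around the realigned middle.
    return [row[:i] + mid + row[j:] for row, mid in zip(alignment, realigned)]


msa = ["MK-LVA-G", "MKAL-AQG", "M--LVAQG"]
print(splivert_refine(msa, 0.25, 0.25))
```

Because only the middle block is degapped and realigned, each row keeps its original residues in order and the outer blocks are untouched; in practice, `realign` would wrap an external aligner.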
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yilei Fu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
11
Lang SA, McIlroy P, Shain DH. Structural Evolution of the Glacier Ice Worm Fo ATP Synthase Complex. Protein J 2020; 39:152-159. [PMID: 32112190 DOI: 10.1007/s10930-020-09889-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
The segmented annelid worm Mesenchytraeus solifugus is a permanent resident of temperate, maritime glaciers in the Pacific Northwest of North America and displays atypically high intracellular ATP levels, which have been linked to its unusual ability to thrive in hydrated glacier ice. We have shown previously that ice worms contain a highly basic carboxy-terminal extension on their ATP6 regulatory subunit, likely acquired by horizontal gene transfer from a microbial dietary source. Here we examine the full complement of F1F0 ATP synthase structural subunits, with attention to non-conservative, ice worm-specific structural modifications. Our genomic analyses and molecular models identify putative proton-shuttling domains on either side of the F0 hemichannel, which are predicted to enhance proton flow across the mitochondrial membrane. Other components of the ice worm ATP synthase complex have remained largely unchanged in the context of metazoan evolution.
Affiliation(s)
- Shirley A Lang
- Department of Biology, Haverford College, Haverford, PA, 19041, USA
- Patrick McIlroy
- Department of Biology and Center for Computational and Integrative Biology, Rutgers, The State University of New Jersey, Camden, NJ, 08102, USA
- Daniel H Shain
- Department of Biology and Center for Computational and Integrative Biology, Rutgers, The State University of New Jersey, Camden, NJ, 08102, USA
12
Transition-transversion mutations in the polyketide synthase gene of Aspergillus section Nigri. Heliyon 2019; 5:e01881. [PMID: 31338447 PMCID: PMC6579908 DOI: 10.1016/j.heliyon.2019.e01881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/12/2018] [Revised: 02/25/2019] [Accepted: 05/30/2019] [Indexed: 11/21/2022]
Abstract
This study characterized transition and transversion mutations in the pks gene of Aspergillus section Nigri, using standard recommended techniques, to gain insight into the patterns of nucleotide base substitution and the process of molecular evolution. Transitions (23 ± 0.96) occurred more frequently than transversions (11.37 ± 1.38) (p < 0.05), with C/T being the most frequently observed transitional substitution and C/A the most frequent transversional change. The number of single-base insertions (56 ± 1.00) was significantly higher than the number of single-base deletions (38 ± 2.00) (p < 0.05), while varying degrees of deletions and insertions of two or more bases were also observed both inside and outside the open reading frame. The maximum likelihood value estimated for the pks gene was -9458.80 for 423 positions in the final dataset, and the transition-transversion ratio was estimated to be 0.50. The value from Tajima's neutrality test approached seven (7), with nucleotide diversity estimated at approximately 65%. The evolutionary test indicated positive selection, as the ratio of nonsynonymous to synonymous divergence was greater than the ratio of nonsynonymous to synonymous polymorphisms. The proportion of substitutions driven by positive selection was calculated to be approximately 96.2%. This research therefore provides insight into pks gene mutation patterns, as some of the observed indels resulted in frameshift mutations.
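Transitions (purine-purine or pyrimidine-pyrimidine changes, such as C/T) and transversions (purine-pyrimidine changes, such as C/A) can be tallied from a pairwise alignment with a short Python sketch. It counts raw differences only and skips gaps and ambiguous bases. The ratio of 0.50 reported above was presumably estimated under a substitution model in phylogenetics software, which this raw count does not reproduce; the example sequences are invented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Classify an aligned base pair as 'transition', 'transversion', or None."""
    if a == b or a not in "ACGT" or b not in "ACGT":
        return None  # identical site, gap, or ambiguous base: not counted
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        kind = classify(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


ts, tv = count_ts_tv("ACGTAC-", "GTGTCCA")
print(ts, tv, ts / tv)
```

Here the A/G and C/T sites count as transitions, the A/C site as a transversion, and the gapped final column is ignored.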
13
Zhang Z, Wang J, Gong Y, Li Y. Contributions of substitutions and indels to the structural variations in ancient protein superfamilies. BMC Genomics 2018; 19:771. [PMID: 30355304 PMCID: PMC6201574 DOI: 10.1186/s12864-018-5178-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/02/2018] [Accepted: 10/16/2018] [Indexed: 11/10/2022]
Abstract
Background Quantitative evaluation of protein structural evolution is important for our understanding of protein biological functions and their evolutionary adaptation, and is useful in guiding protein engineering. However, compared to the models for sequence evolution, the quantitative models for protein structural evolution received less attention. Ancient protein superfamilies are often considered versatile, allowing genetic and functional diversifications during long-term evolution. In this study, we investigated the quantitative impacts of sequence variations on the structural evolution of homologues in 68 ancient protein superfamilies that exist widely in sequenced eukaryotic, bacterial and archaeal genomes. Results We found that the accumulated structural variations within ancient superfamilies could be explained largely by a bilinear model that simultaneously considers amino acid substitution and insertion/deletion (indel). Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those ancient superfamilies with high bilinear multiple correlation coefficients, the influence of each unit of substitution or indel on structural variations is almost constant within each superfamily, but varies greatly among different superfamilies. The influence of each unit indel on structural variations is always larger than that of each unit substitution within each superfamily, but the accumulated contributions of indels to structural variations are lower than those of substitutions in most superfamilies. The total contributions of sequence indels and substitutions (46% and 54%, respectively) to the structural variations that result from sequence variations are slightly different in ancient superfamilies. Conclusions Structural variations within ancient protein superfamilies accumulated under the significantly bilinear influence of amino acid substitutions and indels in sequences. 
Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those structural variations resulting from sequence variations, the total contribution of indels is slightly lower than that of amino acid substitutions. A regular clock exists not only in protein sequences but probably also in protein structures.
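The bilinear model can be illustrated with a small least-squares fit. This is a minimal sketch, not the authors' procedure: it assumes that "bilinear" means structural distance modeled as a·(substitution measure) + b·(indel measure) with no intercept, uses NumPy's `lstsq`, and feeds it hypothetical pairwise values. The function name `fit_bilinear` is illustrative. It reports the per-unit coefficients, the multiple correlation coefficient R, and each term's share of the accumulated fitted variation.

```python
import numpy as np


def fit_bilinear(subs, indels, struct_dist):
    """Fit struct_dist ~ a*subs + b*indels by least squares (no intercept, an assumption).

    subs, indels: per-pair substitution and indel measures (hypothetical units).
    struct_dist: per-pair structural distance between homologues.
    Returns (a, b, R, substitution_share, indel_share).
    """
    subs = np.asarray(subs, dtype=float)
    indels = np.asarray(indels, dtype=float)
    y = np.asarray(struct_dist, dtype=float)
    X = np.column_stack([subs, indels])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ np.array([a, b])
    # Multiple correlation coefficient: Pearson r between observed and fitted values.
    r = np.corrcoef(y, fitted)[0, 1]
    # Accumulated contribution of each term to the fitted structural variation.
    sub_total = a * subs.sum()
    indel_total = b * indels.sum()
    total = sub_total + indel_total
    return a, b, r, sub_total / total, indel_total / total


# Hypothetical data generated exactly from a = 0.8, b = 3.0.
subs = [0.10, 0.20, 0.30, 0.40, 0.50]
indels = [0.02, 0.01, 0.05, 0.03, 0.06]
dist = [0.8 * s + 3.0 * i for s, i in zip(subs, indels)]
a, b, r, sub_share, indel_share = fit_bilinear(subs, indels, dist)
print(round(a, 3), round(b, 3), round(sub_share, 2))  # 0.8 3.0 0.7
```

With these made-up values the per-unit indel coefficient (3.0) exceeds the substitution coefficient (0.8), yet substitutions account for about 70% of the accumulated fitted variation because they are more numerous. This only mirrors the qualitative pattern described in the abstract; it says nothing about the real data.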
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Jinlan Wang
- Physical Examination Office of Shandong Province, Health and Family Planning Commission of Shandong Province, Jinan, 250014, China
- Ya Gong
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Yuezhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China.
14
Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences. J Mol Evol 2018; 86:365-378. [PMID: 29955898 DOI: 10.1007/s00239-018-9853-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/20/2017] [Accepted: 06/21/2018] [Indexed: 10/28/2022]
Abstract
A low ratio of nonsynonymous and synonymous substitution rates (dN/dS) at a codon is an indicator of functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations on nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.
15
A Newly Determined Member of the meso-Diaminopimelate Dehydrogenase Family with a Broad Substrate Spectrum. Appl Environ Microbiol 2017; 83:AEM.00476-17. [PMID: 28341677 DOI: 10.1128/aem.00476-17] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/26/2017] [Accepted: 03/16/2017] [Indexed: 01/07/2023]
Abstract
meso-Diaminopimelate dehydrogenase (meso-DAPDH) from Symbiobacterium thermophilum (StDAPDH) is the first member of the meso-DAPDH family known to catalyze the asymmetric reductive amination of 2-keto acids to produce d-amino acids. It is important to understand the catalytic mechanisms of StDAPDH and other enzymes in this family. In this study, evolutionary analysis and examination of catalytic activity showed that meso-DAPDH enzymes can be divided into two types. Type I showed a strong preference for meso-diaminopimelate (meso-DAP), and type II exhibited clear reversible amination activity with a broad substrate spectrum. StDAPDH belongs to type II. A quaternary structure analysis revealed that insertions/deletions (indels) and a loss of quaternary structure resulted in divergence among members of the meso-DAPDH family. A structure alignment of StDAPDH with a representative of type I, the meso-DAPDH from Corynebacterium glutamicum (CgDAPDH), indicated that they share the same fold. Based on sequence and conservation analyses, two amino acid residues of StDAPDH, R35 and R71, were found to be highly conserved within type II yet to differ between the two subtypes. Site-directed mutagenesis identified R71 as a substrate preference-related residue of StDAPDH, which may serve as an indicator of the amination preference of type II. These results deepen the present understanding of the meso-DAPDH family and provide a solid foundation for the discovery and engineering of meso-DAPDH for d-amino acid biosynthesis. IMPORTANCE The l-form of amino acids is typically more abundant than the d-form. However, the d-form has many important pharmaceutical applications. meso-Diaminopimelate dehydrogenase (meso-DAPDH) from Symbiobacterium thermophilum (StDAPDH) was the first member of the meso-DAPDH family known to catalyze the amination of 2-keto acids to produce d-amino acids. 
Accordingly, we analyzed the evolution of meso-DAPDH proteins and found that they form two groups, i.e., type I proteins, which show a high preference for meso-diaminopimelate (meso-DAP), and type II proteins, which show a broad substrate spectrum. We examined the differences in sequence, tertiary structure, and quaternary structure to determine the mechanisms underlying the functional differences between the type I and type II lineages. These results will facilitate the identification of additional meso-DAPDHs and may provide guidance to protein engineering studies for d-amino acid biosynthesis.
16
Jackson EL, Spielman SJ, Wilke CO. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein. PLoS One 2017; 12:e0164905. [PMID: 28369116 PMCID: PMC5378326 DOI: 10.1371/journal.pone.0164905] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/07/2016] [Accepted: 03/21/2017] [Indexed: 01/29/2023]
Abstract
Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor of whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions improves predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work examining how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
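Weighted contact number is commonly defined as the sum of inverse squared distances from a residue to every other residue. The sketch below computes it with NumPy, assuming one representative atom (for example Cα) per residue. The helper `rank_deletion_risk` is hypothetical and only restates the finding that densely packed sites are less likely to tolerate deletion; it is not the prediction pipeline used in the study.

```python
import numpy as np


def weighted_contact_number(coords):
    """Return WCN_i = sum over j != i of 1 / r_ij**2 for each residue i.

    coords: (n, 3) array with one representative atom per residue
    (C-alpha assumed here; side-chain centroids are another common choice).
    """
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]
    dist_sq = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist_sq, np.inf)  # 1/inf == 0, so a residue never counts itself
    return np.sum(1.0 / dist_sq, axis=1)


def rank_deletion_risk(coords):
    """Hypothetical helper: residue indices ordered from highest to lowest WCN.

    Under the reported association, top-ranked (most densely packed) sites
    would be the ones predicted least likely to tolerate a single deletion.
    """
    wcn = weighted_contact_number(coords)
    return [int(i) for i in np.argsort(-wcn, kind="stable")]


# Toy three-residue chain on a line, 1 unit apart (not real eGFP coordinates).
toy = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(weighted_contact_number(toy))
print(rank_deletion_risk(toy))
```

The middle residue has two neighbors at distance 1 and so the highest WCN; in a real structure, buried core residues play that role.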
Affiliation(s)
- Eleisha L. Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Stephanie J. Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Claus O. Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
17
Al-Shatnawi M, Ahmad MO, Swamy MNS. MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions. BMC Bioinformatics 2015; 16:393. [PMID: 26597571 PMCID: PMC4657235 DOI: 10.1186/s12859-015-0826-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/09/2015] [Accepted: 11/14/2015] [Indexed: 11/16/2022]
Abstract
Background The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. Despite considerable recent research effort to improve the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment of multiple protein sequences is still a challenging problem. Results We propose a novel and efficient algorithm, called MSAIndelFR, for multiple sequence alignment that uses the predicted locations of indel flanking regions (IndelFRs) and the average log-loss values computed by IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that introducing into the algorithm a new variable gap penalty function based on the predicted IndelFR locations and the computed average log-loss values substantially improves protein alignment accuracy. This is illustrated by evaluating the algorithm's performance in aligning sequences belonging to the protein folds for which IndelFR predictors already exist, using the reference alignments of four popular benchmarks: BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). Conclusions We have proposed a novel and efficient algorithm, MSAIndelFR, for multiple protein sequence alignment that incorporates a new variable gap penalty function. Its performance is shown to be superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
Affiliation(s)
- Mufleh Al-Shatnawi
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
- M Omair Ahmad
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
- M N S Swamy
- Department of Electrical and Computer Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, H3G 1M8, Quebec, Canada.
18
Substrate-binding specificity of chitinase and chitosanase as revealed by active-site architecture analysis. Carbohydr Res 2015; 418:50-56. [PMID: 26545262 DOI: 10.1016/j.carres.2015.10.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/10/2015] [Revised: 10/03/2015] [Accepted: 10/06/2015] [Indexed: 11/21/2022]
Abstract
Chitinases and chitosanases, referred to as chitinolytic enzymes, are two important categories of glycoside hydrolases (GH) that play a key role in degrading chitin and chitosan, two naturally abundant polysaccharides. Here, we investigate the active site architecture of the major chitosanase (GH8, GH46) and chitinase families (GH18, GH19). Both charged (Glu, His, Arg, Asp) and aromatic amino acids (Tyr, Trp, Phe) are observed with higher frequency within chitinolytic active sites as compared to elsewhere in the enzyme structure, indicating significant roles related to enzyme function. Hydrogen bonds between chitinolytic enzymes and the substrate C2 functional groups, i.e. amino groups and N-acetyl groups, drive substrate recognition, while non-specific CH-π interactions between aromatic residues and substrate mainly contribute to tighter binding and enhanced processivity evident in GH8 and GH18 enzymes. For different families of chitinolytic enzymes, the number, type, and position of substrate atoms bound in the active site vary, resulting in different substrate-binding specificities. The data presented here explain the synergistic action of multiple enzyme families at a molecular level and provide a more reasonable method for functional annotation, which can be further applied toward the practical engineering of chitinases and chitosanases.
19
Al-Shatnawi M, Ahmad MO, Swamy MNS. Prediction of Indel flanking regions in protein sequences using a variable-order Markov model. Bioinformatics 2015; 31:40-7. [PMID: 25178462 DOI: 10.1093/bioinformatics/btu556] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION Insertions/deletions (indels) and amino acid substitutions are two common events that drive the evolution of, and variation in, protein sequences. Further, many human diseases and much of the functional divergence between homologous proteins are more closely related to indel mutations than to substitutions, even though indels occur less often. Reliable identification of indels and their flanking regions is a major challenge in research on protein evolution, structure and function. RESULTS In this article, we propose a novel scheme, based on a variable-order Markov model, to predict indel flanking regions in a protein sequence for a given protein fold. The proposed indel flanking region (IndelFR) predictors are based on prediction by partial match (PPM) and probabilistic suffix trees (PST), and are referred to as the PPM IndelFR and PST IndelFR predictors, respectively. The overall performance evaluation shows that the proposed predictors predict IndelFRs in protein sequences with high accuracy and F1 measure. In addition, the results show that if one is interested only in predicting IndelFRs in protein sequences, the proposed predictors are preferable to HMMER 3.0 in view of their substantially superior performance.
Affiliation(s)
- Mufleh Al-Shatnawi
- Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada
- M Omair Ahmad
- Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada
- M N S Swamy
- Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada
20
Bowers PM, Verdino P, Wang Z, da Silva Correia J, Chhoa M, Macondray G, Do M, Neben TY, Horlick RA, Stanfield RL, Wilson IA, King DJ. Nucleotide insertions and deletions complement point mutations to massively expand the diversity created by somatic hypermutation of antibodies. J Biol Chem 2014; 289:33557-67. [PMID: 25320089 DOI: 10.1074/jbc.m114.607176] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
During somatic hypermutation (SHM), deamination of cytidine by activation-induced cytidine deaminase and subsequent DNA repair generates mutations within immunoglobulin V-regions. Nucleotide insertions and deletions (indels) have recently been shown to be critical for the evolution of antibody binding. Affinity maturation of 53 antibodies using in vitro SHM in a non-B cell context was compared with mutation patterns observed for SHM in vivo. The origin and frequency of indels seen during in vitro maturation were similar to that in vivo. Indels are localized to CDRs, and secondary mutations within insertions further optimize antigen binding. Structural determination of an antibody matured in vitro and comparison with human-derived antibodies containing insertions reveal conserved patterns of antibody maturation. These findings indicate that activation-induced cytidine deaminase acting on V-region sequences is sufficient to initiate authentic formation of indels in vitro and in vivo and that point mutations, indel formation, and clonal selection form a robust tripartite system for antibody evolution.
Affiliation(s)
- Petra Verdino
- Anaptysbio Inc., San Diego, California 92121
- Mark Chhoa
- Anaptysbio Inc., San Diego, California 92121
- Minjee Do
- Anaptysbio Inc., San Diego, California 92121
- Robyn L Stanfield
- Department of Integrative Structural and Computational Molecular Biology and Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California 92037
- Ian A Wilson
- Department of Integrative Structural and Computational Molecular Biology and Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California 92037
- David J King
- Anaptysbio Inc., San Diego, California 92121
21
Loss of quaternary structure is associated with rapid sequence divergence in the OSBS family. Proc Natl Acad Sci U S A 2014; 111:8535-40. [PMID: 24872444 DOI: 10.1073/pnas.1318703111] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/03/2023]
Abstract
The rate of protein evolution is determined by a combination of selective pressure on protein function and biophysical constraints on protein folding and structure. Determining the relative contributions of these properties is an unsolved problem in molecular evolution with broad implications for protein engineering and function prediction. As a case study, we examined the structural divergence of the rapidly evolving o-succinylbenzoate synthase (OSBS) family, which catalyzes a step in menaquinone synthesis in diverse microorganisms and plants. On average, the OSBS family is much more divergent than other protein families from the same set of species, with the most divergent family members sharing <15% sequence identity. Comparing 11 representative structures revealed that loss of quaternary structure and large deletions or insertions are associated with the family's rapid evolution. Neither of these properties has been investigated in previous studies to identify factors that affect the rate of protein evolution. Intriguingly, one subfamily retained a multimeric quaternary structure and has small insertions and deletions compared with related enzymes that catalyze diverse reactions. Many proteins in this subfamily catalyze both OSBS and N-succinylamino acid racemization (NSAR). Retention of ancestral structural characteristics in the NSAR/OSBS subfamily suggests that the rate of protein evolution is not proportional to the capacity to evolve new protein functions. Instead, structural features that are conserved among proteins with diverse functions might contribute to the evolution of new functions.
22
Jovelin R, Cutter AD. Fine-scale signatures of molecular evolution reconcile models of indel-associated mutation. Genome Biol Evol 2013; 5:978-86. [PMID: 23558593 PMCID: PMC3673634 DOI: 10.1093/gbe/evt051] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
Abstract
Genomic structural alterations that vary within species, known as large copy number variants, represent an unanticipated and abundant source of genetic diversity that associates with variation in gene expression and susceptibility to disease. Even short insertions and deletions (indels) can exert important effects on genomes by locally increasing the mutation rate, with multiple mechanisms proposed to account for this pattern. To better understand how indels promote genome evolution, we demonstrate that the single nucleotide mutation rate is elevated in the vicinity of indels, with a resolution of tens of base pairs, for the two closely related nematode species Caenorhabditis remanei and C. sp. 23. In addition to indels being clustered with single nucleotide polymorphisms and fixed differences, we also show that transversion mutations are enriched in sequences that flank indels and that many indels associate with sequence repeats. These observations are compatible with a model that reconciles previously proposed mechanisms of indel-associated mutagenesis, implicating repeat sequences as a common driver of indel errors, which then recruit error-prone polymerases during DNA repair, resulting in a locally elevated single nucleotide mutation rate. The striking influence of indel variants on the molecular evolution of flanking sequences strengthens the emerging general view that mutations can induce further mutations.
Affiliation(s)
- Richard Jovelin
- Department of Ecology and Evolutionary Biology, University of Toronto, Ontario, Canada.
23
Wang Y, Tan X, Paterson AH. Different patterns of gene structure divergence following gene duplication in Arabidopsis. BMC Genomics 2013; 14:652. [PMID: 24063813 PMCID: PMC3848917 DOI: 10.1186/1471-2164-14-652] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 05/02/2013] [Accepted: 09/20/2013] [Indexed: 01/10/2023]
Abstract
BACKGROUND Divergence in gene structure following gene duplication is not well understood. Gene duplication can occur via whole-genome duplication (WGD) and via single-gene duplications, including tandem, proximal and transposed duplications. Different modes of gene duplication may be associated with different types, levels, and patterns of structural divergence. RESULTS In Arabidopsis thaliana, we measure the level of structural divergence between duplicated genes by differences in coding-region length and average exon length, and by the number of insertions/deletions (indels) and the maximum indel length in their protein sequence alignment. Among recent duplicates of different modes, transposed duplicates diverge most dramatically in gene structure. In transposed duplications, parental loci tend to have longer coding regions and exons, fewer indels and smaller maximum indel lengths than transposed loci, reflecting biased structural changes in transposed duplications. Structural divergence increases with evolutionary time for WGDs but not for transposed duplications, possibly because of biased gene losses following transposed duplications. Structural divergence has heterogeneous relationships with nucleotide substitution rates but is consistently positively correlated with gene expression divergence. The NBS-LRR gene family shows higher-than-average levels of structural divergence. CONCLUSIONS Our study suggests that structural divergence between duplicated genes is greatly affected by the mechanism of gene duplication and may not be proportional to evolutionary time, and that certain gene families are under selection for rapid evolution of gene structure.
Affiliation(s)
- Yupeng Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA.
24
Stewart KL, Nelson MR, Eaton KV, Anderson WJ, Cordes MHJ. A role for indels in the evolution of Cro protein folds. Proteins 2013; 81:1988-96. [PMID: 23843258 DOI: 10.1002/prot.24358] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/30/2013] [Revised: 05/30/2013] [Accepted: 06/10/2013] [Indexed: 11/06/2022]
Abstract
Insertions and deletions in protein sequences, or indels, can disrupt structure and may result in changes in protein folds during evolution or in association with alternative splicing. Pfl 6 and Xfaso 1 are two proteins in the Cro family that share a common ancestor but have different folds. Sequence alignments of the two proteins show two gaps, one at the N terminus, where the sequence of Xfaso 1 is two residues shorter, and one near the center of the sequence, where the sequence of Pfl 6 is five residues shorter. To test the potential importance of indels in Cro protein evolution, we generated hybrid variants of Pfl 6 and Xfaso 1 with indels in one or both regions, chosen according to several plausible sequence alignments. All but one deletion variant completely unfolded both proteins, showing that a longer N-terminal sequence was critical for Pfl 6 folding and a longer central region sequence was critical for Xfaso 1 folding. By contrast, Xfaso 1 tolerated a longer N-terminal sequence with little destabilization, and Pfl 6 tolerated central region insertions, albeit with substantial effects on thermal stability and some perturbation of the surrounding structure. None of the mutations appeared to convert one stable fold into the other. On the basis of this two-protein comparison, short insertion and deletion mutations probably played a role in evolutionary fold change in the Cro family, but they were not the only factors.
Affiliation(s)
- Katie L Stewart
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, 85721-0088
25
Ajawatanawong P, Baldauf SL. Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 2013; 13:140. [PMID: 23826714 PMCID: PMC3706215 DOI: 10.1186/1471-2148-13-140] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 02/25/2013] [Accepted: 06/24/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes. RESULTS Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold. CONCLUSIONS We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, simple indels in universal proteins appear to be too rare and homoplasy-rich to be used for purely indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than deletions do.
Affiliation(s)
- Pravech Ajawatanawong
- Department of Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden.
26
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
27
Discussion on research methods of bacterial resistant mutation mechanisms under selective culture--uncertainty analysis of data from the Luria-Delbrück fluctuation experiment. SCIENCE CHINA-LIFE SCIENCES 2012; 55:1007-21. [PMID: 23160830 DOI: 10.1007/s11427-012-4395-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/07/2012] [Accepted: 10/09/2012] [Indexed: 10/27/2022]
Abstract
Whether bacterial drug resistance is drug-induced or results from the rapid propagation of random spontaneous mutations present in the population before exposure has long been a key and debated issue in both genetics and medicine. In a pioneering study, Luria and Delbrück exposed E. coli to T1 phage to investigate whether the number of resistant colonies followed a Poisson distribution. They deduced that the development of resistant colonies is independent of phage presence. Similar results have since been obtained on solid media containing antibacterial agents. Luria and Delbrück's conclusions were long considered a gold standard for analyzing drug-resistance mutations. More recently, the concept of adaptive mutation has triggered controversy over this approach. Microbiological observation shows that, following exposure to drugs at various concentrations, drug-resistant cells emerge and multiply in a time-dependent manner, which is inconsistent with the definition of the Poisson distribution (which assumes not only that resistance is independent of drug quantity but also that it follows no specific time course). At the same time, because cells tend to aggregate after division rather than separate, colonies growing on drug plates arise from the multiplication of resistant cells with various initial population sizes. Thus, statistical analysis that assumes equivalent initial populations will yield erroneous results. In this paper, 310 data points from the Luria-Delbrück fluctuation experiment were reanalyzed from this perspective. In most cases, a high-end outlier, resulting from the non-synchronous variation of the two time variables mentioned above, was observed. Therefore, the mean cannot be regarded as an unbiased estimate of the expectation. The ratio between the mean and the variance was likewise not comparable, because two different sampling methods were used. 
In fact, the Luria-Delbrück data appear to follow an aggregated distribution rather than a Poisson distribution. In summary, the statistical analysis of Luria and Delbrück is insufficient to describe the rules of resistant mutant development and multiplication. Correcting this historical misunderstanding will enable new insight into bacterial resistance mechanisms.
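As a minimal sketch of the classic comparison that this abstract critiques, the Python below computes the variance-to-mean ratio of resistant-colony counts across parallel cultures. Under a Poisson model the ratio is expected to be near 1, and Luria and Delbrück read much larger values as evidence that mutations arose before selection. The function name `dispersion_index` and the colony counts are hypothetical, and the sketch does not implement the authors' reanalysis, which argues that this ratio is not a valid test for these data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the mean of colony counts.

    Under a Poisson model this ratio is expected to be close to 1; values
    far above 1 reflect 'jackpot' cultures with unusually many mutants.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    # statistics.variance uses the n - 1 (sample) denominator
    return variance(counts) / m


# Hypothetical resistant-colony counts from ten parallel cultures
counts = [0, 1, 0, 0, 3, 0, 0, 107, 1, 0]
print(round(dispersion_index(counts), 1))
```

A single jackpot culture dominates both the variance and the mean here, which is the same sensitivity to high-end values that the abstract identifies as a source of bias.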
28
An indel polymorphism in the hybrid incompatibility gene lethal hybrid rescue of Drosophila is functionally relevant. Genetics 2012; 192:683-91. [PMID: 22865735 DOI: 10.1534/genetics.112.141952] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Hybrid incompatibility (HI) genes are frequently observed to be rapidly evolving under selection. This observation has led to the attractive conjecture that selection-derived protein-sequence divergence is culpable for incompatibilities in hybrids. The Drosophila simulans HI gene Lethal hybrid rescue (Lhr) is an intriguing case, because despite having experienced rapid sequence evolution, its HI properties are a shared function inherited from the ancestral state. Using an unusual D. simulans Lhr hybrid rescue allele, Lhr(2), we here identify a conserved stretch of 10 amino acids in the C terminus of LHR that is critical for causing hybrid incompatibility. Altering these 10 amino acids weakens or abolishes the ability of Lhr to suppress the hybrid rescue alleles Lhr(1) or Hmr(1), respectively. Besides single-amino-acid substitutions, Lhr orthologs differ by a 16-aa indel polymorphism, with the ancestral deletion state fixed in D. melanogaster and the derived insertion state at very high frequency in D. simulans. Lhr(2) is a rare D. simulans allele that has the ancestral deletion state of the 16-aa polymorphism. Through a series of transgenic constructs we demonstrate that the ancestral deletion state contributes to the rescue activity of Lhr(2). This indel is thus a polymorphism that can affect the HI function of Lhr.
29
Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012; 7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022] Open
Abstract
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for the comparison, modelling and prediction of protein structures. These approaches to structure analysis can be applied directly to studying protein function and to drug design. The backbone of a protein structure favours certain local conformations, which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet, which gives a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be used to improve structure comparison tools that work by analysing local structural similarities. Conformational differences between homologous proteins are known to occur often in regions comprising turns and loops. Interestingly, these differences show specific preferences depending on the structural class of the protein. Such class-specific preferences are seen mainly in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that using knowledge of the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops that join regular secondary structures and mediate a reversal in chain direction. 
Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Hélène Valadié
- INSERM UMR-S 726, DSIMB, Université Paris Diderot - Paris 7, Paris, France
- Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
30
Westesson O, Lunter G, Paten B, Holmes I. Accurate reconstruction of insertion-deletion histories by statistical phylogenetics. PLoS One 2012; 7:e34572. [PMID: 22536326 PMCID: PMC3335033 DOI: 10.1371/journal.pone.0034572] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 01/01/2012] [Accepted: 03/05/2012] [Indexed: 11/24/2022] Open
Abstract
The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.
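The Felsenstein pruning step mentioned in this abstract can be illustrated for a single nucleotide site under the Jukes-Cantor (JC69) model. This is a minimal sketch, assuming a nested-list tree encoding and JC69 substitution probabilities chosen only for illustration. It covers only the finite-substitution case and does not implement the automata-based extension to indel histories that the paper describes.

```python
import math

STATES = "ACGT"


def jc69_prob(i, j, t):
    """Jukes-Cantor probability of ending in state j from state i after
    branch length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihood(node):
    """Felsenstein pruning: return {state: P(data below node | state)}.

    A leaf is a one-character string; an internal node is a list of
    (child, branch_length) pairs (a hypothetical encoding).
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, t in node:
        child_lik = partial_likelihood(child)
        for s in STATES:
            # Sum over the unknown state at the child end of the branch
            result[s] *= sum(jc69_prob(s, c, t) * child_lik[c] for c in STATES)
    return result


def site_likelihood(tree):
    """Sum over root states, weighted by the uniform JC69 equilibrium."""
    root = partial_likelihood(tree)
    return sum(0.25 * root[s] for s in STATES)


# Hypothetical three-leaf tree: leaf A, plus a cherry of leaves A and G
tree = [("A", 0.1), ([("A", 0.05), ("G", 0.2)], 0.1)]
print(site_likelihood(tree))
```

Each internal node is visited once, so the cost grows linearly with the number of branches instead of exponentially with the number of ancestral-state assignments.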
Affiliation(s)
- Oscar Westesson
- University of California Berkeley and University of California San Francisco Graduate Program in Bioengineering, University of California, Berkeley, California, United States of America
- Gerton Lunter
- Wellcome Trust Center for Human Genetics, Oxford, Oxford, United Kingdom
- Benedict Paten
- Baskin School of Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Ian Holmes
- University of California Berkeley and University of California San Francisco Graduate Program in Bioengineering, University of California, Berkeley, California, United States of America
31
Guo B, Zou M, Wagner A. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication. Mol Biol Evol 2012; 29:3005-22. [PMID: 22490820 DOI: 10.1093/molbev/mss108] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/18/2023] Open
Abstract
Insertions and deletions (indels) in protein-coding genes are important sources of genetic variation. Their role in creating new proteins may be especially important after gene duplication. However, little is known about how indels affect the divergence of duplicate genes. We here study thousands of duplicate genes in five fish (teleost) species with completely sequenced genomes. The ancestor of these species has been subject to a fish-specific genome duplication (FSGD) event that occurred approximately 350 Ma. We find that duplicate genes contain at least 25% more indels than single-copy genes. These indels accumulated preferentially in the first 40 million years after the FSGD. A lack of widespread asymmetric indel accumulation indicates that both members of a duplicate gene pair typically experience relaxed selection. Strikingly, we observe a 30-80% excess of deletions over insertions that is consistent for indels of various lengths and across the five genomes. We also find that indels preferentially accumulate inside loop regions of protein secondary structure and in regions where amino acids are exposed to solvent. We show that duplicate genes with high indel density also show high DNA sequence divergence. Indel density, but not amino acid divergence, can explain a large proportion of the tertiary structure divergence between proteins encoded by duplicate genes. Our observations are consistent across all five fish species. Taken together, they suggest a general pattern of duplicate gene evolution in which indels are important driving forces of evolutionary change.
Affiliation(s)
- Baocheng Guo
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
32
Leushkin EV, Bazykin GA, Kondrashov AS. Insertions and deletions trigger adaptive walks in Drosophila proteins. Proc Biol Sci 2012; 279:3075-82. [PMID: 22456880 PMCID: PMC3385466 DOI: 10.1098/rspb.2011.2571] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022] Open
Abstract
Maps that relate all possible genotypes or phenotypes to fitness—fitness landscapes—are central to the evolution of life, but remain poorly known. An insertion or a deletion (indel) of one or several amino acids constitutes a substantial leap of a protein within the space of amino acid sequences, and it is unlikely that after such a leap the new sequence corresponds precisely to a fitness peak. Thus, one can expect an indel in the protein-coding sequence that gets fixed in a population to be followed by some number of adaptive amino acid substitutions, which move the new sequence towards a nearby fitness peak. Here, we study substitutions that occur after a frame-preserving indel in evolving proteins of Drosophila. An insertion triggers 1.03 ± 0.75 amino acid substitutions within the protein region centred at the site of insertion, and a deletion triggers 4.77 ± 1.03 substitutions within such a region. The difference between these values is probably owing to a higher fraction of effectively neutral insertions. Almost all of the triggered amino acid substitutions can be attributed to positive selection, and most of them occur relatively soon after the triggering indel and take place upstream of its site. A high fraction of substitutions that follow an indel occur at previously conserved sites, suggesting that an indel substantially changes selection that shapes the protein region around it. Thus, an indel is often followed by an adaptive walk of length that is in agreement with the theory of molecular adaptation.
Affiliation(s)
- Evgeny V Leushkin
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskye Gory 1-73, Moscow 119991, Russia.
33
Zhang Z, Xing C, Wang L, Gong B, Liu H. IndelFR: a database of indels in protein structures and their flanking regions. Nucleic Acids Res 2011; 40:D512-8. [PMID: 22127860 PMCID: PMC3245007 DOI: 10.1093/nar/gkr1107] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022] Open
Abstract
Insertion/deletion (indel) is one of the most common types of protein sequence variation. Recent studies showed that indels can affect their flanking regions and are important for protein function and evolution. Here, we describe the Indel Flanking Region Database (IndelFR, http://indel.bioinfo.sdu.edu.cn), which provides sequence and structure information about indels and their flanking regions in known protein domains. The indels were obtained through the pairwise alignment of homologous structures in SCOP superfamilies. The IndelFR database contains 2,925,017 indels with flanking regions extracted from 373,402 structural alignment pairs of 12,573 non-redundant domains from 1053 superfamilies. IndelFR provides access to information about indels and their flanking regions, including amino acid sequences, lengths, locations, secondary structure constitutions, hydrophilicity/hydrophobicity, domain information, 3D structures and so on. IndelFR has already been used for molecular evolution studies and may help to promote future functional studies of indels and their flanking regions.
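To make "indels with flanking regions" concrete, the sketch below scans one pairwise alignment for gap runs and reports the residues opposite each gap, plus a few flanking columns. It is a hypothetical helper (`find_indels`, the `flank` length and the dictionary fields are all assumptions), not IndelFR's extraction pipeline, which works on structural alignments of SCOP superfamily members. A pairwise alignment alone cannot tell an insertion in one sequence from a deletion in the other; polarizing an indel needs an outgroup or an ancestral reconstruction.

```python
def find_indels(aln_a, aln_b, flank=3):
    """Locate gap runs in a pairwise alignment and report flanking residues.

    For each run of '-' in either sequence, return the gapped sequence label
    ('a' or 'b'), the alignment column range [start, end), the residues
    opposite the gap, and up to `flank` columns of the ungapped sequence on
    each side.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    for label, gapped, other in (("a", aln_a, aln_b), ("b", aln_b, aln_a)):
        i = 0
        while i < len(gapped):
            if gapped[i] != "-":
                i += 1
                continue
            start = i
            while i < len(gapped) and gapped[i] == "-":
                i += 1  # extend to the end of this gap run
            indels.append({
                "gapped": label,
                "columns": (start, i),
                "residues": other[start:i],
                "left_flank": other[max(0, start - flank):start],
                "right_flank": other[i:i + flank],
            })
    return indels


# Hypothetical aligned protein fragments
for indel in find_indels("ACDEFG--HIKL", "ACDEFGMNHIK-"):
    print(indel)
```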
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
34
Zhang Z, Wang Y, Wang L, Gao P. The combined effects of amino acid substitutions and indels on the evolution of structure within protein families. PLoS One 2010; 5:e14316. [PMID: 21179197 PMCID: PMC3001449 DOI: 10.1371/journal.pone.0014316] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/22/2010] [Accepted: 11/16/2010] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND In the process of protein evolution, sequence variations within protein families can cause changes in protein structures and functions. However, structures tend to be more conserved than sequences and functions. This leads to an intriguing question: what is the evolutionary mechanism by which sequence variations produce structural changes? To investigate this question, we focused on the most common types of sequence variations: amino acid substitutions and insertions/deletions (indels). Here their combined effects on protein structure evolution within protein families are studied. RESULTS Sequence-structure correlation analysis of 75 homologous structure families (from SCOP) that contain 20 or more non-redundant structures shows that in most of these families there is, statistically, a bilinear correlation between the amounts of substitutions and indels and the degree of structural variation. Bilinear regression of percent sequence non-identity (PNI) and standardized number of gaps (SNG) versus RMSD was performed. The coefficients from the regression analysis could be used to estimate the structural change caused by each unit of substitution (structural substitution sensitivity, SSS) and by each unit of indel (structural indel sensitivity, SIDS). An analysis of 52 families with high bilinear multiple correlation coefficients and statistically significant regression coefficients showed that SSS is mainly constrained by disulfide bonds, which have almost no effect on SIDS. CONCLUSIONS Structural changes in homologous protein families could be rationally explained by a bilinear model combining amino acid substitutions and indels. These results may further improve our understanding of the evolutionary mechanisms of protein structures.
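The bilinear model in this abstract regresses RMSD on PNI and SNG simultaneously. Below is a minimal least-squares sketch, assuming the form RMSD ≈ c0 + c1·PNI + c2·SNG with an intercept and using synthetic, made-up data. The coefficients c1 and c2 play roughly the roles of SSS and SIDS, but the paper's exact standardization of gap counts is not reproduced, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_bilinear(pni, sng, rmsd):
    """Ordinary least squares for rmsd ~ c0 + c1 * pni + c2 * sng.

    Returns (c0, c1, c2) by solving the normal equations (X^T X) c = X^T y.
    """
    rows = [(1.0, p, s) for p, s in zip(pni, sng)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rmsd)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))


# Synthetic data generated from c0 = 0.3, c1 = 0.02, c2 = 1.5 (illustrative only)
pni = [10, 20, 30, 40, 50, 60]
sng = [0.1, 0.5, 0.2, 0.9, 0.4, 1.0]
rmsd = [0.3 + 0.02 * p + 1.5 * s for p, s in zip(pni, sng)]
print(fit_bilinear(pni, sng, rmsd))
```

Solving the normal equations directly is adequate for three coefficients. For larger or ill-conditioned designs, a QR-based solver would be the usual choice.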
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
- Yuxiao Wang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
- Division of Basic Science, UT Southwestern, Dallas, Texas, United States of America
- Lushan Wang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
- Peiji Gao
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
35
Regional context in the alignment of biological sequence pairs. J Mol Evol 2010; 72:147-59. [PMID: 21107551 PMCID: PMC3064887 DOI: 10.1007/s00239-010-9409-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/30/2010] [Accepted: 11/08/2010] [Indexed: 11/24/2022]
Abstract
Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes in both protein-coding and non-protein-coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. Each PHMM had its own set of parameters to model rates in its respective region. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate; that is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates for conserved and non-conserved secondary structure components, respectively. With polypeptides, we found that both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues. We also found that the average concordance of our alignments with the corresponding curated alignments improves markedly when the model allows either of the two fast rates to co-locate with hydrophilic residues. With rRNA molecules, our model did not detect co-location between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled in a way that allows the two processes to vary simultaneously and independently in the two regions.
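The basic building block of the model described above is the pair hidden Markov model. As a hedged illustration, the sketch below computes the forward probability of two sequences under a single three-state pair HMM, with a match state M and gap states X and Y in the style of standard textbook formulations. The parameter values (`delta`, `epsilon`, `match`) are arbitrary and hypothetical, and there is no end state. The sketch does not reproduce the paper's two conjoined PHMMs with region-specific rates or its silent linking state.

```python
def pair_hmm_forward(x, y, delta=0.1, epsilon=0.3, match=0.2):
    """Forward probability of sequences x and y under a 3-state pair HMM.

    States: M (aligned pair), X (residue of x against a gap) and Y (residue
    of y against a gap). delta = gap-open probability from M, epsilon =
    gap-extend probability. Emissions assume a 4-letter alphabet:
    p(a, a) = match, the 12 mismatched pairs share the remaining mass, and
    gap emissions are uniform. End transitions are omitted for brevity.
    """
    if not 0.0 < match <= 0.25:
        raise ValueError("match must be in (0, 0.25] for a 4-letter alphabet")
    mismatch = (1.0 - 4 * match) / 12.0
    q = 0.25  # uniform single-residue emission
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # the begin state behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                p = match if x[i - 1] == y[j - 1] else mismatch
                fM[i][j] = p * ((1 - 2 * delta) * fM[i - 1][j - 1]
                                + (1 - epsilon) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])
    return fM[n][m] + fX[n][m] + fY[n][m]


# Identical sequences should score far higher than unrelated ones
print(pair_hmm_forward("ACGT", "ACGT"), pair_hmm_forward("ACGT", "TGCA"))
```

Summing over all paths, instead of keeping only the best one as Viterbi does, gives the total probability of the pair under the model, which is the quantity that parameter estimation and model-fit comparisons work with.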