1
Quantifying the Mutational Robustness of Protein-Coding Genes. J Mol Evol 2021; 89:357-369. [PMID: 33934169 DOI: 10.1007/s00239-021-10009-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2020] [Accepted: 04/05/2021] [Indexed: 10/21/2022]
Abstract
We use large-scale mutagenesis data and computer simulations to quantify the mutational robustness of protein-coding genes by taking into account constraints arising from protein function and the genetic code. Analyses of the distribution of amino acid substitutions from 18 mutagenesis studies revealed an average of 45% of neutral variants, while mutagenesis data for 12 proteins artificially designed under no constraint other than stability reach an average of 60%. Simulations using a lattice protein model allow us to contrast these estimates with the expected mutational robustness of protein families by generating unbiased samples of foldable sequences, which we find to have 30% of neutral variants. In agreement with mutagenesis data of designed proteins, the model shows that maximally robust protein families might access up to twice the amount of neutral variants observed in the unbiased samples (i.e., 60%). A biophysical model of protein-ligand binding suggests that constraints associated with molecular function have only a moderate impact on robustness, of approximately 5 to 10% of neutral variants, and that the direction of this effect depends on the relation between functional performance and thermodynamic stability. Although the genetic code constrains the access of a gene's nucleotide sequence to only 30% of the full distribution of amino acid mutations, it provides an extra 15 to 20% of neutral variants to the estimates above, such that the expected, observed, and maximal robustness of protein-coding genes are approximately 50, 65, and 75%, respectively. We discuss our results in the light of three main hypotheses put forward to explain the existence of mutationally robust genes.
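The genetic-code constraint mentioned in this abstract can be illustrated with a short calculation. The sketch below is not the authors' procedure, which the abstract does not specify. It assumes the standard genetic code and counts, for each sense codon, how many of the 19 alternative amino acids can be reached by a single nucleotide change. The names `CODE`, `accessible_fraction`, and `mean_fraction` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def accessible_fraction(codon: str) -> float:
    """Fraction of the 19 alternative amino acids reachable from `codon`
    by one nucleotide substitution (synonymous and stop neighbours excluded)."""
    wild_type = CODE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            amino_acid = CODE[codon[:pos] + base + codon[pos + 1:]]
            if amino_acid != "*" and amino_acid != wild_type:
                reachable.add(amino_acid)
    return len(reachable) / 19


# Unweighted average over the 61 sense codons (an assumption of this sketch;
# real genes have non-uniform codon usage).
sense_codons = [codon for codon, aa in CODE.items() if aa != "*"]
mean_fraction = sum(accessible_fraction(c) for c in sense_codons) / len(sense_codons)

print(f"ATG reaches {accessible_fraction('ATG'):.3f} of alternative amino acids")
print(f"mean over sense codons: {mean_fraction:.3f}")
```

For example, the methionine codon ATG reaches six other amino acids (L, V, K, T, R, I), that is 6/19 of the alternatives. Averaging uniformly over codons is a simplification, so the printed mean is only a rough counterpart of the figure quoted in the abstract.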
2
Eaton KV, Anderson WJ, Dubrava MS, Kumirov VK, Dykstra EM, Cordes MHJ. Studying protein fold evolution with hybrids of differently folded homologs. Protein Eng Des Sel 2015; 28:241-50. [PMID: 25991865 DOI: 10.1093/protein/gzv027] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/18/2015] [Accepted: 04/20/2015] [Indexed: 11/13/2022] Open
Abstract
To study the sequence determinants governing protein fold evolution, we generated hybrid sequences from two homologous proteins with 40% identity but different folds: Pfl 6 Cro, which has a mixed α + β structure, and Xfaso 1 Cro, which has an all α-helical structure. First, we examined eight chimeric hybrids in which the more structurally conserved N-terminal half of one protein was fused to the more structurally divergent C-terminal half of the other. None of these chimeras folded, as judged by circular dichroism spectra and thermal melts, suggesting that both halves have strong intrinsic preferences for the native global fold pattern, and/or that the interfaces between the halves are not readily interchangeable. Second, we examined 10 hybrids in which blocks of the structurally divergent C-terminal region were exchanged. These hybrids showed varying levels of thermal stability and suggested that the key residues in the Xfaso 1 C terminus specifying the all-α fold were concentrated near the end of helix 4 in Xfaso 1, which aligns to the end of strand 2 in Pfl 6. Finally, we generated hybrid substitutions for each individual residue in this critical region and measured thermal stabilities. The results suggested that R47 and V48 were the strongest factors that excluded formation of the α + β fold in the C-terminal region of Xfaso 1. In support of this idea, we found that the folding stability of one of the original eight chimeras could be rescued by back-substituting these two residues. Overall, the results show not only that the key factors for Cro fold specificity and evolution are global and multifarious, but also that some all-α Cro proteins have a C-terminal subdomain sequence within a few substitutions of switching to the α + β fold.
Affiliation(s)
- Karen V Eaton
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721-0088, USA
- William J Anderson
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721-0088, USA
- Matthew S Dubrava
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721-0088, USA
- Vlad K Kumirov
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721-0088, USA
- Emily M Dykstra
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721-0088, USA
- Matthew H J Cordes
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721-0088, USA
3
4
Becchetti A. Empirically founded genotype-phenotype maps from mammalian cyclic nucleotide-gated ion channels. J Theor Biol 2014; 363:205-15. [PMID: 25172772 DOI: 10.1016/j.jtbi.2014.08.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2014] [Revised: 07/22/2014] [Accepted: 08/20/2014] [Indexed: 10/24/2022]
Abstract
A major barrier between evolutionary and functional biology is the difficulty of determining appropriate genotype-phenotype-fitness maps, particularly in metazoans. Concrete perspectives towards unifying these approaches are offered by studies on the physiological systems that depend on ion channel dynamics. I focus on the cyclic nucleotide-gated (CNG) channels implicated in the photoreceptor's response to light. From an evolutionary standpoint, sensory systems offer interpretative advantages, as the relation between the sensory response and environment is relatively straightforward. For CNG and other ion channels, extensive data are available about the physiological consequences of scanning mutagenesis on sensitive protein domains, such as the conduction pore. Mutant ion channels can be easily studied in living cells, so that the relation between genotypes and phenotypes is less speculative than usual. By relying on relatively simple theoretical frameworks, I used these data to relate the sequence space with phenotypes at increasing hierarchical levels. These empirical genotype-phenotype and phenotype-phenotype landscapes became smoother at higher integration levels, especially in the heterozygous condition. The epistatic interaction between sites was analyzed from double mutant constructs. Magnitude epistasis was common. Moreover, evidence of reciprocal sign epistasis and the presence of permissive mutations were also observed, which suggest how adaptive regions can be connected across maladaptive valleys. The approach I describe suggests a way to better relate the evolutionary dynamics with the underlying physiology.
Affiliation(s)
- Andrea Becchetti
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy.
5
Pagan RF, Massey SE. A nonadaptive origin of a beneficial trait: in silico selection for free energy of folding leads to the neutral emergence of mutational robustness in single domain proteins. J Mol Evol 2013; 78:130-9. [PMID: 24362542 DOI: 10.1007/s00239-013-9606-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/06/2013] [Accepted: 12/04/2013] [Indexed: 10/25/2022]
Abstract
Proteins are regarded as being robust to the deleterious effects of mutations. Here, the neutral emergence of mutational robustness in a population of single domain proteins is explored using computer simulations. A pairwise contact model was used to calculate the ΔG of folding (ΔG folding) using the three-dimensional protein structure of leech eglin C. A random amino acid sequence with low mutational robustness, defined as the average ΔΔG resulting from a point mutation (ΔΔG average), was threaded onto the structure. A population of 1,000 threaded sequences was evolved under selection for stability, using an upper and lower energy threshold. Under these conditions, mutational robustness increased over time in the most common sequence in the population. In contrast, when the wild type sequence was used it did not show an increase in robustness. This implies that the emergence of mutational robustness is sequence specific and that wild type sequences may be close to maximal robustness. In addition, an inverse relationship between ΔΔG average and protein stability is shown, resulting partly from a larger average effect of point mutations in more stable proteins. The emergence of mutational robustness was also observed in the Escherichia coli colE1 Rop and human CD59 proteins, implying that the property may be common in single domain proteins under certain simulation conditions. The results indicate that at least a portion of mutational robustness in small globular proteins might have arisen by a process of neutral emergence, and could be an example of a beneficial trait that has not been directly selected for, termed a "pseudaptation."
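The robustness measure described in this abstract, the average ΔΔG over all point mutations of a sequence, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_energy` is a hypothetical stand-in for the pairwise contact model, with an invented contact list and an invented hydrophobic set, and `mean_ddg` is an illustrative name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy model: a contact contributes -1 when both residues are
# hydrophobic. The contact list and residue set are invented for illustration.
HYDROPHOBIC = set("AILMFVW")
TOY_CONTACTS = ((0, 3), (1, 4), (2, 5))


def toy_energy(sequence: str) -> float:
    """Toy folding energy (arbitrary units): lower means more stable."""
    return -sum(
        1.0
        for i, j in TOY_CONTACTS
        if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC
    )


def mean_ddg(sequence: str, energy=toy_energy) -> float:
    """Average change in folding energy over all single-residue substitutions.

    Values near zero indicate a sequence that tolerates point mutations well.
    """
    baseline = energy(sequence)
    changes = []
    for position, wild_type in enumerate(sequence):
        for amino_acid in AMINO_ACIDS:
            if amino_acid == wild_type:
                continue
            mutant = sequence[:position] + amino_acid + sequence[position + 1:]
            changes.append(energy(mutant) - baseline)
    return sum(changes) / len(changes)


print(mean_ddg("LLLLLL"))  # every position sits in a stabilising contact
print(mean_ddg("GGGGGG"))  # no contact is formed, so no mutation changes the energy
```

Any callable that maps a sequence to an energy can be passed as `energy`, so a structure-based contact potential could replace the toy function without changing `mean_ddg`.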
Affiliation(s)
- Rafael F Pagan
- Physics Department, University of Puerto Rico - Rio Piedras, San Juan, PR, USA
6
Foster DV, Rorick MM, Gesell T, Feeney LM, Foster JG. Dynamic landscapes: a model of context and contingency in evolution. J Theor Biol 2013; 334:162-72. [PMID: 23796530 DOI: 10.1016/j.jtbi.2013.05.030] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/10/2012] [Revised: 05/12/2013] [Accepted: 05/31/2013] [Indexed: 01/09/2023]
Abstract
Although the basic mechanics of evolution have been understood since Darwin, debate continues over whether macroevolutionary phenomena are driven by the fitness structure of genotype space or by ecological interaction. In this paper we propose a simple model capturing key features of fitness-landscape and ecological models of evolution. Our model describes evolutionary dynamics in a high-dimensional, structured genotype space with interspecies interaction. We find promising qualitative similarity with the empirical facts about macroevolution, including broadly distributed extinction sizes and realistic exploration of the genotype space. The abstraction of our model permits numerous applications beyond macroevolution, including protein and RNA evolution.
7
Stewart KL, Dodds ED, Wysocki VH, Cordes MHJ. A polymetamorphic protein. Protein Sci 2013; 22:641-9. [PMID: 23471712 DOI: 10.1002/pro.2248] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 11/30/2012] [Revised: 02/25/2013] [Accepted: 03/01/2013] [Indexed: 11/10/2022]
Abstract
Arc repressor is a homodimeric protein with a ribbon-helix-helix fold. A single polar-to-hydrophobic substitution (N11L) at a solvent-exposed position leads to population of an alternate dimeric fold in which 3₁₀ helices replace a β-sheet. Here we find that the variant Q9V/N11L/R13V (S-VLV), with two additional polar-to-hydrophobic surface mutations in the same β-sheet, forms a highly stable, reversibly folded octamer with approximately half the α-helical content of wild-type Arc. At low protein concentration and low ionic strength, S-VLV also populates both dimeric topologies previously observed for N11L, as judged by NMR chemical shift comparisons. Thus, accumulation of simple hydrophobic mutations in Arc progressively reduces fold specificity, leading first to a sequence with two folds and then to a manifold bridge sequence with at least three different topologies. Residues 9-14 of S-VLV form a highly hydrophobic stretch that is predicted to be amyloidogenic, but we do not observe aggregates of higher order than octamer. Increases in sequence hydrophobicity can promote amyloid aggregation but also exert broader and more complex effects on fold specificity. Altered native folds, changes in fold coupled to oligomerization, toxic pre-amyloid oligomers, and amyloid fibrils may represent a near continuum of accessible alternatives in protein structure space.
Affiliation(s)
- Katie L Stewart
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, USA
8
Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012; 110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]
Abstract
Modularity increases evolvability by reducing constraints on adaptation and by allowing preexisting parts to function in new contexts for novel uses. Protein evolution provides an excellent context to study the causes and consequences of biological modularity. In order to address such questions, however, an index for protein modularity is necessary. This paper proposes a simple index for protein modularity, "module density", which is the number of evolutionarily independent modules that compose a protein divided by the number of amino acids in the protein. The decomposition of proteins into constituent modules can be accomplished by either of two classes of methods. The first class of methods relies on "suppositional" criteria to assign amino acids to modules, whereas the second class of methods relies on "coevolutionary" criteria for this task. One simple and practical method from the first class consists of approximating the number of modules in a protein as the number of regular secondary structure elements (i.e., helices and sheets). Methods based on coevolutionary criteria require more elaborate data, but they have the advantage of being able to specify modules without prior assumptions about why they exist. Given the increasing availability of datasets sampling protein mutational spectra (e.g., from comparative genomics, experimental evolution, and computational prediction), methods based on coevolutionary criteria will likely become more promising in the near future. The ability to meaningfully quantify protein modularity via simple indices has the potential to aid future efforts to understand protein evolutionary rate determinants, improve molecular evolution models and engineer novel proteins.
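The "module density" index defined in this abstract is a simple ratio, sketched below. The function name `module_density` and the example numbers are illustrative assumptions. How the modules are counted (secondary structure elements or coevolutionary criteria) is left to the caller.

```python
def module_density(n_modules: int, n_residues: int) -> float:
    """Number of evolutionarily independent modules per amino acid."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if n_modules < 0:
        raise ValueError("n_modules must be non-negative")
    return n_modules / n_residues


# Hypothetical protein: 6 regular secondary structure elements, 120 residues.
example_density = module_density(n_modules=6, n_residues=120)
print(example_density)
```

Under the "suppositional" approximation mentioned in the abstract, `n_modules` would be the count of helices and sheets taken from a structure annotation.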
Affiliation(s)
- Mary Rorick
- University of Michigan, Department of Ecology and Evolutionary Biology, Ann Arbor, MI 48109-1048, United States.
9
Analytic markovian rates for generalized protein structure evolution. PLoS One 2012; 7:e34228. [PMID: 22693543 PMCID: PMC3367531 DOI: 10.1371/journal.pone.0034228] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/16/2011] [Accepted: 02/26/2012] [Indexed: 12/24/2022] Open
Abstract
A general understanding of the complex phenomenon of protein evolution requires the accurate description of the constraints that define the sub-space of proteins with mutations that do not appreciably reduce the fitness of the organism. Such constraints can have multiple origins. In this work we present a model for constrained evolutionary trajectories represented by a Markovian process throughout a set of protein-like structures artificially constructed to be topological intermediates between the structures of two naturally occurring proteins. The number and type of intermediate steps define how constrained the total evolutionary process is. By using a coarse-grained representation for the protein structures, we derive an analytic formulation of the transition rates between each of the intermediate structures. The results indicate that compact structures with a high number of hydrogen bonds are more probable and have a higher likelihood to arise during evolution. Knowledge of the transition rates allows for the study of complex evolutionary pathways represented by trajectories through a set of intermediate structures.
10
The phylogenomic roots of modern biochemistry: origins of proteins, cofactors and protein biosynthesis. J Mol Evol 2012; 74:1-34. [PMID: 22210458 DOI: 10.1007/s00239-011-9480-1] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 04/19/2011] [Accepted: 12/12/2011] [Indexed: 12/20/2022]
Abstract
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal protocell membranes played a crucial role in early protein evolution and show translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.
11
Rorick MM, Wagner GP. Protein structural modularity and robustness are associated with evolvability. Genome Biol Evol 2011; 3:456-75. [PMID: 21602570 PMCID: PMC3134980 DOI: 10.1093/gbe/evr046] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/21/2023] Open
Abstract
Theory suggests that biological modularity and robustness allow for maintenance of fitness under mutational change, and when this change is adaptive, for evolvability. Empirical demonstrations that these traits promote evolvability in nature remain scant, however. This is in part because modularity, robustness, and evolvability are difficult to define and measure in real biological systems. Here, we address whether structural modularity and/or robustness confer evolvability at the level of proteins by looking for associations between indices of protein structural modularity, structural robustness, and evolvability. We propose a novel index for protein structural modularity: the number of regular secondary structure elements (helices and strands) divided by the number of residues in the structure. We index protein evolvability as the proportion of sites with evidence of being under positive selection multiplied by the average rate of adaptive evolution at these sites, and we measure this as an average over a phylogeny of 25 mammalian species. We use contact density as an index of protein designability, and thus, structural robustness. We find that protein evolvability is positively associated with structural modularity as well as structural robustness and that the effect of structural modularity on evolvability is independent of the structural robustness index. We interpret these associations to be the result of reduced constraints on amino acid substitutions in highly modular and robust protein structures, which results in faster adaptation through natural selection.
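The evolvability index described in this abstract is a product of two quantities, which the sketch below computes for a single protein. It assumes the per-site data are already available as `(under_positive_selection, adaptive_rate)` pairs, a hypothetical input format. It does not reproduce how the authors infer selection or average over the 25-species phylogeny. The name `evolvability_index` is illustrative.

```python
from statistics import mean


def evolvability_index(sites) -> float:
    """Proportion of positively selected sites times their mean adaptive rate.

    `sites` is an iterable of (under_positive_selection, adaptive_rate) pairs,
    one per residue position.
    """
    sites = list(sites)
    if not sites:
        raise ValueError("at least one site is required")
    selected_rates = [rate for selected, rate in sites if selected]
    if not selected_rates:
        return 0.0  # no site shows evidence of positive selection
    proportion = len(selected_rates) / len(sites)
    return proportion * mean(selected_rates)


# Hypothetical four-site protein: two sites under positive selection.
example_sites = [(True, 2.0), (False, 0.1), (True, 4.0), (False, 0.3)]
example_index = evolvability_index(example_sites)
print(example_index)
```

In the example, half of the sites are under positive selection and their mean rate is 3.0, so the index is 1.5. The units depend entirely on how the adaptive rate is estimated.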
Affiliation(s)
- Mary M Rorick
- Department of Genetics, Yale University, New Haven, Connecticut, USA.
12
Goldstein RA. The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 2011; 79:1396-407. [DOI: 10.1002/prot.22964] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.2] [Received: 09/23/2010] [Revised: 11/17/2010] [Accepted: 11/25/2010] [Indexed: 11/11/2022]
13
Ferrada E, Wagner A. Evolutionary innovations and the organization of protein functions in genotype space. PLoS One 2010; 5:e14172. [PMID: 21152394 PMCID: PMC2994758 DOI: 10.1371/journal.pone.0014172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 07/21/2010] [Accepted: 10/28/2010] [Indexed: 11/18/2022] Open
Abstract
The organization of protein structures in protein genotype space is well studied. The same does not hold for protein functions, whose organization is important to understand how novel protein functions can arise through blind evolutionary searches of sequence space. In systems other than proteins, two organizational features of genotype space facilitate phenotypic innovation. The first is that genotypes with the same phenotype form vast and connected genotype networks. The second is that different neighborhoods in this space contain different novel phenotypes. We here characterize the organization of enzymatic functions in protein genotype space, using a data set of more than 30,000 proteins with known structure and function. We show that different neighborhoods of genotype space contain proteins with very different functions. This property both facilitates evolutionary innovation through exploration of a genotype network, and it constrains the evolution of novel phenotypes. The phenotypic diversity of different neighborhoods is caused by the fact that some functions can be carried out by multiple structures. We show that the space of protein functions is not homogeneous, and different genotype neighborhoods tend to contain a different spectrum of functions, whose diversity increases with increasing distance of these neighborhoods in sequence space. Whether a protein with a given function can evolve specific new functions is thus determined by the protein's location in sequence space.
Affiliation(s)
- Evandro Ferrada
- Department of Biochemistry, University of Zurich, Zurich, Switzerland.
14
Tuffin M, Anderson D, Heath C, Cowan DA. Metagenomic gene discovery: how far have we moved into novel sequence space? Biotechnol J 2010; 4:1671-83. [PMID: 19946882 DOI: 10.1002/biot.200900235] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 11/07/2022]
Abstract
Metagenomics emerged in the late 1990s as a tool for accessing and studying the collective microbial genetic material in the environment. The advent of the technology generated great excitement, as it has provided new opportunities and technologies for studying the wealth of microbial genetic diversity in the environment. Metagenomics has been widely predicted to access new dimensions of protein sequence space. A decade on, we review how far we have actually moved into new sequence space (and other aspects of protein space) using metagenomic tools. While several novel enzyme activities and protein structures have been identified through metagenomic strategies, the greatest advancement has been made in the isolation of novel protein sequences, some of which have no close relatives, form deeply branched lineages and even represent novel families. This is particularly true for glycosyl hydrolases and lipase/esterases, despite the fact that these activities are frequently screened for in metagenomic studies. However, there is much room for improvement in the methods employed and they will need to be addressed so that access to novel biocatalytic activities can be widened.
Affiliation(s)
- Marla Tuffin
- Institute for Microbial Biotechnology and Metagenomics, Department of Biotechnology, University of Western Cape, Cape Town, South Africa
15
Sadowski MI, Taylor WR. Protein structures, folds and fold spaces. J Phys Condens Matter 2010; 22:033103. [PMID: 21386276 DOI: 10.1088/0953-8984/22/3/033103] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
There has been considerable progress towards the goal of understanding the space of possible tertiary structures adopted by proteins. Despite a greatly increased rate of structure determination and a deliberate strategy of sequencing proteins expected to be very different from those already known, it is now rare to see a genuinely new fold, leading to the conclusion that we have seen the majority of natural structural types. The increase in knowledge has also led to a critical examination of traditional fold-based classifications and their meaning for evolution and protein structures. We review these issues and discuss possible solutions.
Affiliation(s)
- Michael I Sadowski
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
16
Rorick MM, Wagner GP. The origin of conserved protein domains and amino acid repeats via adaptive competition for control over amino acid residues. J Mol Evol 2010; 70:29-43. [PMID: 20024539 PMCID: PMC3368225 DOI: 10.1007/s00239-009-9305-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/15/2009] [Accepted: 11/18/2009] [Indexed: 10/20/2022]
Abstract
Some proteins, such as homeodomain transcription factors, contain highly conserved regions of sequence. It has recently been suggested that multiple functional domains overlap in the homeodomain, together explaining this high conservation. However, the question remains why so many functional domains cluster together in one relatively small and constrained region of the protein. Here we have modeled an evolutionary mechanism that can produce this kind of clustering: conserved functional domains are displaced from the parts of the molecule that are undergoing adaptive evolution because novel functions generally out-compete conserved functions for control over the identity of amino acid residues. We call this model COAA, for Competition Over Amino Acids. We also studied the evolution of amino acid repeats (a.k.a. homopeptides), which are especially prevalent in transcription factors. Repeats that are encoded by non-homogenous mixtures of synonymous codons cannot be explained by replication slippage alone. Our model provides two explanations for their origin, maintenance, and over-representation in highly conserved proteins. We demonstrate that either competition between multiple functional domains for space within a sequence, or reuse of a sequence for many functions over time, can cause the evolution of amino acid repeats. Both of these processes are characteristic of multifunctional proteins such as homeodomain transcription factors. We conclude that the COAA model can explain two widely recognized features of transcription factor proteins: conserved domains and a tendency to accumulate homopeptides.
Affiliation(s)
- Mary M Rorick
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8106, USA.
17
Munteanu A, Stadler PF. Mutate now, die later. Evolutionary dynamics with delayed selection. J Theor Biol 2009; 260:412-21. [PMID: 19577580 DOI: 10.1016/j.jtbi.2009.06.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2009] [Revised: 06/15/2009] [Accepted: 06/24/2009] [Indexed: 10/20/2022]
Abstract
We analyze here the evolutionary consequences of selection with delay in a population genetics context. In the classical works on evolutionary dynamics, an individual produces offspring in direct proportion to its fitness, a process in which mutations may occur. In the present scenario of delayed selection, individuals that acquire deleterious mutations can still reproduce unharmed for several generations. During this time delay, the damage passed on to offspring can potentially be repaired by subsequent compensatory mutations. In the absence of such a repair, the individual becomes sterile. Here we study the population-genetic effects of such a time delay by means of both numerical simulations and theoretical modeling. The results show that delayed selection lowers the extinction threshold, endangering the survival of the population. Surprisingly, however, no traces of this delay effect are encountered in the sequence diversity of the population. These conclusions suggest that delayed selection is hard to detect in genetic data and thus could be a widespread but rarely detected phenomenon.
Affiliation(s)
- Andreea Munteanu
- ICREA-GRIB Complex Systems Lab, UPF, Parc de Recerca Biomedica Barcelona Dr Aiguader 88, E-08003 Barcelona, Spain.
18
Abstract
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex interactions mediated by the proteostasis and proteolytic machineries of the cell. The survey of architectures in the living world that was fuelled by recent structural genomic initiatives has been summarized in protein classification schemes, and the overall structure of fold space explored with novel bioinformatic approaches. However, metrics of general structural comparison have not yet unified architectural complexity using the 'shared and derived' tenet of evolutionary analysis. In contrast, a shift of focus from molecules to proteomes and a census of protein structure in fully sequenced genomes were able to uncover global evolutionary patterns in the structure of proteins. Timelines of discovery of architectures and functions unfolded episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. They revealed a biologically complex ancestral proteome and the early origin of the archaeal lineage. Studies also identified an origin of the protein world in enzymes of nucleotide metabolism harbouring the P-loop-containing triphosphate hydrolase fold and the explosive discovery of metabolic functions that recapitulated well-defined prebiotic shells and involved the recruitment of structures and functions. These observations have important implications for origins of modern biochemistry and diversification of life.
|
19
|
Ferrada E, Wagner A. Protein robustness promotes evolutionary innovations on large evolutionary time-scales. Proc Biol Sci 2008; 275:1595-602. [PMID: 18430649 DOI: 10.1098/rspb.2007.1617] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022] Open
Abstract
Recent laboratory experiments suggest that a molecule's ability to evolve neutrally is important for its ability to generate evolutionary innovations. In contrast to laboratory experiments, life unfolds on time-scales of billions of years. Here, we ask whether a molecule's ability to evolve neutrally (a measure of its robustness) facilitates evolutionary innovation also on these large time-scales. To this end, we use protein designability, the number of sequences that can adopt a given protein structure, as an estimate of the structure's ability to evolve neutrally. Based on two complementary measures of functional diversity (catalytic diversity and molecular functional diversity in gene ontology), we show that more robust proteins have a greater capacity to produce functional innovations. Significant associations among structural designability, folding rate and intrinsic disorder also exist, underlining the complex relationship of the structural factors that affect protein evolution.
Affiliation(s)
- Evandro Ferrada
- Department of Biochemistry, University of Zurich, Building Y27, Winterthurerstrasse 190, 8057 Zurich, Switzerland.
|
20
|
Goldstein RA. The structure of protein evolution and the evolution of protein structure. Curr Opin Struct Biol 2008; 18:170-7. [DOI: 10.1016/j.sbi.2008.01.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 10/25/2007] [Revised: 12/20/2007] [Accepted: 01/09/2008] [Indexed: 11/29/2022]
|
21
|
Meier S, Jensen PR, David CN, Chapman J, Holstein TW, Grzesiek S, Ozbek S. Continuous molecular evolution of protein-domain structures by single amino acid changes. Curr Biol 2007; 17:173-8. [PMID: 17240343 DOI: 10.1016/j.cub.2006.10.063] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 09/13/2006] [Revised: 10/25/2006] [Accepted: 10/26/2006] [Indexed: 11/29/2022]
Abstract
Protein structures cluster into families of folds that can result from extremely different amino acid sequences [1]. Because the enormous amount of genetic information generates a limited number of protein folds [2], a particular domain structure often assumes numerous functions. How new protein structures and new functions evolve under these limitations remains elusive. Molecular evolution may be driven by the ability of biomacromolecules to adopt multiple conformations as a bridge between different folds [3-6]. This could allow proteins to explore new structures and new tasks while part of the structural ensemble retains the initial conformation and function as a safeguard [7]. Here we show that a global structural switch can arise from single amino acid changes in cysteine-rich domains (CRD) of cnidarian nematocyst proteins. The ability of these CRDs to form two structures with different disulfide patterns from an identical cysteine pattern is distinctive [8]. By applying a structure-based mutagenesis approach, we demonstrate that a cysteine-rich domain can interconvert between two natively occurring domain structures via a bridge state containing both structures. Comparing cnidarian CRD sequences leads us to believe that the mutations we introduced to stabilize each structure reflect the birth of new protein folds in evolution.
Affiliation(s)
- Sebastian Meier
- Department of Structural Biology, Biozentrum, University of Basel, Klingelbergstrasse 70, CH-4056 Basel, Switzerland.
|
22
|
Griffiths PD. Boom and bust from influenza. Rev Med Virol 2007; 17:63-5. [PMID: 17335119 DOI: 10.1002/rmv.536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
23
|
Koelle K, Cobey S, Grenfell B, Pascual M. Epochal evolution shapes the phylodynamics of interpandemic influenza A (H3N2) in humans. Science 2007; 314:1898-903. [PMID: 17185596 DOI: 10.1126/science.1132745] [Citation(s) in RCA: 334] [Impact Index Per Article: 19.6] [Indexed: 01/03/2023]
Abstract
Human influenza A (subtype H3N2) is characterized genetically by the limited standing diversity of its hemagglutinin and antigenically by clusters that emerge and replace each other within 2 to 8 years. By introducing an epidemiological model that allows for differences between the genetic and antigenic properties of the virus's hemagglutinin, we show that these patterns can arise from cluster-specific immunity alone. Central to the formulation is a genotype-to-phenotype mapping, based on neutral networks, with antigenic phenotypes, not genotypes, determining the degree of strain cross-immunity. The model parsimoniously explains well-known, as well as previously unremarked, features of interpandemic influenza dynamics and evolution. It captures the observed boom-and-bust pattern of viral evolution, with periods of antigenic stasis during which genetic diversity grows, and with episodic contraction of this diversity during cluster transitions.
MESH Headings
- Amino Acid Substitution
- Antigenic Variation
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- Computer Simulation
- Cross Reactions
- Disease Outbreaks
- Disease Susceptibility
- Epitopes/chemistry
- Epitopes/genetics
- Epitopes/immunology
- Evolution, Molecular
- Genotype
- Hemagglutinin Glycoproteins, Influenza Virus/chemistry
- Hemagglutinin Glycoproteins, Influenza Virus/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/immunology
- Humans
- Immunity, Herd
- Influenza A Virus, H3N2 Subtype/genetics
- Influenza A Virus, H3N2 Subtype/immunology
- Influenza, Human/epidemiology
- Influenza, Human/immunology
- Influenza, Human/transmission
- Influenza, Human/virology
- Models, Biological
- Models, Statistical
- Phenotype
- Phylogeny
- Point Mutation
- Polymorphism, Genetic
Affiliation(s)
- Katia Koelle
- Department of Ecology and Evolutionary Biology, 2019 Kraus Natural Science Building, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109-1048, USA.
|
24
|
Meier S, Özbek S. A biological cosmos of parallel universes: Does protein structural plasticity facilitate evolution? Bioessays 2007; 29:1095-104. [DOI: 10.1002/bies.20661] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
|
25
|
Rodrigue N, Philippe H, Lartillot N. Assessing site-interdependent phylogenetic models of sequence evolution. Mol Biol Evol 2006; 23:1762-75. [PMID: 16787998 DOI: 10.1093/molbev/msl041] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
In recent works, methods have been proposed for applying phylogenetic models that allow for a general interdependence between the amino acid positions of a protein. As of yet, such models have focused on site interdependencies resulting from sequence-structure compatibility constraints, using simplified structural representations in combination with a set of statistical potentials. This structural compatibility criterion is meant as a proxy for sequence fitness, and the methods developed thus far can incorporate different site-interdependent fitness proxies based on other measurements. However, no methods have been proposed for comparing and evaluating the adequacy of alternative fitness proxies in this context, or for more general comparisons with canonical models of protein evolution. In the present work, we apply Bayesian methods of model selection (based on numerical calculations of marginal likelihoods and posterior predictive checks) to evaluate models encompassing the site-interdependent framework. Our application of these methods indicates that considering site-interdependencies, as done here, leads to an improved model fit for all data sets studied. Yet, we find that the use of pairwise contact potentials alone does not suitably account for across-site rate heterogeneity or amino acid exchange propensities; for such complexities, site-independent treatments are still called for. The most favored models combine the use of statistical potentials with a suitably rich site-independent model. Altogether, the methodology employed here should allow for a more rigorous and systematic exploration of different ways of modeling explicit structural constraints, or any other site-interdependent criterion, while best exploiting the richness of previously proposed models.
Affiliation(s)
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada.
|
26
|
Abstract
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among the primary structure, secondary structure and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
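The mutual information measurement described in this abstract can be illustrated with a minimal sketch. The abstract does not state which estimator the authors used; the plug-in (empirical frequency) estimator below is an assumption, and the function name `mutual_information` and the toy label strings are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two aligned label sequences.

    Assumption: probabilities are taken as raw empirical frequencies,
    which is biased upward for small samples.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    count_x = Counter(xs)          # marginal counts of x
    count_y = Counter(ys)          # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy example: secondary-structure labels against residue labels.
perfectly_coupled = mutual_information("HHEE", "AAGG")
uncoupled = mutual_information("HHEE", "AGAG")
```

In the toy example the first pair of sequences is fully dependent and the second is statistically independent, so the two estimates sit at the extremes for a two-state label.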
Affiliation(s)
- Gavin E Crooks
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720-3102, USA.
|
27
|
Abstract
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.
Affiliation(s)
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
|
28
|
Wiederstein M, Sippl MJ. Protein sequence randomization: efficient estimation of protein stability using knowledge-based potentials. J Mol Biol 2004; 345:1199-212. [PMID: 15644215 DOI: 10.1016/j.jmb.2004.11.012] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Received: 09/21/2004] [Revised: 11/05/2004] [Accepted: 11/07/2004] [Indexed: 11/27/2022]
Abstract
Modifications of the amino acid sequence generally affect protein stability. Here, we use knowledge-based potentials to estimate the stability of protein structures under sequence variation. Calculations on a variety of protein scaffolds result in a clear distinction of known mutable regions from arbitrarily chosen control patches. For example, randomly changing the sequence of an antibody paratope yields a significantly lower number of destabilized mutants as compared to the randomization of comparable regions on the protein surface. The technique is computationally efficient and can be used to screen protein structures for regions that are amenable to molecular tinkering by preserving the stability of the mutated proteins.
Affiliation(s)
- Markus Wiederstein
- Center of Applied Molecular Engineering, University of Salzburg, Jakob Haringerstrasse 5, 5020 Salzburg, Austria
|
29
|
Parisi G, Echave J. The structurally constrained protein evolution model accounts for sequence patterns of the LbetaH superfamily. BMC Evol Biol 2004; 4:41. [PMID: 15500694 PMCID: PMC538250 DOI: 10.1186/1471-2148-4-41] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/23/2003] [Accepted: 10/22/2004] [Indexed: 11/24/2022] Open
Abstract
Background: Structure conservation constrains evolutionary sequence divergence, resulting in observable sequence patterns. Most current models of protein evolution do not take structure into account explicitly, being unsuitable for investigating the effects of structure conservation on sequence divergence. To this end, we recently developed the Structurally Constrained Protein Evolution (SCPE) model. The model starts with the coding sequence of a protein with known three-dimensional structure. At each evolutionary time-step of an SCPE simulation, a trial sequence is generated by introducing a random point mutation in the current coding DNA sequence. Then, a "score" for the trial sequence is calculated and the mutation is accepted only if its score is under a given cutoff, λ. The SCPE score measures the distance between the trial sequence and a given reference sequence, given the structure. In our first brief report we used a "global score", in which the same reference sequence, the ancestral one, was used at each evolutionary step. Here, we introduce a new scoring function, the "local score", in which the sequence accepted at the previous evolutionary time-step is used as the reference. We assess the model on the UDP-N-acetylglucosamine acyltransferase (LPXA) family, as in our previous report, and we extend this study to all other members of the left-handed parallel beta helix fold (LβH) superfamily whose structure has been determined.
Results: We studied site-dependent entropies, amino acid probability distributions, and substitution matrices predicted by SCPE and compared with experimental data for several members of the LβH superfamily. We also evaluated structure conservation during simulations. Overall, SCPE outperforms JTT in the description of sequence patterns observed in structurally constrained sites. Maximum Likelihood calculations show that the local-score and global-score SCPE substitution matrices obtained for LPXA outperform the JTT model for the LPXA family and for the structurally constrained sites of class i of other members within the LβH superfamily.
Conclusion: We extended the SCPE model by introducing a new scoring function, the local score. We performed a thorough assessment of the SCPE model on the LPXA family and extended it to all other members of known structure of the LβH superfamily.
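The accept/reject loop described in the Background can be sketched as follows. This is a minimal illustration, not the authors' implementation: the real SCPE score is structure-based, whereas the Hamming distance used here is only a stand-in, and the names `simulate_scpe`, `point_mutate`, and `hamming` are hypothetical. The `mode` argument switches between the "global score" (reference fixed to the ancestral sequence) and the "local score" (reference is the sequence accepted at the previous step).

```python
import random

BASES = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def point_mutate(seq, rng):
    """Return a copy of seq with one random position changed to another base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def simulate_scpe(ancestral, score_fn, cutoff, steps, mode="global", seed=0):
    """Toy SCPE-style walk: accept a trial mutation only if score < cutoff.

    score_fn(trial, reference) is a placeholder for the structure-based
    SCPE score (assumption: any distance-like function can be plugged in).
    """
    rng = random.Random(seed)
    current = ancestral
    for _ in range(steps):
        trial = point_mutate(current, rng)
        # Global score: always compare with the ancestral sequence.
        # Local score: compare with the most recently accepted sequence.
        reference = ancestral if mode == "global" else current
        if score_fn(trial, reference) < cutoff:
            current = trial
    return current


ancestral = "ATGGCCAAGTTT"
evolved_global = simulate_scpe(ancestral, hamming, cutoff=3, steps=200, mode="global", seed=1)
evolved_local = simulate_scpe(ancestral, hamming, cutoff=2, steps=200, mode="local", seed=1)
```

With the stand-in score, the global mode keeps the sequence within a fixed distance of the ancestor, while the local mode only limits the size of each step, so the sequence is free to drift further over many steps.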
Affiliation(s)
- Gustavo Parisi
- Centro de Estudios e Investigaciones, Universidad Nacional de Quilmes, Roque Saenz Peña 180, B1876BXD Bernal, Argentina
- Julián Echave
- Centro de Estudios e Investigaciones, Universidad Nacional de Quilmes, Roque Saenz Peña 180, B1876BXD Bernal, Argentina
|
30
|
Altmeyer S, Füchslin RM, McCaskill JS. Folding stabilizes the evolution of catalysts. ARTIFICIAL LIFE 2004; 10:23-38. [PMID: 15035861 DOI: 10.1162/106454604322875896] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
Sequence folding is known to determine the spatial structure and catalytic function of proteins and nucleic acids. We show here that folding also plays a key role in enhancing the evolutionary stability of the intermolecular recognition necessary for the prevalent mode of catalytic action in replication, namely, in trans, one molecule catalyzing the replication of another copy, rather than itself. This points to a novel aspect of why molecular life is structured as it is, in the context of life as it could be: folding allows limited, structurally localized recognition to be strongly sensitive to global sequence changes, facilitating the evolution of cooperative interactions. RNA secondary structure folding, for example, is shown to be able to stabilize the evolution of prolonged functional sequences, using only a part of this length extension for intermolecular recognition, beyond the limits of the (cooperative) error threshold. Such folding could facilitate the evolution of polymerases in spatially heterogeneous systems. This facilitation is, in fact, vital because physical limitations prevent complete sequence-dependent discrimination for any significant-size biopolymer substrate. The influence of partial sequence recognition between biopolymer catalysts and complex substrates is investigated within a stochastic, spatially resolved evolutionary model of trans catalysis. We use an analytically tractable nonlinear master equation formulation called PRESS (McCaskill et al., Biol. Chem. 382: 1343-1363), which makes use of an extrapolation of the spatial dynamics down from infinite dimensional space, and compare the results with Monte Carlo simulations.
Affiliation(s)
- Stephan Altmeyer
- Fraunhofer Gesellschaft, Biomolecular Information Processing, Institute Center Birlinghoven, D-53754 St. Augustin, Germany
|
31
|
Abstract
The primordial genetic code was probably a drastically simplified ancestor of the canonical code that is used by contemporary cells. In order to understand how the present-day code came about we first need to explain how the language of the building plan can change without destroying the encoded information. In this work we introduce a minimal organism model that is based on biophysically reasonable descriptions of RNA and protein, namely secondary structure folding and knowledge-based potentials. The evolution of a population of such organisms under competition for a common resource is simulated explicitly at the level of individual replication events. Starting with very simple codes, and hence greatly reduced amino acid alphabets, we observe a diversification of the codes in most simulation runs. The driving force behind this effect is the possibility to produce fitter proteins when the repertoire of amino acids is enlarged.
Affiliation(s)
- Günter Weberndorfer
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Wien, Austria
|
32
|
Park JY, Harris D. Construction and assessment of models of CYP2E1: predictions of metabolism from docking, molecular dynamics, and density functional theoretical calculations. J Med Chem 2003; 46:1645-60. [PMID: 12699383 DOI: 10.1021/jm020538a] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
3D models of CYP2E1 were constructed for the purpose of structure-based prediction of 2E1 metabolism of diverse substrates based on configuration sampling of ligand-atom-oxyferryl center distances and quantum chemical criteria. Models were constructed on the basis of sequence alignments of 2E1 with templates of known structure, including rabbit CYP2C5 (3LVdH) and bacterial CYP450s. Following geometric and energetic assessments, the utility of the model was tested in structure-based predictions of metabolism. Autodock was used to dock chlorzoxazone, p-nitrophenol, N-nitrosodimethylamine, acetaminophen, caffeine, theophylline, and methoxyflurane into the model CYP2E1 employing a model oxyferryl heme with charges based on density functional theoretical parametrization. In all cases, the lowest energy bound docked configurations corresponded to ones with the substrate intimately associated with the oxyferryl center. Configurations among the lowest energy docked forms of each of the ligands had orientations relative to the oxyferryl center consistent with the experimentally observed metabolites. Docking of long-chain dialkylnitrosoamines revealed no heme binding site bound configurations, in agreement with the negligible metabolism of these ligands. The lowest energy docked configurations of chlorzoxazone, p-nitrophenol, and N-nitrosodimethylamine, high-affinity substrates of 2E1, were used to initiate 300 ps molecular dynamics (MD) trajectories. The MD-sampled ligand-oxyferryl heme reactant configurations were in good accord with density functional theoretical (DFT) optimized oxyferryl-heme-ligand geometries. Analysis of the MD-sampled ligand-2E1 configurations from multiple docked orientations indicates the configurations with closest exposure of reactive centers to the oxyferryl heme to be correlated with observed metabolites with proper consideration of H-abstraction energetics. DFT assessment of relative radical energetics is directly compared with differential H-abstraction activation energetics by compound I and by a p-nitrosophenoxy radical compound I surrogate for the specific case of methoxyflurane and is shown to be in good agreement.
Affiliation(s)
- Jin-Young Park
- Molecular Research Institute, 2495 Old Middlefield Way, Mountain View, California 94043, USA
|
33
|
Aita T, Ota M, Husimi Y. An in silico exploration of the neutral network in protein sequence space. J Theor Biol 2003; 221:599-613. [PMID: 12713943 DOI: 10.1006/jtbi.2003.3209] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Designating amino-acid sequences that fold into a common main-chain structure as "neutral sequences" for the structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (alpha, beta, alpha/beta and alpha+beta types) with the same chain length of 108, we generated the respective neutral sequences by using the inverse folding technique with a knowledge-based potential function. We assumed that neutral sequences for a protein structure have Z scores higher than or equal to fixed thresholds, where thresholds are defined as the Z score for the corresponding native sequence (case 1) or much greater Z score (case 2). An exploring walk simulation suggested that the neutral sequences mapped into the sequence space were connected with each other through straight neutral paths and formed an inherent neutral network over the sequence space. Through another exploring walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between the two neutral networks ranged from 5 to 29 on the Hamming distance scale, showing a linear increase against the threshold values. The sequences located at the "interchange" regions between the two neutral networks have intermediate sequence-profile-scores for both corresponding structures. Introducing a "ball" in the sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of the ball that is centered at an arbitrary position ranged from 35 to 50, while the minimal radius of the ball that is centered at a certain special position ranged from 20 to 30, on the Hamming distance scale. The relatively small Hamming distances (5-30) may support an evolution mechanism by transferring from a network for a structure to another network for a more beneficial structure via the interchange regions.
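The "closest approach distance" between two neutral networks can be illustrated with a short sketch. The study explored sequence space with walk simulations; the exhaustive pairwise comparison below is a simplification that only works for small, explicitly enumerated sets, and the names `closest_approach`, `network_a`, and `network_b` as well as the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_approach(network_a, network_b):
    """Smallest Hamming distance between any pair drawn from two sequence sets.

    Assumption: both networks are small enough to enumerate, so a brute-force
    scan over all pairs is affordable.
    """
    return min(hamming(a, b) for a, b in product(network_a, network_b))


# Toy sets standing in for sampled neutral sequences of two structures.
network_a = ["AAAAAA", "AAAAAC", "AAAACC"]
network_b = ["CCCCCC", "ACCCCC", "AACCCC"]
distance = closest_approach(network_a, network_b)
```

In the toy data the nearest pair is "AAAACC" and "AACCCC", which differ at two positions.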
Affiliation(s)
- Takuyo Aita
- Tsukuba Research Institute, Novartis Pharma K. K. Ohkubo 8, Tsukuba 300-2611, Japan
|