1
|
Van Wormhoudt A, del Río Portilla MÁ, Auzoux-Bordenave S. Gene structure and domain architecture in the biomineralizing protein Lustrin A from the abalone Haliotis rufescens. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.05.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|
2
|
Chen C, Gao S, Sun Q, Tang Y, Han Y, Zhang J, Li Z. Induced splice site mutation generates alternative intron splicing in starch synthase II (SSII) gene in rice. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1370984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022] Open
Affiliation(s)
- Chao Chen, Shan Gao, Qing Sun, Yuling Tang, Yuhao Han, Jinkun Zhang, Zhipeng Li
- The Laboratory of Vector Biology and Control, Department of Biotechnology, College of Engineering, Beijing Normal University (Zhuhai), Zhuhai, P.R. China
|
3
|
Hleap JS, Susko E, Blouin C. Defining structural and evolutionary modules in proteins: a community detection approach to explore sub-domain architecture. BMC STRUCTURAL BIOLOGY 2013; 13:20. [PMID: 24131821 PMCID: PMC4016585 DOI: 10.1186/1472-6807-13-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/20/2013] [Accepted: 10/11/2013] [Indexed: 12/23/2022]
Abstract
Background: Assessing protein modularity is important to understand protein evolution. Still the question of the existence of a sub-domain modular architecture remains. We propose a graph-theory approach with significance and power testing to identify modules in protein structures. In the first step, clusters are determined by optimizing the partition that maximizes the modularity score. Second, each cluster is tested for significance. Significant clusters are referred to as modules. Evolutionary modules are identified by analyzing homologous structures. Dynamic modules are inferred from sets of snapshots of molecular simulations. We present here a methodology to identify sub-domain architecture robustly, biologically meaningful, and statistically supported.
Results: The robustness of this new method is tested using simulated data with known modularity. Modules are correctly identified even when there is a low correlation between landmarks within a module. We also analyzed the evolutionary modularity of a data set of α-amylase catalytic domain homologs, and the dynamic modularity of the Niemann-Pick C1 (NPC1) protein N-terminal domain. The α-amylase contains an (α/β)8 barrel (TIM barrel) with the polysaccharides cleavage site and a calcium-binding domain. In this data set we identified four robust evolutionary modules, one of which forms the minimal functional TIM barrel topology. The NPC1 protein is involved in the intracellular lipid metabolism coordinating sterol trafficking. NPC1 N-terminus is the first luminal domain which binds to cholesterol and its oxygenated derivatives. Our inferred dynamic modules in the protein NPC1 are also shown to match functional components of the protein related to the NPC1 disease.
Conclusions: A domain compartmentalization can be found and described in correlation space. To our knowledge, there is no other method attempting to identify sub-domain architecture from the correlation among residues. Most attempts made focus on sequence motifs of protein-protein interactions, binding sites, or sequence conservancy. We were able to describe functional/structural sub-domain architecture related to key residues for starch cleavage, calcium, and chloride binding sites in the α-amylase, and sterol opening-defining modules and disease-related residues in the NPC1. We also described the evolutionary sub-domain architecture of the α-amylase catalytic domain, identifying the already reported minimum functional TIM barrel.
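The first step described in this abstract, scoring a partition by its modularity, can be sketched as follows. This is a minimal illustration that assumes the Newman-Girvan definition of modularity on a weighted, undirected graph; the abstract does not state which modularity score the authors use, and the function name and toy graph are hypothetical. The significance and power testing of each cluster is not covered here.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition (assumed definition).

    adj    : symmetric matrix (list of lists) of non-negative edge weights,
             e.g. thresholded correlations between residues or landmarks
    labels : labels[i] is the cluster assigned to node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]   # weighted degree k_i
    two_m = sum(degree)                  # twice the total edge weight
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected at random
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


if __name__ == "__main__":
    # Toy graph: two disconnected edges (0-1 and 2-3)
    adj = [
        [0, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0],
    ]
    print(modularity(adj, [0, 0, 1, 1]))  # partition matching the two edges
    print(modularity(adj, [0, 0, 0, 0]))  # everything in one cluster
```

A partition search would call a function like this repeatedly and keep the labeling with the highest score; the candidate clusters would then still need the per-cluster significance test the abstract describes.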
Affiliation(s)
- Jose Sergio Hleap
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, B3H 4R2, Canada.
|
4
|
França GS, Cancherini DV, de Souza SJ. Evolutionary history of exon shuffling. Genetica 2012; 140:249-57. [PMID: 22948334 DOI: 10.1007/s10709-012-9676-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 01/02/2012] [Accepted: 08/23/2012] [Indexed: 11/29/2022]
Abstract
Exon shuffling has been characterized as one of the major evolutionary forces shaping both the genome and the proteome of eukaryotes. This mechanism was particularly important in the creation of multidomain proteins during animal evolution, bringing a number of functional genetic novelties. Here, genome information from a variety of eukaryotic species was used to address several issues related to the evolutionary history of exon shuffling. By comparing all protein sequences within each species, we were able to characterize exon shuffling signatures throughout metazoans. Intron phase (the position of the intron regarding the codon) and exon symmetry (the pattern of flanking introns for a given exon or block of adjacent exons) were features used to evaluate exon shuffling. We confirmed previous observations that exon shuffling mediated by phase 1 introns (1-1 exon shuffling) is the predominant kind in multicellular animals. Evidence is provided that such pattern was achieved since the early steps of animal evolution, supported by a detectable presence of 1-1 shuffling units in Trichoplax adhaerens and a considerable prevalence of them in Nematostella vectensis. In contrast, Monosiga brevicollis, one of the closest relatives of metazoans, and Arabidopsis thaliana, showed no evidence of 1-1 exon or domain shuffling above what it would be expected by chance. Instead, exon shuffling events are less abundant and predominantly mediated by phase 0 introns (0-0 exon shuffling) in those non-metazoan species. Moreover, an intermediate pattern of 1-1 and 0-0 exon shuffling was observed for the placozoan T. adhaerens, a primitive animal. Finally, characterization of flanking intron phases around domain borders allowed us to identify a common set of symmetric 1-1 domains that have been shuffled throughout the metazoan lineage.
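The two features defined in this abstract, intron phase and exon symmetry, can be illustrated with a short Python sketch. It assumes each intron is described by the number of coding nucleotides that precede it in the spliced coding sequence; that input format and the function names are hypothetical and are not taken from the authors' pipeline.

```python
def intron_phase(cds_offset: int) -> int:
    """Phase of an intron that follows `cds_offset` coding nucleotides.

    0: the intron lies between two codons
    1: the intron lies after the first base of a codon
    2: the intron lies after the second base of a codon
    """
    return cds_offset % 3


def exon_symmetry(intron_offsets):
    """Classify the internal exons of one gene.

    `intron_offsets` is a sorted list of coding-sequence offsets, one per
    intron (hypothetical input format). Each pair of consecutive introns
    flanks one internal exon. Returns a list of
    (5' intron phase, 3' intron phase, is_symmetric) tuples.
    """
    phases = [intron_phase(offset) for offset in intron_offsets]
    return [(a, b, a == b) for a, b in zip(phases, phases[1:])]


if __name__ == "__main__":
    offsets = [100, 250, 402, 520]  # made-up example gene
    for five, three, symmetric in exon_symmetry(offsets):
        kind = "symmetric" if symmetric else "asymmetric"
        print(f"{five}-{three} exon: {kind}")
```

In this toy gene the first internal exon is a symmetric 1-1 exon, the class the abstract reports as predominant in multicellular animals; the other two are asymmetric.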
Affiliation(s)
- Gustavo S França
- Ludwig Institute for Cancer Research, São Paulo Branch, São Paulo, Brazil
|
5
|
Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012; 110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]
Abstract
Modularity increases evolvability by reducing constraints on adaptation and by allowing preexisting parts to function in new contexts for novel uses. Protein evolution provides an excellent context to study the causes and consequences of biological modularity. In order to address such questions, however, an index for protein modularity is necessary. This paper proposes a simple index for protein modularity, "module density," which is the number of evolutionarily independent modules that compose a protein divided by the number of amino acids in the protein. The decomposition of proteins into constituent modules can be accomplished by either of two classes of methods. The first class of methods relies on "suppositional" criteria to assign amino acids to modules, whereas the second class of methods relies on "coevolutionary" criteria for this task. One simple and practical method from the first class consists of approximating the number of modules in a protein as the number of regular secondary structure elements (i.e., helices and sheets). Methods based on coevolutionary criteria require more elaborate data, but they have the advantage of being able to specify modules without prior assumptions about why they exist. Given the increasing availability of datasets sampling protein mutational spectra (e.g., from comparative genomics, experimental evolution, and computational prediction), methods based on coevolutionary criteria will likely become more promising in the near future. The ability to meaningfully quantify protein modularity via simple indices has the potential to aid future efforts to understand protein evolutionary rate determinants, improve molecular evolution models and engineer novel proteins.
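The "module density" index and its secondary-structure approximation can be sketched in a few lines of Python. The sketch assumes a simplified per-residue secondary-structure string in which `H` marks helix, `E` marks strand, and any other character marks coil; that encoding and the function names are illustrative assumptions, not part of the paper.

```python
from itertools import groupby


def count_ss_elements(ss: str) -> int:
    """Count contiguous runs of helix ('H') or strand ('E') residues.

    Each run is treated as one module, following the "suppositional"
    approximation described in the abstract.
    """
    return sum(1 for state, _ in groupby(ss) if state in ("H", "E"))


def module_density(n_modules: int, n_residues: int) -> float:
    """Number of modules divided by the number of amino acids."""
    if n_residues <= 0:
        raise ValueError("protein length must be positive")
    return n_modules / n_residues


if __name__ == "__main__":
    ss = "--HHHH--EEEE-HHH--EE"  # made-up 20-residue assignment
    n_modules = count_ss_elements(ss)
    print(n_modules, module_density(n_modules, len(ss)))
```

A coevolution-based method would replace `count_ss_elements` with a module count derived from mutational data, while `module_density` would stay the same.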
Affiliation(s)
- Mary Rorick
- University of Michigan, Department of Ecology and Evolutionary Biology, Ann Arbor, MI 48109-1048, United States.
|
6
|
Artamonova II, Gelfand MS. Comparative Genomics and Evolution of Alternative Splicing: The Pessimists' Science. Chem Rev 2007; 107:3407-30. [PMID: 17645315 DOI: 10.1021/cr068304c] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Affiliation(s)
- Irena I Artamonova
- Group of Bioinformatics, Vavilov Institute of General Genetics, RAS, Gubkina 3, Moscow 119991, Russia
|
7
|
De Kee DW, Gopalan V, Stoltzfus A. A Sequence-Based Model Accounts Largely for the Relationship of Intron Positions to Protein Structural Features. Mol Biol Evol 2007; 24:2158-68. [PMID: 17646255 DOI: 10.1093/molbev/msm151] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Claims of intron-structure correlations have played a major role in debates surrounding split gene origins. In the formative (as opposed to disruptive or "insertional") model of split gene origins, introns represent the scars of chimaeric gene assembly. When analyzed retrospectively, formative introns should tend to fall between modular units, if such units exist, or at least to exhibit a preference for sites favorable to chimaera formation. However, there is another possible source of preferences: under a disruptive model of split gene origins, fortuitous intron-structure correlations may arise because the gain of introns is biased with respect to flanking nucleotide sequences. To investigate the extent to which a sequence-biased intron gain model may account for the present-day distribution of introns, data on over 10,000 introns in eukaryotic protein-coding genes were integrated with structural data from a set of 1,851 nonredundant protein chains. The positions of introns with respect to secondary structures, solvent accessibility, and so-called "modules" were evaluated relative to the expectations of a null model, a disruptive model based on amino acid frequencies at splice junctions, and a formative model defined relative to these. The null model can be excluded for most structural features and is highly improbable when intron sites are grouped by reading frame phase. Phase-dependent correlations with secondary structure and side-chain surface accessibility are particularly strong. However, these phase-dependent correlations are explained largely by the sequence-based disruptive model.
Affiliation(s)
- Danny W De Kee
- Center for Advanced Research in Biotechnology, Rockville, MD, USA
|
8
|
de Roos ADG. Conserved intron positions in ancient protein modules. Biol Direct 2007; 2:7. [PMID: 17288589 PMCID: PMC1800838 DOI: 10.1186/1745-6150-2-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 02/05/2007] [Accepted: 02/08/2007] [Indexed: 12/31/2022] Open
Abstract
Background: The timing of the origin of introns is of crucial importance for an understanding of early genome architecture. The Exon theory of genes proposed a role for introns in the formation of multi-exon proteins by exon shuffling and predicts the presence of conserved splice sites in ancient genes. In this study, large-scale analysis of potential conserved splice sites was performed using an intron-exon database (ExInt) derived from GenBank.
Results: A set of conserved intron positions was found by matching identical splice sites sequences from distantly-related eukaryotic kingdoms. Most amino acid sequences with conserved introns were homologous to consensus sequences of functional domains from conserved proteins including kinases, phosphatases, small GTPases, transporters and matrix proteins. These included ancient proteins that originated before the eukaryote-prokaryote split, for instance the catalytic domain of protein phosphatase 2A where a total of eleven conserved introns were found. Using an experimental setup in which the relation between a splice site and the ancientness of its surrounding sequence could be studied, it was found that the presence of an intron was positively correlated to the ancientness of its surrounding sequence. Intron phase conservation was linked to the conservation of the gene sequence and not to the splice site sequence itself. However, no apparent differences in phase distribution were found between introns in conserved versus non-conserved sequences.
Conclusion: The data confirm an origin of introns deep in the eukaryotic branch and is in concordance with the presence of introns in the first functional protein modules in an 'Exon theory of genes' scenario. A model is proposed in which shuffling of primordial short exonic sequences led to the formation of the first functional protein modules, in line with hypotheses that see the formation of introns integral to the origins of genome evolution.
Reviewers: This article was reviewed by Scott Roy (nominated by Anthony Poole), Sandro de Souza (nominated by Manyuan Long), and Gáspár Jékely.
Affiliation(s)
- Albert D G de Roos
- Syncyte BioIntelligence, P.O. Box 600, 1000 AP, Amsterdam, The Netherlands.
|
9
|
Gudlaugsdottir S, Boswell DR, Wood GR, Ma J. Exon size distribution and the origin of introns. Genetica 2007; 131:299-306. [PMID: 17279432 DOI: 10.1007/s10709-007-9139-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/12/2005] [Accepted: 01/06/2007] [Indexed: 11/29/2022]
Abstract
Since it was first recognised that eukaryotic genes are fragmented into coding segments (exons) separated by non-coding segments (introns), the reason for this phenomenon has been debated. There are two dominant theories: that the piecewise arrangement of genes allows functional protein domains, represented by exons, to recombine by shuffling to form novel proteins with combinations of functions; or that introns represent parasitic DNA that can infest the eukaryotic genome because it does not interfere grossly with the fitness of its host. Differing distributions of exon lengths are predicted by these two theories. In this paper we examine distributions of exon lengths for six different organisms and find that they offer empirical evidence that both theories may in part be correct.
|
10
|
Fridmanis D, Fredriksson R, Kapa I, Schiöth HB, Klovins J. Formation of new genes explains lower intron density in mammalian Rhodopsin G protein-coupled receptors. Mol Phylogenet Evol 2006; 43:864-80. [PMID: 17188520 DOI: 10.1016/j.ympev.2006.11.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 05/19/2006] [Revised: 10/06/2006] [Accepted: 11/02/2006] [Indexed: 10/23/2022]
Abstract
Mammalian G protein-coupled receptor (GPCR) genes are characterised by a large proportion of intronless genes or a lower density of introns when compared with GPCRs of invertebrates. It is unclear which mechanisms have influenced intron density in this protein family, which is one of the largest in the mammalian genomes. We used a combination of Hidden Markov Models (HMM) and BLAST searches to establish the comprehensive repertoire of Rhodopsin GPCRs from seven species and performed overall alignments and phylogenetic analysis using the maximum parsimony method for over 1400 receptors in 12 subgroups. We identified 14 different Ancestral Receptor Groups (ARGs) that have members in both vertebrate and invertebrate species. We found that there exists a remarkable difference in the intron density among ancestral and new Rhodopsin GPCRs. The intron density among ARGs members was more than 3.5-fold higher than that within non-ARG members and more than 2-fold higher when considering only the 7TM region. This suggests that the new GPCR genes have been predominantly formed intronless while the ancestral receptors likely accumulated introns during their evolution. Many of the intron positions found in mammalian ARG receptor sequences were found to be present in orthologue invertebrate receptors suggesting that these intron positions are ancient. This analysis also revealed that one intron position is much more frequent than any other position and it is common for a number of phylogenetically different Rhodopsin GPCR groups. This intron position lies within a functionally important, conserved, DRY motif which may form a proto-splice site that could contribute to positional intron insertion. Moreover, we have found that other receptor motifs, similar to DRY, also contain introns between the second and third nucleotide of the arginine codon which also forms a proto-splice site. 
Our analysis presents compelling evidence that there was not a major loss of introns in mammalian GPCRs and formation of new GPCRs among mammals explains why these have fewer introns compared to invertebrate GPCRs. We also discuss and speculate about the possible role of different RNA- and DNA-based mechanisms of intron insertion and loss.
Affiliation(s)
- Davids Fridmanis
- Biomedical Research and Study Centre, University of Latvia, Ratsupites 1, Riga, Latvia
|
11
|
Grzyb J, Latowski D, Strzałka K. Lipocalins - a family portrait. JOURNAL OF PLANT PHYSIOLOGY 2006; 163:895-915. [PMID: 16504339 DOI: 10.1016/j.jplph.2005.12.007] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Received: 09/13/2005] [Accepted: 12/12/2005] [Indexed: 05/06/2023]
Abstract
Lipocalins are a widely distributed group of proteins whose common features are a six- or eight-stranded beta-barrel in their tertiary structure and highly conserved motifs, the short conserved regions (SCRs), in their amino acid sequences. The presence of three SCRs is typical for kernel lipocalins, while outlier lipocalins have only one or two such regions. Owing to their ability to bind and transport small, hydrophobic molecules, lipocalins participate in the distribution of such substances. However, the physiological significance of lipocalins is not limited to transfer processes. They play an important role in the regulation of immunological and developmental processes, and are also involved in the reactions of organisms to various stress factors and in the pathways of signal transduction. Of special interest is the enzymatic activity found in a few members of the lipocalin family, as well as the interaction with natural membranes, both directly with lipids and through membrane-localized protein receptors.
Affiliation(s)
- Joanna Grzyb
- Department of Plant Physiology and Biochemistry, Faculty of Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, Poland
|
12
|
Koonin EV. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct 2006; 1:22. [PMID: 16907971 PMCID: PMC1570339 DOI: 10.1186/1745-6150-1-22] [Citation(s) in RCA: 187] [Impact Index Per Article: 10.4] [Received: 08/03/2006] [Accepted: 08/14/2006] [Indexed: 12/19/2022] Open
Abstract
Background: Ever since the discovery of 'genes in pieces' and mRNA splicing in eukaryotes, origin and evolution of spliceosomal introns have been considered within the conceptual framework of the 'introns early' versus 'introns late' debate. The 'introns early' hypothesis, which is closely linked to the so-called exon theory of gene evolution, posits that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. Under this scenario, the absence of spliceosomal introns in prokaryotes is considered to be a result of "genome streamlining". The 'introns late' hypothesis counters that spliceosomal introns emerged only in eukaryotes, and moreover, have been inserted into protein-coding genes continuously throughout the evolution of eukaryotes. Beyond the formal dilemma, the more substantial side of this debate has to do with possible roles of introns in the evolution of eukaryotes.
Results: I argue that several lines of evidence now suggest a coherent solution to the introns-early versus introns-late debate, and the emerging picture of intron evolution integrates aspects of both views although, formally, there seems to be no support for the original version of introns-early. Firstly, there is growing evidence that spliceosomal introns evolved from group II self-splicing introns which are present, usually, in small numbers, in many bacteria, and probably, moved into the evolving eukaryotic genome from the alpha-proteobacterial progenitor of the mitochondria. Secondly, the concept of a primordial pool of 'virus-like' genetic elements implies that self-splicing introns are among the most ancient genetic entities. Thirdly, reconstructions of the ancestral state of eukaryotic genes suggest that the last common ancestor of extant eukaryotes had an intron-rich genome. Thus, it appears that ancestors of spliceosomal introns, indeed, have existed since the earliest stages of life's evolution, in a formal agreement with the introns-early scenario. However, there is no evidence that these ancient introns ever became widespread before the emergence of eukaryotes, hence, the central tenet of introns-early, the role of introns in early evolution of proteins, has no support. However, the demonstration that numerous introns invaded eukaryotic genes at the outset of eukaryotic evolution and that subsequent intron gain has been limited in many eukaryotic lineages implicates introns as an ancestral feature of eukaryotic genomes and refutes radical versions of introns-late. Perhaps, most importantly, I argue that the intron invasion triggered other pivotal events of eukaryogenesis, including the emergence of the spliceosome, the nucleus, the linear chromosomes, the telomerase, and the ubiquitin signaling system. This concept of eukaryogenesis, in a sense, revives some tenets of the exon hypothesis, by assigning to introns crucial roles in eukaryotic evolutionary innovation.
Conclusion: The scenario of the origin and evolution of introns that is best compatible with the results of comparative genomics and theoretical considerations goes as follows: self-splicing introns since the earliest stages of life's evolution; numerous spliceosomal introns invading genes of the emerging eukaryote during eukaryogenesis; subsequent lineage-specific loss and gain of introns. The intron invasion, probably, spawned by the mitochondrial endosymbiont, might have critically contributed to the emergence of the principal features of the eukaryotic cell. This scenario combines aspects of the introns-early and introns-late views.
Reviewers: This article was reviewed by W. Ford Doolittle, James Darnell (nominated by W. Ford Doolittle), William Martin, and Anthony Poole.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
13
|
Abstract
The origins and importance of spliceosomal introns comprise one of the longest-abiding mysteries of molecular evolution. Considerable debate remains over several aspects of the evolution of spliceosomal introns, including the timing of intron origin and proliferation, the mechanisms by which introns are lost and gained, and the forces that have shaped intron evolution. Recent important progress has been made in each of these areas. Patterns of intron-position correspondence between widely diverged eukaryotic species have provided insights into the origins of the vast differences in intron number between eukaryotic species, and studies of specific cases of intron loss and gain have led to progress in understanding the underlying molecular mechanisms and the forces that control intron evolution.
Affiliation(s)
- Scott William Roy
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.
|
14
|
Abstract
Most of the phenotypic diversity that we perceive in the natural world is directly attributable to the peculiar structure of the eukaryotic gene, which harbors numerous embellishments relative to the situation in prokaryotes. The most profound changes include introns that must be spliced out of precursor mRNAs, transcribed but untranslated leader and trailer sequences (untranslated regions), modular regulatory elements that drive patterns of gene expression, and expansive intergenic regions that harbor additional diffuse control mechanisms. Explaining the origins of these features is difficult because they each impose an intrinsic disadvantage by increasing the genic mutation rate to defective alleles. To address these issues, a general hypothesis for the emergence of eukaryotic gene structure is provided here. Extensive information on absolute population sizes, recombination rates, and mutation rates strongly supports the view that eukaryotes have reduced genetic effective population sizes relative to prokaryotes, with especially extreme reductions being the rule in multicellular lineages. The resultant increase in the power of random genetic drift appears to be sufficient to overwhelm the weak mutational disadvantages associated with most novel aspects of the eukaryotic gene, supporting the idea that most such changes are simple outcomes of semi-neutral processes rather than direct products of natural selection. However, by establishing an essentially permanent change in the population-genetic environment permissive to the genome-wide repatterning of gene structure, the eukaryotic condition also promoted a reliable resource from which natural selection could secondarily build novel forms of organismal complexity. Under this hypothesis, arguments based on molecular, cellular, and/or physiological constraints are insufficient to explain the disparities in gene, genomic, and phenotypic complexity between prokaryotes and eukaryotes.
Affiliation(s)
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, USA.
|
15
|
Maier SA, Galellis JR, McDermid HE. Phylogenetic analysis reveals a novel protein family closely related to adenosine deaminase. J Mol Evol 2005; 61:776-94. [PMID: 16245011 DOI: 10.1007/s00239-005-0046-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 02/28/2005] [Accepted: 06/16/2005] [Indexed: 11/30/2022]
Abstract
Adenosine deaminase (ADA) is a well-characterized enzyme involved in the depletion of adenosine levels. A group of proteins with similarity to ADA, the adenosine deaminase-related growth factors (ADGF; known as CECR1 in vertebrates), has been described recently in various organisms. We have determined the phylogenetic relationships of various gene products with significant amino acid similarity to ADA using parsimony and Bayesian methods, and discovered a novel paralogue, termed ADA-like (ADAL). The ADGF proteins share a novel amino acid motif, "MPKG," within which the proline and lysine residues are also conserved in the ADAL and ADA subfamilies. The significance of this new domain is unknown, but it is located just upstream of two ADA catalytic residues, of which all eight are conserved among the ADGF and ADAL proteins. This conservation suggests that ADGF and ADAL may share the same catalytic function as ADA, which has been proven for some ADGF members. These analyses also revealed that some genes previously thought to be classic ADAs are instead ADAL or ADGFs. We here define the ADGF, ADAL, ADA, adenine deaminase (ADE), and AMP deaminase (AMPD) groups as subfamilies of the adenyl-deaminase family. The availability of genomic data for the members of this family allowed us to reconstruct the intron evolution within the phylogeny and strengthen the introns-late hypothesis of the synthetic introns theory. This study shows that ADA activity is clearly more complex than once thought, perhaps involving a delicately balanced pattern of temporal and spatial expression of a number of paralogous proteins.
Affiliation(s)
- Stephanie A Maier
- Department of Biological Sciences, University of Alberta, G508 Biological Sciences Building, Edmonton, Alberta, T6G 2E9, Canada
|
16
|
Vibranovski MD, Sakabe NJ, de Oliveira RS, de Souza SJ. Signs of ancient and modern exon-shuffling are correlated to the distribution of ancient and modern domains along proteins. J Mol Evol 2005; 61:341-50. [PMID: 16034650 DOI: 10.1007/s00239-004-0318-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 11/10/2004] [Accepted: 03/11/2005] [Indexed: 11/24/2022]
Abstract
Exon-shuffling is an important mechanism accounting for the origin of many new proteins in eukaryotes. However, its role in the creation of proteins in the ancestor of prokaryotes and eukaryotes is still debatable. Excess of symmetric exons is thought to represent evidence for exon-shuffling since the exchange of exons flanked by introns of the same phase does not disrupt the reading frame of the host gene. In this report, we found that there is a significant correlation between symmetric units of shuffling and the age of protein domains. Ancient domains, present in both prokaryotes and eukaryotes, are more frequently bounded by phase 0 introns and their distribution is biased towards the central part of proteins. Modern domains are more frequently bounded by phase 1 introns and are present predominantly at the ends of proteins. We propose a model in which shuffling of ancient domains mainly flanked by phase 0 introns was important in the ancestor of eukaryotes and prokaryotes, during the creation of the central part of proteins. Shuffling of modern domains, predominantly flanked by phase 1 introns, accounted for the origin of the extremities of proteins during eukaryotic evolution.
|
17
|
Ruvinsky A, Eskesen ST, Eskesen FN, Hurst LD. Can codon usage bias explain intron phase distributions and exon symmetry? J Mol Evol 2005; 60:99-104. [PMID: 15696372 DOI: 10.1007/s00239-004-0032-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 01/28/2004] [Accepted: 08/31/2004] [Indexed: 10/25/2022]
Abstract
More introns exist between codons (phase 0) than between the first and the second bases (phase 1) or between the second and the third base (phase 2) within the codon. Many explanations have been suggested for this excess of phase 0. It has, for example, been argued to reflect an ancient utility for introns in separating exons that code for separate protein modules. There may, however, be a simple, alternative explanation. Introns typically require, for correct splicing, particular nucleotides immediately 5' in exons (typically a G) and immediately 3' in the following exon (also often a G). Introns therefore tend to be found between particular nucleotide pairs (e.g., G|G pairs) in the coding sequence. If, owing to bias in usage of different codons, these pairs are especially common at phase 0, then intron phase biases may have a trivial explanation. Here we take codon usage frequencies for a variety of eukaryotes and use these to generate random sequences. We then ask about the phase of putative intron insertion sites. Importantly, in all simulated data sets intron phase distribution is biased in favor of phase 0. In many cases the bias is of the magnitude observed in real data and can be attributed to codon usage bias. It is also known that exons may carry either the same phase (symmetric) or different phases (asymmetric) at the opposite ends. We simulated a distribution of different types of exons using frequencies of introns observed in real genes assuming random combination of intron phases at the opposite sides of exons. Surprisingly the simulated pattern was quite similar to that observed. In the simulants we typically observe a prevalence of symmetric exons carrying phase 0 at both ends, which is common for eukaryotic genes. However, at least in some species, the extent of the bias in favor of symmetric (0,0) exons is not as great in simulants as in real genes. These results emphasize the need to construct a biologically relevant null model of successful intron insertion.
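The simulation idea described in this abstract (random coding sequences drawn from codon frequencies, then counting G|G sites by phase) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical and the codon frequencies are toy values, not real codon usage data.

```python
import random


def random_cds(codon_freqs, n_codons, rng):
    """Draw a random coding sequence from a codon-frequency table."""
    codons = list(codon_freqs)
    weights = [codon_freqs[c] for c in codons]
    return "".join(rng.choices(codons, weights=weights, k=n_codons))


def phase_counts(cds, left="G", right="G"):
    """Count putative insertion sites (left|right pairs) by intron phase."""
    counts = {0: 0, 1: 0, 2: 0}
    for i in range(len(cds) - 1):
        if cds[i] == left and cds[i + 1] == right:
            # i + 1 nucleotides precede the site, so the phase is (i + 1) mod 3.
            counts[(i + 1) % 3] += 1
    return counts


# Toy frequencies for illustration only (hypothetical, not real codon usage).
toy_freqs = {"AAG": 0.3, "GAG": 0.3, "GGT": 0.2, "CTG": 0.2}
rng = random.Random(0)
print(phase_counts(random_cds(toy_freqs, 10_000, rng)))
```

Replacing `toy_freqs` with a measured codon usage table for a given species would give a null expectation for the phase distribution of that species under this simple model.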
Affiliation(s)
- A Ruvinsky
- Institute for Genetics and Bioinformatics, University of New England, Armidale 2351, NSW, Australia.
|
18
|
Simon DM, Hummel CL, Sheeley SL, Bhattacharya D. Heterogeneity of intron presence or absence in rDNA genes of the lichen species Physcia aipolia and P. stellaris. Curr Genet 2005; 47:389-99. [PMID: 15868149 DOI: 10.1007/s00294-005-0581-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 01/01/2005] [Revised: 03/22/2005] [Accepted: 03/29/2005] [Indexed: 10/25/2022]
Abstract
Intron origin and evolution are of high interest, yet the rates of insertion and loss are unclear. To investigate their spread, we studied ribosomal (r)DNA introns from the closely related lichens Physcia aipolia and P. stellaris. Both taxa are replete with rDNA spliceosomal introns and autocatalytic group I introns, many of which show presence/absence polymorphism when screened with the PCR approach. This initially suggested that Physcia could be a model for studying intron retention and loss. However, during the course of a population-level analysis, we discovered widespread intron presence/absence heterogeneity within lichen thalli. To address this result, we sequenced multiple clones encoding nuclear rDNA and the single-copy elongation factor-1alpha (EF-1alpha) from individual thalli. These data showed extensive rDNA heterogeneity within individuals, rather than the presence of multiple fungi within a thallus. Our results suggest that considerable care must be taken when interpreting intron presence/absence in lichen rDNA, an observation that has general implications for the study of rDNA intron evolution.
Affiliation(s)
- Dawn M Simon
- Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa, 312 Biology Building, Iowa City, IA 52242-1324, USA
|
19
|
Wang C, Typas MA, Butt TM. Phylogenetic and exon-intron structure analysis of fungal subtilisins: support for a mixed model of intron evolution. J Mol Evol 2005; 60:238-46. [PMID: 15785852 DOI: 10.1007/s00239-004-0147-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 05/11/2004] [Accepted: 07/18/2004] [Indexed: 10/25/2022]
Abstract
Phylogenetic and exon-intron structure analyses of intra- and interspecific fungal subtilisins in this study provided support for a mixed model of intron evolution: a synthetic theory combining the introns-early and introns-late hypotheses. Intraspecifically, Pr1A contained three phase zero introns. Introns 1 and 2, located at highly conserved positions, were phylogenetically congruent with the coding region, which favors the introns-early view, whereas intron 3 had two different sizes and was evolutionarily incongruent with the coding region, which is evidence for the introns-late view. Notably, the subtilisin Pr1J gene from different strains of M. anisopliae contained different numbers of introns, strong evidence in support of the introns-late theory. Interspecifically, phylogenetic analysis of 60 retrievable fungal subtilisins showed a clear relationship between amino acid sequence and gene exon-intron structure: homogeneous sequences usually have a similar exon-intron structure. Across the examined fungal subtilisin genes there were 10 intron positions, occupied predominantly by phase zero introns. Half of these positions were highly conserved, while the others were species-specific and appeared to be of recent origin due to intron insertion, in favor of the introns-late theory. The high conservation of positions 1 and 2, the high percentage of phase zero introns inserted there, and the phylogenetic congruence between the evolutionary histories of the intron sequences and the coding region suggested that the introns at these two positions were primordial.
Affiliation(s)
- Chengshu Wang
- School of Biological Sciences, University of Wales Swansea, Swansea SA2 8PP, UK.
|
20
|
Garte S. Fractal properties of the human genome. J Theor Biol 2004; 230:251-60. [PMID: 15302556 DOI: 10.1016/j.jtbi.2004.05.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 09/23/2003] [Revised: 05/14/2004] [Accepted: 05/18/2004] [Indexed: 11/20/2022]
Abstract
The fractal dimensions of the human chromosomes and four other genomes were determined using the box counting method. Human chromosomes exhibited a fractal dimension (D) of about 0.8, while values for a bacterium, yeast, worm and plant were higher. Analysis of three human chromosomes over five orders of magnitude of scale (from 10^8 to 10^4 bp) showed D to be non-constant at the smaller scales when introns were included as gaps. The relationship between D and gene density fit an empirical equation related to that expected from theory, and allowed for the calculation of the fractal initiator or self-similarity ratio. This value (0.57) was constant at all scales for human chromosomes, and was similar for other species, except for Arabidopsis.
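For readers unfamiliar with box counting, a generic one-dimensional version can be sketched in Python as below. It is a minimal illustration and not the procedure used in the paper: the function names are hypothetical, occupied positions are assumed to be integer coordinates (for example, bases covered by genes), and D is estimated as the least-squares slope of log N(s) against log(1/s).

```python
import math


def box_count(positions, box_size):
    """Number of boxes of width `box_size` containing at least one position."""
    return len({p // box_size for p in positions})


def box_counting_dimension(positions, box_sizes):
    """Least-squares slope of log N(s) versus log(1/s)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(positions, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def cantor_positions(levels):
    """Integer points of a middle-third Cantor construction on [0, 3**levels)."""
    points = [0]
    for k in range(levels):
        points = points + [p + 2 * 3 ** k for p in points]
    return points


filled = range(1024)  # fully occupied interval, expected D close to 1
print(box_counting_dimension(filled, [1, 2, 4, 8, 16]))
print(box_counting_dimension(cantor_positions(5), [1, 3, 9, 27, 81]))
```

The Cantor construction is included only as a sanity check with a known dimension of log 2 / log 3; a sparse, clustered occupancy pattern gives D below 1, which is the regime the abstract reports for human chromosomes.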
Affiliation(s)
- Seymour Garte
- School of Public Health, UMDNJ, New Brunswick, NJ 08903, USA.
|
21
|
Cho S, Jin SW, Cohen A, Ellis RE. A phylogeny of Caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res 2004; 14:1207-20. [PMID: 15231741 PMCID: PMC442136 DOI: 10.1101/gr.2639304] [Citation(s) in RCA: 142] [Impact Index Per Article: 7.1] [Indexed: 01/09/2023]
Abstract
Since introns were discovered 26 years ago, people have wondered how changes in intron/exon structure occur, and what role these changes play in evolution. To answer these questions, we have begun studying gene structure in nematodes related to Caenorhabditis elegans. As a first step, we cloned a set of five genes from six different Caenorhabditis species, and used their amino acid sequences to construct the first detailed phylogeny of this genus. Our data indicate that nematode introns are lost at a very high rate during evolution, almost 400-fold higher than in mammals. These losses do not occur randomly, but instead, favor some introns and do not affect others. In contrast, intron gains are far less common than losses in these genes. On the basis of the sequences at each intron site, we suggest that several distinct mechanisms can cause introns to be lost. The small size of C. elegans introns should increase the rate at which each of these types of loss can occur, and might account for the dramatic difference in loss rate between nematodes and mammals.
Affiliation(s)
- Soochin Cho
- Department of Molecular, Cellular and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48864, USA
|
22
|
Kanzok SM, Hoa NT, Bonizzoni M, Luna C, Huang Y, Malacrida AR, Zheng L. Origin of Toll-like receptor-mediated innate immunity. J Mol Evol 2004; 58:442-8. [PMID: 15114422 DOI: 10.1007/s00239-003-2565-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Received: 06/10/2003] [Accepted: 10/29/2003] [Indexed: 10/26/2022]
Abstract
Toll-related receptors (TLR) have been found in four animal phyla: Nematoda, Arthropoda, Echinodermata, and Chordata. No TLR has been identified thus far in acoelomates. TLR genes play a pivotal role in the innate immunity in both fruit fly and mammals. The prevailing view is that TLR-mediated immunity is ancient. The two pseudocoelomate TLRs, one each from Caenorhabditis elegans and Strongyloides stercoralis, were distinct from the coelomate ones. Further, the only TLR gene (Tol-1) in Ca. elegans did not appear to play a role in innate immunity. We argue that TLR-mediated innate immunity developed only in the coelomates, after they split from pseudocoelomates and acoelomates. We hypothesize that the function of TLR-mediated immunity is to prevent microbial infection in the body cavity present only in the coelomates. Phylogenetic analysis showed that almost all arthropod TLRs form a separate cluster from the mammalian counterparts. We further hypothesize that TLR-mediated immunity developed independently in the protostomia and deuterostomia coelomates.
Affiliation(s)
- Stefan M Kanzok
- Yale University School of Medicine, Epidemiology and Public Health, 60 College Street, New Haven, CT 06520, USA
|
23
|
Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV. Evidence of splice signal migration from exon to intron during intron evolution. Curr Biol 2004; 13:2170-4. [PMID: 14680632 DOI: 10.1016/j.cub.2003.12.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 10/26/2022]
Abstract
A comparison of the nucleotide sequences around the splice junctions that flank old (shared by two or more major lineages of eukaryotes) and new (lineage-specific) introns in eukaryotic genes reveals substantial differences in the distribution of information between introns and exons. Old introns have a lower information content in the exon regions adjacent to the splice sites than new introns but have a corresponding higher information content in the intron itself. This suggests that introns insert into nonrandom (proto-splice) sites but, during the evolution of an intron after insertion, the splice signal shifts from the flanking exon regions to the ends of the intron itself. Accumulation of information inside the intron during evolution suggests that new introns largely emerge de novo rather than through propagation and migration of old introns.
Affiliation(s)
- Alexander V Sverdlov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
24
|
Andersen JN, Jansen PG, Echwald SM, Mortensen OH, Fukada T, Del Vecchio R, Tonks NK, Møller NPH. A genomic perspective on protein tyrosine phosphatases: gene structure, pseudogenes, and genetic disease linkage. FASEB J 2004; 18:8-30. [PMID: 14718383 DOI: 10.1096/fj.02-1212rev] [Citation(s) in RCA: 223] [Impact Index Per Article: 11.2] [Indexed: 01/10/2023]
Abstract
The protein tyrosine phosphatases (PTPs) are now recognized as critical regulators of signal transduction under normal and pathophysiological conditions. In this analysis we have explored the sequence of the human genome to define the composition of the PTP family. Using public and proprietary sequence databases, we discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts. Direct orthologs were present in the mouse genome for all 38 human PTP genes. In addition, we identified 12 PTP pseudogenes unique to humans that have probably contaminated previous bioinformatics analysis of this gene family. PCR amplification and transcript sequencing indicate that some PTP pseudogenes are expressed, but their function (if any) is unknown. Furthermore, we analyzed the enhanced diversity generated by alternative splicing and provide predicted amino acid sequences for four human PTPs that are currently defined by fragments only. Finally, we correlated each PTP locus with genetic disease markers and identified 4 PTPs that map to known susceptibility loci for type 2 diabetes and 19 PTPs that map to regions frequently deleted in human cancers. We have made our analysis available at http://ptp.cshl.edu or http://science.novonordisk.com/ptp and we hope this resource will facilitate the functional characterization of these key enzymes.
Affiliation(s)
- Jannik N Andersen
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724-2208, USA
|
25
|
Contreras-Moreira B, Jonsson PF, Bates PA. Structural context of exons in protein domains: implications for protein modelling and design. J Mol Biol 2003; 333:1045-59. [PMID: 14583198 DOI: 10.1016/j.jmb.2003.09.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
Abstract
Intron boundaries were extracted from genomic data and mapped onto single-domain human and murine protein structures taken from the Protein Data Bank. A first analysis of this set of proteins shows that intron boundaries prefer to be in non-regular secondary structure elements, while avoiding alpha-helices and beta-strands. This fact alone suggests an evolutionary model in which introns are constrained by protein structure, particularly by tertiary structure contacts. In addition, in silico recombination experiments of a subset of these proteins together with their homologues, including those in different species, show that introns have a tendency to occur away from artificial crossover hot spots. Altogether, these findings support a model in which genes can preferentially harbour introns in less constrained regions of the protein fold they code for. In the light of these findings, we discuss some implications for protein modelling and design.
Affiliation(s)
- Bruno Contreras-Moreira
- Biomolecular Modelling Laboratory, Cancer Research UK, London Research Institute, Lincoln's Inn Fields Laboratories, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
|
26
|
Abstract
For nearly 15 years, it has been widely believed that many introns were recently acquired by the genes of multicellular organisms. However, the mechanism of acquisition has yet to be described for a single animal intron. Here, we report a large-scale computational analysis of the human, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana genomes. We divided 147,796 human intron sequences into batches of similar lengths and aligned them with each other. Different types of homologies between introns were found, but none showed evidence of simple intron transposition. Also, 106,902 plant, 39,624 Drosophila, and 6021 C. elegans introns were examined. No single case of homologous introns in nonhomologous genes was detected. Thus, we found no example of transposition of introns in the last 50 million years in humans, in 3 million years in Drosophila and C. elegans, or in 5 million years in Arabidopsis. Either new introns do not arise via transposition of other introns or intron transposition must have occurred so early in evolution that all traces of homology have been lost.
Affiliation(s)
- Alexei Fedorov
- Department of Medicine, Medical College of Ohio, Toledo, Ohio 43614, USA.
|
27
|
Fedorov A, Roy S, Cao X, Gilbert W. Phylogenetically older introns strongly correlate with module boundaries in ancient proteins. Genome Res 2003; 13:1155-7. [PMID: 12743017 PMCID: PMC403643 DOI: 10.1101/gr.1008203] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
The hypothesis that some (but not all) introns were used to construct ancient genes by exon shuffling of modules at the earliest stages of evolution is supported by the finding of an excess of phase-zero intron positions in the boundary regions of such modules in 276 ancient proteins (defined as common to eukaryotes and prokaryotes). Here we show further that as phase-zero intron positions are shared by distant taxa, and thus are truly phylogenetically ancient, their excess in the boundaries becomes greater, rising to an 80% excess if shared by four out of the five taxa: vertebrates, invertebrates, fungi, plants, and protists.
Affiliation(s)
- Alexei Fedorov
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
|
28
|
Fedorov A, Merican AF, Gilbert W. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci U S A 2002; 99:16128-33. [PMID: 12444254 PMCID: PMC138576 DOI: 10.1073/pnas.242624899] [Citation(s) in RCA: 142] [Impact Index Per Article: 6.5] [Indexed: 11/18/2022]
Abstract
We purge large databases of animal, plant, and fungal intron-containing genes to a 20% similarity level and then identify the most similar animal-plant, animal-fungal, and plant-fungal protein pairs. We identify the introns in each BLAST 2.0 alignment and score matched intron positions and slid (near-matched, within six nucleotides) intron positions automatically. Overall we find that 10% of the animal introns match plant positions, and a further 7% are "slides." Fifteen percent of fungal introns match animal positions, and 13% match plant positions. Furthermore, the number of alignments with high numbers of matches deviates greatly from the Poisson expectation. The 30 animal-plant alignments with the highest matches (for which 44% of animal introns match plant positions) when aligned with fungal genes are also highly enriched for triple matches: 39% of the fungal introns match both animal and plant positions. This is strong evidence for ancestral introns predating the animal-plant-fungal divergence, and in complete opposition to any expectations based on random insertion. In examining the slid introns, we show that at least half are caused by imperfections in the alignments, and are most likely to be actual matches at common positions. Thus, our final estimates are that approximately 14% of animal introns match plant positions, and that approximately 17-18% of fungal introns match animal or plant positions, all of these being likely to be ancestral in the eukaryotes.
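The match/slide scoring described in this abstract can be illustrated with a short Python sketch. It assumes that intron positions of both homologs are already expressed as nucleotide coordinates in a shared alignment; the function name, the data, and the handling of ties are hypothetical and do not reproduce the authors' pipeline.

```python
def score_intron_positions(query, target, slide_window=6):
    """Classify each query intron as 'match', 'slide' or 'none'.

    A 'match' shares the exact alignment coordinate with a target intron;
    a 'slide' lies within `slide_window` nucleotides of one without matching.
    """
    target_set = set(target)
    result = {}
    for pos in query:
        if pos in target_set:
            result[pos] = "match"
        elif any(0 < abs(pos - t) <= slide_window for t in target):
            result[pos] = "slide"
        else:
            result[pos] = "none"
    return result


# Hypothetical alignment coordinates for one animal-plant protein pair.
animal = [30, 121, 250, 400]
plant = [30, 125, 390]
scores = score_intron_positions(animal, plant)
matched = sum(v == "match" for v in scores.values()) / len(scores)
print(scores, f"matched fraction = {matched:.2f}")
```

Summing such per-alignment results over many protein pairs gives the kind of overall match and slide percentages quoted in the abstract.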
Affiliation(s)
- Alexei Fedorov
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
29
|
Roy SW, Fedorov A, Gilbert W. The signal of ancient introns is obscured by intron density and homolog number. Proc Natl Acad Sci U S A 2002; 99:15513-7. [PMID: 12432089 PMCID: PMC137748 DOI: 10.1073/pnas.242600199] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
In ancient genes whose products have known 3-dimensional structures, an excess of phase zero introns (those that lie between the codons) appear in the boundaries of modules, compact regions of the polypeptide chain. These excesses are highly significant and could support the hypothesis that ancient genes were assembled by exon shuffling involving compact modules. (Phase one and two introns, and many phase zero introns, appear to arise later.) However, as more genes, with larger numbers of homologs and intron positions, were examined, the effects became smaller, dropping from a 40% excess to an 8% excess as the number of intron positions increased from 570 to 3,328, even though the statistical significance remained strong. An interpretation of this behavior is that novel inserted positions appearing in homologs washed out the signal from a finite number of ancient positions. Here we show that this is likely to be the case. Analyses of intron positions restricted to those in genes for which relatively few intron positions from homologs are known, or to those in genes with a small number of known homologous gene structures, show a significant correlation of phase zero intron positions with the module structure, which weakens as the density of attributed intron positions or the number of homologs increases. These effects do not appear for phase one and phase two introns. This finding matches the expectation of the mixed model of intron origin, in which a fraction of phase zero introns are left from the assembly of the first genes, while other introns have been added in the course of evolution.
Affiliation(s)
- Scott William Roy
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
30
|
Altenhein B, Markl J, Lieb B. Gene structure and hemocyanin isoform HtH2 from the mollusc Haliotis tuberculata indicate early and late intron hot spots. Gene 2002; 301:53-60. [PMID: 12490323 DOI: 10.1016/s0378-1119(02)01081-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022]
Abstract
We have cloned and sequenced cDNAs coding for the complete primary structure of HtH2, the second hemocyanin isoform of the marine gastropod Haliotis tuberculata. The deduced protein sequence comprises 3399 amino acids, corresponding to a molecular mass of 392 kDa. It shares only 66% structural identity with the previously analysed first isoform HtH1, and according to a molecular clock, the two isoforms of Haliotis hemocyanin separated ca. 320 million years ago. By genomic polymerase chain reaction and 5' RACE, we have also sequenced the complete gene of HtH2 (18,598 bp), except for the 5' region in front of the secreted protein. It encompasses 15 exons and 14 introns and shows several microsatellite-rich regions. It mirrors the modular structure of the encoded hemocyanin subunit, with a linear arrangement of eight different functional units separated and bordered by seven phase 1 'linker introns'. In addition, within regions encoding three of the functional units, the HtH2 gene contains six 'internal introns'. Comparison to previously sequenced genes of Octopus dofleini hemocyanin and Haliotis hemocyanin isoform (HtH1) suggests Precambrian and Palaeozoic hot spots of intron gains, followed by 320 million years of absolute stasis.
Affiliation(s)
- Benjamin Altenhein
- Institute of Zoology, Johannes Gutenberg University, 55099, Mainz, Germany
|
31
|
Kaessmann H, Zöllner S, Nekrutenko A, Li WH. Signatures of domain shuffling in the human genome. Genome Res 2002; 12:1642-50. [PMID: 12421750 PMCID: PMC187552 DOI: 10.1101/gr.520702] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 11/24/2022]
Abstract
To elucidate the role of exon shuffling in shaping the complexity of the human genome/proteome, we have systematically analyzed intron phase distributions in the coding sequence of human protein domains. We found that introns at the boundaries of domains show high excess of symmetrical phase combinations (i.e., 0-0, 1-1, and 2-2), whereas nonboundary introns show no excess symmetry. This suggests that exon shuffling has primarily involved rearrangement of structural and functional domains as a whole. Furthermore, we found that domains flanked by phase 1 introns have dramatically expanded in the human genome due to domain shuffling and that 1-1 symmetrical domains and domain families are nonrandomly distributed with respect to their age. The predominance and extracellular location of 1-1 symmetrical domains among domains specific to metazoans suggests that they are associated with the rise of multicellularity. On the other hand, 0-0 symmetrical domains tend to be over-represented among ancient protein domains that are shared between the eukaryotic and prokaryotic kingdoms, which is compatible with the suggestion of primordial domain shuffling in the progenote. To see whether the human data reflect general genomic patterns of metazoans, similar analyses were done for the nematode Caenorhabditis elegans. Although the C. elegans data generally concur with the human patterns, we identified fewer intron-bounded domains in this organism, consistent with the lower complexity of C. elegans genes. [The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: Z. Gu and R. Stevens.]
Affiliation(s)
- Henrik Kaessmann
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.
|
32
|
Endo T, Fedorov A, de Souza SJ, Gilbert W. Do introns favor or avoid regions of amino acid conservation? Mol Biol Evol 2002; 19:521-252. [PMID: 11919293 DOI: 10.1093/oxfordjournals.molbev.a004107] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
Are intron positions correlated with regions of high amino acid conservation? For a set of ancient conserved proteins, with intronless prokaryotic but intron-containing eukaryotic homologs, multiple sequence alignments identified residues invariant throughout evolution. Intron positions between codons show no preferences. However, introns lying after the first base of a codon prefer conserved regions, markedly in glycines. Because glycines are in excess in conserved regions, this behavior could reflect phase-one introns entering glycine residues randomly in the ancestral sequences. Examination of intron positions within codons of evolutionarily invariable amino acids showed that roughly 50% of these introns are bordered by guanines at both 5'- and 3'-ends, 25% have a G only before the intron, and 5% have a G only after the intron, whereas about 20% are bordered by nonguanine bases.
Affiliation(s)
- Toshinori Endo
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA
|