1
Fukuchi S, Noguchi T, Anbo H, Homma K. Exon Elongation Added Intrinsically Disordered Regions to the Encoded Proteins and Facilitated the Emergence of the Last Eukaryotic Common Ancestor. Mol Biol Evol 2022; 40:6931801. [PMID: 36529689 PMCID: PMC9825244 DOI: 10.1093/molbev/msac272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/06/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Most prokaryotic proteins consist of a single structural domain (SD) with little intrinsically disordered regions (IDRs) that by themselves do not adopt stable structures, whereas the typical eukaryotic protein comprises multiple SDs and IDRs. How eukaryotic proteins evolved to differ from prokaryotic proteins has not been fully elucidated. Here, we found that the longer the internal exons are, the more frequently they encode IDRs in eight eukaryotes including vertebrates, invertebrates, a fungus, and plants. Based on this observation, we propose the "small bang" model from the proteomic viewpoint: the protoeukaryotic genes had no introns and mostly encoded one SD each, but a majority of them were subsequently divided into multiple exons (step 1). Many exons unconstrained by SDs elongated to encode IDRs (step 2). The elongated exons encoding IDRs frequently facilitated the acquisition of multiple SDs to make the last common ancestor of eukaryotes (step 3). One prediction of the model is that long internal exons are mostly unconstrained exons. Analytical results of the eight eukaryotes are consistent with this prediction. In support of the model, we identified cases of internal exons that elongated after the rat-mouse divergence and discovered that the expanded sections are mostly in unconstrained exons and preferentially encode IDRs. The model also predicts that SDs followed by long internal exons tend to have other SDs downstream. This prediction was also verified in all the eukaryotic species analyzed. Our model accounts for the dichotomy between prokaryotic and eukaryotic proteins and proposes a selective advantage conferred by IDRs.
Affiliation(s)
- Satoshi Fukuchi
- Program for Information Systems, Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi-shi, Japan
- Tamotsu Noguchi
- Pharmaceutical Education Research Center, Meiji Pharmaceutical University, Kiyose, Tokyo, Japan
- Hiroto Anbo
- Program for Information Systems, Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi-shi, Japan
2
Cui X, Stolzer M, Durand D. Evidence for exon shuffling is sensitive to model choice. J Bioinform Comput Biol 2021; 19:2140013. [PMID: 34806953 DOI: 10.1142/s0219720021400138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
The exon shuffling theory posits that intronic recombination creates new domain combinations, facilitating the evolution of novel protein function. This theory predicts that introns will be preferentially situated near domain boundaries. Many studies have sought evidence for exon shuffling by testing the correspondence between introns and domain boundaries against chance intron positioning. Here, we present an empirical investigation of how the choice of null model influences significance. Although genome-wide studies have used a uniform null model, exclusively, more realistic null models have been proposed for single gene studies. We extended these models for genome-wide analyses and applied them to 21 metazoan and fungal genomes. Our results show that compared with the other two models, the uniform model does not recapitulate genuine exon lengths, dramatically underestimates the probability of chance agreement, and overestimates the significance of intron-domain correspondence by as much as 100 orders of magnitude. Model choice had much greater impact on the assessment of exon shuffling in fungal genomes than in metazoa, leading to different evolutionary conclusions in seven of the 16 fungal genomes tested. Genome-wide studies that use this overly permissive null model may exaggerate the importance of exon shuffling as a general mechanism of multidomain evolution.
Affiliation(s)
- Xiaoyue Cui
- Department of Computational Biology, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
- Maureen Stolzer
- Department of Biological Sciences, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
- Dannie Durand
- Departments of Biological Sciences and Computational Biology, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
3
The TIM Barrel Architecture Facilitated the Early Evolution of Protein-Mediated Metabolism. J Mol Evol 2016; 82:17-26. [PMID: 26733481 PMCID: PMC4709378 DOI: 10.1007/s00239-015-9722-8] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 07/06/2015] [Accepted: 11/11/2015] [Indexed: 12/30/2022]
Abstract
The triosephosphate isomerase (TIM) barrel protein fold is a structurally repetitive architecture that is present in approximately 10 % of all enzymes. It is generally assumed that this ubiquity in modern proteomes reflects an essential historical role in early protein-mediated metabolism. Here, we provide quantitative and comparative analyses to support several hypotheses about the early importance of the TIM barrel architecture. An information theoretical analysis of protein structures supports the hypothesis that the TIM barrel architecture could arise more easily by duplication and recombination compared to other mixed α/β structures. We show that TIM barrel enzymes corresponding to the most taxonomically broad superfamilies also have the broadest range of functions, often aided by metal and nucleotide-derived cofactors that are thought to reflect an earlier stage of metabolic evolution. By comparison to other putatively ancient protein architectures, we find that the functional diversity of TIM barrel proteins cannot be explained simply by their antiquity. Instead, the breadth of TIM barrel functions can be explained, in part, by the incorporation of a broad range of cofactors, a trend that does not appear to be shared by proteins in general. These results support the hypothesis that the simple and functionally general TIM barrel architecture may have arisen early in the evolution of protein biosynthesis and provided an ideal scaffold to facilitate the metabolic transition from ribozymes, peptides, and geochemical catalysts to modern protein enzymes.
4
Wang L, Stein LD. Modeling the evolution dynamics of exon-intron structure with a general random fragmentation process. BMC Evol Biol 2013; 13:57. [PMID: 23448166 PMCID: PMC3732091 DOI: 10.1186/1471-2148-13-57] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/20/2012] [Accepted: 02/22/2013] [Indexed: 12/02/2022]
Abstract
Background: Most eukaryotic genes are interrupted by spliceosomal introns. The evolution of exon-intron structure remains mysterious despite rapid advances in genome sequencing techniques. In this work, a novel approach is taken based on the assumptions that the evolution of exon-intron structure is a stochastic process, and that the characteristics of this process can be understood by examining its historical outcome, the present-day size distribution of internal translated exons (exon). Through the combination of simulation and modeling the size distribution of exons in different species, we propose a general random fragmentation process (GRFP) to characterize the evolution dynamics of exon-intron structure. This model accurately predicts the probability that an exon will be split by a new intron and the distribution of novel insertions along the length of the exon. Results: As the first observation from this model, we show that the chance for an exon to obtain an intron is proportional to its size to the 3rd power. We also show that such size dependence is nearly constant across genes, with the exception of the exons adjacent to the 5′ UTR. As the second conclusion from the model, we show that intron insertion loci follow a normal distribution with a mean of 0.5 (center of the exon) and a standard deviation of 0.11. Finally, we show that intron insertions within a gene are independent of each other for vertebrates, but are more negatively correlated for non-vertebrates. We use simulation to demonstrate that the negative correlation might result from significant intron loss during evolution, which could be explained by selection against multi-intron genes in these organisms. Conclusions: The GRFP model suggests that intron gain is dynamic with a higher chance for longer exons; introns are inserted into exons randomly with the highest probability at the center of the exon. GRFP estimates that there are 78 introns per 10 kb of coding sequence for vertebrate genomes, agreeing with empirical observations. GRFP also estimates that there are significant intron losses in the evolution of non-vertebrate genomes, with extreme cases of around 57% intron loss in Drosophila melanogaster, 28% in Caenorhabditis elegans, and 24% in Oryza sativa.
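The two quantitative rules stated in this abstract (split chance proportional to exon size cubed; relative insertion position normally distributed with mean 0.5 and standard deviation 0.11) can be illustrated with a small simulation. This is only a sketch under those two stated rules, not the authors' GRFP implementation: the function names, the rejection of out-of-range draws, the rounding to whole nucleotides, and the absence of intron loss are all assumptions made for illustration.

```python
import random


def split_weights(exon_lengths, power=3.0):
    """Relative chance that each exon receives the next intron.

    Uses the abstract's rule: chance proportional to length ** 3.
    """
    raw = [length ** power for length in exon_lengths]
    total = sum(raw)
    return [w / total for w in raw]


def insertion_offset(length, rng, mean=0.5, sd=0.11):
    """Pick a split point inside an exon of the given length (>= 2).

    The relative position is drawn from Normal(mean, sd); draws that would
    leave an empty exon are rejected (an assumption of this sketch).
    """
    while True:
        offset = round(rng.gauss(mean, sd) * length)
        if 0 < offset < length:
            return offset


def fragment(cds_length, n_introns, seed=0):
    """Split one intron-less coding sequence by n_introns successive insertions."""
    rng = random.Random(seed)
    exons = [cds_length]
    for _ in range(n_introns):
        # Exons shorter than 2 nt cannot be split, so they get zero weight.
        weights = [l ** 3 if l >= 2 else 0 for l in exons]
        if sum(weights) == 0:
            raise ValueError("no exon is long enough to split")
        idx = rng.choices(range(len(exons)), weights=weights)[0]
        cut = insertion_offset(exons[idx], rng)
        exons[idx:idx + 1] = [cut, exons[idx] - cut]
    return exons


if __name__ == "__main__":
    print(split_weights([100, 200]))   # the 200-nt exon is 8x more likely to be split
    print(fragment(3000, 10))          # 11 exon lengths summing to 3000
```

Because of the cubic size dependence, the longest exon is almost always the next one to be split, which keeps simulated exon lengths fairly uniform; the parameters `power`, `mean`, and `sd` are exposed so other values can be tried.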
Affiliation(s)
- Liya Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
5
Koonin EV, Csuros M, Rogozin IB. Whence genes in pieces: reconstruction of the exon-intron gene structures of the last eukaryotic common ancestor and other ancestral eukaryotes. Wiley Interdiscip Rev RNA 2012; 4:93-105. [PMID: 23139082 DOI: 10.1002/wrna.1143] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 01/27/2023]
Abstract
In eukaryotes, protein-coding sequences are interrupted by non-coding sequences known as introns. During mRNA maturation, introns are excised by the spliceosome and the coding regions, exons, are spliced to form the mature coding region. The intron densities widely differ between eukaryotic lineages, from 6 to 7 introns per kb of coding sequence in vertebrates, some invertebrates and green plants, to only a few introns across the entire genome in many unicellular eukaryotes. Evolutionary reconstructions using maximum likelihood methods suggest intron-rich ancestors for each major group of eukaryotes. For the last common ancestor of animals, the highest intron density of all extant and extinct eukaryotes was inferred, at 120-130% of the human intron density. Furthermore, an intron density within 53-74% of the human values was inferred for the last eukaryotic common ancestor. Accordingly, evolution of eukaryotic genes in all lines of descent involved primarily intron loss, with substantial gain only at the bases of several branches including plants and animals. These conclusions have substantial biological implications indicating that the common ancestor of all modern eukaryotes was a complex organism with a gene architecture resembling those in multicellular organisms. Alternative splicing most likely initially appeared as an inevitable result of splicing errors and only later was employed to generate structural and functional diversification of proteins.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information NLM/NIH, Bethesda, MD, USA.
6
DsHsp90 is involved in the early response of Dunaliella salina to environmental stress. Int J Mol Sci 2012; 13:7963-7979. [PMID: 22942684 PMCID: PMC3430215 DOI: 10.3390/ijms13077963] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 05/07/2012] [Revised: 06/20/2012] [Accepted: 06/20/2012] [Indexed: 11/17/2022]
Abstract
Heat shock protein 90 (Hsp90) is a molecular chaperone highly conserved across the species from prokaryotes to eukaryotes. Hsp90 is essential for cell viability under all growth conditions and is proposed to act as a hub of the signaling network and protein homeostasis of the eukaryotic cells. By interacting with various client proteins, Hsp90 is involved in diverse physiological processes such as signal transduction, cell mobility, heat shock response and osmotic stress response. In this research, we cloned the dshsp90 gene encoding a polypeptide composed of 696 amino acids from the halotolerant unicellular green algae Dunaliella salina. Sequence alignment indicated that DsHsp90 belonged to the cytosolic Hsp90A family. Further biophysical and biochemical studies of the recombinant protein revealed that DsHsp90 possessed ATPase activity and existed as a dimer with similar percentages of secondary structures to those well-studied Hsp90As. Analysis of the nucleotide sequence of the cloned genomic DNA fragment indicated that dshsp90 contained 21 exons interrupted by 20 introns, which is much more complicated than the other plant hsp90 genes. The promoter region of dshsp90 contained putative cis-acting stress responsive elements and binding sites of transcriptional factors that respond to heat shock and salt stress. Further experimental research confirmed that dshsp90 was upregulated quickly by heat and salt shock in the D. salina cells. These findings suggested that dshsp90 might serve as a component of the early response system of the D. salina cells against environmental stresses.
7
Rogozin IB, Carmel L, Csuros M, Koonin EV. Origin and evolution of spliceosomal introns. Biol Direct 2012; 7:11. [PMID: 22507701 PMCID: PMC3488318 DOI: 10.1186/1745-6150-7-11] [Citation(s) in RCA: 224] [Impact Index Per Article: 18.7] [Received: 12/08/2011] [Accepted: 03/15/2012] [Indexed: 12/31/2022]
Abstract
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information NLM/NIH, 8600 Rockville Pike, Bldg, 38A, Bethesda, MD 20894, USA
8
Abstract
Some genes in the candidate early-branching eukaryote Giardia lamblia occur in separate pieces, transcribed from non-contiguous chromosomal locations. The pre-mRNAs from the separate pieces apparently find each other by regions of complementarity and are subsequently spliced together by the spliceosome. Could genes in pieces, transcribed into separate pre-mRNAs, have been an early feature of spliceosomal evolution?
Affiliation(s)
- Thomas Blumenthal
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80309, USA.
9
Ahmadinejad N, Dagan T, Gruenheit N, Martin W, Gabaldón T. Evolution of spliceosomal introns following endosymbiotic gene transfer. BMC Evol Biol 2010; 10:57. [PMID: 20178587 PMCID: PMC2834692 DOI: 10.1186/1471-2148-10-57] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 05/12/2009] [Accepted: 02/23/2010] [Indexed: 12/03/2022]
Abstract
Background Spliceosomal introns are an ancient, widespread hallmark of eukaryotic genomes. Despite much research, many questions regarding the origin and evolution of spliceosomal introns remain unsolved, partly due to the difficulty of inferring ancestral gene structures. We circumvent this problem by using genes originated by endosymbiotic gene transfer, in which an intron-less structure at the time of the transfer can be assumed. Results By comparing the exon-intron structures of 64 mitochondrial-derived genes that were transferred to the nucleus at different evolutionary periods, we can trace the history of intron gains in different eukaryotic lineages. Our results show that the intron density of genes transferred relatively recently to the nuclear genome is similar to that of genes originated by more ancient transfers, indicating that gene structure can be rapidly shaped by intron gain after the integration of the gene into the genome and that this process is mainly determined by forces acting specifically on each lineage. We analyze 12 cases of mitochondrial-derived genes that have been transferred to the nucleus independently in more than one lineage. Conclusions Remarkably, the proportion of shared intron positions that were gained independently in homologous genes is similar to that proportion observed in genes that were transferred prior to the speciation event and whose shared intron positions might be due to vertical inheritance. A particular case of parallel intron gain in the nad7 gene is discussed in more detail.
Affiliation(s)
- Nahal Ahmadinejad
- Institut für Botanik III, Heinrich-Heine Universität Düsseldorf, Düsseldorf, Germany
10
Abstract
Evolutionary reconstructions using maximum likelihood methods point to unexpectedly high densities of introns in protein-coding genes of ancestral eukaryotic forms including the last common ancestor of all extant eukaryotes. Combined with the evidence of the origin of spliceosomal introns from invading Group II self-splicing introns, these results suggest that early ancestral eukaryotic genomes consisted of up to 80% sequences derived from Group II introns, a much greater contribution of introns than that seen in any extant genome. An organism with such an unusual genome architecture could survive only under conditions of a severe population bottleneck.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
11
Basu MK, Poliakov E, Rogozin IB. Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 2009; 10:205-16. [PMID: 19151098 DOI: 10.1093/bib/bbn057] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022]
Abstract
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or 'promiscuous'). These promiscuous domains are typically involved in protein-protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
Affiliation(s)
- Malay Kumar Basu
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
12
Rothacker B, Werr M, Ilg T. Molecular cloning, partial genomic structure and functional characterization of succinic semialdehyde dehydrogenase genes from the parasitic insects Lucilia cuprina and Ctenocephalides felis. Insect Mol Biol 2008; 17:279-291. [PMID: 18477242 DOI: 10.1111/j.1365-2583.2008.00800.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/26/2023]
Abstract
The enzyme succinic semialdehyde dehydrogenase (SSADH; EC1.2.1.24) is a component of the gamma-aminobutyric acid degradation pathway in mammals and is essential for development and function of the nervous system. Here we report the identification, cDNA cloning and functional expression of SSADH from the parasitic insects Lucilia cuprina and Ctenocephalides felis. The recombinant proteins possess potent NAD+-dependent SSADH activity, while their catalytic efficiency for other aldehyde substrates is lower. A genomic copy of the L. cuprina SSADH gene contains two introns, while a genomic gene version of C. felis is devoid of introns. In contrast to the single copy SSADH genes in Drosophila melanogaster and mammals, in L. cuprina and C. felis, multiple SSADH gene copies are present in the genome.
Affiliation(s)
- B Rothacker
- Intervet Innovation GmbH, Zur Propstei, 55270 Schwabenheim, Germany
13
Sharpton TJ, Neafsey DE, Galagan JE, Taylor JW. Mechanisms of intron gain and loss in Cryptococcus. Genome Biol 2008; 9:R24. [PMID: 18234113 PMCID: PMC2395259 DOI: 10.1186/gb-2008-9-1-r24] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Received: 09/06/2007] [Revised: 11/29/2007] [Accepted: 01/30/2008] [Indexed: 01/14/2023]
Abstract
BACKGROUND: Genome comparisons across deep phylogenetic divergences have revealed that spliceosomal intron gain and loss are common evolutionary events. However, because of the deep divergences involved in these comparisons, little is understood about how these changes occur, particularly in the case of intron gain. To ascertain mechanisms of intron gain and loss, we compared five relatively closely related genomes from the yeast Cryptococcus. RESULTS: We observe a predominance of intron loss over gain and identify a relatively slow intron loss rate in Cryptococcus. Some genes preferentially lose introns and a large proportion of intron losses occur in the middle of genes (so-called internal intron loss). Finally, we identify a gene that displays a differential number of introns in a repetitive DNA region. CONCLUSION: Based on the observed patterns of intron loss and gain, population resequencing and population genetic analysis, it appears that recombination causes the widely observed but poorly understood phenomenon of internal intron loss and that DNA repeat expansion can create new introns in a population.
Affiliation(s)
- Thomas J Sharpton
- Department of Plant and Microbial Biology, University of California at Berkeley, Berkeley, CA 94720, USA.
14
de Roos ADG. Conserved intron positions in ancient protein modules. Biol Direct 2007; 2:7. [PMID: 17288589 PMCID: PMC1800838 DOI: 10.1186/1745-6150-2-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 02/05/2007] [Accepted: 02/08/2007] [Indexed: 12/31/2022]
Abstract
Background The timing of the origin of introns is of crucial importance for an understanding of early genome architecture. The Exon theory of genes proposed a role for introns in the formation of multi-exon proteins by exon shuffling and predicts the presence of conserved splice sites in ancient genes. In this study, large-scale analysis of potential conserved splice sites was performed using an intron-exon database (ExInt) derived from GenBank. Results A set of conserved intron positions was found by matching identical splice sites sequences from distantly-related eukaryotic kingdoms. Most amino acid sequences with conserved introns were homologous to consensus sequences of functional domains from conserved proteins including kinases, phosphatases, small GTPases, transporters and matrix proteins. These included ancient proteins that originated before the eukaryote-prokaryote split, for instance the catalytic domain of protein phosphatase 2A where a total of eleven conserved introns were found. Using an experimental setup in which the relation between a splice site and the ancientness of its surrounding sequence could be studied, it was found that the presence of an intron was positively correlated to the ancientness of its surrounding sequence. Intron phase conservation was linked to the conservation of the gene sequence and not to the splice site sequence itself. However, no apparent differences in phase distribution were found between introns in conserved versus non-conserved sequences. Conclusion The data confirm an origin of introns deep in the eukaryotic branch and is in concordance with the presence of introns in the first functional protein modules in an 'Exon theory of genes' scenario. A model is proposed in which shuffling of primordial short exonic sequences led to the formation of the first functional protein modules, in line with hypotheses that see the formation of introns integral to the origins of genome evolution. 
Reviewers This article was reviewed by Scott Roy (nominated by Anthony Poole), Sandro de Souza (nominated by Manyuan Long), and Gáspár Jékely.
Affiliation(s)
- Albert D G de Roos
- Syncyte BioIntelligence, P.O. Box 600, 1000 AP, Amsterdam, The Netherlands.
15
Gudlaugsdottir S, Boswell DR, Wood GR, Ma J. Exon size distribution and the origin of introns. Genetica 2007; 131:299-306. [PMID: 17279432 DOI: 10.1007/s10709-007-9139-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/12/2005] [Accepted: 01/06/2007] [Indexed: 11/29/2022]
Abstract
Since it was first recognised that eukaryotic genes are fragmented into coding segments (exons) separated by non-coding segments (introns), the reason for this phenomenon has been debated. There are two dominant theories: that the piecewise arrangement of genes allows functional protein domains, represented by exons, to recombine by shuffling to form novel proteins with combinations of functions; or that introns represent parasitic DNA that can infest the eukaryotic genome because it does not interfere grossly with the fitness of its host. Differing distributions of exon lengths are predicted by these two theories. In this paper we examine distributions of exon lengths for six different organisms and find that they offer empirical evidence that both theories may in part be correct.
16
Sverdlov AV, Csuros M, Rogozin IB, Koonin EV. A glimpse of a putative pre-intron phase of eukaryotic evolution. Trends Genet 2007; 23:105-8. [PMID: 17239982 DOI: 10.1016/j.tig.2007.01.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 09/26/2006] [Revised: 11/06/2006] [Accepted: 01/11/2007] [Indexed: 11/30/2022]
Abstract
Comparison of the exon-intron structures of ancient eukaryotic paralogs reveals the absence of conserved intron positions in these genes. This is in contrast to the conservation of intron positions in orthologous genes from even the most evolutionarily distant eukaryotes and in more recent paralogs. The lack of conserved intron positions in ancient paralogs probably reflects the origination of these genes during the earliest phase of eukaryotic evolution, which was characterized by concomitant invasion of genes by group II self-splicing elements (which were to become introns in the future) and extensive duplication of genes.
Affiliation(s)
- Alexander V Sverdlov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
17
Evolution of secretin family GPCR members in the metazoa. BMC Evol Biol 2006; 6:108. [PMID: 17166275 PMCID: PMC1764030 DOI: 10.1186/1471-2148-6-108] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.1] [Received: 08/09/2006] [Accepted: 12/13/2006] [Indexed: 11/10/2022]
Abstract
Background Comparative approaches using protostome and deuterostome data have greatly contributed to understanding gene function and organismal complexity. The family 2 G-protein coupled receptors (GPCRs) are one of the largest and best studied hormone and neuropeptide receptor families. They are suggested to have arisen from a single ancestral gene via duplication events. Despite the recent identification of receptor members in protostome and early deuterostome genomes, relatively little is known about their function or origin during metazoan divergence. In this study a comprehensive description of family 2 GPCR evolution is given based on in silico and expression analyses of the invertebrate receptor genes. Results Family 2 GPCR members were identified in the invertebrate genomes of the nematodes C. elegans and C. briggsae, the arthropods D. melanogaster and A. gambiae (mosquito) and in the tunicate C. intestinalis. This suggests that they are of ancient origin and have evolved through gene/genome duplication events. Sequence comparisons and phylogenetic analyses have demonstrated that the immediate gene environment, with regard to gene content, is conserved between the protostome and deuterostome receptor genomic regions. Also that the protostome genes are more like the deuterostome Corticotrophin Releasing Factor (CRF) and Calcitonin/Calcitonin Gene-Related Peptide (CAL/CGRP) receptors members than the other family 2 GPCR members. The evolution of family 2 GPCRs in deuterostomes is characterised by acquisition of new family members, with SCT (Secretin) receptors only present in tetrapods. Gene structure is characterised by an increase in intron number with organismal complexity with the exception of the vertebrate CAL/CGRP receptors. Conclusion The family 2 GPCR members provide a good example of gene duplication events occurring in tandem with increasing organismal complexity during metazoan evolution. 
The putative ancestral receptors are proposed to be more like the deuterostome CAL/CGRP and CRF receptors and this may be associated with their fundamental role in calcium regulation and the stress response, both of which are essential for survival.
|
18
|
Abstract
Research into the origins of introns is at a critical juncture in the resolution of theories on the evolution of early life (which came first, RNA or DNA?), the identity of LUCA (the last universal common ancestor, was it prokaryotic- or eukaryotic-like?), and the significance of noncoding nucleotide variation. One early notion was that introns would have evolved as a component of an efficient mechanism for the origin of genes. But alternative theories emerged as well. From the debate between the "introns-early" and "introns-late" theories came the proposal that introns arose before the origin of genetically encoded proteins and DNA, and the more recent "introns-first" theory, which postulates the presence of introns at that early evolutionary stage from a reconstruction of the "RNA world." Here we review seminal and recent ideas about intron origins. Recent discoveries about the patterns and causes of intron evolution make this one of the most hotly debated and exciting topics in molecular evolutionary biology today.
Affiliation(s)
- Francisco Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.
|
19
|
Koonin EV. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct 2006; 1:22. [PMID: 16907971 PMCID: PMC1570339 DOI: 10.1186/1745-6150-1-22] [Citation(s) in RCA: 187] [Impact Index Per Article: 10.4] [Received: 08/03/2006] [Accepted: 08/14/2006] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Ever since the discovery of 'genes in pieces' and mRNA splicing in eukaryotes, origin and evolution of spliceosomal introns have been considered within the conceptual framework of the 'introns early' versus 'introns late' debate. The 'introns early' hypothesis, which is closely linked to the so-called exon theory of gene evolution, posits that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. Under this scenario, the absence of spliceosomal introns in prokaryotes is considered to be a result of "genome streamlining". The 'introns late' hypothesis counters that spliceosomal introns emerged only in eukaryotes, and moreover, have been inserted into protein-coding genes continuously throughout the evolution of eukaryotes. Beyond the formal dilemma, the more substantial side of this debate has to do with possible roles of introns in the evolution of eukaryotes. RESULTS I argue that several lines of evidence now suggest a coherent solution to the introns-early versus introns-late debate, and the emerging picture of intron evolution integrates aspects of both views although, formally, there seems to be no support for the original version of introns-early. Firstly, there is growing evidence that spliceosomal introns evolved from group II self-splicing introns which are present, usually, in small numbers, in many bacteria, and probably, moved into the evolving eukaryotic genome from the alpha-proteobacterial progenitor of the mitochondria. Secondly, the concept of a primordial pool of 'virus-like' genetic elements implies that self-splicing introns are among the most ancient genetic entities. Thirdly, reconstructions of the ancestral state of eukaryotic genes suggest that the last common ancestor of extant eukaryotes had an intron-rich genome. 
Thus, it appears that ancestors of spliceosomal introns, indeed, have existed since the earliest stages of life's evolution, in a formal agreement with the introns-early scenario. However, there is no evidence that these ancient introns ever became widespread before the emergence of eukaryotes, hence, the central tenet of introns-early, the role of introns in early evolution of proteins, has no support. However, the demonstration that numerous introns invaded eukaryotic genes at the outset of eukaryotic evolution and that subsequent intron gain has been limited in many eukaryotic lineages implicates introns as an ancestral feature of eukaryotic genomes and refutes radical versions of introns-late. Perhaps, most importantly, I argue that the intron invasion triggered other pivotal events of eukaryogenesis, including the emergence of the spliceosome, the nucleus, the linear chromosomes, the telomerase, and the ubiquitin signaling system. This concept of eukaryogenesis, in a sense, revives some tenets of the exon hypothesis, by assigning to introns crucial roles in eukaryotic evolutionary innovation. CONCLUSION The scenario of the origin and evolution of introns that is best compatible with the results of comparative genomics and theoretical considerations goes as follows: self-splicing introns since the earliest stages of life's evolution--numerous spliceosomal introns invading genes of the emerging eukaryote during eukaryogenesis--subsequent lineage-specific loss and gain of introns. The intron invasion, probably, spawned by the mitochondrial endosymbiont, might have critically contributed to the emergence of the principal features of the eukaryotic cell. This scenario combines aspects of the introns-early and introns-late views. REVIEWERS this article was reviewed by W. Ford Doolittle, James Darnell (nominated by W. Ford Doolittle), William Martin, and Anthony Poole.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
20
|
Lin H, Zhu W, Silva JC, Gu X, Buell CR. Intron gain and loss in segmentally duplicated genes in rice. Genome Biol 2006; 7:R41. [PMID: 16719932 PMCID: PMC1779517 DOI: 10.1186/gb-2006-7-5-r41] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Received: 01/30/2006] [Revised: 03/21/2006] [Accepted: 04/24/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Introns are under less selection pressure than exons, and consequently, intronic sequences have a higher rate of gain and loss than exons. In a number of plant species, a large portion of the genome has been segmentally duplicated, giving rise to a large set of duplicated genes. The recent completion of the rice genome in which segmental duplication has been documented has allowed us to investigate intron evolution within rice, a diploid monocotyledonous species. RESULTS Analysis of segmental duplication in rice revealed that 159 Mb of the 371 Mb genome and 21,570 of the 43,719 non-transposable element-related genes were contained within a duplicated region. In these duplicated regions, 3,101 collinear paired genes were present. Using this set of segmentally duplicated genes, we investigated intron evolution from full-length cDNA-supported non-transposable element-related gene models of rice. Using gene pairs that have an ortholog in the dicotyledonous model species Arabidopsis thaliana, we identified more intron loss (49 introns within 35 gene pairs) than intron gain (5 introns within 5 gene pairs) following segmental duplication. We were unable to demonstrate preferential intron loss at the 3' end of genes as previously reported in mammalian genomes. However, we did find that the four nucleotides of exons that flank lost introns had less frequently used 4-mers. CONCLUSION We observed that intron evolution within rice following segmental duplication is largely dominated by intron loss. In two of the five cases of intron gain within segmentally duplicated genes, the gained sequences were similar to transposable elements.
Affiliation(s)
- Haining Lin
- The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA
- Wei Zhu
- The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA
- Joana C Silva
- The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA
- Xun Gu
- Department of Genetics, Development, and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
- C Robin Buell
- The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA
|
21
|
Whamond GS, Thornton JM. An analysis of intron positions in relation to nucleotides, amino acids, and protein secondary structure. J Mol Biol 2006; 359:238-47. [PMID: 16616935 DOI: 10.1016/j.jmb.2006.03.029] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 09/03/2005] [Revised: 03/05/2006] [Accepted: 03/15/2006] [Indexed: 10/24/2022]
Abstract
We present an analysis of intron positions in relation to nucleotides, amino acid residues, and protein secondary structure. Previous work has shown that intron sites in proteins are not randomly distributed with respect to secondary structures. Here we show that this preference can be almost totally explained by the nucleotide bias of splice site machinery, and may well not relate to protein stability or conformation at all. Each intron phase is preferentially associated with its own set of residues: phase 0 introns with lysine, glutamine, and glutamic acid before the intron, and valine after; phase 1 introns with glycine, alanine, valine, aspartic acid, and glutamic acid; and phase 2 introns with arginine, serine, lysine, and tryptophan. These preferences can be explained principally on the basis of nucleotide bias at intron locations, which is in accordance with previous literature. Although this work does not prove that introns are inserted into genomes at specific proto-splice sites, it shows that the nucleotide bias surrounding introns, however it originally occurred, explains the observed correlations between introns and protein secondary structure.
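The intron phases named in this abstract follow the standard definition: the phase is the number of coding nucleotides upstream of the intron, modulo 3. The following is a minimal sketch of that calculation, not the authors' program; the function name `intron_phases` and the example exon lengths are hypothetical.

```python
def intron_phases(exon_cds_lengths):
    """Return the phase of each intron in a gene.

    exon_cds_lengths: coding lengths (in nucleotides) of the exons, in
    5'-to-3' order. This input format is an assumption for illustration.
    Phase 0: intron lies between two codons.
    Phase 1: intron follows the first nucleotide of a codon.
    Phase 2: intron follows the second nucleotide of a codon.
    """
    phases = []
    upstream = 0  # coding nucleotides seen before the current intron
    # There is one intron after every exon except the last.
    for length in exon_cds_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


if __name__ == "__main__":
    # Hypothetical three-exon gene with coding lengths 90, 100 and 50 nt.
    print(intron_phases([90, 100, 50]))
```

Tabulating the residues on either side of each intron, grouped by phase, would be the next step toward the kind of residue preference the abstract reports.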
Affiliation(s)
- Gordon S Whamond
- Department of Biochemistry and Molecular Biology, University College London, UK.
|
22
|
Kvamme BO, Kongshaug H, Nilsen F. Organisation of trypsin genes in the salmon louse (Lepeophtheirus salmonis, Crustacea, copepoda) genome. Gene 2005; 352:63-74. [PMID: 15878809 DOI: 10.1016/j.gene.2005.03.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 11/18/2004] [Revised: 02/24/2005] [Accepted: 03/14/2005] [Indexed: 11/26/2022]
Abstract
Trypsins constitute a subclass of the S1A family of serine peptidases found in all groups of animals and some bacteria. At present, no information about the genomic organisation of trypsins is available for copepods. The only data on copepod trypsins indicate several different trypsins in the marine parasitic copepod Lepeophtheirus salmonis. In the present study, 31.7 kbp of genomic DNA surrounding the previously described LsTryp1-5 sequences was sequenced. The sequenced regions contain nine full-length and three partial trypsin genes. A conservative estimate based on PCR analysis and genomic sequence indicated at least 22 different trypsin genes in L. salmonis, of which 18 are most similar to the previously described LsTryp1 and -2 cDNA sequences. Four of these genes are putative pseudogenes. In addition, a putative mariner-like transposase gene was identified. The genomic sequences suggest that the L. salmonis trypsin genes reside within one or more gene clusters. Three different LsTryp intron-exon structures were identified, and all three are different from the intron-exon organisation previously reported for other S1A peptidases. This implies several intron loss and gain events in the evolution of the L. salmonis trypsin genes.
Affiliation(s)
- Bjørn Olav Kvamme
- Institute of Marine Research, P.O. Box 1870 Nordnes, 5817 Bergen, Norway.
|
23
|
A survey of intron research in genetics. ACTA ACUST UNITED AC 2005. [DOI: 10.1007/3-540-61723-x_974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
24
|
Simon DM, Hummel CL, Sheeley SL, Bhattacharya D. Heterogeneity of intron presence or absence in rDNA genes of the lichen species Physcia aipolia and P. stellaris. Curr Genet 2005; 47:389-99. [PMID: 15868149 DOI: 10.1007/s00294-005-0581-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 01/01/2005] [Revised: 03/22/2005] [Accepted: 03/29/2005] [Indexed: 10/25/2022]
Abstract
Intron origin and evolution are of high interest, yet the rates of insertion and loss are unclear. To investigate their spread, we studied ribosomal (r)DNA introns from the closely related lichens Physcia aipolia and P. stellaris. Both taxa are replete with rDNA spliceosomal introns and autocatalytic group I introns, many of which show presence/absence polymorphism when screened with the PCR approach. This initially suggested that Physcia could be a model for studying intron retention and loss. However, during the course of a population-level analysis, we discovered widespread intron presence/absence heterogeneity within lichen thalli. To address this result, we sequenced multiple clones encoding nuclear rDNA and the single-copy elongation factor-1alpha (EF-1alpha) from individual thalli. These data showed extensive rDNA heterogeneity within individuals, rather than the presence of multiple fungi within a thallus. Our results suggest that considerable care must be taken when interpreting intron presence/absence in lichen rDNA, an observation that has general implications for the study of rDNA intron evolution.
Affiliation(s)
- Dawn M Simon
- Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa, 312 Biology Building, Iowa City, IA 52242-1324, USA
|
25
|
Sverdlov AV, Babenko VN, Rogozin IB, Koonin EV. Preferential loss and gain of introns in 3' portions of genes suggests a reverse-transcription mechanism of intron insertion. Gene 2004; 338:85-91. [PMID: 15302409 DOI: 10.1016/j.gene.2004.05.027] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.1] [Received: 02/12/2004] [Revised: 04/19/2004] [Accepted: 05/17/2004] [Indexed: 11/25/2022]
Abstract
In an attempt to gain insight into the dynamics of intron evolution in eukaryotic protein-coding genes, the distributions of old introns, that are conserved between distant phylogenetic lineages, and new, lineage-specific introns along the gene length, were examined. A significant excess of old introns in 5'-regions of genes was detected. New introns, when analyzed in bulk, showed a nearly flat distribution from the 5'- to the 3'-end. However, analysis of new intron distributions in individual genomes revealed notable lineage-specific features. While in intron-poor genomes, particularly yeast Schizosaccharomyces pombe (Sp), the 5'-portions of genes contain a significantly greater number of new introns than the 3'-portions, the intron-rich genomes of humans and Arabidopsis show the opposite trend. These observations seem to be compatible with the view that introns are both lost and inserted in 3'-terminal portions of genes more often than in 5'-portions. Overrepresentation of 3'-terminal sequences among cDNAs that mediate intron loss appears to be the most likely explanation for the apparent preferential loss of introns in the distal parts of genes. Preferential insertion of introns in the 3'-portions suggests that introns might be inserted via a reverse-transcription-mediated pathway similar to that implicated in intron loss. This mechanism could involve duplication of a portion of the coding region during reverse transcription followed by homologous recombination and subsequent rapid sequence divergence in the copy that becomes a new intron.
Affiliation(s)
- Alexander V Sverdlov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
|
26
|
|
27
|
Gopalan V, Tan TW, Lee BTK, Ranganathan S. Xpro: database of eukaryotic protein-encoding genes. Nucleic Acids Res 2004; 32:D59-63. [PMID: 14681359 PMCID: PMC308785 DOI: 10.1093/nar/gkh051] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
Xpro is a relational database that contains all the eukaryotic protein-encoding DNA sequences contained in GenBank with associated data required for the analysis of eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes properties such as sequence, position, length and description about introns, exons and protein-coding regions, Xpro provides annotations on the splice sites and intron phases. Furthermore, Xpro validates intron positions using alignment information between the record's sequence and EST sequences found in dbEST. In the process of validation, alternative splicing information is also obtained and can be found in the database. The intron-containing genes in the Xpro are also classified as experimental or predicted based on the intron position validation and specific keywords in the GenBank records that are present in predicted genes. An Entrez-like query system, which is familiar to most biologists, is provided for accessing the information present in the database system. A non-redundant set of Xpro database contents is also obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493,983 genes: 351,918 intron-containing genes and 142,065 intron-less genes. Xpro is updated for each new GenBank release and is freely available via the internet at http://origin.bic.nus.edu.sg/xpro.
Affiliation(s)
- Vivek Gopalan
- Department of Biochemistry, National University of Singapore, Singapore 119260
|
28
|
Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol 2003; 13:1512-7. [PMID: 12956953 DOI: 10.1016/s0960-9822(03)00558-x] [Citation(s) in RCA: 301] [Impact Index Per Article: 14.3] [Indexed: 10/27/2022]
Abstract
Sequencing of eukaryotic genomes allows one to address major evolutionary problems, such as the evolution of gene structure. We compared the intron positions in 684 orthologous gene sets from 8 complete genomes of animals, plants, fungi, and protists and constructed parsimonious scenarios of evolution of the exon-intron structure for the respective genes. Approximately one-third of the introns in the malaria parasite Plasmodium falciparum are shared with at least one crown group eukaryote; this number indicates that these introns have been conserved through >1.5 billion years of evolution that separate Plasmodium from the crown group. Paradoxically, humans share many more introns with the plant Arabidopsis thaliana than with the fly or nematode. The inferred evolutionary scenario holds that the common ancestor of Plasmodium and the crown group and, especially, the common ancestor of animals, plants, and fungi had numerous introns. Most of these ancestral introns, which are retained in the genomes of vertebrates and plants, have been lost in fungi, nematodes, arthropods, and probably Plasmodium. In addition, numerous introns have been inserted into vertebrate and plant genes, whereas, in other lineages, intron gain was much less prominent.
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
29
|
Abstract
Retrotransposons have proliferated extensively in eukaryotic lineages; the genomes of many animals and plants comprise 50% or more retrotransposon sequences by weight. There are several persuasive arguments that the enzymatic lynchpin of retrotransposon replication, reverse transcriptase (RT), is an ancient enzyme. Moreover, the direct progenitors of retrotransposons are thought to be mobile self-splicing introns that actively propagate themselves via reverse transcription, the group II introns, also known as retrointrons. Retrointrons are represented in modern genomes in very modest numbers, and thus far, only in certain eubacterial and organellar genomes. Archaeal genomes are nearly devoid of RT in any form. In this study, I propose a model to explain this unusual distribution, and rationalize it with the proposed ancient origin of the RT gene. A cap and tail hypothesis is proposed. By this hypothesis, the specialized terminal structures of eukaryotic mRNA provide the ideal molecular environment for the lengthening, evolution, and subsequent massive expansion of highly mobile retrotransposons, leading directly to the retrotransposon-cluttered structure that typifies modern metazoan genomes and the eventual emergence of retroviruses.
Affiliation(s)
- Jef D Boeke
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.
|
30
|
Rogozin IB, Babenko VN, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Mirkin BG, Nikolskaya AN, Rao BS, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA, Koonin EV. Evolution of eukaryotic gene repertoire and gene structure: discovering the unexpected dynamics of genome evolution. COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY 2003; 68:293-301. [PMID: 15338629 DOI: 10.1101/sqb.2003.68.293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 04/30/2023]
Affiliation(s)
- I B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
31
|
Sakurai A, Fujimori S, Kochiwa H, Kitamura-Abe S, Washio T, Saito R, Carninci P, Hayashizaki Y, Tomita M. On biased distribution of introns in various eukaryotes. Gene 2002; 300:89-95. [PMID: 12468090 DOI: 10.1016/s0378-1119(02)01035-1] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022]
Abstract
We conducted comprehensive analyses on intron positions in the Mus musculus genome by comparing genomic sequences in the GenBank database and cDNA sequences in the mouse cDNA library recently developed by Riken Genomic Sciences Center. Our results confirm that introns have a tendency to be located toward the 5' end of the gene. The same type of analysis was conducted in the coding region of seven eukaryotes (Saccharomyces cerevisiae, Plasmodium falciparum, Caenorhabditis elegans, Drosophila melanogaster, M. musculus, Homo sapiens, Arabidopsis thaliana). Introns in genes with a single intron have a locational bias toward the 5' end in all species except A. thaliana. We also measured the distance from the start codon to the position of the intron, and found that single introns prefer the location immediately after the start codon in S. cerevisiae and P. falciparum. We discuss three possible explanations for these findings: (1) they are the consequence of intron loss by reverse-transcriptase; (2) they are necessary to accommodate the function; and (3) they are concerned with the mechanism of pre-mRNA splicing.
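The locational bias described in this abstract can be expressed as a relative position: the distance from the start codon to the intron, divided by the length of the coding region. The sketch below illustrates that measure only; it is not the authors' pipeline, and the function name `relative_intron_positions` and the example lengths are hypothetical.

```python
def relative_intron_positions(exon_cds_lengths):
    """Return each intron's position as a fraction of the coding length.

    exon_cds_lengths: coding lengths (nt) of the exons in 5'-to-3' order
    (an assumed input format). A value near 0 means the intron sits close
    to the start codon; a value near 1 means it sits close to the stop codon.
    """
    total = sum(exon_cds_lengths)
    positions = []
    upstream = 0  # coding nucleotides between the start codon and the intron
    for length in exon_cds_lengths[:-1]:
        upstream += length
        positions.append(upstream / total)
    return positions


def is_five_prime_biased(exon_cds_lengths):
    """True when a single-intron gene has its intron in the 5' half."""
    positions = relative_intron_positions(exon_cds_lengths)
    if len(positions) != 1:
        raise ValueError("expected a gene with exactly one intron")
    return positions[0] < 0.5


if __name__ == "__main__":
    # Hypothetical single-intron gene: 30 nt of CDS, intron, then 270 nt.
    print(relative_intron_positions([30, 270]))
    print(is_five_prime_biased([30, 270]))
```

Counting how many single-intron genes fall on each side of the midpoint, per species, gives a simple summary of the bias.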
Affiliation(s)
- A Sakurai
- Institute for Advanced Biosciences, Keio University, 5322 Endo, Fujisawa-city, Kanagawa, Japan
|
32
|
Di Maro A, Pizzo E, Cubellis MV, D'Alessio G. An intron-less betagamma-crystallin-type gene from the sponge Geodia cydonium. Gene 2002; 299:79-82. [PMID: 12459254 DOI: 10.1016/s0378-1119(02)01014-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
We report the cloning of a gene encoding a betagamma-crystallin-type protein from a porifera, the Geodia cydonium sponge. The data provide direct, conclusive evidence of the existence of such a gene in the genome of an early diverged metazoan. The cloned gene is found to contain no introns, while proto-splice sites are identified in the nucleotide sequence at positions where introns are located in homologous, very recently diverged vertebrate genes. These findings are discussed in the light of the debate between the introns-late and introns-early theories.
Affiliation(s)
- Antimo Di Maro
- Dipartimento di Scienze della Vita, Seconda Università di Napoli, Via Vivaldi 43, 81100 Caserta, Italy
|
33
|
Müller WEG, Böhm M, Grebenjuk VA, Skorokhod A, Müller IM, Gamulin V. Conservation of the positions of metazoan introns from sponges to humans. Gene 2002; 295:299-309. [PMID: 12354665 DOI: 10.1016/s0378-1119(02)00690-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Sponges (phylum Porifera) are the phylogenetically oldest Metazoa still extant. They can be considered as reference animals (Urmetazoa) for the understanding of the evolutionary processes resulting in the creation of Metazoa in general and also for the metazoan gene organization in particular. In the marine sponge Suberites domuncula, genes encoding p38 and JNK kinases contain nine and twelve introns, respectively. Eight introns in both genes share the same positions and the identical phases. One p38 intron slipped by six bases and the JNK gene has three more introns. However, the sequences of the introns are not conserved and the introns in the JNK gene are generally much longer. Introns interrupt most of the conserved kinase subdomains I-XI and are found in all three phases (0, 1 and 2). We analyzed in detail p38 and JNK genes from human, Caenorhabditis elegans and Drosophila melanogaster and found in most genes introns at the positions identical to those in sponge genes. The exceptions are two p38 genes from D. melanogaster that have lost all introns in the coding sequence. The positions of 11 introns in each of four human p38 genes are fully conserved and ten introns occupy identical positions as the introns in sponge p38 or JNK genes. The same is true for nine out of ten introns in the human JNK-1 gene. The introns in human p38 and JNK genes are on average more than ten times longer than corresponding introns in sponges. It was proposed that yeast HOG1-like kinases (from i.e. Saccharomyces cerevisiae and Emericella nidulans) and metazoan p38 and JNK kinases are orthologues. p38 and JNK genes were created after the split from fungi by the duplication and diversification of the HOG1-like progenitor gene. Our results further support the common origin of p38 and JNK genes and speak in favor of a very early time of duplication. 
The ancestral gene contained at least ten introns, which are still present at the very conserved positions in p38 and JNK genes of extant animals. Four of these introns are present at the same positions in the HOG-like gene in the fungus E. nidulans. The others probably entered the ancestral gene after the split of fungi, but before the duplication of the gene and before the creation of the common, urmetazoan progenitor of all multicellular animals. A second gene coding for an immune molecule is described, the allograft inflammatory factor, which likewise showed a highly conserved exon/intron structure in S. domuncula and in human. These data show that the intron/exon borders are highly conserved in genes from sponges to human.
Affiliation(s)
- Werner E G Müller
- Institut für Physiologische Chemie, Abteilung Angewandte Molekularbiologie, Universität Mainz, Duesbergweg 6, 55099, Mainz, Germany.
|
34
|
Abstract
DNA shuffling has proven to be a powerful technique for the directed evolution of proteins. A mix of theoretical and applied research has now provided insights into how recombination can be guided to more efficiently generate proteins and even organisms with altered functions.
Affiliation(s)
- Jamie M Bacher
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA
- Brian D Reiss
- Center for Nano- and Molecular Science and Technology, University of Texas at Austin, Austin, TX 78712, USA
- Andrew D Ellington
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA
- Center for Nano- and Molecular Science and Technology, University of Texas at Austin, Austin, TX 78712, USA
- Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, TX 78712, USA
|
35
|
Anantharaman V, Koonin EV, Aravind L. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res 2002; 30:1427-64. [PMID: 11917006 PMCID: PMC101826 DOI: 10.1093/nar/30.7.1427] [Citation(s) in RCA: 381] [Impact Index Per Article: 17.3] [Indexed: 11/14/2022] Open
Abstract
RNA metabolism, broadly defined as the compendium of all processes that involve RNA, including transcription, processing and modification of transcripts, translation, RNA degradation and its regulation, is the central and most evolutionarily conserved part of cell physiology. A comprehensive, genome-wide census of all enzymatic and non-enzymatic protein domains involved in RNA metabolism was conducted by using sequence profile analysis and structural comparisons. Proteins related to RNA metabolism comprise from 3 to 11% of the complete protein repertoire in bacteria, archaea and eukaryotes, with the greatest fraction seen in parasitic bacteria with small genomes. Approximately one-half of protein domains involved in RNA metabolism are present in most, if not all, species from all three primary kingdoms and are traceable to the last universal common ancestor (LUCA). The principal features of LUCA's RNA metabolism system were reconstructed by parsimony-based evolutionary analysis of all relevant groups of orthologous proteins. This reconstruction shows that LUCA possessed not only the basal translation system, but also the principal forms of RNA modification, such as methylation, pseudouridylation and thiouridylation, as well as simple mechanisms for polyadenylation and RNA degradation. Some of these ancient domains form paralogous groups whose evolution can be traced back in time beyond LUCA, towards low-specificity proteins, which probably functioned as cofactors for ribozymes within the RNA world framework. The main lineage-specific innovations of RNA metabolism systems were identified. The most notable phase of innovation in RNA metabolism coincides with the advent of eukaryotes and was brought about by the merge of the archaeal and bacterial systems via mitochondrial endosymbiosis, but also involved emergence of several new, eukaryote-specific RNA-binding domains. 
Subsequent, vast expansions of these domains mark the origin of alternative splicing in animals and probably in plants. In addition to the reconstruction of the evolutionary history of RNA metabolism, this analysis produced numerous functional predictions, e.g. of previously undetected enzymes of RNA modification.
Affiliation(s)
- Vivek Anantharaman
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Building 389, National Institutes of Health, Bethesda, MD 20894, USA
|
36
|
Fedorov A, Cao X, Saxonov S, de Souza SJ, Roy SW, Gilbert W. Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns. Proc Natl Acad Sci U S A 2001; 98:13177-82. [PMID: 11687643 PMCID: PMC60844 DOI: 10.1073/pnas.231491498] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
Do introns delineate elements of protein tertiary structure? This issue is crucial to the debate about the role and origin of introns. We present an analysis of the full set of proteins with known three-dimensional structures that have homologs with intron positions recorded in GenBank. A computer program was generated that maps on a reference sequence the positions of all introns in homologous genes. We have applied this program to a set of 665 nonredundant protein sequences with defined three-dimensional structures in the Protein Data Bank (PDB), which yielded 8,217 introns in 407 proteins. For the subset of proteins corresponding to ancient conserved regions (ACR), we find that there is a correlation of phase-zero introns with the boundary regions of modules and no correlation for the phase-one and phase-two positions. However, for a subset of proteins without prokaryotic counterparts (131 non-ACR proteins), a set of presumably modern proteins (or proteins that have diverged extremely far from any ancestral form), we do not find any correlation of phase-zero intron positions with three-dimensional structure. Furthermore, we find an anticorrelation of phase-one intron positions with module boundaries: they actually have a preference for the interior of modules. This finding is explicable as a preference for phase-one introns to lie in glycines, between G/G sequences, the preference for glycines being anticorrelated with the three-dimensional modules. We interpret this anticorrelation as a sign that a number of phase-one introns, and hence many modern introns, have been inserted into G/G "protosplice" sequences.
Affiliation(s)
- A Fedorov
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
37
|
Abstract
Several technical, social, and biological networks were recently found to demonstrate scale-free and small-world behavior instead of random graph characteristics. In this work, the topology of protein domain networks generated with data from the ProDom, Pfam, and Prosite domain databases was studied. It was found that these networks exhibited small-world and scale-free topologies with a high degree of local clustering accompanied by a few long-distance connections. Moreover, these observations apply not only to the complete databases, but also to the domain distributions in proteomes of different organisms. The extent of connectivity among domains reflects the evolutionary complexity of the organisms considered.
Affiliation(s)
- S Wuchty
- European Media Laboratory, Heidelberg, Germany.
|
38
|
Inagaki Y, Doolittle WF. Class I release factors in ciliates with variant genetic codes. Nucleic Acids Res 2001; 29:921-7. [PMID: 11160924 PMCID: PMC29606 DOI: 10.1093/nar/29.4.921] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
In eukaryotes with the universal genetic code a single class I release factor (eRF1) most probably recognizes all stop codons (UAA, UAG and UGA) and is essential for termination of nascent peptide synthesis. It is well established that stop codons have been reassigned to amino acid codons at least three times among ciliates. The codon specificities of ciliate eRF1s must have been modified to accommodate the variant codes. In this study we have amplified, cloned and sequenced eRF1 genes of two hypotrichous ciliates, Oxytricha trifallax (UAA and UAG for Gln) and Euplotes aediculatus (UGA for Cys). We also sequenced/identified three protist and two archaeal class I RF genes to enlarge the database of eRF1/aRF1s with the universal code. Extensive comparisons between universal code eRF1s and those of Oxytricha, Euplotes, and Tetrahymena which represent three lineages that acquired variant codes independently, provide important clues to identify stop codon-binding regions in eRF1. Domain 1 in the five ciliate eRF1s, particularly the TASNIKS heptapeptide and its adjacent region, differs significantly from domain 1 in universal code eRF1s. This observation suggests that domain 1 contains the codon recognition site, but that the mechanism of eRF1 codon recognition may be more complex than proposed by Nakamura et al. or Knight and Landweber.
Affiliation(s)
- Y Inagaki
- Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada.
|
39
|
Glansdorff N. About the last common ancestor, the universal life-tree and lateral gene transfer: a reappraisal. Mol Microbiol 2000; 38:177-85. [PMID: 11069646 DOI: 10.1046/j.1365-2958.2000.02126.x] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022]
Abstract
An organismal tree rooted in the bacterial branch and derived from a hyperthermophilic last common ancestor (LCA) is still widely assumed to represent the path followed by evolution from the most primeval cells to the three domains recognized among contemporary organisms: Bacteria, Archaea and Eucarya. In the past few years, however, more and more discrepancies between this pattern and individual protein trees have been brought to light. There has been an overall tendency to attribute these incongruities to widespread lateral gene transfer. However, recent developments, a reappraisal of earlier evidence and considerations of our own lead us to a quite different view. It would appear (i) that the role of lateral gene transfer was overemphasized in recent discussions of molecular phylogenies; (ii) that the LCA was probably a non-thermophilic protoeukaryote from which both Archaea and Bacteria emerged by reductive evolution but not as sister groups, in keeping with a current evolutionary scheme for the biosynthesis of membrane lipids; and (iii) that thermophilic Archaea may have been the first branch to diverge from the ancestral line.
Affiliation(s)
- N Glansdorff
- Microbiology, Free University of Brussels (VUB), Flanders Interuniversity Institute and J.-M. Wiame Microbiological Research Institute, Brussels B-1070, Belgium.
|
40
|
Affiliation(s)
- I B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
41
|
Marquardt J, Wans S, Rhiel E, Randolf A, Krumbein WE. Intron-exon structure and gene copy number of a gene encoding for a membrane-intrinsic light-harvesting polypeptide of the red alga Galdieria sulphuraria. Gene 2000; 255:257-65. [PMID: 11024285 DOI: 10.1016/s0378-1119(00)00332-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Genes for light-harvesting proteins (lhc genes) of higher plants are well examined. However, little is known about the corresponding genes of algae, although this knowledge might give valuable information about the evolution of photosynthetic antennae. In the case of rhodophytes only two cDNA sequences from a single organism, Porphyridium cruentum, have been published. Here we describe an additional sequence from another species, the thermo-acidophilic red alga Galdieria sulphuraria. For the first time also a genomic sequence for a red algal lhc gene is presented. From a cDNA library of G. sulphuraria we isolated a clone containing an open reading frame for a protein of 302 amino acids with a deduced molecular mass of 33.86 kDa. It shares major structural features with eukaryotic light-harvesting polypeptides. A proposed cleavage site between transit peptide and mature protein gives rise to a transit peptide of 119 amino acids and a mature protein of 183 residues. Hydropathy analysis suggests that the mature protein consists of three transmembrane helices. Several amino acid residues supposed to bind chlorophyll a and chlorophyll b in higher plants are conserved. The protein shows up to 69% identity and 81% similarity to the Porphyridium polypeptides in the transmembrane helices 1 and 3. Using oligonucleotides annealing in the regions of the start and stop codons of the gene as primers, a DNA sequence was amplified from nuclear G. sulphuraria DNA by PCR. Compared with the cDNA clone, this sequence contains five additional intervening DNA strings of 50-74 bp length. Four of them show typical features of spliceosomal introns with GT-AG borders, and the fifth differs by starting with GC. Three of the supposed introns are located in similar positions as introns of higher plant light-harvesting proteins. Southern blotting and hybridization experiments indicate that G. sulphuraria contains at least three copies of this gene.
Affiliation(s)
- J Marquardt
- ICBM/Geomikrobiologie, Carl von Ossietzky Universität Oldenburg, Carl-von-Ossietzky-Str. 9-11, 26111, Oldenburg, Germany.
|
42
|
Affiliation(s)
- G Heinrich
- VA Northern California Health Care System and EBIRE, 150 Muir Road, Martinez, CA 94553, USA.
|
43
|
Abstract
Prokaryotes are generally assumed to be the oldest existing form of life on earth. This assumption, however, makes it difficult to understand certain aspects of the transition from earlier stages in the origin of life to more complex ones, and it does not account for many apparently ancient features in the eukaryotes. From a model of the RNA world, based on relic RNA species in modern organisms, one can infer that there was an absolute requirement for a high-accuracy RNA replicase even before proteins evolved. In addition, we argue here that the ribosome (together with the RNAs involved in its assembly) is so large that it must have had a prior function before protein synthesis. A model that connects and equates these two requirements (high-accuracy RNA replicase and prior function of the ribosome) can explain many steps in the origin of life while accounting for the observation that eukaryotes have retained more vestiges of the RNA world. The later derivation of prokaryote RNA metabolism and genome structure can be accounted for by the two complementary mechanisms of r-selection and thermoreduction.
Affiliation(s)
- A Poole
- Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand.
|
44
|
Abstract
Recent studies on the genomes of protists, plants, fungi and animals confirm that the increase in genome size and gene number in different eukaryotic lineages is paralleled by a general decrease in genome compactness and an increase in the number and size of introns. It may thus be predicted that exon-shuffling has become increasingly significant with the evolution of larger, less compact genomes. To test the validity of this prediction, we have analyzed the evolutionary distribution of modular proteins that have clearly evolved by intronic recombination. The results of this analysis indicate that modular multidomain proteins produced by exon-shuffling are restricted in their evolutionary distribution. Although such proteins are present in all major groups of metazoa from sponges to chordates, there is practically no evidence for the presence of related modular proteins in other groups of eukaryotes. The biological significance of this difference in the composition of the proteomes of animals, fungi, plants and protists is best appreciated when these modular proteins are classified with respect to their biological function. The majority of these proteins can be assigned to functional categories that are inextricably linked to multicellularity of animals, and are of absolute importance in permitting animals to function in an integrated fashion: constituents of the extracellular matrix, proteases involved in tissue remodelling processes, various proteins of body fluids, membrane-associated proteins mediating cell-cell and cell-matrix interactions, membrane-associated receptor proteins regulating cell-cell communications, etc. Although some basic types of modular proteins seem to be shared by all major groups of metazoa, there are also groups of modular proteins that appear to be restricted to certain evolutionary lineages. In summary, the results suggest that exon-shuffling acquired major significance at the time of metazoan radiation. 
It is interesting to note that the rise of exon-shuffling coincides with a spectacular burst of evolutionary creativity: the Big Bang of metazoan radiation. It seems probable that modular protein evolution by exon-shuffling has contributed significantly to this accelerated evolution of metazoa, since it facilitated the rapid construction of multidomain extracellular and cell surface proteins that are indispensable for multicellularity.
Affiliation(s)
- L Patthy
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest.
|
45
|
Takahashi KI, Noguti T, Hojo H, Yamauchi K, Kinoshita M, Aimoto S, Ohkubo T, Gō M. A mini-protein designed by removing a module from barnase: molecular modeling and NMR measurements of the conformation. PROTEIN ENGINEERING 1999; 12:673-80. [PMID: 10469828 DOI: 10.1093/protein/12.8.673] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
A globular domain can be decomposed into compact modules consisting of contiguous 10-30 amino acid residues. The correlation between modules and exons observed in different proteins suggests that each module was encoded by an ancestral exon and that modules were combined into globular domains by exon fusion. Barnase is a single domain RNase consisting of 110 amino acid residues and was decomposed into six modules. We designed a mini-protein by removing the second module, M2, from barnase in order to gain an insight into the structural and functional roles of the module. In the molecular modeling of the mini-protein, we evaluated thermodynamic stability and aqueous solubility together with mechanical stability of the model. We chemically synthesized a mini-barnase with ¹⁵N-labeling at 10 residues, whose corresponding residues in barnase are all found in the region around the hydrophobic core. Circular dichroism and NMR measurements revealed that mini-barnase takes a non-random specific conformation that has a similar hydrophobic core structure to that of barnase. This result, that a module could be deleted without altering the structure of core region of barnase, supports the view that modules act as the building blocks of protein design.
Affiliation(s)
- K I Takahashi
- Division of Biological Science, Graduate School of Science, Nagoya University, Furo-cho, Chikusa, Nagoya 464-8602, Japan
|
46
|
Dewilde S, Blaxter M, Van Hauwaert ML, Van Houte K, Pesce A, Griffon N, Kiger L, Marden MC, Vermeire S, Vanfleteren J, Esmans E, Moens L. Structural, functional, and genetic characterization of Gastrophilus hemoglobin. J Biol Chem 1998; 273:32467-74. [PMID: 9829978 DOI: 10.1074/jbc.273.49.32467] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022] Open
Abstract
Hemoglobin of Gastrophilus intestinalis (Insecta, Diptera) was purified and characterized. At least two isoforms have been identified by isoelectrofocusing, mass spectrometry, and genomic Southern blotting. Functional studies show a high oxygen affinity due to a low ligand dissociation rate (k_off = 2.4 s⁻¹) and a relatively high autoxidation rate (t1/2 = 1.6/h). The globins were separated under denaturing conditions, and the sequence of Hb1 (Mr = 17,965 ± 2) was determined at the protein and DNA level. The open reading frame codes for a polypeptide of 150 amino acids. Although the globin is distantly related to globins from other species, it has a low penalty score against globin templates. Freshly isolated hemoglobin was crystallized from polyethylene glycol. Crystals contain two hemoglobin molecules per asymmetric unit. Solution of the three-dimensional structure by molecular replacement could not be achieved, possibly due to the presence of three protein isoforms in the crystals. In order to determine its three-dimensional structure, G. intestinalis Hb1 was overexpressed in Escherichia coli, resulting in a fully functional molecule as confirmed by ligand binding affinity. The globin gene contains two introns at positions D7.0 and G7.0. The D7.0 intron is unprecedented, suggesting that globin gene evolution is much more complex than originally thought.
Affiliation(s)
- S Dewilde
- Department of Biochemistry, University of Antwerp, B-2610 Antwerp, Belgium
|
47
|
Abstract
Does the intron/exon structure of eukaryotic genes belie their ancient assembly by exon-shuffling or have introns been inserted into preformed genes during eukaryotic evolution? These are the central questions in the ongoing 'introns-early' versus 'introns-late' controversy. The phylogenetic distribution of spliceosomal introns continues to strongly favor the introns-late theory. The introns-early theory, however, has claimed support from intron phase and protein structure correlations.
Affiliation(s)
- J M Logsdon
- Department of Biochemistry, Dalhousie University, Halifax, Nova Scotia, B3H 4H7, Canada.
|
48
|
Dewilde S, Winnepenninckx B, Arndt MH, Nascimento DG, Santoro MM, Knight M, Miller AN, Kerlavage AR, Geoghagen N, Van Marck E, Liu LX, Weber RE, Moens L. Characterization of the myoglobin and its coding gene of the mollusc Biomphalaria glabrata. J Biol Chem 1998; 273:13583-92. [PMID: 9593695 DOI: 10.1074/jbc.273.22.13583] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022] Open
Abstract
A cDNA clone isolated from a Biomphalaria glabrata (Mollusca, Gastropoda) neural cDNA library was identified as encoding a myoglobin-like protein of 148 amino acids with a single domain and a calculated mass of 16,049.29. Alignment with globin sequences with known tertiary structure confirms its overall globin nature. The expressed myoglobin was identified in the radular muscle and isolated. Oxygen equilibrium measurements on the protein reveal a high oxygen affinity. Val-B10 and Gln-E7, important residues for the determination of the oxygen affinity, are strikingly different from the standard molluscan pattern (Conti, E., Moser, C., Rizzi, M., Mattevi, A., Lionetti, C., Coda, A., Ascenzi, P., Brunori, M., Bolognesi, M. (1993) J. Mol. Biol. 233, 498-508). The single gene encoding the globin chain is interrupted by three introns at positions A3.2, B12.2, and G7.0. Comparison with other nonvertebrate globin genes reveals on the one hand conservation (B12.2 and G7.0) and on the other hand variability of the insertion positions (A3.2). The Biomphalaria myoglobin sequence was used together with all other molluscan globin sequences available to assess the origin and phylogeny of the phylum. Our results confirm the doubts raised about monophyletic origin of the Mollusca, which was first observed using SSU rRNA as a molecular marker.
Affiliation(s)
- S Dewilde
- Department of Biochemistry, University of Antwerp, B-2610 Antwerp, Belgium
|
49
|
Kalyanaraman S, Copeland NG, Gilbert DG, Jenkins NA, Gautam N. Structure and chromosomal localization of mouse G protein subunit gamma 4 gene. Genomics 1998; 49:147-51. [PMID: 9570961 DOI: 10.1006/geno.1998.5223] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
The G protein gamma subunits are members of a multigene family and are implicated in determining the specificity of receptor-G protein interaction. The gene structures for many of the gamma subunits remain to be determined. Here, we report the gene structure for the brain-specific gamma 4 subunit and its map position on a mouse chromosome. The gene (Gng4) comprises at least three exons spanning over 20 kb. The 225-bp coding region, which spans two exons, is interrupted by a large 18.2-kb intron whose position is conserved in other gamma subunit genes. There is a putative additional intron in the 5' untranslated region just upstream of the translation initiation codon. Introns are present in most of the other gamma subunits at this position. The mouse Gng4 gene is mapped to chromosome 13.
Affiliation(s)
- S Kalyanaraman
- Department of Anesthesiology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
|
50
|
O'Neill RJ, Brennan FE, Delbridge ML, Crozier RH, Graves JA. De novo insertion of an intron into the mammalian sex determining gene, SRY. Proc Natl Acad Sci U S A 1998; 95:1653-7. [PMID: 9465071 PMCID: PMC19134 DOI: 10.1073/pnas.95.4.1653] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Received: 07/30/1997] [Indexed: 02/06/2023] Open
Abstract
Two theories have been proposed to explain the evolution of introns within eukaryotic genes. The introns early theory, or "exon theory of genes," proposes that introns are ancient and that recombination within introns provided new exon structure, and thus new genes. The introns late theory, or "insertional theory of introns," proposes that ancient genes existed as uninterrupted exons and that introns have been introduced during the course of evolution. There is still controversy as to how intron-exon structure evolved and whether the majority of introns are ancient or novel. Although there is extensive evidence in support of the introns early theory, phylogenetic comparisons of several genes indicate recent gain and loss of introns within these genes. However, no example has been shown of a protein coding gene, intronless in its ancestral form, which has acquired an intron in a derived form. The mammalian sex determining gene, SRY, is intronless in all mammals studied to date, as is the gene from which it recently evolved. However, we report here comparisons of genomic and cDNA sequences that now provide evidence of a de novo insertion of an intron into the SRY gene of dasyurid marsupials. This recently (approximately 45 million years ago) inserted sequence is not homologous with known transposable elements. Our data demonstrate that introns may be inserted as spliced units within a developmentally crucial gene without disrupting its function.
Affiliation(s)
- R J O'Neill
- School of Genetics and Human Variation, La Trobe University, Bundoora, VIC 3083, Australia.
|