1. McCoy MJ, Fire AZ. Parallel gene size and isoform expansion of ancient neuronal genes. Curr Biol 2024; 34:1635-1645.e3. [PMID: 38460513 PMCID: PMC11043017 DOI: 10.1016/j.cub.2024.02.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 12/16/2023] [Accepted: 02/11/2024] [Indexed: 03/11/2024]
Abstract
How nervous systems evolved is a central question in biology. A diversity of synaptic proteins is thought to play a central role in the formation of specific synapses leading to nervous system complexity. The largest animal genes, often spanning hundreds of thousands of base pairs, are known to be enriched for expression in neurons at synapses and are frequently mutated or misregulated in neurological disorders and diseases. Although many of these genes have been studied independently in the context of nervous system evolution and disease, general principles underlying their parallel evolution remain unknown. To investigate this, we directly compared orthologous gene sizes across eukaryotes. By comparing relative gene sizes within organisms, we identified a distinct class of large genes with origins predating the diversification of animals and, in many cases, the emergence of neurons as dedicated cell types. We traced this class of ancient large genes through evolution and found orthologs of the large synaptic genes potentially driving the immense complexity of metazoan nervous systems, including in humans and cephalopods. Moreover, we found that while these genes are evolving under strong purifying selection, as demonstrated by low dN/dS ratios, they have simultaneously grown larger and gained the most isoforms in animals. This work provides a new lens through which to view this distinctive class of large and multi-isoform genes and demonstrates how intrinsic genomic properties, such as gene length, can provide flexibility in molecular evolution and allow groups of genes and their host organisms to evolve toward complexity.
Affiliation(s)
- Matthew J McCoy
  - Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA.
- Andrew Z Fire
  - Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA.
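The within-organism comparison of relative gene sizes described in the first abstract can be illustrated with a minimal Python sketch. It assumes that "relative size" means a gene's within-genome percentile rank by genomic span; the authors' exact metric may differ. The function name `relative_gene_sizes` and the toy genomes are hypothetical and invented for illustration only.

```python
from bisect import bisect_right


def relative_gene_sizes(lengths: dict[str, int]) -> dict[str, float]:
    """Return each gene's within-genome percentile rank in (0, 1].

    The rank is the fraction of genes in the same genome whose length is
    less than or equal to that gene's length, so the longest gene scores 1.0.
    """
    ordered = sorted(lengths.values())
    n = len(ordered)
    return {gene: bisect_right(ordered, size) / n for gene, size in lengths.items()}


# Hypothetical toy genomes (gene -> genomic span in bp). The names and values
# are invented for illustration and are not data from the study.
species_a = {"geneX": 250_000, "a1": 5_000, "a2": 12_000, "a3": 30_000}
species_b = {"geneX_ortholog": 40_000, "b1": 1_000, "b2": 2_500, "b3": 6_000}

rel_a = relative_gene_sizes(species_a)
rel_b = relative_gene_sizes(species_b)

# The absolute spans differ about sixfold, yet both orthologs are the
# longest gene in their respective toy genome.
print(rel_a["geneX"], rel_b["geneX_ortholog"])  # 1.0 1.0
```

Ranking within each genome factors out overall differences in gene and genome size between species, so an ortholog can rank as "large" in two genomes even when its absolute span differs greatly between them.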
2. McCoy MJ, Fire AZ. Ancient origins of complex neuronal genes. bioRxiv: The Preprint Server for Biology 2023:2023.03.28.534655. [PMID: 37034725 PMCID: PMC10081198 DOI: 10.1101/2023.03.28.534655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
How nervous systems evolved is a central question in biology. An increasing diversity of synaptic proteins is thought to play a central role in the formation of specific synapses leading to nervous system complexity. The largest animal genes, often spanning millions of base pairs, are known to be enriched for expression in neurons at synapses and are frequently mutated or misregulated in neurological disorders and diseases. While many of these genes have been studied independently in the context of nervous system evolution and disease, general principles underlying their parallel evolution remain unknown. To investigate this, we directly compared orthologous gene sizes across eukaryotes. By comparing relative gene sizes within organisms, we identified a distinct class of large genes with origins predating the diversification of animals and in many cases the emergence of dedicated neuronal cell types. We traced this class of ancient large genes through evolution and found orthologs of the large synaptic genes driving the immense complexity of metazoan nervous systems, including in humans and cephalopods. Moreover, we found that while these genes are evolving under strong purifying selection as demonstrated by low dN/dS scores, they have simultaneously grown larger and gained the most isoforms in animals. This work provides a new lens through which to view this distinctive class of large and multi-isoform genes and demonstrates how intrinsic genomic properties, such as gene length, can provide flexibility in molecular evolution and allow groups of genes and their host organisms to evolve toward complexity.
Affiliation(s)
- Matthew J. McCoy
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Whitman Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- Andrew Z. Fire
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
3. Tine M, Kuhl H, Teske PR, Reinhardt R. Genome-wide analysis of European sea bass provides insights into the evolution and functions of single-exon genes. Ecol Evol 2021; 11:6546-6557. [PMID: 34141239 PMCID: PMC8207432 DOI: 10.1002/ece3.7507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2020] [Revised: 01/24/2021] [Accepted: 03/12/2021] [Indexed: 11/17/2022] [Open access]
Abstract
Several studies have attempted to understand the origin and evolution of single-exon genes (SEGs) in eukaryotic organisms, including fishes, but few have examined the functional and evolutionary relationships between SEGs and their multiple-exon gene (MEG) paralogs, in particular the conservation of promoter regions. Given that SEGs originate via reverse transcription of mRNA from a "parental" MEG, such comparisons may help identify evolutionarily related SEG/MEG paralogs that might fulfill equivalent physiological functions. Here, the relationship of SEG proportion with MEG count, gene density, intron count, and chromosome size was assessed for the genome of the European sea bass, Dicentrarchus labrax. Then, SEGs with an MEG parent were identified, and promoter sequences of SEG/MEG paralogs were compared to identify highly conserved functional motifs. The results revealed a total of 1,585 SEGs (8.3% of all genes) in the European sea bass genome; SEG count was correlated with MEG count but not with gene density. The significant correlation of SEG content with the number of MEGs suggests that SEGs were continuously and independently generated over evolutionary time following species divergence through retrotranscription events, followed by tandem duplications. Functional annotation showed that the majority of SEGs are functional, as is evident from their expression in RNA-seq data used to support homology-based genome annotation. Differences in 5'UTR and 3'UTR lengths between SEG/MEG paralogs observed in this study may contribute to gene expression divergence between them and therefore lead to the emergence of new SEG functions. The comparison of nonsynonymous to synonymous substitution rates (Ka/Ks) between SEGs and their MEG parents showed that 74 of them are under positive selection (Ka/Ks > 1; p = .0447). An additional fifteen SEGs with an MEG parent have a common promoter, which implies that they are under the influence of common regulatory networks.
Affiliation(s)
- Mbaye Tine
  - UFR des Sciences Agronomiques, de l'Aquaculture et des Technologies Alimentaires (S2ATA), Université Gaston Berger (UGB), Saint-Louis, Senegal
  - Genome Centre at the Max-Planck Institute for Plant Breeding Research, Köln, Germany
- Heiner Kuhl
  - Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
- Peter R. Teske
  - Department of Zoology, Centre for Ecological Genomics and Wildlife Conservation, University of Johannesburg, Johannesburg, South Africa
- Richard Reinhardt
  - Genome Centre at the Max-Planck Institute for Plant Breeding Research, Köln, Germany
4. Robinson-Thiewes S, Kimble J. C. elegans mpk-1b long first intron enhances MPK-1B protein expression. microPublication Biology 2021; 2021:10.17912/micropub.biology.000350. [PMID: 33474533 PMCID: PMC7812387 DOI: 10.17912/micropub.biology.000350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Revised: 12/29/2020] [Accepted: 01/05/2021] [Indexed: 11/21/2022]
Affiliation(s)
- Judith Kimble
  - University of Wisconsin-Madison: Department of Biochemistry, Madison, WI USA
  - University of Wisconsin-Madison: Department of Medical Genetics, Madison, WI USA
5. McCoy MJ, Fire AZ. Intron and gene size expansion during nervous system evolution. BMC Genomics 2020; 21:360. [PMID: 32410625 PMCID: PMC7222433 DOI: 10.1186/s12864-020-6760-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 01/30/2020] [Accepted: 04/28/2020] [Indexed: 01/07/2023] [Open access]
Abstract
Background: The evolutionary radiation of animals was accompanied by extensive expansion of gene and genome sizes, increased isoform diversity, and complexity of regulation. Results: Here we show that the longest genes are enriched for expression in neuronal tissues of diverse vertebrates and of invertebrates. Additionally, we show that neuronal gene size expansion occurred predominantly through net gains in intron size, with a positional bias toward the 5′ end of each gene. Conclusions: We find that intron and gene size expansion is a feature of many genes whose expression is enriched in nervous systems. We speculate that unique attributes of neurons may subject neuronal genes to evolutionary forces favoring net size expansion. This process could be associated with tissue-specific constraints on gene function and/or the evolution of increasingly complex gene regulation in nervous systems.
Affiliation(s)
- Matthew J McCoy
  - Grass Fellowship Program, Marine Biological Laboratory, Woods Hole, MA 02543, USA; Departments of Pathology and Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.
- Andrew Z Fire
  - Departments of Pathology and Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.
6. Hamid MH, Rozano L, Yeong WC, Abdullah JO, Saidi NB. Analysis of MAP kinase MPK4/MEKK1/MKK genes of Carica papaya L. comparative to other plant homologues. Bioinformation 2017; 13:31-41. [PMID: 28642634 PMCID: PMC5463617 DOI: 10.6026/97320630013031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/30/2016] [Revised: 02/17/2017] [Accepted: 02/17/2017] [Indexed: 12/25/2022] [Open access]
Abstract
Mitogen-activated protein kinase 4 (MPK4) interacts with the mitogen-activated protein kinase kinase kinase 1 (MEKK1)/mitogen-activated protein kinase kinase 1 (MKK1)/mitogen-activated protein kinase kinase 2 (MKK2) complex to carry out its functions in plant development and in defense against pathogen attack. KEGG (Kyoto Encyclopedia of Genes and Genomes) network analysis of Arabidopsis thaliana revealed close interactions between these four genes in the same plant-pathogen interaction pathway, which, given their evolutionary conservation across plant species, warrants further study. By targeting the signature sequence of papaya MPK4 using orthologs from Arabidopsis, the predicted MPK4 sequence and the MAP kinase cascade complex MEKK1/MKK1/MKK2 were studied using a comparative in silico approach across different plant species. This paper reports that MPK4 is highly conserved in papaya, with 93% identity across more than 500 bases compared in each predicted species. Despite slight variations, the MEKK1/MKK1/MKK2 complex still showed sequence similarity across most of the species. The localization of each gene in the cascade network was also predicted, facilitating future functional verification of these gene interactions using knockout and/or gene-silencing approaches.
Affiliation(s)
- Muhammad Hanam Hamid
  - Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute, 43400 Serdang, Selangor, Malaysia
  - Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Lina Rozano
  - Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute, 43400 Serdang, Selangor, Malaysia
- Wee Chien Yeong
  - Biotechnology and Nanotechnology Research Centre, Malaysian Agricultural Research and Development Institute, 43400 Serdang, Selangor, Malaysia
- Janna Ong Abdullah
  - Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Noor Baity Saidi
  - Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
7. Zhou K, Salamov A, Kuo A, Aerts AL, Kong X, Grigoriev IV. Alternative splicing acting as a bridge in evolution. Stem Cell Investig 2015; 2:19. [PMID: 27358887 DOI: 10.3978/j.issn.2306-9759.2015.10.01] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/12/2015] [Accepted: 10/15/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Alternative splicing (AS) regulates diverse cellular and developmental functions through alternative protein structures of different isoforms. Alternative exons dominate AS in vertebrates; however, very little is known about the extent and function of AS in lower eukaryotes. To understand the role of introns in gene evolution, we examined AS from a green algal and five fungal genomes using a novel EST-based gene-modeling algorithm (COMBEST). METHODS: AS from each genome was classified with COMBEST, which maps EST sequences to genomes to build gene models. Various aspects of AS were analyzed through statistical methods. The interplay of intron 3n length, phase, coding property, and intron retention (RI) was examined with chi-square tests. RESULTS: With 3- to 834-fold EST coverage, we identified up to 73% of AS in intron-containing genes and found a preponderance of RI among 11 types of AS. The number of exons, expression level, and maximum intron length correlated with the number of AS events per gene (NAG), and intron-rich genes suppressed AS. Genes with AS were more ancient, and AS was conserved among fungal genomes. Among stopless introns, non-retained introns (NRI) avoided 3n lengths, whereas major RIs preferred them. In contrast, stop-containing introns showed a uniform distribution among 3n, 3n+1, and 3n+2 lengths. We found a clue to the intron phase enigma: it is the coding function of introns involved in AS that dictates the intron phase bias. CONCLUSIONS: The majority of AS is non-functional, and the extent of AS is suppressed in intron-rich genes. RI through 3n length, stop codon, and phase bias bridges the transition from functionless to functional alternative isoforms.
Affiliation(s)
- Kemin Zhou
  - US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Asaf Salamov
  - US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Alan Kuo
  - US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Andrea L Aerts
  - US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Xiangyang Kong
  - US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Igor V Grigoriev
  - US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
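The intron length classes (3n, 3n+1, 3n+2) and the stopless versus stop-containing distinction in the Zhou et al. abstract can be illustrated with a small Python sketch. This is not COMBEST; it is a hypothetical illustration that assumes coding-strand DNA, the three standard stop codons (TAA, TAG, TGA), and a phase-0 intron that begins exactly at a codon boundary. Phase-1 and phase-2 introns would also need the flanking exon bases and are not handled. The function names and example sequences are invented for illustration.

```python
# Standard-code stop codons (assumption: nuclear genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def length_class(intron: str) -> str:
    """Return '3n', '3n+1' or '3n+2' according to intron length mod 3."""
    remainder = len(intron) % 3
    return "3n" if remainder == 0 else f"3n+{remainder}"


def preserves_frame(intron: str) -> bool:
    """A retained intron keeps the downstream exon in frame only if its length is 3n."""
    return len(intron) % 3 == 0


def is_stopless_phase0(intron: str) -> bool:
    """True if a retained phase-0 intron contains no in-frame stop codon.

    The intron is read from its first base in frame 0. Any trailing 1-2 bases
    that do not complete a codon inside the intron are ignored here, although
    in a real transcript they would combine with downstream exon bases.
    """
    seq = intron.upper()
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(0, len(seq) - 2, 3))


# Hypothetical toy introns, invented for illustration.
for seq in ("GTAAGTCCCTAG", "GTAAGCCCAG"):
    print(seq, length_class(seq), preserves_frame(seq), is_stopless_phase0(seq))
# GTAAGTCCCTAG 3n True False   (in-frame TAG at the end)
# GTAAGCCCAG 3n+1 False True
```

A retained intron that is both 3n in length and free of in-frame stop codons inserts extra residues without shifting or truncating the downstream reading frame, which is why these two properties matter for whether intron retention can yield an intact alternative protein.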
8. Gipson TA, Neueder A, Wexler NS, Bates GP, Housman D. Aberrantly spliced HTT, a new player in Huntington's disease pathogenesis. RNA Biol 2013; 10:1647-52. [PMID: 24256709 DOI: 10.4161/rna.26706] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 01/06/2023] [Open access]
Abstract
Huntington's disease (HD) is an adult-onset neurodegenerative disorder caused by a mutated CAG repeat in the huntingtin gene that is translated into an expanded polyglutamine tract. The clinical manifestation of HD is a progressive physical, cognitive, and psychiatric deterioration that is eventually fatal. The mutant huntingtin protein is processed into several smaller fragments, which have been implicated as critical factors in HD pathogenesis. The search for proteases responsible for their production has led to the identification of several cleavage sites on the huntingtin protein. However, the origin of the small N-terminal fragments that are found in HD postmortem brains has remained elusive. Recent mapping of huntingtin fragments in a mouse model demonstrated that the smallest N-terminal fragment is an exon 1 protein. This discovery spurred our hypothesis that mis-splicing as opposed to proteolysis could be generating the smallest huntingtin fragment. We demonstrated that mis-splicing of mutant huntingtin intron 1 does indeed occur and results in a short polyadenylated mRNA, which is translated into an exon 1 protein. The exon 1 protein fragment is highly pathogenic. Transgenic mouse models containing just human huntingtin exon 1 develop a rapid onset of HD-like symptoms. Our finding that a small, mis-spliced HTT transcript and corresponding exon 1 protein are produced in the context of an expanded CAG repeat has unraveled a new molecular mechanism in HD pathogenesis. Here we present detailed models of how mis-splicing could be facilitated, what challenges remain in this model, and implications for therapeutic studies.
Affiliation(s)
- Theresa A Gipson
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Andreas Neueder
  - Department of Medical and Molecular Genetics, King's College London, London, UK
- Nancy S Wexler
  - Hereditary Disease Foundation, New York, NY, USA; Department of Neurology and Psychiatry, Columbia University, New York, NY, USA
- Gillian P Bates
  - Department of Medical and Molecular Genetics, King's College London, London, UK
- David Housman
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
9. Bradnam KR, Korf I. Longer first introns are a general property of eukaryotic gene structure. PLoS One 2008; 3:e3093. [PMID: 18769727 PMCID: PMC2518113 DOI: 10.1371/journal.pone.0003093] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.9] [Received: 06/12/2008] [Accepted: 08/11/2008] [Indexed: 11/19/2022] [Open access]
Abstract
While many properties of eukaryotic gene structure are well characterized, differences in the form and function of introns that occur at different positions within a transcript are less well understood. In particular, the dynamics of intron length variation with respect to intron position has received relatively little attention. This study analyzes all available data on intron lengths in GenBank and finds a significant trend of increased length in first introns throughout a wide range of species. This trend was found to be even stronger when using high-confidence gene annotation data for three model organisms (Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster), which show that the first intron in the 5' UTR is, on average, significantly longer than all downstream introns within a gene. A partial explanation for increased first intron length in A. thaliana is suggested by the increased frequency of certain motifs that are present in first introns. The phenomenon of longer first introns can potentially be used to improve gene prediction software and also to detect errors in existing gene annotations.
Affiliation(s)
- Keith R Bradnam
  - Genome Center, University of California Davis, Davis, California, USA.
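A pooled comparison of first-intron and downstream-intron lengths, in the spirit of the Bradnam and Korf analysis, can be sketched in Python as below. It assumes intron lengths have already been extracted per transcript in 5'-to-3' order; it does not distinguish 5' UTR introns from coding-region introns and does not reproduce the study's statistical tests. The function name `first_vs_downstream` and the toy lengths are hypothetical and invented for illustration, not GenBank data.

```python
from statistics import mean


def first_vs_downstream(genes: list[list[int]]) -> tuple[float, float]:
    """Pooled mean length of first introns vs. all downstream introns.

    Each gene is a list of intron lengths (bp) in 5'-to-3' transcript order.
    Genes with fewer than two introns are skipped because they have no
    downstream intron to compare against.
    """
    multi = [g for g in genes if len(g) >= 2]
    if not multi:
        raise ValueError("need at least one gene with two or more introns")
    first = [g[0] for g in multi]
    downstream = [length for g in multi for length in g[1:]]
    return mean(first), mean(downstream)


# Hypothetical intron lengths, invented for illustration (not GenBank data).
toy_genes = [[1200, 300, 250], [800, 400], [150], [2000, 500, 600, 100]]
first_mean, downstream_mean = first_vs_downstream(toy_genes)
print(round(first_mean, 1), round(downstream_mean, 1))  # 1333.3 358.3
```

Restricting the comparison to genes with at least two introns keeps every first intron paired with downstream introns from the same genes, so the two means are drawn from the same gene set.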
10. Atambayeva SA, Khailenko VA, Ivashchenko AT. Intron and exon length variation in Arabidopsis, rice, nematode, and human. Mol Biol 2008. [DOI: 10.1134/s0026893308020180] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
11.
Abstract
The gene identification problem is the problem of interpreting nucleotide sequences by computer in order to provide tentative annotation of the location, structure, and functional class of protein-coding genes. This problem is of self-evident importance and is far from fully solved, particularly for higher eukaryotes. It is therefore not surprising that the number of algorithm and software developers working in the area is rapidly increasing. The present paper is an overview of the field, with an emphasis on eukaryotes, written for such developers.
Affiliation(s)
- J W Fickett
  - Theoretical Biology and Biophysics Group, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
12. Bissonnette N, Gilbert I, Levesque-Sergerie JP, Lacasse P, Petitclerc D. In vivo expression of the antimicrobial defensin and lactoferrin proteins allowed by the strategic insertion of introns adequately spliced. Gene 2006; 372:142-52. [PMID: 16516411 DOI: 10.1016/j.gene.2005.12.030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/12/2005] [Revised: 12/14/2005] [Accepted: 12/21/2005] [Indexed: 11/22/2022]
Abstract
A major limitation of conventional shuttle expression systems, when cloning a bactericidal gene, is the basal expression level in bacteria, which is lethal. Although the expression level is low, the bactericidal activity inherent to the molecule leads to failure to recover intact transformants when the gene is cloned into a conventional expression vector. Contrary to popular belief, the human cytomegalovirus immediate-early region 1 promoter (CMV), which is to date one of the most powerful promoters for eukaryotic expression, is active in bacteria. In this study, bactericidal genes were cloned into a conventional eukaryotic shuttle expression vector harbouring the CMV promoter, but were interrupted with a sequence-independent splicing element (SISE), thus inhibiting lethal gene expression in bacteria. The intron insertion strategy uses a universal restriction-site-free cloning approach, developed to insert a DNA fragment at a specific location in a gene through a PCR-based cloning technique. We found that one intervening sequence, derived from an adenovirus, can be spliced in a mammalian system regardless of its location, so the bactericidal protein is synthesized only when the construct is transfected into mammalian cells. In this way, lactoferrin and defensin proteins were produced in vivo without the need for complex expression systems. By introducing the adenoviral SISE within the coding sequence of bactericidal genes, such genes can easily be synthesized in vitro through cloning in bacteria and are still able to express biologically active proteins when introduced into mammalian cells.
Affiliation(s)
- Nathalie Bissonnette
  - Dairy and Swine Research and Development Centre, Agriculture and Agri-Food Canada, P.O. Box 90, Lennoxville, Quebec Canada J1M 1Z3.
13. Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV. Conservation versus parallel gains in intron evolution. Nucleic Acids Res 2005; 33:1741-8. [PMID: 15788746 PMCID: PMC1069513 DOI: 10.1093/nar/gki316] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022] [Open access]
Abstract
Orthologous genes from distant eukaryotic species, e.g. animals and plants, share up to 25-30% of intron positions. However, the relative contributions of evolutionary conservation and parallel gain of new introns to this pattern remain unknown. Here, the extent of independent insertion of introns into the same sites (parallel gain) in orthologous genes from phylogenetically distant eukaryotes is assessed within the framework of the protosplice site model. It is shown that protosplice sites are no more conserved during evolution of eukaryotic gene sequences than random sites. Simulation of intron insertion into protosplice sites with the observed protosplice site frequencies and intron densities shows that parallel gain can account for only a small fraction (5-10%) of shared intron positions in distantly related species. Thus, the presence of numerous introns in the same positions in orthologous genes from distant eukaryotes, such as animals, fungi and plants, appears to reflect mostly bona fide evolutionary conservation.
Affiliation(s)
- Eugene V. Koonin
  - To whom correspondence should be addressed. Tel: +1 301 435 5913; Fax: +1 301 435 7794
14. Stoltzfus A, Ford Doolittle W. Molecular evolution: slippery introns and globin gene evolution. Curr Biol 1993; 3:215-7. [PMID: 15335770 DOI: 10.1016/0960-9822(93)90336-m] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- A Stoltzfus
  - Canadian Institute for Advanced Research, Program in Evolutionary Biology, Department of Biochemistry, Dalhousie University, Halifax, Nova Scotia, B3H 4117, Canada
15. Chamary JV, Hurst LD. Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: evidence for selectively driven codon usage. Mol Biol Evol 2004; 21:1014-23. [PMID: 15014158 DOI: 10.1093/molbev/msh087] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022] [Open access]
Abstract
In mammals, divergence at fourfold degenerate sites in codons (K(4)) and in intronic sequence (K(i)) are both used to estimate the mutation rate, on the supposition that both evolve neutrally. Does it matter which of these we use? Using either class of sequence can be defended because (1) K(4) is the same as K(i) (at least in rodents) and (2) there is no selectively driven codon usage (hence no systematic selection on third sites). Here we re-examine these findings using 560 introns (for 136 genes) in the mouse-rat comparison, aligned by eye and using a new maximum likelihood protocol. We find that the rate of evolution at fourfold sites and at intronic sites is similar in magnitude, but only after eliminating putatively constrained sites from introns (first introns and sites flanking intron-exon junctions). Any approximate congruence between the two rates is not, however, owing to an underlying similarity in the mode of sequence evolution. Some dinucleotides are hypermutable and differently abundant in exons and introns (e.g., CpGs). More importantly, after controlling for relative abundance, all dinucleotides starting with A or T are more prevalent in mismatches in exons than in introns, whereas C-starting dinucleotides (except CG) are more common in introns. Although C content at intronic sites is lower than at flanking fourfold sites, G content is similar, demonstrating that there exists a strong strand-specific preference for C nucleotides that is unique to exons. Transcription-coupled mutational processes and biased gene conversion cannot explain this, as they should affect introns and flanking exons equally. Therefore, by elimination, we propose this to be strong evidence for selectively driven codon usage in mammals.
Affiliation(s)
- Jean-Vincent Chamary
  - Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
16. Matsushita S, Matsushita M, Itoh H, Hagiwara K, Takahashi R, Ozawa T, Kuramoto K. Multiple pathology and tails of disability: Space-time structure of disability in longevity. Geriatr Gerontol Int 2003. [DOI: 10.1111/j.1444-1586.2003.00085.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
17. Xue HY, Forsdyke DR. Low-complexity segments in Plasmodium falciparum proteins are primarily nucleic acid level adaptations. Mol Biochem Parasitol 2003; 128:21-32. [PMID: 12706793 DOI: 10.1016/s0166-6851(03)00039-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
Abstract
Protein segments that contain few of the 20 possible amino acids, sometimes in tandem repeat arrays, are referred to as containing "simple" or "low-complexity" sequence. Many Plasmodium falciparum proteins are longer than their homologs in other species by virtue of their content of such low-complexity segments that have no known function; these are interspersed among segments of higher complexity to which function can often be ascribed. If there is low complexity at the protein level, there is likely to be low complexity at the corresponding nucleic acid level (departure from equifrequency of the four bases). Thus, low complexity may have been selected primarily at the nucleic acid level, and low complexity at the protein level may be secondary. In this case, the amino acid composition of low-complexity segments should reflect forces operating at the nucleic acid level, which include GC-pressure and AG-pressure, more closely than that of high-complexity segments. Consistent with this, for the amino-acid-determining first and second codon positions, open reading frames containing low-complexity segments show increased contributions to downward GC-pressure (revealed as a decreased percentage of G+C) and to upward AG-pressure (revealed as an increased percentage of A+G). When not countermanded by high contributions to AG-pressure, low-complexity segments can contribute to base order-dependent fold potential; in this respect, they resemble introns. Thus, in P. falciparum, low-complexity segments appear to be adaptations primarily serving nucleic acid level functions.
Affiliation(s)
- H Y Xue
- Department of Biochemistry, Queen's University, Kingston, Ont, K7L3N6, Canada
18
Jacobs K, Mattheeuws M, Van Poucke M, Van Zeveren A, Peelman LJ. Characterization of the porcine FABGL gene. Anim Genet 2002; 33:220-3. [PMID: 12030927 DOI: 10.1046/j.1365-2052.2002.00849.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
The porcine major histocompatibility complex, also called the swine lymphocyte antigen (SLA) complex, is of particular interest not only because of its central role in the immune response, but also because of its influence on many traits such as reproduction, fatness and meat quality. The porcine FABGL (FabG (beta-ketoacyl-[acyl-carrier-protein] reductase, Escherichia coli) like) gene, coding for a 17beta-hydroxysteroid dehydrogenase (17beta-HSD), is a candidate gene for these traits. The complete gene was sequenced and compared with human and mouse FABGL sequences. The deduced amino acid sequence showed 85 and 83% sequence identity to the human and mouse sequences, respectively. Polymorphic BbvI and DdeI restriction sites were found in the porcine FABGL gene. The promoter was compared with the promoter regions of the human and mouse FABGL sequences to identify putative regulatory elements. The transcription profile of the porcine gene was determined and showed a widespread tissue distribution.
Affiliation(s)
- K Jacobs
- Department of Animal Nutrition, Genetics, Breeding and Ethology, Faculty of Veterinary Medicine, Ghent University, Merelbeke, Belgium
19
Affiliation(s)
- I B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
20
Green KJ, Guy SG, Cserhalmi-Friedman PB, McLean WH, Christiano AM, Wagner RM. Analysis of the desmoplakin gene reveals striking conservation with other members of the plakin family of cytolinkers. Exp Dermatol 1999; 8:462-70. [PMID: 10597135 DOI: 10.1111/j.1600-0625.1999.tb00304.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Members of the plakin family of cytolinker proteins integrate filaments into cellular networks and anchor these networks to the plasma membrane. Their importance is supported by the existence of cell and tissue fragility disorders caused by mutations in certain family members. In this study, the human gene encoding desmoplakin (DSP) was characterized and its structure compared with the related family members: plectin, bullous pemphigoid antigen 1 (BPAG1), envoplakin (EVPL) and periplakin (PPL). Sequence analysis of genomic clones was carried out in combination with a PCR-based strategy to define intron-exon borders. DSP was mapped using the GB4 radiation hybrid mapping panel to the interval between markers D6S296 and AFM043 x f2, corresponding to cytogenetic band 6p24. In addition, the murine gene (Dsp) was mapped to mouse chromosome 13 by interspecific backcross mapping. DSP encompasses approximately 45 kb organized into 24 exons and 23 introns, and the pattern of intron-exon borders bears a striking resemblance to other members of the plakin family. Notable features include the fact that a single large exon encodes the entire C-terminus of each gene. In contrast, the N-termini comprise numerous smaller exons with conservation of many intron-exon borders. Detailed characterization and mapping of these genes will facilitate their further evaluation as targets of genetic disorders and provide insights into the evolutionary relationships among molecules in this emerging gene family.
Affiliation(s)
- K J Green
- Department of Pathology, R. H. Lurie Cancer Center, Northwestern University Medical School, Chicago, IL 60611, USA.
21
Kriventseva EV, Gelfand MS. Statistical analysis of the exon-intron structure of higher and lower eukaryote genes. J Biomol Struct Dyn 1999; 17:281-8. [PMID: 10563578 DOI: 10.1080/07391102.1999.10508361] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 10/28/2022]
Abstract
Statistics of the exon-intron structure and splicing sites of several diverse eukaryotes were studied. The yeast exon-intron structures have a number of unique features. A yeast gene usually has at most one intron. The branch site is strongly conserved, whereas the polypyrimidine tract is short. Long yeast introns tend to have stronger acceptor sites. In other species the branch site is less conserved and often cannot be determined. In non-yeast samples there is an almost universal correlation between the lengths of neighboring exons (all samples excluding protists) and a correlation between the lengths of neighboring introns (human, Drosophila, protists). On average, first introns are longer, and anomalously long introns are usually the first introns in a gene. There is a universal preference for exons and exon pairs whose (total) length is divisible by 3. Introns positioned between codons are preferred, whereas those positioned between the first and second positions of a codon are avoided. The choice of A or G at the third position of the intron (donor splice sites generally prefer purines at this position) is correlated with the overall GC composition of the gene. In all samples, the dinucleotide AG is avoided in the region preceding the acceptor site.
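The intron positions mentioned above are usually described by intron phase: phase 0 lies between codons, phase 1 after the first base of a codon, phase 2 after the second. A minimal Python sketch, assuming the input lists the coding length of each exon in order, starting at the first base of the start codon (the helper names and example lengths are hypothetical):

```python
def intron_phases(coding_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron given the coding lengths of consecutive exons.

    Phase 0: the intron lies between codons.
    Phase 1: it follows the first base of a codon.
    Phase 2: it follows the second base of a codon.
    Assumes the lengths count coding bases only, starting at the first base
    of the start codon.
    """
    phases = []
    cumulative = 0
    # The intron after exon i interrupts the reading frame after
    # `cumulative` coding bases; its phase is that count modulo 3.
    for length in coding_exon_lengths[:-1]:
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def lengths_divisible_by_three(lengths: list[int]) -> list[bool]:
    """Flag exons (or exon pairs, if summed lengths are passed) whose length is a multiple of 3."""
    return [length % 3 == 0 for length in lengths]


exons = [100, 50, 60, 30]  # hypothetical coding lengths
print(intron_phases(exons))
print(lengths_divisible_by_three(exons))
```

The two measures are linked: an internal exon whose length is divisible by 3 is flanked by introns of the same phase, because the outgoing phase equals the incoming phase plus the exon length, modulo 3.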
Affiliation(s)
- E V Kriventseva
- VA Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow
22
Keinänen R, Vartiainen N, Koistinaho J. Molecular cloning and characterization of the rat inducible nitric oxide synthase (iNOS) gene. Gene 1999; 234:297-305. [PMID: 10395902 DOI: 10.1016/s0378-1119(99)00196-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.0] [Indexed: 11/23/2022]
Abstract
We have cloned and characterized the rat inducible nitric oxide synthase (iNOS) gene. It spans approx. 36 kb and is divided into 27 exons and 26 introns. The distribution and length of exons are similar to those in the human iNOS gene. The 5' flanking regulatory region of the rat iNOS gene contains a number of putative transcription factor binding sites (>20), many of them probably indispensable for the gene's nuclear factor kappaB (NFkappaB)-dependent induction, but also many that may have a role in its NFkappaB-independent induction pathway. These include cyclic adenosine 3',5'-monophosphate (cAMP) response elements (CRE), hypoxia responsive elements (HRE) and GATA-core elements. Rat models are powerful tools in studies of neurological diseases. Because iNOS is most likely responsible for the harmful consequences of nitric oxide (NO) in general, the cloned rat iNOS gene will help reveal the mechanisms of iNOS inducibility in different cell types during development and disease, including brain diseases, and promote studies of pharmacological intervention in cases where extensive NO production plays a critical role.
Affiliation(s)
- R Keinänen
- A.I. Virtanen Institute for Molecular Sciences, University of Kuopio, P.O. Box 1627, FIN-70211, Kuopio, Finland.
23
Li W. Statistical properties of open reading frames in complete genome sequences. Comput Chem 1999; 23:283-301. [PMID: 10404621 DOI: 10.1016/s0097-8485(99)00014-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (17 prokaryotic genomes and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The influence of G+C content on the size distribution is addressed. When yeast chromosomes are compared with archaeal and eubacterial genomes, they tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames in the six reading frames (three on each of the two strands) are similarly distributed. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at the three codon positions is analyzed separately. In these genome sequences, the first codon position is G- and A-rich (i.e. purine-rich), A-rich and T-rich branches co-exist at the second codon position, and the third codon position is weakly T-rich.
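The abstract does not state which ORF definition it uses. The Python sketch below assumes one common choice, a run of sense codons bounded on both sides by in-phase stop codons, and collects those run lengths separately for each of the six reading frames, which is the raw material for the size distributions and strand comparisons described above. The function names and example sequence are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orf_lengths(seq: str) -> dict[str, list[int]]:
    """Lengths, in sense codons, of reading frames bounded by in-phase stops.

    Keys are "+0", "+1", "+2", "-0", "-1", "-2" (strand and frame offset).
    Runs that touch either sequence end are not counted, since they are not
    bounded by a stop codon on both sides.
    """
    seq = seq.upper()
    lengths: dict[str, list[int]] = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            runs = []
            run = 0
            seen_stop = False
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if seen_stop:
                        runs.append(run)  # run closed by a stop on both sides
                    seen_stop = True
                    run = 0
                else:
                    run += 1
            lengths[f"{strand}{frame}"] = runs
    return lengths


print(stop_to_stop_orf_lengths("TAAATGGCCTAG"))
```

Pooling the lists per frame or per strand gives the distributions one would compare when testing whether the six frames behave alike.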
Affiliation(s)
- W Li
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA.
24
Laub MT, Smith DW. Finding intron/exon splice junctions using INFO, INterruption Finder and Organizer. J Comput Biol 1998; 5:307-21. [PMID: 9672834 DOI: 10.1089/cmb.1998.5.307] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
INFO, INterruption Finder and Organizer, has been used to find coding sequence intron-exon splice junctions in human and other DNA by comparing the six conceptual translations of the input DNA sequence with sequences in protein databanks using a similarity matrix and windowing algorithm. The similarities detected both delineate the position of the gene and provide clues to the function of the gene product. In addition to a standard similarity matrix and windowing algorithm, INFO uses two novel steps, the MiniLibrary and Reverse Sequence steps, to enhance identification of small exons and to improve the precision of junction nucleotide delineation. Exons as small as about 30 bases can be reliably found, and > 90% of junctions are precisely identified when canonical splice junction information is used. With the MiniLibrary and Reverse Sequence steps, INFO parameters need not be optimized by the user. In comparative test runs using 19 human DNA sequences, INFO found 108 of 111 exons with 0 reported false positives, compared with 111 exons and 51 false positives for BLASTX, 99 exons and 6 false positives for GRAIL II, 77 exons and 24 false positives for GeneMark, 61 exons and 9 false positives for GeneID, and 105 exons and 6 false positives for PROCRUSTES. The correlation coefficient for finding and positioning these 111 exons was greater than 98% for INFO. Comparable results were obtained in test runs of 13 nonhuman DNA sequences. INFO is applicable to DNA from any species, will become more robust as sequence databanks expand, and complements other heuristic approaches.
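INFO's first step, producing the six conceptual translations of the input DNA, can be sketched in Python with the standard genetic code. This is not INFO's implementation, and the database comparison, MiniLibrary and Reverse Sequence steps are not shown; the names below are hypothetical, stops are written as `*`, and codons containing non-ACGT characters become `X`.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are enumerated
# in TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict[str, str]:
    """Conceptual translations of all three frames on both strands."""
    seq = seq.upper()
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    return {
        f"{strand}{frame}": translate_frame(s[frame:])
        for strand, s in (("+", seq), ("-", rev_comp))
        for frame in range(3)
    }


print(six_frame_translations("ATGGCCTAA"))
```

Each of the six peptide strings would then be searched against a protein databank; stretches of similarity that break off and resume in another frame or offset hint at where an intron interrupts the coding sequence.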
Affiliation(s)
- M T Laub
- Department of Biology, University of California, San Diego, La Jolla 92093, USA
25
Abstract
Transcriptional repression in eukaryotes often involves tens or hundreds of kilobase pairs, two to three orders of magnitude more DNA than in the bacterial operator/repressor model. Classical repression, represented by this model, was maintained over the whole span of evolution under different guises, and consists of repressor factors interacting primarily with promoters and, later in evolution, also with enhancers. The use of much larger amounts of DNA in the other mode of repression, here called the sectorial mode ('superrepression'), results in the conceptual transfer of so-called junk DNA to the domain of functional DNA. This contribution to the solution of the C-value paradox involves perhaps 15% of genomic 'junk,' and encompasses the bulk of the introns, thought to fill a stabilizing role in sectorially repressed chromatin structures. In the case of developmental genes, such structures appear to be heterochromatoid in character. However, solid clues regarding general structural features of superrepressed terminal differentiation genes remain elusive. The competition among superrepressible DNA sectors for sectorially binding factors offers, in principle, a molecular mechanism for developmental switches. Position effect variegation may be considered an abnormal manifestation of normal processes that underlie development and involve heterochromatoid sectorial repression, which is apparently required for local elimination or modulation of morphological features (morpholysis). Sectorial repression of genes participating either in development or in terminal differentiation is considered instrumental in establishing stable cell types, and provides a basis for the distinction between determination and cell type specification. The gamut of possible stable cell types may have been broadened by the appearance in evolution of heavy isochores.
Additional types of relatively frequent GC-rich cis-acting DNA motifs may offer reiterated binding sites to factors endowed with a selective (though not individually strong) affinity for these motifs. The majority of sequence motifs thought to be used in superrepression need not be individually maintained by natural selection. It is re-emphasized that the dispensability of sequences is not an indicator of their nonfunctionality and that, in many cases, nucleotides in noncoding sequences tend to fulfill functions collectively rather than individually.
Affiliation(s)
- E Zuckerkandl
- Institute of Molecular Medical Sciences, Palo Alto, CA 94306, USA
26
Stoltzfus A, Logsdon JM, Palmer JD, Doolittle WF. Intron "sliding" and the diversity of intron positions. Proc Natl Acad Sci U S A 1997; 94:10739-44. [PMID: 9380704 PMCID: PMC23469 DOI: 10.1073/pnas.94.20.10739] [Citation(s) in RCA: 119] [Impact Index Per Article: 4.4] [Indexed: 02/05/2023]
Abstract
Alignments of homologous genes typically reveal a great diversity of intron locations, far more than could fit comfortably in a single gene. Thus, a minority of these intron positions could be inherited from a single ancestral gene, but the larger share must be attributed to subsequent events of intron gain or intron "sliding" (movement from one position to another within a gene). Intron sliding has been argued from cases of discordant introns and from putative spatial clustering of intron positions. A list of 32 cases of discordant introns is presented here. Most of these cases are found to be artefactual. The spatial and phylogenetic distributions of intron positions from five published compilations of gene data, comprising 205 intron positions, have been examined systematically for evidence of intron sliding. The results suggest that sliding, if it occurs at all, has contributed little to the diversity of intron positions.
Affiliation(s)
- A Stoltzfus
- Canadian Institute for Advanced Research Program in Evolutionary Biology, and Department of Biochemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4H7.
27
Xu W, Liu L, Mooslehner K, Emson PC. Structural organization of the human vesicular monoamine transporter type-2 gene and promoter analysis using the jelly fish green fluorescent protein as a reporter. Brain Res Mol Brain Res 1997; 45:41-9. [PMID: 9105669 DOI: 10.1016/s0169-328x(96)00218-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
The genomic structure of the human vesicular monoamine transporter type-2 (hVMAT2) gene was determined from two overlapping cosmids, phVMAT2-cos1 and phVMT2-cos2, spanning more than 35 kb. The hVMAT2 open reading frame is encoded by 16 exons, with translation initiation and termination in exon 2 and exon 16, respectively. Several potential binding sites for transcriptional regulatory factors, including a cAMP response element (CRE), were identified in the 5'-upstream region of the gene. A promoter construct using the jellyfish green fluorescent protein (GFP) as a reporter was made and transfected into the human neuroblastoma cell line SH-SY5Y. Expression of GFP in the cells was readily detected by fluorescence microscopy, and GFP-expressing cells could be sorted using a fluorescence-activated cell sorter (FACS), allowing the level of GFP expression in transfected SH-SY5Y cells to be determined quickly and reliably.
Affiliation(s)
- W Xu
- Department of Neurobiology, Babraham Institute, Cambridge, UK.
28
Abstract
Since the base composition of the translational stop codons (TAG, TAA, and TGA) is biased toward low G+C content, the density of these termination signals is expected to differ among random DNA sequences of different base compositions. The expected length of reading frames (DNA segments of sense codons flanked by in-phase stop codons) in random sequences is thus a function of GC content. Analysis of DNA sequences from several genome databases, stratified according to GC content, reveals that the longest coding sequences (exons in vertebrates and genes in prokaryotes) are GC-rich, while the shortest ones are GC-poor. Exon lengthening in GC-rich vertebrate regions does not, however, result in longer vertebrate proteins, perhaps because genes located in these regions have fewer exons. These effects on coding-sequence length give compositional variation in DNA GC content a new evolutionary significance.
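The dependence of reading-frame length on GC content can be made concrete under the simplest null model, assumed here rather than taken from the paper: bases are independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. The chance that a codon is a stop is then p = P(TAA) + P(TAG) + P(TGA), and the number of sense codons before the next in-phase stop is geometric with mean (1 - p)/p. At 50% GC, p = 3/64, giving 61/3 (about 20) codons. A Python sketch with hypothetical function names:

```python
import math


def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    p_at = (1.0 - gc) / 2.0  # P(A) = P(T)
    p_g = gc / 2.0
    # P(TAA) + P(TAG) + P(TGA)
    return p_at * p_at * p_at + p_at * p_at * p_g + p_at * p_g * p_at


def expected_sense_codons_between_stops(gc: float) -> float:
    """Mean run of sense codons before an in-phase stop (geometric model)."""
    p = stop_codon_probability(gc)
    if p == 0.0:
        return math.inf  # no stop codons can occur when gc == 1
    return (1.0 - p) / p


for gc in (0.3, 0.5, 0.7):
    print(gc, round(expected_sense_codons_between_stops(gc), 1))
```

Because every stop codon contains two or three A/T bases, p falls steeply as GC content rises, so random GC-rich sequence supports much longer open reading frames than GC-poor sequence.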
Affiliation(s)
- J L Oliver
- Departamento de Genética, Instituto de Biotecnología, Facultad de Ciencias, Universidad de Granada, E-18071-Granada, Spain
29
Ogata H, Fujibuchi W, Kanehisa M. The size differences among mammalian introns are due to the accumulation of small deletions. FEBS Lett 1996; 390:99-103. [PMID: 8706839 DOI: 10.1016/0014-5793(96)00636-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
To investigate the molecular mechanisms that alter intron size, we conducted an extensive interspecies comparison of homologous introns among three mammalian groups: human, artiodactyls, and rodents. The size differences of introns were statistically significant among all three groups (introns were longest in human and shortest in rodents), and appear to be due to the accumulation of small deletions, according to separate counts of insertion and deletion frequencies. The distribution of intron size differences also has a shape similar to that of the distribution of insertion/deletion sizes found in pseudogenes. It is suggested that introns are selectively neutral to small-scale changes in genome size, and that such changes are inherently biased toward short deletions over short insertions.
Affiliation(s)
- H Ogata
- Institute for Chemical Research, Kyoto University, Japan
30
Abstract
Close analysis of intron phase - the position of introns within codons - is claimed to provide novel evidence supporting the view that introns predate the divergence of bacteria and eukaryotes and, via 'exon shuffling', played a crucial role in protein evolution. But just how compelling is this evidence?
Affiliation(s)
- L D Hurst
- Department of Genetics, Downing Street, Cambridge, CB2 3EH, UK
31
Liu CG, Maercker C, Castañon MJ, Hauptmann R, Wiche G. Human plectin: organization of the gene, sequence analysis, and chromosome localization (8q24). Proc Natl Acad Sci U S A 1996; 93:4278-83. [PMID: 8633055 PMCID: PMC39526 DOI: 10.1073/pnas.93.9.4278] [Citation(s) in RCA: 104] [Impact Index Per Article: 3.7] [Indexed: 02/01/2023]
Abstract
Plectin, a 500-kDa intermediate filament binding protein, has been proposed to provide mechanical strength to cells and tissues by acting as a cross-linking element of the cytoskeleton. To set the basis for future studies on gene regulation, tissue-specific expression, and pathological conditions involving this protein, we have cloned the human plectin gene, determined its coding sequence, and established its genomic organization. The coding sequence contains 32 exons that extend over 32 kb of the human genome. Most of the introns reside within a region encoding the globular N-terminal domain of the molecule, whereas the entire central rod domain and the entire C-terminal globular domain were found to be encoded by single exons of remarkable length, >3 kb and >6 kb, respectively. Overall, the organization of the human plectin gene was strikingly similar to that of human bullous pemphigoid antigen 1 (BPAG1), confirming that both proteins belong to the same gene family. Comparison of the deduced protein sequences for human and rat plectin revealed that they were 93% identical. By using fluorescence in situ hybridization, we have mapped the plectin gene to the long arm of chromosome 8 within the telomeric region. This gene locus (8q24) has previously been implicated in the human blistering skin disease epidermolysis bullosa simplex Ogna. Detailed knowledge of the structure of the plectin gene and its chromosome localization will aid in the elucidation of whether this or any other pathological conditions are linked to alterations in the plectin gene.
Affiliation(s)
- C G Liu
- Institute of Biochemistry and Molecular Cell Biology, University of Vienna-Biocenter, Austria
32
Bailleul B. During in vivo maturation of eukaryotic nuclear mRNA, splicing yields excised exon circles. Nucleic Acids Res 1996; 24:1015-9. [PMID: 8604331 PMCID: PMC145744 DOI: 10.1093/nar/24.6.1015] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.2] [Indexed: 01/31/2023]
Abstract
Circular splicing has already been described for nuclear pre-mRNA at certain splice sites far apart in the multi-exonic ETS-1 gene and in the single 1.2 kb exon of the Sry locus. To date, it is unclear how splice sites are juxtaposed in normal and circular splicing. Splice site selection for an internal exon is likely to involve pairing between splice sites across that exon. Based on this, we predict that, albeit at low frequency, internal exons yield circular RNA by splicing, either as an error-prone mechanism of exon juxtaposition or, perhaps more interestingly, as a regulated mechanism on alternative exons. To address this question, circular exon formation was analyzed at three ETS-1 internal exons (one alternatively spliced exon and two constitutive ones) in a human cell line and in blood cell samples. Here, we show by RT-PCR and sequencing that exon circular splicing occurs at all three exons examined. RNase protection experiments suggest that there is no correlation between exon circle expression and exon skipping.
Affiliation(s)
- B Bailleul
- Unite 124 INSERM, Institut de Recherches sur le Cancer de Lille, France
33
Brocchieri L, Karlin S. How are close residues of protein structures distributed in primary sequence? Proc Natl Acad Sci U S A 1995; 92:12136-40. [PMID: 8618859 PMCID: PMC40311 DOI: 10.1073/pnas.92.26.12136] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.9] [Indexed: 01/31/2023]
Abstract
Structurally neighboring residues are categorized according to their separation in the primary sequence as proximal (1-4 positions apart) and otherwise distal, which in turn is divided into near (5-20 positions), far (21-50 positions), very far (> 50 positions), and interchain (from different chains of the same structure). These categories describe the linear distance histogram (LDH) for three-dimensional neighboring residue types. Among the main results are the following: (i) Nearest-neighbor hydrophobic residues tend to be increasingly distally separated in the linear sequence, thus most often connecting distinct secondary structure units. (ii) The LDHs of oppositely charged nearest-neighbors emphasize proximal positions with a subsidiary maximum for very far positions. (iii) Cysteine-cysteine structural interactions rarely involve proximal positions. (iv) The greatest numbers of interchain specific nearest-neighbors in protein structures are composed of oppositely charged residues. (v) The largest fraction of side-chain neighboring residues from beta-strands involves near positions, emphasizing associations between consecutive strands. (vi) Exposed residue pairs are predominantly located in proximal linear positions, while buried residue pairs principally correspond to far or very far distal positions. The results are principally invariant to protein sizes, amino acid usages, linear distance normalizations, and over- and underrepresentations among nearest-neighbor types. Interpretations and hypotheses concerning the LDHs, particularly those of hydrophobic and charged pairings, are discussed with respect to protein stability and functionality. The pronounced occurrence of oppositely charged interchain contacts is consistent with many observations on protein complexes where multichain stabilization is facilitated by electrostatic interactions.
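The binning scheme above is simple enough to state as code. The Python sketch below takes neighbor pairs as given (how neighbors are identified from 3D coordinates is not shown) and counts them into the abstract's categories to form a linear distance histogram. Residue numbering, the `(i, j, same_chain)` input format, and the function names are assumptions for illustration.

```python
from collections import Counter


def linear_distance_category(i: int, j: int, same_chain: bool = True) -> str:
    """Classify a pair of structurally neighboring residues by sequence separation.

    Bins follow the abstract: proximal (1-4), near (5-20), far (21-50),
    very far (> 50), plus interchain for residues on different chains.
    """
    if not same_chain:
        return "interchain"
    separation = abs(i - j)
    if separation == 0:
        raise ValueError("a residue cannot be its own neighbor")
    if separation <= 4:
        return "proximal"
    if separation <= 20:
        return "near"
    if separation <= 50:
        return "far"
    return "very far"


def linear_distance_histogram(pairs) -> Counter:
    """Count neighbor pairs given as (i, j, same_chain) tuples."""
    return Counter(linear_distance_category(i, j, same) for i, j, same in pairs)


pairs = [(10, 12, True), (1, 6, True), (1, 51, True), (1, 52, True), (3, 3, False)]
print(linear_distance_histogram(pairs))
```

Building one such histogram per residue-pair type (for example hydrophobic-hydrophobic or oppositely charged pairs) gives the type-specific LDHs the abstract compares.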
Affiliation(s)
- L Brocchieri
- Department of Mathematics, Stanford University, CA 94305-2125, USA
34
Maxwell MM, Nearing J, Aziz N. Ke 6 gene. Sequence and organization and aberrant regulation in murine polycystic kidney disease. J Biol Chem 1995; 270:25213-9. [PMID: 7559658 DOI: 10.1074/jbc.270.42.25213] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.9] [Indexed: 01/25/2023]
Abstract
The Ke 6 gene is a newly identified gene located in the major histocompatibility complex and is a candidate steroid dehydrogenase gene because of its structural homology and regulatory similarities with mammalian steroid dehydrogenases. We report here the complete nucleotide sequence and intron-exon organization of the Ke 6 gene and the cloning of the alternatively spliced Ke 6b transcript. We find that Ke 6 gene expression is down-regulated in pcy mice, a murine model of polycystic kidney disease (PKD). Thus far, Ke 6 gene expression is down-regulated in all murine models of PKD we have examined. Abnormal steroid metabolism as a possible cause of PKD is discussed.
Affiliation(s)
- M M Maxwell
- Department of Medicine, Children's Hospital, Boston, Massachusetts 02115, USA
35
Stoltzfus A, Spencer DF, Zuker M, Logsdon JM, Doolittle WF. Response: Introns and the Origin of Protein-Coding Genes. Science 1995. [DOI: 10.1126/science.268.5215.1367.b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
36
Stoltzfus A, Spencer DF, Zuker M, Logsdon JM, Doolittle WF. Response: Introns and the Origin of Protein-Coding Genes. Science 1995. [DOI: 10.1126/science.268.5215.1367-b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
37
Durkin ME, Wewer UM, Chung AE. Exon organization of the mouse entactin gene corresponds to the structural domains of the polypeptide and has regional homology to the low-density lipoprotein receptor gene. Genomics 1995; 26:219-28. [PMID: 7601446 DOI: 10.1016/0888-7543(95)80204-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 01/26/2023]
Abstract
Entactin is a widespread basement membrane protein of 150 kDa that binds to type IV collagen and laminin. The complete exon-intron structure of the mouse entactin gene has been determined from lambda genomic DNA clones. The gene spans at least 65 kb and contains 20 exons. The exon organization of the mouse entactin gene closely corresponds to the organization of the polypeptide into distinct structural and functional domains. The two amino-terminal globular domains are encoded by three exons each. Single exons encode the two protease-sensitive, O-glycosylated linking regions. The six EGF-like repeats and the single thyroglobulin-type repeat are each encoded by separate exons. The carboxyl-terminal half of entactin displays sequence homology to the growth factor-like region of the low-density lipoprotein receptor, and in both genes this region is encoded by eight exons. The positions of four introns are also conserved in the homologous region of the two genes. These observations suggest that the entactin gene has evolved via exon shuffling. Finally, several sequence polymorphisms useful for gene linkage analysis were found in the 3' noncoding region of the last exon.
Affiliation(s)
- M E Durkin
- Department of Biological Sciences, University of Pittsburgh, Pennsylvania 15260, USA
38
Duret L, Mouchiroud D, Gautier C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 1995; 40:308-17. [PMID: 7723057 DOI: 10.1007/bf00163235] [Citation(s) in RCA: 186] [Impact Index Per Article: 6.4] [Indexed: 01/26/2023]
Abstract
We compared the exon/intron organization of vertebrate genes belonging to different isochore classes, as predicted by their GC content at third codon position. Two main features have emerged from the analysis of sequences published in GenBank: (1) genes coding for long proteins (i.e., > or = 500 aa) are almost two times more frequent in GC-poor than in GC-rich isochores; (2) intervening sequences (= sum of introns) are on average three times longer in GC-poor than in GC-rich isochores. These patterns are observed among human, mouse, rat, cow, and even chicken genes and are therefore likely to be common to all warm-blooded vertebrates. Analysis of Xenopus sequences suggests that the same patterns exist in cold-blooded vertebrates. It could be argued that such results do not reflect the reality because sequence databases are not representative of entire genomes. However, analysis of biases in GenBank revealed that the observed discrepancies between GC-rich and GC-poor isochores are not artifactual, and are probably largely underestimated. We investigated the distribution of microsatellites and interspersed repeats in introns of human and mouse genes from different isochores. This analysis confirmed previous studies showing that L1 repeats are almost absent from GC-rich isochores. Microsatellites and SINES (Alu, B1, B2) are found at roughly equal frequencies in introns from all isochore classes. Globally, the presence of repeated sequences does not account for the increased intron length in GC-poor isochores. The relationships between gene structure and global genome organization and evolution are discussed.
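The comparison above rests on two per-gene quantities: GC content at third codon positions (GC3, used to predict isochore class) and the total intron length ("intervening sequences"). A minimal Python sketch, assuming each gene is given as a coding sequence plus a list of intron lengths; the GC3 cutoff used to split genes into GC-poor and GC-rich groups is a hypothetical, user-chosen parameter, not a value from the paper.

```python
from statistics import mean


def gc3(cds: str) -> float:
    """Fraction of G or C at third positions of the complete codons of a CDS."""
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("CDS must contain at least one complete codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def mean_intervening_length_by_gc3(genes, cutoff: float) -> dict:
    """Mean total intron length (sum of introns) for genes below vs. at/above a GC3 cutoff.

    `genes` is a list of (cds, intron_lengths) pairs; `cutoff` is a
    hypothetical, user-chosen GC3 threshold, not a value from the paper.
    """
    groups = {"GC-poor": [], "GC-rich": []}
    for cds, intron_lengths in genes:
        key = "GC-rich" if gc3(cds) >= cutoff else "GC-poor"
        groups[key].append(sum(intron_lengths))
    return {key: mean(values) for key, values in groups.items() if values}


genes = [("ATGGCCTAA", [100, 200]), ("ATGAAATAA", [900])]  # toy data
print(mean_intervening_length_by_gc3(genes, cutoff=0.5))
```

With real gene sets, the same grouping could also tally genes encoding proteins of at least 500 amino acids to reproduce the first comparison in the abstract.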
Affiliation(s)
- L Duret
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard, Lyon I, URA-CNRS 243, Villeurbanne, France
39
Abstract
Recognizing the function of newly sequenced DNA fragments is an important area of computational molecular biology. Here we present an extensive review of methods for the prediction of functional sites, tRNAs, and protein-coding genes, and discuss possible future directions of research in this area.
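An open reading frame (ORF) scan shows the most elementary form of protein-coding region detection. This sketch is not drawn from the review. It scans only the three forward frames, uses the standard stop codons TAA, TAG, and TGA, and applies a hypothetical minimum-length cutoff. Practical gene finders combine scans like this with codon statistics and signal models, and they also scan the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Return 0-based, half-open (start, end) ORFs on the forward strand.

    Each ORF runs from the first ATG in a frame to the next in-frame stop
    codon (stop included). min_codons is an illustrative threshold.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # i stops at the last position where a full codon still fits.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))
```

Starting each ORF at the first in-frame ATG yields the longest candidate per stop codon, which is a common simplification.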
Affiliation(s)
- M S Gelfand
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow region, Russia
40
Altenberg L. Genome growth and the evolution of the genotype-phenotype map. Lecture Notes in Computer Science 1995. [DOI: 10.1007/3-540-59046-3_11]
41
42
43
Abstract
A database of 210 Schizosaccharomyces pombe DNA sequences (524,794 bp) was extracted from GenBank (release 81.0) and examined by several methods to characterize statistical features of these sequences that might serve as signals or constraints for messenger RNA splicing. The statistical information compiled includes splicing signal (donor, acceptor, and branch site) profiles, the translation initiation start profile, exon/intron length distributions, the ORF distribution, the CDS size distribution, a codon usage table, and the 6-tuple distribution. The information content of the various signals is also presented. A rule-based interactive computer program for finding introns, called INTRON.PLOT, was developed and used to successfully analyze 7 newly sequenced genes.
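The information content of a splicing signal can be computed directly from a set of aligned sites. The sketch below assumes a uniform background of 0.25 per base, so each position carries at most 2 bits, and applies no pseudocounts or small-sample correction. The example sequences are hypothetical intron 5' ends and are not taken from the S. pombe database described above.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information content (bits) of equal-length aligned sites.

    Assumes a uniform background (0.25 per base), so the maximum is 2 bits per
    position; no pseudocounts or small-sample correction are applied.
    """
    if not sites:
        raise ValueError("need at least one site")
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    n = len(sites)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        # Shannon entropy of the observed base frequencies at this position.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    # Hypothetical donor-like sequences: intron positions +1 to +6.
    donors = ["GTAAGT", "GTATGT", "GTAAGA", "GTGAGT"]
    for pos, bits in enumerate(information_content(donors), start=1):
        print(f"position {pos}: {bits:.2f} bits")
```

Summing the per-position values gives a single information-content figure for the whole signal, which is one way to compare donor, acceptor, and branch-site profiles.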
Affiliation(s)
- M Q Zhang
- Cold Spring Harbor Laboratory, NY 11724
44
Cloning and characterization of HTK, a novel transmembrane tyrosine kinase of the EPH subfamily. J Biol Chem 1994. [DOI: 10.1016/s0021-9258(17)36776-5]
45
Abstract
Intron recognition in angiosperms is hypothesized to require AU-rich motifs within introns. In this report we examined the role of AU-rich motifs in pre-mRNA processing. Inserting AU-rich segments of maize introns near the single intron of the maize Bronze-2 (Bz2) gene resulted in alternative splicing. Other insertions of AU-rich sequence into the Bz2 cDNA resulted in de novo intron creation using splice junctions at the edges of the AU-rich region. Surprisingly, all five AU-rich inserts that we tested also caused polyadenylation, even though none had been selected for that function in plants. Insertions of GC-rich sequence into Bz2 caused neither splicing nor polyadenylation. We propose that AU-rich motifs are a general signal for RNA processing in maize and that, in the absence of a 5' splice site, polyadenylation is the default pathway.
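A sliding-window scan is one simple way to locate AU-rich segments like those the authors inserted into Bz2. The window length and the 0.7 A+U threshold below are hypothetical illustration values, not parameters from the study. DNA input is converted to RNA notation by replacing T with U.

```python
def au_rich_windows(seq: str, window: int = 20, threshold: float = 0.7):
    """Return 0-based start positions of windows whose A+U fraction is at
    least `threshold`. Window size and threshold are illustrative only."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    if len(seq) < window:
        return []
    au = sum(base in "AU" for base in seq[:window])
    hits = [0] if au / window >= threshold else []
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: add the entering base, drop the leaving one.
        au += (seq[start + window - 1] in "AU") - (seq[start - 1] in "AU")
        if au / window >= threshold:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(au_rich_windows("GGGGAUAUAUGGGG", window=4, threshold=1.0))
```

Updating the running count instead of recounting each window keeps the scan linear in sequence length.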
Affiliation(s)
- K R Luehrsen
- Department of Biological Sciences, Stanford University, California 94305-5020
46
White SH. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol 1994; 38:383-94. [PMID: 8007006 DOI: 10.1007/bf00163155]
Abstract
This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79-95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation, regardless of amino acid type. The goal of the present study was to investigate whether the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sequence sets examined were taken from the 1989 PIR database and consisted of 1,792 "super-family" proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2, or the distribution of the sum of two independent exponentially distributed random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had independent, exponentially distributed random lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length. The theory leads to three interesting conclusions. First, it predicts that a tetranucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079-2086 and 18:6339-6345), which show that tetranucleotides (stop codon plus the following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377-1382).
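The two length distributions named in this abstract are closely related, and the relationship is easy to check numerically. The sketch below models each length as the sum of two independent exponential lengths. When both share the same rate λ, the density is the gamma density with shape 2, λ²·x·exp(-λx). With distinct rates λ1 ≠ λ2, the density is λ1λ2/(λ2 - λ1)·(exp(-λ1x) - exp(-λ2x)). The rates and sample size are hypothetical, and this is not White's fitting procedure.

```python
import math
import random


def gamma2_pdf(x: float, rate: float) -> float:
    """Gamma density with shape 2 and the given rate: rate**2 * x * exp(-rate * x).

    This is the density of the sum of two independent exponentials sharing `rate`.
    """
    if x < 0:
        return 0.0
    return rate * rate * x * math.exp(-rate * x)


def hypoexp_pdf(x: float, rate1: float, rate2: float) -> float:
    """Density of the sum of two independent exponentials with rates rate1, rate2."""
    if x < 0:
        return 0.0
    if math.isclose(rate1, rate2):
        # Equal rates: the general formula becomes 0/0; its limit is the gamma(2) density.
        return gamma2_pdf(x, rate1)
    return rate1 * rate2 / (rate2 - rate1) * (math.exp(-rate1 * x) - math.exp(-rate2 * x))


def simulate_lengths(n: int, rate1: float, rate2: float, seed: int = 0) -> list:
    """Draw n lengths, each the sum of two independent exponential segments."""
    rng = random.Random(seed)
    return [rng.expovariate(rate1) + rng.expovariate(rate2) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical rates per residue: mean segment lengths of 100 and 50 residues.
    lengths = simulate_lengths(100_000, 0.01, 0.02)
    print(sum(lengths) / len(lengths))  # should be close to 1/0.01 + 1/0.02 = 150
```

The equal-rate branch is what makes the two descriptions "closely related": the general sum-of-exponentials density converges to the shape-2 gamma density as the two rates approach each other.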
Affiliation(s)
- S H White
- Department of Physiology and Biophysics, University of California, Irvine 92717
47
48
Genomic structure and subcellular localization of MAL, a human T-cell-specific proteolipid protein. J Biol Chem 1994. [DOI: 10.1016/s0021-9258(17)37174-0]
49
Abstract
Exon sizes in vertebrate genes are, with a few exceptions, limited to less than 300 bases. It has been proposed that this limitation may derive from the exon definition model of splice site recognition. In this model, a downstream donor site enhances splicing at the upstream acceptor site of the same exon. This enhancement may require contact between factors bound to each end of the exon; an exon size limitation would promote such contact. To test the idea that proximity was required for exon definition, we inserted random DNA fragments from Escherichia coli into a central exon in a three-exon dihydrofolate reductase minigene and tested whether the expanded exons were efficiently spliced. DNA from a plasmid library of expanded minigenes was used to transfect a CHO cell deletion mutant lacking the dhfr locus. PCR analysis of DNA isolated from the pooled stable cotransfectant populations displayed a range of DNA insert sizes from 50 to 1,500 nucleotides. A parallel analysis of the RNA from this population by reverse transcription followed by PCR showed a similar size distribution. Central exons as large as 1,400 bases could be spliced into mRNA. We also tested individual plasmid clones containing exon inserts of defined sizes. The largest exon included in mRNA was 1,200 bases in length, well above the 300-base limit implied by the survey of naturally occurring exons. We conclude that a limitation in exon size is not part of the exon definition mechanism.
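A survey like the one behind the 300-base figure comes down to computing exon lengths from annotation coordinates. The sketch below assumes 1-based, inclusive (start, end) coordinates, as in GFF files, for the exons of a single transcript. It reports internal exons longer than a chosen limit. First and last exons are skipped because they are not bounded by both an acceptor and a donor site, and the exon definition model concerns exons that are. The example coordinates are hypothetical.

```python
def exon_lengths(exons):
    """Lengths of exons given as 1-based, inclusive (start, end) pairs."""
    return [end - start + 1 for start, end in sorted(exons)]


def internal_exons_over(exons, limit=300):
    """Lengths of internal exons (neither first nor last) longer than `limit` bases."""
    lengths = exon_lengths(exons)
    return [n for n in lengths[1:-1] if n > limit]


if __name__ == "__main__":
    # Hypothetical four-exon transcript with one unusually long internal exon.
    transcript = [(1, 120), (1001, 1100), (2001, 3200), (4001, 4500)]
    print(exon_lengths(transcript), internal_exons_over(transcript))
```

Sorting the exons first means the first and last elements are the terminal exons even if the annotation lists them out of order. This holds for either strand, because only the two ends are excluded.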
Affiliation(s)
- I T Chen
- Department of Biological Sciences, Columbia University, New York, New York 10027
50
Luehrsen KR, Taha S, Walbot V. Nuclear pre-mRNA processing in higher plants. Prog Nucleic Acid Res Mol Biol 1994; 47:149-93. [PMID: 8016320 DOI: 10.1016/s0079-6603(08)60252-4]
Affiliation(s)
- K R Luehrsen
- Department of Biological Sciences, Stanford University, California 94305