151
|
Knapp K, Chonka A, Chen YPP. POEM, A 3-dimensional exon taxonomy and patterns in untranslated exons. BMC Genomics 2008; 9:428. [PMID: 18803852 PMCID: PMC2561055 DOI: 10.1186/1471-2164-9-428] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/07/2008] [Accepted: 09/20/2008] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The existence of exons and introns has been known for thirty years. Despite this knowledge, there is a lack of formal research into the categorization of exons. Exon taxonomies used by researchers tend to be selected ad hoc or based on an information poor de-facto standard. Exons have been shown to have specific properties and functions based on among other things their location and order. These factors should play a role in the naming to increase specificity about which exon type(s) are in question. RESULTS POEM (Protein Oriented Exon Monikers) is a new taxonomy focused on protein proximal exons. It integrates three dimensions of information (Global Position, Regional Position and Region), thus its exon categories are based on known statistical exon features. POEM is applied to two congruent untranslated exon datasets resulting in the following statistical properties. Using the POEM taxonomy previous wide ranging estimates of initial 5' untranslated region exons are resolved. According to our datasets, 29-36% of genes have wholly untranslated first exons. Untranslated exon containing sequences are shown to have consistently up to 6 times more 5' untranslated exons than 3' untranslated exons. Finally, three exon patterns are determined which account for 70% of untranslated exon genes. CONCLUSION We describe a thorough three-dimensional exon taxonomy called POEM, which is biologically and statistically relevant. No previous taxonomy provides such fine grained information and yet still includes all valid information dimensions. The use of POEM will improve the accuracy of genefinder comparisons and analysis by means of a common taxonomy. It will also facilitate unambiguous communication due to its fine granularity.
Affiliation(s)
- Keith Knapp
- Faculty of Science and Technology, Deakin University, Victoria, Australia.
|
152
|
Bradnam KR, Korf I. Longer first introns are a general property of eukaryotic gene structure. PLoS One 2008; 3:e3093. [PMID: 18769727 PMCID: PMC2518113 DOI: 10.1371/journal.pone.0003093] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.9] [Received: 06/12/2008] [Accepted: 08/11/2008] [Indexed: 11/19/2022] Open
Abstract
While many properties of eukaryotic gene structure are well characterized, differences in the form and function of introns that occur at different positions within a transcript are less well understood. In particular, the dynamics of intron length variation with respect to intron position has received relatively little attention. This study analyzes all available data on intron lengths in GenBank and finds a significant trend of increased length in first introns throughout a wide range of species. This trend was found to be even stronger when using high-confidence gene annotation data for three model organisms (Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster) which show that the first intron in the 5' UTR is--on average--significantly longer than all downstream introns within a gene. A partial explanation for increased first intron length in A. thaliana is suggested by the increased frequency of certain motifs that are present in first introns. The phenomenon of longer first introns can potentially be used to improve gene prediction software and also to detect errors in existing gene annotations.
Affiliation(s)
- Keith R Bradnam
- Genome Center, University of California Davis, Davis, California, USA.
|
153
|
Barbazuk WB, Fu Y, McGinnis KM. Genome-wide analyses of alternative splicing in plants: opportunities and challenges. Genome Res 2008; 18:1381-92. [PMID: 18669480 DOI: 10.1101/gr.053678.106] [Citation(s) in RCA: 255] [Impact Index Per Article: 15.9] [Indexed: 11/25/2022]
Abstract
Alternative splicing (AS) creates multiple mRNA transcripts from a single gene. While AS is known to contribute to gene regulation and proteome diversity in animals, the study of its importance in plants is in its early stages. However, recently available plant genome and transcript sequence data sets are enabling a global analysis of AS in many plant species. Results of genome analysis have revealed differences between animals and plants in the frequency of alternative splicing. The proportion of plant genes that have one or more alternative transcript isoforms is approximately 20%, indicating that AS in plants is not rare, although this rate is approximately one-third of that observed in human. The majority of plant AS events have not been functionally characterized, but evidence suggests that AS participates in important plant functions, including stress response, and may impact domestication and trait selection. The increasing availability of plant genome sequence data will enable larger comparative analyses that will identify functionally important plant AS events based on their evolutionary conservation, determine the influence of genome duplication on the evolution of AS, and discover plant-specific cis-elements that regulate AS. This review summarizes recent analyses of AS in plants, discusses the importance of further analysis, and suggests directions for future efforts.
Affiliation(s)
- W Brad Barbazuk
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA.
|
154
|
Guo W, Cai C, Wang C, Zhao L, Wang L, Zhang T. A preliminary analysis of genome structure and composition in Gossypium hirsutum. BMC Genomics 2008; 9:314. [PMID: 18590573 PMCID: PMC2481271 DOI: 10.1186/1471-2164-9-314] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Received: 03/31/2008] [Accepted: 07/01/2008] [Indexed: 11/23/2022] Open
Abstract
Background Upland cotton has the highest yield, and accounts for > 95% of world cotton production. Decoding upland cotton genomes will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC tagging information approaches to predict the general composition and structure of the allotetraploid cotton genome. Results 142 BAC sequences from Gossypium hirsutum cv. Maxxa were downloaded and confirmed. Analysis of these BAC sequences revealed that the tetraploid cotton genome contains over 70,000 candidate genes with duplicated gene copies in homoeologous A- and D-subgenome regions. Gene distribution is uneven, with gene-rich and gene-free regions of the genome. Twenty-one percent of the 142 BACs lacked genes. BAC gene density ranged from 0 to 33.2 per 100 kb, whereas most gene islands contained only one gene, with an average of 1.5 genes per island. Retro-elements were found to be a major component, with LTR/gypsy the most enriched, followed by LTR/copia. Most LTR retrotransposons were truncated and in nested structures. In addition, 166 polymorphic loci amplified with SSRs developed from 70 BAC clones were tagged on our backbone genetic map. Seventy-five percent (125/166) of the polymorphic loci were tagged on the D-subgenome. By comprehensively analyzing the molecular size of amplified products among tetraploid G. hirsutum cv. Maxxa, acc. TM-1, and G. barbadense cv. Hai7124, and diploid G. herbaceum var. africanum and G. raimondii, 37 BACs, 12 from the A- and 25 from the D-subgenome, were further anchored to their corresponding subgenome chromosomes. Comparison of a large number of gene sequences from BACs of the different subgenomes suggested that introns may not contribute to the difference in subgenome size in Gossypium.
Conclusion This study provides us with the first glimpse of cotton genome complexity and serves as a foundation for tetraploid cotton whole genome sequencing in the future.
Affiliation(s)
- Wangzhen Guo
- National Key Laboratory of Crop Genetics & Germplasm Enhancement, Cotton Research Institute, Nanjing Agricultural University, Nanjing 210095, PR China.
|
155
|
Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science 2008; 320:1629-31. [PMID: 18535209 DOI: 10.1126/science.1158078] [Citation(s) in RCA: 205] [Impact Index Per Article: 12.8] [Indexed: 11/03/2022]
Abstract
The role that natural selection plays in governing the locations and early evolution of copy-number mutations remains largely unexplored. We used high-density full-genome tiling arrays to create a fine-scale genomic map of copy-number polymorphisms (CNPs) in Drosophila melanogaster. We inferred a total of 2658 independent CNPs, 56% of which overlap genes. These include CNPs that are likely to be under positive selection, most notably high-frequency duplications encompassing toxin-response genes. The locations and frequencies of CNPs are strongly shaped by purifying selection, with deletions under stronger purifying selection than duplications. Among duplications, those overlapping exons or introns, as well as those falling on the X chromosome, seem to be subject to stronger purifying selection.
Affiliation(s)
- J J Emerson
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA.
|
156
|
Atambayeva SA, Khailenko VA, Ivashchenko AT. Intron and exon length variation in Arabidopsis, rice, nematode, and human. Mol Biol 2008. [DOI: 10.1134/s0026893308020180] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
|
157
|
Mackert A, do Nascimento AM, Bitondi MMG, Hartfelder K, Simões ZLP. Identification of a juvenile hormone esterase-like gene in the honey bee, Apis mellifera L. — Expression analysis and functional assays. Comp Biochem Physiol B Biochem Mol Biol 2008; 150:33-44. [DOI: 10.1016/j.cbpb.2008.01.004] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2007] [Revised: 01/17/2008] [Accepted: 01/21/2008] [Indexed: 01/25/2023]
|
158
|
Abstract
As the number of sequenced genomes increases, the ability to deduce genome function becomes increasingly salient. For many genome sequences, the only annotation that will be available for the foreseeable future will be based on computational predictions and comparisons with functional elements in related species. Here we discuss computational approaches for automated genome-wide annotation of functional elements in mammalian genomes. These include methods for ab initio and comparative gene-structure predictions. Gene features such as intron splice sites, 3' untranslated regions, promoters, and cis-regulatory elements are discussed, as is a novel method for predicting DNaseI hypersensitive sites. Recent methodologies for predicting noncoding RNA genes, including microRNA genes and their targets, are also reviewed.
Affiliation(s)
- Steven J M Jones
- Genome Sciences Centre, British Columbia Cancer Research Center, Vancouver, British Columbia, V5Z 1L3, Canada.
|
159
|
|
160
|
Ogino K, Tsuneki K, Furuya H. Cloning of chitinase-like protein1 cDNA from dicyemid mesozoans (Phylum: Dicyemida). J Parasitol 2007; 93:1403-15. [DOI: 10.1645/ge-1290.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
|
161
|
Havlioglu N, Wang J, Fushimi K, Vibranovski MD, Kan Z, Gish W, Fedorov A, Long M, Wu JY. An intronic signal for alternative splicing in the human genome. PLoS One 2007; 2:e1246. [PMID: 18043753 PMCID: PMC2082412 DOI: 10.1371/journal.pone.0001246] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/30/2007] [Accepted: 10/23/2007] [Indexed: 11/23/2022] Open
Abstract
An important level at which the expression of programmed cell death (PCD) genes is regulated is alternative splicing. Our previous work identified an intronic splicing regulatory element in the caspase-2 (casp-2) gene. This 100-nucleotide intronic element, In100, consists of an upstream region containing a decoy 3' splice site and a downstream region containing binding sites for the splicing repressor PTB. Based on the signal of the In100 element in casp-2, we have detected In100-like sequences as a family of sequence elements associated with alternative splicing in the human genome by using computational and experimental approaches. A survey of the human genome reveals the presence of more than four thousand In100-like elements in 2757 genes. These In100-like elements tend to be located more frequently in intronic regions than in exonic regions. EST analyses indicate that the presence of In100-like elements correlates with the skipping of their immediate upstream exons, with 526 genes showing exon skipping in such a manner. In addition, In100-like elements are found in several human caspase genes near exons encoding the caspase active domain. RT-PCR experiments show that these caspase genes indeed undergo alternative splicing in a pattern predicted to affect their functional activity. Together, these results suggest that the In100-like elements represent a family of intronic signals for alternative splicing in the human genome.
Affiliation(s)
- Necat Havlioglu
- Department of Pathology, Saint Louis University, St. Louis, Missouri, United States of America
- Jun Wang
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
- Kazuo Fushimi
- Department of Neurology, Lurie Comprehensive Cancer Center, Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Maria D. Vibranovski
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
- Zhengyan Kan
- Department of Genetics, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Warren Gish
- Department of Genetics, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Alexei Fedorov
- Department of Medicine and Program in Bioinformatics and Proteomics/Genomics, Medical University of Ohio, Toledo, Ohio, United States of America
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, United States of America
- Jane Y. Wu
- Department of Neurology, Lurie Comprehensive Cancer Center, Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
|
162
|
Schwartz SH, Silva J, Burstein D, Pupko T, Eyras E, Ast G. Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res 2007; 18:88-103. [PMID: 18032728 DOI: 10.1101/gr.6818908] [Citation(s) in RCA: 136] [Impact Index Per Article: 8.0] [Indexed: 11/24/2022]
Abstract
Introns are among the hallmarks of eukaryotic genes. Splicing of introns is directed by three main splicing signals: the 5' splice site (5'ss), the branch site (BS), and the polypyrimidine tract/3' splice site (PPT-3'ss). To study the evolution of these splicing signals, we have conducted a systematic comparative analysis of these signals in over 1.2 million introns from 22 eukaryotes. Our analyses suggest that all these signals have dramatically evolved: The PPT is weak among most fungi, intermediate in plants and protozoans, and strongest in metazoans. Within metazoans it shows a gradual strengthening from Caenorhabditis elegans to human. The 5'ss and the BS were found to be degenerate among most organisms, but highly conserved among some fungi. A maximum parsimony-based algorithm for reconstructing ancestral position-specific scoring matrices suggested that the ancestral 5'ss and BS were degenerate, as in metazoans. To shed light on the evolutionary variation in splicing signals, we have analyzed the evolutionary changes in the factors that bind these signals. Our analysis reveals coevolution of splicing signals and their corresponding splicing factors: The strength of the PPT is correlated with changes in key residues in its corresponding splicing factor U2AF2; limited correlation was found between changes in the 5'ss and the U1 snRNA that binds it; but none was found between the BS and U2 snRNA. Thus, although the basic ability of eukaryotes to splice introns has remained conserved throughout evolution, the splicing signals and their corresponding splicing factors have considerably evolved, uniquely shaping the splicing mechanisms of different organisms.
Affiliation(s)
- Schraga H Schwartz
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv 69978, Israel
|
163
|
Aruga J, Odaka YS, Kamiya A, Furuya H. Dicyema Pax6 and Zic: tool-kit genes in a highly simplified bilaterian. BMC Evol Biol 2007; 7:201. [PMID: 17961212 PMCID: PMC2222250 DOI: 10.1186/1471-2148-7-201] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 07/04/2007] [Accepted: 10/25/2007] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Dicyemid mesozoans (Phylum Dicyemida) are simple (8-40-cell) cephalopod endoparasites. They have neither body cavities nor differentiated organs, such as nervous and gastrointestinal systems. Whether dicyemids are intermediate between Protozoa and Metazoa (as represented by their "Mesozoa" classification) or degenerate species of more complex metazoans is controversial. Recent molecular phylogenetic studies suggested that they are simplified bilaterians belonging to the Lophotrochozoa. We cloned two genes developmentally critical in bilaterian animals (Pax6 and Zic), together with housekeeping genes (actin, fructose-bisphosphate aldolase, and ATP synthase beta subunit) from a dicyemid to reveal whether their molecular phylogeny supported the "simplification" hypothesis, and to clarify evolutionary changes in dicyemid gene structure and expression profiles. RESULTS Genomic/cDNA sequence analysis showed that 1) the Pax6 molecular phylogeny and Zic intron positions supported the idea of dicyemids as reduced bilaterians; 2) the aa sequences deduced from the five genes were highly divergent; and 3) Dicyema genes contained very short introns of uniform length. In situ hybridization analyses revealed that Zic genes were expressed in hermaphroditic gonads, and Pax6 was expressed weakly throughout the developmental stages of the 2 types of embryo and in the hermaphroditic gonads. CONCLUSION The accelerated evolutionary rates and the very short, uniform introns may be characteristic features of the Dicyema genome. The presence and expression of the two tool-kit genes (Pax6 and Zic) in Dicyema suggest that they are very versatile genes, required even in a highly reduced bilaterian such as Dicyema. Dicyemids may be useful models of evolutionary body plan simplification.
Affiliation(s)
- Jun Aruga
- Laboratory for Comparative Neurogenesis, RIKEN Brain Science Institute, Wako 351-0198, Japan
- Yuri S Odaka
- Laboratory for Comparative Neurogenesis, RIKEN Brain Science Institute, Wako 351-0198, Japan
- Akiko Kamiya
- Laboratory for Comparative Neurogenesis, RIKEN Brain Science Institute, Wako 351-0198, Japan
- Hidetaka Furuya
- Department of Biology, Graduate School of Science, Osaka University, Toyonaka, Osaka 560-0043, Japan
|
164
|
Drieschner N, Kerschling S, Soller JT, Rippe V, Belge G, Bullerdiek J, Nimzyk R. A domain of the thyroid adenoma associated gene (THADA) conserved in vertebrates becomes destroyed by chromosomal rearrangements observed in thyroid adenomas. Gene 2007; 403:110-7. [PMID: 17889454 DOI: 10.1016/j.gene.2007.06.029] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 05/16/2007] [Revised: 06/25/2007] [Accepted: 06/25/2007] [Indexed: 11/27/2022]
Abstract
THADA, mapping to chromosomal band 2p21, is the target gene of specific chromosomal rearrangements observed in benign thyroid tumors. Thus, it is one of the most common gene targets in chromosomal rearrangements in benign epithelial tumors. Nevertheless, nothing is known about the function of its protein. Therefore, we have analyzed the genetic structure of THADA homologous genes in selected vertebrates (Canis familiaris, Chlorocebus aethiops, Gallus gallus, and Mus musculus), which had not been characterized until now. The coding sequences of the mRNA of these species have been sequenced and analyzed, revealing similarities to ARM repeat structures, which indicates an involvement in protein-protein interactions. Using multiple alignments we identified the most conserved part of the protein (aa 1033-1415 in Homo sapiens), with an identity of 70.5% between the most different organisms, implying a putative important functional domain. The truncations observed in human thyroid adenomas disrupt this conserved domain of the protein, indicating a loss of function of THADA contributing to the development of follicular neoplasias of the thyroid.
Affiliation(s)
- Norbert Drieschner
- Center for Human Genetics, University of Bremen, Leobenerstr./ZHG, D-28359 Bremen, Germany
|
165
|
Ogino K, Tsuneki K, Furuya H. The expression of tubulin and tektin genes in dicyemid mesozoans (Phylum: Dicyemida). J Parasitol 2007; 93:608-18. [PMID: 17626353 DOI: 10.1645/ge-1037r.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
Dicyemid mesozoans (Phylum Dicyemida) are endoparasites (or endosymbionts) that typically are found in the renal sac of benthic cephalopod mollusks such as octopuses and cuttlefishes. Adult dicyemids likely adhere to the renal appendage of hosts via cilia of calotte peripheral cells. These cilia seem to be continuously worn away in the interaction between the dicyemids and the epidermal cells of host renal appendages. We cloned 4 cDNAs and genes, alpha-tubulin, beta-tubulin, tektin B, and tektin C, which are thought to play a key role in ciliogenesis, from Dicyema japonicum, and studied expression patterns of these genes by whole-mount in situ hybridization. We detected coexpression of these genes in the calotte peripheral cells, but not in the trunk peripheral cells. This suggests that regeneration and turnover of cilia continuously occur in the calotte. In vermiform and infusoriform embryos, we also detected coexpression patterns of these genes, which might correlate with ciliogenesis during the embryogenesis. We also predicted the secondary structure and the coiled-coil regions of dicyemid tektins.
Affiliation(s)
- Kazutoyo Ogino
- Department of Biology, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka, Osaka 560-0043, Japan.
|
166
|
Coghlan A, Durbin R. Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure. Bioinformatics 2007; 23:1468-75. [PMID: 17483502 PMCID: PMC2880447 DOI: 10.1093/bioinformatics/btm133] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other. RESULTS We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C.briggsae and C.remanei. On a set of approximately 1500 confirmed C.elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder. AVAILABILITY Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix
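The abstract reports exon-level specificity and sensitivity without defining them. A minimal sketch follows, assuming the convention common in gene-finder evaluation: sensitivity is the fraction of annotated exons that are predicted with both boundaries exact, and specificity is the fraction of predicted exons that are exactly correct. The function name, the exon tuple layout, and the toy coordinates are all hypothetical and are not taken from Genomix.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(predicted: Iterable[Exon],
                        annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    An exon counts as correct only if sequence, strand, and both
    boundaries match an annotated exon exactly (assumed convention).
    """
    pred, true = set(predicted), set(annotated)
    correct = len(pred & true)
    # Sensitivity: share of annotated exons that were recovered exactly.
    sensitivity = correct / len(true) if true else 0.0
    # Specificity (gene-finding usage): share of predictions that are exact.
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
annotated = [("chrI", 100, 200, "+"), ("chrI", 300, 450, "+"),
             ("chrI", 600, 700, "+"), ("chrI", 900, 1000, "+")]
predicted = [("chrI", 100, 200, "+"), ("chrI", 300, 450, "+"),
             ("chrI", 610, 700, "+")]  # third exon has a wrong start

sn, sp = exon_level_accuracy(predicted, annotated)
print(f"exon-level sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

Under this convention a combiner can raise specificity by discarding poorly supported exons and raise sensitivity by pooling exons that only some gene-finders predict, which is consistent with the direction of the gains the abstract reports.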
Affiliation(s)
- Avril Coghlan
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
167
|
Grover CE, Kim H, Wing RA, Paterson AH, Wendel JF. Microcolinearity and genome evolution in the AdhA region of diploid and polyploid cotton (Gossypium). Plant J 2007; 50:995-1006. [PMID: 17461788 DOI: 10.1111/j.1365-313x.2007.03102.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 05/08/2023]
Abstract
Genome sizes vary by several orders of magnitude, driven by mechanisms such as illegitimate recombination and transposable element proliferation. Prior analysis of the CesA region in two cotton genomes that diverged 5-10 million years ago (Ma), and acquired a twofold difference in genome size, revealed extensive local conservation of genic and intergenic regions, with no evidence of the global genome size difference. The present study extends the comparison to include BAC sequences surrounding the gene encoding alcohol dehydrogenase A (AdhA) from four cotton genomes: the two co-resident genomes (A(T) and D(T)) of the allotetraploid, Gossypium hirsutum, as well as the model diploid progenitors, Gossypium arboreum (A) and Gossypium raimondii (D). In contrast to earlier work, evolution in the AdhA region reflects, in a microcosm, the overall difference in genome size, with a nearly twofold difference in aligned sequence length. Most size differences may be attributed to differential accumulation of retroelements during divergence of the genome diploids from their common ancestor, but in addition there has been a biased accumulation of small deletions, such that those in the smaller D genome are on average twice as large as those in the larger A genome. The data also provide evidence for the global phenomenon of 'genomic downsizing' in polyploids shortly after formation. This in part reflects a higher frequency of small deletions post-polyploidization, and increased illegitimate recombination. In conjunction with previous work, the data here confirm the conclusion that genome size evolution reflects many forces that collectively operate heterogeneously among genomic regions.
Affiliation(s)
- Corrinne E Grover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
|
168
|
Carmel L, Wolf YI, Rogozin IB, Koonin EV. Three distinct modes of intron dynamics in the evolution of eukaryotes. Genome Res 2007; 17:1034-44. [PMID: 17495008 PMCID: PMC1899114 DOI: 10.1101/gr.6438607] [Citation(s) in RCA: 124] [Impact Index Per Article: 7.3] [Indexed: 11/24/2022]
Abstract
Several contrasting scenarios have been proposed for the origin and evolution of spliceosomal introns, a hallmark of eukaryotic genes. A comprehensive probabilistic model to obtain a definitive reconstruction of intron evolution was developed and applied to 391 sets of conserved genes from 19 eukaryotic species. It is inferred that a relatively high intron density was reached early, i.e., the last common ancestor of eukaryotes contained >2.15 introns/kilobase, and the last common ancestor of multicellular life forms harbored approximately 3.4 introns/kilobase, a greater intron density than in most of the extant fungi and in some animals. The rates of intron gain and intron loss appear to have been dropping during the last approximately 1.3 billion years, with the decline in the gain rate being much steeper. Eukaryotic lineages exhibit three distinct modes of evolution of the intron-exon structure. The primary, balanced mode, apparently, operates in all lineages. In this mode, intron gain and loss are strongly and positively correlated, in contrast to previous reports on inverse correlation between these processes. The second mode involves an elevated rate of intron loss and is prevalent in several lineages, such as fungi and insects. The third mode, characterized by elevated rate of intron gain, is seen only in deep branches of the tree, indicating that bursts of intron invasion occurred at key points in eukaryotic evolution, such as the origin of animals. Intron dynamics could depend on multiple mechanisms, and in the balanced mode, gain and loss of introns might share common mechanistic features.
Affiliation(s)
- Liran Carmel
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Igor B. Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Corresponding author. E-mail ; fax (301) 480-9241
|
169
|
Provata A, Oikonomou T. Power law exponents characterizing human DNA. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:056102. [PMID: 17677128 DOI: 10.1103/physreve.75.056102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/07/2006] [Revised: 02/09/2007] [Indexed: 05/16/2023]
Abstract
The size distributions of all known coding and noncoding DNA sequences are studied in all human chromosomes. In a unified approach, both introns and intergenic regions are treated as noncoding regions. The distributions of noncoding segments $P_{nc}(S)$ of size $S$ present long tails $P_{nc}(S) \sim S^{-1-\mu_{nc}}$, with exponents $\mu_{nc}$ ranging between 0.71 (for chromosome 13) and 1.2 (for chromosome 19). On the contrary, the exponential, short-range decay terms dominate in the distributions of coding (exon) segments $P_c(S)$ in all chromosomes. Aiming to address the emergence of these statistical features, minimal, stochastic, mean-field models are proposed, based on randomly aggregating DNA strings with duplication, influx and outflux of genomic segments. These minimal models produce both the short-range statistics in the coding and the observed power law and fractal statistics in the noncoding DNA. The minimal models also demonstrate that although the two systems (coding and noncoding) coexist, alternating on the same linear chain, they act independently: the coding as a closed, equilibrium system and the noncoding as an open, out-of-equilibrium one.
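As a reading aid, the tail exponent can be related to the cumulative distribution and to the moments of segment size. The short derivation below treats the noncoding size distribution as an idealized pure power law above some cutoff; that idealization, and the symbol $\bar{F}_{nc}$ for the complementary cumulative distribution, are assumptions made here and do not come from the paper. The abstract's exponent "mu nc" is written $\mu_{nc}$.

```latex
% Assumption: a pure power-law tail for large S, with \mu_{nc} > 0.
P_{nc}(S) \sim S^{-1-\mu_{nc}}
\quad\Longrightarrow\quad
\bar{F}_{nc}(S) = \int_{S}^{\infty} P_{nc}(s)\,ds \;\propto\; S^{-\mu_{nc}} .

% The k-th moment \int s^{k} P_{nc}(s)\,ds converges at large s only if k < \mu_{nc}:
\langle S \rangle < \infty \iff \mu_{nc} > 1 ,
\qquad
\langle S^{2} \rangle < \infty \iff \mu_{nc} > 2 .
```

Under this idealization, the reported value of 0.71 for chromosome 13 would correspond to a tail with no finite mean, while 1.2 for chromosome 19 would give a finite mean but a diverging variance. Real sequences are finite, so these statements describe the scaling regime only.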
Affiliation(s)
- A Provata
- Institute of Physical Chemistry, National Center for Scientific Research Demokritos, 15310 Athens, Greece.
|
170
|
Kodama Y, Nagaya S, Shinmyo A, Kato K. Mapping and characterization of DNase I hypersensitive sites in Arabidopsis chromatin. PLANT & CELL PHYSIOLOGY 2007; 48:459-70. [PMID: 17283013 DOI: 10.1093/pcp/pcm017] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 05/13/2023]
Abstract
Recent genome-wide analyses of yeast and human chromatin revealed the widespread prevalence of DNase I hypersensitive sites (DNase I HSs) at gene regulatory regions with possible roles in eukaryotic gene regulation. The presence of DNase I HSs in plants has been described for only a few genes, and we analyzed the chromatin structure of an 80 kb genomic region containing 30 variably expressed genes by DNase I sensitivity assay at 500 bp resolution in Arabidopsis. Distinct DNase I HSs were found at the 5' and/or 3' ends of most genes irrespective of their expression levels. Further analysis of well-characterized genes showed that the DNase I HSs occurred near cis-regulatory elements in the promoters of these genes. Upon transcriptional activation of a heat-inducible gene, the DNase I HS was extended into the vicinity of a cis-element and adjacent TATA element in the promoter. Concomitant with this change in DNase I HS, histones were acetylated, removed from the promoter, and a transcription activator bound to this cis-element. These results suggest that the DNase I HSs participate in the transcriptional regulation of Arabidopsis genes by enhancing the access of chromatin remodeling factors and/or transcription factors to their target sites as seen in yeast and human chromatin.
Affiliation(s)
- Yuichi Kodama
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara, 630-0192, Japan
|
171
|
Aluri S, Büttner M. Identification and functional expression of the Arabidopsis thaliana vacuolar glucose transporter 1 and its role in seed germination and flowering. Proc Natl Acad Sci U S A 2007; 104:2537-42. [PMID: 17284600 PMCID: PMC1892959 DOI: 10.1073/pnas.0610278104] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.2] [Received: 11/20/2006] [Indexed: 11/18/2022] Open
Abstract
Sugar compartmentation into vacuoles of higher plants is a very important physiological process, providing extra space for transient and long-term sugar storage and contributing to the osmoregulation of cell turgor and shape. Despite the long-standing knowledge of this subcellular sugar partitioning, the proteins responsible for these transport steps have remained unknown. We have identified a gene family in Arabidopsis consisting of three members homologous to known sugar transporters. One member of this family, Arabidopsis thaliana vacuolar glucose transporter 1 (AtVGT1), was localized to the vacuolar membrane. Moreover, we provide evidence for transport activity of a tonoplast sugar transporter based on its functional expression in bakers' yeast and uptake studies in isolated yeast vacuoles. Analyses of Atvgt1 mutant lines indicate an important function of this vacuolar glucose transporter during developmental processes like seed germination and flowering.
Affiliation(s)
- Sirisha Aluri
- Molekulare Pflanzenphysiologie, Universität Erlangen–Nürnberg, Staudtstrasse 5, D-91058 Erlangen, Germany
- Michael Büttner
- Molekulare Pflanzenphysiologie, Universität Erlangen–Nürnberg, Staudtstrasse 5, D-91058 Erlangen, Germany
|
172
|
Gudlaugsdottir S, Boswell DR, Wood GR, Ma J. Exon size distribution and the origin of introns. Genetica 2007; 131:299-306. [PMID: 17279432 DOI: 10.1007/s10709-007-9139-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/12/2005] [Accepted: 01/06/2007] [Indexed: 11/29/2022]
Abstract
Since it was first recognised that eukaryotic genes are fragmented into coding segments (exons) separated by non-coding segments (introns), the reason for this phenomenon has been debated. There are two dominant theories: that the piecewise arrangement of genes allows functional protein domains, represented by exons, to recombine by shuffling to form novel proteins with combinations of functions; or that introns represent parasitic DNA that can infest the eukaryotic genome because it does not interfere grossly with the fitness of its host. Differing distributions of exon lengths are predicted by these two theories. In this paper we examine distributions of exon lengths for six different organisms and find that they offer empirical evidence that both theories may in part be correct.
|
173
|
|
174
|
HybGFS: a hybrid method for genome-fingerprint scanning. BMC Bioinformatics 2006; 7:479. [PMID: 17069662 PMCID: PMC1643838 DOI: 10.1186/1471-2105-7-479] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/15/2006] [Accepted: 10/29/2006] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases. RESULTS Here, we present a hybrid method for genome-fingerprint scanning, known as HybGFS. This technique combines genome sequence-based peptide MS/MS ion searching with liquid-chromatography elution-time (LC-ET) prediction, to improve the reliability of identification. The hybrid method allows the simultaneous identification and mapping of proteins without a priori information about their coding sequences. The current study used standard LC-MS/MS data to query an in silico-generated six-reading-frame translation and the enzymatic digest of an entire genome. Used in conjunction with precursor/product ion-mass searching, the LC-ETs increased confidence in the peptide-identification process and reduced the number of false-positive matches. The power of this method was demonstrated using recombinant proteins from the Escherichia coli K12 strain. CONCLUSION The novel hybrid method described in this study will be useful for the large-scale experimental confirmation of genome coding sequences, without the need for transcriptome-level expression analysis or costly MS database searching.
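The six-reading-frame translation mentioned above can be sketched in a few lines. This is a minimal illustration that assumes the standard genetic code and an uppercase or lowercase A/C/G/T input; the function names are hypothetical and are not taken from HybGFS, and the enzymatic digest and mass-matching steps are omitted.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. containing N) become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map frame labels (+1..+3 forward, -1..-3 reverse strand) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

In a genome-scanning setting, each translated frame would then be cut at stop codons and digested in silico before peptide masses are compared with the spectra.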
|
175
|
Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res 2006; 16:1252-61. [PMID: 16954538 PMCID: PMC1581434 DOI: 10.1101/gr.5282906] [Citation(s) in RCA: 279] [Impact Index Per Article: 15.5] [Received: 03/03/2006] [Accepted: 05/22/2006] [Indexed: 11/25/2022]
Abstract
The DNA content of eukaryotic nuclei (C-value) varies approximately 200,000-fold, but there is only an approximately 20-fold variation in the number of protein-coding genes. Hence, most C-value variation is ascribed to the repetitive fraction, although little is known about the evolutionary dynamics of the specific components that lead to genome size variation. To understand the modes and mechanisms that underlie variation in genome composition, we generated sequence data from whole genome shotgun (WGS) libraries for three representative diploid (n = 13) members of Gossypium that vary in genome size from 880 to 2460 Mb (1C) and from a phylogenetic outgroup, Gossypioides kirkii, with an estimated genome size of 588 Mb. Copy number estimates including all dispersed repetitive sequences indicate that 40%-65% of each genome is composed of transposable elements. Inspection of individual sequence types revealed differential, lineage-specific expansion of various families of transposable elements among the different plant lineages. Copia-like retrotransposable element sequences have differentially accumulated in the Gossypium species with the smallest genome, G. raimondii, while gypsy-like sequences have proliferated in the lineages with larger genomes. Phylogenetic analyses demonstrated a pattern of lineage-specific amplification of particular subfamilies of retrotransposons within each species studied. One particular group of gypsy-like retrotransposon sequences, Gorge3 (Gossypium retrotransposable gypsy-like element), appears to have undergone a massive proliferation in two plant lineages, accounting for a major fraction of genome-size change. Like maize, Gossypium has undergone a threefold increase in genome size due to the accumulation of LTR retrotransposons over the 5-10 Myr since its origin.
Affiliation(s)
- Jennifer S. Hawkins
- Iowa State University, Department of Ecology, Evolution and Organismal Biology, Ames, Iowa 50011, USA
- HyeRan Kim
- University of Arizona, Department of Plant Sciences, Arizona Genomics Institute, Tucson, Arizona 85721, USA
- John D. Nason
- Iowa State University, Department of Ecology, Evolution and Organismal Biology, Ames, Iowa 50011, USA
- Rod A. Wing
- University of Arizona, Department of Plant Sciences, Arizona Genomics Institute, Tucson, Arizona 85721, USA
- Jonathan F. Wendel
- Iowa State University, Department of Ecology, Evolution and Organismal Biology, Ames, Iowa 50011, USA
|
176
|
Laurell C, Wirta V, Nilsson P, Lundeberg J. Comparative analysis of a 3' end tag PCR and a linear RNA amplification approach for microarray analysis. J Biotechnol 2006; 127:638-46. [PMID: 16997411 DOI: 10.1016/j.jbiotec.2006.08.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 01/27/2006] [Revised: 07/13/2006] [Accepted: 08/17/2006] [Indexed: 11/20/2022]
Abstract
BACKGROUND Various types of amplification techniques have been developed in order to enable microarray gene expression analysis when the amount of starting material is limited. The two main strategies are linear amplification, using in vitro transcription, and exponential amplification, based on PCR. We have evaluated the performance of a linear and an in-house developed exponential amplification protocol that relies on 3' end tag sequences. We used 100 ng total RNA as starting material for amplification and compared the results with data from hybridizations with unamplified mRNA and total RNA. RESULTS Preservation of expression ratios after amplification was examined by comparing log2 ratios obtained with amplification protocols to those obtained with standard labelling of mRNA. The Pearson correlations were 0.61 and 0.84, respectively, for the two linear amplification replicates and 0.76 and 0.80 for the two exponential amplification replicates. The correlation between repeated amplifications was 0.82 with the exponential method and 0.63 with the linear method, indicating a better reproducibility with the PCR-based approach. CONCLUSION Both amplification methods generated results in agreement with unamplified material. In this study, the PCR-based method was more reproducible than in vitro transcription amplification. Advantages of the in-house developed method are its lower cost, since it is non-commercial, and the compatibility of the PCR-generated product with both sense and antisense arrays.
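The comparison described in the results (Pearson correlation between log2 expression ratios from an amplified and an unamplified sample) can be sketched as follows. The intensity values and function names are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
import math


def log2_ratios(sample, reference):
    """Per-gene log2(sample / reference) for two equal-length intensity lists."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical two-channel intensities for five genes (illustrative only).
unamplified = log2_ratios([820, 150, 400, 95, 1300], [410, 300, 390, 380, 650])
amplified = log2_ratios([700, 180, 350, 120, 1500], [400, 310, 420, 360, 600])
print(f"Pearson r = {pearson(unamplified, amplified):.2f}")
```

A correlation near 1 would indicate that the amplification step preserved the expression ratios seen in the unamplified material.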
Affiliation(s)
- Cecilia Laurell
- School of Biotechnology, Department of Gene Technology, KTH, Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden.
|
177
|
Hui J, Bindereif A. Alternative pre-mRNA splicing in the human system: unexpected role of repetitive sequences as regulatory elements. Biol Chem 2006; 386:1265-71. [PMID: 16336120 DOI: 10.1515/bc.2005.143] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/15/2022]
Abstract
Alternative splicing is a process by which multiple messenger RNAs (mRNAs) are generated from a single pre-mRNA, resulting in functionally distinct protein products. This is accomplished by the differential recognition of splice sites in the pre-mRNA, often regulated in a tissue- or development-specific manner. Alternative splicing constitutes not only an important mechanism in controlling gene expression in humans, but also an essential source for increasing proteome diversity. In this review we summarize the underlying mechanistic principles, focussing on the cis-acting regulatory elements. In particular, the role of short sequence repeats, which are often polymorphic, in splicing regulation is discussed.
Affiliation(s)
- Jingyi Hui
- Institut für Biochemie, Justus-Liebig-Universität Giessen, Heinrich-Buff-Ring 58, D-35392 Giessen, Germany
|
178
|
Gilson PR, Su V, Slamovits CH, Reith ME, Keeling PJ, McFadden GI. Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc Natl Acad Sci U S A 2006; 103:9566-71. [PMID: 16760254 PMCID: PMC1480447 DOI: 10.1073/pnas.0600707103] [Citation(s) in RCA: 161] [Impact Index Per Article: 8.9] [Indexed: 11/18/2022] Open
Abstract
The introduction of plastids into different heterotrophic protists created lineages of algae that diversified explosively, proliferated in marine and freshwater environments, and radically altered the biosphere. The origins of these secondary plastids are usually inferred from the presence of additional plastid membranes. However, two examples provide unique snapshots of secondary-endosymbiosis-in-action, because they retain a vestige of the endosymbiont nucleus known as the nucleomorph. These are chlorarachniophytes and cryptomonads, which acquired their plastids from a green and red alga respectively. To allow comparisons between them, we have sequenced the nucleomorph genome from the chlorarachniophyte Bigelowiella natans: at a mere 373,000 bp and with only 331 genes, the smallest nuclear genome known and a model for extreme reduction. The genome is eukaryotic in nature, with three linear chromosomes containing densely packed genes with numerous overlaps. The genome is replete with 852 introns, but these are the smallest introns known, being only 18, 19, 20, or 21 nt in length. These pygmy introns are shown to be miniaturized versions of normal-sized introns present in the endosymbiont at the time of capture. Seventeen nucleomorph genes encode proteins that function in the plastid. The other nucleomorph genes are housekeeping entities, presumably underpinning maintenance and expression of these plastid proteins. Chlorarachniophyte plastids are thus serviced by three different genomes (plastid, nucleomorph, and host nucleus) requiring remarkable coordination and targeting. Although originating by two independent endosymbioses, chlorarachniophyte and cryptomonad nucleomorph genomes have converged upon remarkably similar architectures but differ in many molecular details that reflect two distinct trajectories to hypercompaction and reduction.
Affiliation(s)
- Paul R. Gilson
- Infection and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville 3050, Australia
- Vanessa Su
- School of Botany, University of Melbourne, Victoria 3010, Australia
- Claudio H. Slamovits
- Department of Botany, University of British Columbia, Vancouver, BC, Canada V6T 1Z4
- Michael E. Reith
- Institute for Marine Biosciences, National Research Council, Halifax, NS, Canada B3H 3Z1
- Patrick J. Keeling
- Department of Botany, University of British Columbia, Vancouver, BC, Canada V6T 1Z4
- Geoffrey I. McFadden
- School of Botany, University of Melbourne, Victoria 3010, Australia
- To whom correspondence should be addressed.
|
179
|
Chung BYW, Simons C, Firth AE, Brown CM, Hellens RP. Effect of 5'UTR introns on gene expression in Arabidopsis thaliana. BMC Genomics 2006; 7:120. [PMID: 16712733 PMCID: PMC1482700 DOI: 10.1186/1471-2164-7-120] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.0] [Received: 04/18/2006] [Accepted: 05/19/2006] [Indexed: 11/02/2022] Open
Abstract
BACKGROUND The majority of introns in gene transcripts are found within the coding sequences (CDSs). A small but significant fraction of introns are also found to reside within the untranslated regions (5'UTRs and 3'UTRs) of expressed sequences. Alignment of the whole genome and expressed sequence tags (ESTs) of the model plant Arabidopsis thaliana has identified introns residing in both coding and non-coding regions of the genome. RESULTS A bioinformatic analysis revealed some interesting observations: (1) the density of introns in 5'UTRs is similar to that in CDSs but much higher than that in 3'UTRs; (2) the 5'UTR introns are preferentially located close to the initiating ATG codon; (3) introns in the 5'UTRs are, on average, longer than introns in the CDSs and 3'UTRs; and (4) 5'UTR introns have a different nucleotide composition to that of CDS and 3'UTR introns. Furthermore, we show that the 5'UTR intron of the A. thaliana EF1alpha-A3 gene affects the gene expression and the size of the 5'UTR intron influences the level of gene expression. CONCLUSION Introns within the 5'UTR show specific features that distinguish them from introns that reside within the coding sequence and the 3'UTR. In the EF1alpha-A3 gene, the presence of a long intron in the 5'UTR is sufficient to enhance gene expression in plants in a size dependent manner.
Affiliation(s)
- Betty YW Chung
- Biochemistry Department, University of Otago, Dunedin, New Zealand
- Bioscience Institute, University College Cork, Cork, Ireland
- Cas Simons
- HortResearch, Auckland, New Zealand
- Institute of Molecular Biosciences, Brisbane, Australia
- Andrew E Firth
- Biochemistry Department, University of Otago, Dunedin, New Zealand
- Chris M Brown
- Biochemistry Department, University of Otago, Dunedin, New Zealand
|
180
|
Wang BB, Brendel V. Genomewide comparative analysis of alternative splicing in plants. Proc Natl Acad Sci U S A 2006; 103:7175-80. [PMID: 16632598 PMCID: PMC1459036 DOI: 10.1073/pnas.0602039103] [Citation(s) in RCA: 402] [Impact Index Per Article: 22.3] [Received: 09/06/2005] [Indexed: 11/18/2022] Open
Abstract
Alternative splicing (AS) has been extensively studied in mammalian systems but much less in plants. Here we report AS events deduced from EST/cDNA analysis in two model plants: Arabidopsis and rice. In Arabidopsis, 4,707 (21.8%) of the genes with EST/cDNA evidence show 8,264 AS events. Approximately 56% of these events are intron retention (IntronR), and only 8% are exon skipping. In rice, 6,568 (21.2%) of the expressed genes display 14,542 AS events, of which 53.5% are IntronR and 13.8% are exon skipping. The consistently high frequency of IntronR suggests prevalence of splice site recognition by intron definition in plants. Different AS events within a given gene occur, for the most part, independently. In total, 36-43% of the AS events produce transcripts that would be targets of the nonsense-mediated decay pathway, if that pathway were to operate in plants as in humans. Forty percent of Arabidopsis AS genes are alternatively spliced also in rice, with some examples strongly suggesting a role of the AS event as an evolutionarily conserved mechanism of posttranscriptional regulation. We created a comprehensive web-interfaced database to compile and visualize the evidence for alternative splicing in plants (Alternative Splicing in Plants, available at www.plantgdb.org/ASIP).
Affiliation(s)
- Bing-Bing Wang
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011-3260
- Volker Brendel
- Departments of Genetics, Development, and Cell Biology and Statistics, Iowa State University, Ames, IA 50011-3260
|
181
|
Sarmiento C, Nigul L, Kazantseva J, Buschmann M, Truve E. AtRLI2 is an endogenous suppressor of RNA silencing. PLANT MOLECULAR BIOLOGY 2006; 61:153-63. [PMID: 16786298 DOI: 10.1007/s11103-005-0001-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 08/05/2005] [Accepted: 12/28/2005] [Indexed: 05/10/2023]
Abstract
RNA silencing is a mechanism involved in gene regulation during development and anti-viral defense in plants and animals. Although many viral suppressors of this mechanism have been described up to now, this is not the case for endogenous suppressors. We have identified a novel endogenous suppressor in plants: RNase L inhibitor (RLI) of Arabidopsis thaliana. RLI is highly conserved among eukaryotes and archaea. It was first known as a component of the interferon-induced mammalian 2'-5' oligoadenylate (2-5A) anti-viral pathway. In several organisms this protein is responsible for essential functions unrelated to the 2-5A pathway, such as ribosome biogenesis and translation initiation. Arabidopsis has two RLI paralogs. We have described in detail the expression pattern of one of these paralogs (AtRLI2), which is ubiquitously expressed in all plant organs during different developmental stages. By infiltrating a Nicotiana benthamiana green fluorescent protein (GFP)-transgenic line with Agrobacterium strains harboring GFP and AtRLI2, we proved that AtRLI2 suppresses silencing at the local and at the systemic level, drastically reducing the amount of GFP small interfering RNAs.
Affiliation(s)
- Cecilia Sarmiento
- Department of Gene Technology, Tallinn University of Technology, Estonia.
|
182
|
Kuo BYL, Chen Y, Bohacec S, Johansson Ö, Wasserman WW, Simpson EM. SAGE2Splice: unmapped SAGE tags reveal novel splice junctions. PLoS Comput Biol 2006; 2:e34. [PMID: 16683015 PMCID: PMC1447652 DOI: 10.1371/journal.pcbi.0020034] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/09/2005] [Accepted: 03/08/2006] [Indexed: 11/18/2022] Open
Abstract
Serial analysis of gene expression (SAGE) not only is a method for profiling the global expression of genes, but also offers the opportunity for the discovery of novel transcripts. SAGE tags are mapped to known transcripts to determine the gene of origin. Tags that map neither to a known transcript nor to the genome were hypothesized to span a splice junction, for which the exon combination or exon(s) are unknown. To test this hypothesis, we have developed an algorithm, SAGE2Splice, to efficiently map SAGE tags to potential splice junctions in a genome. The algorithm consists of three search levels. A scoring scheme was designed based on position weight matrices to assess the quality of candidates. Using optimized parameters for SAGE2Splice analysis and two sets of SAGE data, candidate junctions were discovered for 5%-6% of unmapped tags. Candidates were classified into three categories, reflecting the previous annotations of the putative splice junctions. Analysis of predicted tags extracted from EST sequences demonstrated that candidate junctions having the splice junction located closer to the center of the tags are more reliable. Nine of these 12 candidates were validated by RT-PCR and sequencing, and among these, four revealed previously uncharacterized exons. Thus, SAGE2Splice provides a new functionality for the identification of novel transcripts and exons. SAGE2Splice is available online at http://www.cisreg.ca.
Affiliation(s)
- Byron Yu-Lin Kuo
- Genetics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Ying Chen
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Slavita Bohacec
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Öjvind Johansson
- Stockholm Bioinformatics Center, Kungliga Tekniska Högskolan, Albanova, Stockholm, Sweden
- Wyeth W Wasserman
- Genetics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Elizabeth M Simpson
- Genetics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- To whom correspondence should be addressed.
|
183
|
Abstract
There has been a lively debate over the evolution of eukaryote introns: at what point in the tree of life did they appear and from where, and what has been their subsequent pattern of loss and gain? A diverse range of recent research papers is relevant to this debate, and it is timely to bring them together. The absence of introns that are not self-splicing in prokaryotes and several other lines of evidence suggest an ancient eukaryotic origin for these introns, and the subsequent gain and loss of introns appears to be an ongoing process in many organisms. Some introns are now functionally important and there have been suggestions that invoke natural selection for the ancient and recent gain of introns, but it is also possible that fixation and loss of introns can occur in the absence of positive selection.
Affiliation(s)
- R Belshaw
- Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.
|
184
|
Agrawal R, Stormo GD. Using mRNAs lengths to accurately predict the alternatively spliced gene products in Caenorhabditis elegans. Bioinformatics 2006; 22:1239-44. [PMID: 16595562 DOI: 10.1093/bioinformatics/btl076] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Computational gene prediction methods are an important component of whole genome analyses. While ab initio gene finders have demonstrated major improvements in accuracy, the most reliable methods are evidence-based gene predictors. These algorithms can rely on several different sources of evidence including predictions from multiple ab initio gene finders, matches to known proteins, sequence conservation and partial cDNAs to predict the final product. Despite the success of these algorithms, prediction of complete gene structures, especially for alternatively spliced products, remains a difficult task. RESULTS LOCUS (Length Optimized Characterization of Unknown Spliceforms) is a new evidence-based gene finding algorithm which integrates a length-constraint into a dynamic programming-based framework for prediction of gene products. On a Caenorhabditis elegans test set of alternatively spliced internal exons, its performance exceeds that of current ab initio gene finders and in most cases can accurately predict the correct form of all the alternative products. As the length information used by the algorithm can be obtained in a high-throughput fashion, we propose that integration of such information into a gene-prediction pipeline is feasible and doing so may improve our ability to fully characterize the complete set of mRNAs for a genome. AVAILABILITY LOCUS is available from http://ural.wustl.edu/software.html
Affiliation(s)
- Ritesh Agrawal
- Department of Genetics, Washington University School of Medicine 660 S. Euclid, Campus Box 8232, St. Louis, MO 63110, USA
|
185
|
Titus TA, Selvig DR, Qin B, Wilson C, Starks AM, Roe BA, Postlethwait JH. The Fanconi anemia gene network is conserved from zebrafish to human. Gene 2006; 371:211-23. [PMID: 16515849 DOI: 10.1016/j.gene.2005.11.038] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 05/27/2005] [Revised: 10/24/2005] [Accepted: 11/30/2005] [Indexed: 11/28/2022]
Abstract
Fanconi anemia (FA) is a complex disease involving nine identified and two unidentified loci that define a network essential for maintaining genomic stability. To test the hypothesis that the FA network is conserved in vertebrate genomes, we cloned and sequenced zebrafish (Danio rerio) cDNAs and/or genomic BAC clones orthologous to all nine cloned FA genes (FANCA, FANCB, FANCC, FANCD1, FANCD2, FANCE, FANCF, FANCG, and FANCL), and identified orthologs in the genome database for the pufferfish Tetraodon nigroviridis. Genomic organization of exons and introns was nearly identical between zebrafish and human for all genes examined. Hydrophobicity plots revealed conservation of FA protein structure. Evolutionarily conserved regions identified functionally important domains, since many amino acid residues mutated in human disease alleles or shown to be critical in targeted mutagenesis studies are identical in zebrafish and human. Comparative genomic analysis demonstrated conserved syntenies for all FA genes. We conclude that the FA gene network has remained intact since the last common ancestor of zebrafish and human lineages. The application of powerful genetic, cellular, and embryological methodologies makes zebrafish a useful model for discovering FA gene functions, identifying new genes in the network, and identifying therapeutic compounds.
Affiliation(s)
- Tom A Titus
- Institute of Neuroscience, 1254 University of Oregon, Eugene, OR 97403, USA
|
186
|
Ko P, Narayanan M, Kalyanaraman A, Aluru S. Space-conserving optimal DNA-protein alignment. PROCEEDINGS. IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE 2006:80-8. [PMID: 16448002 DOI: 10.1109/csb.2004.1332420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
DNA-protein alignment algorithms can be used to discover coding sequences in a genomic sequence, if the corresponding protein derivatives are known. They can also be used to identify potential coding sequences of a newly sequenced genome, by using proteins from related species. Previously known algorithms either solve a simplified formulation, or sacrifice optimality to achieve practical implementation. In this paper, we present a comprehensive formulation of the DNA-protein alignment problem, and an algorithm to compute the optimal alignment in O(mn) time using only four tables of size (m + 1) x (n + 1), where m and n are the lengths of the DNA and protein sequences, respectively. We also developed a Protein and DNA Alignment program PanDA that implements the proposed solution. Experimental results indicate that our algorithm produces high quality alignments.
Affiliation(s)
- Pang Ko
- Department of Electrical and Computer Engineering, Iowa State University, USA.
|
187
|
Wei H, Fu Y, Arora R. Intron-flanking EST-PCR markers: from genetic marker development to gene structure analysis in Rhododendron. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2005; 111:1347-56. [PMID: 16167139 DOI: 10.1007/s00122-005-0064-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 04/06/2005] [Accepted: 07/26/2005] [Indexed: 05/04/2023]
Abstract
With a long-term goal of constructing a linkage map of Rhododendron enriched with gene-specific markers, we utilized Rhododendron catawbiense ESTs for the development of high-efficiency (in terms of generating polymorphism frequency) PCR-based markers. Using the gene-sequence alignment between Rhododendron ESTs and the genomic sequences of Arabidopsis homologs, we developed 'intron-flanking' EST-PCR-based primers that would anneal in conserved exon regions and amplify across the more highly diverged introns. These primers resulted in increased efficiency (61% vs. 13%; 4.7-fold) of polymorphism-detection compared with conventional EST-PCR methods, supporting the assumption that intron regions are more diverged than exons. Significantly, this study demonstrates that Arabidopsis genome database can be useful in developing gene-specific PCR-based markers for other non-model plant species for which the EST data are available but genomic sequences are not. The comparative analysis of intron sizes between Rhododendron and Arabidopsis (made possible in this study by aligning of Rhododendron ESTs with Arabidopsis genomic sequences and the sequencing of Rhododendron genomic PCR products) provides the first insight into the gene structure of Rhododendron.
Affiliation(s)
- Hui Wei
- Department of Horticulture, Iowa State University, Ames, IA 50011, USA
|
188
|
Fox-Walsh KL, Dou Y, Lam BJ, Hung SP, Baldi PF, Hertel KJ. The architecture of pre-mRNAs affects mechanisms of splice-site pairing. Proc Natl Acad Sci U S A 2005; 102:16176-81. [PMID: 16260721 PMCID: PMC1283478 DOI: 10.1073/pnas.0508489102] [Citation(s) in RCA: 185] [Impact Index Per Article: 9.7] [Indexed: 11/18/2022] Open
Abstract
The exon/intron architecture of genes determines whether components of the spliceosome recognize splice sites across the intron or across the exon. Using in vitro splicing assays, we demonstrate that splice-site recognition across introns ceases when intron size is between 200 and 250 nucleotides. Beyond this threshold, splice sites are recognized across the exon. Splice-site recognition across the intron is significantly more efficient than splice-site recognition across the exon, resulting in enhanced inclusion of exons with weak splice sites. Thus, intron size can profoundly influence the likelihood that an exon is constitutively or alternatively spliced. An EST-based alternative-splicing database was used to determine whether the exon/intron architecture influences the probability of alternative splicing in the Drosophila and human genomes. Drosophila exons flanked by long introns display an up to 90-fold-higher probability of being alternatively spliced compared with exons flanked by two short introns, demonstrating that the exon/intron architecture in Drosophila is a major determinant in governing the frequency of alternative splicing. Exon skipping is also more likely to occur when exons are flanked by long introns in the human genome. Interestingly, experimental and computational analyses show that the length of the upstream intron is more influential in inducing alternative splicing than is the length of the downstream intron. We conclude that the size and location of the flanking introns control the mechanism of splice-site recognition and influence the frequency and the type of alternative splicing that a pre-mRNA transcript undergoes.
Affiliation(s)
- Kristi L Fox-Walsh
- Department of Microbiology and Molecular Genetics, University of California, Irvine, CA 92697-4025, USA
|
189
|
Neuman S, Kovalio M, Yaffe D, Nudel U. The Drosophila homologue of the dystrophin gene - introns containing promoters are the major contributors to the large size of the gene. FEBS Lett 2005; 579:5365-71. [PMID: 16198353 DOI: 10.1016/j.febslet.2005.08.073] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 08/04/2005] [Revised: 08/25/2005] [Accepted: 08/29/2005] [Indexed: 11/29/2022]
Abstract
We show that the Drosophila gene encoding the dystrophin-like protein (DLP) is as complex as the mammalian dystrophin gene. Three 5' promoters and three internal promoters regulate the expression of three full-length and three truncated products, respectively. The existence of this complex gene structure in such evolutionarily remote organisms suggests that both types of products have diverse important functions. The promoters of both the DLP gene and the mammalian dystrophin gene are located in very large introns. These introns contribute significantly to the large size of the genes. The possible relevance of the conservation of the large size of introns containing promoters to the regulation of promoter activity is discussed.
Affiliation(s)
- Sara Neuman
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
|
190
|
Freitas TC, Arasu P. Cloning and characterisation of genes encoding two transforming growth factor-beta-like ligands from the hookworm, Ancylostoma caninum. Int J Parasitol 2005; 35:1477-87. [PMID: 16140304 DOI: 10.1016/j.ijpara.2005.07.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 05/13/2005] [Revised: 06/24/2005] [Accepted: 07/25/2005] [Indexed: 10/25/2022]
Abstract
To elucidate the role of transforming growth factor beta (TGF-beta) signalling in the arrest/reactivation pathway of the Ancylostoma caninum hookworm, two parasite-encoded TGF-beta-like ligands were cloned and characterised. Ac-dbl-1 showed 60% amino acid identity to the Caenorhabditis elegans dbl-1 gene, which regulates growth, while Ac-daf-7 showed 46% amino acid identity to Ce-daf-7, which regulates arrested development. Exon/intron organisation of the genes for Ac-dbl-1 and Ac-daf-7 was different from that of the corresponding C. elegans genes with nine and 10 exons, respectively, and introns ranging in size from 56 to 2,556 bp. Based on real-time reverse transcriptase (RT)-PCR, Ac-dbl-1 and Ac-daf-7 were expressed in all stages tested, i.e. egg, first/second stage larvae (L1/L2), infective third stage larvae (iL3), serum-stimulated third stage larvae (ssL3), and male and female adult worms. Expression of Ac-dbl-1 peaked in the adult male stage suggesting a similar role to Ce-dbl-1 in regulating male tail patterning. Ac-daf-7 expression was at a maximum in the arrested iL3 and reactivated ssL3 stages, which differs from that of Ce-daf-7 expression and may be unique to parasitic nematodes that have an obligate requirement to undergo developmental arrest. In support of the PCR results, antibodies to the A. caninum TGF-beta-like ligands detected proteins in iL3, ssL3, and adult worm extracts. Immunofluorescent studies showed that Ac-daf-7 is expressed in the anterior region of the iL3 similar to Ce-daf-7, which is localised to the ASI chemosensory neurons.
Affiliation(s)
- Tori C Freitas
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, 27606, USA
|
191
|
Boeckmann B, Blatter MC, Famiglietti L, Hinz U, Lane L, Roechert B, Bairoch A. Protein variety and functional diversity: Swiss-Prot annotation in its biological context. C R Biol 2005; 328:882-99. [PMID: 16286078 DOI: 10.1016/j.crvi.2005.06.001] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Received: 05/13/2005] [Revised: 06/01/2005] [Accepted: 06/05/2005] [Indexed: 11/25/2022]
Abstract
We all know that the dogma 'one gene, one protein' is obsolete. A functional protein and, likewise, a protein's ultimate function depend not only on the underlying genetic information but also on the ongoing conditions of the cellular system. Frequently the transcript, like the polypeptide, is processed in multiple ways, but only one or a few out of a multitude of possible variants are produced at a time. An overview on processes that can lead to sequence variety and structural diversity in eukaryotes is given. The UniProtKB/Swiss-Prot protein knowledgebase provides a wealth of information regarding protein variety, function and associated disorders. Examples for such annotation are shown and further ones are available at http://www.expasy.org/sprot/tutorial/examples_CRB.
Affiliation(s)
- Brigitte Boeckmann
- Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1, rue Michel-Servet, 1211 Genève 4, Switzerland.
|
192
|
Rajic ZA, Jankovic GM, Vidovic A, Milic NM, Skoric D, Pavlovic M, Lazarevic V. Size of the protein-coding genome and rate of molecular evolution. J Hum Genet 2005; 50:217-229. [PMID: 15883855 DOI: 10.1007/s10038-005-0242-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/30/2004] [Accepted: 02/17/2005] [Indexed: 11/25/2022]
Abstract
In diploid populations of size N, there will be 2 Nmu mutations per nucleotide (nt) site (or per locus) per generation (mu stands for mutation rate). If either the population or the coding genome double in size, one expects 4 Nmu mutations. What is important is not the population size per se but the number of genes (coding sites), the two being often interconverted. Here we compared the total physical length of protein-coding genomes (n) with the corresponding absolute rates of synonymous substitution (K(S)), an empirical neutral reference. In the classical occupancy problem and in the coupons collector (CC) problem, n was expressed as the mean rate of change (K(CC)). Despite inherently very low power of the approaches involving averaging of rates, the mode of molecular evolution of the total size phenotype of the coding genome could be evidenced through differences between the genomic estimates of K(CC) [K(CC)=1/(ln n + 0.57721) n] and rate of molecular evolution, K(S). We found that (1) the estimates of n and K(S) are reciprocally correlated across taxa (r=0.812; p<< 0.001); (2) the gamete-cell division hypothesis (Chang et al. Proc Natl Acad Sci USA 91:827-831, 1994) can be confirmed independently in terms of K(CC)/K(S) ratios; (3) the time scale of molecular evolution changes with change in mutation rate, as previously shown by Takahata (Proc Natl Acad Sci USA 87:2419-2423, 1990), Takahata et al. (Genetics 130:925-938, 1992), and Vekemans and Slatkin (Genetics 137:1157-1165, 1994); (4) the generation time and population size (Lynch and Conery, Science 302:1401-1404, 2003) effects left their "signatures" at the level of the size phenotype of the protein-coding genome.
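The K(CC) expression quoted in the abstract is ambiguous as typeset here. A minimal Python sketch, assuming it denotes the reciprocal of the standard coupon-collector expectation n(ln n + 0.57721); this reading and the function names are illustrative assumptions, not taken from the paper:

```python
import math

EULER_GAMMA = 0.57721  # Euler-Mascheroni constant, truncated as in the abstract


def expected_draws_exact(n: int) -> float:
    """Exact coupon-collector expectation n * H_n (H_n is the n-th harmonic number)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def expected_draws_approx(n: int) -> float:
    """Asymptotic approximation n * (ln n + gamma) of the same expectation."""
    return n * (math.log(n) + EULER_GAMMA)


def k_cc(n: int) -> float:
    """Assumed reading of K(CC): 1 / (n * (ln n + 0.57721))."""
    return 1.0 / expected_draws_approx(n)
```

Under this reading, K(CC) falls as the number of coding sites n grows, which is at least consistent with the reciprocal relationship between n and K(S) that the abstract reports.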
Affiliation(s)
- Zoran A Rajic
- Institute of Hematology, University Clinical Center, University of Belgrade, ul. Dr. Koste Todorovica br. 2, 11000, Belgrade, Serbia
- Gradimir M Jankovic
- Institute of Hematology, University Clinical Center, University of Belgrade, ul. Dr. Koste Todorovica br. 2, 11000, Belgrade, Serbia.
- Ana Vidovic
- Institute of Hematology, University Clinical Center, University of Belgrade, ul. Dr. Koste Todorovica br. 2, 11000, Belgrade, Serbia
- Natasa M Milic
- Faculty of Medicine, Institute for Medical Statistics and Informatics, Belgrade, Serbia
- Dejan Skoric
- University Children's Hospital, University of Belgrade, Belgrade, Serbia
- Milorad Pavlovic
- Institute of Hematology, University Clinical Center, University of Belgrade, ul. Dr. Koste Todorovica br. 2, 11000, Belgrade, Serbia
|
193
|
Szafranski K, Lehmann R, Parra G, Guigo R, Glöckner G. Gene organization features in A/T-rich organisms. J Mol Evol 2005; 60:90-8. [PMID: 15696371 DOI: 10.1007/s00239-004-0201-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 12/19/2003] [Accepted: 08/18/2004] [Indexed: 10/25/2022]
Abstract
Several species have genomes in which the four nucleotides are not equally represented (Glöckner 2000). Interestingly, shifts to very high A/T or G/C levels can occur in several distinct branches of the tree of life. The underlying reasons for these shifts therefore may be of different origin. Now entire chromosome sequences from two different A/T-rich genomes, Dictyostelium discoideum and Plasmodium falciparum, are available (Bowman et al. 1999; Gardner et al. 2002; Glöckner et al. 2002). This gives us the opportunity to investigate how a high A/T content may influence the signals that are the landmarks for gene specification. We found that, in contrast with most known metazoan and plant genomes, splice signals contain little information other than the canonical GT-AG dinucleotides. Intron lengths in A/T rich organisms, on the other hand, are comparable to those of other lower eukaryotes. Intergenic regions show, dependent on the orientation of adjacent genes, a size pattern with a ratio of 1 (3'-3') to 2 (3'-5') to 3 (5'-5'). Overall, gene organization patterns seem not to be influenced by the A/T bias. Surprisingly, the slightly higher A/T content of the P. falciparum genome compared to that of D. discoideum (80.1 versus 77.4%) is not achieved by increased A/T richness in intergenic regions. Instead both the shift of the nucleotide usage in coding regions to A/T-rich codons and the longer intergenic regions make an equal contribution to the higher A/T content in this organism.
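The A/T percentages compared in the abstract (80.1 versus 77.4%) are simple base-composition figures. A minimal Python sketch of how such a percentage might be computed for one sequence; the function name and the choice to ignore ambiguous bases are assumptions for illustration, not the authors' procedure:

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")  # skip N and other codes
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted
```

Applying the same function separately to coding and intergenic segments would give the kind of per-region comparison the abstract describes.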
Affiliation(s)
- Karol Szafranski
- Department of Genome Analysis, Institute for Molecular Biotechnology Jena, Beutenbergstr. 11, D-07745 Jena, Germany
|
194
|
Nakayashiki H, Hanada S, Nguyen BQ, Kadotani N, Tosa Y, Mayama S. RNA silencing as a tool for exploring gene function in ascomycete fungi. Fungal Genet Biol 2005; 42:275-83. [PMID: 15749047 DOI: 10.1016/j.fgb.2005.01.002] [Citation(s) in RCA: 205] [Impact Index Per Article: 10.8] [Received: 08/27/2004] [Revised: 12/08/2004] [Accepted: 01/01/2005] [Indexed: 12/01/2022]
Abstract
We have developed a pHANNIBAL-like silencing vector, pSilent-1, for ascomycete fungi, which carries a hygromycin resistance cassette and a transcriptional unit for hairpin RNA expression with a spacer of a cutinase gene intron from the rice blast fungus Magnaporthe oryzae. In M. oryzae, a silencing vector with the cutinase intron spacer (147 bp) showed a higher efficiency in silencing of the eGFP gene than did those with a spacer of a GUS gene fragment or a longer intron (850 bp) of a chitin binding protein gene. Application of pSilent-1 to two M. oryzae endogenous genes, MPG1 and polyketide synthase-like gene, resulted in various degrees of silencing of the genes in 70-90% of the resulting transformants. RNA silencing was also induced by a pSilent-1-based vector in Colletotrichum lagenarium at a slightly lower efficiency than in M. oryzae, indicating that this silencing vector should provide a useful reverse genetic tool in ascomycete fungi.
Affiliation(s)
- Hitoshi Nakayashiki
- Laboratory of Plant Pathology, Kobe University, 1-1 Rokkodaicho, Nada, 657-8501 Kobe, Japan.
|
195
|
Burnette JM, Miyamoto-Sato E, Schaub MA, Conklin J, Lopez AJ. Subdivision of large introns in Drosophila by recursive splicing at nonexonic elements. Genetics 2005; 170:661-74. [PMID: 15802507 PMCID: PMC1450422 DOI: 10.1534/genetics.104.039701] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Abstract
Many genes with important roles in development and disease contain exceptionally long introns, but special mechanisms for their expression have not been investigated. We present bioinformatic, phylogenetic, and experimental evidence in Drosophila for a mechanism that subdivides many large introns by recursive splicing at nonexonic elements and alternative exons. Recursive splice sites predicted with highly stringent criteria are found at much higher frequency than expected in the sense strands of introns >20 kb, but they are found only at the expected frequency on the antisense strands, and they are underrepresented within introns <10 kb. The predicted sites in long introns are highly conserved between Drosophila melanogaster and Drosophila pseudoobscura, despite extensive divergence of other sequences within the same introns. These patterns of enrichment and conservation indicate that recursive splice sites are advantageous in the context of long introns. Experimental analyses of in vivo processing intermediates and lariat products from four large introns in the unrelated genes kuzbanian, outspread, and Ultrabithorax confirmed that these introns are removed by a series of recursive splicing steps using the predicted nonexonic sites. Mutation of nonexonic site RP3 within Ultrabithorax also confirmed that recursive splicing is the predominant processing pathway even with a shortened version of the intron. We discuss currently known and potential roles for recursive splicing.
Affiliation(s)
- James M Burnette
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
196
|
Nicholson MJ, Theodorou MK, Brookman JL. Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome. MICROBIOLOGY-SGM 2005; 151:121-133. [PMID: 15632432 DOI: 10.1099/mic.0.27353-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022]
Abstract
The anaerobic gut fungi occupy a unique niche in the intestinal tract of large herbivorous animals and are thought to act as primary colonizers of plant material during digestion. They are the only known obligately anaerobic fungi but molecular analysis of this group has been hampered by difficulties in their culture and manipulation, and by their extremely high A+T nucleotide content. This study begins to answer some of the fundamental questions about the structure and organization of the anaerobic gut fungal genome. Directed plasmid libraries using genomic DNA digested with highly or moderately rich AT-specific restriction enzymes (VspI and EcoRI) were prepared from a polycentric Orpinomyces isolate. Clones were sequenced from these libraries and the breadth of genomic inserts, both genic and intergenic, was characterized. Genes encoding numerous functions not previously characterized for these fungi were identified, including cytoskeletal, secretory pathway and transporter genes. A peptidase gene with no introns and having sequence similarity to a gene encoding a bacterial peptidase was also identified, extending the range of metabolic enzymes resulting from apparent trans-kingdom transfer from bacteria to fungi, as previously characterized largely for genes encoding plant-degrading enzymes. This paper presents the first thorough analysis of the genic, intergenic and rDNA regions of a variety of genomic segments from an anaerobic gut fungus and provides observations on rules governing intron boundaries, the codon biases observed with different types of genes, and the sequence of only the second anaerobic gut fungal promoter reported. Large numbers of retrotransposon sequences of different types were found and the authors speculate on the possible consequences of any such transposon activity in the genome. 
The coding sequences identified included several orphan gene sequences, including one with regions strongly suggestive of structural proteins such as collagens and lampirin. This gene was present as a single copy in Orpinomyces, was expressed during vegetative growth and was also detected in genomes from another gut fungal genus, Neocallimastix.
Affiliation(s)
- Matthew J Nicholson
- School of Biological Sciences, University of Manchester, 1.800 Stopford Building, Oxford Road, Manchester M13 9PT, UK
- Institute of Grassland and Environmental Research, Plas Gogerddan, Aberystwyth, Ceredigion SY23 3EB, UK
- Michael K Theodorou
- Institute of Grassland and Environmental Research, Plas Gogerddan, Aberystwyth, Ceredigion SY23 3EB, UK
- Jayne L Brookman
- Institute of Grassland and Environmental Research, Plas Gogerddan, Aberystwyth, Ceredigion SY23 3EB, UK
|
197
|
Rensing SA, Fritzowsky D, Lang D, Reski R. Protein encoding genes in an ancient plant: analysis of codon usage, retained genes and splice sites in a moss, Physcomitrella patens. BMC Genomics 2005; 6:43. [PMID: 15784153 PMCID: PMC1079823 DOI: 10.1186/1471-2164-6-43] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 09/24/2004] [Accepted: 03/22/2005] [Indexed: 11/18/2022] Open
Abstract
Background The moss Physcomitrella patens is an emerging plant model system due to its high rate of homologous recombination, haploidy, simple body plan, physiological properties as well as phylogenetic position. Available EST data was clustered and assembled, and provided the basis for a genome-wide analysis of protein encoding genes. Results We have clustered and assembled Physcomitrella patens EST and CDS data in order to represent the transcriptome of this non-seed plant. Clustering of the publicly available data and subsequent prediction resulted in a total of 19,081 non-redundant ORF. Of these putative transcripts, approximately 30% have a homolog in both rice and Arabidopsis transcriptome. More than 130 transcripts are not present in seed plants but can be found in other kingdoms. These potential "retained genes" might have been lost during seed plant evolution. Functional annotation of these genes reveals unequal distribution among taxonomic groups and intriguing putative functions such as cytotoxicity and nucleic acid repair. Whereas introns in the moss are larger on average than in the seed plant Arabidopsis thaliana, position and amount of introns are approximately the same. Contrary to Arabidopsis, where CDS contain on average 44% G/C, in Physcomitrella the average G/C content is 50%. Interestingly, moss orthologs of Arabidopsis genes show a significant drift of codon fraction usage, towards the seed plant. While averaged codon bias is the same in Physcomitrella and Arabidopsis, the distribution pattern is different, with 15% of moss genes being unbiased. Species-specific, sensitive and selective splice site prediction for Physcomitrella has been developed using a dataset of 368 donor and acceptor sites, utilizing a support vector machine. The prediction accuracy is better than those achieved with tools trained on Arabidopsis data. 
Conclusion Analysis of the moss transcriptome displays differences in gene structure, codon and splice site usage in comparison with the seed plant Arabidopsis. Putative retained genes exhibit possible functions that might explain the peculiar physiological properties of mosses. Both the transcriptome representation (including a BLAST and retrieval service) and splice site prediction have been made available on , setting the basis for assembly and annotation of the Physcomitrella genome, of which draft shotgun sequences will become available in 2005.
Affiliation(s)
- Stefan A Rensing
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany
- Dana Fritzowsky
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany
- Daniel Lang
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany
- Ralf Reski
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany
|
198
|
Kupfer DM, Drabenstot SD, Buchanan KL, Lai H, Zhu H, Dyer DW, Roe BA, Murphy JW. Introns and splicing elements of five diverse fungi. EUKARYOTIC CELL 2005; 3:1088-100. [PMID: 15470237 PMCID: PMC522613 DOI: 10.1128/ec.3.5.1088-1100.2004] [Citation(s) in RCA: 202] [Impact Index Per Article: 10.6] [Indexed: 11/20/2022]
Abstract
Genomic sequences and expressed sequence tag data for a diverse group of fungi (Saccharomyces cerevisiae, Schizosaccharomyces pombe, Aspergillus nidulans, Neurospora crassa, and Cryptococcus neoformans) provided the opportunity to accurately characterize conserved intronic elements. An examination of large intron data sets revealed that fungal introns in general are short, that 98% or more of them belong to the canonical splice site (ss) class (5'GU...AG3'), and that they have polypyrimidine tracts predominantly in the region between the 5' ss and the branch point. Information content is high in the 5' ss, branch site, and 3' ss regions of the introns but low in the exon regions adjacent to the introns in the fungi examined. The two yeasts have broader intron length ranges and correspondingly higher intron information content than the other fungi. Generally, as intron length increases in the fungi, so does intron information content. Homologs of U2AF spliceosomal proteins were found in all species except for S. cerevisiae, suggesting a nonconventional role for U2AF in the absence of canonical polypyrimidine tracts in the majority of introns. Our observations imply that splicing in fungi may be different from that in vertebrates and may require additional proteins that interact with polypyrimidine tracts upstream of the branch point. Theoretical protein homologs for Nam8p and TIA-1, two proteins that require U-rich regions upstream of the branch point to function, were found. There appear to be sufficient differences between S. cerevisiae and S. pombe introns and the introns of two filamentous members of the Ascomycota and one member of the Basidiomycota to warrant the development of new model organisms for studying the splicing mechanisms of fungi.
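The canonical splice-site class mentioned in the abstract (5'GU...AG3') can be checked mechanically on intron sequences. A minimal Python sketch, assuming introns are supplied as plain strings in 5' to 3' orientation; the function names and the toy sequences are hypothetical and are not data from the study:

```python
def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts with GT/GU and ends with AG (case-insensitive)."""
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA alphabets
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def canonical_fraction(introns: list[str]) -> float:
    """Fraction of the supplied introns that have canonical GT...AG boundaries."""
    if not introns:
        raise ValueError("no introns supplied")
    return sum(is_canonical_intron(i) for i in introns) / len(introns)


# Made-up toy sequences for illustration only.
toy_introns = [
    "GTAAGTTTCTAACTTTAG",  # GT...AG, canonical
    "guaugucuuacuaacag",   # GU...AG in RNA notation, canonical
    "GCAAGTTTACTAACTTAG",  # GC...AG, non-canonical
    "ATATCCTTTTAC",        # AT...AC, non-canonical
]
print(canonical_fraction(toy_introns))
```

A boundary check like this only looks at the terminal dinucleotides; it says nothing about branch points or polypyrimidine tracts, which the abstract treats separately.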
Affiliation(s)
- Doris M Kupfer
- Department of Microbiology and Immunology, University of Oklahoma Health Sciences Center, P.O. Box 26901, BMSB 1053, Oklahoma City, OK 73190, USA
|
199
|
Vanácová S, Yan W, Carlton JM, Johnson PJ. Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc Natl Acad Sci U S A 2005; 102:4430-5. [PMID: 15764705 PMCID: PMC554003 DOI: 10.1073/pnas.0407500102] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022] Open
Abstract
Eukaryotes have evolved elaborate splicing mechanisms to remove introns that would otherwise destroy the protein-coding capacity of genes. Nuclear pre-mRNA splicing requires sequence motifs in the intron and is mediated by a ribonucleoprotein complex, the spliceosome. Here we demonstrate the presence of a splicing apparatus in the protist Trichomonas vaginalis and show that RNA motifs found in yeast and metazoan introns are required for splicing. We also describe the first introns in this deep-branching lineage. The positions of these introns are often conserved in orthologous genes, indicating they were present in a common ancestor of trichomonads, yeast, and metazoa. All examined T. vaginalis introns have a highly conserved 12-nt 3' splice-site motif that encompasses the branch point and is necessary for splicing. This motif is also found in the only described intron in a gene from another deep-branching eukaryote, Giardia intestinalis. These studies demonstrate the conservation of intron splicing signals across large evolutionary distances, reveal unexpected motif conservation in deep-branching lineages that suggests a simplified mechanism of splicing in primitive unicellular eukaryotes, and support the presence of introns in the earliest eukaryote.
Affiliation(s)
- Stepánka Vanácová
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA
|
200
|
Wang SZ, Roberts RM. The evolution of the Sin1 gene product, a little known protein implicated in stress responses and type I interferon signaling in vertebrates. BMC Evol Biol 2005; 5:13. [PMID: 15698473 PMCID: PMC549548 DOI: 10.1186/1471-2148-5-13] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/14/2004] [Accepted: 02/07/2005] [Indexed: 11/10/2022] Open
Abstract
Background In yeast, birds and mammals, the SAPK-interacting protein 1 (Sin1) gene product has been implicated as a component of the stress-activated protein kinase (SAPK) signal transduction pathway. Recently, Sin1 has also been shown to interact with the carboxyl terminal end of the cytoplasmic domain of the ovine type I interferon receptor subunit 2 (IFNAR2). However, the function of Sin1 remains unknown. Since SAPK pathways are ancient and the IFN system is confined to vertebrates, the organization of the Sin1 gene and the sequences of the Sin1 protein have been compared across a wide taxonomic range of species. Results Sin1 is represented, apparently as a single gene, in all metazoan species and fungi but is not detectable in protozoa, prokaryotes, or plants. Sin1 is highly conserved in vertebrates (79–99% identity at amino acid level), which possess an interferon system, suggesting that it has been subjected to powerful evolutionary constraint that has limited its diversification. Sin1 possesses at least two unique sequences in its IFNAR2-interacting region that are not represented in insects and other invertebrates. Sequence alignment between vertebrates and insects revealed five Sin1 strongly conserved domains (SCDs I-V), but an analysis of any of these domains failed to identify known functional protein motifs. SCD III, which is approximately 129 amino acids in length, is particularly highly conserved and is present in all the species examined, suggesting a conserved function from fungi to mammals. The coding region of the vertebrate Sin1 gene encompasses 11 exons and 10 introns, while in C. elegans the gene consists of 10 exons and 9 introns organized distinctly from those of vertebrates. In yeast and insects, Sin1 is intronless. Conclusions The study reveals the phylogeny of a little studied gene which has recently been implicated in two important signal transduction pathways, one ancient (stress response), one relatively new (interferon signaling).
Affiliation(s)
- Shu-Zong Wang
- Veterinary Pathobiology, University of Missouri, Columbia, USA
- Center for Developmental Biology, University of Texas Southwestern, Medical Center, Dallas USA
- R Michael Roberts
- Veterinary Pathobiology, University of Missouri, Columbia, USA
- Biochemistry, University of Missouri, Columbia, USA
- Animal Sciences, University of Missouri, Columbia, USA
|