101
Simpson CG, Jennings SN, Clark GP, Thow G, Brown JWS. Dual functionality of a plant U-rich intronic sequence element. THE PLANT JOURNAL: FOR CELL AND MOLECULAR BIOLOGY 2004; 37:82-91. [PMID: 14675434 DOI: 10.1046/j.1365-313x.2003.01941.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 05/24/2023]
Abstract
In potato invertase genes, the constitutively included, 9-nucleotide (nt)-long mini-exon requires a strong branchpoint and U-rich polypyrimidine tract for inclusion. The strength of these splicing signals was demonstrated by greatly enhanced splicing of a poorly spliced intron and by their ability to support splicing of an artificial mini-exon, following their introduction. Plant introns also require a second splicing signal, UA-rich intronic elements, for efficient intron splicing. Mutation of the branchpoint caused loss of mini-exon inclusion without loss of splicing enhancement, showing that the same U-rich sequence can function as either a polypyrimidine tract or a UA-rich intronic element. The distinction between the splicing signals depended on intron context (the presence or absence of an upstream, adjacent and functional branchpoint), and on the sequence context of the U-rich elements. Polypyrimidine tracts tolerated C residues while UA-rich intronic elements tolerated As. Thus, in plant introns, U-rich splicing elements can have dual roles as either a general plant U-rich splicing signal or a polypyrimidine tract. Finally, overexpression of two different U-rich binding proteins enhanced intron recognition significantly. These results highlight the importance of co-operation between splicing signals, the importance of other nucleotides within U-rich elements for optimal binding of competing splicing factors and effects on splicing efficiency of U-rich binding proteins.
Affiliation(s)
- Craig G Simpson
- Gene Expression, Scottish Crop Research Institute, Invergowrie, Dundee, DD2 5DA Scotland, UK
102
Fecht-Christoffers MM, Braun HP, Lemaitre-Guillier C, VanDorsselaer A, Horst WJ. Effect of manganese toxicity on the proteome of the leaf apoplast in cowpea. PLANT PHYSIOLOGY 2003; 133:1935-46. [PMID: 14605229 PMCID: PMC300745 DOI: 10.1104/pp.103.029215] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.5] [Received: 06/26/2003] [Revised: 07/21/2003] [Accepted: 08/27/2003] [Indexed: 05/19/2023]
Abstract
Excess manganese (Mn) supply causes formation of visible brown depositions in the cell walls of leaves of cowpea (Vigna unguiculata), which consist of oxidized Mn and oxidized phenols. Because oxidation of Mn and phenolic compounds in the leaf apoplast was proposed to be catalyzed by apoplastic peroxidases (PODs), induction of these enzymes by Mn excess was investigated. POD activity increased upon prolonged Mn treatment in the leaf tissue. Simultaneously, a significant increase in the concentration of soluble apoplastic proteins in "apoplastic washing fluid" was observed. The identity of the released proteins was systematically characterized by analysis of the apoplast proteome using two-dimensional gel electrophoresis and liquid chromatography-tandem mass spectrometry. Some of the identified proteins exhibit sequence identity to acidic PODs from other plants. Several other proteins show homologies to pathogenesis-related proteins, e.g. glucanase, chitinase, and thaumatin-like proteins. Because pathogenesis-related-like proteins are known to be induced by various other abiotic and biotic stresses, a specific physiological role of these proteins in response to excess Mn supply remains to be established. The specific role of apoplastic PODs in the response of plants to Mn stress is discussed.
103
Tang W, Luo X, Nelson A, Collver H, Kinken K. Functional genomics of wood quality and properties. GENOMICS, PROTEOMICS & BIOINFORMATICS 2003; 1:263-78. [PMID: 15629055 PMCID: PMC5172417 DOI: 10.1016/s1672-0229(03)01032-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
Genomics promises to enrich the investigations of biology and biochemistry. Current advancements in genomics have major implications for genetic improvement in animals, plants, and microorganisms, and for our understanding of cell growth, development, differentiation, and communication. Significant progress has been made in the understanding of plant genomics in recent years, and the area continues to progress rapidly. Functional genomics offers enormous potential to tree improvement and the understanding of gene expression in this area of science worldwide. In this review we focus on functional genomics of wood quality and properties in trees, mainly based on progresses made in genomics study of Pinus and Populus. The aims of this review are to summarize the current status of functional genomics including: (1) Gene discovery; (2) EST and genomic sequencing; (3) From EST to functional genomics; (4) Approaches to functional analysis; (5) Engineering lignin biosynthesis; (6) Modification of cell wall biogenesis; and (7) Molecular modelling. Functional genomics has been greatly invested worldwide and will be important in identifying candidate genes whose function is critical to all aspects of plant growth, development, differentiation, and defense. Forest biotechnology industry will significantly benefit from the advent of functional genomics of wood quality and properties.
Affiliation(s)
- Wei Tang
- Department of Biology, Howell Science Complex, East Carolina University, Greenville, NC 27858, USA.
104
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 2003; 31:5654-66. [PMID: 14500829 PMCID: PMC206470 DOI: 10.1093/nar/gkg770] [Citation(s) in RCA: 1244] [Impact Index Per Article: 59.2] [Indexed: 11/13/2022]
Abstract
The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the approximately 27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
Affiliation(s)
- Brian J Haas
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
105
Xu R, Li QQ. A RING-H2 zinc-finger protein gene RIE1 is essential for seed development in Arabidopsis. PLANT MOLECULAR BIOLOGY 2003; 53:37-50. [PMID: 14756305 DOI: 10.1023/b:plan.0000009256.01620.a6] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Indexed: 05/09/2023]
Abstract
RING zinc-finger proteins play important roles in the regulation of development in a variety of organisms. In the plant kingdom, few genes encoding RING zinc-finger proteins have been documented with visible effects on plant growth and development. A novel gene, RIE1, encoding a RING-H2 zinc-finger protein was identified in Arabidopsis thaliana and is characterized in this paper. RIE1 encodes a predicted protein product of 359 amino acid residues with a molecular mass of 40 kDa, with a RING-H2 zinc-finger motif located at the extreme end of the C-terminus. Characterization of a Dissociation (Ds) insertion line (SGT4559) and a T-DNA insertion line (SRIE1) demonstrated that disruption of RIE1 is embryo-lethal. SGT4559 heterozygous plants produced seeds with embryo development arrested from globular to torpedo stages. Some mutant seeds were rescued by embryo culture, and the mutant (rie1) plants seemed to grow normally compared to wild-type plants, except that the mutants produced only abnormal seeds. However, RIE1 was expressed in different tissues throughout the whole plant as revealed by northern blot analysis and gene fusion assay of the RIE1 promoter with the beta-glucuronidase (GUS) gene. Our results indicated that RIE1 plays an essential role in seed development.
MESH Headings
- Amino Acid Sequence
- Arabidopsis/embryology
- Arabidopsis/genetics
- Arabidopsis Proteins/genetics
- Base Sequence
- Blotting, Northern
- Carrier Proteins/genetics
- Cloning, Molecular
- Culture Techniques
- DNA, Complementary/chemistry
- DNA, Complementary/genetics
- Gene Expression Regulation, Plant
- Genetic Complementation Test
- Glucuronidase/genetics
- Glucuronidase/metabolism
- Molecular Sequence Data
- Mutagenesis, Insertional
- Mutation
- Plants, Genetically Modified
- Promoter Regions, Genetic/genetics
- Recombinant Fusion Proteins/genetics
- Recombinant Fusion Proteins/metabolism
- Seeds/genetics
- Seeds/growth & development
- Sequence Alignment
- Sequence Analysis, DNA
- Sequence Homology, Amino Acid
- Ubiquitin-Protein Ligases
Affiliation(s)
- Ruqiang Xu
- Department of Botany, Miami University, Oxford, OH 45056, USA
106
Majoros WH, Pertea M, Antonescu C, Salzberg SL. GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Res 2003; 31:3601-4. [PMID: 12824375 PMCID: PMC168934 DOI: 10.1093/nar/gkg527] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]
Abstract
We present three programs for ab initio gene prediction in eukaryotes: Exonomy, Unveil and GlimmerM. Exonomy is a 23-state Generalized Hidden Markov Model (GHMM), Unveil is a 283-state standard Hidden Markov Model (HMM) and GlimmerM is a previously-described genefinder which utilizes decision trees and Interpolated Markov Models (IMMs). All three are readily re-trainable for new organisms and have been found to perform well compared to other genefinders. Results are presented for Arabidopsis thaliana. Cases have been found where each of the genefinders outperforms each of the others, demonstrating the collective value of this ensemble of genefinders. These programs are all accessible through webservers at http://www.tigr.org/software.
Affiliation(s)
- William H Majoros
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
107
Schlueter SD, Dong Q, Brendel V. GeneSeqer@PlantGDB: Gene structure prediction in plant genomes. Nucleic Acids Res 2003; 31:3597-600. [PMID: 12824374 PMCID: PMC168940 DOI: 10.1093/nar/gkg533] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
The GeneSeqer@PlantGDB Web server (http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi) provides a gene structure prediction tool tailored for applications to plant genomic sequences. Predictions are based on spliced alignment with source-native ESTs and full-length cDNAs or non-native probes derived from putative homologous genes. The tool is illustrated with applications to refinement of current gene structure annotation and de novo annotation of draft genomic sequences. The service should facilitate expert annotation as a community effort by providing convenient access to all public plant sequences via the PlantGDB database, a simple four-step protocol for spliced alignment and visually appealing displays of the predicted gene structures in addition to detailed sequence alignments.
Affiliation(s)
- Shannon D Schlueter
- Department of Zoology and Genetics, Iowa State University, Ames, IA 50011-3260, USA
108
Abstract
Very short exons, also known as micro-exons, occur in large numbers in some eukaryotic genomes. Existing annotation tools have a limited ability to recognize these short sequences, which range in length up to 25 bp. Here, we describe a computational method for the identification of micro-exons using near-perfect alignments between cDNA and genomic DNA sequences. Using this method, we detected 319 micro-exons in 4 complete genomes, of which 224 were previously unknown, human (170), the nematode Caenorhabditis elegans (4), the fruit fly Drosophila melanogaster (14), and the mustard plant Arabidopsis thaliana (36). Comparison of our computational method with popular cDNA alignment programs shows that the new algorithm is both efficient and accurate. The algorithm also aids in the discovery of micro-exon-skipping events and cross-species micro-exon conservation.
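The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way such a search could work, not the authors' method. It assumes that a cDNA-to-genome alignment has already placed the two flanking exons and left a short cDNA segment unaligned between them; the function name `find_micro_exon_candidates`, the exact-match requirement, and the restriction to canonical AG/GT splice-site dinucleotides are all illustrative assumptions.

```python
from typing import List

# Upper length bound for micro-exons quoted in the abstract (25 bp).
MAX_MICRO_EXON_LEN = 25


def find_micro_exon_candidates(intron: str, unaligned: str) -> List[int]:
    """Return 0-based offsets in `intron` where `unaligned` occurs exactly,
    immediately preceded by AG (acceptor) and followed by GT (donor).

    `intron` is the genomic sequence between the two flanking aligned exons;
    `unaligned` is the short cDNA segment the alignment could not place.
    Both names are hypothetical and used for illustration only.
    """
    intron = intron.upper()
    unaligned = unaligned.upper()
    if not 0 < len(unaligned) <= MAX_MICRO_EXON_LEN:
        return []

    hits = []
    # Start at offset 2 so there is always room for the upstream AG.
    start = intron.find(unaligned, 2)
    while start != -1:
        end = start + len(unaligned)
        has_acceptor = intron[start - 2:start] == "AG"
        has_donor = intron[end:end + 2] == "GT"
        if has_acceptor and has_donor:
            hits.append(start)
        start = intron.find(unaligned, start + 1)
    return hits


# Toy example: a 9-nt segment embedded between two short introns.
upstream_intron = "GTAAGTTTTTCAG"
micro_exon = "ATGGCTAAA"
downstream_intron = "GTAAGTTTTTTTTCAG"
region = upstream_intron + micro_exon + downstream_intron
candidates = find_micro_exon_candidates(region, micro_exon)
```

A real implementation would also need to tolerate sequencing errors ("near-perfect" rather than exact matches) and rank multiple candidate positions, which this sketch does not attempt.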
109
Wortman JR, Haas BJ, Hannick LI, Smith RK, Maiti R, Ronning CM, Chan AP, Yu C, Ayele M, Whitelaw CA, White OR, Town CD. Annotation of the Arabidopsis genome. PLANT PHYSIOLOGY 2003; 132:461-8. [PMID: 12805579 PMCID: PMC166989 DOI: 10.1104/pp.103.022251] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 02/15/2003] [Revised: 03/07/2003] [Accepted: 03/18/2003] [Indexed: 05/18/2023]
Affiliation(s)
- Jennifer R Wortman
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
110
Zhu W, Schlueter SD, Brendel V. Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping. PLANT PHYSIOLOGY 2003; 132:469-84. [PMID: 12805580 PMCID: PMC166990 DOI: 10.1104/pp.102.018101] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.6] [Received: 11/21/2002] [Revised: 01/06/2003] [Accepted: 02/20/2003] [Indexed: 05/18/2023]
Abstract
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.
Affiliation(s)
- Wei Zhu
- Department of Zoology and Genetics, Iowa State University, Ames 50011-3260, USA
111
Schoof H, Karlowski WM. Comparison of rice and Arabidopsis annotation. CURRENT OPINION IN PLANT BIOLOGY 2003; 6:106-112. [PMID: 12667865 DOI: 10.1016/s1369-5266(03)00003-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 05/24/2023]
Abstract
Several versions of the rice genome were published in 2002, providing a first overview of the genome content of this model monocot. At the same time, the genome of the model dicot, Arabidopsis thaliana, reached a new level of annotation as thousands of full-length cDNA sequences were integrated with the genome sequence.
Affiliation(s)
- Heiko Schoof
- Technical University of Munich, Genome Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany.
112
Meyers BC, Kozik A, Griego A, Kuang H, Michelmore RW. Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. THE PLANT CELL 2003; 15:809-34. [PMID: 12671079 PMCID: PMC152331 DOI: 10.1105/tpc.009308] [Citation(s) in RCA: 1024] [Impact Index Per Article: 48.8] [Received: 11/21/2002] [Accepted: 02/13/2003] [Indexed: 05/18/2023]
Abstract
The Arabidopsis genome contains approximately 200 genes that encode proteins with similarity to the nucleotide binding site and other domains characteristic of plant resistance proteins. Through a reiterative process of sequence analysis and reannotation, we identified 149 NBS-LRR-encoding genes in the Arabidopsis (ecotype Columbia) genomic sequence. Fifty-six of these genes were corrected from earlier annotations. At least 12 are predicted to be pseudogenes. As described previously, two distinct groups of sequences were identified: those that encoded an N-terminal domain with Toll/Interleukin-1 Receptor homology (TIR-NBS-LRR, or TNL), and those that encoded an N-terminal coiled-coil motif (CC-NBS-LRR, or CNL). The encoded proteins are distinct from the 58 predicted adapter proteins in the previously described TIR-X, TIR-NBS, and CC-NBS groups. Classification based on protein domains, intron positions, sequence conservation, and genome distribution defined four subgroups of CNL proteins, eight subgroups of TNL proteins, and a pair of divergent NL proteins that lack a defined N-terminal motif. CNL proteins generally were encoded in single exons, although two subclasses were identified that contained introns in unique positions. TNL proteins were encoded in modular exons, with conserved intron positions separating distinct protein domains. Conserved motifs were identified in the LRRs of both CNL and TNL proteins. In contrast to CNL proteins, TNL proteins contained large and variable C-terminal domains. The extant distribution and diversity of the NBS-LRR sequences has been generated by extensive duplication and ectopic rearrangements that involved segmental duplications as well as microscale events. The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.
Affiliation(s)
- Blake C Meyers
- Department of Vegetable Crops, University of California, Davis, California 95616, USA
113
Mattsson J, Ckurshumova W, Berleth T. Auxin signaling in Arabidopsis leaf vascular development. PLANT PHYSIOLOGY 2003; 131:1327-39. [PMID: 12644682 PMCID: PMC166892 DOI: 10.1104/pp.013623] [Citation(s) in RCA: 209] [Impact Index Per Article: 10.0] [Received: 08/28/2002] [Revised: 09/09/2002] [Accepted: 11/18/2002] [Indexed: 05/18/2023]
Abstract
A number of observations have implicated auxin in the formation of vascular tissues in plant organs. These include vascular strand formation in response to local auxin application, the effects of impaired auxin transport on vascular patterns and suggestive phenotypes of Arabidopsis auxin response mutants. In this study, we have used molecular markers to visualize auxin response patterns in developing Arabidopsis leaves as well as Arabidopsis mutants and transgenic plants to trace pathways of auxin signal transduction controlling the expression of early procambial genes. We show that in young Arabidopsis leaf primordia, molecular auxin response patterns presage sites of procambial differentiation. This is the case not only in normal development but also upon experimental manipulation of auxin transport suggesting that local auxin signals are instrumental in patterning Arabidopsis leaf vasculature. We further found that the activity of the Arabidopsis gene MONOPTEROS, which is required for proper vascular differentiation, is also essential in a spectrum of auxin responses, which include the regulation of rapidly auxin-inducible AUX/IAA genes, and discovered the tissue-specific vascular expression profile of the class I homeodomain-leucine zipper gene, AtHB20. Interestingly, MONOPTEROS activity is a limiting factor in the expression of AtHB8 and AtHB20, two genes encoding transcriptional regulators expressed early in procambial development. Our observations connect general auxin signaling with early controls of vascular differentiation and suggest molecular mechanisms for auxin signaling in patterned cell differentiation.
Affiliation(s)
- Jim Mattsson
- Department of Botany, University of Toronto, 25 Willcocks Street, Toronto, Canada M5S 3B2
114
Halterman DA, Wei F, Wise RP. Powdery mildew-induced Mla mRNAs are alternatively spliced and contain multiple upstream open reading frames. PLANT PHYSIOLOGY 2003; 131:558-67. [PMID: 12586880 PMCID: PMC166832 DOI: 10.1104/pp.014407] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.9] [Received: 09/12/2002] [Revised: 11/08/2002] [Accepted: 11/21/2002] [Indexed: 05/20/2023]
Abstract
In barley (Hordeum vulgare), the Mla13 powdery mildew resistance gene confers Rar1-dependent, AvrMla13-specific resistance to Blumeria graminis f. sp. hordei (Bgh). We have identified cDNA and genomic copies of Mla13 and used this coiled-coil nucleotide-binding site leucine-rich repeat protein-encoding gene as a model for the regulation of host resistance to obligate biotrophic fungi in cereals. We demonstrate quantitatively that a rapid increase in the accumulation of Mla transcripts and transcripts of the Mla-signaling genes, Rar1 and Sgt1, is triggered between 16 and 20 h post inoculation, the same time frame that haustoria of avirulent Bgh make contact with the host cell plasma membrane. An abundance of Mla13 cDNAs revealed five classes of transcript leader regions containing two alternatively spliced introns and up to three upstream open reading frames (uORFs). Alternative splicing of introns in the transcript leader region results in a different number of uORFs and variability in the size of uORF2. These results indicate that regulation of Mla transcript accumulation is not constitutive and that induction is coordinately controlled by recognition-specific factors. The sudden increase in specific transcript levels could account for the rapid defense response phenotype conferred by Mla6 and Mla13.
Affiliation(s)
- Dennis A Halterman
- Corn Insects and Crop Genetics Research, United States Department of Agriculture-Agricultural Research Service, Iowa State University, Ames, Iowa 50011-1020, USA
115
Abstract
Currently, relatively few proteomics studies of the chloroplast have been published, but the field has just started emerging and is likely to develop more rapidly in the future. While the complex membrane structure of the chloroplast makes it difficult to study its entire proteome by global approaches, proteomics has considerably increased our knowledge of the proteins of single compartments such as, for instance, the envelope and the thylakoid lumen. Proteomics has also succeeded in the subunit characterisation of select protein complexes such as the ribosomes and the cytochrome b6f complex. In addition, proteomics was successfully applied to find new potential target pathways for thioredoxin-mediated signal transduction. In this review, we present an overview of the latest developments in the field of chloroplast proteomics and discuss their impact on photosynthesis research. In addition, we summarise the current state of research in proteomics of the photosynthetic cyanobacterium Synechocystis sp. PCC 6803.
Affiliation(s)
- Wolfgang P Schröder
- Departments of Chemistry and Biochemistry, Umeå University, 901 87, Umeå, Sweden
116
Crowe ML, Serizet C, Thareau V, Aubourg S, Rouzé P, Hilson P, Beynon J, Weisbeek P, van Hummelen P, Reymond P, Paz-Ares J, Nietfeld W, Trick M. CATMA: a complete Arabidopsis GST database. Nucleic Acids Res 2003; 31:156-8. [PMID: 12519971 PMCID: PMC165518 DOI: 10.1093/nar/gkg071] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022]
Abstract
The Complete Arabidopsis Transcriptome Micro Array (CATMA) database contains gene sequence tag (GST) and gene model sequences for over 70% of the predicted genes in the Arabidopsis thaliana genome as well as primer sequences for GST amplification and a wide range of supplementary information. All CATMA GST sequences are specific to the gene for which they were designed, and all gene models were predicted from a complete reannotation of the genome using uniform parameters. The database is searchable by sequence name, sequence homology or direct SQL query, and is available through the CATMA website at http://www.catma.org/.
Affiliation(s)
- Mark L Crowe
- The John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK.
117
Xiao YL, Malik M, Whitelaw CA, Town CD. Cloning and sequencing of cDNAs for hypothetical genes from chromosome 2 of Arabidopsis. PLANT PHYSIOLOGY 2002; 130:2118-28. [PMID: 12481096 PMCID: PMC166724 DOI: 10.1104/pp.010207] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Received: 06/17/2002] [Revised: 08/25/2002] [Accepted: 09/09/2002] [Indexed: 05/19/2023]
Abstract
About 25% of the genes in the fully sequenced and annotated Arabidopsis genome have structures that are predicted solely by computer algorithms with no support from either nucleic acid or protein homologs from other species or expressed sequence matches from Arabidopsis. These are referred to as "hypothetical genes." On chromosome 2, sequenced by The Institute for Genomic Research, there are approximately 800 hypothetical genes among a total of approximately 4,100 genes. To test their expression under various growth conditions and in specific tissues, we used six cDNA populations prepared from cold-treated, heat-treated, and pathogen (Xanthomonas campestris pv campestris)-infected plants, callus, roots, and young seedlings. To date, 169 hypothetical genes were tested, and 138 of them are found to be expressed in one or more of the six cDNA populations. By sequencing multiple clones from each 5'- and 3'-rapid amplification of cDNA ends (RACE) product and assembling the sequences, we generated full-length sequences for 16 of these genes. For 14 genes, there was one full-length assembly that precisely supported the intron-exon boundaries of their gene predictions, adding only 5'- and 3'-untranslated region sequences. However, for three of these genes, the other assemblies represent additional exons and alternatively spliced or unspliced introns. For the remaining two genes, the cDNA sequences reveal major differences with predicted gene structures. In addition, a total of six genes displayed more than one polyadenylation site. These data will be used to update gene models in The Institute for Genomic Research annotation database ATH1.
Affiliation(s)
- Yong-Li Xiao
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA.
118
Bergman CM, Pfeiffer BD, Rincón-Limas DE, Hoskins RA, Gnirke A, Mungall CJ, Wang AM, Kronmiller B, Pacleb J, Park S, Stapleton M, Wan K, George RA, de Jong PJ, Botas J, Rubin GM, Celniker SE. Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome. Genome Biol 2002; 3:RESEARCH0086. [PMID: 12537575 PMCID: PMC151188 DOI: 10.1186/gb-2002-3-12-research0086] [Citation(s) in RCA: 103] [Impact Index Per Article: 4.7] [Received: 10/08/2002] [Revised: 11/25/2002] [Accepted: 12/05/2002] [Indexed: 11/15/2022]
Abstract
BACKGROUND: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. RESULTS: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. CONCLUSIONS: Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.
Affiliation(s)
- Casey M Bergman
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- These authors contributed equally to this work
- Barret D Pfeiffer
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- These authors contributed equally to this work
- Diego E Rincón-Limas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Current address: Departamento de Biologia Molecular, Universidad Autonoma de Tamaulipas-UAMRA, Reynosa, CP 88740, Mexico
- Roger A Hoskins
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Chris J Mungall
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, University of California, Berkeley, CA 94720, USA
- Adrienne M Wang
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Current address: Department of Physiology, University of California, San Francisco, CA 94143, USA
- Brent Kronmiller
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Current address: Department of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Joanne Pacleb
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Soo Park
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Mark Stapleton
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Kenneth Wan
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Reed A George
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Pieter J de Jong
- Children's Hospital and Research Center at Oakland, Oakland, CA 94609, USA
- Juan Botas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Gerald M Rubin
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, University of California, Berkeley, CA 94720, USA
- Susan E Celniker
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
|
119
|
Mungall CJ, Misra S, Berman BP, Carlson J, Frise E, Harris N, Marshall B, Shu S, Kaminker JS, Prochnik SE, Smith CD, Smith E, Tupy JL, Wiel C, Rubin GM, Lewis SE. An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol 2002; 3:RESEARCH0081. [PMID: 12537570 PMCID: PMC151183 DOI: 10.1186/gb-2002-3-12-research0081] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.9] [Received: 10/25/2002] [Accepted: 11/28/2002] [Indexed: 01/02/2023]
Abstract
We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture.
Affiliation(s)
- C J Mungall
- Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA.
|
120
|
Lewis SE, Searle SMJ, Harris N, Gibson M, Iyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smith CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME. Apollo: a sequence annotation editor. Genome Biol 2002; 3:RESEARCH0082. [PMID: 12537571 PMCID: PMC151184 DOI: 10.1186/gb-2002-3-12-research0082] [Citation(s) in RCA: 311] [Impact Index Per Article: 14.1] [Received: 10/07/2002] [Revised: 11/13/2002] [Accepted: 11/23/2002] [Indexed: 11/10/2022]
Abstract
The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
Affiliation(s)
- S E Lewis
- Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA.
|
121
|
Ohler U, Liao GC, Niemann H, Rubin GM. Computational analysis of core promoters in the Drosophila genome. Genome Biol 2002; 3:RESEARCH0087. [PMID: 12537576 PMCID: PMC151189 DOI: 10.1186/gb-2002-3-12-research0087] [Citation(s) in RCA: 299] [Impact Index Per Article: 13.6] [Received: 10/07/2002] [Revised: 11/19/2002] [Accepted: 11/27/2002] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The core promoter, a region of about 100 base-pairs flanking the transcription start site (TSS), serves as the recognition site for the basal transcription apparatus. Drosophila TSSs have generally been mapped by individual experiments; the low number of accurately mapped TSSs has limited analysis of promoter sequence motifs and the training of computational prediction tools.
RESULTS: We identified TSS candidates for about 2,000 Drosophila genes by aligning 5' expressed sequence tags (ESTs) from cap-trapped cDNA libraries to the genome, while applying stringent criteria concerning coverage and 5'-end distribution. Examination of the sequences flanking these TSSs revealed the presence of well-known core promoter motifs such as the TATA box, the initiator and the downstream promoter element (DPE). We also define, and assess the distribution of, several new motifs prevalent in core promoters, including what appears to be a variant DPE motif. Among the prevalent motifs is the DNA-replication-related element DRE, recently shown to be part of the recognition site for the TBP-related factor TRF2. Our TSS set was then used to retrain the computational promoter predictor McPromoter, allowing us to improve the recognition performance to over 50% sensitivity and 40% specificity. We compare these computational results to promoter prediction in vertebrates.
CONCLUSIONS: There are relatively few recognizable binding sites for previously known general transcription factors in Drosophila core promoters. However, we identified several new motifs enriched in promoter regions. We were also able to significantly improve the performance of computational TSS prediction in Drosophila.
Affiliation(s)
- Uwe Ohler
- Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, CA 94720-3200, USA.
|
122
|
Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE, Smith CD, Tupy JL, Whitfield EJ, Bayraktaroglu L, Berman BP, Bettencourt BR, Celniker SE, de Grey ADNJ, Drysdale RA, Harris NL, Richter J, Russo S, Schroeder AJ, Shu SQ, Stapleton M, Yamada C, Ashburner M, Gelbart WM, Rubin GM, Lewis SE. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol 2002; 3:RESEARCH0083. [PMID: 12537572 PMCID: PMC151185 DOI: 10.1186/gb-2002-3-12-research0083] [Citation(s) in RCA: 268] [Impact Index Per Article: 12.2] [Received: 10/16/2002] [Revised: 11/28/2002] [Accepted: 11/28/2002] [Indexed: 01/04/2023]
Abstract
BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.
RESULTS: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.
CONCLUSIONS: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
Affiliation(s)
- Sima Misra
- Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA.
|