201
|
Moser C, Segala C, Fontana P, Salakhudtinov I, Gatto P, Pindo M, Zyprian E, Toepfer R, Grando MS, Velasco R. Comparative analysis of expressed sequence tags from different organs of Vitis vinifera L. Funct Integr Genomics 2005; 5:208-17. [PMID: 15856347 DOI: 10.1007/s10142-005-0143-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 11/01/2004] [Revised: 03/30/2005] [Accepted: 03/31/2005] [Indexed: 10/25/2022]
Abstract
Expressed sequence tags (ESTs) are providing a valuable approach to sampling organism-expressed genomes, especially when studying large genomes such as those of many plants. We report on the comparison of 8,647 ESTs generated from six different grape (Vitis vinifera L.) organs: berry, root, leaf, bud, shoot and inflorescence. Clustering and assembly of these ESTs resulted in 4,203 unique sequences and revealed that at this level of EST sampling, each organ shares a low percentage of transcripts with the others. To define organ relationships based on EST counts, we calculated a distance matrix of pairwise correlation coefficients between the libraries which indicated bud, inflorescence and shoot as a group distinct from the other organs considered in this study. A putative function was identified for about 85% of the unique sequences. By assigning them to specific functional classes, we were able to highlight strong differences between organs in the metabolism, protein biosynthesis and photosynthesis categories. This grape EST collection has also proven to be a valuable source for the development of 'functional' simple sequence repeats (SSRs) markers: a total of 405 SSRs have been identified. EST sequences and annotation results have been organised in the IASMA-grape database, freely available at the address http://genomics.iasma.it.
Affiliation(s)
- C Moser
- Istituto Agrario San Michele all'Adige, S. Michele a/Adige, 38010 Trento, Italy.
|
202
|
Rensink WA, Iobst S, Hart A, Stegalkina S, Liu J, Buell CR. Gene expression profiling of potato responses to cold, heat, and salt stress. Funct Integr Genomics 2005; 5:201-7. [PMID: 15856349 DOI: 10.1007/s10142-005-0141-6] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.3] [Received: 02/09/2005] [Revised: 03/25/2005] [Accepted: 03/26/2005] [Indexed: 10/25/2022]
Abstract
In order to identify genes involved in abiotic stress responses in potato, seedlings were grown under controlled conditions and subjected to cold (4 °C), heat (35 °C), or salt (100 mM NaCl) stress for up to 27 h. Using an approximately 12,000-clone potato cDNA microarray, expression profiles were captured at three time points following initiation of the stress (3, 9, and 27 h) from two different tissues, roots and leaves. A total of 3,314 clones could be identified as significantly up- or down-regulated in response to at least one stress condition. The genes represented by these clones encode transcription factors, signal transduction factors, and heat-shock proteins which have been associated with abiotic stress responses in Arabidopsis and rice, suggesting similar response pathways function in potato. These stress-regulated clones could be separated into either stress-specific or shared-response clones, suggesting the existence of general response pathways as well as more stress-specific pathways. In addition, we identified expression profiles which are indicative of the type of stress applied to the plants.
|
203
|
Valenzuela JG. Exploring tick saliva: from biochemistry to ‘sialomes’ and functional genomics. Parasitology 2005; 129 Suppl:S83-94. [PMID: 15938506 DOI: 10.1017/s0031182004005189] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Indexed: 11/07/2022]
Abstract
Tick saliva, a fluid once believed to be only relevant for lubrication of mouthparts and water balance, is now well known to be a cocktail of potent anti-haemostatic, anti-inflammatory and immunomodulatory molecules that helps these arthropods obtain a blood meal from their vertebrate hosts. The repertoire of pharmacologically active components in this cocktail is impressive as well as the number of targets they specifically affect. These salivary components change the physiology of the host at the bite site and, consequently, some pathogens transmitted by ticks take advantage of this change and become more infective. Tick salivary proteins have therefore become an attractive target to control tick-borne diseases. Recent advances in molecular biology, protein chemistry and computational biology are accelerating the isolation, sequencing and analysis of a large number of transcripts and proteins from the saliva of different ticks. Many of these newly isolated genes code for proteins with homologies to known proteins allowing identification or prediction of their function. However, most of these genes code for proteins with unknown functions therefore opening the road to functional genomic approaches to identify their biological activities and roles in blood feeding and hence, vaccine development to control tick-borne diseases.
Affiliation(s)
- J G Valenzuela
- Vector Molecular Biology Unit, Laboratory of Malaria and Vector Research, NIAID, National Institutes of Health, 4 Center Drive, 4/B2-35, Bethesda, MD 20892, USA.
|
204
|
de la Cruz N, Bromberg S, Pasko D, Shimoyama M, Twigger S, Chen J, Chen CF, Fan C, Foote C, Gopinath GR, Harris G, Hughes A, Ji Y, Jin W, Li D, Mathis J, Nenasheva N, Nie J, Nigam R, Petri V, Reilly D, Wang W, Wu W, Zuniga-Meyer A, Zhao L, Kwitek A, Tonellato P, Jacob H. The Rat Genome Database (RGD): developments towards a phenome database. Nucleic Acids Res 2005; 33:D485-91. [PMID: 15608243 PMCID: PMC540004 DOI: 10.1093/nar/gki050] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
The Rat Genome Database (RGD) (http://rgd.mcw.edu) aims to meet the needs of its community by providing genetic and genomic infrastructure while also annotating the strengths of rat research: biochemistry, nutrition, pharmacology and physiology. Here, we report on RGD's development towards creating a phenome database. Recent developments can be categorized into three groups. (i) Improved data collection and integration to match increased volume and biological scope of research. (ii) Knowledge representation augmented by the implementation of a new ontology and annotation system. (iii) The addition of quantitative trait loci data, from rat, mouse and human to our advanced comparative genomics tools, as well as the creation of new, and enhancement of existing, tools to enable users to efficiently browse and survey research data. The emphasis is on helping researchers find genes responsible for disease through the use of rat models. These improvements, combined with the genomic sequence of the rat, have led to a successful year at RGD with over two million page accesses that represent an over 4-fold increase in a year. Future plans call for increased annotation of biological information on the rat elucidated through its use as a model for human pathobiology. The continued development of toolsets will facilitate integration of these data into the context of rat genomic sequence, as well as allow comparisons of biological and genomic data with the human genomic sequence and of an increasing number of organisms.
Affiliation(s)
- Norberto de la Cruz
- Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53213, USA
|
205
|
Lee Y, Tsai J, Sunkara S, Karamycheva S, Pertea G, Sultana R, Antonescu V, Chan A, Cheung F, Quackenbush J. The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res 2005; 33:D71-4. [PMID: 15608288 PMCID: PMC540018 DOI: 10.1093/nar/gki064] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.6] [Indexed: 11/12/2022]
Abstract
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.
Affiliation(s)
- Y Lee
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
206
|
Rudd S. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections. Nucleic Acids Res 2005; 33:D622-7. [PMID: 15608275 PMCID: PMC539994 DOI: 10.1093/nar/gki040] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.
Affiliation(s)
- Stephen Rudd
- Centre for Biotechnology, Tykistökatu 6, FIN-20521 Turku, Finland.
|
207
|
Ramírez M, Graham MA, Blanco-López L, Silvente S, Medrano-Soto A, Blair MW, Hernández G, Vance CP, Lara M. Sequencing and analysis of common bean ESTs. Building a foundation for functional genomics. PLANT PHYSIOLOGY 2005. [PMID: 15824284 DOI: 10.1104/pp.104.054999.gumes] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/11/2023]
Abstract
Although common bean (Phaseolus vulgaris) is the most important grain legume in the developing world for human consumption, few genomic resources exist for this species. The objectives of this research were to develop expressed sequence tag (EST) resources for common bean and assess nodule gene expression through high-density macroarrays. We sequenced a total of 21,026 ESTs derived from 5 different cDNA libraries, including nitrogen-fixing root nodules, phosphorus-deficient roots, developing pods, and leaves of the Mesoamerican genotype, Negro Jamapa 81. The fifth source of ESTs was a leaf cDNA library derived from the Andean genotype, G19833. Of the total high-quality sequences, 5,703 ESTs were classified as singletons, while 10,078 were assembled into 2,226 contigs producing a nonredundant set of 7,969 different transcripts. Sequences were grouped according to 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and further subdivided into 15 subcategories. Comparisons to other legume EST projects suggest that an entirely different repertoire of genes is expressed in common bean nodules. Phaseolus-specific contigs, gene families, and single nucleotide polymorphisms were also identified from the EST collection. Functional aspects of individual bean organs were reflected by the 20 contigs from each library composed of the most redundant ESTs. The abundance of transcripts corresponding to selected contigs was evaluated by RNA blots to determine whether gene expression determined by laboratory methods correlated with in silico expression. Evaluation of root nodule gene expression by macroarrays and RNA blots showed that genes related to nitrogen and carbon metabolism are integrated for ureide production. Resources developed in this project provide genetic and genomic tools for an international consortium devoted to bean improvement.
Affiliation(s)
- Mario Ramírez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Apartado 66210 Cuernavaca, Morelos, Mexico
|
208
|
Ramírez M, Graham MA, Blanco-López L, Silvente S, Medrano-Soto A, Blair MW, Hernández G, Vance CP, Lara M. Sequencing and analysis of common bean ESTs. Building a foundation for functional genomics. PLANT PHYSIOLOGY 2005; 137:1211-27. [PMID: 15824284 PMCID: PMC1088315 DOI: 10.1104/pp.104.054999] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Received: 10/20/2004] [Revised: 01/21/2005] [Accepted: 01/30/2005] [Indexed: 05/18/2023]
Abstract
Although common bean (Phaseolus vulgaris) is the most important grain legume in the developing world for human consumption, few genomic resources exist for this species. The objectives of this research were to develop expressed sequence tag (EST) resources for common bean and assess nodule gene expression through high-density macroarrays. We sequenced a total of 21,026 ESTs derived from 5 different cDNA libraries, including nitrogen-fixing root nodules, phosphorus-deficient roots, developing pods, and leaves of the Mesoamerican genotype, Negro Jamapa 81. The fifth source of ESTs was a leaf cDNA library derived from the Andean genotype, G19833. Of the total high-quality sequences, 5,703 ESTs were classified as singletons, while 10,078 were assembled into 2,226 contigs producing a nonredundant set of 7,969 different transcripts. Sequences were grouped according to 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and further subdivided into 15 subcategories. Comparisons to other legume EST projects suggest that an entirely different repertoire of genes is expressed in common bean nodules. Phaseolus-specific contigs, gene families, and single nucleotide polymorphisms were also identified from the EST collection. Functional aspects of individual bean organs were reflected by the 20 contigs from each library composed of the most redundant ESTs. The abundance of transcripts corresponding to selected contigs was evaluated by RNA blots to determine whether gene expression determined by laboratory methods correlated with in silico expression. Evaluation of root nodule gene expression by macroarrays and RNA blots showed that genes related to nitrogen and carbon metabolism are integrated for ureide production. Resources developed in this project provide genetic and genomic tools for an international consortium devoted to bean improvement.
Affiliation(s)
- Mario Ramírez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Apartado 66210 Cuernavaca, Morelos, Mexico
|
209
|
Karaca M, Bilgen M, Onus AN, Ince AG, Elmasulu SY. Exact tandem repeats analyzer (E-TRA): a new program for DNA sequence mining. J Genet 2005; 84:49-54. [PMID: 15876583 DOI: 10.1007/bf02715889] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 12/18/2022]
Abstract
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as 'organs', 'tissues', 'cell lines' and 'development stages' for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs as it not only accepts GenBank, FASTA and expressed sequence tags (EST) sequence files, but also does analysis of multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user defined parameters/options let the researchers use different minimum motif repeats search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs are not random, thus differing from the un-transcribed repeats found in genomes.
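The kind of search this abstract describes, exact tandem repeats with a user-chosen range of motif lengths and a minimum repeat count, can be illustrated with a short regular-expression sketch. This is not E-TRA's algorithm or interface; the function name `find_exact_tandem_repeats` and its parameters are hypothetical, and the sketch assumes plain A/C/G/T input.

```python
import re
from typing import List, Tuple


def find_exact_tandem_repeats(
    seq: str, min_motif: int = 1, max_motif: int = 6, min_repeats: int = 3
) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for each exact tandem repeat found.

    Hypothetical helper for illustration only; not part of E-TRA.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ACAC"), which smaller k already reports.
            if (motif + motif).find(motif, 1) != len(motif):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


print(find_exact_tandem_repeats("TTACACACACGG", max_motif=3))
```

Different minimum repeat counts for different motif lengths, as the abstract mentions, could be handled by calling the function once per motif length with its own `min_repeats`.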
Affiliation(s)
- Mehmet Karaca
- Faculty of Agriculture, Akdeniz University, 07059 Antalya, Turkey.
|
210
|
Baldessari D, Shin Y, Krebs O, König R, Koide T, Vinayagam A, Fenger U, Mochii M, Terasaka C, Kitayama A, Peiffer D, Ueno N, Eils R, Cho KW, Niehrs C. Global gene expression profiling and cluster analysis in Xenopus laevis. Mech Dev 2005; 122:441-75. [PMID: 15763214 DOI: 10.1016/j.mod.2004.11.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Received: 03/21/2004] [Revised: 10/04/2004] [Accepted: 11/07/2004] [Indexed: 01/12/2023]
Abstract
We have undertaken a large-scale microarray gene expression analysis using cDNAs corresponding to 21,000 Xenopus laevis ESTs. mRNAs from 37 samples, including embryos and adult organs, were profiled. Cluster analysis of embryos of different stages was carried out and revealed expected affinities between gastrulae and neurulae, as well as between advanced neurulae and tadpoles, while egg and feeding larvae were clearly separated. Cluster analysis of adult organs showed some unexpected tissue-relatedness, e.g. kidney is more related to endodermal than to mesodermal tissues and the brain is separated from other neuroectodermal derivatives. Cluster analysis of genes revealed major phases of co-ordinate gene expression between egg and adult stages. During the maternal-early embryonic phase, genes maintaining a rapidly dividing cell state are predominantly expressed (cell cycle regulators, chromatin proteins). Genes involved in protein biosynthesis are progressively induced from mid-embryogenesis onwards. The larval-adult phase is characterised by expression of genes involved in metabolism and terminal differentiation. Thirteen potential synexpression groups were identified, which encompass components of diverse molecular processes or supra-molecular structures, including chromatin, RNA processing and nucleolar function, cell cycle, respiratory chain/Krebs cycle, protein biosynthesis, endoplasmic reticulum, vesicle transport, synaptic vesicle, microtubule, intermediate filament, epithelial proteins and collagen. Data filtering identified genes with potential stage-, region- and organ-specific expression. The dataset was assembled in the iChip microarray database, , which allows user-defined queries. The study provides insights into the higher order of vertebrate gene expression, identifies synexpression groups and marker genes, and makes predictions for the biological role of numerous uncharacterized genes.
Affiliation(s)
- Danila Baldessari
- Division of Molecular Embryology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
|
211
|
Sprunck S, Baumann U, Edwards K, Langridge P, Dresselhaus T. The transcript composition of egg cells changes significantly following fertilization in wheat (Triticum aestivum L.). THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2005; 41:660-72. [PMID: 15703054 DOI: 10.1111/j.1365-313x.2005.02332.x] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.7] [Indexed: 05/20/2023]
Abstract
Here, we report the transcript profile of wheat egg cells and proembryos, just after the first cell division. Microdissected female gametophytes of wheat were used to isolate eggs and two-celled proembryos to construct cell type-specific cDNA libraries. In total, 1197 expressed sequence tags (ESTs) were generated. Analysis of these ESTs revealed numerous novel transcripts. In egg cells, 17.6% of the clustered ESTs represented novel transcripts, while 11.4% novel clusters were identified in the two-celled proembryo. Functional classification of sequences with similarity to previously characterized proteins indicates that the unfertilized egg cell has a higher metabolic activity and protein turnover than previously thought. Transcript composition of two-celled proembryos was significantly distinct from egg cells, reflecting DNA replication as well as high transcriptional and translational activity. Several novel transcripts of the egg cell are specific for this cell. In contrast, some fertilization induced novel mRNAs are abundant also in sporophytic tissues indicating a more general role in plant growth and development. The potential functions of genes based on similarity to known genes involved in developmental processes are discussed. Our analysis has identified numerous genes with potential roles in embryo sac function such as signaling, fertilization or induction of embryogenesis.
Affiliation(s)
- Stefanie Sprunck
- Developmental Biology and Biotechnology, Biocenter Klein Flottbek, University of Hamburg, Ohnhorststrasse 18, D-22609 Hamburg, Germany
|
212
|
Fujimori S, Washio T, Tomita M. GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics 2005; 6:26. [PMID: 15733327 PMCID: PMC555766 DOI: 10.1186/1471-2164-6-26] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Received: 08/27/2004] [Accepted: 02/28/2005] [Indexed: 12/02/2022]
Abstract
BACKGROUND A GC-compositional strand bias or GC-skew (=(C-G)/(C+G)), where C and G denote the numbers of cytosine and guanine residues, was recently reported near the transcription start sites (TSS) of Arabidopsis genes. However, it is unclear whether other eukaryotic species have equally prominent GC-skews, and the biological meaning of this trait remains unknown. RESULTS Our study confirmed a significant GC-skew (C > G) in the TSS of Oryza sativa (rice) genes. The full-length cDNAs and genomic sequences from Arabidopsis and rice were compared using statistical analyses. Despite marked differences in the G+C content around the TSS in the two plants, the degrees of bias were almost identical. Although slight GC-skew peaks, including opposite skews (C < G), were detected around the TSS of genes in human and Drosophila, they were qualitatively and quantitatively different from those identified in plants. However, plant-like GC-skew in regions upstream of the translation initiation sites (TIS) in some fungi was identified following analyses of the expressed sequence tags and/or genomic sequences from other species. On the basis of our dataset, we estimated that > 70 and 68% of Arabidopsis and rice genes, respectively, had a strong GC-skew (> 0.33) in a 100-bp window (that is, the number of C residues was more than double the number of G residues in a +/-100-bp window around the TSS). The mean GC-skew value in the TSS of highly-expressed genes in Arabidopsis was significantly greater than that of genes with low expression levels. Many of the GC-skew peaks were preferentially located near the TSS, so we examined the potential value of GC-skew as an index for TSS identification. Our results confirm that the GC-skew can be used to assist the TSS prediction in plant genomes. CONCLUSION The GC-skew (C > G) around the TSS is strictly conserved between monocot and eudicot plants (i.e. angiosperms in general), and a similar skew has been observed in some fungi. Highly-expressed Arabidopsis genes had overall a more marked GC-skew in the TSS compared to genes with low expression levels. We therefore propose that the GC-skew around the TSS in some plants and fungi is related to transcription. It might be caused by mutations during transcription initiation or the frequent use of transcription factor-binding sites having a strand preference. In addition, GC-skew is a good candidate index for TSS prediction in plant genomes, where there is a lack of correlation among CpG islands and genes.
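The skew defined in this abstract, (C-G)/(C+G), and its 0.33 threshold can be checked with a few lines of code: when C = 2G the skew is (2G-G)/(2G+G) = 1/3, so a value above 0.33 roughly corresponds to C residues outnumbering G by more than two to one. The sketch below is only an illustration of the formula; the function names and the sliding-window helper are assumptions, not the authors' software.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Return (C - G) / (C + G) for a DNA sequence; 0.0 if it has no C or G."""
    s = seq.upper()
    c = s.count("C")
    g = s.count("G")
    if c + g == 0:
        return 0.0
    return (c - g) / (c + g)


def windowed_gc_skew(seq: str, window: int = 100, step: int = 1) -> List[float]:
    """GC-skew of each sliding window of length `window` (hypothetical helper)."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Twice as many C as G gives a skew of exactly 1/3.
print(gc_skew("CCG"))
```

Scanning a promoter region with `windowed_gc_skew` and looking for windows above a chosen threshold is one simple way to picture how a skew peak could serve as a TSS indicator, as the abstract proposes.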
Affiliation(s)
- Shigeo Fujimori
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa 252-8520, Japan
- Takanori Washio
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
- Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
- Department of Environmental Information, Keio University, Fujisawa, Kanagawa 252-8520, Japan
|
213
|
La Rota M, Kantety RV, Yu JK, Sorrells ME. Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics 2005; 6:23. [PMID: 15720707 PMCID: PMC550658 DOI: 10.1186/1471-2164-6-23] [Citation(s) in RCA: 159] [Impact Index Per Article: 8.0] [Received: 07/27/2004] [Accepted: 02/18/2005] [Indexed: 11/23/2022]
Abstract
Background Earlier comparative maps between the genomes of rice (Oryza sativa L.), barley (Hordeum vulgare L.) and wheat (Triticum aestivum L.) were linkage maps based on cDNA-RFLP markers. The low number of polymorphic RFLP markers has limited the development of dense genetic maps in wheat and the number of available anchor points in comparative maps. Higher density comparative maps using PCR-based anchor markers are necessary to better estimate the conservation of colinearity among cereal genomes. The purposes of this study were to characterize the proportion of transcribed DNA sequences containing simple sequence repeats (SSR or microsatellites) by length and motif for wheat, barley and rice and to determine in-silico rice genome locations for primer sets developed for wheat and barley Expressed Sequence Tags. Results The proportions of SSR types (di-, tri-, tetra-, and penta-nucleotide repeats) and motifs varied with the length of the SSRs within and among the three species, with trinucleotide SSRs being the most frequent. Distributions of genomic microsatellites (gSSRs), EST-derived microsatellites (EST-SSRs), and transcribed regions in the contiguous sequence of rice chromosome 1 were highly correlated. More than 13,000 primer pairs were developed for use by the cereal research community as potential markers in wheat, barley and rice. Conclusion Trinucleotide SSRs were the most common type in each of the species; however, the relative proportions of SSR types and motifs differed among rice, wheat, and barley. Genomic microsatellites were found to be primarily located in gene-rich regions of the rice genome. Microsatellite markers derived from the use of non-redundant EST-SSRs are an economic and efficient alternative to RFLP for comparative mapping in cereals.
Affiliation(s)
- Mauricio La Rota
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, 14853, USA
- Ramesh V Kantety
- Department of Plant & Soil Science, 138 ARC Building, Alabama A&M University, Normal, AL, 35762, USA
- Ju-Kyung Yu
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, 14853, USA
- Mark E Sorrells
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, 14853, USA
|
214
|
Blomberg LA, Long EL, Sonstegard TS, Van Tassell CP, Dobrinsky JR, Zuelke KA. Serial analysis of gene expression during elongation of the peri-implantation porcine trophectoderm (conceptus). Physiol Genomics 2005; 20:188-94. [PMID: 15536174 DOI: 10.1152/physiolgenomics.00157.2004] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 01/07/2023]
Abstract
Conceptus loss during the preimplantation and early postimplantation period hinders the efficiency of swine reproduction. Significant conceptus loss occurs during trophectoderm elongation between gestational day 11 (D11) and day 12 (D12). Elongation of the porcine conceptus is a key stage of development during which maternal recognition of pregnancy, initial placental development, and preparation for implantation occurs. The objective of this study was to establish comparative transcriptome profiles of D11 ovoid and D12 filamentous conceptuses and thereby identify temporally regulated genes essential for developmental progression during conceptus elongation. Serial analysis of gene expression (SAGE) libraries were constructed from in vivo derived ovoid and filamentous swine conceptuses to yield a total of 42,389 tags (ovoid) and 42,391 tags (filamentous) representing 14,464 and 13,098 putative unique transcripts, respectively. Statistical analysis of tag frequencies revealed the differential expression of 431 tags between libraries (P < 0.05). Nucleotide sequence alignment searches on public databases provided SAGE tag annotation and gene ontology assignments. Comparisons between the SAGE profiles of ovoid and filamentous conceptuses revealed increased expression of key genes in the steroidogenesis [cytochrome P-450scc (CYP11A1), aromatase (CYP19A), and steroidogenic acute regulatory protein (STAR)] and oxidative stress response pathways [microsomal glutathione S-transferase 1 (MGST1) and copper-zinc superoxide dismutase (SOD1)]. Differential expression of these genes in the steroidogenic and oxidative stress response pathways was confirmed by real-time PCR. These results validate the utility of SAGE in the pig and establish an initial model linking gene expression profiles at the pathway level with phenotypic progression from ovoid to filamentous stages of conceptus development.
Affiliation(s)
- Le Ann Blomberg
- Biotechnology and Germplasm Laboratory, United States Department of Agriculture Agricultural Research Service, Beltsville, Maryland 20705, USA.
|
215
|
Vandepoele K, Van de Peer Y. Exploring the plant transcriptome through phylogenetic profiling. PLANT PHYSIOLOGY 2005; 137:31-42. [PMID: 15644465 PMCID: PMC548836 DOI: 10.1104/pp.104.054700] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 10/11/2004] [Revised: 11/10/2004] [Accepted: 11/10/2004] [Indexed: 05/18/2023]
Abstract
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, B-9052 Ghent, Belgium
|
216
|
Vassilev D, Leunissen J, Atanassov A, Nenov A, Dimov G. Application of Bioinformatics in Plant Breeding. BIOTECHNOL BIOTEC EQ 2005. [DOI: 10.1080/13102818.2005.10817293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022] Open
|
217
|
Hubbard SJ, Grafham DV, Beattie KJ, Overton IM, McLaren SR, Croning MDR, Boardman PE, Bonfield JK, Burnside J, Davies RM, Farrell ER, Francis MD, Griffiths-Jones S, Humphray SJ, Hyland C, Scott CE, Tang H, Taylor RG, Tickle C, Brown WRA, Birney E, Rogers J, Wilson SA. Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags. Genome Res 2005; 15:174-83. [PMID: 15590942 PMCID: PMC540287 DOI: 10.1101/gr.3011405] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.5] [Received: 07/15/2004] [Accepted: 10/04/2004] [Indexed: 12/22/2022]
Abstract
We present an analysis of the chicken (Gallus gallus) transcriptome based on the full insert sequences for 19,626 cDNAs, combined with 485,337 EST sequences. The cDNA data set has been functionally annotated and describes a minimum of 11,929 chicken coding genes, including the sequence for 2260 full-length cDNAs together with a collection of noncoding (nc) cDNAs that have been stringently filtered to remove untranslated regions of coding mRNAs. The combined collection of cDNAs and ESTs describe 62,546 clustered transcripts and provide transcriptional evidence for a total of 18,989 chicken genes, including 88% of the annotated Ensembl gene set. Analysis of the ncRNAs reveals a set that is highly conserved in chickens and mammals, including sequences for 14 pri-miRNAs encoding 23 different miRNAs. The data sets described here provide a transcriptome toolkit linked to physical clones for bioinformaticians and experimental biologists who wish to use chicken systems as a low-cost, accessible alternative to mammals for the analysis of vertebrate development, immunology, and cell biology.
Affiliation(s)
- Simon J Hubbard
- Faculty of Life Sciences, The University of Manchester, Manchester, M60 1QD, United Kingdom
|
218
|
Stamm S, Ben-Ari S, Rafalska I, Tang Y, Zhang Z, Toiber D, Thanaraj TA, Soreq H. Function of alternative splicing. Gene 2004; 344:1-20. [PMID: 15656968 DOI: 10.1016/j.gene.2004.10.022] [Citation(s) in RCA: 671] [Impact Index Per Article: 32.0] [Received: 05/24/2004] [Revised: 09/10/2004] [Accepted: 10/21/2004] [Indexed: 02/06/2023]
Abstract
Alternative splicing is one of the most important mechanisms to generate a large number of mRNA and protein isoforms from the surprisingly low number of human genes. Unlike promoter activity, which primarily regulates the amount of transcripts, alternative splicing changes the structure of transcripts and their encoded proteins. Together with nonsense-mediated decay (NMD), at least 25% of all alternative exons are predicted to regulate transcript abundance. Molecular analyses during the last decade demonstrate that alternative splicing determines the binding properties, intracellular localization, enzymatic activity, protein stability and posttranslational modifications of a large number of proteins. The magnitude of the effects range from a complete loss of function or acquisition of a new function to very subtle modulations, which are observed in the majority of cases reported. Alternative splicing factors regulate multiple pre-mRNAs and recent identification of physiological targets shows that a specific splicing factor regulates pre-mRNAs with coherent biological functions. Therefore, evidence is now accumulating that alternative splicing coordinates physiologically meaningful changes in protein isoform expression and is a key mechanism to generate the complex proteome of multicellular organisms.
Affiliation(s)
- Stefan Stamm
- Institute for Biochemistry, University of Erlangen, Fahrstrasse 17, 91054 Erlangen, Germany.
|
219
|
Chen YA, Mckillen DJ, Wu S, Jenny MJ, Chapman R, Gross PS, Warr GW, Almeida JS. Optimal cDNA microarray design using expressed sequence tags for organisms with limited genomic information. BMC Bioinformatics 2004; 5:191. [PMID: 15585062 PMCID: PMC539232 DOI: 10.1186/1471-2105-5-191] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 08/21/2004] [Accepted: 12/07/2004] [Indexed: 12/04/2022] Open
Abstract
Background: Expression microarrays are increasingly used to characterize environmental responses and host-parasite interactions for many different organisms. Probe selection for cDNA microarrays using expressed sequence tags (ESTs) is challenging due to high sequence redundancy and potential cross-hybridization between paralogous genes. In organisms with limited genomic information, like marine organisms, this challenge is even greater due to annotation uncertainty. No general tool is available for cDNA microarray probe selection for these organisms. Therefore, the goal of the design procedure described here is to select a subset of ESTs that will minimize sequence redundancy and characterize potential cross-hybridization while providing functionally representative probes. Results: Sequence similarity between ESTs, quantified by the E-value of pair-wise alignment, was used as a surrogate for expected hybridization between corresponding sequences. Using this value as a measure of dissimilarity, sequence redundancy reduction was performed by hierarchical cluster analyses. The choice of how many microarray probes to retain was made based on an index developed for this research: a sequence diversity index (SDI) within a sequence diversity plot (SDP). This index tracked the decreasing within-cluster sequence diversity as the number of clusters increased. For a given stage in the agglomeration procedure, the EST having the highest similarity to all the other sequences within each cluster, the centroid EST, was selected as a microarray probe. A small dataset of ESTs from Atlantic white shrimp (Litopenaeus setiferus) was used to test this algorithm so that the detailed results could be examined. The functional representative level of the selected probes was quantified using Gene Ontology (GO) annotations. Conclusions: For organisms with limited genomic information, combining hierarchical clustering methods to analyze ESTs can yield an optimal cDNA microarray design. If biomarker discovery is the goal of the microarray experiments, the average linkage method is more effective, while single linkage is more suitable if identification of physiological mechanisms is more of interest. This general design procedure is not limited to designing single-species cDNA microarrays for marine organisms, and it can equally be applied to multiple-species microarrays of any organisms with limited genomic information.
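The centroid-probe idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dissimilarity values in `DIST` are made up for illustration (the paper derives them from alignment E-values), the function names `average_linkage` and `centroid` are hypothetical, and the number of clusters is fixed by hand rather than chosen with the sequence diversity index.

```python
from itertools import combinations


def average_linkage(dist, k):
    """Agglomerate singleton clusters until k remain (average linkage)."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise dissimilarity between the two candidate clusters.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def centroid(cluster, dist):
    """Member with the smallest total dissimilarity to its own cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Hypothetical symmetric dissimilarity matrix for five ESTs (illustration only).
DIST = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]

clusters = average_linkage(DIST, 2)
probes = [centroid(c, DIST) for c in clusters]
print(clusters, probes)
```

With this toy matrix the five ESTs fall into two clusters and one representative index per cluster is kept as the probe. Swapping the mean for a minimum in `average_linkage` would give the single-linkage variant that the abstract contrasts with average linkage.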
Affiliation(s)
- Yian A Chen
- Department of Biostatistics, Bioinformatics, and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
- David J Mckillen
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC, USA
- Shuyuan Wu
- Department of Biostatistics, Bioinformatics, and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
- Matthew J Jenny
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC, USA
- Marine Biomedicine and Environmental Science Center, Medical University of South Carolina, Charleston, SC, USA
- Robert Chapman
- Marine Biomedicine and Environmental Science Center, Medical University of South Carolina, Charleston, SC, USA
- South Carolina Department of Natural Resources, Marine Resources Research Institute, Charleston, SC, USA
- Paul S Gross
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC, USA
- Marine Biomedicine and Environmental Science Center, Medical University of South Carolina, Charleston, SC, USA
- Gregory W Warr
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC, USA
- Marine Biomedicine and Environmental Science Center, Medical University of South Carolina, Charleston, SC, USA
- Jonas S Almeida
- Department of Biostatistics, Bioinformatics, and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
|
220
|
Kasukawa T, Katayama S, Kawaji H, Suzuki H, Hume DA, Hayashizaki Y. Construction of representative transcript and protein sets of human, mouse, and rat as a platform for their transcriptome and proteome analysis. Genomics 2004; 84:913-21. [PMID: 15533708 DOI: 10.1016/j.ygeno.2004.08.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 04/26/2004] [Accepted: 08/16/2004] [Indexed: 10/26/2022]
Abstract
The number of mammalian transcripts identified by full-length cDNA projects and genome sequencing projects is increasing remarkably. Clustering them into a strictly nonredundant and comprehensive set provides a platform for functional analysis of the transcriptome and proteome, but the quality of the clustering and predictive usefulness have previously required manual curation to identify truncated transcripts and inappropriate clustering of closely related sequences. A Representative Transcript and Protein Sets (RTPS) pipeline was previously designed to identify the nonredundant and comprehensive set of mouse transcripts based on clustering of a large mouse full-length cDNA set (FANTOM2). Here we propose an alternative method that is more robust, requires less manual curation, and is applicable to other organisms in addition to mouse. RTPSs of human, mouse, and rat have been produced by this method and used for validation. Their comprehensiveness and quality are discussed by comparison with other clustering approaches. The RTPSs are available at .
Affiliation(s)
- Takeya Kasukawa
- Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Kanagawa 230-0045, Japan.
|
221
|
ESTIMA, a tool for EST management in a multi-project environment. BMC Bioinformatics 2004; 5:176. [PMID: 15527510 PMCID: PMC533868 DOI: 10.1186/1471-2105-5-176] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Received: 03/17/2004] [Accepted: 11/04/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users. RESULTS A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is very flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects.
CONCLUSIONS The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from http://titan.biotec.uiuc.edu/ESTIMA/. The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. We have also provided detailed installation instructions and a tutorial at the same website. Presently the chromatograms, EST databases and their annotations have been made available for cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.
|
222
|
Anderson JV, Delseny M, Fregene MA, Jorge V, Mba C, Lopez C, Restrepo S, Soto M, Piegu B, Verdier V, Cooke R, Tohme J, Horvath DP. An EST resource for cassava and other species of Euphorbiaceae. PLANT MOLECULAR BIOLOGY 2004; 56:527-39. [PMID: 15630617 DOI: 10.1007/s11103-004-5046-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 07/23/2003] [Accepted: 04/02/2004] [Indexed: 05/18/2023]
Abstract
Cassava (Manihot esculenta) is a major food staple for nearly 600 million people in Africa, Asia, and Latin America. Major losses in yield result from biotic and abiotic stresses that include diseases such as Cassava Mosaic Disease (CMD) and Cassava Bacterial Blight (CBB), drought, and acid soils. Additional losses also occur from deterioration during the post-harvest storage of roots. To help cassava breeders overcome these obstacles, the scientific community has turned to modern genomics approaches to identify key genetic characteristics associated with resistance to these yield-limiting factors. One approach for developing a genomics program requires the development of ESTs (expressed sequence tags). To date, nearly 23,000 ESTs have been developed from various cassava tissues, and genotypes. Preliminary analysis indicates existing EST resources contain at least 6000-7000 unigenes. Data presented in this report indicate that the cassava ESTs will be a valuable resource for the study of genetic diversity, stress resistance, and growth and development, not only in cassava, but also other members of the Euphorbiaceae family.
Affiliation(s)
- James V Anderson
- USDA/ARS, Biosciences Research Laboratory, 1605 Albrecht Blvd., P.O. Box 5674, State University Station, Fargo, ND, 58105, USA.
|
223
|
Fei Z, Tang X, Alba RM, White JA, Ronning CM, Martin GB, Tanksley SD, Giovannoni JJ. Comprehensive EST analysis of tomato and comparative genomics of fruit ripening. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2004; 40:47-59. [PMID: 15361140 DOI: 10.1111/j.1365-313x.2004.02188.x] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.4] [Indexed: 05/20/2023]
Abstract
A large tomato expressed sequence tag (EST) dataset (152 635 total) was analyzed to gain insights into differential gene expression among diverse plant tissues representing a range of developmental programs and biological responses. These ESTs were clustered and assembled to a total of 31 012 unique gene sequences. To better understand tomato gene expression at a plant system level and to identify differentially expressed and tissue-specific genes, we developed and implemented a digital expression analysis protocol. By clustering genes according to their relative abundance in the various EST libraries, expression patterns of genes across various tissues were generated and genes with similar patterns were grouped. In addition, tissues themselves were clustered for relatedness based on relative gene expression as a means of validating the integrity of the EST data as representative of relative gene expression. Arabidopsis and grape EST collections were also characterized to facilitate cross-species comparisons where possible. Tomato fruit digital expression data was specifically compared with publicly available grape EST data to gain insight into molecular manifestation of ripening processes across diverse taxa and resulted in identification of common transcription factors not previously associated with ripening.
Affiliation(s)
- Zhangjun Fei
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA
|
224
|
Manthey K, Krajinski F, Hohnjec N, Firnhaber C, Pühler A, Perlick AM, Küster H. Transcriptome profiling in root nodules and arbuscular mycorrhiza identifies a collection of novel genes induced during Medicago truncatula root endosymbioses. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2004; 17:1063-77. [PMID: 15497399 DOI: 10.1094/mpmi.2004.17.10.1063] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.6] [Indexed: 05/20/2023]
Abstract
Transcriptome profiling based on cDNA array hybridizations and in silico screening was used to identify Medicago truncatula genes induced in both root nodules and arbuscular mycorrhiza (AM). By array hybridizations, we detected several hundred genes that were upregulated in the root nodule and the AM symbiosis, respectively, with a total of 75 genes being induced during both interactions. The second approach based on in silico data mining yielded several hundred additional candidate genes with a predicted symbiosis-enhanced expression. A subset of the genes identified by either expression profiling tool was subjected to quantitative real-time reverse-transcription polymerase chain reaction for a verification of their symbiosis-induced expression. That way, induction in root nodules and AM was confirmed for 26 genes, most of them being reported as symbiosis-induced for the first time. In addition to delivering a number of novel symbiosis-induced genes, our approach identified several genes that were induced in only one of the two root endosymbioses. The spatial expression patterns of two symbiosis-induced genes encoding an annexin and a beta-tubulin were characterized in transgenic roots using promoter-reporter gene fusions.
Affiliation(s)
- Katja Manthey
- Lehrstuhl für Genetik, Fakultät für Biologie, Universität Bielefeld, Postfach 100131, D-33501 Bielefeld, Germany
|
225
|
Nene V, Lee D, Kang'a S, Skilton R, Shah T, de Villiers E, Mwaura S, Taylor D, Quackenbush J, Bishop R. Genes transcribed in the salivary glands of female Rhipicephalus appendiculatus ticks infected with Theileria parva. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2004; 34:1117-1128. [PMID: 15475305 DOI: 10.1016/j.ibmb.2004.07.002] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.9] [Received: 05/12/2004] [Revised: 06/30/2004] [Accepted: 07/01/2004] [Indexed: 05/24/2023]
Abstract
We describe the generation of an auto-annotated index of genes that are expressed in the salivary glands of four-day fed female adult Rhipicephalus appendiculatus ticks. A total of 9162 EST sequences were derived from an uninfected tick cDNA library and 9844 ESTs were from a cDNA library from ticks infected with Theileria parva, which develop in type III salivary gland acini. There were no major differences between abundantly expressed ESTs from the two cDNA libraries, although there was evidence for an up-regulation in the expression of some glycine-rich proteins in infected salivary glands. Gene ontology terms were also assigned to sequences in the index and those with potential enzyme function were linked to the Kyoto encyclopedia of genes and genomes database, allowing reconstruction of metabolic pathways. Several genes code for previously characterized tick proteins such as receptors for myokinin or ecdysteroid and an immunosuppressive protein. cDNAs coding for homologs of heme-lipoproteins which are major components of tick hemolymph were identified by searching the database with published N-terminal peptide sequence data derived from biochemically purified Boophilus microplus proteins. The EST data will be a useful resource for construction of microarrays to probe vector biology, vector-host and vector-pathogen interactions and to underpin gene identification via proteomics approaches.
Affiliation(s)
- Vishvanath Nene
- Parasite Genomics Department, The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
226
|
Huang Y, Pumphrey J, Gingle AR. ESTminer: a Web interface for mining EST contig and cluster databases. Bioinformatics 2004; 21:669-70. [PMID: 15374864 DOI: 10.1093/bioinformatics/bti030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. AVAILABILITY: ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp. CONTACT: agingle@uga.edu.
Affiliation(s)
- Yecheng Huang
- Center for Applied Genetic Technologies, University of Georgia 111 Riverbend Road, Athens, GA 30602, USA
|
227
|
Harrington ED, Boue S, Valcarcel J, Reich JG, Bork P. Estimating rates of alternative splicing in mammals and invertebrates. Nat Genet 2004. [DOI: 10.1038/ng0904-916] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
|
228
|
Simko I. One potato, two potato: haplotype association mapping in autotetraploids. TRENDS IN PLANT SCIENCE 2004; 9:441-8. [PMID: 15337494 DOI: 10.1016/j.tplants.2004.07.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 05/03/2023]
Affiliation(s)
- Ivan Simko
- USDA-ARS, Vegetable Laboratory, Bldg 010A, 10300 Baltimore Avenue, Beltsville, MD 20705, USA.
|
229
|
Zhang X, Fowler SG, Cheng H, Lou Y, Rhee SY, Stockinger EJ, Thomashow MF. Freezing-sensitive tomato has a functional CBF cold response pathway, but a CBF regulon that differs from that of freezing-tolerant Arabidopsis. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2004; 39:905-19. [PMID: 15341633 DOI: 10.1111/j.1365-313x.2004.02176.x] [Citation(s) in RCA: 255] [Impact Index Per Article: 12.1] [Indexed: 05/18/2023]
Abstract
Many plants increase in freezing tolerance in response to low temperature, a process known as cold acclimation. In Arabidopsis, cold acclimation involves action of the CBF cold response pathway. Key components of the pathway include rapid cold-induced expression of three homologous genes encoding transcriptional activators, CBF1, 2 and 3 (also known as DREB1b, c and a, respectively), followed by expression of CBF-targeted genes, the CBF regulon, that increase freezing tolerance. Unlike Arabidopsis, tomato cannot cold acclimate raising the question of whether it has a functional CBF cold response pathway. Here we show that tomato, like Arabidopsis, encodes three CBF homologs, LeCBF1-3 (Lycopersicon esculentum CBF1-3), that are present in tandem array in the genome. Only the tomato LeCBF1 gene, however, was found to be cold-inducible. As is the case for Arabidopsis CBF1-3, transcripts for LeCBF1-3 did accumulate in response to mechanical agitation, but not in response to drought, ABA or high salinity. Constitutive overexpression of LeCBF1 in transgenic Arabidopsis plants induced expression of CBF-targeted genes and increased freezing tolerance indicating that LeCBF1 encodes a functional homolog of the Arabidopsis CBF1-3 proteins. However, constitutive overexpression of either LeCBF1 or AtCBF3 in transgenic tomato plants did not increase freezing tolerance. Gene expression studies, including the use of a cDNA microarray representing approximately 8000 tomato genes, identified only four genes that were induced 2.5-fold or more in the LeCBF1 or AtCBF3 overexpressing plants, three of which were putative members of the tomato CBF regulon as they were also upregulated in response to low temperature. Additional experiments indicated that of eight tomato genes that were likely orthologs of Arabidopsis CBF regulon genes, none were responsive to CBF overexpression in tomato. 
From these results, we conclude that tomato has a complete CBF cold response pathway, but that the tomato CBF regulon differs from that of Arabidopsis and appears to be considerably smaller and less diverse in function.
Affiliation(s)
- Xin Zhang
- MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA
|
230
|
Usadel B, Schlüter U, Mølhøj M, Gipmans M, Verma R, Kossmann J, Reiter WD, Pauly M. Identification and characterization of a UDP-D-glucuronate 4-epimerase in Arabidopsis. FEBS Lett 2004; 569:327-31. [PMID: 15225656 DOI: 10.1016/j.febslet.2004.06.005] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Received: 05/04/2004] [Revised: 05/28/2004] [Accepted: 06/01/2004] [Indexed: 11/19/2022]
Abstract
One of the major sugars present in the plant cell wall is d-galacturonate, the dominant monosaccharide in pectic polysaccharides. Previous work indicated that one of the activated precursors necessary for the synthesis of pectins is UDP-d-galacturonate, which is synthesized from UDP-d-glucuronate by a UDP-d-glucuronate 4-epimerase (GAE). Here, we report the identification, cloning and characterization of a GAE6 from Arabidopsis thaliana. Functional analysis revealed that this enzyme converts UDP-d-glucuronate to UDP-d-galacturonate in vitro. An expression analysis of this epimerase and its five homologs in the Arabidopsis genome by quantitative RT-PCR and promoter::GUS fusions indicated differential expression of the family members in plant tissues and expression of all isoforms in the developing pollen of A. thaliana.
Affiliation(s)
- Björn Usadel
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Golm, Germany
|
231
|
Roche FM, Hokamp K, Acab M, Babiuk LA, Hancock REW, Brinkman FSL. ProbeLynx: a tool for updating the association of microarray probes to genes. Nucleic Acids Res 2004; 32:W471-4. [PMID: 15215432 PMCID: PMC441590 DOI: 10.1093/nar/gkh452] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Abstract
As genome sequence data and gene prediction improve, probes developed for a given microarray experiment should be continuously re-evaluated for their specificity for given genes. ProbeLynx (www.pathogenomics.ca/probelynx) is a new web service which uses current genomic sequence information to re-examine microarray probe specificity and provide annotation updates relevant to determining which gene(s) and transcript(s) are associated with a given probe. Probe sequences (either oligonucleotide- or cDNA-based) are uploaded in FASTA format and the results returned as a tab-delimited flat file for insertion into a spreadsheet application or database management system for further analysis. ProbeLynx has been initially developed to focus on arrays derived from human, mouse, chicken and bovine genomes, but may be expanded to handle other genomic datasets. ProbeLynx offers microarray users the important ability to continuously assess the potential of a probe to cross-hybridize to paralogous genes and the suitability of a given probe to investigate a transcript of interest. By also including the latest gene function annotation information in the output, ProbeLynx provides the critical first step in updating microarray data annotation.
Affiliation(s)
- Fiona M Roche
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
|
232
|
Leipzig J, Pevzner P, Heber S. The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome. Nucleic Acids Res 2004; 32:3977-83. [PMID: 15292448 PMCID: PMC506815 DOI: 10.1093/nar/gkh731] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
Alternative splicing essentially increases the diversity of the transcriptome and has important implications for physiology, development and the genesis of diseases. Conventionally, alternative splicing is investigated in a case-by-case fashion, but this becomes cumbersome and error prone if genes show a huge abundance of different splice variants. We use a different approach and integrate all transcripts derived from a gene into a single splicing graph. Each transcript corresponds to a path in the graph, and alternative splicing is displayed by bifurcations. This representation preserves the relationships between different splicing variants and allows us to investigate systematically all possible putative transcripts. We built a database of splicing graphs for human genes, using transcript information from various major sources (Ensembl, RefSeq, STACK, TIGR and UniGene). A Web interface allows users to display the splicing graphs, to interactively assemble transcripts and to access their sequences as well as neighboring genomic regions. We also provide for each gene an exhaustive pre-computed catalog of putative transcripts--in total more than 1.2 million sequences. We found that approximately 65% of the investigated genes show evidence for alternative splicing, and in 5% of the cases, a single gene might produce over 100 transcripts.
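The splicing-graph idea in this abstract (each transcript is a path, alternative splicing shows up as bifurcations, and every path is a putative transcript) can be illustrated with a minimal sketch. This is not the ASG implementation: the exon labels, function names and the two example transcripts are hypothetical, and exons are assumed to be pre-defined and ordered 5' to 3', so the graph is acyclic.

```python
from collections import defaultdict


def build_splicing_graph(transcripts):
    """Merge transcripts (lists of exon labels in 5'->3' order) into one graph."""
    nodes = set()
    edges = defaultdict(set)
    for exons in transcripts:
        nodes.update(exons)
        # Each pair of consecutive exons in a transcript becomes a directed edge.
        for upstream, downstream in zip(exons, exons[1:]):
            edges[upstream].add(downstream)
    return nodes, edges


def bifurcations(edges):
    """Exons with more than one possible successor, i.e. alternative splicing points."""
    return sorted(node for node, succ in edges.items() if len(succ) > 1)


def putative_transcripts(nodes, edges):
    """Enumerate every path from a start exon to a terminal exon."""
    with_predecessor = {d for succ in edges.values() for d in succ}
    starts = sorted(nodes - with_predecessor)
    paths = []

    def walk(node, path):
        successors = edges.get(node)
        if not successors:          # terminal exon: the path is complete
            paths.append(path)
            return
        for nxt in sorted(successors):
            walk(nxt, path + [nxt])

    for start in starts:
        walk(start, [start])
    return paths


# Hypothetical example: two observed transcripts of one gene.
observed = [["E1", "E2", "E4", "E5"], ["E1", "E3", "E4", "E6"]]
nodes, edges = build_splicing_graph(observed)
all_paths = putative_transcripts(nodes, edges)
novel = [p for p in all_paths if p not in observed]
```

In this toy case the graph has two bifurcations (`E1` and `E4`) and four paths, two of which were never observed as transcripts. Path counts multiply at each independent bifurcation, which is how a single gene can yield a large catalog of putative transcripts.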
Affiliation(s)
- Jeremy Leipzig
- Department of Computer Science, College of Engineering, North Carolina State University, Raleigh, NC 27695-7566, USA
|
233
|
Clegg N, Abbott D, Ferguson C, Coleman R, Nelson PS. Characterization and comparative analyses of transcriptomes from the normal and neoplastic human prostate. Prostate 2004; 60:227-39. [PMID: 15176052 DOI: 10.1002/pros.20055] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
BACKGROUND The prostate gland is a highly specialized organ with functional attributes that serve to enhance the fertility of mammalian species. Pathological processes affecting the prostate include benign prostate hypertrophy and prostate carcinoma; diseases that account for major morbidity and mortality in middle-aged and elderly men. To facilitate studies of biological processes uniquely represented in the prostate and assess molecular alterations associated with prostate carcinoma, we sought to establish the diversity of gene expression in the normal and neoplastic prostate through the compilation and analysis of a prostate transcriptome. METHODS We assembled and annotated ESTs derived from prostate cDNA libraries that were either produced in our laboratory or available from public sequence repositories such as CGAP, dbEST, and Unigene. Determinations of differential gene expression between the normal prostate, other normal tissues, and neoplastic prostate tissues was performed using statistical algorithms. Confirmation of differential expression was performed by quantitative PCR and Northern analysis. RESULTS A total of 99,448 high-quality ESTs were assembled and annotated to produce a prostate transcriptome comprised of 24,580 distinct TUs. Comparative analyses of gene expression levels identified 61 TUs with exclusive expression in the prostate and 45 TUs with high levels of expression in the prostate relative to at least 25 other normal tissues (P > 0.99). Comparative analyses of ESTs derived from neoplastic prostate tissues identified 75 genes with dysregulated expression in cancer (P > 0.99). CONCLUSIONS The human prostate expresses a diverse repertoire of genes that reflect a functionally complex organ. The identification of genes with prostate-restricted or enhanced expression may provide additional insights into the biochemical processes that interact to form the developmental, signaling, and metabolic pathways of the normal and neoplastic gland.
Affiliation(s)
- Nigel Clegg
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
|
234
|
Lee YH, Kim YH, Park JG. Identification of genes involved in liver cancer cell growth using an antisense library of phage genomic DNA. Cancer Res Treat 2004; 36:246-54. [PMID: 20368842 DOI: 10.4143/crt.2004.36.4.246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2004] [Accepted: 07/22/2004] [Indexed: 11/21/2022] Open
Abstract
PURPOSE Genes involved in liver cancer cell growth have been identified using an antisense library of large circular (LC-) genomic DNA of a recombinant M13 phage. MATERIALS AND METHODS A subtracted cDNA library was constructed by combining procedures of suppression subtractive hybridization (SSH) and unidirectional cloning of the subtracted cDNA into an M13 phagemid vector. Utilizing the life cycle of M13 bacteriophages, LC-antisense molecules derived from 1,200 random cDNA clones selected by size were prepared from the culture supernatant of bacterial transformants. The antisense molecules were arrayed for transfection on 96-well plates preseeded with HepG2. RESULTS When examined for growth inhibition after antisense transfection, 153 out of 1,200 LC-antisense molecules showed varying degrees of growth inhibitory effect to HepG2 cells. Sequence comparison of the 153 clones identified 58 unique genes. The observations were further extended by other cell-based assays. CONCLUSION These results suggest that the LC-antisense library offers potential for unique high-throughput screening to find genes involved in a specific biological function, and may prove to be an effective target validation system for gene-based drug discovery.
|
235
|
Affiliation(s)
- Elodie Ghedin
- Parasite Genomics, The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
236
|
Mitra RM, Shaw SL, Long SR. Six nonnodulating plant mutants defective for Nod factor-induced transcriptional changes associated with the legume-rhizobia symbiosis. Proc Natl Acad Sci U S A 2004; 101:10217-22. [PMID: 15220482 PMCID: PMC454190 DOI: 10.1073/pnas.0402186101] [Citation(s) in RCA: 128] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open
Abstract
As the legume-rhizobia symbiosis is established, the plant recognizes bacterial-signaling molecules, Nod factors (NFs), and initiates transcriptional and developmental changes within the root to allow bacterial invasion and the construction of a novel organ, the nodule. Plant mutants defective in nodule initiation (Nod(-)) are thought to have defects in NF-signal transduction. However, it is unknown whether WT plants respond to NF-independent bacterial-derived signals or whether Nod(-) plant mutants show defects in global symbiosis-associated gene expression. To characterize plant gene expression in the establishment of the symbiosis, we used an Affymetrix oligonucleotide microarray representing 9,935 Medicago truncatula expressed sequences. We identified 46 sequences that are differentially expressed in plants exposed for 24 h to WT Sinorhizobium meliloti or to the invasion defective S. meliloti mutant, exoA. Eight of these genes encode nucleolar proteins, which are implicated in ribosome biogenesis. We also identified differentially expressed transcription factors, signaling components, defense response proteins, stress response proteins, and several previously uncharacterized genes. NF appears both necessary and sufficient to induce most changes. Six of seven Nod(-) M. truncatula mutants (nfp, dmi1, dmi2, dmi3, nsp1, and nsp2) showed no transcriptional response to S. meliloti, suggesting that the encoded proteins are required for initiating new transcription. The Nod(-) mutant hcl, however, exhibits a reduced transcriptional response to S. meliloti, indicating that the machinery responsible for initiating new transcription is at least partially functional in this mutant.
Affiliation(s)
- Raka M Mitra
- Department of Biological Sciences, 371 Serra Mall, Stanford University, CA 94305-5020, USA
|
237
|
Graham MA, Silverstein KAT, Cannon SB, VandenBosch KA. Computational identification and characterization of novel genes from legumes. PLANT PHYSIOLOGY 2004; 135:1179-97. [PMID: 15266052 PMCID: PMC519039 DOI: 10.1104/pp.104.037531] [Citation(s) in RCA: 128] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/10/2003] [Revised: 04/01/2004] [Accepted: 04/03/2004] [Indexed: 05/18/2023]
Abstract
The Fabaceae, the third largest family of plants and the source of many crops, has been the target of many genomic studies. Currently, only the grasses surpass the legumes for the number of publicly available expressed sequence tags (ESTs). The quantity of sequences from diverse plants enables the use of computational approaches to identify novel genes in specific taxa. We used BLAST algorithms to compare unigene sets from Medicago truncatula, Lotus japonicus, and soybean (Glycine max and Glycine soja) to nonlegume unigene sets, to GenBank's nonredundant and EST databases, and to the genomic sequences of rice (Oryza sativa) and Arabidopsis. As a working definition, putatively legume-specific genes had no sequence homology, below a specified threshold, to publicly available sequences of nonlegumes. Using this approach, 2,525 legume-specific EST contigs were identified, of which less than three percent had clear homology to previously characterized legume genes. As a first step toward predicting function, related sequences were clustered to build motifs that could be searched against protein databases. Three families of interest were more deeply characterized: F-box related proteins, Pro-rich proteins, and Cys cluster proteins (CCPs). Of particular interest were the >300 CCPs, primarily from nodules or seeds, with predicted similarity to defensins. Motif searching also identified several previously unknown CCP-like open reading frames in Arabidopsis. Evolutionary analyses of the genomic sequences of several CCPs in M. truncatula suggest that this family has evolved by local duplications and divergent selection.
Affiliation(s)
- Michelle A Graham
- Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108, USA
|
238
|
Hudson TC, Stapleton AE, Brown JL. Codifying bioinformatics processes without programming. ACTA ACUST UNITED AC 2004. [DOI: 10.1016/s1741-8364(04)02410-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
239
|
Küster H, Hohnjec N, Krajinski F, El YF, Manthey K, Gouzy J, Dondrup M, Meyer F, Kalinowski J, Brechenmacher L, van Tuinen D, Gianinazzi-Pearson V, Pühler A, Gamas P, Becker A. Construction and validation of cDNA-based Mt6k-RIT macro- and microarrays to explore root endosymbioses in the model legume Medicago truncatula. J Biotechnol 2004; 108:95-113. [PMID: 15129719 DOI: 10.1016/j.jbiotec.2003.11.011] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
To construct macro- and microarray tools suitable for expression profiling in root endosymbioses of the model legume Medicago truncatula, we PCR-amplified a total of 6048 cDNA probes representing genes expressed in uninfected roots, mycorrhizal roots and young root nodules [Nucleic Acids Res. 30 (2002) 5579]. Including additional probes for either tissue-specific or constitutively expressed control genes, 5651 successfully amplified gene-specific probes were used to grid macro- and to spot microarrays designated Mt6k-RIT (M. truncatula 6k root interaction transcriptome). Subsequent to a technical validation of microarray printing, we performed two pilot expression profiling experiments using Cy-labeled targets from Sinorhizobium meliloti-induced root nodules and Glomus intraradices-colonized arbuscular mycorrhizal roots. These targets detected marker genes for nodule and arbuscular mycorrhiza development, amongst them different nodule-specific leghemoglobin and nodulin genes as well as a mycorrhiza-specific phosphate transporter gene. In addition, we identified several dozens of genes that have so far not been reported to be differentially expressed in nodules or arbuscular mycorrhiza thus demonstrating that Mt6k-RIT arrays serve as useful tools for an identification of genes relevant for legume root endosymbioses. A comprehensive profiling of such candidate genes will be very helpful to the development of breeding strategies and for the improvement of cultivation management targeted at increasing legume use in sustainable agricultural systems.
Affiliation(s)
- Helge Küster
- Lehrstuhl für Genetik, Fakultät für Biologie, Universität Bielefeld, Postfach 100131, Bielefeld D-33501, Germany.
|
240
|
Whitworth K, Springer GK, Forrester LJ, Spollen WG, Ries J, Lamberson WR, Bivens N, Murphy CN, Mathialagan N, Mathialigan N, Green JA, Prather RS. Developmental expression of 2489 gene clusters during pig embryogenesis: an expressed sequence tag project. Biol Reprod 2004; 71:1230-43. [PMID: 15175238 DOI: 10.1095/biolreprod.104.030239] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022] Open
Abstract
Identification of mRNAs that are present at early stages of embryogenesis is critical for a better understanding of development. To this end, cDNA libraries were constructed from germinal vesicle-stage oocytes, in vivo-produced four-cell- and blastocyst-stage embryos, and from in vitro-produced four-cell- and blastocyst-stage embryos. Randomly picked clones (10 848) were sequenced from the 3' end and those of sufficient quality (8066, 74%) were clustered into groups of sequence similarity (>95% identity), resulting in 2489 clusters. The sequence of the longest representative expressed sequence tag (EST) of each cluster was compared with GenBank and TIGR. Scores below 200 were considered unique, and 1114 (44.8%) did not have a match in either database. Sequencing from the 5' end yielded 12 of 37 useful annotations, suggesting that one third of the 1114 might be identifiable, still leaving over 700 unique ESTs. Virtual Northerns compared between the stages identified numerous genes where expression appears to change from the germinal vesicle oocyte to the four-cell stage, from the four-cell to blastocyst stage, and between in vitro- and in vivo-derived four-cell- and blastocyst-stage embryos. This is the first large-scale sequencing project on early pig embryogenesis and has resulted in the discovery of a large number of genes as well as possible stage-specific expression. Because many of these ESTs appear to not be in the public databases, their addition will be useful for transcriptional profiling experiments conducted on early pig embryos.
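The counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Figures taken from the abstract above.
clones_sequenced = 10848
clones_usable = 8066
clusters = 2489
clusters_without_match = 1114
annotated_from_5prime, tested_from_5prime = 12, 37

usable_pct = 100 * clones_usable / clones_sequenced      # reported as 74%
unmatched_pct = 100 * clusters_without_match / clusters  # reported as 44.8%

# If the 5' sequencing success rate (12 of 37, about one third) held for all
# unmatched clusters, this many would become identifiable:
identifiable = clusters_without_match * annotated_from_5prime / tested_from_5prime
still_unique = clusters_without_match - identifiable     # "over 700" in the abstract

print(f"{usable_pct:.0f}% usable, {unmatched_pct:.1f}% unmatched, "
      f"about {still_unique:.0f} ESTs still unique")
```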
Affiliation(s)
- Kristin Whitworth
- Department of Animal Science, University of Missouri-Columbia, Columbia, MO 65211, USA
|
241
|
LeClere S, Rampey RA, Bartel B. IAR4, a gene required for auxin conjugate sensitivity in Arabidopsis, encodes a pyruvate dehydrogenase E1alpha homolog. PLANT PHYSIOLOGY 2004; 135:989-99. [PMID: 15173569 PMCID: PMC514133 DOI: 10.1104/pp.104.040519] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/04/2004] [Revised: 03/20/2004] [Accepted: 03/21/2004] [Indexed: 05/17/2023]
Abstract
The formation and hydrolysis of indole-3-acetic acid (IAA) conjugates represent a potentially important means for plants to regulate IAA levels and thereby auxin responses. The identification and characterization of mutants defective in these processes is advancing the understanding of auxin regulation and response. Here we report the isolation and characterization of the Arabidopsis iar4 mutant, which has reduced sensitivity to several IAA-amino acid conjugates. iar4 is less sensitive to a synthetic auxin and low concentrations of an ethylene precursor but responds to free IAA and other hormones tested similarly to wild type. The gene defective in iar4 encodes a homolog of the E1alpha-subunit of mitochondrial pyruvate dehydrogenase, which converts pyruvate to acetyl-coenzyme A. We did not detect glycolysis or Krebs-cycle-related defects in the iar4 mutant, and a T-DNA insertion in the IAR4 coding sequence conferred similar phenotypes as the originally identified missense allele. In contrast, we found that disruption of the previously described mitochondrial pyruvate dehydrogenase E1alpha-subunit does not alter IAA-Ala responsiveness or confer any obvious phenotypes. It is possible that IAR4 acts in the conversion of indole-3-pyruvate to indole-3-acetyl-coenzyme A, which is a potential precursor of IAA and IAA conjugates.
Affiliation(s)
- Sherry LeClere
- Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, USA
|
242
|
Fox SA, Loh S, Thean AL, Garlepp MJ. Identification of differentially expressed genes in murine mesothelioma cell lines of differing tumorigenicity using suppression subtractive hybridization. Biochim Biophys Acta Mol Basis Dis 2004; 1688:237-44. [PMID: 15062874 DOI: 10.1016/j.bbadis.2003.12.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2003] [Revised: 12/12/2003] [Accepted: 12/16/2003] [Indexed: 02/06/2023]
Abstract
We have previously prepared two B7-1 transfectant clones (AC29 B7-6 and AC29 B7-7) from the AC29 murine mesothelioma (MM) cell line which displayed markedly different in vivo growth rates and susceptibility to cytotoxic T cell killing. Using suppression subtractive hybridisation (SSH), we searched for factors which may determine the biological distinction seen in these clones. We isolated 19 cDNA clones from two SSH generated libraries by screening using subtracted cDNA probes and characterised them using Northern hybridisation, sequencing, RT-PCR and real-time RT-PCR. The 19 cDNA clones comprised 16 different transcripts of which 15 were identified by homology to known genes and one was novel. Expression of a murine endogenous retroviral (mERV) transcript mERV-AC29 was found in the immunogenic AC29 B7-6 clone and parental AC29 but absent in AC29 B7-7. Real-time RT-PCR was used to confirm that galectin-1, the disintegrin/metalloproteinase MDC9 and ribonucleotide reductase M1 were overexpressed in AC29 B7-7. Our results show that SSH is a powerful method for the identification of genes expressed differentially between phenotypically different tumour cell lines or clones. Characterisation of the role of those identified here will provide useful information in understanding genes responsible for differential tumorigenicity.
Affiliation(s)
- Simon A Fox
- Pharmacogenetics Laboratory, School of Pharmacy, Curtin University of Technology, P.O. Box U1987, Perth, WA 6001, Australia
|
243
|
Reinhardt A, Eisenberg D. DPANN: Improved sequence to structure alignments following fold recognition. Proteins 2004; 56:528-38. [PMID: 15229885 DOI: 10.1002/prot.20144] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
In fold recognition (FR) a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. Although FR programs can often identify among all possible folds the one a sequence adopts, they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. Hence it is desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix to create alignments between a protein sequence and a protein structure through dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that have the same fold but lack sequences-similarity, DPANN aligns over 30% of all sequences to the paired structure, resembling closely the structural superposition of the pair. In over half of these cases the DPANN alignment is close to the structural superposition, although the initial alignment from the step of fold recognition is not close. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Thus application of DPANN after fold recognition leads to substantial improvements in alignment accuracy, which in turn provides more useful templates for the modeling of protein structures. In the artificial case of using actual instead of predicted secondary structures for the probe protein, over 50% of the alignments are successful.
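The alignment step described here, dynamic programming over a substitution score that depends on both amino acid type and secondary structure state, can be sketched as a plain global (Needleman-Wunsch style) alignment. This is not the DPANN method itself: `toy_score` is a hypothetical hand-written stand-in for the neural-network-derived matrix, and the linear gap penalty is an assumption.

```python
def toy_score(a, b):
    """Score two residues given as (amino_acid, secondary_structure_state).

    Hypothetical placeholder for the ANN-derived substitution matrix.
    """
    aa_term = 2 if a[0] == b[0] else -1
    ss_term = 1 if a[1] == b[1] else -1
    return aa_term + ss_term


def align(query, template, score=toy_score, gap=-2):
    """Globally align two residue lists; return (best score, aligned index pairs)."""
    n, m = len(query), len(template)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),  # match
                dp[i - 1][j] + gap,   # query residue aligned to a gap
                dp[i][j - 1] + gap,   # template residue aligned to a gap
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Hypothetical example: H = helix, C = coil. For the query the states would be
# predicted; for the template they would come from the known structure.
query = [("A", "H"), ("G", "C")]
template = [("A", "H"), ("L", "H"), ("G", "C")]
best_score, aligned_pairs = align(query, template)
```

Swapping in a different `score` function is the only change needed to try another residue-pair scoring scheme; the dynamic programming itself stays the same.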
|
244
|
Morris JK, Willard BB, Yin X, Jeserich G, Kinter M, Trapp BD. The 36K protein of zebrafish CNS myelin is a short-chain dehydrogenase. Glia 2004; 45:378-91. [PMID: 14966869 DOI: 10.1002/glia.10338] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Previous studies identified homologues to mammalian myelin genes expressed in the teleost central nervous system (CNS), including myelin basic protein (MBP), protein zero (P0), and a member of the proteolipid protein family, DM20. In addition, an uncharacterized 36-kDa (36K) protein is a major component of teleost myelin, but is not a major component of myelin in other species. In the present study, we sought to better understand myelin proteins and myelination in one teleost, zebrafish, by molecular characterization of the zebrafish 36K protein. Purified zebrafish CNS myelin was isolated and the amino acid sequences of peptides present in the 36-kDa band were determined by mass spectrometry. These sequences matched a previously uncharacterized EST in The Institute for Genome Research (TIGR) zebrafish database that is related to the short-chain dehydrogenase/reductase (SDR) protein family. In vitro expression of the zebrafish 36K cDNA in Neuro 2a cells resulted in a protein product that was recognized by a 36K polyclonal antibody. The zebrafish 36K mRNA and protein expression patterns were determined and correlated to other known myelin gene expression profiles. In addition, we determined by in situ hybridization that a human 36K homologue (FLJ13639) is expressed in oligodendrocytes and neurons in the adult human cortex. This study identified a major myelin protein in zebrafish, 36K, as a member of the SDR superfamily; an expression pattern similar to other myelin genes was demonstrated.
Affiliation(s)
- Jacqueline K Morris
- Department of Neurosciences, Cleveland Clinic Foundation, Lerner Research Institute, Cleveland, Ohio 44195, USA.
|
245
|
Abstract
It is well known that the gene distribution is non-uniform in the human genome, reaching the highest concentration in the GC-rich isochores. Also the amino acid frequencies, and the hydrophobicity, of the corresponding encoded proteins are affected by the high GC level of the genes localized in the GC-rich isochores. It was hypothesized that the gene expression level as well is higher in GC-rich compared to GC-poor isochores [Mol. Biol. Evol. 10 (1993) 186]. Several features of human genes and proteins, namely expression level, coding and non-coding lengths, and hydrophobicity were investigated in the present paper. The results support the hypothesis reported above, since all the parameters so far studied converge to the same conclusion, that the average expression level of the GC-rich genes is significantly higher than that of the GC-poor genes.
Affiliation(s)
- Stilianos Arhondakis
- Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
|
246
|
Mitreva M, McCarter JP, Martin J, Dante M, Wylie T, Chiapelli B, Pape D, Clifton SW, Nutman TB, Waterston RH. Comparative genomics of gene expression in the parasitic and free-living nematodes Strongyloides stercoralis and Caenorhabditis elegans. Genome Res 2004; 14:209-20. [PMID: 14762059 PMCID: PMC327096 DOI: 10.1101/gr.1524804] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Although developmental timing of gene expression is used to infer potential gene function, studies have yet to correlate this information between species. We analyzed 10,921 ESTs in 3311 clusters from first- and infective third-stage larva (L1, L3i) of the parasitic nematode Strongyloides stercoralis and compared the results to Caenorhabditis elegans, a species that has an L3i-like dauer stage. In the comparison of S. stercoralis clusters with stage-specific expression to C. elegans homologs expressed in either dauer or nondauer stages, matches between S. stercoralis L1 and C. elegans nondauer-expressed genes dominated, suggesting conservation in the repertoire of genes expressed during growth in nutrient-rich conditions. For example, S. stercoralis collagen transcripts were abundant in L1 but not L3i, a pattern consistent with C. elegans collagens. Although a greater proportion of S. stercoralis L3i than L1 genes have homologs among the C. elegans dauer-specific transcripts, we did not uncover evidence of a robust conserved L3i/dauer 'expression signature.' Strikingly, in comparisons of S. stercoralis clusters to C. elegans homologs with RNAi knockouts, those with significant L1-specific expression were more than twice as likely as L3i-specific clusters to match genes with phenotypes. We also provide functional classifications of S. stercoralis clusters.
Affiliation(s)
- Makedonka Mitreva
- Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63108, USA.
|
247
|
Xing Y, Resch A, Lee C. The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res 2004; 14:426-41. [PMID: 14962984 PMCID: PMC353230 DOI: 10.1101/gr.1304504] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2003] [Accepted: 12/01/2003] [Indexed: 12/28/2022]
Abstract
Recent evidence of abundant transcript variation (e.g., alternative splicing, alternative initiation, alternative polyadenylation) in complex genomes indicates that cataloging the complete set of transcripts from an organism is an important project. One challenge is the fact that most high-throughput experimental methods for characterizing transcripts (such as EST sequencing) give highly detailed information about short fragments of transcripts or protein products, instead of a complete characterization of a full-length form. We analyze this "multiassembly problem" (reconstructing the most likely set of full-length isoform sequences from a mixture of EST fragment data) and present a graph-based algorithm for solving it. In a variety of tests, we demonstrate that this algorithm deals appropriately with coupling of distinct alternative splicing events, increasing fragmentation of the input data and different types of transcript variation (such as alternative splicing, initiation, polyadenylation, and intron retention). To test the method's performance on pure fragment (EST) data, we removed all mRNA sequences, and found it produced no errors in 40 cases tested. Using this algorithm, we have constructed an Alternatively Spliced Proteins database (ASP) from analysis of human expressed and genomic sequences, consisting of 13,384 protein isoforms of 4422 genes, yielding an average of 3.0 protein isoforms per gene.
Affiliation(s)
- Yi Xing
- UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute and Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095-1570, USA
|
248
|
Vincentz M, Cara FAA, Okura VK, da Silva FR, Pedrosa GL, Hemerly AS, Capella AN, Marins M, Ferreira PC, França SC, Grivet L, Vettore AL, Kemper EL, Burnquist WL, Targon MLP, Siqueira WJ, Kuramae EE, Marino CL, Camargo LEA, Carrer H, Coutinho LL, Furlan LR, Lemos MVF, Nunes LR, Gomes SL, Santelli RV, Goldman MH, Bacci M, Giglioti EA, Thiemann OH, Silva FH, Van Sluys MA, Nobrega FG, Arruda P, Menck CFM. Evaluation of monocot and eudicot divergence using the sugarcane transcriptome. PLANT PHYSIOLOGY 2004; 134:951-9. [PMID: 15020759 PMCID: PMC389918 DOI: 10.1104/pp.103.033878] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Over 40,000 sugarcane (Saccharum officinarum) consensus sequences assembled from 237,954 expressed sequence tags were compared with the protein and DNA sequences from other angiosperms, including the genomes of Arabidopsis and rice (Oryza sativa). Approximately two-thirds of the sugarcane transcriptome have similar sequences in Arabidopsis. These sequences may represent a core set of proteins or protein domains that are conserved among monocots and eudicots and probably encode for essential angiosperm functions. The remaining sequences represent putative monocot-specific genetic material, one-half of which were found only in sugarcane. These monocot-specific cDNAs represent either novelties or, in many cases, fast-evolving sequences that diverged substantially from their eudicot homologs. The wide comparative genome analysis presented here provides information on the evolutionary changes that underlie the divergence of monocots and eudicots. Our comparative analysis also led to the identification of several not yet annotated putative genes and possible gene loss events in Arabidopsis.
Affiliation(s)
- Michel Vincentz
- Centro de Biologia Molecular e Engenharia Genética, Universidade de Campinas, Caixa Postal 6010, 13083-970, Campinas SP, Brazil
249
Klee EW, Carlson DF, Fahrenkrug SC, Ekker SC, Ellis LBM. Identifying secretomes in people, pufferfish and pigs. Nucleic Acids Res 2004; 32:1414-21. [PMID: 14990746 PMCID: PMC390277 DOI: 10.1093/nar/gkh286] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
The proteins processed by the secretory pathway (secretome) are critical players in the development of multi-cellular eukaryotic organisms but have yet to be comprehensively studied at the genomic level. In this study, we use the Target P algorithm to predict human (13-20% of proteins found in individual datasets) and Fugu (14%) secretomes based on analysis of their nearly complete proteomes. We combine internal processing with prediction software to automate secreted protein identification and overcome one of the major challenges associated with EST data: identification of the minority of clones that encode N-terminally-complete proteins. We discuss the use of these methods to predict secreted proteins in EST-based consensus sequence sets, and we validate these predictions using an assay for cell-free cotranslational translocation. Analysis of TIGR Porcine Gene Index 4.0 as a test dataset resulted in the identification of 352 N-terminally-complete, putative secreted proteins. In functional agreement with our predictions, 34 of 40 (85%) of these cDNAs were verified to be cotranslationally translocated in an in vitro translation system. The methods developed here are specifically designed to accept partial open reading frames and improve secreted protein predictions in eukaryotic transcriptomes, and are valuable for the analysis and annotation of eukaryotic EST databases.
Affiliation(s)
- Eric W Klee
- Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN 55455, USA
250
Allen JE, Pertea M, Salzberg SL. Computational gene prediction using multiple sources of evidence. Genome Res 2004; 14:142-8. [PMID: 14707176 PMCID: PMC314291 DOI: 10.1101/gr.1562804] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
Abstract
This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
Affiliation(s)
- Jonathan E Allen
- The Institute for Genomic Research, Rockville, Maryland 20850, USA.