101
Abstract
Mitochondrial transcripts of two ascidian species were reconstructed through sequence assembly of publicly available ESTs resembling mitochondrial DNA sequences (mt-ESTs). This strategy allowed us to analyze processing and mapping of the mitochondrial transcripts and to investigate the gene organization of a previously uncharacterized mitochondrial genome (mtDNA). This new strategy would greatly facilitate the sequencing and annotation of mtDNAs. In Ciona intestinalis, the assembled mt-ESTs covered 22 mitochondrial genes (approximately 12,000 bp) and provided the partial sequence of the mtDNA and the prediction of its gene organization. Such sequences were confirmed by amplification and sequencing of the entire Ciona mtDNA. For Halocynthia roretzi, for which the mtDNA sequence was already available, the inferred mt transcripts allowed better definition of gene boundaries (16S rRNA, ND1, ATP6, and tRNA-Ser genes) and the identification of a new gene (an additional Phe-tRNA). In both species, polycistronic and immature transcripts, creation of stop codons by polyadenylation, tRNA signal processing, and rRNA transcript termination signals were identified, thus suggesting that the main features of mitochondrial transcripts are conserved in Chordata.
Affiliation(s)
- Carmela Gissi
- Dipartimento di Scienze Biomolecolari e Biotecnologie, Università di Milano, Milano, Italy
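The "creation of stop codons by polyadenylation" noted in this abstract refers to mitochondrial genes that end in an incomplete stop codon (T or TA), which becomes TAA once the poly(A) tail is added. The Python sketch below illustrates that check under stated assumptions: the input is an in-frame coding sequence starting at its first codon, the stop set is TAA/TAG as in the ascidian mitochondrial code (NCBI translation table 13), and the function name is hypothetical. It is not the authors' pipeline.

```python
from typing import Optional

# Stop codons of the ascidian mitochondrial code (NCBI translation table 13),
# in which AGA/AGG encode glycine.
STOP_CODONS = {"TAA", "TAG"}


def stop_created_by_polyadenylation(cds: str, polya_len: int = 20) -> Optional[str]:
    """Return the stop codon formed when a poly(A) tail is appended to an
    in-frame coding sequence ending in an incomplete codon ("T" or "TA").

    Returns None if the sequence already ends on a complete codon or if the
    completed codon is not a stop.
    """
    if polya_len < 2:
        raise ValueError("poly(A) tail must be long enough to fill a codon")
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return None  # nothing left for polyadenylation to complete
    last_codon_start = len(cds) - remainder
    mature = cds + "A" * polya_len  # transcript after polyadenylation
    codon = mature[last_codon_start:last_codon_start + 3]
    return codon if codon in STOP_CODONS else None


print(stop_created_by_polyadenylation("ATGGCTT"))   # T + AA gives TAA
print(stop_created_by_polyadenylation("ATGGCTTA"))  # TA + A gives TAA
```

A gene ending in a complete codon returns None, so the same function also distinguishes genes that carry a full encoded stop from those that rely on polyadenylation.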
102
Sato T, Mishina M. Representational difference analysis, high-resolution physical mapping, and transcript identification of the zebrafish genomic region for a motor behavior. Genomics 2003; 82:218-29. [PMID: 12837271 DOI: 10.1016/s0888-7543(03)00071-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
Abstract
Zebrafish is one of the best model organisms for investigating gene functions in vertebrates. By 4,5',8-trimethylpsoralen mutagenesis, we isolated a zebrafish mutant, vibrato, with defects in spontaneous contraction and the touch response. Whole genome subtraction between the wild-type and the mutant genomes by representational difference analysis yielded polymorphic markers tightly linked to the vibrato locus. Using these markers, we constructed a high-resolution physical map and localized the vibrato locus within a genomic region of 720 kb. Direct cDNA selection with the contig led to the identification of a novel gene, solo, encoding a protein with SEC14 and spectrin repeat domains. These domains of Solo shared significant amino acid sequence identities with those of mammalian Trio and Kalirin. In addition, we found the zebrafish orthologs of mammalian TTN, COL5A2, and CED-6 in the vibrato region. Mapping of these genes localized human chromosomal regions possibly involved in motor disorders. Our results suggest that representational difference analysis provides an efficient way to isolate mutated genomic regions in zebrafish.
Affiliation(s)
- Tomomi Sato
- Department of Molecular Neurobiology and Pharmacology, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
103
Pedra JHF, Brandt A, Westerman R, Lobo N, Li HM, Romero-Severson J, Murdock LL, Pittendrigh BR. Transcriptome analysis of the cowpea weevil bruchid: identification of putative proteinases and alpha-amylases associated with food breakdown. Insect Mol Biol 2003; 12:405-12. [PMID: 12864920 DOI: 10.1046/j.1365-2583.2003.00425.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 05/19/2023]
Abstract
We describe here the first systematic work to discover insect genes involved in food breakdown using a cDNA library enriched for gut-expressed transcripts from Callosobruchus maculatus. A total of 1056 clones were screened for cDNA insert-containing plasmids, and 503 nonredundant open reading frames were discovered. Twenty-three inferred genes potentially involved in digestive processes in cowpea weevil were identified, including proteinases and amylases. The predicted catalytic sites were identified in the inferred cysteine and aspartic acid proteinases, and in alpha-amylases. Transcriptome analysis of the cowpea bruchid will potentially permit gene discovery in other beetles, an insect order of major economic and ecological importance that is poorly represented in genomic databases.
Affiliation(s)
- J H F Pedra
- Indiana Center for Insect Genomics (ICIG), University of Notre Dame, Notre Dame, IN, USA
104
Rudd S. Expressed sequence tags: alternative or complement to whole genome sequences? Trends Plant Sci 2003; 8:321-9. [PMID: 12878016 DOI: 10.1016/s1360-1385(03)00131-6] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.0] [Indexed: 05/21/2023]
Abstract
Over three million sequences from approximately 200 plant species have been deposited in the publicly available plant expressed sequence tag (EST) sequence databases. Many of the ESTs have been sequenced as an alternative to complete genome sequencing or as a substrate for cDNA array-based expression analyses. This creates a formidable resource from both biodiversity and gene-discovery standpoints. Bioinformatics-based sequence analysis tools have extended the scope of EST analysis into the fields of proteomics, marker development and genome annotation. Although EST collections are certainly no substitute for a whole genome scaffold, this "poor man's genome" resource forms the core foundations for various genome-scale experiments within the as yet unsequenceable plant genomes.
Affiliation(s)
- Stephen Rudd
- Institut für Bioinformatik, GSF Forschungszentrum für Umwelt und Gesundheit, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.
105
Abstract
gp38k (CHI3L1) is a secreted heparin-binding glycoprotein whose expression, in vitro, is associated with vascular smooth muscle cell (VSMC) migration and invasion into the underlying gelatinous matrix. gp38k is expressed at high levels in postconfluent "nodular" VSMC cultures and at low levels in subconfluent proliferating cultures. In vivo, expression of gp38k homologs is high in regions of tissue remodeling and has now been detected in atherosclerotic plaques and in the developing heart. We tested the hypothesis that gp38k functions to modulate VSMC adhesion and migration. In modified Boyden chamber assays, gp38k at concentrations as low as 1 ng/ml had profound effects on VSMC migration but little or no effect on fibroblast migration. In addition, gp38k adsorbed to polystyrene surfaces directly promotes VSMC attachment and spreading. Attachment is inhibited in the presence of affinity-purified anti-gp38k or 10 mM EDTA. These results establish that gp38k is a new vascular cell adhesion and migration factor that may have a role in processes leading to vascular occlusion and heart development. gp38k may interact with VSMC via an EDTA-sensitive mechanism consistent with integrin-mediated cell-matrix interaction.
Affiliation(s)
- Kimi C Nishikawa
- Department of Biological Sciences, University at Albany-SUNY, 1400 Washington Avenue, Albany, NY 12222, USA
106
Dunn JR, Risk JM, Langan JE, Marlee D, Ellis A, Campbell F, Watson AJM, Field JK. Physical and transcript map of the minimally deleted region III on 17p implicated in the early development of Barrett's oesophageal adenocarcinoma. Oncogene 2003; 22:4134-42. [PMID: 12821948 DOI: 10.1038/sj.onc.1206466] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/11/2023]
Abstract
Allelic imbalance (AI) studies on chromosome 17 (C17) in Barrett's oesophageal adenocarcinoma (BOA) tumours strongly suggest that a minimally deleted region on C17p harbours a BOA-associated gene with tumour suppressor function. This deleted region, designated minimal region III (MRIII), lies between the two microsatellite markers D17S1852 and D17S954. Computational sequence analysis techniques, BLAST and NIX, were used to assemble a physical map of MRIII, consisting of three overlapping bacterial artificial chromosome (BAC) clones, 297N7, 963H4 and 795F17, from the RPCI-11 library. The 270 kb genomic sequence of MRIII was analysed using the computational gene prediction methods NIX and TAP to identify putative BOA genes. A transcript map of MRIII has been generated and contains 25 candidate BOA genes, four of which are the named genes MYH3, SCO1, x006 and MAGOH-LIKE. The other candidates consist of seven genes predicted by TAP with associated ESTs identified by NIX, two genes predicted by TAP alone and 12 genes/ESTs (or pairs of ESTs) identified by NIX alone. No disease-specific mutations were identified in x006 or MAGOH-LIKE, although expression analysis of these genes suggests that they may show alternative splicing or be altered epigenetically or in regulatory regions in oesophageal cancer.
Affiliation(s)
- Julie R Dunn
- Molecular Genetics and Oncology Group, Clinical Dental Sciences, The University of Liverpool, Liverpool L69 3BX, UK
107
Balasenthil S, Vadlamudi RK. Functional interactions between the estrogen receptor coactivator PELP1/MNAR and retinoblastoma protein. J Biol Chem 2003; 278:22119-27. [PMID: 12682072 PMCID: PMC1262660 DOI: 10.1074/jbc.m212822200] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 11/06/2022]
Abstract
PELP1 (proline-, glutamic acid-, and leucine-rich protein-1 (also referred to as MNAR, or modulator of nongenomic activity of estrogen receptor)), a recently identified novel coactivator of estrogen receptors, is widely expressed in a variety of 17 beta-estradiol (E2)-responsive reproductive tissues and is developmentally regulated in mammary glands. pRb (retinoblastoma protein), a cell cycle switch protein, plays a fundamental role in the proliferation, development, and differentiation of eukaryotic cells. To study the putative function of PELP1, we established stable MCF-7 breast cancer cell lines overexpressing PELP1. PELP1 overexpression hypersensitized breast cancer cells to E2 signaling, enhanced progression of breast cancer cells to S phase, and led to persistent hyperphosphorylation of pRb in an E2-dependent manner. Using phosphorylation site-specific pRb antibodies, we identified Ser-807/Ser-811 of pRb as a potential target site of PELP1. Interestingly, PELP1 was discovered to be physiologically associated with pRb and interacted via its C-terminal pocket domain, and PELP1/pRb interaction could be modulated by antiestrogen agents. Using mutant pRb cells, we demonstrated an essential role for PELP1/pRb interactions in the maximal coactivation functions of PELP1 using cyclin D1 as one of the targets. Taken together, these findings suggest that PELP1, a steroid coactivator, plays a permissive role in E2-mediated cell cycle progression, presumably via its regulatory interaction with the pRb pathway.
Affiliation(s)
- Ratna K. Vadlamudi
- To whom correspondence should be addressed: Dept. of Molecular and Cellular Oncology, Unit 108, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030. Tel.: 713-745-5239; Fax: 713-745-2050; E-mail:
108
Carninci P, Waki K, Shiraki T, Konno H, Shibata K, Itoh M, Aizawa K, Arakawa T, Ishii Y, Sasaki D, Bono H, Kondo S, Sugahara Y, Saito R, Osato N, Fukuda S, Sato K, Watahiki A, Hirozane-Kishikawa T, Nakamura M, Shibata Y, Yasunishi A, Kikuchi N, Yoshiki A, Kusakabe M, Gustincich S, Beisel K, Pavan W, Aidinis V, Nakagawara A, Held WA, Iwata H, Kono T, Nakauchi H, Lyons P, Wells C, Hume DA, Fagiolini M, Hensch TK, Brinkmeier M, Camper S, Hirota J, Mombaerts P, Muramatsu M, Okazaki Y, Kawai J, Hayashizaki Y. Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res 2003; 13:1273-89. [PMID: 12819125 PMCID: PMC403712 DOI: 10.1101/gr.1119703] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.2] [Indexed: 11/24/2022]
Abstract
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5'-end clusters identify regions that are potential promoters for 8637 known genes and 5'-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete.
Affiliation(s)
- Piero Carninci
- Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
109
Ye Z, Parry JM. Identification of polymorphisms in the human Reprimo gene using public EST data. Teratog Carcinog Mutagen 2003; 22:485-93. [PMID: 12395409 DOI: 10.1002/tcm.10044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/07/2022]
Abstract
The human Reprimo gene encodes a recently identified cytoplasmic protein that plays an important role in the regulation of p53-dependent G2 arrest of the cell cycle. Genetic variations in the Reprimo gene that may influence protein activity can be of both biological and epidemiological significance. The human expressed sequence tag (EST) database is a rich resource that can be used to rapidly screen for potential polymorphisms in proteins of physiological interest. On the basis of the alignment of human EST sequences, we identified two candidate polymorphisms at nucleotides 824 and 839 in the 3'-untranslated region of the Reprimo gene. The presence of these polymorphisms was confirmed in a Caucasian population (n=82) by the use of the allele-specific polymerase chain reaction (PCR). The rare allele frequency at position 824 (38.4%) is much higher than the rare allele frequency at position 839 (3.7%). Our results suggest that the human EST data may serve as a valuable source for the rapid identification of genetic variation.
Affiliation(s)
- Zheng Ye
- Center for Molecular Genetics and Toxicology, School of Biological Sciences, University of Wales Swansea, Singleton Park, Swansea, United Kingdom.
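Identifying candidate polymorphisms "on the basis of the alignment of human EST sequences" amounts to scanning alignment columns for a second base that recurs in several reads. A minimal Python sketch, assuming the ESTs are already aligned into equal-length rows with '-' for gaps; the function name and the `min_minor` threshold are hypothetical, and real screens also weigh base quality and sequencing error, which this sketch ignores.

```python
from collections import Counter
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")  # gaps ('-'), N and other symbols are not counted


def candidate_snps(aligned: List[str], min_minor: int = 2) -> Dict[int, Tuple[str, str, float]]:
    """Report alignment columns where a second base recurs.

    aligned: equal-length EST rows from a gapped alignment.
    Returns {1-based column: (major_base, minor_base, minor_fraction)}, where
    minor_fraction = minor count / (major count + minor count).
    """
    if not aligned:
        return {}
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned rows must have the same length")
    hits: Dict[int, Tuple[str, str, float]] = {}
    for col in range(width):
        bases = (row[col].upper() for row in aligned)
        counts = Counter(b for b in bases if b in VALID_BASES)
        if len(counts) < 2:
            continue  # monomorphic or empty column
        (major, n_major), (minor, n_minor) = counts.most_common(2)
        if n_minor >= min_minor:
            hits[col + 1] = (major, minor, n_minor / (n_major + n_minor))
    return hits


reads = ["ACGTA", "ACGTA", "ACGTA", "ACTTA", "ACTTA"]
print(candidate_snps(reads))  # {3: ('G', 'T', 0.4)}
```

Requiring the minor base in at least two independent reads is a simple guard against single-read sequencing errors; columns that pass would still need confirmation by genotyping, as was done here with allele-specific PCR.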
110
Zhang H, Marshall KW, Tang H, Hwang DM, Lee M, Liew CC. Profiling genes expressed in human fetal cartilage using 13,155 expressed sequence tags. Osteoarthritis Cartilage 2003; 11:309-19. [PMID: 12744936 DOI: 10.1016/s1063-4584(03)00032-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 02/02/2023]
Abstract
OBJECTIVE To analyze the gene expression profile of human fetal cartilage by expressed sequence tags (ESTs). METHODS A human fetal cartilage (8-12 weeks) cDNA library was constructed using the lambda ZAP Express vector. ESTs were obtained by partial sequencing of cDNA clones. The basic local alignment search tool algorithm was used to compare all generated ESTs to known sequences. RESULTS A total of 13,155 ESTs were analyzed, of which 8696 ESTs (66.1%) matched known genes, 53 ESTs (0.4%) were putatively novel (with no match) and the rest matched other ESTs, genomic DNA and repetitive sequences. Importantly, we identified 2448 unique known genes through non-redundancy analysis of the known gene matches, which were then functionally categorized. The tissue specificity of this library was reflected by its EST profile of the extracellular matrix (ECM) proteins. Collagens were the major transcripts, representing 68.5% of the ECM proteins. Proteoglycans were the second most abundant, constituting 9.5%. Collagen type II was the most abundant gene of all. Glypican 3, decorin and aggrecan were the major transcripts of proteoglycans. Many genes involved in cartilage development were identified, such as insulin-like growth factor-II, its receptor and binding proteins, connective tissue growth factor and fibroblast growth factors. Proteases and their regulatory factors were also identified, including matrix metalloprotease 2 and tissue inhibitor of metalloproteinase 1. CONCLUSIONS The EST approach is an effective way of characterizing the genes expressed in cartilage. These data represent the most extensive molecular information on human fetal cartilage to date. The availability of this information will serve as a basis for further research to identify genes that are essential in cartilage development.
Affiliation(s)
- H Zhang
- ChondroGene Inc., 800 Petrolia Road, Unit 15, Toronto, Ontario, Canada M3J 3K4
111
Li L, Brunk BP, Kissinger JC, Pape D, Tang K, Cole RH, Martin J, Wylie T, Dante M, Fogarty SJ, Howe DK, Liberator P, Diaz C, Anderson J, White M, Jerome ME, Johnson EA, Radke JA, Stoeckert CJ, Waterston RH, Clifton SW, Roos DS, Sibley LD. Gene discovery in the apicomplexa as revealed by EST sequencing and assembly of a comparative gene database. Genome Res 2003; 13:443-54. [PMID: 12618375 PMCID: PMC430278 DOI: 10.1101/gr.693203] [Citation(s) in RCA: 121] [Impact Index Per Article: 5.5] [Indexed: 11/25/2022]
Abstract
Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, approximately 15%-20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets.
Affiliation(s)
- Li Li
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
112
Venter JC, Levy S, Stockwell T, Remington K, Halpern A. Massive parallelism, randomness and genomic advances. Nat Genet 2003; 33 Suppl:219-27. [PMID: 12610531 DOI: 10.1038/ng1114] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
Abstract
In reviewing the past decade, it is clear that genomics was, and still is, driven by innovative technologies, perhaps more so than any other scientific area in recent memory. From the outset, computing, mathematics and new automated laboratory techniques have been key components in allowing the field to move forward rapidly. We highlight some key innovations that have come together to nurture the explosive growth that makes a new era of genomics a reality. We also document how these new approaches have fueled further innovations and discoveries.
Affiliation(s)
- J Craig Venter
- The Center for the Advancement of Genomics, 1901 Research Blvd., Rockville, Maryland 20850, USA.
113
Sorek R, Safer HM. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 2003; 31:1067-74. [PMID: 12560505 PMCID: PMC149192 DOI: 10.1093/nar/gkg170] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
A key goal of the Human Genome Project was to understand the complete set of human proteins, the proteome. Since the genome sequence by itself is not sufficient for predicting new genes and alternative splicing events that lead to new proteins, expressed sequence tags (ESTs) are used as the primary tool for these purposes. The high prevalence of artifacts in dbEST, however, often leads to invalid predictions. Here we describe a novel method for recognizing genomic DNA contamination and other artifacts that cannot be identified using current EST cleaning techniques. Our method uses the alignment of the entire set of ESTs to the human genome to identify highly contaminated EST libraries. We discovered 53 highly contaminated libraries and a subset of 24 766 ESTs from these libraries that probably represent contamination with genomic DNA, pre-mRNA, and ESTs that span non-canonical introns. Although this is only a small fraction of the entire EST dataset, each contaminating sequence could create a spurious transcript prediction. Indeed, in the clustering and assembly tool that we used, these sequences would have caused incorrect inference of 9575 new splice variants and 6370 new genes. Conclusions based on EST analysis, including prediction of alternative splicing, should be re-evaluated in light of these results. Our method, along with the identified set of contaminated sequences, will be essential for applications that depend on large EST datasets.
Affiliation(s)
- Rotem Sorek
- Compugen Ltd, 72 Pinchas Rosen Street, Tel Aviv 69512, Israel.
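The abstract does not spell out the scoring, so the following is only a simplified heuristic in the same spirit, not the published algorithm: among ESTs whose genome alignment overlaps an annotated intron of a multi-exon gene, count per library how many read straight through the intron (as genomic DNA or pre-mRNA would) rather than splicing across it, and flag libraries where that fraction is high. The input format, both thresholds, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def flag_contaminated_libraries(
    ests: Iterable[Tuple[str, bool]],
    max_retained_fraction: float = 0.5,
    min_ests: int = 20,
) -> Dict[str, float]:
    """Flag libraries whose intron-overlapping ESTs mostly read through introns.

    ests: (library_id, retains_intron) for each EST whose genome alignment
          overlaps an annotated intron; retains_intron is True when the EST
          runs continuously through the intron instead of splicing across it.
    Returns {library_id: retained fraction} for libraries with at least
    min_ests such ESTs and a retained fraction above max_retained_fraction.
    """
    total: Dict[str, int] = defaultdict(int)
    retained: Dict[str, int] = defaultdict(int)
    for library, retains_intron in ests:
        total[library] += 1
        retained[library] += int(retains_intron)
    flagged: Dict[str, float] = {}
    for library, n in total.items():
        fraction = retained[library] / n
        # Small libraries are skipped: a few reads give an unreliable fraction.
        if n >= min_ests and fraction > max_retained_fraction:
            flagged[library] = fraction
    return flagged
```

Scoring whole libraries rather than single reads matters because genuine mRNAs can also retain introns; only a library-wide excess points to contamination, which is the level at which the abstract reports its 53 highly contaminated libraries.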
114
Clark T, Lee S, Ridgway Scott L, Wang SM. Computational Analysis of Gene Identification with SAGE. J Comput Biol 2003; 9:513-26. [PMID: 12162890 DOI: 10.1089/106652702760138600] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
SAGE is one of the few techniques capable of uniformly probing gene expression at a genome level irrespective of mRNA abundance and without a priori knowledge of the transcripts present. However, individual SAGE tags can match many sequences in the reference database, complicating gene identification. We perform a baseline evaluation of gene identification with SAGE using UniGene Human as the reference database by analyzing 1) the distributions of tags for tag sets of various lengths formed from UniGene Human and 2) the tag-to-sequence mapping using a SAGE tag set consisting of 37,522 tags derived from human myeloid cells. The extensive multiplicity of the dbEST component of UniGene significantly detracts from gains that might be expected by extending tags within the scope of the SAGE protocol. In order to achieve reasonable sequence specificity for gene identification with the content of the commonly used UniGene sequence collection, tags on the order of hundreds of bases in length are required. One way to produce tags of such lengths is with GLGI, which extends SAGE tags to the 3' end of cDNA. We show that the longer sequences produced by GLGI significantly relieve the multiple-match condition. In the myeloid sample, we also found a correlation between multiple-match severity and high copy number. We extrapolate these findings, providing insights into the use of UniGene Human as a reference for gene identification.
Affiliation(s)
- Terry Clark
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA.
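The multiple-match problem can be seen by extracting tags from a reference set and indexing which sequences share each tag. A minimal Python sketch, assuming sense-strand reference sequences, NlaIII (CATG) as the anchoring enzyme, and a tag defined as the bases immediately 3' of the 3'-most CATG (10 bp in the original SAGE protocol). GLGI-style extension is approximated simply by requesting a longer tag, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

ANCHOR = "CATG"  # NlaIII recognition site, the usual SAGE anchoring enzyme


def sage_tag(seq: str, tag_len: int = 10) -> Optional[str]:
    """Return the tag_len bases immediately 3' of the 3'-most CATG, or None
    if there is no CATG or too few bases follow it."""
    seq = seq.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def tag_index(refs: Dict[str, str], tag_len: int) -> Dict[str, Set[str]]:
    """Map each tag to the set of reference IDs that yield it; a set with more
    than one ID is a multiple match that cannot identify a single gene."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, seq in refs.items():
        tag = sage_tag(seq, tag_len)
        if tag is not None:
            index[tag].add(seq_id)
    return dict(index)


def ambiguous_tags(refs: Dict[str, str], tag_len: int) -> int:
    """Count tags shared by more than one reference sequence."""
    return sum(1 for ids in tag_index(refs, tag_len).values() if len(ids) > 1)
```

For example, two references that share the same 10 bases after their last CATG but differ further downstream collapse to one 10-bp tag yet yield distinct 14-bp tags, which on a small scale mirrors the abstract's point that longer tags relieve multiple matches.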
115
116
McCarter JP, Mitreva MD, Martin J, Dante M, Wylie T, Rao U, Pape D, Bowers Y, Theising B, Murphy CV, Kloek AP, Chiapelli BJ, Clifton SW, Bird DM, Waterston RH. Analysis and functional classification of transcripts from the nematode Meloidogyne incognita. Genome Biol 2003; 4:R26. [PMID: 12702207 PMCID: PMC154577 DOI: 10.1186/gb-2003-4-4-r26] [Citation(s) in RCA: 106] [Impact Index Per Article: 4.8] [Received: 12/10/2002] [Revised: 02/17/2003] [Accepted: 02/28/2003] [Indexed: 11/12/2022]
Abstract
BACKGROUND Plant parasitic nematodes are major pathogens of most crops. Molecular characterization of these species as well as the development of new techniques for control can benefit from genomic approaches. As an entrée to characterizing plant parasitic nematode genomes, we analyzed 5,700 expressed sequence tags (ESTs) from second-stage larvae (L2) of the root-knot nematode Meloidogyne incognita. RESULTS From these, 1,625 EST clusters were formed and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. L2 larvae, which represent the infective stage of the life cycle before plant invasion, express a diverse array of ligand-binding proteins and abundant cytoskeletal proteins. L2 are structurally similar to Caenorhabditis elegans dauer larva, and the presence of transcripts encoding glyoxylate pathway enzymes in the M. incognita clusters suggests that root-knot nematode larvae metabolize lipid stores while in search of a host. Homology to other species was observed in 79% of translated cluster sequences, with the C. elegans genome providing more information than any other source. In addition to identifying putative nematode-specific and Tylenchida-specific genes, sequencing revealed previously uncharacterized horizontal gene transfer candidates in Meloidogyne with high identity to rhizobacterial genes including homologs of nodL acetyltransferase and novel cellulases. CONCLUSIONS With sequencing from plant parasitic nematodes accelerating, the approaches to transcript characterization described here can be applied to more extensive datasets and also provide a foundation for more complex genome analyses.
Affiliation(s)
- James P McCarter
- Genome Sequencing Center, Department of Genetics, Box 8501, Washington University School of Medicine, St. Louis, MO 63108, USA.
117
Osada N, Hida M, Kusuda J, Tanuma R, Hirata M, Suto Y, Hirai M, Terao K, Sugano S, Hashimoto K. Cynomolgus monkey testicular cDNAs for discovery of novel human genes in the human genome sequence. BMC Genomics 2002; 3:36. [PMID: 12498619 PMCID: PMC140308 DOI: 10.1186/1471-2164-3-36] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 10/09/2002] [Accepted: 12/23/2002] [Indexed: 11/10/2022]
Abstract
BACKGROUND In order to contribute to the establishment of a complete map of transcribed regions of the human genome, we constructed a testicular cDNA library for the cynomolgus monkey and attempted to find novel transcripts for identification of their human homologues. RESULT The full-insert sequences of 512 cDNA clones were determined. Ultimately we found 302 non-redundant cDNAs carrying open reading frames of 300 bp or longer. Among them, 89 cDNAs were found not to be annotated previously in the Ensembl human database. After searching against the Ensembl mouse database, we also found that 69 putative coding sequences have no homologous cDNAs in the annotated human and mouse genome sequences in Ensembl. We subsequently designed a DNA microarray including 396 non-redundant cDNAs (with and without open reading frames) to examine the expression of the fully sequenced genes. With the testicular probe and a mixture of probes of 10 other tissues, 316 of 332 effective spots showed intense hybridization signals, and 75 cDNAs were shown to be expressed very highly in the cynomolgus monkey testis but not ubiquitously. CONCLUSIONS In this report, we determined 302 full-insert sequences of cynomolgus monkey cDNAs with open reading frames long enough to discover novel transcripts as human homologues. Among the 302 cDNA sequences, human homologues of 89 cDNAs have not been predicted in the annotated human genome sequence in Ensembl. Additionally, we identified 75 genes predominantly expressed in testis among the fully sequenced clones by using a DNA microarray. Our cDNA clones and analytical results will be valuable resources for future functional genomic studies.
Affiliation(s)
- Naoki Osada
- Division of Genetic Resources, National Institute of Infectious Diseases, 1-23-1 Toyama-cho, Shinjuku-ku, 162-8640, Japan
- Laboratory of Human Evolution, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8562, Japan
- Munetomo Hida
- Department of Genome Structure Analysis, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Jun Kusuda
- Division of Genetic Resources, National Institute of Infectious Diseases, 1-23-1 Toyama-cho, Shinjuku-ku, 162-8640, Japan
- Reiko Tanuma
- Division of Genetic Resources, National Institute of Infectious Diseases, 1-23-1 Toyama-cho, Shinjuku-ku, 162-8640, Japan
- Makoto Hirata
- Division of Genetic Resources, National Institute of Infectious Diseases, 1-23-1 Toyama-cho, Shinjuku-ku, 162-8640, Japan
- Yumiko Suto
- Laboratory of Human Evolution, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8562, Japan
- Momoki Hirai
- Laboratory of Human Evolution, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8562, Japan
- Keiji Terao
- Tsukuba Primate Center For Medical Science, National Institute of Infectious Diseases, Hachimandai-1, Tsukuba-shi, Ibaraki 305-0843, Japan
- Sumio Sugano
- Department of Genome Structure Analysis, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Katsuyuki Hashimoto
- Division of Genetic Resources, National Institute of Infectious Diseases, 1-23-1 Toyama-cho, Shinjuku-ku, 162-8640, Japan
118
Herwig R, Schulz B, Weisshaar B, Hennig S, Steinfath M, Drungowski M, Stahl D, Wruck W, Menze A, O'Brien J, Lehrach H, Radelof U. Construction of a 'unigene' cDNA clone set by oligonucleotide fingerprinting allows access to 25 000 potential sugar beet genes. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2002; 32:845-57. [PMID: 12472698 DOI: 10.1046/j.1365-313x.2002.01457.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 05/23/2023]
Abstract
Access to the complete gene inventory of an organism is crucial to understanding physiological processes like development, differentiation, pathogenesis, or adaptation to the environment. Transcripts from many active genes are present at low copy numbers. Therefore, procedures that rely on random EST sequencing or on normalisation and subtraction methods have to produce massively redundant data to get access to low-abundance genes. Here, we present an improved oligonucleotide fingerprinting (ofp) approach to the genome of sugar beet (Beta vulgaris), a plant for which practically no molecular information has been available. To identify distinct genes and to provide a representative 'unigene' cDNA set for sugar beet, 159 936 cDNA clones were processed utilizing large-scale, high-throughput data generation and analysis methods. Data analysis yielded 30 444 ofp clusters reflecting the number of different genes in the original cDNA sample. A sample of 10 961 cDNA clones, each representing a different cluster, was selected for sequencing. Standard sequence analysis confirmed that 89% of these EST sequences did represent different genes. These results indicate that the full set of 30 444 ofp clusters represents up to 25 000 genes. We conclude that the ofp analysis pipeline is an accurate and effective way to construct large representative 'unigene' sets for any plant of interest with no requirement for prior molecular sequence data.
Affiliation(s)
- Ralf Herwig
- Max-Planck Institute for Molecular Genetics, Ihnestr. 73, D-14195 Berlin, Germany.
119
Karsten SL, Geshwind DH. Gene Expression Analysis Using cDNA Microarrays. ACTA ACUST UNITED AC 2002; Chapter 4:Unit 4.28. [DOI: 10.1002/0471142301.ns0428s20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 12/14/2022]
120
Yao J, Coussens PM, Saama P, Suchyta S, Ernst CW. Generation of expressed sequence tags from a normalized porcine skeletal muscle cDNA library. Anim Biotechnol 2002; 13:211-22. [PMID: 12517075 DOI: 10.1081/abio-120016190] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 11/03/2022]
Abstract
Recent developments in microarray technologies permit scientists to analyze expression of thousands of genes simultaneously in diverse biological systems. In an effort to provide integrated resources for applying microarray technologies to studies of skeletal muscle growth and development in swine, we have constructed a normalized cDNA library from porcine skeletal muscle. The effectiveness of normalization was evaluated by DNA sequencing of clones randomly picked from the library before and after normalization, and also by Southern blot hybridization using probes representing abundant transcripts. Our data suggest that the normalization procedure successfully reduced the highly abundant cDNA species in the normalized library. To date, a total of 782 EST (expressed sequence tag) sequences have been generated from this normalized library (687 ESTs) and the original library (95 ESTs). The sequence information of these ESTs plus their BLAST results has been made available through a web-accessible database (http://nbfgc.msu.edu). Cluster analysis of the data indicates that a total of 742 unique sequences are present in this collection. A BLASTN search of the 742 EST sequences against the public database (dbEST) revealed that 139 had no significant matches (E-value > 10^-15) to porcine ESTs already entered in the database, suggesting that they may be specifically expressed in porcine skeletal muscle. Generation of non-redundant ESTs from this library will allow us to construct cDNA microarrays for identification of gene expression changes that regulate muscle growth and affect meat quality in swine.
Affiliation(s)
- Jianbo Yao
- Department of Animal Science and Center for Animal Functional Genomics, Michigan State University, East Lansing, MI 48824, USA.
121
Møller SG, Chua NH. Chemical regulated production of cDNAs from genomic DNA fragments in plants. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2002; 32:615-22. [PMID: 12445131 DOI: 10.1046/j.1365-313x.2002.01436.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/19/2023]
Abstract
We have developed a new chemically inducible genetic system that allows for the isolation of any cDNA molecule from in vitro generated genomic transgenes in transgenic plants. This system, termed regulated in vivo cDNA generation (RIDE), permits both targeted isolation of individual full-length cDNA molecules and random isolation of any partial or full-length cDNA from in planta genomic libraries. The RIDE system makes use of the 17-beta-estradiol-inducible promoter system linked to intron donor and acceptor sites in a new binary vector configuration. In transgenic Arabidopsis and tobacco plants, we show that the RIDE system can isolate, at high efficiencies (75-85%), low-abundance full-length cDNAs previously unattainable by conventional means. The ability to randomly isolate individual exons and exons spliced together from genomic libraries in planta suggests that this system can be used for the isolation of any cDNA molecule. The RIDE system thus appears to be an efficient and versatile system for the generation of potentially any cDNA molecule. Moreover, the ORF structural data generated will be of value in both verifying and correcting computational ORF predictions in the databases available to the scientific community.
MESH Headings
- Arabidopsis/genetics
- Base Sequence
- Chromosomes, Artificial, Bacterial/genetics
- Cloning, Molecular/methods
- DNA, Complementary/biosynthesis
- DNA, Complementary/genetics
- Estradiol/pharmacology
- Exons/genetics
- Genes, Plant/genetics
- Genome, Plant
- Genomic Library
- Molecular Sequence Data
- Open Reading Frames/genetics
- Plants, Genetically Modified
- Promoter Regions, Genetic/genetics
- RNA, Plant/genetics
- Nicotiana/genetics
Affiliation(s)
- Simon Geir Møller
- Laboratory of Plant Molecular Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021-3699, USA
122
Thompson HGR, Harris JW, Wold BJ, Quake SR, Brody JP. Identification and confirmation of a module of coexpressed genes. Genome Res 2002; 12:1517-22. [PMID: 12368243 PMCID: PMC187523 DOI: 10.1101/gr.418402] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 05/10/2002] [Accepted: 07/31/2002] [Indexed: 11/25/2022]
Abstract
We synthesize a large gene expression data set using dbEST and UniGene. We use guilt-by-association (GBA) to analyze this data set and identify coexpressed genes. One module, or group of genes, was found to be coexpressed mainly in tissue extracted from breast and ovarian cancers, but also in tissue from lung cancers, brain cancers, and bone marrow. This module contains at least six members that are believed to be involved in either transcriptional regulation (PDEF, H2AFO, NUCKS) or the ubiquitin proteasome pathway (PSMD7, SQSTM1, FLJ10111). We confirm these observations of coexpression by real-time RT-PCR analysis of mRNA extracted from four model breast epithelial cell lines.
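The guilt-by-association idea above can be illustrated with a minimal sketch that is not the authors' pipeline. It assumes each gene is summarized as a vector of EST counts per tissue library (as one might tabulate from dbEST/UniGene), scores gene pairs by Pearson correlation, and calls a gene coexpressed with a seed gene when the correlation clears a threshold. The function names, the 0.9 cutoff, and the toy counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / (sd_x * sd_y)


def coexpressed_with(seed, profiles, threshold=0.9):
    """Genes whose profile correlates with the seed gene's at or above threshold."""
    ref = profiles[seed]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != seed and pearson(ref, profile) >= threshold
    )


# Hypothetical EST counts per tissue library (five libraries per gene).
profiles = {
    "gene_A": [12, 9, 1, 0, 2],
    "gene_B": [10, 8, 2, 1, 1],
    "gene_C": [0, 1, 9, 11, 10],
}
print(coexpressed_with("gene_A", profiles))  # ['gene_B']
```

Correlation on raw counts is the simplest choice; a real analysis would likely normalize counts by library size first and test significance rather than rely on a fixed cutoff.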
Affiliation(s)
- H Garrett R Thompson
- Department of Biomedical Engineering, University of California Irvine, Irvine, California 92697, USA
123
Chen J, Sun M, Lee S, Zhou G, Rowley JD, Wang SM. Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc Natl Acad Sci U S A 2002; 99:12257-62. [PMID: 12213963 PMCID: PMC129432 DOI: 10.1073/pnas.192436499] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.3] [Accepted: 07/23/2002] [Indexed: 11/18/2022]
Abstract
The number of genes in the human genome is still a controversial issue. Whereas most of the genes in the human genome are said to have been physically or computationally identified, many short cDNA sequences identified as tags by use of serial analysis of gene expression (SAGE) do not match these genes. By performing experimental verification of more than 1,000 SAGE tags and analyzing 4,285,923 SAGE tags of human origin in the current SAGE database, we examined the nature of the unmatched SAGE tags. Our study shows that most of the unmatched SAGE tags are truly novel SAGE tags that originated from novel transcripts not yet identified in the human genome, including alternatively spliced transcripts from known genes and potential novel genes. Our study indicates that by using novel SAGE tags as probes, we should be able to identify efficiently many novel transcripts/novel genes in the human genome that are difficult to identify by conventional methods.
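As background for the tag-matching analysis above: in classic SAGE, a tag is the short sequence (10 bp in standard SAGE) immediately 3' of the 3'-most NlaIII site (CATG) in a cDNA. The sketch below is an illustration, not the authors' method. It extracts such "virtual" tags from a set of known transcripts and splits observed tags into matched and unmatched ones. The transcript names and sequences are hypothetical, and real tag-to-gene mapping must also handle sequencing errors, alternative polyadenylation, and tags shared by several genes.

```python
ANCHOR = "CATG"  # NlaIII recognition site, the anchoring enzyme in classic SAGE
TAG_LEN = 10     # standard SAGE tag length (LongSAGE uses longer tags)


def virtual_tag(transcript, tag_len=TAG_LEN):
    """Return the tag just 3' of the 3'-most CATG, or None if there is none.

    Simplification: if fewer than tag_len bases follow that CATG, no tag is reported.
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def split_tags(observed_tags, known_transcripts):
    """Split observed tags into those matching a known transcript and the rest."""
    index = {}
    for name, seq in known_transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index.setdefault(tag, []).append(name)
    matched = {t: index[t] for t in observed_tags if t in index}
    unmatched = [t for t in observed_tags if t not in index]
    return matched, unmatched


# Hypothetical transcripts and observed tags.
known = {
    "tx1": "GGCATGAAACCCGGGTTTTT",
    "tx2": "CATG" + "A" * 10 + "CATG" + "G" * 10,
}
matched, unmatched = split_tags(["AAACCCGGGT", "GGGGGGGGGG", "TTTTTTTTTT"], known)
print(matched)    # {'AAACCCGGGT': ['tx1'], 'GGGGGGGGGG': ['tx2']}
print(unmatched)  # ['TTTTTTTTTT']
```

In this toy setting, an unmatched tag is simply one whose sequence does not occur as the virtual tag of any known transcript; the study's point is that many such tags turned out to come from genuinely novel transcripts.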
Affiliation(s)
- Jianjun Chen
- Department of Medicine, University of Chicago, 5841 South Maryland, MC2115, Chicago, IL 60637, USA
124
Diehl F, Beckmann B, Kellner N, Hauser NC, Diehl S, Hoheisel JD. Manufacturing DNA microarrays from unpurified PCR products. Nucleic Acids Res 2002; 30:e79. [PMID: 12177307 PMCID: PMC134252 DOI: 10.1093/nar/gnf078] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
For the production of DNA microarrays from PCR products, purification of the DNA fragments prior to spotting is a major expense in both cost and time. Also, a considerable amount of material is lost during this process, and contamination might occur. Here, a protocol is presented that permits the manufacture of microarrays from unpurified PCR products on aminated surfaces such as glass slides coated with the widely used poly(L-lysine) or aminosilane. The presence of primer molecules in the PCR sample does not increase the non-specific signal upon hybridisation. Overall, signal intensity on arrays made of unpurified PCR products is 94% of the intensity obtained with the respective purified molecules. This slight loss in signal, however, is offset by a reduced variation in the amount of DNA present at the individual spot positions across an array, apart from the considerable savings in time and cost. In addition, a larger number of arrays can be made from one batch of amplification products.
Affiliation(s)
- Frank Diehl
- Functional Genome Analysis, Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 506, 69120 Heidelberg, Germany
125
Gaines PJ, Brandt KS, Eisele AM, Wagner WP, Bozic CM, Wisnewski N. Analysis of expressed sequence tags from subtracted and unsubtracted Ctenocephalides felis hindgut and Malpighian tubule cDNA libraries. INSECT MOLECULAR BIOLOGY 2002; 11:299-306. [PMID: 12144694 DOI: 10.1046/j.1365-2583.2002.00337.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 05/23/2023]
Abstract
Insect hindgut and Malpighian tubule (HMT) tissues regulate the contents of the haemolymph through the excretion of waste products and the specific reabsorption of nutrients. As such, they perform a role that is essential for survival and may contain molecular targets for insect control strategies. In order to discover genes expressed in the HMT tissues of the cat flea, Ctenocephalides felis, expressed sequence tags (ESTs) were generated from an unsubtracted HMT cDNA library and from a subtracted HMT cDNA library that had been enriched for HMT-specific cDNAs. A total of 4844 ESTs were analysed from both libraries: 3657 from the subtracted library and 1187 from the unsubtracted library. Of the 1418 distinct ESTs identified from both libraries, 953 had significant similarity to other sequences reported in the GenBank database. A comparison of the results from the two libraries confirmed that the percentages of genes likely to be involved with metabolism, cell structure, and digestion were reduced by the subtraction procedure, whereas genes likely to be involved with ion transport were enriched. Analysis of the prevalence of three individual cDNAs in each library revealed that the actin cDNA was reduced in the subtracted library whereas the cDNAs encoding allantoinase and a peritrophin-like protein were greatly enriched in the subtracted library. Northern blot analysis demonstrated that the actin cDNA was expressed in both the HMT and carcass tissues, whereas the allantoinase and peritrophin-like cDNAs were detected exclusively in the HMT tissues. In total, 97 distinct ESTs that appear to encode proteins involved with ion transport were analysed. Some of these proteins may be directly involved with diuresis or the specific reabsorption of salts and nutrients, and thus may be potential molecular targets for flea control strategies.
Affiliation(s)
- P J Gaines
- Heska Corporation, 1613 Prospect Parkway, Fort Collins, CO 80525, USA.
126
Stapleton M, Liao G, Brokstein P, Hong L, Carninci P, Shiraki T, Hayashizaki Y, Champe M, Pacleb J, Wan K, Yu C, Carlson J, George R, Celniker S, Rubin GM. The Drosophila gene collection: identification of putative full-length cDNAs for 70% of D. melanogaster genes. Genome Res 2002; 12:1294-300. [PMID: 12176937 PMCID: PMC186637 DOI: 10.1101/gr.269102] [Citation(s) in RCA: 170] [Impact Index Per Article: 7.4] [Indexed: 11/25/2022]
Abstract
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5' expressed sequence tags (ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ~40% of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5' ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22-h embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70% of the predicted genes in Drosophila.
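The genome-anchored clustering step described above can be sketched as single-linkage merging of overlapping alignment intervals, followed by a check for clusters that overlap annotated genes still lacking a clone. This is a hypothetical illustration, not the DGC pipeline: it assumes each EST has already been aligned to a single genomic interval (chromosome, 0-based half-open start/end) and it ignores strand, splicing, and multiple hits. All identifiers and coordinates are made up.

```python
from collections import defaultdict


def cluster_alignments(alignments):
    """Merge overlapping EST alignments into clusters per chromosome.

    alignments: iterable of (est_id, chrom, start, end), 0-based half-open.
    Returns a list of (chrom, start, end, [est_ids]).
    """
    by_chrom = defaultdict(list)
    for est_id, chrom, start, end in alignments:
        by_chrom[chrom].append((start, end, est_id))
    clusters = []
    for chrom in sorted(by_chrom):
        ids, c_start, c_end = [], 0, 0
        for start, end, est_id in sorted(by_chrom[chrom]):
            if ids and start < c_end:  # overlaps the current cluster: extend it
                ids.append(est_id)
                c_end = max(c_end, end)
            else:  # gap: close the current cluster and open a new one
                if ids:
                    clusters.append((chrom, c_start, c_end, ids))
                ids, c_start, c_end = [est_id], start, end
        if ids:
            clusters.append((chrom, c_start, c_end, ids))
    return clusters


def clusters_for_uncloned_genes(clusters, genes, already_cloned):
    """Map each gene without a clone to the ESTs in clusters overlapping it.

    genes: dict gene_id -> (chrom, start, end); already_cloned: set of gene_ids.
    """
    hits = {}
    for chrom, c_start, c_end, ids in clusters:
        for gene_id, (g_chrom, g_start, g_end) in genes.items():
            if gene_id in already_cloned or g_chrom != chrom:
                continue
            if c_start < g_end and g_start < c_end:  # half-open overlap test
                hits.setdefault(gene_id, []).extend(ids)
    return hits


# Hypothetical alignments and gene models.
alns = [("e1", "2L", 100, 300), ("e2", "2L", 250, 400),
        ("e3", "2L", 1000, 1200), ("e4", "3R", 50, 80)]
genes = {"geneA": ("2L", 350, 600), "geneB": ("2L", 1100, 1500),
         "geneC": ("3R", 0, 40)}
clusters = cluster_alignments(alns)
print(clusters_for_uncloned_genes(clusters, genes, already_cloned={"geneB"}))
# {'geneA': ['e1', 'e2']}
```

The nested loop over clusters and genes is quadratic; for a whole genome an interval tree or a sorted sweep would be the natural replacement.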
Affiliation(s)
- Mark Stapleton
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
127
Stanley JS, Mock DM, Griffin JB, Zempleni J. Biotin uptake into human peripheral blood mononuclear cells increases early in the cell cycle, increasing carboxylase activities. J Nutr 2002; 132:1854-9. [PMID: 12097659 PMCID: PMC1435359 DOI: 10.1093/jn/132.7.1854] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
Cells respond to proliferation with increased accumulation of biotin, suggesting that proliferation enhances biotin demand. Here we determined whether peripheral blood mononuclear cells (PBMC) increase biotin uptake at specific phases of the cell cycle, and whether biotin is utilized to increase biotinylation of carboxylases. Biotin uptake was quantified in human PBMC that were arrested chemically at specific phases of the cell cycle. Biotin uptake increased in the G1 phase of the cycle [658 ± 574 amol biotin/(10^6 cells × 30 min)] and remained increased during phases S, G2, and M compared with quiescent controls [200 ± 62 amol biotin/(10^6 cells × 30 min)]. The abundance of the sodium-dependent multivitamin transporter (SMVT, which transports biotin) was similar at all phases of the cell cycle, suggesting that transporters other than SMVT or splicing variants of SMVT may account for the increased biotin uptake observed in proliferating cells. Activities of biotin-dependent 3-methylcrotonyl-CoA carboxylase and propionyl-CoA carboxylase were up to two times greater in proliferating PBMC compared with controls. The abundance of mRNA encoding 3-methylcrotonyl-CoA carboxylase and propionyl-CoA carboxylase paralleled carboxylase activities, suggesting that PBMC respond to proliferation with increased expression of genes encoding carboxylases. Similarly, expression of the gene encoding holocarboxylase synthetase (which catalyzes binding of biotin to carboxylases) increased in response to proliferation, suggesting that cellular capacity to biotinylate carboxylases was increased. In summary, these findings suggest that PBMC respond to proliferation with increased biotin uptake early in the cell cycle, and that biotin is utilized to increase activities of two of the four biotin-requiring carboxylases.
Affiliation(s)
- Donald M. Mock
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR
- Janos Zempleni
- Departments of Nutritional Science and Dietetics, and Biochemistry, University of Nebraska at Lincoln, Lincoln, NE
- To whom correspondence and reprint requests should be addressed. E-mail:
128
129
Kessler MM, Willins DA, Zeng Q, Del Mastro RG, Cook R, Doucette-Stamm L, Lee H, Caron A, McClanahan TK, Wang L, Greene J, Hare RS, Cottarel G, Shimer GH. The use of direct cDNA selection to rapidly and effectively identify genes in the fungus Aspergillus fumigatus. Fungal Genet Biol 2002; 36:59-70. [PMID: 12051895 DOI: 10.1016/s1087-1845(02)00002-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Aspergillus fumigatus is one of the causes of invasive lung disease in immunocompromised individuals. To rapidly identify genes in this fungus, including potential targets for chemotherapy, diagnostics, and vaccine development, we constructed cDNA libraries. We began with non-normalized libraries, then to improve this approach we constructed a normalized cDNA library using direct cDNA selection. Normalization resulted in a reduction of the frequency of clones with highly expressed genes and an enrichment of underrepresented cDNAs. Expressed sequence tags generated from both the original and the normalized libraries were compared with the genomes of Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans, indicating that a large proportion of A. fumigatus genes do not have orthologs in these fungal species. This method allowed the expeditious identification of genes in a fungal pathogen. The same approach can be applied to other human or plant pathogens to rapidly identify genes without the need for genomic sequence information.
130
Shevchenko Y, Bouffard GG, Butterfield YSN, Blakesley RW, Hartley JL, Young AC, Marra MA, Jones SJM, Touchman JW, Green ED. Systematic sequencing of cDNA clones using the transposon Tn5. Nucleic Acids Res 2002; 30:2469-77. [PMID: 12034835 PMCID: PMC117195 DOI: 10.1093/nar/30.11.2469] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
In parallel with the production of genomic sequence data, attention is being focused on the generation of comprehensive cDNA-sequence resources. Such efforts are increasingly emphasizing the production of high-accuracy sequence corresponding to the entire insert of cDNA clones, especially those presumed to reflect the full-length mRNA. The complete sequencing of cDNA clones on a large scale presents unique challenges because of the generally small, yet heterogeneous, sizes of the cloned inserts. We have developed a strategy for high-throughput sequencing of cDNA clones using the transposon Tn5. This approach has been tailored for implementation within an existing large-scale 'shotgun-style' sequencing program, although it could be readily adapted for use in virtually any sequencing environment. In addition, we have developed a modified version of our strategy that can be applied to cDNA clones with large cloning vectors, thereby overcoming a potential limitation of transposon-based approaches. Here we describe the details of our cDNA-sequencing pipeline, including a summary of the experience in sequencing more than 4200 cDNA clones to produce more than 8 million base pairs of high-accuracy cDNA sequence. These data provide both convincing evidence that the insertion of Tn5 into cDNA clones is sufficiently random for its effective use in large-scale cDNA sequencing and interesting insight into the sequence context preferred for insertion by Tn5.
Affiliation(s)
- Yuriy Shevchenko
- NIH Intramural Sequencing Center, National Institutes of Health, Gaithersburg, MD 20877, USA
131
Nigumann P, Redik K, Mätlik K, Speek M. Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics 2002; 79:628-34. [PMID: 11991712 DOI: 10.1006/geno.2002.6758] [Citation(s) in RCA: 174] [Impact Index Per Article: 7.6] [Indexed: 12/31/2022]
Abstract
The human L1 retrotransposon has two transcription-regulatory regions: an internal or sense promoter driving transcription of the full-length L1, and an antisense promoter (ASP) driving transcription in the opposite direction into adjacent cellular sequences, yielding chimeric transcripts. Both promoters are located in the 5'-untranslated region (5'-UTR) of L1. Chimeric transcripts derived from the L1 ASP are highly represented in expressed-sequence tag (EST) databases. Using a bioinformatics approach, we have characterized 10 chimeric ESTs (cESTs) derived from the EST division of GenBank. These cESTs contained 3' regions similar or identical to known cellular mRNA sequences. They were accurately spliced and preferentially expressed in tumor cell lines. Analysis of hundreds of cESTs suggests that L1 ASP-driven transcription is a common phenomenon not only in tumor cells but also in normal ones, and may involve transcriptional interference or epigenetic control of different cellular genes.
Affiliation(s)
- Pilvi Nigumann
- Center for Gene Technology, Tallinn Technical University and National Institute of Chemical Physics and Biophysics, Tallinn EE12618, Estonia
132
Nam DK, Lee S, Zhou G, Cao X, Wang C, Clark T, Chen J, Rowley JD, Wang SM. Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription. Proc Natl Acad Sci U S A 2002; 99:6152-6. [PMID: 11972056 PMCID: PMC122918 DOI: 10.1073/pnas.092140899] [Citation(s) in RCA: 130] [Impact Index Per Article: 5.7] [Accepted: 03/11/2002] [Indexed: 11/18/2022]
Abstract
We have analyzed a systematic flaw in the current system of gene identification: the oligo(dT) primer widely used for cDNA synthesis generates a high frequency of truncated cDNAs through internal poly(A) priming. Such truncated cDNAs may contribute to 12% of the expressed sequence tags in the current dbEST database. By using a synthetic transcript and real mRNA templates as models, we characterized the patterns of internal poly(A) priming by oligo(dT) primer. We further demonstrated that the internal poly(A) priming can be effectively diminished by replacing the oligo(dT) primer with a set of anchored oligo(dT) primers for reverse transcription. Our study indicates that cDNAs designed for genomewide gene identification should be synthesized by use of the anchored oligo(dT) primers, rather than the oligo(dT) primers, to diminish the generation of truncated cDNAs caused by internal poly(A) priming.
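Internal poly(A) priming can be screened for computationally. The minimal sketch below is an illustration rather than the authors' procedure: it strips the terminal poly(A) tail from an mRNA sequence and reports regions elsewhere in the sequence that are A-rich enough to plausibly anneal an oligo(dT) primer. The window size of 10 nt and the threshold of 8 adenines are hypothetical parameters chosen for illustration, not values from the study.

```python
def internal_priming_sites(mrna, window=10, min_a=8):
    """Report A-rich regions upstream of the 3' poly(A) tail.

    Returns merged (start, end) regions, 0-based half-open, each the union of
    overlapping `window`-nt windows that contain at least `min_a` adenines.
    """
    seq = mrna.upper()
    body = seq.rstrip("A")  # drop the genuine 3' poly(A) tail
    regions = []
    for i in range(len(body) - window + 1):
        if body[i:i + window].count("A") >= min_a:
            if regions and i <= regions[-1][1]:
                regions[-1][1] = i + window  # extend an overlapping region
            else:
                regions.append([i, i + window])
    return [tuple(r) for r in regions]


# Hypothetical mRNA: an internal A-run, then non-A sequence, then the real tail.
mrna = "GCGC" + "A" * 10 + "GCGCGCGC" + "A" * 20
print(internal_priming_sites(mrna))  # [(2, 16)]
```

A region reported here marks where plain oligo(dT) could anneal internally and yield a truncated cDNA; the anchored oligo(dT) primers recommended in the abstract reduce this by requiring the primer's 3' end to sit at the junction between the poly(A) tail and the upstream sequence.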
Affiliation(s)
- Douglas Kyung Nam
- Department of Medicine, Center for Functional Genomics, University of Chicago, 5841 South Maryland Avenue, MC2115, Chicago, IL 60637, USA
133
Shoemaker R, Keim P, Vodkin L, Retzel E, Clifton SW, Waterston R, Smoller D, Coryell V, Khanna A, Erpelding J, Gai X, Brendel V, Raph-Schmidt C, Shoop EG, Vielweber CJ, Schmatz M, Pape D, Bowers Y, Theising B, Martin J, Dante M, Wylie T, Granger C. A compilation of soybean ESTs: generation and analysis. Genome 2002; 45:329-38. [PMID: 11962630 DOI: 10.1139/g01-150] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.0] [Indexed: 11/22/2022]
Abstract
Whole-genome sequencing is fundamental to understanding the genetic composition of an organism. Given the size and complexity of the soybean genome, an alternative approach is targeted random-gene sequencing, which provides an immediate and productive method of gene discovery. In this study, more than 120000 soybean expressed sequence tags (ESTs) generated from more than 50 cDNA libraries were evaluated. These ESTs coalesced into 16928 contigs and 17336 singletons. On average, each contig was composed of 6 ESTs and spanned 788 bases. The average sequence length submitted to dbEST was 414 bases. Using only those libraries generating more than 800 ESTs each and only those contigs with 10 or more ESTs each, correlated patterns of gene expression among libraries and genes were discerned. Two-dimensional qualitative representations of contig and library similarities were generated based on expression profiles. Genes with similar expression patterns and, potentially, similar functions were identified. These studies provide a rich source of publicly available gene sequences as well as valuable insight into the structure, function, and evolution of a model crop legume genome.
Affiliation(s)
- Randy Shoemaker
- USDA-ARS, Corn Insect and Crop Genetics Research Unit, and Department of Agronomy, Iowa State University, Ames 50011, USA.
134
Eley GD, Reiter JL, Pandita A, Park S, Jenkins RB, Maihle NJ, James CD. A chromosomal region 7p11.2 transcript map: its development and application to the study of EGFR amplicons in glioblastoma. Neuro Oncol 2002; 4:86-94. [PMID: 11916499 PMCID: PMC1920657 DOI: 10.1093/neuonc/4.2.86] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Received: 10/11/2001] [Accepted: 01/02/2002] [Indexed: 11/12/2022]
Abstract
Cumulative information available about the organization of amplified chromosomal regions in human tumors suggests that the amplification repeat units, or amplicons, can be of a simple or complex nature. For the former, amplified regions generally retain their native chromosomal configuration and involve a single amplification target sequence. For complex amplicons, amplified DNAs usually undergo substantial reorganization relative to the normal chromosomal regions from which they evolve, and the regions subject to amplification may contain multiple target sequences. Previous efforts to characterize the 7p11.2 epidermal growth factor receptor (EGFR) amplicon in glioblastoma have relied primarily on the use of markers positioned by linkage analysis and/or radiation hybrid mapping, both of which are known to be potentially inaccurate when ordering loci over relatively short (<1 Mb) chromosomal regions. Due to the limited resolution of genetic maps established through these approaches, we have constructed a 2-Mb bacterial and P1-derived artificial chromosome (BAC-PAC) contig for the EGFR region and have applied markers positioned on its associated physical map to the analysis of 7p11.2 amplifications in a series of glioblastomas. Our data indicate that EGFR is the sole amplification target within the mapped region, although there are several additional 7p11.2 genes that can be coamplified and overexpressed with EGFR. Furthermore, these results are consistent with EGFR amplicons retaining the same organization as the native chromosome 7p11.2 region from which they are derived.
Affiliation(s)
- Greg D Eley
- Department of Laboratory Medicine and Pathology and Tumor Biology Program, Mayo Clinic, Rochester, MN 55905, USA
135
|
Muggleton SH, Bryant CH, Srinivasan A, Whittaker A, Topp S, Rawlings C. Are grammatical representations useful for learning from biological sequence data?--a case study. J Comput Biol 2002; 8:493-521. [PMID: 11694180 DOI: 10.1089/106652701753216512] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, five of the co-authors of this paper have extensive expertise in NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost-savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar, but without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples. A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features.
Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
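The point about predictive accuracy can be illustrated with a small hypothetical example. The counts below are invented for illustration and are not from the study; the sketch computes plain accuracy and recall for three classifiers on a test set where positives are rare. It does not implement the authors' Relative Advantage (RA) function, whose definition is not given in this abstract.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical test set: 10 rare positives (NPPs) and 9,990 abundant negatives.
POSITIVES, NEGATIVES = 10, 9_990

# Model A recovers 8 of the 10 positives, with 20 false positives.
acc_a = accuracy(tp=8, fp=20, tn=NEGATIVES - 20, fn=POSITIVES - 8)
# Model B recovers only 2 positives, also with 20 false positives.
acc_b = accuracy(tp=2, fp=20, tn=NEGATIVES - 20, fn=POSITIVES - 2)
# Trivial model C labels every protein as negative.
acc_c = accuracy(tp=0, fp=0, tn=NEGATIVES, fn=POSITIVES)

# Recall (sensitivity) separates the models where accuracy barely does.
recall_a, recall_b, recall_c = 8 / POSITIVES, 2 / POSITIVES, 0 / POSITIVES

print(f"A: accuracy={acc_a:.4f} recall={recall_a:.1f}")
print(f"B: accuracy={acc_b:.4f} recall={recall_b:.1f}")
print(f"C: accuracy={acc_c:.4f} recall={recall_c:.1f}")
```

All three models score above 99.7% accuracy, and the trivial model C actually scores highest (0.9990) although it finds no positives, which is the kind of failure to discriminate that motivates a cost-based measure.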
Affiliation(s)
- S H Muggleton
- Department of Computer Science, University of York, York YO10 5DD, United Kingdom
136
Warner EE, Dieckgraefe BK. Application of genome-wide gene expression profiling by high-density DNA arrays to the treatment and study of inflammatory bowel disease. Inflamm Bowel Dis 2002; 8:140-57. [PMID: 11854614 DOI: 10.1097/00054725-200203000-00012] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]
Abstract
Identification of factors involved in the initiation, amplification, and perpetuation of the chronic immune response and the identification of markers for the characterization of patient subgroups remain critical objectives for ongoing research in inflammatory bowel disease (IBD). The Human Genome Project and the development of the expressed sequence tag (EST) clone collection and database have made possible a new revolution in gene expression analysis. Instead of measuring one or a few genes, parallel DNA microarrays can simultaneously measure the expression of thousands of genes, providing a glimpse into the logic and functional grouping of gene programs encoded by our genome. Applied to clinical specimens from affected and normal individuals, this methodology has the potential to provide a new level of information about disease pathogenesis not previously possible. Two dominant platforms for the construction of high-density microarrays have emerged: cDNA arrays and GeneChips. The first involves robotic spotting of DNA molecules, often derived from EST clone collections, onto a suitable solid phase matrix such as a glass slide. The second involves direct in situ synthesis of sets of gene-specific oligonucleotides on a silicon wafer by an elegant derivative of the photolithography process. Both cDNA and oligonucleotide arrays are interrogated by hybridization with a fluorescently labeled cDNA or cRNA representation of the original tissue mRNA. This enables measurement of the expression levels for thousands of mucosal genes in a single experiment. These technologies have recently become less expensive and more widely accessible to all researchers. This review details the principles and methods behind DNA array technology, data analysis and mining, and potential application to research and treatment of IBD.
Affiliation(s)
- Elaine E Warner
- Division of Gastroenterology, Washington University School of Medicine, 660 S. Euclid Ave., St. Louis, MO 63110, U.S.A
137
Scott HS, Chrast R. Global transcript expression profiling by Serial Analysis of Gene Expression (SAGE). Genetic Engineering 2002; 23:201-19. [PMID: 11570104 DOI: 10.1007/0-306-47572-3_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 02/21/2023]
Affiliation(s)
- H S Scott
- Genetics and Bioinformatics Division, Walter and Eliza Hall Institute, Royal Parade, Parkville, P.O. Royal Melbourne Hospital, Victoria 3050, Australia.
138
Iribar MP, Cruz AK. Base compositional bias in trans-spliced sequences of unknown function in Leishmania major. Exp Parasitol 2002; 100:1-5. [PMID: 11971647 DOI: 10.1006/expr.2001.4671] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Affiliation(s)
- M Pilar Iribar
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes, 3900, 14040-904 Ribeirão Preto, SP, Brazil
139
Osada N, Hida M, Kusuda J, Tanuma R, Hirata M, Hirai M, Terao K, Suzuki Y, Sugano S, Hashimoto K. Prediction of unidentified human genes on the basis of sequence similarity to novel cDNAs from cynomolgus monkey brain. Genome Biol 2002; 3:RESEARCH0006. [PMID: 11806829 PMCID: PMC150453 DOI: 10.1186/gb-2001-3-1-research0006] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 09/14/2001] [Revised: 10/22/2001] [Accepted: 11/07/2001] [Indexed: 12/21/2022]
Abstract
BACKGROUND The complete assignment of the protein-coding regions of the human genome is a major challenge for genome biology today. We have already isolated many hitherto unknown full-length cDNAs as orthologs of unidentified human genes from cDNA libraries of the cynomolgus monkey (Macaca fascicularis) brain (parietal lobe and cerebellum). In this study, we used cDNA libraries of three other parts of the brain (frontal lobe, temporal lobe and medulla oblongata) to isolate novel full-length cDNAs. RESULTS The entire sequences of novel cDNAs of the cynomolgus monkey were determined, and the orthologous human cDNA sequences were predicted from the human genome sequence. We predicted 29 novel human genes with putative coding regions sharing an open reading frame with the cynomolgus monkey, and we confirmed the expression of 21 pairs of genes by the reverse transcription-coupled polymerase chain reaction method. The hypothetical proteins were also functionally annotated by computer analysis. CONCLUSIONS The 29 new genes had not been discovered in recent explorations for novel genes in humans, and the ab initio method failed to predict all exons. Thus, monkey cDNA is a valuable resource for the preparation of a complete human gene catalog, which will facilitate post-genomic studies.
Affiliation(s)
- Naoki Osada
- Division of Genetic Resources, National Institute of Infectious Diseases, 1-23-1 Toyama-cho, Shinjuku-ku, 162-8640, Japan.
140
Jia L, Young MF, Powell J, Yang L, Ho NC, Hotchkiss R, Robey PG, Francomano CA. Gene expression profile of human bone marrow stromal cells: high-throughput expressed sequence tag sequencing analysis. Genomics 2002; 79:7-17. [PMID: 11827452 DOI: 10.1006/geno.2001.6683] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from a HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. Analysis of the EST data revealed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also identified ESTs with close homology to known genes, suggesting that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.
Affiliation(s)
- Libin Jia
- Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
141
Skrabanek L, Campagne F. TissueInfo: high-throughput identification of tissue expression profiles and specificity. Nucleic Acids Res 2001; 29:E102-2. [PMID: 11691939 PMCID: PMC60201 DOI: 10.1093/nar/29.21.e102] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.8] [Received: 05/24/2001] [Revised: 08/15/2001] [Accepted: 09/03/2001] [Indexed: 11/14/2022]
Abstract
We describe TissueInfo, a knowledge-based method for the high-throughput identification of tissue expression profiles and tissue specificity. TissueInfo defines a set of tissue information calculations that can be computed for large numbers of genes, expressed sequence tags (ESTs) or proteins. Tissue information records that result from the TissueInfo calculations are used to generate tables suitable for data mining and for the selection of genes according to a given expression profile or specificity. When benchmarked against a test set of 116 proteins and literature information, TissueInfo was found to be accurate for 69% of identified tissue specificities and for 80% of expression profiles. The accuracy of the identifications can be increased if query sequences for which little information is available from dbEST are ignored. Thus, with 80% coverage, TissueInfo achieves an accuracy of 76% for specificity and 89% for expression. For the same set of proteins, the curated tissue specificity offered in SWISS-PROT was accurate in 78% of cases. TissueInfo can be useful for the selection of clones for custom microarrays, selection of training sets for ab initio identification of tissue information, gene discovery and genome-wide predictions. Further information about the program can be found at http://icb.mssm.edu/tissueinfo.
Affiliation(s)
- L Skrabanek
- Institute for Computational Biomedicine and Department of Physiology and Biophysics, Mount Sinai School of Medicine, Box 1218, 1 Gustave L. Levy Place, New York, NY 10029, USA
142
Ponger L, Duret L, Mouchiroud D. Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res 2001; 11:1854-60. [PMID: 11691850 PMCID: PMC311164 DOI: 10.1101/gr.174501] [Citation(s) in RCA: 87] [Impact Index Per Article: 3.6] [Indexed: 01/14/2023]
Abstract
In an attempt to understand the origin of CpG islands (CGIs) in mammalian genomes, we have studied their location and structure according to the expression pattern of genes and to the G + C content of isochores in which they are embedded. We show that CGIs located over the transcription start site (named start CGIs) are very different structurally from the others (named no-start CGIs): (1) 61.6% of the no-start CGIs are due to repeated sequences (79% are due to Alus), whereas only 5.6% of the start CGIs are due to such repeats; (2) start CGIs are longer and display a higher CpG o/e ratio and G + C level than no-start CGIs. The frequency of tissue-specific genes associated with a start CGI varies according to the genomic G + C content, from 25% in G + C-poor isochores to 64% in G + C-rich isochores. Conversely, the frequency of housekeeping genes associated with a start CGI (90%) is independent of the isochore context. Interestingly, the structure of start CGIs is very similar for tissue-specific and housekeeping genes. Moreover, 93% of genes expressed in early embryo are found to exhibit a CpG island over their transcription start point. These observations are consistent with the hypothesis that the occurrence of these CGIs is the consequence of gene expression at this stage, when the methylation pattern is installed.
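The CpG o/e (observed/expected) ratio and G + C level referred to above are commonly computed with the Gardiner-Garden and Frommer formula, CpG count × length / (C count × G count). The sketch below assumes that formula; the abstract does not state the exact definition or window size used by Ponger et al., and the function names and example sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, but "CG" cannot overlap itself,
    # so this counts every CpG dinucleotide.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)

# Hypothetical example sequence, not taken from the study.
example = "ACGTTCGAGCGC"
print(f"G+C = {gc_content(example):.2f}, CpG o/e = {cpg_obs_exp(example):.2f}")
```

Gardiner-Garden and Frommer's original island criteria applied these measures to windows of at least 200 bp, requiring G + C above 50% and CpG o/e above 0.6; which thresholds this study used is not stated in the abstract.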
Affiliation(s)
- L Ponger
- Laboratoire de Biométrie et Biologie Evolutive, Unité Mixte de Recherche Centre National de la Recherche Scientifique 5558-Université Claude Bernard, 69622 Villeurbanne Cedex, France.
143
Rauyaree P, Choi W, Fang E, Blackmon B, Dean RA. Genes expressed during early stages of rice infection with the rice blast fungus Magnaporthe grisea. Molecular Plant Pathology 2001; 2:347-54. [PMID: 20573024 DOI: 10.1046/j.1464-6722.2001.00085.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 05/09/2023]
Abstract
A system-wide approach was adopted to further elucidate mechanisms regulating disease outcome between rice and the fungal pathogen Magnaporthe grisea. First, a cDNA library was constructed from M. grisea infected rice at 48 h post-inoculation. The 5' end-sequencing of 619 randomly selected clones revealed 359 expressed sequence tags (ESTs) that had not previously been described. A total of 124 of the 260 ESTs with high and moderate similarity scores, based on BlastX, were organized into categories according to their putative function. The largest category of sequences (21%) contained stress or defence response genes. Eleven per cent of identified ESTs were redundant. In a second approach, differential hybridization analysis of the cDNA library using high-density filters resulted in the identification of novel genes and previously characterized M. grisea genes, including several that had previously been implicated in the infection process. A survey of up-regulated cDNA clones revealed clone 29003, which corresponded to the rice peroxidase POX22.3. This gene is known to be expressed in rice upon infection with Xanthomonas oryzae pv. oryzae, the bacterial blight pathogen. Importantly, this approach demonstrates the utility of gene discovery, through ESTs, for revealing novel genes in addition to those previously characterized as being potentially implicated in host-pathogen interactions.
Affiliation(s)
- P Rauyaree
- Department of Plant Pathology and Physiology, Clemson University, Clemson, SC 29634, USA
144
Ehrt S, Schnappinger D, Bekiranov S, Drenkow J, Shi S, Gingeras TR, Gaasterland T, Schoolnik G, Nathan C. Reprogramming of the macrophage transcriptome in response to interferon-gamma and Mycobacterium tuberculosis: signaling roles of nitric oxide synthase-2 and phagocyte oxidase. J Exp Med 2001; 194:1123-40. [PMID: 11602641 PMCID: PMC2193509 DOI: 10.1084/jem.194.8.1123] [Citation(s) in RCA: 362] [Impact Index Per Article: 15.1] [Received: 07/09/2001] [Accepted: 09/14/2001] [Indexed: 01/18/2023]
Abstract
Macrophage activation determines the outcome of infection by Mycobacterium tuberculosis (Mtb). Interferon-gamma (IFN-gamma) activates macrophages by driving Janus tyrosine kinase (JAK)/signal transducer and activator of transcription-dependent induction of transcription and PKR-dependent suppression of translation. Microarray-based experiments reported here enlarge this picture. Exposure to IFN-gamma and/or Mtb led to altered expression of 25% of the monitored genome in macrophages. The number of genes suppressed by IFN-gamma exceeded the number of genes induced, and much of the suppression was transcriptional. Five times as many genes related to immunity and inflammation were induced as were suppressed. Mtb mimicked or synergized with IFN-gamma more often than it antagonized its actions. Phagocytosis of nonviable Mtb or polystyrene beads affected many genes, but the transcriptional signature of macrophages infected with viable Mtb was distinct. Studies involving macrophages deficient in inducible nitric oxide synthase and/or phagocyte oxidase revealed that these two antimicrobial enzymes help orchestrate the profound transcriptional remodeling that underlies macrophage activation.
Affiliation(s)
- Sabine Ehrt
- Department of Microbiology and Immunology, Weill Medical College of Cornell University
- Dirk Schnappinger
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Stefan Bekiranov
- Laboratory of Computational Genomics, The Rockefeller University
- Shuangping Shi
- Department of Microbiology and Immunology, Weill Medical College of Cornell University
- Immunology Program, Weill Graduate School of Medical Sciences of Cornell University, New York, NY 10021
- Gary Schoolnik
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305
- Carl Nathan
- Department of Microbiology and Immunology, Weill Medical College of Cornell University
- Immunology Program, Weill Graduate School of Medical Sciences of Cornell University, New York, NY 10021
145
Abstract
In studies of both short and relatively long human genomic DNA, we found a clustering of the consensus site for the transcription factor GCF at the 5' boundary of a subset of human genes. In studies of promoter regions with a known transcription initiation site, the cluster of consensus GCF sites appeared near the transcription initiation site, and in some sequences it extended into the transcribed region defining the leader mRNA. We also found a detectable correlation between the 5' boundary of human genes and recognition motifs for other transcription factors that bind to GC-rich sequences. In these cases, however, the correlation was not as general as the correlation observed for the consensus GCF site.
Affiliation(s)
- M Bina
- Department of Chemistry, Purdue University, W. Lafayette, IN 47907-1393, USA.
146
Camargo AA, Samaia HP, Dias-Neto E, Simão DF, Migotto IA, Briones MR, Costa FF, Nagai MA, Verjovski-Almeida S, Zago MA, Andrade LE, Carrer H, El-Dorry HF, Espreafico EM, Habr-Gama A, Giannella-Neto D, Goldman GH, Gruber A, Hackel C, Kimura ET, Maciel RM, Marie SK, Martins EA, Nobrega MP, Paco-Larson ML, Pardini MI, Pereira GG, Pesquero JB, Rodrigues V, Rogatto SR, da Silva ID, Sogayar MC, Sonati MF, Tajara EH, Valentini SR, Alberto FL, Amaral ME, Aneas I, Arnaldi LA, de Assis AM, Bengtson MH, Bergamo NA, Bombonato V, de Camargo ME, Canevari RA, Carraro DM, Cerutti JM, Correa ML, Correa RF, Costa MC, Curcio C, Hokama PO, Ferreira AJ, Furuzawa GK, Gushiken T, Ho PL, Kimura E, Krieger JE, Leite LC, Majumder P, Marins M, Marques ER, Melo AS, Melo MB, Mestriner CA, Miracca EC, Miranda DC, Nascimento AL, Nobrega FG, Ojopi EP, Pandolfi JR, Pessoa LG, Prevedel AC, Rahal P, Rainho CA, Reis EM, Ribeiro ML, da Ros N, de Sa RG, Sales MM, Sant'anna SC, dos Santos ML, da Silva AM, da Silva NP, Silva WA, da Silveira RA, Sousa JF, Stecconi D, Tsukumo F, Valente V, Soares F, Moreira ES, Nunes DN, Correa RG, Zalcberg H, Carvalho AF, Reis LF, Brentani RR, Simpson AJ, de Souza SJ, Melo M. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome. Proc Natl Acad Sci U S A 2001; 98:12103-8. [PMID: 11593022 PMCID: PMC59775 DOI: 10.1073/pnas.201182798] [Citation(s) in RCA: 93] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Affiliation(s)
- A A Camargo
- Ludwig Institute for Cancer Research, 01509-010, São Paulo, Brazil
147
Abstract
The Cancer Genome Anatomy Project (CGAP) has built informational, technological, and physical resources to interface genomics with basic and clinical cancer research. The CGAP web site (http://cgap.nci.nih.gov) provides informatics tools for in silico analysis of the CGAP datasets as well as information for accessing each of the CGAP resources. Published in 2001 by John Wiley & Sons, Ltd.
148
Clark MD, Hennig S, Herwig R, Clifton SW, Marra MA, Lehrach H, Johnson SL. An oligonucleotide fingerprint normalized and expressed sequence tag characterized zebrafish cDNA library. Genome Res 2001; 11:1594-602. [PMID: 11544204 PMCID: PMC311136 DOI: 10.1101/gr.186901] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.5] [Indexed: 01/23/2023]
Abstract
The zebrafish is a powerful system for understanding the vertebrate genome, allowing the combination of genetic, molecular, and embryological analysis. Expressed sequence tags (ESTs) provide a rapid means of identifying an organism's genes for further analysis, but any EST project is limited by the availability of suitable libraries. Such cDNA libraries must be of high quality and provide a high rate of gene discovery. However, commonly used normalization and subtraction procedures tend to select for shorter, truncated, and internally primed inserts, seriously affecting library quality. An alternative procedure is to use oligonucleotide fingerprinting (OFP) to precluster clones before EST sequencing, thereby reducing the re-sequencing of common transcripts. Here, we describe the use of OFP to normalize and subtract 75,000 clones from two cDNA libraries, to a minimal set of 25,102 clones. We generated 25,788 ESTs (11,380 3' and 14,408 5') from over 16,000 of these clones. Clustering of 10,654 high-quality 3' ESTs from this set identified 7232 clusters (likely genes), corresponding to a 68% gene diversity rate, comparable to what has been reported for the best normalized human cDNA libraries, and indicating that the complete set of 25,102 clones contains as many as 17,000 genes. Yet, the library quality remains high. The complete set of 25,102 clones is available for researchers as glycerol stocks, filter sets, and as individual EST clones. These resources have been used for radiation hybrid, genetic, and physical mapping of the zebrafish genome, as well as positional cloning and candidate gene identification, molecular marker, and microarray development.
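The diversity figures above can be checked with simple arithmetic. The sketch assumes that the "as many as 17,000 genes" estimate comes from applying the same diversity rate to all 25,102 clones; that is our reading, not something the abstract states.

```python
# Figures quoted in the abstract above.
high_quality_3p_ests = 10_654
clusters = 7_232            # clusters ("likely genes")
normalized_clones = 25_102

# Gene diversity rate: distinct clusters per high-quality 3' EST.
diversity_rate = clusters / high_quality_3p_ests

# Assumption (not stated by the authors): the same rate applies to every clone.
projected_genes = normalized_clones * diversity_rate

print(f"diversity rate ~ {diversity_rate:.1%}")
print(f"projected genes ~ {projected_genes:,.0f}")
```

The rate works out to about 67.9%, which the abstract rounds to 68%, and the projection lands a little above 17,000 genes, consistent with the quoted upper estimate.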
Affiliation(s)
- M D Clark
- Max-Planck-Institut für Molekulare Genetik, 14195 Berlin, Germany.
149
150
Hraber PT, Weller JW. On the species of origin: diagnosing the source of symbiotic transcripts. Genome Biol 2001; 2:RESEARCH0037. [PMID: 11574056 PMCID: PMC56898 DOI: 10.1186/gb-2001-2-9-research0037] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Received: 06/11/2001] [Revised: 07/11/2001] [Accepted: 07/25/2001] [Indexed: 11/30/2022]
Abstract
BACKGROUND Most organisms have developed ways to recognize and interact with other species. Symbiotic interactions range from pathogenic to mutualistic. Some molecular mechanisms of interspecific interaction are well understood, but many remain to be discovered. Expressed sequence tags (ESTs) from cultures of interacting symbionts can help identify transcripts that regulate symbiosis, but present a unique challenge for functional analysis. Given a sequence expressed in an interaction between two symbionts, the challenge is to determine from which organism the transcript originated. For high-throughput sequencing from interaction cultures, a reliable computational approach is needed. Previous investigations into GC nucleotide content and comparative similarity searching provide provisional solutions, but a comparative lexical analysis, which uses a likelihood-ratio test of hexamer counts, is more powerful. RESULTS Validation with genes whose origin and function are known yielded 94% accuracy. Microbial (non-plant) transcripts comprised 75% of a Phytophthora sojae-infected soybean (Glycine max cv Harasoy) library, contrasted with 15% or less in root tissue libraries of Medicago truncatula from axenic, Phytophthora medicaginis-infected, mycorrhizal, and rhizobacterial treatments. Mycorrhizal libraries contained about 23% microbial transcripts; an axenic plant library contained a similar proportion of putative microbial transcripts. CONCLUSIONS Comparative lexical analysis offers numerous advantages over alternative approaches. Many of the transcripts isolated from mixed cultures were of unknown function, suggesting specificity to symbiotic metabolism and therefore candidates likely to be interesting for further functional investigation. Future investigations will determine whether the abundance of non-plant transcripts in a pure plant library indicates procedural artifacts, horizontally transferred genes, or other phenomena.
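The comparative lexical analysis is described only at a high level here. The sketch below shows the general idea of a log-likelihood-ratio test on hexamer counts, assuming that each organism is modeled by hexamer frequencies from reference sequences of known origin, with add-one (Laplace) smoothing and hexamers treated as independent. The function names, the smoothing choice, and the toy training sequences are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter

K = 6  # word length: hexamers

def kmers(seq: str, k: int = K) -> list[str]:
    """All overlapping k-mers of seq (upper-cased); empty if seq is shorter than k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(seqs: list[str], k: int = K) -> tuple[Counter, int]:
    """Count k-mers in reference sequences whose origin is known."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return counts, sum(counts.values())

def log_prob(word: str, model: tuple[Counter, int], k: int = K) -> float:
    """Log frequency of one k-mer, with add-one smoothing over all 4**k words."""
    counts, total = model
    return math.log((counts[word] + 1) / (total + 4 ** k))

def log_likelihood_ratio(query: str, model_a, model_b, k: int = K) -> float:
    """Sum of per-hexamer log ratios; > 0 favors origin A, < 0 favors origin B."""
    return sum(log_prob(w, model_a, k) - log_prob(w, model_b, k)
               for w in kmers(query, k))

# Toy training data (hypothetical): an AT-rich "plant" and a GC-rich "microbe".
plant = train(["ATATTAATTATAAATTTATAATATTAAT" * 3])
microbe = train(["GCGGCCGCGCCGGCGGCCGGCGCGCC" * 3])

score = log_likelihood_ratio("GCCGCGGCCGCG", microbe, plant)
print("microbe" if score > 0 else "plant", round(score, 2))
```

A practical classifier would train on many genes of known origin from each partner and calibrate a decision threshold against them, as the 94% validation figure implies; the toy models here only show the direction of the test.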
Affiliation(s)
- P T Hraber
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA.