301
|
Abstract
Genetics has played only a modest role in drug discovery, but new technologies will radically change this. Whole genome sequencing will identify new drug discovery targets, and emerging methods for the determination of gene function will increase the ability to select robust targets. Detection of single nucleotide polymorphisms and common polymorphisms will enhance the investigation of polygenic diseases and the use of genetics in drug development. Oligonucleotide arraying technologies will allow analysis of gene expression patterns in novel ways.
Affiliation(s)
- L M Gelbert
- Applied Genomics and Metabolic Diseases, Bristol-Myers Squibb Company, Pharmaceutical Research Institute, Princeton, NJ 08543-4000, USA.
|
302
|
Abstract
The FASTA package of sequence comparison programs has been expanded to include FASTX and FASTY, which compare a DNA sequence to a protein sequence database, translating the DNA sequence in three frames and aligning the translated DNA sequence to each sequence in the protein database, allowing gaps and frameshifts. Also new are TFASTX and TFASTY, which compare a protein sequence to a DNA sequence database, translating each sequence in the DNA database in six frames and scoring alignments with gaps and frameshifts. FASTX and TFASTX allow only frameshifts between codons, while FASTY and TFASTY allow substitutions or frameshifts within a codon. We examined the performance of FASTX and FASTY using different gap-opening, gap-extension, frameshift, and nucleotide substitution penalties. In general, FASTX and FASTY perform equivalently when query sequences contain 0-10% errors. We also evaluated the statistical estimates reported by FASTX and FASTY. These estimates are quite accurate, except when an out-of-frame translation produces a low-complexity protein sequence. We used FASTX to scan the Mycoplasma genitalium, Haemophilus influenzae, and Methanococcus jannaschii genomes for unidentified or misidentified protein-coding genes. We found at least 9 new protein-coding genes in the three genomes and at least 35 genes with potentially incorrect boundaries.
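The three-frame translation step that the abstract describes for FASTX and FASTY can be illustrated with a minimal sketch. This is not the FASTA package's code: the function names `translate` and `three_frame_translations` are hypothetical, the standard genetic code is assumed, and the frameshift-aware alignment that distinguishes FASTX from FASTY is not attempted here.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate one reading frame; codons with unknown bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def three_frame_translations(dna: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translations("ATGGCCTAA"), start=1):
        print(frame, protein)
```

Each translated frame would then be aligned against every protein in the database; a six-frame search, as in TFASTX and TFASTY, would additionally translate the reverse complement.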
Affiliation(s)
- W R Pearson
- Department of Biochemistry, University of Virginia, Charlottesville 22908, USA.
|
303
|
Brady KP, Rowe LB, Her H, Stevens TJ, Eppig J, Sussman DJ, Sikela J, Beier DR. Genetic mapping of 262 loci derived from expressed sequences in a murine interspecific cross using single-strand conformational polymorphism analysis. Genome Res 1997; 7:1085-93. [PMID: 9371744 PMCID: PMC310685 DOI: 10.1101/gr.7.11.1085] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
We have demonstrated previously that noncoding sequences of genes are a robust source of polymorphisms between mouse species when tested using single-strand conformation polymorphism (SSCP) analysis, and that these polymorphisms are useful for genetic mapping. In this report we demonstrate that presumptive 3'-untranslated region sequence obtained from expressed sequence tags (ESTs) can be analyzed in a similar fashion, and we have used this approach to map 262 loci using an interspecific backcross. These results demonstrate SSCP analysis of genes or ESTs is a simple and efficient means for the genetic localization of transcribed sequences, and is furthermore an approach that is applicable to any system for which there is sufficient sequence polymorphism.
Affiliation(s)
- K P Brady
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
|
304
|
Affiliation(s)
- G Scangos
- Exelixis Pharmaceuticals, South San Francisco, CA 94080, USA.
|
305
|
Barak LS, Ferguson SS, Zhang J, Caron MG. A beta-arrestin/green fluorescent protein biosensor for detecting G protein-coupled receptor activation. J Biol Chem 1997; 272:27497-500. [PMID: 9346876 DOI: 10.1074/jbc.272.44.27497] [Citation(s) in RCA: 359] [Impact Index Per Article: 12.8] [Indexed: 02/05/2023]
Abstract
G protein-coupled receptors (GPCR) represent the single most important drug targets for medical therapy, and information from genome sequencing and genomic data bases has substantially accelerated their discovery. The lack of a systematic approach either to identify the function of a new GPCR or to associate it with a cognate ligand has added to the growing number of orphan receptors. In this work we provide a novel approach to this problem using a beta-arrestin2/green fluorescent protein conjugate (betaarr2-GFP). It provides a real-time and single cell based assay to monitor GPCR activation and GPCR-G protein-coupled receptor kinase or GPCR-arrestin interactions. Confocal microscopy demonstrates the translocation of betaarr2-GFP to more than 15 different ligand-activated GPCRs. These data clearly support the common hypothesis that the beta-arrestin binding of an activated receptor is a convergent step of GPCR signaling, increase by 5-fold the number of GPCRs known to interact with beta-arrestins, demonstrate that the cytosol is the predominant reservoir of biologically active beta-arrestins, and provide the first direct demonstration of the critical importance of G protein-coupled receptor kinase phosphorylation to the biological regulation of beta-arrestin activity and GPCR signal transduction in living cells. The use of betaarr2-GFP as a biosensor to recognize the activation of pharmacologically distinct GPCRs should accelerate the identification of orphan receptors and permit the optical study of their signal transduction biology intractable to ordinary biochemical methods.
Affiliation(s)
- L S Barak
- Howard Hughes Medical Institute Laboratories and Department of Cell Biology, Duke University Medical Center, Durham, North Carolina 27710, USA
|
306
|
Wilke K, Wiemann S, Gaul R, Gong W, Poustka A. Isolation of human and mouse HMG2a cDNAs: evidence for an HMG2a-specific 3' untranslated region. Gene 1997; 198:269-74. [PMID: 9370291 DOI: 10.1016/s0378-1119(97)00324-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
We have isolated cDNAs of the human gene for high mobility group protein HMG2a, using the method of direct cDNA selection. The gene maps to chromosome band Xq28, and is located within 40 kb from marker DXS1684, at a distance of 5.4 Mb from the telomere. The deduced human HMG2a protein sequence has a length of 199 amino acids and is 97% identical to the sequence of chicken HMG2a. The 3' untranslated regions of the HMG2a gene in both species are highly homologous (87% identical nucleotides), and are even more conserved than the coding sequences (84% identical nucleotides). In addition, a partial cDNA sequence of the putative HMG2a gene from mouse was identified. The 3' untranslated regions from human and mouse are 90% identical. We conclude that the 3' untranslated sequences have been under strong selective pressure during evolution. Whereas expression of the chicken HMG2a gene has previously been demonstrated in liver of newly hatched chicken, the human HMG2a gene is transcribed mainly in placenta.
Affiliation(s)
- K Wilke
- Deutsches Krebsforschungszentrum, Abteilung Molekulare Genomanalyse, Heidelberg, Germany
|
307
|
Miller G, Fuchs R, Lai E. IMAGE cDNA clones, UniGene clustering, and ACeDB: an integrated resource for expressed sequence information. Genome Res 1997; 7:1027-32. [PMID: 9331373 PMCID: PMC310673 DOI: 10.1101/gr.7.10.1027] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
In this study we describe a new information resource that provides integrated access to information on IMAGE (integrated molecular analysis of genomes and their expression) cDNA library clones and derived expressed sequence tags (ESTs). We have developed an automated procedure that collates data from various public sources into a single ACeDB database. This database is a valuable tool for electronic cloning experiments and gene expression studies. It allows researchers to find information about cDNA libraries, plate addresses, insert sizes, and sequence data for IMAGE clones, the assignment of ESTs to UniGene clusters, and the chromosomal location of those genes in an efficient, graphically oriented manner.
Affiliation(s)
- G Miller
- Department of Bioinformatics, Glaxo Wellcome Research and Development, Research Triangle Park, North Carolina 27709, USA
|
308
|
Abstract
Genes differentially expressed in different tissues, during development, or during specific pathologies are of foremost interest to both basic and pharmaceutical research. "Transcript profiles" or "digital Northerns" are generated routinely by partially sequencing thousands of randomly selected clones from relevant cDNA libraries. Differentially expressed genes can then be detected from variations in the counts of their cognate sequence tags. Here we present the first systematic study on the influence of random fluctuations and sampling size on the reliability of this kind of data. We establish a rigorous significance test and demonstrate its use on publicly available transcript profiles. The theory links the threshold of selection of putatively regulated genes (e.g., the number of pharmaceutical leads) to the fraction of false positive clones one is willing to risk. Our results delineate more precisely and extend the limits within which digital Northern data can be used.
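The abstract does not state the significance test itself. As a hedged illustration, the sketch below uses the closed form commonly attributed to this paper: the probability of seeing `y` tags for a gene in a second library of `n2` clones, given `x` tags in a first library of `n1` clones, under the null hypothesis of equal expression. The formula is supplied from outside the text above and should be checked against the published article; the function names are hypothetical.

```python
from math import exp, lgamma, log


def ac_probability(x: int, y: int, n1: int, n2: int) -> float:
    """p(y | x) = (n2/n1)^y * (x+y)! / (x! * y! * (1 + n2/n1)^(x+y+1)).

    Assumed form of the test; computed in log space so that large
    tag counts do not overflow the factorials.
    """
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1)
        - lgamma(x + 1)
        - lgamma(y + 1)
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


def upper_tail(x: int, y: int, n1: int, n2: int) -> float:
    """Probability of observing y or more tags in library 2, given x in library 1."""
    return 1.0 - sum(ac_probability(x, k, n1, n2) for k in range(y))


if __name__ == "__main__":
    # Illustrative counts only: 2 tags in 1,000 clones versus 12 tags in 1,000 clones.
    print(upper_tail(2, 12, 1000, 1000))
```

A small upper-tail value suggests that the second count is unlikely to arise from sampling fluctuation alone; the threshold chosen controls the fraction of false positives accepted, which is the trade-off the abstract refers to.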
Affiliation(s)
- S Audic
- Laboratory of Structural and Genetic Information, Centre National de la Recherche Scientifique-E.P.91, Marseille 13402, France
|
309
|
Abstract
Only a few of the methods currently used for identification of differentially expressed genes take advantage of the fact that (near) complete sets of cDNA clones and sequences representing all human and mouse genes will be available for high throughput survey of gene expression. Accordingly, strategies based on hybridization of complex (cDNA or RNA) probes to cDNA microarrays, either on glass slides or on chips, are likely to become increasingly more advantageous. Recognizing, however, that the power of these methods depends upon the availability of such resources, strategies are being pursued to facilitate completion of the ongoing efforts to identify all human and mouse genes.
Affiliation(s)
- M B Soares
- Department of Pediatrics, University of Iowa, Iowa City 52245, USA.
|
310
|
Adati N, Ito T, Sakaki Y, Shiokawa K. Isolation and expression study of a maternally expressed novel Xenopus gene Xem1 encoding a putative evolutionarily conserved membrane protein. Biochem Biophys Res Commun 1997; 238:899-904. [PMID: 9325189 DOI: 10.1006/bbrc.1997.7215] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 02/05/2023]
Abstract
A novel Xenopus maternally expressed gene, Xem1, was isolated by differential display PCR and 5'-RACE. Xem1 coded for a putative transmembrane protein of 172 amino acids. Sequence analysis, including the clustering and reconstruction of ESTs (Expressed Sequence Tags), revealed that homologs of Xem1 are widely distributed in eukaryotic phyla, suggesting that Xem1 is a member of evolutionarily conserved proteins. Expression of Xem1 mRNA occurred from the previtellogenic stage and its level increased during oogenesis, maintained throughout oocyte maturation to blastula stage and then decreased in post gastrula stages. In cleavage stage, Xem1 RNA was distributed uniformly, and in adult, occurred predominantly in ovary and testis. We assume that Xenopus Xem1 may have its function in gametogenesis and in early phase of embryogenesis, whose function may be related to transport mechanism of small molecular weight substances like metal ions, from analogy to the function of its homologs in other organisms.
Affiliation(s)
- N Adati
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
|
311
|
Abstract
Familial Mediterranean fever (FMF) is an autosomal recessive disorder characterized by attacks of fever and serositis. In this paper, we define a minimal co-segregating region of 60 kb containing the FMF gene (MEFV) and identify four different transcript units within this region. One of these transcripts encodes a new protein (marenostrin) related to the ret-finger protein and to butyrophilin. Four conservative missense variations co-segregating with FMF have been found within the MEFV candidate gene in 85% of the carrier chromosomes. These variations, which cluster at the carboxy terminal domain of the protein, were not present in 308 control chromosomes, including 162 validated non-carriers. We therefore propose that the sequence alterations in the marenostrin protein are responsible for the FMF disease.
|
312
|
Katsanis N, Yaspo ML, Fisher EM. Identification and mapping of a novel human gene, HRMT1L1, homologous to the rat protein arginine N-methyltransferase 1 (PRMT1) gene. Mamm Genome 1997; 8:526-9. [PMID: 9196002 DOI: 10.1007/s003359900491] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.2] [Indexed: 02/04/2023]
Affiliation(s)
- N Katsanis
- Neurogenetics Unit, Imperial College School of Medicine at St. Mary's, Norfolk Place, London W2 1PG, UK
|
313
|
Orstavik S, Natarajan V, Taskén K, Jahnsen T, Sandberg M. Characterization of the human gene encoding the type I alpha and type I beta cGMP-dependent protein kinase (PRKG1). Genomics 1997; 42:311-8. [PMID: 9192852 DOI: 10.1006/geno.1997.4743] [Citation(s) in RCA: 99] [Impact Index Per Article: 3.5] [Indexed: 02/04/2023]
Abstract
The type I cGMP-dependent protein kinase (cGK) has been shown to play a crucial role in the relaxation of vascular smooth muscle by lowering the intracellular level of calcium. Two isoforms of type I cGK have been described, type I alpha and type I beta, differing only in their N-terminal parts. This report describes the cloning of the gene PRKG1 encoding both human type I cGK isoforms. PRKG1 is a single-copy gene consisting of 19 exons encompassing at least 220 kb. Several of the splice sites previously observed in the Drosophila melanogaster DG2 gene have been conserved in PRKG1, and these conserved splice sites correlated well with the boundaries between several of the previously proposed functional domains of type I cGK. The first two exons of the type I cGK gene were shown to encode the type I alpha- and type I beta-specific parts of the cGK. Using 5'-rapid amplification of cDNA ends, potential sites for transcription initiation were identified 5' upstream of both these exons. Northern blot analyses demonstrated distinct patterns of expression of the isoforms of type I alpha and I beta cGK in different human tissues.
Affiliation(s)
- S Orstavik
- Institute of Medical Biochemistry, University of Oslo, Norway.
|
314
|
Nelson MA, Kang S, Braun EL, Crawford ME, Dolan PL, Leonard PM, Mitchell J, Armijo AM, Bean L, Blueyes E, Cushing T, Errett A, Fleharty M, Gorman M, Judson K, Miller R, Ortega J, Pavlova I, Perea J, Todisco S, Trujillo R, Valentine J, Wells A, Werner-Washburne M, Natvig DO. Expressed sequences from conidial, mycelial, and sexual stages of Neurospora crassa. Fungal Genet Biol 1997; 21:348-63. [PMID: 9290248 DOI: 10.1006/fgbi.1997.0986] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.4] [Indexed: 02/05/2023]
Abstract
In the Neurospora Genome Project at the University of New Mexico, expressed sequence tags (ESTs) corresponding to three stages of the life cycle of the filamentous fungus Neurospora crassa are being analyzed. The results of a pilot project to identify expressed genes and determine their patterns of expression are presented. 1,865 partial complementary DNA (cDNA) sequences for 1,409 clones were determined using single-pass sequencing. Contig analysis allowed the identification of 838 unique ESTs and 156 ESTs present in multiple cDNA clones. For about 34% of the sequences, highly or moderately significant matches to sequences (of known and unknown function) in the NCBI database were detected. Approximately 56% of the ESTs showed no similarity to previously identified genes. Among genes with assigned function, about 43.3% were involved in metabolism, 32.9% in protein synthesis and 8.4% in RNA synthesis. Fewer were involved in defense (6%), cell signalling (3.4%), cell structure (3.4%) and cell division (2.6%).
Affiliation(s)
- M A Nelson
- Department of Biology, University of New Mexico, Albuquerque 87131, USA
|
315
|
Abstract
Bioinformatics is now an essential tool in many aspects of human molecular genetics research. Methods for the prediction of gene structure are essential components in genomic sequencing projects and provide the key to deriving protein sequence and locating intron/exon junctions. Sequence comparison and database searching are the pre-eminent approaches for predicting the likely biochemical function of new genes, although sequence profiles derived from families of aligned sequences have advantages in the detection of remote sequence relationships. The use of sequence database analysis for large-scale comparative analysis of genome sequence data from model organisms is emerging as the most important recent development in the application of bioinformatics methods for characterizing candidate disease genes.
Affiliation(s)
- C J Rawlings
- SmithKline Beecham Pharmaceuticals, Department of Bioinformatics, Harlow, Essex, UK.
|
316
|
Botstein D, Cherry JM. Molecular linguistics: extracting information from gene and protein sequences. Proc Natl Acad Sci U S A 1997; 94:5506-7. [PMID: 9159100 PMCID: PMC34160 DOI: 10.1073/pnas.94.11.5506] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
|
317
|
Marlor CW, Delaria KA, Davis G, Muller DK, Greve JM, Tamburini PP. Identification and cloning of human placental bikunin, a novel serine protease inhibitor containing two Kunitz domains. J Biol Chem 1997; 272:12202-8. [PMID: 9115294 DOI: 10.1074/jbc.272.18.12202] [Citation(s) in RCA: 73] [Impact Index Per Article: 2.6] [Indexed: 02/04/2023]
Abstract
Interrogation of the public expressed sequence tag (EST) data base with the sequence of preproaprotinin identified ESTs encoding two potential new members of the Kunitz family of serine protease inhibitors. Through reiterative interrogation, an EST contig was obtained, the consensus sequence from which encoded both of the novel Kunitz domains in a single open reading frame. This consensus sequence was used to direct the isolation of a full-length cDNA clone from a placental library. The resulting cDNA sequence predicted a 252-residue protein containing a putative NH2-terminal signal peptide followed sequentially by each of the two Kunitz domains within a 170-residue ectodomain, a putative transmembrane domain, and a 31-residue hydrophilic COOH terminus. The gene for this putative novel protein was mapped by use of a radiation hybrid panel to chromosome 19q13, and Northern analysis showed that the corresponding mRNA was expressed at high levels in human placenta and pancreas and at lower levels in brain, lung, and kidney. An endogenous soluble form of this protein, which was designated as placental bikunin, was highly purified from human placenta by sequential kallikrein-Sepharose affinity, gel filtration, and C18 reverse-phase chromatography. The natural protein exhibited the same NH2 terminus as predicted from the cloned cDNA and inhibited trypsin, plasma kallikrein, and plasmin with IC50 values in the nanomolar range.
Affiliation(s)
- C W Marlor
- Institute of Bone and Joint Disease and Cancer, Bayer Corporation, West Haven, Connecticut 06516, USA
|
318
|
Frazer KA, Ueda Y, Zhu Y, Gifford VR, Garofalo MR, Mohandas N, Martin CH, Palazzolo MJ, Cheng JF, Rubin EM. Computational and biological analysis of 680 kb of DNA sequence from the human 5q31 cytokine gene cluster region. Genome Res 1997; 7:495-512. [PMID: 9149945 DOI: 10.1101/gr.7.5.495] [Citation(s) in RCA: 97] [Impact Index Per Article: 3.5] [Indexed: 02/04/2023]
Abstract
With the human genome project advancing into what will be a 7- to 10-year DNA sequencing phase, we are presented with the challenge of developing strategies to convert genomic sequence data, as they become available, into biologically meaningful information. We have analyzed 680 kb of noncontiguous DNA sequence from a 1-Mb region of human chromosome 5q31, coupling computational analysis with gene expression studies of tissues isolated from humans as well as from mice containing human YAC transgenes. This genomic interval has been noted previously for containing the cytokine gene cluster and a quantitative trait locus associated with inflammatory diseases. Our analysis identified and verified expression of 16 new genes, as well as 7 previously known genes. Of the total of 23 genes in this region, 78% had similarity matches to sequences in protein databases and 83% had exact expressed sequence tag (EST) database matches. Comparative mapping studies of eight of the new human genes discovered in the 5q31 region revealed that all are located in the syntenic region of mouse chromosome 11q. Our analysis demonstrates an approach for examining human sequence as it is made available from large sequencing programs and has resulted in the discovery of several biomedically important genes, including a cyclin, a transcription factor that is homologous to an oncogene, a protein involved in DNA repair, and several new members of a family of transporter proteins.
Affiliation(s)
- K A Frazer
- Human Genome Center, Lawrence Berkeley National Laboratory (LBNL), Berkeley, California 94720, USA
|
319
|
Affiliation(s)
- J L Weber
- Center for Medical Genetics, Marshfield Medical Research Foundation, Wisconsin 54449, USA.
|
320
|
Affiliation(s)
- P Green
- Department of Molecular Biotechnology, University of Washington, Seattle 98195, USA.
|
321
|
Wolfsberg TG, Landsman D. A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res 1997; 25:1626-32. [PMID: 9092672 PMCID: PMC146621 DOI: 10.1093/nar/25.8.1626] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.0] [Indexed: 02/04/2023]
Abstract
The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of the data being generated by human genome sequencing centers. ESTs are short, single pass cDNA sequences generated from randomly selected library clones. The approximately 415 000 human ESTs represent a valuable, low priced, and easily accessible biological reagent. As many ESTs are derived from yet uncharacterized genes, dbEST is a prime starting point for the identification of novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy which may provide data about rare mRNA isoforms. Here we present an analysis of >1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs which derive from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites; other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes. In a related analysis of pairs of ESTs which are reported to derive from a single gene, we found that as many as 26% of the pairs do not BOTH align with the sequence of the same gene. We suspect that some of these unusual ESTs result from artifacts in EST generation, and caution researchers that they may find such clones while analyzing sequences in dbEST.
Affiliation(s)
- T G Wolfsberg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Room 8N-807, Bethesda, MD 20894, USA
|
322
|
Yu W, Andersson B, Worley KC, Muzny DM, Ding Y, Liu W, Ricafrente JY, Wentland MA, Lennon G, Gibbs RA. Large-scale concatenation cDNA sequencing. Genome Res 1997; 7:353-8. [PMID: 9110174 PMCID: PMC139146 DOI: 10.1101/gr.7.4.353] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.4] [Received: 11/15/1996] [Accepted: 02/04/1997] [Indexed: 02/04/2023]
Abstract
A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7-2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%-97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching.
|
323
|
Ansari-Lari MA, Shen Y, Muzny DM, Lee W, Gibbs RA. Large-scale sequencing in human chromosome 12p13: experimental and computational gene structure determination. Genome Res 1997; 7:268-80. [PMID: 9074930 DOI: 10.1101/gr.7.3.268] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
The detailed genomic organization of a gene-dense region at human chromosome 12p13, spanning 223 kb of contiguous sequence, was determined. This region is composed of 20 genes and several other expressed sequences. Experimental tools including RT-PCR and cDNA sequencing, combined with gene prediction programs, were utilized in the analysis of the sequence. Various computer software programs were employed for sequence similarity searches and functional predictions. The high number of genes with diverse functions and complex transcriptional patterns make this region ideal for addressing challenges of gene discovery and genomic characterization amenable to large-scale sequence analysis.
Affiliation(s)
- M A Ansari-Lari
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
|
324
|
Touchman JW, Bouffard GG, Weintraub LA, Idol JR, Wang L, Robbins CM, Nussbaum JC, Lovett M, Green ED. 2006 expressed-sequence tags derived from human chromosome 7-enriched cDNA libraries. Genome Res 1997; 7:281-92. [PMID: 9074931 DOI: 10.1101/gr.7.3.281] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]
Abstract
The establishment and mapping of gene-specific DNA sequences greatly complement the ongoing efforts to map and sequence all human chromosomes. To facilitate our studies of human chromosome 7, we have generated and analyzed 2006 expressed-sequence tags (ESTs) derived from a collection of direct selection cDNA libraries that are highly enriched for human chromosome 7 gene sequences. Similarity searches indicate that approximately two-thirds of the ESTs are not represented by sequences in the public databases, including those in dbEST. In addition, a large fraction (68%) of the ESTs do not have redundant or overlapping sequences within our collection. Human DNA-specific sequence-tagged sites (STSs) have been developed from 190 of the ESTs. Remarkably, 180 (96%) of these STSs map to chromosome 7, demonstrating the robustness of chromosome enrichment in constructing the direct selection cDNA libraries. Thus far, 140 of these EST-specific STSs have been assigned unequivocally to YAC contigs that are distributed across the chromosome. Together, these studies provide > 2000 ESTs highly enriched for chromosome 7 gene sequences, 180 new chromosome 7 STSs corresponding to ESTs, and a definitive demonstration of the ability to enrich for chromosome-specific cDNAs by direct selection. Furthermore, the libraries, sequence data, and mapping information will contribute to the construction of a chromosome 7 transcript map.
Affiliation(s)
- J W Touchman
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
|
325
|
Abstract
The GenBank sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from authors and from large-scale sequencing projects. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive coverage. GenBank continues to focus on quality control and annotation while expanding data coverage and retrieval services. An integrated retrieval system, known as Entrez, incorporates data from the major DNA and protein sequence databases, along with genome maps and protein structure information. MEDLINE abstracts from published articles describing the sequences are also included as an additional source of biological annotation. Sequence similarity searching is offered through the BLAST family of programs. All of NCBI's services are offered through the World Wide Web. In addition, there are specialized server/client versions as well as FTP and e-mail server access.
Affiliation(s)
- D A Benson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
326
|
Aaronson JS, Eckman B, Blevins RA, Borkowski JA, Myerson J, Imran S, Elliston KO. Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data. Genome Res 1996; 6:829-45. [PMID: 8889550 DOI: 10.1101/gr.6.9.829] [Citation(s) in RCA: 84] [Impact Index Per Article: 2.9] [Indexed: 02/02/2023]
Abstract
A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.
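The comparison step described above relies on Smith-Waterman local alignment. As a minimal sketch of that general technique, the function below computes the best local alignment score between two sequences. The scoring parameters (`match`, `mismatch`, `gap`) and the linear gap penalty are illustrative assumptions, not the settings used in the study, and the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty; all scoring values are illustrative
    assumptions, not the parameters of any particular published analysis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Flooring at zero is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Compare a short EST-like read against a reference fragment.
    print(smith_waterman_score("ACGTTGCA", "TTACGTAGCATT"))
```

A full analysis would also trace back through the matrix to recover the aligned region and compute percent identity; this sketch returns only the score.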
Affiliation(s)
- J S Aaronson
- Merck Research Laboratories, Department of Bioinformatics, Rahway, New Jersey 07065, USA.
|
327
|
Bonaldo MF, Lennon G, Soares MB. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res 1996; 6:791-806. [PMID: 8889548 DOI: 10.1101/gr.6.9.791] [Citation(s) in RCA: 364] [Impact Index Per Article: 12.6] [Indexed: 02/02/2023]
Abstract
Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach to discover (putatively) expressed sequences that, in turn, once mapped, may greatly expedite the process involved in the identification and cloning of human disease genes. However, the integrity of the data and the pace at which novel sequences can be identified depend to a great extent on the cDNA libraries that are used. Because altogether, in a typical cell, the mRNAs of the prevalent and intermediate frequency classes comprise as much as 50-65% of the total mRNA mass, but represent no more than 1000-2000 different mRNAs, redundant identification of mRNAs of these two frequency classes is destined to become overwhelming relatively early in any such random gene discovery program, thus seriously compromising its cost-effectiveness. With the goal of facilitating such efforts, previously we developed a method to construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (INIB) and fetal liver/spleen (INFLS) libraries, from which a total of 45,192 and 86,088 expressed sequence tags, respectively, have been derived. While improving the representation of the longest cDNAs in our libraries, we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of which have been contributed to our Integrated Molecular Analysis of Genomes and Their Expression (IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. In an attempt to facilitate the process of gene discovery further, we have also developed a subtractive hybridization approach designed specifically to eliminate (or reduce significantly the representation of) large pools of arrayed and (mostly) sequenced clones from normalized libraries yet to be (or just partly) surveyed.
Here we present a detailed description and a comparative analysis of four methods that we developed and used to generate normalized cDNA libraries from human (15), mouse (3), rat (2), as well as the parasite Schistosoma mansoni (1). In addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library (INFLS-SI) that resulted from the elimination (or reduction of representation) of ~5000 INFLS-IMAGE clones from the INFLS library.
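The redundancy argument in this abstract can be illustrated with a small calculation. The sketch below estimates the expected number of distinct mRNA species seen after a given number of randomly picked clones, comparing a skewed library with an idealized normalized one. The abundant class (1500 species carrying 60% of the mRNA mass) is chosen from within the ranges quoted above; the rare class (13,500 species) and the perfectly uniform normalized library are hypothetical assumptions made only for illustration.

```python
def expected_distinct(n_picks, classes):
    """Expected number of distinct mRNA species seen after n_picks random clones.

    classes: list of (species_count, mass_fraction) pairs whose fractions
    sum to 1. Every species within a class is assumed equally abundant.
    """
    total = 0.0
    for species_count, mass_fraction in classes:
        p = mass_fraction / species_count  # chance a single pick hits one given species
        # Probability that a species is seen at least once in n_picks draws.
        total += species_count * (1.0 - (1.0 - p) ** n_picks)
    return total


# Hypothetical library compositions (illustrative numbers only).
skewed = [(1500, 0.60), (13500, 0.40)]   # non-normalized library
normalized = [(15000, 1.0)]              # idealized, fully equalized library

if __name__ == "__main__":
    for n in (1000, 5000, 20000):
        print(n,
              round(expected_distinct(n, skewed)),
              round(expected_distinct(n, normalized)))
```

Under these assumptions the normalized library yields more distinct species for the same number of picks, which is the cost-effectiveness point the abstract makes; real libraries are never perfectly equalized.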
Affiliation(s)
- M F Bonaldo
- Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, New York, USA
|
328
|
Variations on a theme: Combined molecular chaperone and proteolysis functions in Clp/HSP100 proteins. J Biosci 1996. [DOI: 10.1007/bf02703106] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]
|