1
|
|
2
|
|
3
|
Abstract
There are a number of ways to investigate the structure, function and evolution of the human genome. These include examining the morphology of normal and abnormal chromosomes, constructing maps of genomic landmarks, following the genetic transmission of phenotypes and DNA sequence variations, and characterizing thousands of individual genes. To this list we can now add the elucidation of the genomic DNA sequence, albeit at 'working draft' accuracy. The current challenge is to weave together these disparate types of data to produce the information infrastructure needed to support the next generation of biomedical research. Here we provide an overview of the different sources of information about the human genome and how modern information technology, in particular the internet, allows us to link them together.
|
4
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2001; 29:11-6. [PMID: 11125038 PMCID: PMC29800 DOI: 10.1093/nar/29.1.11] [Citation(s) in RCA: 196] [Impact Index Per Article: 8.5] [Received: 10/03/2000] [Accepted: 10/04/2000] [Indexed: 11/14/2022] Open
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap'99, Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
|
5
|
Abstract
We have constructed a public gene expression data repository, together with WWW and FTP sites for online data access and analysis, for serial analysis of gene expression (SAGE) data. The WWW and FTP components of this resource, SAGEmap, are located at http://www.ncbi.nlm.nih.gov/sage and ftp://ncbi.nlm.nih.gov/pub/sage, respectively. We herein describe SAGE data submission procedures, the construction and characteristics of SAGE tags to gene assignments, the derivation and use of a novel statistical test designed specifically for differential-type analyses of SAGE data, and the organization and use of this resource.
|
6
|
Abstract
This paper describes a fast and scalable strategy for constructing a radiation hybrid (RH) map from data on different RH panels. The maps on each panel are then integrated to produce a single RH map for the genome. Recurring problems in using maps from several sources are that the maps use different markers, the maps do not place the overlapping markers in the same order, and the objective functions for map quality are incomparable. We use methods from combinatorial optimization to develop a strategy that addresses these issues. We show that by the standard objective functions of obligate chromosome breaks and maximum likelihood, software for the traveling salesman problem produces RH maps with better quality much more quickly than software specifically tailored for RH mapping. We use known algorithms for the longest common subsequence problem as part of our map integration strategy. We demonstrate our methods by reconstructing and integrating maps for markers typed on the Genebridge 4 (GB4) and the Stanford G3 panels publicly available from the RH database. We compare the map quality of our integrated map with published maps for the GB4 and G3 panels by considering whether markers occur in the same order on a map and in DNA sequence contigs submitted to GenBank. We find that all of the maps are inconsistent with the sequence data for at least 50% of the contigs, but our integrated maps are more consistent. The map integration strategy not only scales to multiple RH maps but also to any maps that have comparable criteria for measuring map quality. Our software improves on current technology for doing RH mapping in the areas of computation time and algorithms for considering a large number of markers for mapping. The essential impediments to producing dense high-quality RH maps are data quality and panel size, not computation.
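The two ingredients named in this abstract, the obligate chromosome breaks criterion and the longest common subsequence problem, can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes each marker is scored as a string of 0/1 retention calls across the hybrid cell lines with no missing data, so the number of obligate breaks for an order is the sum of Hamming distances between adjacent markers. The function names, the toy retention vectors and the brute-force search (standing in for a real traveling salesman solver) are all hypothetical.

```python
from itertools import permutations


def hamming(a, b):
    """Number of hybrids in which two markers differ in retention."""
    return sum(x != y for x, y in zip(a, b))


def obligate_breaks(order, vectors):
    """Sum of retention changes between adjacent markers in an order."""
    return sum(hamming(vectors[m1], vectors[m2])
               for m1, m2 in zip(order, order[1:]))


def best_order_bruteforce(vectors):
    """Exhaustive search over marker orders; only feasible for tiny inputs.

    A TSP solver would replace this step for realistic marker counts.
    """
    markers = sorted(vectors)
    return min(permutations(markers),
               key=lambda order: obligate_breaks(order, vectors))


def longest_common_subsequence(a, b):
    """Markers that appear in the same relative order in two maps."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Trace back through the table to recover one longest subsequence.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical retention vectors for four markers on six hybrids.
toy_vectors = {"A": "110000", "B": "111000", "C": "011100", "D": "001110"}
print(obligate_breaks(["A", "B", "C", "D"], toy_vectors))
print(best_order_bruteforce(toy_vectors))
print(longest_common_subsequence(list("ABCDE"), list("ACBDE")))
```

In this sketch the markers shared by two maps and ordered consistently in both are recovered by `longest_common_subsequence`; how the published strategy uses that subsequence during integration is not specified in the abstract.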
|
7
|
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2000; 28:10-4. [PMID: 10592169 PMCID: PMC102437 DOI: 10.1093/nar/28.1.10] [Citation(s) in RCA: 297] [Impact Index Per Article: 12.4] [Received: 09/03/1999] [Revised: 09/14/1999] [Accepted: 10/08/1999] [Indexed: 11/14/2022] Open
Abstract
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing pages, GeneMap'99, Davis Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP) pages, Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP) pages, SAGEmap, Online Mendelian Inheritance in Man (OMIM) and the Molecular Modeling Database (MMDB). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov
|
8
|
Abstract
Radiation hybrid (RH) maps are a useful tool for genome analysis, providing a direct method for localizing genes and anchoring physical maps and genomic sequence along chromosomes. The construction of a comprehensive RH map for the human genome has resulted in gene maps reflecting the location of more than 30,000 human genes. Here we report the first comprehensive RH map of the mouse genome. The map contains 2,486 loci screened against an RH panel of 93 cell lines. Most loci (93%) are simple sequence length polymorphisms (SSLPs) taken from the mouse genetic map, thereby providing direct integration between these two key maps. We performed RH mapping by a new and efficient approach in which we replaced traditional gel- or hybridization-based assays with a homogeneous 5'-nuclease assay involving a single common probe for all genetic markers. The map provides essentially complete connectivity and coverage across the genome, and good resolution for ordering loci, with 1 centiRay (cR) corresponding to an average of approximately 100 kb. The RH map, together with an accompanying World-Wide Web server, makes it possible for any investigator to rapidly localize sequences in the mouse genome. Together with the previously constructed genetic map and a YAC-based physical map reported in a companion paper, the fundamental maps required for mouse genomics are now available.
|
9
|
|
10
|
Abstract
A crucial event in the history of the Human Genome Project was the decision to use sequence-tagged sites (STSs) as common landmarks for genomic mapping. Following several years of constructing STS-based maps of ever-increasing detail, the emphasis has recently shifted towards large-scale genomic sequencing. A computational procedure called 'electronic PCR' allows STS landmarks to be revealed as data emerge from the sequencing pipeline, thereby bridging the gap between mapping and sequencing activities.
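The idea behind 'electronic PCR' can be sketched in a few lines of Python: an STS is reported wherever both primers of a pair occur in the sequence, in the correct orientation and roughly the expected distance apart. This is a simplified illustration, not the published program. It assumes exact primer matches on the forward strand only, and the function names, the `tolerance` parameter and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_sts(sequence, forward, reverse, expected_size, tolerance=50):
    """Return (start, product_size) for each in-silico amplification hit.

    Assumption: primers must match exactly, and only the given strand
    is searched; a practical tool would relax both restrictions.
    """
    sequence = sequence.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on the
    # forward strand we look for its reverse complement.
    reverse_site = reverse_complement(reverse)
    hits = []
    start = sequence.find(forward)
    while start != -1:
        end = sequence.find(reverse_site, start + len(forward))
        while end != -1:
            size = end + len(reverse_site) - start
            if abs(size - expected_size) <= tolerance:
                hits.append((start, size))
            end = sequence.find(reverse_site, end + 1)
        start = sequence.find(forward, start + 1)
    return hits


# Toy sequence: forward primer, 20-base spacer, reverse primer site.
toy_sequence = "TTTT" + "ACGTAC" + "G" * 20 + "TGATCC" + "AAAA"
print(find_sts(toy_sequence, "ACGTAC", "GGATCA", expected_size=32, tolerance=5))
```

Requiring the product size to fall within a tolerance of the expected size is what lets a short primer pair act as a specific landmark rather than matching by chance.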
|
11
|
Abstract
A map of 30,181 human gene-based markers was assembled and integrated with the current genetic map by radiation hybrid mapping. The new gene map contains nearly twice as many genes as the previous release, includes most genes that encode proteins of known function, and is twofold to threefold more accurate than the previous version. A redesigned, more informative and functional World Wide Web site (www.ncbi.nlm.nih.gov/genemap) provides the mapping information and associated data and annotations. This resource constitutes an important infrastructure and tool for the study of complex genetic traits, the positional cloning of disease genes, the cross-referencing of mammalian genomes, and validated human transcribed sequences for large-scale studies of gene expression.
|
12
|
|
13
|
Abstract
Microarray technology makes it possible to simultaneously study the expression of thousands of genes during a single experiment. We have developed an information system, ArrayDB, to manage and analyse large-scale expression data. The underlying relational database was designed to allow flexibility in the nature and structure of data input and also in the generation of standard or customized reports through a web-browser interface. ArrayDB provides varied options for data retrieval and analysis tools that should facilitate the interpretation of complex hybridization results. A sampling of ArrayDB storage, retrieval and analysis capabilities is available (www.nhgri.nih.gov/DIR/LCG/15K/HTML/), along with information on a set of approximately 15,000 genes used to fabricate several widely used microarrays. Information stored in ArrayDB is used to provide integrated gene expression reports by linking array target sequences with NCBI's Entrez retrieval system, UniGene and KEGG pathway views. The integration of external information resources is essential in interpreting intrinsic patterns and relationships in large-scale gene expression data.
|
14
|
|
15
|
|
16
|
Genome maps 7. The human transcript map. Wall chart. Science 1996; 274:547-62. [PMID: 8928009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
|
17
|
|
18
|
A gene map of the human genome. Science 1996; 274:540-6. [PMID: 8849440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing genetic and physical maps with the nucleotide and protein sequence databases in a fashion that should speed the discovery of genes underlying inherited human disease. The integrated resource is available through a site on the World Wide Web at http://www.ncbi.nlm.nih.gov/SCIENCE96/.
|
19
|
|
20
|
|
21
|
Abstract
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining "blocks" of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are located by a new search algorithm that avoids many of the limitations of previous techniques. (2) The statistical significance of blocks of similarity is evaluated using a recently developed mathematical theory. (3) Candidate blocks may be evaluated for potential inclusion in a multiple alignment using a variety of visualization tools. (4) A user interface permits each block to be edited by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment. No completely automatic program is likely to deal effectively with all the complexities of the multiple alignment problem; by combining a powerful similarity search algorithm with flexible editing, analysis and display tools, MACAW allows the alignment strategy to be tailored to the problem at hand.
|
22
|
Germ line c-myc is not down-regulated by loss or exclusion of activating factors in myc-induced macrophage tumors. Mol Cell Biol 1989; 9:3482-90. [PMID: 2477687 PMCID: PMC362395 DOI: 10.1128/mcb.9.8.3482-3490.1989] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023] Open
Abstract
As in tumors with c-myc chromosomal translocations, c-myc retrovirus-induced monocyte tumors constitutively express an activated form of c-myc (the proviral gene), whereas the normal endogenous c-myc genes are transcriptionally silent. Treatment of these retrovirus-induced tumor cells with a number of bioactive chemicals and growth factors that are known to induce c-myc expression in cells of the monocyte lineage failed to induce the endogenous c-myc gene. In contrast, the same treatments induced the c-fos gene in both tumors and a control macrophage line. To investigate c-myc suppression further, a normal copy of the human c-myc gene was introduced into tumor and control cell lines by using a retrovirus with self-inactivating long terminal repeats. This transduced normal gene was expressed at equivalent levels in all cells, regardless of the state of endogenous c-myc gene expression, and was strongly induced by agents that induce the normal gene in the control cells. These results indicate that the signal transduction pathways that normally activate the c-myc gene are functional in myc-induced tumor cells and suggest that endogenous c-myc is actively suppressed. An examination of the c-myc locus itself showed that the lack of transcriptional activity correlated with the absence of several prominent DNase I-hypersensitive sites in the 5'-flanking region of the gene but without loss of general DNase sensitivity. Furthermore, analysis of 22 methylation-sensitive restriction enzyme sites in the 5'-flanking region, first exon, and first intron indicated that the silent c-myc genes remained in the same unmethylated state as did actively expressed genes. Thus, c-myc suppression does not appear to result from the most frequently described mechanisms of gene inactivation.
|
23
|
Continued withdrawal from the cell cycle and regulation of cellular genes in mouse erythroleukemia cells blocked in differentiation by the c-myc oncogene. Mol Cell Biol 1989; 9:1714-20. [PMID: 2657403 PMCID: PMC362590 DOI: 10.1128/mcb.9.4.1714-1720.1989] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.2] [Indexed: 01/02/2023] Open
Abstract
Constitutive expression of the c-myc oncogene blocks dimethyl sulfoxide (DMSO)-induced differentiation of mouse erythroleukemia (MEL) cells. During the first 12 h of treatment with DMSO, MEL cells undergo a temporary decrease in the level of c-myc mRNA, followed by a temporary withdrawal from the cell cycle. We found the same shutoff of DNA synthesis during the first 12 to 30 h after DMSO induction in normal MEL cells (which differentiate) and in c-myc-transfected MEL cells (which do not differentiate). We also examined whether deregulated c-myc expression grossly interfered with the regulation of gene expression during MEL cell differentiation. We used run-on transcription assays to monitor the rate of transcription of four oncogenes (c-myc, c-myb, c-fos, and c-K-ras); all except c-K-ras showed a rapid but temporary decrease in transcription after induction in both c-myc-transfected and control cells. Finally, we found the same regulation of cytoplasmic mRNA expression in both types of cells for four oncogenes and three housekeeping genes associated with growth. We conclude that in the MEL cell system, the effects of deregulated c-myc expression do not occur through a disruption of cell cycle control early in induction, nor do they occur through gross deregulation of gene expression.
MESH Headings
- Animals
- Cell Cycle/drug effects
- Cell Differentiation/drug effects
- DNA, Neoplasm/biosynthesis
- Dimethyl Sulfoxide/pharmacology
- Gene Expression Regulation/drug effects
- Leukemia, Erythroblastic, Acute/genetics
- Leukemia, Erythroblastic, Acute/pathology
- Mice
- Proto-Oncogene Proteins/genetics
- Proto-Oncogene Proteins c-myc
- Proto-Oncogenes/drug effects
- RNA, Neoplasm/genetics
- RNA, Neoplasm/metabolism
- Transcription, Genetic/drug effects
- Tumor Cells, Cultured/drug effects
- Tumor Cells, Cultured/metabolism
- Tumor Cells, Cultured/pathology
|
24
|
Studies of secondary transforming events in murine c-myc retrovirus-induced monocyte tumors. Curr Top Microbiol Immunol 1989; 149:89-98. [PMID: 2659285 DOI: 10.1007/978-3-642-74623-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2023]
|
25
|
Abstract
Regulation of mRNA turnover has emerged as an important control point in lymphokine and oncogene expression. We have studied a monocytic tumor in which activation of GM-CSF expression results from the constitutive stabilization of the normally short-lived GM-CSF mRNA. Linkage of the germ-line 3' untranslated region of the GM-CSF gene to a neo reporter gene demonstrated that mRNA stabilization is mediated by a tumor-specific trans-acting factor(s), rather than by an alteration of the GM-CSF gene itself. Significantly, similar fusions of the c-myc and c-fos 3' untranslated regions to neo yielded mRNAs that turned over rapidly in all cells, including the tumor cells. These results demonstrate that AU-rich mRNA turnover signals are recognized differentially in trans within the same cell.
|