1
Stevens H. Globalizing Genomics: The Origins of the International Nucleotide Sequence Database Collaboration. Journal of the History of Biology 2018; 51:657-691. [PMID: 28986915 DOI: 10.1007/s10739-017-9490-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Genomics is increasingly considered a global enterprise - the fact that biological information can flow rapidly around the planet is taken to be important to what genomics is and what it can achieve. However, the large-scale international circulation of nucleotide sequence information did not begin with the Human Genome Project. Efforts to formalize and institutionalize the circulation of sequence information emerged concurrently with the development of centralized facilities for collecting that information. That is, the very first databases built for collecting and sharing DNA sequence information were, from their outset, international collaborative enterprises. This paper describes the origins of the International Nucleotide Sequence Database Collaboration between GenBank in the United States, the European Molecular Biology Laboratory Databank, and the DNA Database of Japan. The technical and social groundwork for the international exchange of nucleotide sequences created the conditions of possibility for imagining nucleotide sequences (and subsequently genomes) as "global" objects. The "transnationalism" of nucleotide sequences was critical to their ontology - what DNA sequences came to be during the Human Genome Project was deeply influenced by international exchange.
Affiliation(s)
- Hallam Stevens
- School of Humanities and Social Sciences, Nanyang Technological University, 14 Nanyang Drive #05-07, Singapore, 637332, Singapore.
2
Hammami R, Fliss I. Use of SciDBMaker as Tool for the Design of Specialized Biological Databases. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The exponential growth of molecular biology research in recent decades has brought concomitant growth in the number and size of genomic and proteomic databases used to interpret experimental findings. In particular, the growth in protein sequence records has created the need for smaller, manually annotated databases. Since scientists are continually developing new specific databases to enhance their understanding of biological processes, the authors created SciDBMaker to provide a tool for easy building of new specialized protein knowledge bases. This chapter also suggests best practices for specialized biological database design, and provides examples for the implementation of these practices.
3
Wolfsberg TG, Madden TL. Sequence similarity searching using the BLAST family of programs. ACTA ACUST UNITED AC 2008; Chapter 19:Unit 19.3. [PMID: 18265177 DOI: 10.1002/0471142727.mb1903s46] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
Database sequence similarity searching is carried out thousands of times each day by researchers worldwide and has become a very valuable tool. Over the years, a number of algorithms have been implemented to facilitate database searching. The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows searches to be done quickly and easily, but with sensitive, yet rigorous statistical expectations. In this unit, which is a completely new version of its predecessor of the same title, the user learns how to access the databases, determine the correct searching strategies, and apply examples of BLAST searches to his or her own data.
Affiliation(s)
- T G Wolfsberg
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, Maryland, USA
4
Bellgard M, Ye J, Gojobori T, Appels R. The bioinformatics challenges in comparative analysis of cereal genomes: an overview. Funct Integr Genomics 2004; 4:1-11. [PMID: 14770300 DOI: 10.1007/s10142-004-0102-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 12/16/2003] [Revised: 12/16/2003] [Accepted: 12/16/2003] [Indexed: 11/24/2022]
Abstract
Comparative genomic analysis is the cornerstone of in silico-based approaches to understanding biological systems and processes across cereal species, such as rice, wheat and barley, in order to identify genes of agronomic interest. The size of the genomic repositories is nearly doubling every year, and this has significant implications on the way bioinformatics analyses are carried out. In this overview the concepts and technology underpinning bioinformatics as applied to comparative genomic analysis are considered in the context of other manuscripts appearing in this issue of Functional and Integrative Genomics.
Affiliation(s)
- M Bellgard
- Molecular Plant Breeding CRC, Murdoch University, South Street, WA 6152 Murdoch, Australia
5
Miyazaki S, Sugawara H, Ikeo K, Gojobori T, Tateno Y. DDBJ in the stream of various biological data. Nucleic Acids Res 2004; 32:D31-4. [PMID: 14681352 PMCID: PMC308861 DOI: 10.1093/nar/gkh127] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Received: 09/16/2003] [Revised: 10/03/2003] [Accepted: 10/23/2003] [Indexed: 11/13/2022]
Abstract
In the past year we at DDBJ (http://www.ddbj.nig.ac.jp) have seen a steady increase in the number of data submissions, with a 50.6% increment in the number of bases and a 46.5% increment in the number of entries. Among them, the genome data of man, ascidian and rice hold the top three places. Our activity has extended to providing a tool that enables sequence retrieval using regular expressions, and to launching our SOAP server and web services to facilitate the acquisition of proper data and tools from a huge number of biological data resources on websites worldwide. We have also opened our public gene expression database, CIBEX.
Affiliation(s)
- S Miyazaki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Yata, Mishima 411-8540, Japan
6
Matsuura Y, Tohya Y, Nakamura K, Shimojima M, Roerink F, Mochizuki M, Takase K, Akashi H, Sugimura T. Complete nucleotide sequence, genome organization and phylogenic analysis of the canine calicivirus. Virus Genes 2003; 25:67-73. [PMID: 12206310 DOI: 10.1023/a:1020174225622] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
The complete genomic sequence of canine calicivirus (CaCV) isolated from feces of a dog with diarrhea was determined. The CaCV genome, a positive-sense single-stranded RNA, contained 8513 nucleotides excluding the poly(A) tail and was longer than that of any other calicivirus strain with a completely known sequence. There were three open reading frames (ORF1, nt 12-5801; ORF2, nt 5805-7880; and ORF3, nt 7877-8278). ORF1 encoded a polyprotein (calculated Mr of 214,802) which had the conserved motifs of non-structural proteins of other caliciviruses and picornaviruses. Regions containing characteristic motifs in the non-structural polyprotein of CaCV showed highest similarity with those of the species Feline calicivirus and Vesicular exanthema of swine virus in the genus Vesivirus. Phylogenic analysis indicated that CaCV formed a distinct branch within the genus. Our results strongly suggested that CaCV is a new species in the genus Vesivirus.
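The ORF coordinates quoted in the abstract above can be checked with a little arithmetic. The following is only an illustrative sketch, not the authors' method: it assumes the coordinates are 1-based and inclusive, and the helper names `orf_length` and `overlap` are hypothetical. Whether each range includes the stop codon is not stated in the abstract, so the codon counts should be read with that caveat.

```python
# ORF coordinates as quoted in the abstract (assumed 1-based, inclusive).
ORFS = {
    "ORF1": (12, 5801),
    "ORF2": (5805, 7880),
    "ORF3": (7877, 8278),
}


def orf_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive coordinate range."""
    return end - start + 1


def overlap(a: tuple, b: tuple) -> int:
    """Number of nucleotides shared by two inclusive ranges (0 if disjoint)."""
    shared_start = max(a[0], b[0])
    shared_end = min(a[1], b[1])
    return max(0, shared_end - shared_start + 1)


# Length in nucleotides and in whole codons for each ORF.
lengths = {name: orf_length(*coords) for name, coords in ORFS.items()}
codons = {name: n // 3 for name, n in lengths.items()}

# Reading frame (0, 1 or 2) implied by each start coordinate.
frames = {name: (start - 1) % 3 for name, (start, _) in ORFS.items()}

for name in ORFS:
    print(name, lengths[name], "nt,", codons[name], "codons, frame", frames[name])
print("ORF2/ORF3 overlap:", overlap(ORFS["ORF2"], ORFS["ORF3"]), "nt")
```

Under these assumptions every ORF length is a multiple of three, and ORF2 and ORF3 share four nucleotides (nt 7877-7880), which matches the overlapping coordinates given in the abstract.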
Affiliation(s)
- Yuichi Matsuura
- Department of Veterinary Medicine, Faculty of Agriculture, Kagoshima University, Japan
7
Abstract
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has collected and released more entries and bases than last year. This is mainly due to large-scale submissions from Japanese sequencing teams on mouse, rice, chimpanzee, nematoda and other organisms. The contributions of DDBJ over the past year are 17.3% (entries) and 10.3% (bases) of the combined outputs of the International Nucleotide Sequence Databases (INSD). Our complete genome sequence database, Genome Information Broker (GIB), has been improved by incorporating XML. It is now possible to perform a more sophisticated database search against the new GIB than the ordinary BLAST or FASTA search.
Affiliation(s)
- S Miyazaki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Yata, Mishima 411-8540, Japan
8
Tateno Y, Imanishi T, Miyazaki S, Fukami-Kobayashi K, Saitou N, Sugawara H, Gojobori T. DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res 2002; 30:27-30. [PMID: 11752245 PMCID: PMC99140 DOI: 10.1093/nar/30.1.27] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]
Abstract
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has made an effort to collect as much data as possible mainly from Japanese researchers. The increase rates of the data we collected, annotated and released to the public in the past year are 43% for the number of entries and 52% for the number of bases. The increase rates are accelerated even after the human genome was sequenced, because sequencing technology has been remarkably advanced and simplified, and research in life science has been shifted from the gene scale to the genome scale. In addition, we have developed the Genome Information Broker (GIB, http://gib.genes.nig.ac.jp) that now includes more than 50 complete microbial genome and Arabidopsis genome data. We have also developed a database of the human genome, the Human Genomics Studio (HGS, http://studio.nig.ac.jp). HGS provides one with a set of sequences being as continuous as possible in any one of the 24 chromosomes. Both GIB and HGS have been updated incorporating newly available data and retrieval tools.
Affiliation(s)
- Y Tateno
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Yata, Mishima 411-8540, Japan
9
Li J, Glick BR. Transcriptional regulation of the Enterobacter cloacae UW4 1-aminocyclopropane-1-carboxylate (ACC) deaminase gene (acdS). Can J Microbiol 2001; 47:359-67. [PMID: 11358176 DOI: 10.1139/w01-009] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
Based on DNA sequence analysis and 1-aminocyclopropane-1-carboxylate (ACC) deaminase activity, the region of DNA immediately upstream of the Enterobacter cloacae UW4 ACC deaminase gene (acdS) contains several features that appear to be involved in its transcriptional regulation. In the present study, the 5' upstream region of acdS was cloned into the promoter-probe vector, pQF70, which carries the promoterless luciferase gene (luxAB), and luciferase expression was monitored. The data obtained from studying the expression of the luciferase gene showed that (i) a leucine responsive regulatory protein (LRP)-like protein encoded within the upstream region is located on the opposite strand from acdS under the control of a promoter stronger than the one responsible for acdS transcription, (ii) luciferase gene expression required both ACC and the LRP-like protein, (iii) luciferase expression was increased three-fold under anaerobic conditions, consistent with the involvement of a fumarate-nitrate reduction (FNR)-like regulatory protein box within the upstream region, and (iv) the addition of leucine to the growth medium decreased luciferase activity in the presence of ACC and increased luciferase activity in the absence of ACC, consistent with leucine acting as a regulator of the expression of the LRP-like protein.
Affiliation(s)
- J Li
- Department of Biology, University of Waterloo, ON, Canada
10
Tateno Y, Miyazaki S, Ota M, Sugawara H, Gojobori T. DNA data bank of Japan (DDBJ) in collaboration with mass sequencing teams. Nucleic Acids Res 2000; 28:24-6. [PMID: 10592172 PMCID: PMC102400 DOI: 10.1093/nar/28.1.24] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
We at DDBJ (http://www.ddbj.nig.ac.jp) process and publicise the massive amounts of data submitted mainly by Japanese genome projects and sequencing teams. It is emphasised that the collaboration between data producing teams and the data bank is crucial in carrying out these processes smoothly. The amount of data submitted in 1999 is so large that it alone exceeds the total amount submitted in the preceding 10 years. To cope with this situation, we have developed tools not only for processing such massive amounts of data but also for efficiently retrieving data on demand.
Affiliation(s)
- Y Tateno
- Center for Information Biology, National Institute of Genetics, Yata, Mishima 411-8540, Japan.
11
Shirai T, Mitsuyama C, Niwa Y, Matsui Y, Hotta H, Yamane T, Kamiya H, Ishii C, Ogawa T, Muramoto K. High-resolution structure of the conger eel galectin, congerin I, in lactose-liganded and ligand-free forms: emergence of a new structure class by accelerated evolution. Structure 1999; 7:1223-33. [PMID: 10545323 DOI: 10.1016/s0969-2126(00)80056-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Congerin I is a member of the galectin (animal beta-galactoside-binding lectin) family and is found in the skin mucus of conger eel. The galectin family proteins perform a variety of biological activities. Because of its histological localization and activity against marine bacteria and starfish embryos, congerin I is thought to take part in the eels' biological defense system against parasites. RESULTS: The crystal structure of congerin I has been determined in both lactose-liganded and ligand-free forms to 1.5 Å and 1.6 Å resolution, respectively. The protein is a homodimer of 15 kDa subunits. Congerin I has a beta-sheet topology that is markedly different from those of known relatives. One of the beta-strands is exchanged between two identical subunits. This strand swap might increase the dimer stability. Of the known galectin complexes, congerin I forms the most extensive interaction with lactose molecules. Most of these interactions are substituted by similar interactions with water molecules, including a pi-electron hydrogen bond, in the ligand-free form. This observation indicates an increased affinity of congerin I for the ligand. CONCLUSIONS: The genes for congerin I and an isoform, congerin II, are known to have evolved under positive selection pressure. The strand swap and the modification in the carbohydrate-binding site might enhance the cross-linking activity, and should be the most apparent consequence of positive selection. The protein has been adapted to functioning in skin mucus that is in direct contact with surrounding environments by an enhancement in cross-linking activity. The structure of congerin I demonstrates the emergence of a new structure class by accelerated evolution under selection pressure.
Affiliation(s)
- T Shirai
- Department of Biotechnology and Biomaterial Chemistry Graduate School of Engineering, Nagoya University, Chikusa-Ku, Nagoya, 464-8603, Japan.
12
Miyamoto Y, Itoh K. Design of cluster-specific 16S rDNA oligonucleotide probes to identify bacteria of the Bacteroides subgroup harbored in human feces. FEMS Microbiol Lett 1999; 177:143-9. [PMID: 10436932 DOI: 10.1111/j.1574-6968.1999.tb13725.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
To develop a new simple method for identification of bacteria in the Bacteroides subgroup isolated from human feces, we designed a panel of four 16S rDNA-targeted oligonucleotide probes specific for each cluster of the Bacteroides subgroup. The probes Bac and bacvul were targeted to the Bacteroides cluster, and the probes Pre and Por have their target regions characteristic to those of the Prevotella cluster and the Porphyromonas cluster, respectively. The probes presented in this work were constructed to be specific for reference strains in each cluster of the Bacteroides subgroup and did not cross-hybridize with other major intestinal bacteria. Using these four probes in combination will facilitate the identification of the clusters of the Bacteroides subgroup harbored in the intestine, as compared to biological and biochemical testing.
Affiliation(s)
- Y Miyamoto
- Laboratory of Veterinary Public Health, Graduate School of Agriculture and Life Science, University of Tokyo, Japan
13
Pedersen AG, Baldi P, Chauvin Y, Brunak S. The biology of eukaryotic promoter prediction: a review. Computers & Chemistry 1999; 23:191-207. [PMID: 10404615 DOI: 10.1016/s0097-8485(99)00015-7] [Citation(s) in RCA: 136] [Impact Index Per Article: 5.4] [Indexed: 11/21/2022]
Abstract
Computational prediction of eukaryotic promoters from the nucleotide sequence is one of the most attractive problems in sequence analysis today, but it is also a very difficult one. Thus, current methods predict in the order of one promoter per kilobase in human DNA, while the average distance between functional promoters has been estimated to be in the range of 30-40 kilobases. Although it is conceivable that some of these predicted promoters correspond to cryptic initiation sites that are used in vivo, it is likely that most are false positives. This suggests that it is important to carefully reconsider the biological data that forms the basis of current algorithms, and we here present a review of data that may be useful in this regard. The review covers the following topics: (1) basal transcription and core promoters, (2) activated transcription and transcription factor binding sites, (3) CpG islands and DNA methylation, (4) chromosomal structure and nucleosome modification, and (5) chromosomal domains and domain boundaries. We discuss the possible lessons that may be learned, especially with respect to the wealth of information about epigenetic regulation of transcription that has been appearing in recent years.
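The abstract's claim that most predicted promoters are likely false positives follows from the two densities it quotes. A minimal sketch of that arithmetic is given below, assuming the best case in which every functional promoter is among the predictions; the function name `min_false_positive_fraction` is hypothetical and not taken from the paper.

```python
def min_false_positive_fraction(predictions_per_kb: float,
                                kb_per_true_promoter: float) -> float:
    """Lower bound on the share of predictions that are false positives.

    Assumes the best case: every true promoter is also predicted, so
    precision can be at most (true density) / (predicted density).
    """
    true_per_kb = 1.0 / kb_per_true_promoter
    best_case_precision = min(1.0, true_per_kb / predictions_per_kb)
    return 1.0 - best_case_precision


# Figures from the abstract: about one prediction per kilobase, and
# functional promoters spaced roughly 30-40 kilobases apart.
for spacing_kb in (30, 40):
    fraction = min_false_positive_fraction(1.0, spacing_kb)
    print(f"{spacing_kb} kb spacing: at least {fraction:.1%} false positives")
```

Even under this generous assumption, roughly 97% of the predictions would not correspond to a functional promoter. The abstract notes that some of these may be cryptic initiation sites used in vivo.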
Affiliation(s)
- A G Pedersen
- Department of Biotechnology, Technical University of Denmark, Lyngby, Denmark.
14
Wolfsberg TG, Madden TL. Sequence Similarity Searching Using the BLAST Family of Programs. ACTA ACUST UNITED AC 1999; Chapter 2:Unit 2.5. [DOI: 10.1002/0471140864.ps0205s15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Affiliation(s)
- Tyra G. Wolfsberg
- National Center for Biotechnology Information, National Library of Medicine, NIH Bethesda Maryland
- Thomas L. Madden
- National Center for Biotechnology Information, National Library of Medicine, NIH Bethesda Maryland
15
Sanchez C, Lachaize C, Janody F, Bellon B, Röder L, Euzenat J, Rechenmann F, Jacq B. Grasping at molecular interactions and genetic networks in Drosophila melanogaster using FlyNets, an Internet database. Nucleic Acids Res 1999; 27:89-94. [PMID: 9847149 PMCID: PMC148104 DOI: 10.1093/nar/27.1.89] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
FlyNets (http://gifts.univ-mrs.fr/FlyNets/FlyNets_home_page.html) is a WWW database describing molecular interactions (protein-DNA, protein-RNA and protein-protein) in the fly Drosophila melanogaster. It is composed of two parts, as follows. (i) FlyNets-base is a specialized database which focuses on molecular interactions involved in Drosophila development. The information content of FlyNets-base is distributed among several specific lines arranged according to a GenBank-like format and grouped into five thematic zones to improve human readability. The FlyNets database achieves a high level of integration with other databases such as FlyBase, EMBL, GenBank and SWISS-PROT through numerous hyperlinks. (ii) FlyNets-list is a very simple and more general databank, the long-term goal of which is to report on any published molecular interaction occurring in the fly, giving direct web access to corresponding abstracts in Medline and in FlyBase. In the context of genome projects, databases describing molecular interactions and genetic networks will provide a link at the functional level between the genome, the proteome and the transcriptome worlds of different organisms. Interaction databases therefore aim at describing the contents, structure, function and behaviour of what we herein define as the interactome world.
Affiliation(s)
- C Sanchez
- Laboratoire de Génétique et Physiologie du Développement, IBDM, Parc Scientifique de Luminy, CNRS Case 907, 13288 Marseille Cedex 09, France
16
Lefranc MP, Giudicelli V, Ginestoux C, Bodmer J, Müller W, Bontrop R, Lemaitre M, Malik A, Barbié V, Chaume D. IMGT, the international ImMunoGeneTics database. Nucleic Acids Res 1999; 27:209-12. [PMID: 9847182 PMCID: PMC148137 DOI: 10.1093/nar/27.1.209] [Citation(s) in RCA: 313] [Impact Index Per Article: 12.5] [Indexed: 11/14/2022]
Abstract
IMGT, the international ImMunoGeneTics database (http://imgt.cnusc.fr:8104), is a high-quality integrated database specialising in Immunoglobulins (Ig), T cell Receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species, created in 1989 by Marie-Paule Lefranc, Université Montpellier II, CNRS, Montpellier, France (lefranc@ligm.igh.cnrs.fr). IMGT comprises three databases: LIGM-DB, a comprehensive database of Ig and TcR, MHC/HLA-DB, and PRIMER-DB (the last two in development); a tool, IMGT/DNAPLOT, developed for sequence analysis and alignments; and expertised data based on the IMGT scientific chart, the IMGT repertoire. By its high quality and its easy data distribution, IMGT has important implications in medical research (repertoire in autoimmune diseases, AIDS, leukemias, lymphomas), therapeutic approaches (antibody engineering), genome diversity and genome evolution studies. IMGT is freely available at http://imgt.cnusc.fr:8104
Affiliation(s)
- M P Lefranc
- Laboratoire d'ImmunoGénétique Moléculaire, Université Montpellier II, UPR CNRS 1142 IGH, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France.
17
Benson DA, Boguski MS, Lipman DJ, Ostell J, Ouellette BF, Rapp BA, Wheeler DL. GenBank. Nucleic Acids Res 1999; 27:12-7. [PMID: 9847132 PMCID: PMC148087 DOI: 10.1093/nar/27.1.12] [Citation(s) in RCA: 436] [Impact Index Per Article: 17.4] [Indexed: 11/14/2022]
Abstract
The GenBank® sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (Web) or Sequin programs to format and send sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE® abstracts from published articles describing the sequences are included as an additional source of biological annotation through the PubMed search system. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, Email, and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the URL: http://www.ncbi.nlm.nih.gov
Affiliation(s)
- D A Benson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health,Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
18
Abstract
Since the complete sequence of Haemophilus influenzae Rd was obtained in 1995, the number of entirely sequenced bacterial genomes has steadily increased. A problem is that the quality of the annotations of these very large sequences is usually lower than that of the shorter entries encountered in the repository collections. Moreover, classical sequence database management systems have difficulties in handling entries of that size. In this context, we have decided to build the Enhanced Microbial Genomes Library (EMGLib) in which these two problems are alleviated. This library contains all the complete genomes from bacteria already sequenced and the yeast genome in GenBank format. The annotations are improved by the introduction of data on codon usage, gene orientation on the chromosome and gene families. It is possible to access EMGLib through two database systems set up on World Wide Web servers: the PBIL server at http://pbil.univ-lyon1.fr/emglib/emglib.html and the MICADO server at http://locus.jouy.inra.fr/micado
Affiliation(s)
- G Perrière
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard - Lyon 1, 43, boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
19
Ringwald M, Mangan ME, Eppig JT, Kadin JA, Richardson JE. GXD: a gene expression database for the laboratory mouse. The Gene Expression Database Group. Nucleic Acids Res 1999; 27:106-12. [PMID: 9847152 PMCID: PMC148107 DOI: 10.1093/nar/27.1.106] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
The Gene Expression Database (GXD) is a community resource that stores and integrates expression information for the laboratory mouse, with a particular emphasis on mouse development, and makes these data freely available in formats appropriate for comprehensive analysis. GXD is implemented as a relational database and integrated with the Mouse Genome Database (MGD) to enable global analysis of genotype, expression and phenotype information. Interconnections with sequence databases and with databases from other species further extend GXD's utility for the analysis of gene expression data. GXD is available through the Mouse Genome Informatics Web Site at http://www.informatics.jax.org/
Affiliation(s)
- M Ringwald
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA.
20
Mohr E, Horn F, Janody F, Sanchez C, Pillet V, Bellon B, Röder L, Jacq B. FlyNets and GIF-DB, two internet databases for molecular interactions in Drosophila melanogaster. Nucleic Acids Res 1998; 26:89-93. [PMID: 9399807 PMCID: PMC147170 DOI: 10.1093/nar/26.1.89] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 02/05/2023]
Abstract
GIF-DB and FlyNets are two WWW databases describing molecular (protein-DNA, protein-RNA and protein-protein) interactions occurring in the fly Drosophila melanogaster (http://gifts.univ-mrs.fr/GIFTS_home_page.html). GIF-DB is a specialised database which focuses on molecular interactions involved in the process of embryonic pattern formation, whereas FlyNets is a new and more general database, the long-term goal of which is to report on any published molecular interaction occurring in the fly. The information content of both databases is distributed in specific lines arranged into an EMBL- (or GenBank-) like format. These databases achieve a high level of integration with other databases such as FlyBase, EMBL, GenBank and SWISS-PROT through numerous hyperlinks. In addition, we also describe SOS-DGDB, a new collection of annotated Drosophila gene sequences, in which binding sites for regulatory proteins are directly visible on the DNA primary sequence and hyperlinked both to GIF-DB and TRANSFAC database entries.
Affiliation(s)
- E Mohr
- Laboratoire de Génétique et Physiologie du Développement, IBDM, Parc Scientifique de Luminy, CNRS Case 907, 13288 Marseille Cedex 09, France
21
Lefranc MP, Giudicelli V, Busin C, Bodmer J, Müller W, Bontrop R, Lemaitre M, Malik A, Chaume D. IMGT, the International ImMunoGeneTics database. Nucleic Acids Res 1998; 26:297-303. [PMID: 9399859 PMCID: PMC147225 DOI: 10.1093/nar/26.1.297] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.8] [Indexed: 02/05/2023]
Abstract
IMGT, the international ImMunoGeneTics database, is an integrated database specialising in Immunoglobulins (Ig), T cell Receptors (TcR) and Major Histocompatibility Complex (MHC) of all vertebrate species, created by Marie-Paule Lefranc, CNRS, Montpellier II University, Montpellier, France (lefranc@ligm.crbm.cnrs-mop.fr). IMGT includes three databases: LIGM-DB (for Ig and TcR), MHC/HLA-DB and PRIMER-DB (the last two in development). IMGT comprises expertly annotated sequences and alignment tables. LIGM-DB contains more than 23 000 Immunoglobulin and T cell Receptor sequences from 78 species. MHC/HLA-DB contains Class I and Class II Human Leucocyte Antigen alignment tables. An IMGT tool, DNAPLOT, developed for Ig, TcR and MHC sequence alignments, is also available. IMGT works in close collaboration with the EMBL database. IMGT goals are to establish a common data access to all immunogenetics data, including nucleotide and protein sequences, oligonucleotide primers, gene maps and other genetic data of Ig, TcR and MHC molecules, and to provide a graphical user friendly data access. IMGT has important implications in medical research (repertoire in autoimmune diseases, AIDS, leukemias, lymphomas), therapeutical approaches (antibody engineering), genome diversity and genome evolution studies. IMGT is freely available at http://imgt.cnusc.fr:8104
Affiliation(s)
- M P Lefranc
- Laboratoire d'ImmunoGénétique Moléculaire, LIGM, UMR 5535 (CNRS, Université Montpellier II), 1919 route de Mende, 34293 Montpellier Cedex 5, France. lefranc.ligm.crbm.cnrs-mop.fr
22
Perrière G, Gouy M, Gojobori T. The non-redundant Bacillus subtilis (NRSub) database: update 1998. Nucleic Acids Res 1998; 26:60-2. [PMID: 9399801 PMCID: PMC147236 DOI: 10.1093/nar/26.1.60] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/05/2023]
Abstract
The non-redundant Bacillus subtilis database (NRSub) has been developed in the context of the sequencing project devoted to this bacterium. As this project has reached completion, the whole genome is now available as a single contig. Thanks to the ACNUC database management system and its associated retrieval system Query_win, each functional region of the genome can be accessed individually. Extra annotations have been added such as accession numbers for the genes, locations on the genetic map, codon adaptation index values, as well as cross-references with other collections. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access NRSub through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).
Affiliation(s)
- G Perrière
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard, Lyon 1, 43 boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.