1
|
The 2024 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2024; 52:D1-D9. [PMID: 38035367 PMCID: PMC10767945 DOI: 10.1093/nar/gkad1173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 12/02/2023]
Abstract
The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven. Nucleic acid databases include the new NAKB for structural information and updates from Genbank, ENA, GEO, Tarbase and JASPAR. The Issue's Breakthrough Article concerns NMPFamsDB for novel prokaryotic protein families and the AlphaFold Protein Structure Database has an important update. Metabolism is covered by updates from Reactome, Wikipathways and Metabolights. Microbes are covered by RefSeq, UNITE, SPIRE and P10K; viruses by ViralZone and PhageScope. Medically-oriented databases include the familiar COSMIC, Drugbank and TTD. Genomics-related resources include Ensembl, UCSC Genome Browser and Monarch. New arrivals cover plant imaging (OPIA and PlantPAD) and crop plants (SoyMD, TCOD and CropGS-Hub). The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Over the last year the NAR online Molecular Biology Database Collection has been updated, reviewing 1060 entries, adding 97 new resources and eliminating 388 discontinued URLs, bringing the current total to 1959 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
|
2
|
RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes. Nucleic Acids Res 2024; 52:D762-D769. [PMID: 37962425 PMCID: PMC10767926 DOI: 10.1093/nar/gkad988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/13/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023]
Abstract
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains over 315 000 bacterial and archaeal genomes and 236 million proteins with up-to-date and consistent annotation. In the past 3 years, we have expanded the diversity of the RefSeq collection by including the best quality metagenome-assembled genomes (MAGs) submitted to INSDC (DDBJ, ENA and GenBank), while maintaining its quality by adding validation checks. Assemblies are now more stringently evaluated for contamination and for completeness of annotation prior to acceptance into RefSeq. MAGs now account for over 17 000 assemblies in RefSeq, split over 165 orders and 362 families. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP), which is used to annotate nearly all RefSeq assemblies, include better detection of protein-coding genes. Nearly 83% of RefSeq proteins are now named by a curated Protein Family Model, a 4.7% increase over the past three years. In addition to literature citations, Enzyme Commission numbers, and gene symbols, Gene Ontology terms are now assigned to 48% of RefSeq proteins, allowing for easier multi-genome comparison. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/. PGAP is available as a stand-alone tool able to produce GenBank-ready files at https://github.com/ncbi/pgap.
|
3
|
Abstract
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 420 000 formally described species. Most GenBank submissions are made using BankIt, the NCBI Submission Portal, or the tool tbl2asn, and are obtained from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include an expansion of sequence identifier formats to accommodate expected database growth, submission wizards for ribosomal RNA, and the transfer of Expressed Sequence Tag (EST) and Genome Survey Sequence (GSS) data into the Nucleotide database.
|
4
|
The 2018 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2018; 46:D1-D7. [PMID: 29316735 PMCID: PMC5753253 DOI: 10.1093/nar/gkx1235] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 11/28/2017] [Accepted: 11/29/2017] [Indexed: 12/20/2022]
Abstract
The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
|
5
|
The UNITE database for molecular identification of fungi – recent updates and future perspectives. The New Phytologist 2010; 186:281-5. [PMID: 20409185 DOI: 10.1111/j.1469-8137.2009.03160.x] [Citation(s) in RCA: 972] [Impact Index Per Article: 69.4] [Indexed: 05/20/2023]
|
6
|
|
7
|
[International collaboration among DDBJ, EMBL Bank and GenBank]. Tanpakushitsu Kakusan Koso. Protein, Nucleic Acid, Enzyme 2008; 53:182-189. [PMID: 18240597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
|
8
|
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data continued to grow exponentially. Access to the data is provided via SRS, ftp and a variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk/. Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.
|
9
|
From hype to mothballs in four years: troubles in the development of large-scale DNA biobanks in Europe. Public Health Genomics 2006; 9:184-9. [PMID: 16741348 DOI: 10.1159/000092655] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
This paper analyses the difficulties experienced by three large European DNA biobanks. The first, Icelandic-based deCode, generated immense commercial interest and intense ethical controversy. As a biotechnology company, deCode succeeded, but the Icelandic Health Sector Data Base failed. The second firm, Swedish UmanGenomics, marketed itself as the 'ethical' biotech company. Management problems including the inadequate recognition of intellectual property issues led to the company failing to secure adequate investment. The third and largest, UK Biobank, has, as a non-profit organization, not experienced these problems. But when the product, bio information, is marketed, the issue of ethically acceptable purchasers could well become contentious.
|
10
|
|
11
|
Pointing the way. Int J Immunogenet 2006; 33:151. [PMID: 16712642 DOI: 10.1111/j.1744-313x.2006.00601.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
12
|
Abstract
OBJECT In this report the authors review the role of bioinformatics in the design of a research project in which the molecular genetics of malignant glioma were studied. A project to characterize Pokemon expression in malignant glioma was developed, refined, and implemented using bioinformatics methods. METHODS Using the resources available from the National Center for Biotechnology Information, the messenger RNA (mRNA) sequence for Pokemon was determined. With this information and online primer design tools, novel primers were designed that would specifically amplify Pokemon mRNA by using reverse transcription-polymerase chain reaction assays. CONCLUSIONS The promise of bioinformatics is in the rapid and widespread dissemination and analysis of genomic information. This information is then used in research investigating the genetic basis of disease. In this paper the authors review the bioinformatics methods used in their study of Pokemon expression in malignant glioma.
|
13
|
Nucleic acid sequence data turns 100,000,000,000 and looks to the future. J Clin Invest 2005; 115:2588. [PMID: 16200189 PMCID: PMC1236709 DOI: 10.1172/jci26755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
14
|
Abstract
In the past year, we at DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp) collected and released 1 066 084 entries or 718 072 425 bases including the whole chromosome 22 of chimpanzee, the whole-genome shotgun sequences of silkworm and various others. On the other hand, we hosted workshops for human full-length cDNA annotation and participated in jamborees of mouse full-length cDNA annotation. The annotated data are made public at DDBJ. We are also in collaboration with a RIKEN team to accept and release the CAGE (Cap Analysis Gene Expression) data under a new category, MGA (Mass Sequences for Genome Annotation). The data will be useful for studying gene expression control in many aspects.
|
15
|
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.
|
16
|
Abstract
Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focuses on molecular functions of HIV-infected T-cells and HIV epitope information, using textmining and interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through textmining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.
|
17
|
Genomic resources for ascidians: sequence/expression databases and genome projects. Methods Cell Biol 2005; 74:759-74. [PMID: 15575630 DOI: 10.1016/s0091-679x(04)74031-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
|
18
|
Abstract
BACKGROUND The proliferation of structural and functional studies of RNA has revealed an increasing range of RNA's structural repertoire. Toward the objective of systematic cataloguing of RNA's structural repertoire, we have recently described the basis of a graphical approach for organizing RNA secondary structures, including existing and hypothetical motifs. DESCRIPTION We now present an RNA motif database based on graph theory, termed RAG for RNA-As-Graphs, to catalogue and rank all theoretically possible, including existing, candidate and hypothetical, RNA secondary motifs. The candidate motifs are predicted using a clustering algorithm that classifies RNA graphs into RNA-like and non-RNA groups. All RNA motifs are filed according to their graph vertex number (RNA length) and ranked by topological complexity. CONCLUSIONS RAG's quantitative cataloguing allows facile retrieval of all classes of RNA secondary motifs, assists identification of structural and functional properties of user-supplied RNA sequences, and helps stimulate the search for novel RNAs based on predicted candidate motifs.
|
19
|
A plea for “Omics” research in complex diseases such as multiple sclerosis – a change of mind is needed. J Neurol Sci 2004; 222:3-5. [PMID: 15240188 DOI: 10.1016/j.jns.2004.02.013] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 02/24/2004] [Accepted: 02/24/2004] [Indexed: 11/26/2022]
|
20
|
Carpe diem. Retooling the publish or perish model into the share and survive model. Plant Physiology 2004; 134:543-7. [PMID: 14966244 PMCID: PMC523887 DOI: 10.1104/pp.103.035907] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 11/06/2003] [Revised: 11/14/2003] [Accepted: 11/14/2003] [Indexed: 05/18/2023]
|
21
|
Abstract
The advent of structural genomics presents new challenges to the archive of biomacromolecular structures, the Protein Data Bank (PDB). As technologies involved in structure determination have advanced, both the number and size of structures available in the PDB have increased rapidly. The structural genomics initiatives are creating a large amount of data that needs to be tracked, archived, and made easily available. The PDB has developed tools to facilitate the rapid deposition of data produced by the structural genomics initiatives and has created databases to track the progress of the work.
|
22
|
|
23
|
Abstract
BACKGROUND The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
|
24
|
MatGAT: an application that generates similarity/identity matrices using protein or DNA sequences. BMC Bioinformatics 2003; 4:29. [PMID: 12854978 PMCID: PMC166169 DOI: 10.1186/1471-2105-4-29] [Citation(s) in RCA: 649] [Impact Index Per Article: 30.9] [Received: 05/09/2003] [Accepted: 07/10/2003] [Indexed: 12/03/2022]
Abstract
BACKGROUND The rapid increase in the amount of protein and DNA sequence information available has become almost overwhelming to researchers. So much information is now accessible that high-quality, functional gene analysis and categorization has become a major goal for many laboratories. To aid in this categorization, there is a need for non-commercial software that is able to both align sequences and also calculate pairwise levels of similarity/identity. RESULTS We have developed MatGAT (Matrix Global Alignment Tool), a simple, easy to use computer application that generates similarity/identity matrices for DNA or protein sequences without needing pre-alignment of the data. CONCLUSIONS The advantages of this program over other software are that it is open-source freeware, can analyze a large number of sequences simultaneously, can visualize both sequence alignment and similarity/identity values concurrently, employs global alignment in calculations, and has been formatted to run under both the Unix and the Microsoft Windows Operating Systems. We are presently completing the Macintosh-based version of the program.
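The idea of building an identity matrix from pairwise global alignments can be illustrated with a minimal sketch. This is not MatGAT's code: it swaps in a plain Needleman-Wunsch alignment with fixed match, mismatch and gap scores (all assumed values), and it assumes identity is counted as identical positions divided by alignment length. The function names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length (assumed definition)."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 100.0  # two empty sequences are treated as identical
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def identity_matrix(seqs):
    """Symmetric matrix of pairwise percent identities for a list of sequences."""
    k = len(seqs)
    matrix = [[100.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = percent_identity(seqs[i], seqs[j])
    return matrix
```

For example, `identity_matrix(["ACGT", "AGT"])` aligns the pair as `ACGT` / `A-GT` under these scores. A real tool would typically use a substitution matrix and affine gap penalties, and would report similarity as a separate measure.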
|
25
|
The nucleic acid database. Methods of Biochemical Analysis 2003; 44:199-216. [PMID: 12688301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
26
|
Other structure-based databases. Methods of Biochemical Analysis 2003; 44:217-36. [PMID: 12647388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
27
|
Abstract
Gene-mapping studies that look for complex traits among human populations have deepened our understanding of disease causes, but do they hold promise for identifying drug targets?
|
28
|
SeqVISTA: a graphical tool for sequence feature visualization and comparison. BMC Bioinformatics 2003; 4:1. [PMID: 12513700 PMCID: PMC140037 DOI: 10.1186/1471-2105-4-1] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Received: 09/15/2002] [Accepted: 01/04/2003] [Indexed: 11/12/2022]
Abstract
BACKGROUND Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite strand? Subsequently, your favorite sequence analysis software informs you that there is an interesting feature at position 13982-14013. By painstakingly counting the 10 bp blocks, you are able to examine the sequence at this location. But now you want to see what other features have been annotated close by, and this information is buried several screenfuls higher up the web page. RESULTS SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet. Our viewer may be launched on a GenBank record by a single click of a button installed in the web browser. CONCLUSION SeqVISTA allows insights to be gained by viewing the totality of sequence annotations and predictions, which may be more revealing than the sum of their parts. SeqVISTA runs on any operating system with a Java 1.4 virtual machine. It is freely available to academic users at http://zlab.bu.edu/SeqVISTA.
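The motif-search problem described in the BACKGROUND (spaces every 10 bp, and motifs on the opposite strand) can be sketched in a few lines. This is an illustration only, not SeqVISTA's implementation; the function names are hypothetical, and the sketch assumes a plain DNA alphabet (A, C, G, T) with no ambiguity codes.

```python
import re

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def clean_sequence(raw: str) -> str:
    """Drop whitespace and position numbers from a flat-file style sequence block."""
    return re.sub(r"[\s\d]+", "", raw).upper()


def find_motif(raw: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, strand) for every motif hit on either strand.

    Positions always refer to the forward strand of the cleaned sequence.
    """
    seq = clean_sequence(raw)
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(set(hits))
```

Given a block such as `"1 gatcctccat atacaacggt\n21 atctccacct"`, `find_motif(block, "catata")` finds the hit that spans the gap between the first two 10 bp groups, which a browser's "find in page" would miss.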
|
29
|
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
|
30
|
|
31
|
|
32
|
Exploiting big biology: integrating large-scale biological data for function inference. Brief Bioinform 2001; 2:363-74. [PMID: 11808748 DOI: 10.1093/bib/2.4.363] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
The amount of data produced by molecular biologists is growing at an exponential rate. Some of the fastest growing sets of data are measurements of gene expression, comparable in quantity only to gene sequences and the vast biological literature. Both gene expression data and sequence data offer hints as to the functions of thousands of newly discovered genes, but neither gives complete answers. Therefore, much effort is being focused on integrating these large data sets and combining them with all available functional data to draw inferences about the functions of uncharacterised genes. This review discusses the most pertinent functional data for genome-wide functional inference and describes several methods by which these disparate data types are being integrated.
|
33
|
Risking ethical insolvency: a survey of trends in criminal DNA databanking. The Journal of Law, Medicine & Ethics: A Journal of the American Society of Law, Medicine & Ethics 2000; 28:209-223. [PMID: 11210371 DOI: 10.1111/j.1748-720x.2000.tb00661.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/23/2023]
Abstract
Over ten years have elapsed since Virginia passed the nation's first criminal DNA banking law, which authorized law enforcement authorities to collect DNA samples from certain categories of offenders for the purposes of performing profile analysis. Within nine years, Rhode Island became the fiftieth state to enact a similar statute. The passage of a decade since the first enactment provides a convenient opportunity to assess the strengths and weaknesses of ethical safeguards under present law as well as predict the likely direction of future developments. DNA forensics are merely the latest in a long line of biologically based identifying law enforcement technologies that include fingerprints and serotyping. Nevertheless, DNA has properties that make it significantly different than its predecessors with respect to the ethical and social concerns it raises.
|