1
Maxson Jones K, Ankeny RA, Cook-Deegan R. The Bermuda Triangle: The Pragmatics, Policies, and Principles for Data Sharing in the History of the Human Genome Project. J Hist Biol 2018; 51:693-805. [PMID: 30390178 PMCID: PMC7307446 DOI: 10.1007/s10739-018-9538-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 05/24/2023]
Abstract
The Bermuda Principles for DNA sequence data sharing are an enduring legacy of the Human Genome Project (HGP). They were adopted by the HGP at a strategy meeting in Bermuda in February of 1996 and implemented in formal policies by early 1998, mandating daily release of HGP-funded DNA sequences into the public domain. The idea of daily sharing, we argue, emanated directly from strategies for large, goal-directed molecular biology projects first tested within the "community" of C. elegans researchers, and was introduced and defended for the HGP by the nematode biologists John Sulston and Robert Waterston. In the C. elegans community, and subsequently in the HGP, daily sharing served the pragmatic goals of quality control and project coordination. Yet in the HGP, we also argue, the Bermuda Principles addressed concerns about gene patents impeding scientific advancement, and were aspirational and flexible in implementation and justification. They endured as an archetype for how rapid data sharing could be realized and rationalized, and permitted adaptation to the needs of various scientific communities. Yet in addition to the support of Sulston and Waterston, their adoption also depended on the clout of administrators at the US National Institutes of Health (NIH) and the UK nonprofit charity the Wellcome Trust, which together funded 90% of the HGP human sequencing effort. The other nations wishing to remain in the HGP consortium had to accommodate to the Bermuda Principles, requiring exceptions from incompatible existing or pending data access policies for publicly funded research in Germany, Japan, and France. We begin this story in 1963, with the biologist Sydney Brenner's proposal for a nematode research program at the Laboratory of Molecular Biology (LMB) at the University of Cambridge. We continue through 2003, with the completion of the HGP human reference genome, and conclude with observations about policy and the historiography of molecular biology.
Affiliation(s)
- Kathryn Maxson Jones
  - Department of History, Princeton University, Princeton, NJ, USA.
  - MBL McDonnell Foundation Scholar, Marine Biological Laboratory, Woods Hole, MA, USA.
- Rachel A Ankeny
  - School of Humanities, The University of Adelaide, Adelaide, Australia
- Robert Cook-Deegan
  - School for the Future of Innovation in Society, Consortium for Science, Policy & Outcomes, Arizona State University, Barrett & O'Connor Washington Center, Washington, D.C., USA

2
Cook-Deegan R, Ankeny RA, Maxson Jones K. Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance. Annu Rev Genomics Hum Genet 2017; 18:389-415. [PMID: 28415857 PMCID: PMC5634517 DOI: 10.1146/annurev-genom-083115-022515] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/13/2022]
Abstract
The Human Genome Project modeled its open science ethos on nematode biology, most famously through daily release of DNA sequence data based on the 1996 Bermuda Principles. That open science philosophy persists, but daily, unfettered release of data has had to adapt to constraints occasioned by the use of data from individual people, broader use of data not only by scientists but also by clinicians and individuals, the global reach of genomic applications and diverse national privacy and research ethics laws, and the rising prominence of a diverse commercial genomics sector. The Global Alliance for Genomics and Health was established to enable the data sharing that is essential for making meaning of genomic variation. Data-sharing policies and practices will continue to evolve as researchers, health professionals, and individuals strive to construct a global medical and scientific information commons.
Affiliation(s)
- Robert Cook-Deegan
  - School for the Future of Innovation in Society, Arizona State University, Washington, DC 20009
- Rachel A Ankeny
  - School of Humanities, University of Adelaide, Adelaide, South Australia 5005, Australia
- Kathryn Maxson Jones
  - Program in History of Science, Department of History, Princeton University, Princeton, New Jersey 08544

3
Abstract
Genomics and human genetics are scientifically fundamental and commercially valuable. These fields grew to prominence in an era of growth in government and nonprofit research funding, and of even greater growth of privately funded research and development in biotechnology and pharmaceuticals. Patents on DNA technologies are a central feature of this story, illustrating how patent law adapts, and sometimes fails to adapt, to emerging genomic technologies. In instrumentation and for therapeutic proteins, patents have largely played their traditional role of inducing investment in engineering and product development, including expensive post-discovery clinical research to prove safety and efficacy. Patents on methods and DNA sequences relevant to clinical genetic testing show less evidence of benefits and more evidence of problems and impediments, largely attributable to university exclusive licensing practices. Whole-genome sequencing will confront uncertainty about infringing granted patents, but jurisprudence trends away from upholding the broadest and potentially most troublesome patent claims.
Affiliation(s)
- Robert Cook-Deegan
  - Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA.

4
Murray D, Doran P, MacMathuna P, Moss AC. In silico gene expression analysis--an overview. Mol Cancer 2007; 6:50. [PMID: 17683638 PMCID: PMC1964762 DOI: 10.1186/1476-4598-6-50] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 06/27/2007] [Accepted: 08/07/2007] [Indexed: 12/18/2022] [Open Access]
Abstract
Efforts aimed at deciphering the molecular basis of complex disease are underpinned by the availability of high throughput strategies for the identification of biomolecules that drive the disease process. The completion of the human genome-sequencing project, coupled to major technological developments, has afforded investigators myriad opportunities for multidimensional analysis of biological systems. Nowhere has this research explosion been more evident than in the field of transcriptomics. Affordable access and availability to the technology that supports such investigations has led to a significant increase in the amount of data generated. As most biological distinctions are now observed at a genomic level, a large amount of expression information is now openly available via public databases. Furthermore, numerous computational based methods have been developed to harness the power of these data. In this review we provide a brief overview of in silico methodologies for the analysis of differential gene expression such as Serial Analysis of Gene Expression and Digital Differential Display. The performance of these strategies, at both an operational and result/output level is assessed and compared. The key considerations that must be made when completing an in silico expression analysis are also presented as a roadmap to facilitate biologists. Furthermore, to highlight the importance of these in silico methodologies in contemporary biomedical research, examples of current studies using these approaches are discussed. The overriding goal of this review is to present the scientific community with a critical overview of these strategies, so that they can be effectively added to the tool box of biomedical researchers focused on identifying the molecular mechanisms of disease.
Affiliation(s)
- David Murray
  - General Clinical Research Unit, UCD School of Medicine and Medical Sciences, Mater Misericordiae University Hospital, Dublin 7, Ireland
- Peter Doran
  - General Clinical Research Unit, UCD School of Medicine and Medical Sciences, Mater Misericordiae University Hospital, Dublin 7, Ireland
- Padraic MacMathuna
  - Gastrointestinal Unit, Mater Misericordiae University Hospital, Dublin 7, Ireland
- Alan C Moss
  - Division of Gastroenterology, Beth Israel Deaconess Medical Center, 330 Brookline Ave, Boston, MA 02215, USA

5
Abstract
The "science commons," knowledge that is widely accessible at low or no cost, is a uniquely important input to scientific advance and cumulative technological innovation. It is primarily, although not exclusively, funded by government and nonprofit sources. Much of it is produced at academic research centers, although some academic science is proprietary and some privately funded R&D enters the science commons. Science in general aspires to Mertonian norms of openness, universality, objectivity, and critical inquiry. The science commons diverges from proprietary science primarily in being open and being very broadly available. These features make the science commons particularly valuable for advancing knowledge, for training innovators who will ultimately work in both public and private sectors, and in providing a common stock of knowledge upon which all players, both public and private, can draw readily. Open science plays two important roles that proprietary R&D cannot: it enables practical benefits even in the absence of profitable markets for goods and services, and it lays a shared foundation for subsequent private R&D. The history of genomics in the period 1992-2004, covering two periods when genomic startup firms attracted significant private R&D investment, illustrates these features of how a science commons contributes value. Commercial interest in genomics was intense during this period. Fierce competition between private sector and public sector genomics programs was highly visible. Seemingly anomalous behavior, such as private firms funding "open science," can be explained by unusual business dynamics between established firms wanting to preserve a robust science commons to prevent startup firms from limiting established firms' freedom to operate. Deliberate policies to create and protect a large science commons were pursued by nonprofit and government funders of genomics research, such as the Wellcome Trust and National Institutes of Health. These policies were crucial to keeping genomic data and research tools widely available at low cost.
Affiliation(s)
- Robert Cook-Deegan
  - Center for Genome Ethics, Law & Policy, Institute for Genome Sciences & Policy and Sanford Institute of Public Policy and Duke Medical School, Duke University, 242 North Building, Durham, NC 27708-0141 USA

6
Cook-Deegan R, Dedeurwaerdere T. The science commons in life science research: structure, function, and value of access to genetic diversity. Int Soc Sci J 2006; 58:299-317. [PMID: 32336774 PMCID: PMC7165960 DOI: 10.1111/j.1468-2451.2006.00620.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Innovation in the life sciences depends on how much information is produced as well as how widely and easily it is shared. Policies governing the science commons – or alternative, more restricted informational spaces – determine how widely and quickly information is distributed. The purpose of this paper is to highlight why the science commons matters and to analyse its structure and function. The main lesson from our analysis is that both the characteristics of the physical resources (from genes to microbes, plants and animals) and the norms and beliefs of the different research communities – think of the Bermuda rules in the human genome case or the Belem declaration for bioprospecting – matter in the institutional choices made when organising the science commons. We also show that the science commons contributes to solving some of the collective action dilemmas that arise in the production of knowledge in Pasteur's Quadrant, when information is both scientifically important and practically applicable. We show the importance of two of these dilemmas for the life sciences, which we call respectively the diffusion–innovation dilemma (how readily innovation diffuses) and the exploration–exploitation dilemma (when application requires collective action).

7
Gerhard DS, Wagner L, Feingold EA, Shenmen CM, Grouse LH, Schuler G, Klein SL, Old S, Rasooly R, Good P, Guyer M, Peck AM, Derge JG, Lipman D, Collins FS, Jang W, Sherry S, Feolo M, Misquitta L, Lee E, Rotmistrovsky K, Greenhut SF, Schaefer CF, Buetow K, Bonner TI, Haussler D, Kent J, Kiekhaus M, Furey T, Brent M, Prange C, Schreiber K, Shapiro N, Bhat NK, Hopkins RF, Hsie F, Driscoll T, Soares MB, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Piao Y, Dudekula DB, Ko MSH, Kawakami K, Suzuki Y, Sugano S, Gruber CE, Smith MR, Simmons B, Moore T, Waterman R, Johnson SL, Ruan Y, Wei CL, Mathavan S, Gunaratne PH, Wu J, Garcia AM, Hulyk SW, Fuh E, Yuan Y, Sneed A, Kowis C, Hodgson A, Muzny DM, McPherson J, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madari A, Young AC, Wetherby KD, Granite SJ, Kwong PN, Brinkley CP, Pearson RL, Bouffard GG, Blakesly RW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YSN, Griffith M, Griffith OL, Krzywinski MI, Liao N, Morin R, Morrin R, Palmquist D, Petrescu AS, Skalska U, Smailus DE, Stott JM, Schnerch A, Schein JE, Jones SJM, Holt RA, Baross A, Marra MA, Clifton S, Makowski KA, Bosak S, Malek J. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res 2004; 14:2121-7. [PMID: 15489334 PMCID: PMC528928 DOI: 10.1101/gr.2596504] [Citation(s) in RCA: 405] [Impact Index Per Article: 20.3] [Indexed: 12/14/2022]
Abstract
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.

8
Boguski MS, Jones AR. Neurogenomics: at the intersection of neurobiology and genome sciences. Nat Neurosci 2004; 7:429-33. [PMID: 15114353 DOI: 10.1038/nn1232] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.2] [Indexed: 01/06/2023]
Abstract
Neurogenomics is the study of how the genome as a whole contributes to the evolution, development, structure and function of the nervous system. It includes investigations of how genome products (transcriptomes and proteomes) vary in time and space. Neurogenomics differs markedly from the application of genome sciences to other systems, particularly in the spatial category, because anatomy and connectivity are paramount to our understanding of function in the nervous system. We focus here on some of the influences of genomics and its associated technologies on neuroscience. We discuss comparative genomics, gene expression atlases of the brain, network genetics and applications to behavioral phenotypes, and consider the culture, organization and funding of genome-scale projects.
Affiliation(s)
- Mark S Boguski
  - Allen Institute for Brain Science, 551 N. 34th Street, Seattle, Washington 98103, USA.
- Allan R Jones
  - Allen Institute for Brain Science, 551 N. 34th Street, Seattle, Washington 98103, USA

9
Sogayar MC, Camargo AA, Bettoni F, Carraro DM, Pires LC, Parmigiani RB, Ferreira EN, de Sá Moreira E, do Rosário D de O Latorre M, Simpson AJG, Cruz LO, Degaki TL, Festa F, Massirer KB, Sogayar MC, Filho FC, Camargo LP, Cunha MAV, De Souza SJ, Faria M, Giuliatti S, Kopp L, de Oliveira PSL, Paiva PB, Pereira AA, Pinheiro DG, Puga RD, S de Souza JE, Albuquerque DM, Andrade LEC, Baia GS, Briones MRS, Cavaleiro-Luna AMS, Cerutti JM, Costa FF, Costanzi-Strauss E, Espreafico EM, Ferrasi AC, Ferro ES, Fortes MAHZ, Furchi JRF, Giannella-Neto D, Goldman GH, Goldman MHS, Gruber A, Guimarães GS, Hackel C, Henrique-Silva F, Kimura ET, Leoni SG, Macedo C, Malnic B, Manzini B CV, Marie SKN, Martinez-Rossi NM, Menossi M, Miracca EC, Nagai MA, Nobrega FG, Nobrega MP, Oba-Shinjo SM, Oliveira MK, Orabona GM, Otsuka AY, Paço-Larson ML, Paixão BMC, Pandolfi JRC, Pardini MIMC, Passos Bueno MR, Passos GAS, Pesquero JB, Pessoa JG, Rahal P, Rainho CA, Reis CP, Ricca TI, Rodrigues V, Rogatto SR, Romano CM, Romeiro JG, Rossi A, Sá RG, Sales MM, Sant'Anna SC, Santarosa PL, Segato F, Silva WA, Silva IDCG, Silva NP, Soares-Costa A, Sonati MF, Strauss BE, Tajara EH, Valentini SR, Villanova FE, Ward LS, Zanette DL. A transcript finishing initiative for closing gaps in the human transcriptome. Genome Res 2004; 14:1413-23. [PMID: 15197164 PMCID: PMC442158 DOI: 10.1101/gr.2111304] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Received: 10/23/2003] [Accepted: 03/12/2004] [Indexed: 11/24/2022]
Abstract
We report the results of a transcript finishing initiative, undertaken for the purpose of identifying and characterizing novel human transcripts, in which RT-PCR was used to bridge gaps between paired EST clusters, mapped against the genomic sequence. Each pair of EST clusters selected for experimental validation was designated a transcript finishing unit (TFU). A total of 489 TFUs were selected for validation, and an overall efficiency of 43.1% was achieved. We generated a total of 59,975 bp of transcribed sequences organized into 432 exons, contributing to the definition of the structure of 211 human transcripts. The structure of several transcripts reported here was confirmed during the course of this project, through the generation of their corresponding full-length cDNA sequences. Nevertheless, for 21% of the validated TFUs, a full-length cDNA sequence is not yet available in public databases, and the structure of 69.2% of these TFUs was not correctly predicted by computer programs. The TF strategy provides a significant contribution to the definition of the complete catalog of human genes and transcripts, because it appears to be particularly useful for identification of low abundance transcripts expressed in a restricted set of tissues as well as for the delineation of gene boundaries and alternatively spliced isoforms.

10
Clark HF, Gurney AL, Abaya E, Baker K, Baldwin D, Brush J, Chen J, Chow B, Chui C, Crowley C, Currell B, Deuel B, Dowd P, Eaton D, Foster J, Grimaldi C, Gu Q, Hass PE, Heldens S, Huang A, Kim HS, Klimowski L, Jin Y, Johnson S, Lee J, Lewis L, Liao D, Mark M, Robbie E, Sanchez C, Schoenfeld J, Seshagiri S, Simmons L, Singh J, Smith V, Stinson J, Vagts A, Vandlen R, Watanabe C, Wieand D, Woods K, Xie MH, Yansura D, Yi S, Yu G, Yuan J, Zhang M, Zhang Z, Goddard A, Wood WI, Godowski P, Gray A. The secreted protein discovery initiative (SPDI), a large-scale effort to identify novel human secreted and transmembrane proteins: a bioinformatics assessment. Genome Res 2003; 13:2265-70. [PMID: 12975309 PMCID: PMC403697 DOI: 10.1101/gr.1293003] [Citation(s) in RCA: 259] [Impact Index Per Article: 12.3] [Received: 02/23/2003] [Accepted: 07/28/2003] [Indexed: 11/24/2022]
Abstract
A large-scale effort, termed the Secreted Protein Discovery Initiative (SPDI), was undertaken to identify novel secreted and transmembrane proteins. In the first of several approaches, a biological signal sequence trap in yeast cells was utilized to identify cDNA clones encoding putative secreted proteins. A second strategy utilized various algorithms that recognize features such as the hydrophobic properties of signal sequences to identify putative proteins encoded by expressed sequence tags (ESTs) from human cDNA libraries. A third approach surveyed ESTs for protein sequence similarity to a set of known receptors and their ligands with the BLAST algorithm. Finally, both signal-sequence prediction algorithms and BLAST were used to identify single exons of potential genes from within human genomic sequence. The isolation of full-length cDNA clones for each of these candidate genes resulted in the identification of >1000 novel proteins. A total of 256 of these cDNAs are still novel, including variants and novel genes, per the most recent GenBank release version. The success of this large-scale effort was assessed by a bioinformatics analysis of the proteins through predictions of protein domains, subcellular localizations, and possible functional roles. The SPDI collection should facilitate efforts to better understand intercellular communication, may lead to new understandings of human diseases, and provides potential opportunities for the development of therapeutics.
Affiliation(s)
- Hilary F Clark
  - Departments of Bioinformatics, Molecular Biology and Protein Chemistry, Genentech, Inc, South San Francisco, California 94080, USA.

11
Cerutti JM, Riggins GJ, de Souza SJ. What can digital transcript profiling reveal about human cancers? Braz J Med Biol Res 2003; 36:975-85. [PMID: 12886451 DOI: 10.1590/s0100-879x2003000800003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022] [Open Access]
Abstract
Important biological and clinical features of malignancy are reflected in its transcript pattern. Recent advances in gene expression technology and informatics have provided a powerful new means to obtain and interpret these expression patterns. A comprehensive approach to expression profiling is serial analysis of gene expression (SAGE), which provides digital information on transcript levels. SAGE works by counting transcripts and storing these digital values electronically, providing absolute gene expression levels that make historical comparisons possible. SAGE produces a comprehensive profile of gene expression and can be used to search for candidate tumor markers or antigens in a limited number of samples. The Cancer Genome Anatomy Project has created a SAGE database of human gene expression levels for many different tumors and normal reference tissues and provides online tools for viewing, comparing, and downloading expression profiles. Digital expression profiling using SAGE and informatics have been useful for identifying genes that have a role in tumor invasion and other aspects of tumor progression.
Affiliation(s)
- J M Cerutti
  - Laboratório de Endocrinologia Molecular, Divisão de Endocrinologia, Departamento de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brasil
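The abstract of this entry describes SAGE as counting transcripts and storing the counts as digital values that can be compared across libraries. A minimal sketch of that counting idea follows, assuming each library is simply a list of short tag strings. The tag sequences, the function names `count_tags` and `tags_per_million`, and the per-million normalization are illustrative assumptions, not taken from the cited paper or from any CGAP tool.

```python
from collections import Counter


def count_tags(tags):
    """Count how many times each tag occurs in one library."""
    return Counter(tags)


def tags_per_million(counts):
    """Scale raw counts to tags per million so libraries of
    different sizes can be compared (an assumed normalization)."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


# Hypothetical library of four 10-bp tags (made-up sequences).
library = ["AAAAAAAAAA", "AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]

counts = count_tags(library)
normalized = tags_per_million(counts)
```

Because the counts are plain integers keyed by tag, a profile stored this way could in principle be compared with one produced later from another sample, which is the property the abstract refers to as making historical comparisons possible.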

12
Weeraratna AT. Serial analysis of gene expression (SAGE): advances, analysis and applications to pigment cell research. Pigment Cell Res 2003; 16:183-9. [PMID: 12753384 DOI: 10.1034/j.1600-0749.2003.00042.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
As cells progress from normal to diseased states, they may undergo a series of gene expression changes. Advances in molecular biology allow us to examine a host of these changes at once, in a high throughput fashion. Serial analysis of gene expression (SAGE) allows for the expression profiling of the complete transcriptome of a given cell, and has the potential for identifying novel genes as well as those in low abundance. In this review, we will outline the technique, how one analyzes the massive amounts of data generated, and describe pigment cell libraries currently in the making.
Affiliation(s)
- Ashani T Weeraratna
  - Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.

13

14
Jackson DB, Minch E, Munro RE. Bioinformatics. EXS 2003:31-69. [PMID: 12613171 DOI: 10.1007/978-3-0348-7997-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]

15
Abstract
The advent of whole-genome data resources--not only sequence but also other genome-scale data collections such as gene expression, protein interaction, and genetic variation--is having two marked, complementary effects on the relatively new discipline of bioinformatics. First, the veritable flood of data is creating a need and demand for new tools for dealing adequately with the deluge, and, second, the unprecedented extent, diversity, and impending completeness of the data sets are creating opportunities for new approaches to discovery based on computational methods.
Affiliation(s)
- D B Searls
  - Bioinformatics Department, SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania 19406, USA.

16
Schmid EF, James K, Smith DA. The Impact of Technological Advances on Drug Discovery Today. Drug Inf J 2001. [DOI: 10.1177/009286150103500105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]

17
Affiliation(s)
- R L Strausberg
  - Cancer Genomics Office, National Cancer Institute, Bethesda, MD 20892, USA.

18
Camargo AA, Samaia HP, Dias-Neto E, Simão DF, Migotto IA, Briones MR, Costa FF, Nagai MA, Verjovski-Almeida S, Zago MA, Andrade LE, Carrer H, El-Dorry HF, Espreafico EM, Habr-Gama A, Giannella-Neto D, Goldman GH, Gruber A, Hackel C, Kimura ET, Maciel RM, Marie SK, Martins EA, Nobrega MP, Paco-Larson ML, Pardini MI, Pereira GG, Pesquero JB, Rodrigues V, Rogatto SR, da Silva ID, Sogayar MC, Sonati MF, Tajara EH, Valentini SR, Alberto FL, Amaral ME, Aneas I, Arnaldi LA, de Assis AM, Bengtson MH, Bergamo NA, Bombonato V, de Camargo ME, Canevari RA, Carraro DM, Cerutti JM, Correa ML, Correa RF, Costa MC, Curcio C, Hokama PO, Ferreira AJ, Furuzawa GK, Gushiken T, Ho PL, Kimura E, Krieger JE, Leite LC, Majumder P, Marins M, Marques ER, Melo AS, Melo MB, Mestriner CA, Miracca EC, Miranda DC, Nascimento AL, Nobrega FG, Ojopi EP, Pandolfi JR, Pessoa LG, Prevedel AC, Rahal P, Rainho CA, Reis EM, Ribeiro ML, da Ros N, de Sa RG, Sales MM, Sant'anna SC, dos Santos ML, da Silva AM, da Silva NP, Silva WA, da Silveira RA, Sousa JF, Stecconi D, Tsukumo F, Valente V, Soares F, Moreira ES, Nunes DN, Correa RG, Zalcberg H, Carvalho AF, Reis LF, Brentani RR, Simpson AJ, de Souza SJ, Melo M. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome. Proc Natl Acad Sci U S A 2001; 98:12103-8. [PMID: 11593022 PMCID: PMC59775 DOI: 10.1073/pnas.201182798] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022] [Open Access]
Abstract
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Affiliation(s)
- A A Camargo
  - Ludwig Institute for Cancer Research, 01509-010, São Paulo, Brazil

19
Abstract
The Cancer Genome Anatomy Project (CGAP) has built informational, technological, and physical resources to interface genomics with basic and clinical cancer research. The CGAP web site (http://cgap.nci.nih.gov) provides informatics tools for in silico analysis of the CGAP datasets as well as information for accessing each of the CGAP resources. Published in 2001 by John Wiley & Sons, Ltd.

20
Abstract
The recent release of the draft sequence and the eventual completion of the human genome present the scientific community with a rich source of data to mine. Yet, these data are content poor in the absence of additional correlative information. Expressed sequence tag (EST) datasets and their associated gene indices have existed for many years, and represent the first attempt at understanding the complexity of the genome. These datasets remain extremely important as information sources and, in particular, as tools for analyzing the completed genomes. Here, we discuss the nature of ESTs and their associated tools and gene-indexing databases. In particular, we will compare three EST gene indices (UNIGENE, Merck Gene Index Version 2.0 and Doubletwist CAT), discuss how these gene indices are applied for both genome analysis and drug discovery, and demonstrate their importance as a complementary dataset to the annotated human genome.
Affiliation(s)
- J Yuan
- Department of Bioinformatics, Merck & Co., Inc., P.O. Box 2000-RY80-A1, Rahway, NJ 07065, USA.
|
21
|
Abstract
The year 2000 stands as a landmark in modern biology: the first draft of the human genome sequence has been completed. For the pharmaceutical industry, this achievement provides tremendous opportunities because the genomic sequence exposes all human drug targets for therapeutic intervention. The challenge for the pharmaceutical companies is to exploit this definitive resource for the identification of potential molecular targets, rapid characterization of their function and validation of their involvement in disease pathology. Bioinformatics approaches provide increasingly crucial tools to systematically support this exploratory target drug discovery activity.
Affiliation(s)
- P Sanseau
- Target Bioinformatics, Glaxo SmithKline, Gunnels Wood Road, SG1 2NY, Stevenage, UK
|
22
|
Kawamoto S, Yoshii J, Mizuno K, Ito K, Miyamoto Y, Ohnishi T, Matoba R, Hori N, Matsumoto Y, Okumura T, Nakao Y, Yoshii H, Arimoto J, Ohashi H, Nakanishi H, Ohno I, Hashimoto J, Shimizu K, Maeda K, Kuriyama H, Nishida K, Shimizu-Matsumoto A, Adachi W, Ito R, Kawasaki S, Chae KS. BodyMap: a collection of 3' ESTs for analysis of human gene expression information. Genome Res 2000; 10:1817-27. [PMID: 11076866 PMCID: PMC310944 DOI: 10.1101/gr.151500] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
BodyMap is a collection of site-directed 3' expressed sequence tags (ESTs) (gene signatures, GSs) that contains the transcript compositions of various human tissues and was the first systematic effort to acquire gene expression data. For the construction of BodyMap, cDNA libraries were made, preserving abundance information and histologic resolutions of tissue mRNAs. By sequencing 164,000 randomly selected clones, 88,587 GSs that represent chromosomally coded transcripts have been collected from 51 human organs and tissues. They were clustered into 18,722 independent 3' termini from transcripts, and more than 3000 of these were not found among ESTs assembled in UniGene (Build 75). Assessment of the prevalence of polyadenylation signals and comparison with GenBank cDNAs indicated that there was no significant contamination by internally primed cDNAs or genomic fragments but that there was a relatively high incidence (12%) of alternative polyadenylation sites. We evaluated the sensitivity and resolution of expression information in BodyMap by in silico Northern hybridization and selection of tissue-specific gene probes. BodyMap is a unique resource for estimation of the absolute abundance of transcripts and selection of gene probes for efficient hybridization-based gene expression profiling.
Affiliation(s)
- S Kawamoto
- Institute for Molecular and Cellular Biology, Osaka University, Osaka 565-0871, Japan
|
23
|
Dempsey AA, Ton C, Liew CC. A cardiovascular EST repertoire: progress and promise for understanding cardiovascular disease. MOLECULAR MEDICINE TODAY 2000; 6:231-7. [PMID: 10840381 DOI: 10.1016/s1357-4310(00)01727-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
The application of expressed sequence tag (EST) technology has proven to be an effective tool for gene discovery and the generation of gene expression profiles. The generation of an EST resource for the cardiovascular system has revealed significant insights into the changes in gene expression that guide heart development and disease. Furthermore, an important genetic resource has been developed for cardiovascular biology that is valuable for data mining and disease gene discovery.
Affiliation(s)
- A A Dempsey
- The Cardiovascular Genome Unit, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
|
24
|
Abstract
Bioinformatics has, out of necessity, become a key aspect of drug discovery in the genomic revolution, contributing to both target discovery and target validation. The author describes the role that bioinformatics has played and will continue to play in response to the waves of genome-wide data sources that have become available to the industry, including expressed sequence tags, microbial genome sequences, model organism sequences, polymorphisms, gene expression data and proteomics. However, these knowledge sources must be intelligently integrated.
|
25
|
Strausberg RL, Buetow KH, Emmert-Buck MR, Klausner RD. The cancer genome anatomy project: building an annotated gene index. Trends Genet 2000; 16:103-6. [PMID: 10689348 DOI: 10.1016/s0168-9525(99)01937-x] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
|
26
|
Abstract
The Mammalian Gene Collection (MGC) project is a new effort by the NIH to generate full-length complementary DNA (cDNA) resources. This project will provide publicly accessible resources to the full research community. The MGC project entails the production of libraries, sequencing, and database and repository development, as well as the support of library construction, sequencing, and analytic technologies dedicated to the goal of obtaining a full set of human and other mammalian full-length (open reading frame) sequences and clones of expressed genes.
Affiliation(s)
- R L Strausberg
- National Cancer Institute, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
|