1
Kurylo C, Guyomar C, Foissac S, Djebali S. TAGADA: a scalable pipeline to improve genome annotations with RNA-seq data. NAR Genom Bioinform 2023; 5:lqad089. [PMID: 37850035 PMCID: PMC10578202 DOI: 10.1093/nargab/lqad089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2023] [Revised: 08/11/2023] [Accepted: 09/19/2023] [Indexed: 10/19/2023]
Abstract
Genome annotation plays a crucial role in providing a comprehensive catalog of genes and transcripts for a particular species. As research projects generate new transcriptome data worldwide, integrating this information into existing annotations becomes essential. However, most bioinformatics pipelines are limited in their ability to effectively and consistently update annotations using new RNA-seq data. Here we introduce TAGADA, an RNA-seq pipeline for Transcripts And Genes Assembly, Deconvolution, and Analysis. Given a genomic sequence, a reference annotation and RNA-seq reads, TAGADA enhances existing gene models by generating an improved annotation. It also computes expression values for both the reference and novel annotations, identifies long non-coding transcripts (lncRNAs), and provides a comprehensive quality control report. Developed using Nextflow DSL2, TAGADA offers user-friendly functionalities and ensures reproducibility across different computing platforms through its containerized environment. In this study, we demonstrate the efficacy of TAGADA using RNA-seq data from the GENE-SWiTCH project alongside chicken and pig genome annotations as references. Results indicate that TAGADA can substantially increase the number of annotated transcripts by approximately [Formula: see text] in these species. Furthermore, we illustrate how TAGADA can integrate Illumina NovaSeq short reads with PacBio Iso-Seq long reads, showcasing its versatility. TAGADA is available at github.com/FAANG/analysis-TAGADA.
Affiliation(s)
- Cyril Kurylo
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
- Cervin Guyomar
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
- Sylvain Foissac
  - GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Toulouse, France
- Sarah Djebali
  - IRSD, Université de Toulouse, INSERM, INRAE, ENVT, Univ Toulouse III - Paul Sabatier (UPS), Toulouse, France
2
Uva P, Da Sacco L, Del Cornò M, Baldassarre A, Sestili P, Orsini M, Palma A, Gessani S, Masotti A. Rat mir-155 generated from the lncRNA Bic is 'hidden' in the alternate genomic assembly and reveals the existence of novel mammalian miRNAs and clusters. RNA (New York, N.Y.) 2013; 19:365-79. [PMID: 23329697 PMCID: PMC3677247 DOI: 10.1261/rna.035394.112] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 05/13/2023]
Abstract
MicroRNAs (miRNAs) are a class of small noncoding RNAs acting as post-transcriptional gene expression regulators in many physiological and pathological conditions. During the last few years, many novel mammalian miRNAs have been predicted experimentally with bioinformatics approaches and validated by next-generation sequencing. Although these strategies have prompted the discovery of several miRNAs, the total number of these genes still seems larger. Here, by exploiting the species conservation of human, mouse, and rat hairpin miRNAs, we discovered a novel rat microRNA, mir-155. We found that mature miR-155 is overexpressed in rat spleen myeloid cells treated with LPS, similarly to humans and mice. Rat mir-155 is annotated only on the alternate genome, suggesting the presence of other "hidden" miRNAs on this assembly. Therefore, we comprehensively extended the homology search also to mice and humans, finally validating 34 novel mammalian miRNAs (two in humans, five in mice, and up to 27 in rats). Surprisingly, 15 of these novel miRNAs (one for mice and 14 for rats) were found only on the alternate and not on the reference genomic assembly. To date, our findings indicate that the choice of genomic assembly, when mapping small RNA reads, is an important option that should be carefully considered, at least for these animal models. Finally, the discovery of these novel mammalian miRNA genes may contribute to a better understanding of already acquired experimental data, thereby paving the way to still unexplored investigations and to unraveling the function of miRNAs in disease models.
Affiliation(s)
- Paolo Uva
  - CRS4 Bioinformatics Laboratory, Parco Scientifico e Tecnologico POLARIS, 09010 Pula, Cagliari, Italy
- Letizia Da Sacco
  - Gene Expression–Microarrays Laboratory, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy
- Manuela Del Cornò
  - Department of Hematology, Oncology, and Molecular Medicine, Istituto Superiore di Sanità, 00161 Rome, Italy
- Antonella Baldassarre
  - Gene Expression–Microarrays Laboratory, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy
- Paola Sestili
  - Department of Hematology, Oncology, and Molecular Medicine, Istituto Superiore di Sanità, 00161 Rome, Italy
- Massimiliano Orsini
  - CRS4 Bioinformatics Laboratory, Parco Scientifico e Tecnologico POLARIS, 09010 Pula, Cagliari, Italy
- Alessia Palma
  - Genomic Core Facility, Bambino Gesù Children’s Hospital, IRCCS, 00139 Rome, Italy
- Sandra Gessani
  - Department of Hematology, Oncology, and Molecular Medicine, Istituto Superiore di Sanità, 00161 Rome, Italy
- Andrea Masotti
  - Gene Expression–Microarrays Laboratory, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy
  - Corresponding author
3
Kuznetsov VA, Pickalov VV, Senko OV, Knott GD. Analysis of the evolving proteomes: predictions of the number of protein domains in nature and the number of genes in eukaryotic organisms. J Biol Syst 2012. [DOI: 10.1142/s0218339002000767] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Motivation: Obtaining accurate estimates of the numbers of protein-coding genes and protein domains in a proteome, and the number of protein domains in nature, is a daunting challenge. Computational analysis of the protein domain sets in the proteomes of many species allows us to estimate these numbers and to find their evolutionary relationships. Results: We have analyzed the distributions of the number of occurrences of protein domains in sample proteomes of the 70 fully sequenced genome organisms of three major kingdoms of life: Archaea, Bacteria and Eukaryota. We found that a large fraction of the identified distinct protein domains (i.e., unique domains and homologous domain families) in these 70 proteomes (1051 (23%) out of 4493) are found in at least one organism in each of these kingdoms of life and that 43 (1%) of these domains are common to all the 70 organisms. All the observed domain occurrence frequency distributions for these 70 proteomes are well fitted by a family of Pareto-like functions, associated with the steady state distributions of a linear Markov random process. We present explicit formulas that accurately predict the number of distinct protein domains and the number of protein-coding genes for a given organism as functions of the number of non-redundant domain-to-protein links in the proteomes. These functions allow us to predict that there are 42,740, 27,900, and 21,200 protein-coding genes/open reading frames in the human, A. thaliana, and mouse genomes, respectively. We also estimate that there are 5271, 2955, and 4915 distinct protein domains in the human, A. thaliana, and mouse proteomes, respectively, and about 5500 distinct protein domains in the entire "proteome world".
Affiliation(s)
- Vladimir A. Kuznetsov
  - The Laboratory of Integrative and Medical Biophysics, National Institute of Child Health and Human Development, 13 South Drive, Bethesda, MD 20892, USA
- Valery V. Pickalov
  - Institute of Theoretical and Applied Mechanics SB RAS, Novosibirsk, 630090, Russia
- Oleg V. Senko
  - Computer Center of Russian Academy of Sciences, Vavilov str. 40, 117967 Moscow, Russia
- Gary D. Knott
  - Civilized Software, Inc., 12109 Heritage Park Circle, Silver Spring, MD 20906, USA
4
Identifying HIV-1 host cell factors by genome-scale RNAi screening. Methods 2010; 53:3-12. [PMID: 20654720 DOI: 10.1016/j.ymeth.2010.07.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 02/23/2010] [Revised: 07/15/2010] [Accepted: 07/15/2010] [Indexed: 12/30/2022]
Abstract
Advances in the application of RNA interference (RNAi) have facilitated the establishment of systematic cell-based loss-of-function screening platforms. Widespread implementation of this technology has enabled genome-wide genetic analysis of a diverse array of cellular phenotypes, including the identification of host cell factors involved in viral replication. Four recent studies employed whole-genome RNAi technologies to elucidate cellular genes important for the replication of HIV-1. While these four genome-scale screens shared a common objective, they differ in their scope and experimental design. In this review we explore alternative strategies for developing RNAi screens, and discuss potential pitfalls of the technology. Important technical considerations include the choice of silencing reagents, experimental systems, assay readout and analysis methods. We focus on experimental and computational parameters that can impact the outcome of high-throughput genetic screens, and provide guidelines for the development of reliable cell-based RNAi screens.
5
Antonescu C, Antonescu V, Sultana R, Quackenbush J. Using the DFCI gene index databases for biological discovery. ACTA ACUST UNITED AC 2010; Chapter 1:1.6.1-1.6.36. [PMID: 20205187 DOI: 10.1002/0471250953.bi0106s29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
6
Albertin W, Langella O, Joets J, Négroni L, Zivy M, Damerval C, Thiellement H. Comparative proteomics of leaf, stem, and root tissues of synthetic Brassica napus. Proteomics 2009; 9:793-9. [PMID: 19132686 DOI: 10.1002/pmic.200800479] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Affiliation(s)
- Warren Albertin
  - Team Evolutionary Genetics: Adaptation and Redundancy, UMR 0320/UMR 8120 Génétique Végétale, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Ferme du Moulon, Gif-sur-Yvette, France.
7
van Baal JWPM, Krishnadath KK. High throughput techniques for characterizing the expression profile of Barrett's esophagus. Dis Esophagus 2008; 21:634-40. [PMID: 18564162 DOI: 10.1111/j.1442-2050.2008.00853.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
Abstract
Barrett's esophagus (BE) is the metaplastic change of the normal squamous epithelial lining of the distal esophagus to a columnar type of epithelium as a result of chronic long-standing gastroesophageal reflux disease. Patients with BE have a significantly increased risk of developing an esophageal adenocarcinoma, with an estimated annual incidence varying from 0.4 to 1.8%. Over the last 3 decades, the incidence of BE and its associated adenocarcinoma has increased in Western countries at a rate that exceeds that of any other malignancy. Despite all the research performed on BE, there is still an inadequate understanding of the biological basis of this mucosal transformation. With the advent of modern high throughput technologies, important progress has been made in unraveling the expression profiles and gaining more insight into the biology of BE and esophageal adenocarcinoma. Several studies reported genome, transcriptome, proteome, and kinome investigations using high throughput techniques. These studies were conducted to find biomarkers that can be used to detect BE patients with increased risk for malignant progression or to obtain more insight into the mechanism underlying BE development. In the following review, we first discuss the different techniques that are currently available and then summarize findings in this field, including several recent publications of our group.
Affiliation(s)
- J W P M van Baal
  - Laboratory of Experimental Internal Medicine, Academic Medical Center, Amsterdam, The Netherlands.
8
Lee Y, Quackenbush J. Using the TIGR gene index databases for biological discovery. ACTA ACUST UNITED AC 2008; Chapter 1:Unit 1.6. [PMID: 18428690 DOI: 10.1002/0471250953.bi0106s03] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Affiliation(s)
- Yuandan Lee
  - The Institute for Genomic Research, Rockville, Maryland, USA
9
Phillippy AM, Schatz MC, Pop M. Genome assembly forensics: finding the elusive mis-assembly. Genome Biol 2008; 9:R55. [PMID: 18341692 PMCID: PMC2397507 DOI: 10.1186/gb-2008-9-3-r55] [Citation(s) in RCA: 183] [Impact Index Per Article: 10.8] [Received: 10/16/2007] [Revised: 01/10/2008] [Accepted: 03/14/2008] [Indexed: 01/08/2023]
Abstract
A collection of software tools is combined for the first time in an automated pipeline for detecting large-scale genome assembly errors and for validating genome assemblies. We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at .
Affiliation(s)
- Adam M Phillippy
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.
10
Guan DY, Fang ZQ, Zhu X, Wu ZH, Zhang H. Cloning of novel hepatoma gene in rats by invigorating the spleen to supplement Qi. Shijie Huaren Xiaohua Zazhi 2008; 16:265-271. [DOI: 10.11569/wcjd.v16.i3.265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
AIM: To clone the integrated cDNA expression sequence of EST segments in down-regulated genes by invigorating the spleen to supplement Qi.
METHODS: Integrated cDNA expression sequences of EST gene segments were cloned for the 689 down-regulated genes by invigorating the spleen to supplement Qi, with electron cloning in combination with PCR.
RESULTS: cDNA expression sequences were detected in 11 EST segments (G2, G4, G5, G6, G11, G14, G15, G16, G17, G18, G20). BLAST analysis showed that G14, G15 and G20 were novel genes which were submitted to GenBank (their accession number is DQ480745, DQ480746 and DQ480747, respectively).
CONCLUSION: Invigorating the spleen to supplement Qi can clone the cDNA expression sequences of EST segments in the down-regulated genes. Further study is needed to observe the functions of these novel genes and the mechanism of action of different TCM therapies.
11
Tsai YS, Chen CM. Driven polymer transport through a nanopore controlled by a rotating electric field: off-lattice computer simulations. J Chem Phys 2007; 126:144910. [PMID: 17444746 DOI: 10.1063/1.2717187] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
Abstract
The driven translocation kinetics of a single-stranded polynucleotide chain through a nanopore is studied using off-lattice Monte Carlo simulations, by which the authors demonstrate a novel method for controlling the driven polymer transport through a nanopore by a rotating electric field. The recorded time series of blockade current from the driven polynucleotide transport are used to determine the sequence of polynucleotides by implementing a modified Monte Carlo algorithm, in which the energy landscape paving technique is incorporated to avoid trapping at deep local minima. It is found that only six time series of blockade current are required to completely determine the polynucleotide sequence if the average missing rate (AMR) of current signals in these time series is smaller than 20%. For those time series with AMR greater than 20%, the error rate in sequencing an unknown polynucleotide decreases rapidly as the number of time series increases. To find the most appropriate experimental conditions, the authors have investigated the dependence of the AMR of current signals and the qualified rate of measured time series of blockade current on various controllable experimental variables.
Affiliation(s)
- Y-S Tsai
  - Physics Department, National Taiwan Normal University, Taipei 116, Taiwan, Republic of China
12
Halasz G, van Batenburg MF, Perusse J, Hua S, Lu XJ, White KP, Bussemaker HJ. Detecting transcriptionally active regions using genomic tiling arrays. Genome Biol 2007; 7:R59. [PMID: 16859498 PMCID: PMC1779562 DOI: 10.1186/gb-2006-7-7-r59] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/26/2005] [Revised: 01/05/2006] [Accepted: 07/05/2006] [Indexed: 11/10/2022]
Abstract
We have developed a method for interpreting genomic tiling array data, implemented as the program TranscriptionDetector. Probed loci expressed above background are identified by combining replicates in a way that makes minimal assumptions about the data. We performed medium-resolution Anopheles gambiae tiling array experiments and found extensive transcription of both coding and non-coding regions. Our method also showed improved detection of transcriptional units when applied to high-density tiling array data for ten human chromosomes.
Affiliation(s)
- Gabor Halasz
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY, 10027 USA
  - Integrated Program in Cellular, Molecular and Biophysical Studies, Columbia University, 630 W. 168th Street, New York, NY, 10032 USA
- Marinus F van Batenburg
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY, 10027 USA
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, Meibergdreef 15, 1105 AZ Amsterdam, The Netherlands
- Joelle Perusse
  - Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT, 06520-8005, USA
- Sujun Hua
  - Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT, 06520-8005, USA
- Xiang-Jun Lu
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY, 10027 USA
- Kevin P White
  - Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT, 06520-8005, USA
  - Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect Street, PO Box 208106, New Haven, CT, 06250-8106, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY, 10027 USA
  - Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY, USA
13
Albertin W, Alix K, Balliau T, Brabant P, Davanture M, Malosse C, Valot B, Thiellement H. Differential regulation of gene products in newly synthesized Brassica napus allotetraploids is not related to protein function nor subcellular localization. BMC Genomics 2007; 8:56. [PMID: 17313678 PMCID: PMC1805753 DOI: 10.1186/1471-2164-8-56] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 07/21/2006] [Accepted: 02/21/2007] [Indexed: 12/24/2022]
Abstract
BACKGROUND: Allopolyploidy is a preeminent process in plant evolution that results from the merger of distinct genomes in a common nucleus via inter-specific hybridization. Allopolyploid formation is usually related to genome-wide structural and functional changes though the underlying mechanisms operating during this "genomic shock" still remain poorly known. The aim of the present study was to investigate the modifications occurring at the proteomic level following an allopolyploidization event and to determine whether these changes are related to functional properties of the proteins. In a previous report, we applied comparative proteomics to synthetic amphiploids of Brassica napus and to its diploid progenitors B. rapa and B. oleracea. Although several hundred polypeptides displayed additivity (i.e. mid-parent values) in the amphiploids, many of them showed non-additivity. Here, we report the in silico functional characterization of the "non-additive" proteins (the ones with a non-additive pattern of regulation) in synthetic B. napus.
RESULTS: The complete set of non-additive proteins (335 in the stem and 205 in the root), as well as a subset of additive polypeptides (200 per organ), was identified by mass spectrometry. Several protein isoforms were found, and most of them (~55%) displayed "different" or "opposite" patterns of regulation in the amphiploids, i.e. isoforms of the same protein showing both up-regulation and down-regulation in the synthetic B. napus compared to the mid-parent value. Components of protein complexes were identified of which ~50% also displayed "different" or "opposite" patterns of regulation in the allotetraploids. In silico functional categorization of the identified proteins was carried out, and showed that neither functional category nor metabolic pathway were systematically affected by non-additivity in the synthetic amphiploids. In addition, no subcellular compartment was found to be over- or under-represented among the proteins displaying non-additive values in the allopolyploids.
CONCLUSION: Protein identification showed that functionally related polypeptides (isoforms and complex subunits) could be differentially regulated in synthetic B. napus in comparison to its diploid progenitors while such proteins are usually expected to display co-regulation. The genetic redundancy within an allopolyploid could explain why functionally related proteins could display imbalanced levels of expression. No functional category, no metabolic pathway and no subcellular localization was found to be over- or under-represented within non-additive polypeptides, suggesting that the differential regulation of gene products was not related to functional properties of the proteins. Thus, at the protein level, there is no evidence for the "genomic shock" expected in neo-polyploids and the overall topology of protein networks and metabolic pathways is conserved in synthetic allotetraploids of B. napus in comparison to its diploid progenitors B. rapa and B. oleracea.
Affiliation(s)
- Warren Albertin
  - UMR de Génétique Végétale, INRA/CNRS/UPSud/INA P-G, La ferme du Moulon, 91190 Gif-sur-Yvette, France
- Karine Alix
  - UMR de Génétique Végétale, INRA/CNRS/UPSud/INA P-G, La ferme du Moulon, 91190 Gif-sur-Yvette, France
- Thierry Balliau
  - Plate-forme de Protéomique, La ferme du Moulon, 91190 Gif-sur-Yvette, France
- Philippe Brabant
  - UMR de Génétique Végétale, INRA/CNRS/UPSud/INA P-G, La ferme du Moulon, 91190 Gif-sur-Yvette, France
- Marlène Davanture
  - Plate-forme de Protéomique, La ferme du Moulon, 91190 Gif-sur-Yvette, France
- Christian Malosse
  - Plate-forme de Protéomique de Versailles, INRA, 78026 Versailles, France
- Benoît Valot
  - Plate-forme de Protéomique, La ferme du Moulon, 91190 Gif-sur-Yvette, France
- Hervé Thiellement
  - UMR de Génétique Végétale, INRA/CNRS/UPSud/INA P-G, La ferme du Moulon, 91190 Gif-sur-Yvette, France
14
Zhang W, Wang H, Song SW, Fuller GN. Insulin-like growth factor binding protein 2: gene expression microarrays and the hypothesis-generation paradigm. Brain Pathol 2006; 12:87-94. [PMID: 11770904 PMCID: PMC8095777 DOI: 10.1111/j.1750-3639.2002.tb00425.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]
Abstract
A major goal of modern medicine is to identify key genes and their products that are altered in the diseased state and to elucidate the molecular mechanisms underlying disease development, progression, and resistance to therapy. This is a daunting task given the exceptionally high complexity of the human genome. The paradigm for research has historically been hypothesis-driven despite the fact that the hypotheses under scrutiny often rest on tenuous subjective grounds or are derived from and dependent on chance observation. The imminent deciphering of the complete human genome, coupled with recent advances in high-throughput bioanalytical technology, has made possible a new paradigm in which data-based hypothesis-generation is the initial step in the investigative process, followed by hypothesis-testing. Genomics technologies are the primary source of the new hypothesis-generating capabilities that are now empowering biomedical researchers. The synergistic interaction between contemporary genomics technologies and the hypothesis-generation paradigm is well-illustrated by the discovery and subsequent ongoing study of the role of insulin-like growth factor binding protein 2 (IGFBP2) in human glioma biology. Using gene expression microarray technology, the IGFBP2 gene was recently found to be highly and differentially overexpressed in the most advanced grade of human glioma, glioblastoma. Based on this discovery, subsequent functional studies were initiated that suggest that IGFBP2 overexpression may contribute to the invasive nature of glioblastoma, and that IGFBP2 may exert its function via a newly identified novel binding protein. The IGFBP2 story is but one example of the power and potential of the new molecular methodologies that are transforming modern diagnostic and investigative neuropathology.
Affiliation(s)
- Wei Zhang
  - Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Houston 77030, USA.
15
Chambers D, Mason I. A high throughput messenger RNA differential display screen identifies discrete domains of gene expression and novel patterning processes along the developing neural tube. BMC Developmental Biology 2006; 6:9. [PMID: 16504111 PMCID: PMC1397802 DOI: 10.1186/1471-213x-6-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/27/2005] [Accepted: 02/24/2006] [Indexed: 11/15/2022]
Abstract
BACKGROUND: During early development the vertebrate neural tube is broadly organized into the forebrain, midbrain, hindbrain and spinal cord regions. Each of these embryonic zones is patterned by a combination of genetic pathways and the influences of local signaling centres. However, it is clear that much remains to be learned about the complete set of molecular cues that are employed to establish the identity and intrinsic neuronal diversity of these territories. In order to address this, we performed a high-resolution messenger RNA differential display screen to identify molecules whose expression is regionally restricted along the anteroposterior (AP) neuraxis during early chick development, with particular focus on the midbrain and hindbrain vesicles.
RESULTS: This approach identified 44 different genes, with both known and unknown functions, whose transcription is differentially regulated along the AP axis. The identity and ontological classification of these genes is presented. The wide variety of functional classes of transcripts isolated in this screen reflects the diverse spectrum of known influences operating across these embryonic regions. Of these 44 genes, several have been selected for detailed in situ hybridization analysis to validate the screen and accurately define the expression domains. Many of the identified cDNAs showed no identity to the current databases of known or predicted genes or ESTs. Others represent genes whose embryonic expression has not been previously reported. Expression studies confirmed the predictions of the primary differential display data. Moreover, the nature of identified genes, not previously associated with regionalisation of the brain, identifies novel potential mechanisms in that process.
CONCLUSION: This study provides an insight into some of the varied and novel molecular networks that operate during the regionalization of embryonic neural tissue and expands our knowledge of the molecular repertoire used during development.
Collapse
Affiliation(s)
- David Chambers
- MRC Centre for Developmental Neurobiology, 4th Floor New Hunt's House, King's College London, Guy's Campus, London, SE1 1UL, UK
- Wellcome Trust Functional Genomics Development Initiative, MRC Centre for Developmental Neurobiology, 4th Floor New Hunt's House, King's College London, Guy's Campus, London, SE1 1UL, UK
- Ivor Mason
- MRC Centre for Developmental Neurobiology, 4th Floor New Hunt's House, King's College London, Guy's Campus, London, SE1 1UL, UK
|
16
|
Abstract
Of the major issues that dermatopathology will face in the immediate future, two powerful challenges loom large. The first is the application of novel nondestructive imaging technologies to in vivo diagnosis in humans. The second is the application of molecular technologies to a diagnostic arena which formerly belonged exclusively to the light microscopist. The first to be considered in this context is the application of near infrared spectroscopy to the noninvasive in vivo diagnosis of neoplastic skin disease. The second will be a discussion of application, methodology and the current state of the art in microarray technologies as they apply to neoplastic dermatopathology and, in particular, the diagnosis and prognostication of melanoma.
Affiliation(s)
- A Neil Crowson
- Departments of Dermatology, Pathology, and Surgery, University of Oklahoma and Regional Medical Laboratory, St John Medical Center, Tulsa, OK, USA
|
17
|
Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, Zody MC, Mauceli E, Xie X, Breen M, Wayne RK, Ostrander EA, Ponting CP, Galibert F, Smith DR, DeJong PJ, Kirkness E, Alvarez P, Biagi T, Brockman W, Butler J, Chin CW, Cook A, Cuff J, Daly MJ, DeCaprio D, Gnerre S, Grabherr M, Kellis M, Kleber M, Bardeleben C, Goodstadt L, Heger A, Hitte C, Kim L, Koepfli KP, Parker HG, Pollinger JP, Searle SMJ, Sutter NB, Thomas R, Webber C, Baldwin J, Abebe A, Abouelleil A, Aftuck L, Ait-Zahra M, Aldredge T, Allen N, An P, Anderson S, Antoine C, Arachchi H, Aslam A, Ayotte L, Bachantsang P, Barry A, Bayul T, Benamara M, Berlin A, Bessette D, Blitshteyn B, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Brown A, Cahill P, Calixte N, Camarata J, Cheshatsang Y, Chu J, Citroen M, Collymore A, Cooke P, Dawoe T, Daza R, Decktor K, DeGray S, Dhargay N, Dooley K, Dooley K, Dorje P, Dorjee K, Dorris L, Duffey N, Dupes A, Egbiremolen O, Elong R, Falk J, Farina A, Faro S, Ferguson D, Ferreira P, Fisher S, FitzGerald M, Foley K, Foley C, Franke A, Friedrich D, Gage D, Garber M, Gearin G, Giannoukos G, Goode T, Goyette A, Graham J, Grandbois E, Gyaltsen K, Hafez N, Hagopian D, Hagos B, Hall J, Healy C, Hegarty R, Honan T, Horn A, Houde N, Hughes L, Hunnicutt L, Husby M, Jester B, Jones C, Kamat A, Kanga B, Kells C, Khazanovich D, Kieu AC, Kisner P, Kumar M, Lance K, Landers T, Lara M, Lee W, Leger JP, Lennon N, Leuper L, LeVine S, Liu J, Liu X, Lokyitsang Y, Lokyitsang T, Lui A, Macdonald J, Major J, Marabella R, Maru K, Matthews C, McDonough S, Mehta T, Meldrim J, Melnikov A, Meneus L, Mihalev A, Mihova T, Miller K, Mittelman R, Mlenga V, Mulrain L, Munson G, Navidi A, Naylor J, Nguyen T, Nguyen N, Nguyen C, Nguyen T, Nicol R, Norbu N, Norbu C, Novod N, Nyima T, Olandt P, O'Neill B, O'Neill K, Osman S, Oyono L, Patti C, Perrin D, Phunkhang P, Pierre F, Priest M, Rachupka A, Raghuraman S, Rameau R, Ray V, Raymond C, Rege F, Rise C, 
Rogers J, Rogov P, Sahalie J, Settipalli S, Sharpe T, Shea T, Sheehan M, Sherpa N, Shi J, Shih D, Sloan J, Smith C, Sparrow T, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Stone S, Sykes S, Tchuinga P, Tenzing P, Tesfaye S, Thoulutsang D, Thoulutsang Y, Topham K, Topping I, Tsamla T, Vassiliev H, Venkataraman V, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Yang S, Yang X, Young G, Yu Q, Zainoun J, Zembek L, Zimmer A, Lander ES. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 2005; 438:803-19. [PMID: 16341006 DOI: 10.1038/nature04338] [Citation(s) in RCA: 1713] [Impact Index Per Article: 85.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2005] [Accepted: 10/11/2005] [Indexed: 12/12/2022]
Abstract
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Affiliation(s)
- Kerstin Lindblad-Toh
- Broad Institute of Harvard and MIT, 320 Charles Street, Cambridge, Massachusetts 02141, USA.
|
18
|
Brosius J. Echoes from the past--are we still in an RNP world? Cytogenet Genome Res 2005; 110:8-24. [PMID: 16093654 DOI: 10.1159/000084934] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2004] [Accepted: 05/04/2004] [Indexed: 11/19/2022] Open
Abstract
The availability of the human genome sequence and those of other species is of immeasurable value for a comprehensive understanding of the architecture, function and evolution of genomes and cells. Various mechanisms keep genomes in flux and generate intra- and interspecies variation. The conversion of RNA modules into DNA and their more or less random integration into chromosomes (retroposition) is in many lineages, including our own, the most pervasive and perhaps the most enigmatic. The proclivity of such events in extant multicellular eukaryotes, even in more recent evolutionary times, gives the impression that the transition period from the RNP (ribonucleoprotein) world to the emergence of modern cells, where DNA became the predominant carrier of genetic information, has lasted billions of years and is an endlessly drawn-out process rather than the punctuated event one might expect. Apart from the impact of such RNA-mediated processes as retroposition, the role of RNA in a wide variety of cellular functions has only recently become more widely appreciated.
Affiliation(s)
- J Brosius
- Institute of Experimental Pathology, ZMBE, University of Munster, Munster, Germany.
|
19
|
Sellheyer K, Belbin TJ. DNA microarrays: from structural genomics to functional genomics. The applications of gene chips in dermatology and dermatopathology. J Am Acad Dermatol 2005; 51:681-92; quiz 693-6. [PMID: 15523345 DOI: 10.1016/j.jaad.2004.03.038] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
The human genome project was successful in sequencing the entire human genome and ended earlier than expected. The vast genetic information now available will have far-reaching consequences for medicine in the twenty-first century. The knowledge gained from the mapping and sequencing of human genes on a genome-wide scale--commonly referred to as structural genomics--is prerequisite for studies that focus on the functional aspects of genes. A recently invented technique, known as gene chip, or DNA microarray, technology, allows the study of the function of thousands of genes at once, thereby opening the door to the new field of functional genomics. At its core, the DNA microarray utilizes a unique feature of DNA known as complementary hybridization. As such, it is not different from Southern (DNA) blot or northern (RNA) blot hybridizations, or the polymerase chain reaction, with the exception that it allows expression profiling of the entire human genome in a single hybridization experiment. The article highlights the principles, technology, and applications of DNA microarrays as they pertain to the field of dermatology and dermatopathology. The most important applications are the gene expression profiling of skin cancer, especially of melanoma. Other potential applications include gene expression profiling of inflammatory skin diseases, the mutational analysis of genodermatoses, and polymorphism screening, as well as drug development and chemosensitivity prediction. cDNA microarrays will shape the diagnostic approach of the dermatology and the dermatopathology of the future and may lead to new therapeutic options.
Affiliation(s)
- Klaus Sellheyer
- Department of Dermatology, The Cleveland Clinic Foundation, Cleveland, Ohio 44195, USA.
|
20
|
Dvorak CMT, Hyland KA, Machado JG, Zhang Y, Fahrenkrug SC, Murtaugh MP. Gene discovery and expression profiling in porcine Peyer's patch. Vet Immunol Immunopathol 2005; 105:301-15. [PMID: 15808308 DOI: 10.1016/j.vetimm.2005.02.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Peyer's patches of the intestinal mucosa are essential for host defense and immune regulation in the enteric system. To better understand molecular mechanisms of Peyer's patch function, we have screened for differentially expressed genes specific to Peyer's patch. cDNA libraries were created from normal Peyer's patch, immune stimulated Peyer's patch, and pooled cDNA subtracted with fibroblast RNA. From the subtracted library, 3687 expressed sequence tags (ESTs), representing 2414 unique nucleotide sequences, were isolated, identified by BLAST searches against public databases, and spotted onto a microarray for gene expression profiling. Approximately 30% of these ESTs BLAST to genes of unknown function and 20% have no known homology in the public databases (novel genes). Of the novel genes, 70% are expressed in normal immune tissues by microarray analysis, suggesting that at least 371 of the unidentified EST sequences from the subtracted library are novel porcine genes and can now be further characterized to determine their function in the porcine Peyer's patch. We surmise that the products of these genes participate in biochemical and cellular functions related to the unique immunological and gastroenterological functions of the small intestine. The BLAST and gene ontology information for each of the subtracted library EST sequences, the normal and immune stimulated libraries, and the microarray are all valuable resources that will facilitate further examination of the biological function of porcine Peyer's patch tissue.
Affiliation(s)
- C M T Dvorak
- Department of Veterinary and Biomedical Sciences, University of Minnesota, 1971 Commonwealth Avenue, St. Paul, MN 55108, USA.
|
21
|
Larsson TP, Murray CG, Hill T, Fredriksson R, Schiöth HB. Comparison of the current RefSeq, Ensembl and EST databases for counting genes and gene discovery. FEBS Lett 2005; 579:690-8. [PMID: 15670830 DOI: 10.1016/j.febslet.2004.12.046] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2004] [Revised: 12/13/2004] [Accepted: 12/13/2004] [Indexed: 11/25/2022]
Abstract
Large amounts of refined sequence material in the form of predicted, curated and annotated genes and expressed sequence tags (ESTs) have recently been added to the NCBI databases. We matched the transcript sequences of RefSeq, Ensembl and dbEST in an attempt to provide an updated overview of how many unique human genes can be found. The results indicate that there are about 25000 unique genes in the union of RefSeq and Ensembl with 12-18% and 8-13% of the genes in each set unique to the other set, respectively. About 20% of all genes had splice variants. There are a considerable number of ESTs (2200000) that do not match the identified genes and we used an in-house pipeline to identify 22 novel genes from Genscan predictions that have considerable EST coverage. The study provides an insight into the current status of human gene catalogues and shows that considerable refinement of methods and datasets is needed to come to a conclusive gene count.
Affiliation(s)
- Thomas P Larsson
- Department of Neuroscience, Uppsala University, BMC Box 593, 751 24 Uppsala, Sweden.
|
22
|
Vanti WB, Swaminathan S, Blevins R, Bonini JA, O'Dowd BF, George SR, Weinshank RL, Smith KE, Bailey WJ. Patent status of the therapeutically important G-protein-coupled receptors. Expert Opin Ther Pat 2005. [DOI: 10.1517/13543776.11.12.1861] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
23
|
Wu JQ, Garcia AM, Hulyk S, Sneed A, Kowis C, Yuan Y, Steffen D, McPherson JD, Gunaratne PH, Gibbs RA. Large-scale RT-PCR recovery of full-length cDNA clones. Biotechniques 2004; 36:690-6, 698-700. [PMID: 15088387 DOI: 10.2144/04364dd03] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
Pseudogenes, alternative transcripts, noncoding RNA, and polymorphisms each add extensive complexity to the mammalian transcriptome and confound estimation of the total number of genes. Despite advanced algorithms for gene prediction and several large-scale efforts to obtain cDNA clones for all human open reading frames (ORFs), no single collection is complete. To enhance this effort, we have developed a high-throughput pipeline for reverse transcription PCR (RT-PCR) gene recovery. Most importantly, novel molecular strategies for improving RT-PCR yield of transcripts that have been difficult to isolate by other means and computational strategies for clone sequence validation have been developed and optimized. This systematic gene recovery pipeline allows both rescue of predicted human and rat genes and provides insight into the complexity of the transcriptome through comparisons with existing data sets.
Affiliation(s)
- Jia Qian Wu
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
|
24
|
Schadt EE, Edwards SW, GuhaThakurta D, Holder D, Ying L, Svetnik V, Leonardson A, Hart KW, Russell A, Li G, Cavet G, Castle J, McDonagh P, Kan Z, Chen R, Kasarskis A, Margarint M, Caceres RM, Johnson JM, Armour CD, Garrett-Engele PW, Tsinoremas NF, Shoemaker DD. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol 2004; 5:R73. [PMID: 15461792 PMCID: PMC545593 DOI: 10.1186/gb-2004-5-10-r73] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2004] [Revised: 07/07/2004] [Accepted: 08/16/2004] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22. RESULTS The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serves as an example of how this activity could be measured on a genome-wide scale. CONCLUSIONS These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized.
Affiliation(s)
- Eric E Schadt
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Stephen W Edwards
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Dan Holder
- Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
- Lisa Ying
- Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
- Vladimir Svetnik
- Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA
- Amy Leonardson
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Kyle W Hart
- Rally Scientific, 41 Fayette Street, Suite 1, Watertown, MA 02472, USA
- Archie Russell
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Guoya Li
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Guy Cavet
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- John Castle
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Paul McDonagh
- Amgen Inc, 1201 Amgen Court W, Seattle, WA 98119, USA
- Zhengyan Kan
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Ronghua Chen
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Andrew Kasarskis
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Mihai Margarint
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Ramon M Caceres
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Jason M Johnson
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
- Daniel D Shoemaker
- Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA
|
25
|
Abstract
The availability of entire genome sequences is expected to revolutionize the way in which biology and medicine are conducted for years to come. However, achieving this promise still requires significant effort in the areas of gene annotation, cloning and expression of thousands of known and heretofore unknown protein-encoding genes. Traditional technologies of manipulating genes are too cumbersome and inefficient when one is dealing with more than a few genes at a time. Entire libraries composed of all protein-encoding open reading frames (ORFs) cloned in highly flexible vectors will be needed to take full advantage of the information found in any genome sequence. The creation of such ORFeome resources using novel technologies for cloning and expressing entire proteomes constitutes an effective gateway from whole genome sequencing efforts to downstream 'omics' applications.
Affiliation(s)
- Jean-François Rual
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, 44 Binney Street, Boston, MA 02115, USA
|
26
|
Waterston RH, Hillier LW, Fulton LA, Fulton RS, Graves TA, Pepin KH, Bork P, Suyama M, Torrents D, Chinwalla AT, Mardis ER, McPherson JD, Wilson RK. The human genome: genes, pseudogenes, and variation on chromosome 7. COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY 2004; 68:13-22. [PMID: 15338598 DOI: 10.1101/sqb.2003.68.13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Affiliation(s)
- R H Waterston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
|
27
|
Hui L, Zhang X, Wu X, Lin Z, Wang Q, Li Y, Hu G. Identification of alternatively spliced mRNA variants related to cancers by genome-wide ESTs alignment. Oncogene 2004; 23:3013-23. [PMID: 15048092 DOI: 10.1038/sj.onc.1207362] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Several databases have been published to predict alternative splicing of mRNAs by analysing the exon linkage relationship by alignment of expressed sequence tags (ESTs) to the genome sequence; however, little effort has been made to investigate the relationship between cancers and alternative splicing. We developed a program, Alternative Splicing Assembler (ASA), to look for splicing variants of human gene transcripts by genome-wide EST alignment. Using ASA, we constructed the biosino alternative splicing database (BASD), which predicted splicing variants for reference sequences from the reference sequence database (RefSeq) and presented them in both graph and text formats. EST clusters that differ from the reference sequences in at least one splicing site were counted as splicing variants. Of 4322 genes screened, 3490 (81%) were observed with at least one alternative splicing variant. To discover the variants associated with cancers, tissue sources of EST sequences were extracted from the UniLib database and ESTs from the same tissue type were counted. These were regarded as the indicators for gene expression level. Using Fisher's exact test, alternative splicing variants whose EST counts were significantly different between cancer tissues and their counterpart normal tissues were identified. It was predicted that 2149 variants, or 383 variants after Bonferroni correction, of 26 812 variants were likely tumor-associated. By reverse transcription-PCR, 11 of 13 novel alternative splicing variants and the tissue specificity of eight of nine variants were confirmed in hepatocellular carcinoma and in lung cancer. The possible involvement of alternative splicing in cancer is discussed.
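The statistical step in this abstract (Fisher's exact test on EST counts, followed by Bonferroni correction over 26 812 variants) can be illustrated with a minimal sketch. This is not the authors' ASA code: the layout of the 2x2 table, the example counts, the significance level of 0.05, and the function names `fisher_exact_two_sided` and `bonferroni_threshold` are all assumptions made here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical EST counts for one variant (illustrative, not from the study):
# rows = tumor / normal libraries, columns = ESTs supporting the variant / other ESTs.
p_value = fisher_exact_two_sided(30, 970, 5, 995)
cutoff = bonferroni_threshold(0.05, 26812)  # alpha = 0.05 is an assumption
print(f"p = {p_value:.3g}, Bonferroni cutoff = {cutoff:.3g}, significant = {p_value < cutoff}")
```

Dividing the significance level by the number of tested variants is what would shrink a list of nominally significant variants to a much smaller corrected set, as in the drop from 2149 to 383 reported above.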
Affiliation(s)
- Lijian Hui
- State Key Laboratory of Molecular Biology, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Yueyang Road 320, Shanghai 200031, China
|
28
|
Abstract
Genomic technologies are rapidly evolving and have demonstrated utility in augmenting oncological pathology or clinical presentation in disease classification and risk of relapse assessment. Numerous malignancies have been subject to microarray examination, and through a variety of analysis methodologies, groups of reporter genes have been identified to generate 'molecular portraits' of these diseases. Once validated, it is likely that assessment of the expression levels of subsets of reporter genes will contribute to personalized genomic medicine through diagnosis and selection of treatment options for patients. The dynamic nature of this field ensures that new developments are missing from this review.
Affiliation(s)
- Elizabeth A Raetz
- Division of Hematology-Oncology, Huntsman Cancer Institute, University of Utah School of Medicine, Primary Children's Medical Center, Salt Lake City, Utah, USA
|
29
|
Méchin V, Balliau T, Château-Joubert S, Davanture M, Langella O, Négroni L, Prioul JL, Thévenot C, Zivy M, Damerval C. A two-dimensional proteome map of maize endosperm. PHYTOCHEMISTRY 2004; 65:1609-18. [PMID: 15276456 DOI: 10.1016/j.phytochem.2004.04.035] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/12/2004] [Revised: 04/27/2004] [Indexed: 05/23/2023]
Abstract
We have established a proteome reference map for maize (Zea mays L.) endosperm by means of two-dimensional gel electrophoresis and protein identification with LC-MS/MS analysis. This investigation focussed on proteins in major spots in a 4-7 pI range and 10-100 kDa M(r) range. Among the 632 protein spots processed, 496 were identified by matching against the NCBInr and ZMtuc-tus databases (using the SEQUEST software). Forty-two per cent of the proteins were identified against maize sequences, 23% against rice sequences and 21% against Arabidopsis sequences. Identified proteins were not only cytoplasmic but also nuclear, mitochondrial or amyloplastic. Metabolic processes, protein destination, protein synthesis, cell rescue, defense, cell death and ageing are the most abundant functional categories, comprising almost half of the 632 proteins analyzed in our study. This proteome map constitutes a powerful tool for physiological studies and is the first step for investigating the maize endosperm development.
Affiliation(s)
- Valérie Méchin
- INRA/INA-PG/UPS/CNRS UMR8120, Ferme du Moulon, Gif-sur-Yvette, France
|
30
|
Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 2004; 101:6062-7. [PMID: 15075390 PMCID: PMC395923 DOI: 10.1073/pnas.0400782101] [Citation(s) in RCA: 2798] [Impact Index Per Article: 133.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2004] [Accepted: 03/02/2004] [Indexed: 01/14/2023] Open
Abstract
The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression, evaluated commonly used lines of evidence in gene prediction methodologies, and investigated patterns indicative of chromosomal organization of transcription. We describe hundreds of regions of correlated transcription and show that some are subject to both tissue and parental allele-specific expression, suggesting a link between spatial expression and imprinting.
Affiliation(s)
- Andrew I Su
- The Genomics Institute of the Novartis Research Foundation, 10675 John J. Hopkins Drive, San Diego, CA 92121, USA
|
31
|
Kiyosawa H, Kawashima T, Silva D, Petrovsky N, Hasegawa Y, Sakai K, Hayashizaki Y. Systematic genome-wide approach to positional candidate cloning for identification of novel human disease genes. Intern Med J 2004; 34:79-90. [PMID: 15030454 DOI: 10.1111/j.1444-0903.2004.00581.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Abstract
BACKGROUND Recent large-scale genome projects afford a unique opportunity to identify many novel disease genes and thereby better understand the genetic basis of human disease. Functional Annotation of Mouse (FANTOM) 2, the largest mouse transcriptome project yet, provides a wealth of data on novel genes, splice variants and non-coding RNA, and provides a unique opportunity to identify novel human disease genes. AIMS To demonstrate the power of combining the FANTOM 2 cDNA dataset with a positional candidate approach and bioinformatics analysis to identify genes underlying human genetic disease. RESULTS By mapping all FANTOM 2 cDNA to the human genome, we were able to identify mouse clones that co-localised on the human genome with mapped but uncloned human disease loci. By this method we identified mouse and corresponding human genes mapping within the loci of 100 different human genetic diseases (mapped interval of <5 cM). Of particular interest was the elucidation through FANTOM 2 novel mouse gene data of candidate human genes for the following: (i) developmental disorders: neural tube defect, Meckel syndrome, Wolf-Hirschhorn syndrome and keratosis follicularis spinulosa decalvans cum ophiasi; (ii) neurological disorders: benign familial infantile convulsions 3, early-onset cerebellar ataxia with retained tendon reflexes, infantile-onset spinocerebellar ataxia and vacuolar neuromyopathy and (iii) cancer-related syndromes: tylosis with oesophageal cancer and low-grade B-cell chronic lymphatic leukaemia. CONCLUSIONS The FANTOM 2 data will dramatically accelerate efforts to identify genes underlying human disease. It will also facilitate the creation of transgenic mouse models to help elucidate the function of potential human disease genes.
Affiliation(s)
- H Kiyosawa
- Technology and Development team for Mammalian Cellular Dynamics, Bioresource Center, RIKEN Tsukuba Institute, Tsukuba, Ibaraki, Japan
|
32
|
Porcel BM, Delfour O, Castelli V, De Berardinis V, Friedlander L, Cruaud C, Ureta-Vidal A, Scarpelli C, Wincker P, Schächter V, Saurin W, Gyapay G, Salanoubat M, Weissenbach J. Numerous novel annotations of the human genome sequence supported by a 5'-end-enriched cDNA collection. Genome Res 2004; 14:463-71. [PMID: 14962985 PMCID: PMC353234 DOI: 10.1101/gr.1481104] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
A collection of 90,000 human cDNA clones generated to increase the fraction of "full-length" cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5' end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that approximately 380 gene models described in LocusLink could be extended at their 5' end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, thus representing a resource substantially improving the human genome annotation.
Affiliation(s)
- Betina M Porcel
- Genoscope-Centre National de Séquençage and CNRS UMR-8030, 91000 Evry, France
|
33
|
Abstract
The profiling of gene expression patterns with DNA microarrays has recently become widely used, not only in basic molecular biological studies but also in practical fields. In clinical applications, for example, this technique is expected to be quite useful in making a correct diagnosis. In the pharmacological area, microarray analysis can be applied to drug discovery and individualized drug treatment. Although not as popular as in these examples, DNA microarrays could also be a powerful tool in studies relevant to occupational health. This review describes the outline of gene expression profiling with DNA microarrays and its prospects in occupational health research.
Affiliation(s)
- Shinji Koizumi
- Department of Hazard Assessment, National Institute of Industrial Health, Japan.
|
34
|
Erkeland SJ, Valkhof M, Heijmans-Antonissen C, van Hoven-Beijen A, Delwel R, Hermans MHA, Touw IP. Large-scale identification of disease genes involved in acute myeloid leukemia. J Virol 2004; 78:1971-80. [PMID: 14747562 PMCID: PMC369447 DOI: 10.1128/jvi.78.4.1971-1980.2004] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Received: 07/07/2003] [Accepted: 10/27/2003] [Indexed: 11/20/2022] Open
Abstract
Acute myeloid leukemia (AML) is a heterogeneous group of diseases in which chromosomal aberrations, small insertions or deletions, or point mutations in certain genes have profound consequences for prognosis. However, the majority of AML patients present without currently known genetic defects. Retroviral insertion mutagenesis in mice has become a powerful tool for identifying new disease genes involved in the pathogenesis of leukemia and lymphoma. Here we have used the Graffi-1.4 strain of murine leukemia virus, which causes predominantly AML, in a screen to identify novel genes involved in the pathogenesis of this disease. We report 79 candidate disease genes in common integration sites (CISs) and 15 genes whose family members previously were found to be affected in other studies. The majority of the identified sequences (60%) were not found in lymphomas and monocytic leukemias in previous screens, suggesting a specific involvement in AML. Although most of the virus integrations occurred in or near the 5' or 3' ends of the genes, suggesting deregulation of gene expression as a consequence of virus integration, 18 CISs were located exclusively within the genes, conceivably causing gene disruption.
Affiliation(s)
- Stefan J Erkeland
- Department of Hematology, Erasmus Medical Center, Rotterdam, The Netherlands
|
35
|
Coulouarn C, Lefebvre G, Derambure C, Lequerre T, Scotte M, Francois A, Cellier D, Daveau M, Salier JP. Altered gene expression in acute systemic inflammation detected by complete coverage of the human liver transcriptome. Hepatology 2004; 39:353-64. [PMID: 14767988 DOI: 10.1002/hep.20052] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 12/07/2022]
Abstract
The goal of the current study was to provide complete coverage of the liver transcriptome with human probes corresponding to every gene expressed in embryonic, adult, and/or cancerous liver. We developed dedicated tools, namely, the Liverpool nylon array of complementary DNA (cDNA) probes for approximately 10,000 nonredundant genes and the LiverTools database. Inflammation-induced transcriptome changes were studied in liver tissue samples from patients with an acute systemic inflammation and from control subjects. One hundred and fifty-four messenger RNAs (mRNA) correlated statistically with the extent of inflammation. Of these, 134 mRNA samples were not associated previously with an acute-phase (AP) response. The hepatocyte origin and proinflammatory cytokine responsiveness of these mRNAs were confirmed by quantitative reverse-transcription polymerase chain reaction (Q-RT-PCR) in cytokine-challenged hepatoma cells. The corresponding gene promoters were enriched in potential binding sites for inflammation-driven transcription factors in the liver. Some of the corresponding proteins may provide novel blood markers of clinical relevance. The mRNAs whose level is most correlated with the AP extent (P <.05) were enriched in intracellular signaling molecules, transcription factors, glycosylation enzymes, and up-regulated plasma proteins. In conclusion, the hepatocyte responded to the AP extent by fine tuning some mRNA levels, controlling most, if not all, intracellular events from early signaling to the final secretion of proteins involved in innate immunity. Supplementary material for this article can be found on the HEPATOLOGY website (http://interscience.wiley.com/jpages/0270-9139/suppmat/index.html).
Affiliation(s)
- Cédric Coulouarn
- INSERM Unité 519 and Faculté de Médecine-Pharmacie, Institut Fédératif de Recherches Multidisciplinaires sur les Peptides, Rouen, France
|
36
|
Shin JH, Yang JW, Juranville JF, Fountoulakis M, Lubec G. Evidence for existence of thirty hypothetical proteins in rat brain. Proteome Sci 2004; 2:1. [PMID: 14754459 PMCID: PMC373456 DOI: 10.1186/1477-5956-2-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 09/11/2003] [Accepted: 01/30/2004] [Indexed: 11/14/2022] Open
Abstract
Background: The rapid completion of genome sequences has created an infrastructure of biological information and provided essential information to link genes to gene products, proteins, the building blocks for cellular functions. In addition, genome/cDNA sequences make it possible to predict proteins for which there is no experimental evidence. Clues for function of hypothetical proteins are provided by sequence similarity with proteins of known function in model organisms. Results: We constructed a two-dimensional protein map and searched for expression of hypothetical proteins in rat brain. Two-dimensional electrophoresis (2-DE) with subsequent in-gel digestion of spots and matrix-assisted laser desorption/ionization (MALDI) spectrometric identification were applied. In total about 3700 spots were analysed, which resulted in the identification of about 1700 polypeptides, that were the products of 190 different genes. A number of hypothetical gene products were detected (30 of 190, 15.8%) and are considered brain proteins. Conclusions: A major finding of this study is the demonstration of the existence of putative proteins that were so far only deduced from their nucleic acid structure by a protein chemical method independent of antibody availability and specificity and unambiguously identifying proteins.
Affiliation(s)
- Joo-Ho Shin
- Department of Pediatrics, University of Vienna, Vienna, Austria
- Jae-Won Yang
- Department of Pediatrics, University of Vienna, Vienna, Austria
- Gert Lubec
- Department of Pediatrics, University of Vienna, Vienna, Austria
|
37
|
Blomberg LA, Zuelke KA. Serial analysis of gene expression (SAGE) during porcine embryo development. Reprod Fertil Dev 2004. [DOI: 10.1071/rd03081] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022] Open
Abstract
Functional genomics provides a powerful means for delving into the molecular mechanisms involved in pre-implantation development of porcine embryos. High rates of embryonic mortality (30%), following either natural mating or artificial insemination, emphasise the need to improve the efficiency of reproduction in the pig. The poor success rate of live offspring from in vitro-manipulated pig embryos also hampers efforts to generate transgenic animals for biotechnology applications. Previous analysis of differential gene expression has demonstrated stage-specific gene expression for in vivo-derived embryos and altered gene expression for in vitro-derived embryos. However, the methods used to date examine relatively few genes simultaneously and, thus, provide an incomplete glimpse of the physiological role of these genes during embryogenesis. The present review will focus on two aspects of applying functional genomics research strategies for analysing the expression of genes during elongation of pig embryos between gestational day (D) 11 and D12. First, we compare and contrast current methodologies that are being used for gene discovery and expression analysis during pig embryo development. Second, we establish a paradigm for applying serial analysis of gene expression as a functional genomics tool to obtain preliminary information essential for discovering the physiological mechanisms by which distinct embryonic phenotypes are derived.
|
38
|
Affiliation(s)
- Helen Kim
- Department of Pharmacology and Toxicology, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.
|
39
|
Qi Z, Cui Y, Fang W, Ling L, Chen R. Autosomal similarity revealed by eukaryotic genomic comparison. J Biol Phys 2004; 30:305-12. [PMID: 23345874 DOI: 10.1007/s10867-004-0996-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
To describe eukaryotic autosomes quantitatively and determine differences between them in terms of amino acid sequences of genes, functional classification of proteins, and complete DNA sequences, we applied two theoretical methods, the Proteome-vector method and the function of degree of disagreement (FDOD) method, that are based on function and sequence similarity respectively, to autosomes from nine eukaryotes. No matter what aspect of the autosome is considered, the autosomal differences within each organism were smaller than those between species. Our results show that eukaryotic autosomes resemble each other within a species while those from different organisms differ. We propose a hypothesis (named intra-species autosomal random shuffling) as an explanation for our results and suggest that lateral gene transfer (LGT) did not occur frequently during the evolution of eukarya.
Affiliation(s)
- Zhen Qi
- Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101 PR China
|
40
|
Hild M, Beckmann B, Haas SA, Koch B, Solovyev V, Busold C, Fellenberg K, Boutros M, Vingron M, Sauer F, Hoheisel JD, Paro R. An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome. Genome Biol 2003; 5:R3. [PMID: 14709175 PMCID: PMC395735 DOI: 10.1186/gb-2003-5-1-r3] [Citation(s) in RCA: 99] [Impact Index Per Article: 4.5] [Received: 08/22/2003] [Revised: 10/13/2003] [Accepted: 11/19/2003] [Indexed: 11/19/2022] Open
Abstract
A novel Drosophila microarray constructed on the basis of an integrated in silico/wet biology approach provides evidence for the transcription of approximately 2,600 additional genes. Validation indicates a lower limit of 2,000 novel annotations, thus raising the number of genes that make a fly. Background: While the genome sequences for a variety of organisms are now available, the precise number of the genes encoded is still a matter of debate. For the human genome several stringent annotation approaches have resulted in the same number of potential genes, but a careful comparison revealed only limited overlap. This indicates that only the combination of different computational prediction methods and experimental evaluation of such in silico data will provide more complete genome annotations. In order to get a more complete gene content of the Drosophila melanogaster genome, we based our new D. melanogaster whole-transcriptome microarray, the Heidelberg FlyArray, on the combination of the Berkeley Drosophila Genome Project (BDGP) annotation and a novel ab initio gene prediction of lower stringency using the Fgenesh software. Results: Here we provide evidence for the transcription of approximately 2,600 additional genes predicted by Fgenesh. Validation of the developmental profiling data by RT-PCR and in situ hybridization indicates a lower limit of 2,000 novel annotations, thus substantially raising the number of genes that make a fly. Conclusions: The successful design and application of this novel Drosophila microarray on the basis of our integrated in silico/wet biology approach confirms our expectation that in silico approaches alone will always tend to be incomplete. The identification of at least 2,000 novel genes highlights the importance of gathering experimental evidence to discover all genes within a genome. Moreover, as such an approach is independent of homology criteria, it will allow the discovery of novel genes unrelated to known protein families or those that have not been strictly conserved between species.
Affiliation(s)
- M Hild
- Zentrum für Molekulare Biologie Heidelberg (ZMBH), University of Heidelberg, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
- B Beckmann
- Division of Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- SA Haas
- Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
- B Koch
- Zentrum für Molekulare Biologie Heidelberg (ZMBH), University of Heidelberg, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
- V Solovyev
- Softberry, Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA
- C Busold
- Division of Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- K Fellenberg
- Division of Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- M Boutros
- Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- M Vingron
- Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
- F Sauer
- Zentrum für Molekulare Biologie Heidelberg (ZMBH), University of Heidelberg, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
- Department of Biochemistry, University of California, Riverside, CA 92521, USA
- JD Hoheisel
- Division of Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- R Paro
- Zentrum für Molekulare Biologie Heidelberg (ZMBH), University of Heidelberg, Im Neuenheimer Feld 282, 69120 Heidelberg, Germany
|
41
|
Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N. Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol 2003; 4:R74. [PMID: 14611660 PMCID: PMC329124 DOI: 10.1186/gb-2003-4-11-r74] [Citation(s) in RCA: 134] [Impact Index Per Article: 6.1] [Received: 07/22/2003] [Revised: 09/02/2003] [Accepted: 09/25/2003] [Indexed: 01/26/2023] Open
Abstract
Background: Abundant pseudogenes are a feature of mammalian genomes. Processed pseudogenes (PPs) are reverse transcribed from mRNAs. Recent molecular biological studies show that mammalian long interspersed element 1 (L1)-encoded proteins may have been involved in PP reverse transcription. Here, we present the first comprehensive analysis of human PPs using all known human genes as queries. Results: The human genome was queried and 3,664 candidate PPs were identified. The most abundant were copies of genes encoding keratin 18, glyceraldehyde-3-phosphate dehydrogenase and ribosomal protein L21. A simple method was developed to estimate the level of nucleotide substitutions (and therefore the age) of PPs. A Poisson-like age distribution was obtained with a mean age close to that of the Alu repeats, the predominant human short interspersed elements. These data suggest a nearly simultaneous burst of PP and Alu formation in the genomes of ancestral primates. The peak period of amplification of these two distinct retrotransposons was estimated to be 40-50 million years ago. Concordant amplification of certain L1 subfamilies with PPs and Alus was observed. Conclusions: We suggest that a burst of formation of PPs and Alus occurred in the genome of ancestral primates. One possible mechanism is that proteins encoded by members of particular L1 subfamilies acquired an enhanced ability to recognize cytosolic RNAs in trans.
Affiliation(s)
- Kazuhiko Ohshima
- School and Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan
- Masahira Hattori
- RIKEN Genomic Sciences Center, 1-7-22, Suehiro Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Laboratory of Genome Information, Kitasato Institute for Life Science, Kitasato University, 1-15-1, Kitasato, Sagamihara, Kanagawa 228-8555, Japan
- Tetsusi Yada
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Takashi Gojobori
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
- Yoshiyuki Sakaki
- RIKEN Genomic Sciences Center, 1-7-22, Suehiro Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Norihiro Okada
- School and Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan
|
42
|
Inada DC, Bashir A, Lee C, Thomas BC, Ko C, Goff SA, Freeling M. Conserved noncoding sequences in the grasses. Genome Res 2003; 13:2030-41. [PMID: 12952874 PMCID: PMC403677 DOI: 10.1101/gr.1280703] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.8] [Indexed: 11/24/2022]
Abstract
As orthologous genes from related species diverge over time, some sequences are conserved in noncoding regions. In mammals, large phylogenetic footprints, or conserved noncoding sequences (CNSs), are known to be common features of genes. Here we present the first large-scale analysis of plant genes for CNSs. We used maize and rice, maximally diverged members of the grass family of monocots. Using a local sequence alignment set to deliver only significant alignments, we found one or more CNSs in the noncoding regions of the majority of genes studied. Grass genes have dramatically fewer and much smaller CNSs than mammalian genes. Twenty-seven percent of grass gene comparisons revealed no CNSs. Genes functioning in upstream regulatory roles, such as transcription factors, are greatly enriched for CNSs relative to genes encoding enzymes or structural proteins. Further, we show that a CNS cluster in an intron of the knotted1 homeobox gene serves as a site of negative regulation. We show that CNSs in the adh1 gene do not correlate with known cis-acting sites. We discuss the potential meanings of CNSs and their value as analytical tools and evolutionary characters. We advance the idea that many CNSs function to lock-in gene regulatory decisions.
Affiliation(s)
- Dan Choffnes Inada
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California 94720, USA
|
43
|
Huntley D, Hummerich H, Smedley D, Kittivoravitkul S, McCarthy M, Little P, Sergot M. GANESH: software for customized annotation of genome regions. Genome Res 2003; 13:2195-202. [PMID: 12952886 PMCID: PMC403729 DOI: 10.1101/gr.698103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching and genome-analysis packages. The results are stored in compressed form in a relational database, and are updated automatically on a regular schedule so that they are always immediately available in their most up-to-date versions. A Java front-end, executed as a stand alone application or web applet, provides a graphical interface for navigating the database and for viewing the annotations. There are facilities for importing and exporting data in the format of the Distributed Annotation System (DAS), enabling a GANESH database to be used as a component of a DAS configuration. The system has been used to construct databases for about a dozen regions of human chromosomes and for three regions of mouse chromosomes.
Affiliation(s)
- Derek Huntley
- Department of Computing, Imperial College, London SW7 2AZ, UK
|
44
|
Evans EJ, Hene L, Sparks LM, Dong T, Retiere C, Fennelly JA, Manso-Sancho R, Powell J, Braud VM, Rowland-Jones SL, McMichael AJ, Davis SJ. The T cell surface--how well do we know it? Immunity 2003; 19:213-23. [PMID: 12932355 DOI: 10.1016/s1074-7613(03)00198-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023]
Abstract
The overall degree of complexity of the T cell surface has been unclear, constraining our understanding of its biology. Using global gene expression analysis, we show that 111 of 374 genes encoding well-characterized leukocyte surface antigens are expressed by a resting cytotoxic T cell. Unexpectedly, of 97 stringently defined, T cell-specific transcripts with unknown functions that we identify, none encode proteins with the modular architecture characteristic of 80% of leukocyte surface antigens. Only two encode proteins with membrane topologies found exclusively in cell surface molecules. Our analysis indicates that the cell type-specific composition of the resting CD8+ T cell surface is now largely defined, providing an insight into the overall compositional complexity of the mammalian cell surface and a framework for formulating systematic models of T cell surface-dependent processes, such as T cell receptor triggering.
Affiliation(s)
- Edward J Evans
- Nuffield Department of Clinical Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford OX3 9DU, United Kingdom
|
45
|
Riveros-Rosas H, Julián-Sánchez A, Villalobos-Molina R, Pardo JP, Piña E. Diversity, taxonomy and evolution of medium-chain dehydrogenase/reductase superfamily. Eur J Biochem 2003; 270:3309-34. [PMID: 12899689 DOI: 10.1046/j.1432-1033.2003.03704.x] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.6] [Indexed: 11/20/2022]
Abstract
A comprehensive, structural and functional, in silico analysis of the medium-chain dehydrogenase/reductase (MDR) superfamily, including 583 proteins, was carried out by use of extensive database mining and the blastp program in an iterative manner to identify all known members of the superfamily. Based on phylogenetic, sequence, and functional similarities, the protein members of the MDR superfamily were classified into three different taxonomic categories: (a) subfamilies, consisting of a closed group containing a set of ideally orthologous proteins that perform the same function; (b) families, each comprising a cluster of monophyletic subfamilies that possess significant sequence identity among them and might share or not common substrates or mechanisms of reaction; and (c) macrofamilies, each comprising a cluster of monophyletic protein families with protein members from the three domains of life, which includes at least one subfamily member that displays activity related to a very ancient metabolic pathway. In this context, a superfamily is a group of homologous protein families (and/or macrofamilies) with monophyletic origin that shares at least a barely detectable sequence similarity, but showing the same 3D fold. The MDR superfamily encloses three macrofamilies, with eight families and 49 subfamilies. These subfamilies exhibit great functional diversity including noncatalytic members with different subcellular, phylogenetic, and species distributions. This results from constant enzymogenesis and proteinogenesis within each kingdom, and highlights the huge plasticity that MDR superfamily members possess. Thus, through evolution a great number of taxa-specific new functions were acquired by MDRs. The generation of new functions fulfilled by proteins, can be considered as the essence of protein evolution. The mechanisms of protein evolution inside MDR are not constrained to conserve substrate specificity and/or chemistry of catalysis. 
In consequence, MDR functional diversity is more complex than sequence diversity. MDR is a very ancient protein superfamily that existed in the last universal common ancestor. It had at least two (and probably three) different ancestral activities related to formaldehyde metabolism and alcoholic fermentation. Eukaryotic members of this superfamily are more related to bacterial than to archaeal members; horizontal gene transfer among the domains of life appears to be a rare event in modern organisms.
Affiliation(s)
- Héctor Riveros-Rosas
- Depto. Bioquímica, Fac. Medicina, UNAM, Cd. Universitaria, México D.F., México; Depto. Farmacobiología, CINVESTAV-Sede Sur, México D.F., México
|
46
|
Kao CY, Chen Y, Zhao YH, Wu R. ORFeome-based search of airway epithelial cell-specific novel human β-defensin genes. Am J Respir Cell Mol Biol 2003; 29:71-80. [PMID: 12600824 DOI: 10.1165/rcmb.2002-0205oc] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.1] [Indexed: 11/24/2022] Open
Abstract
beta-Defensin is one of the major host defense shields produced by various tissues and organs against microbial infection. To date, four human beta-defensins (DEFBs) gene products that share a consensus six-cysteine motif have been discovered. The hidden Markov model (HMM) profile was constructed from the common features of those known beta-defensin peptides to search for additional novel DEFB genes. A genome-wide search of the profile against ORFeome-based peptide databases (e.g., Ensembl project) led to the identification of six new DEFB members that also shared the conserved six-cysteine motif. Phylogenetic analysis supported a close relationship of these six new members with existing DEFB genes. Polymerase Chain Reaction studies of human tissue cDNA panels confirmed the expression of all six novel DEFB genes in various tissues. Two of them, DEFB106 and DEFB109, were expressed in the lung. A pilot study with cRNA probes for in situ hybridization and a synthetic propeptide for the functional characterization demonstrated the tissue-/cell-specific expression and the strong antimicrobial activity of DEFB106. These results support the utility of ORFeome-based HMM search in gene discovery for members of a specific gene family. The novel DEFB genes identified in this study may significantly contribute to overall antimicrobial host defenses.
Affiliation(s)
- Cheng Yuan Kao
- Center for Comparative Respiratory Biology and Medicine, University of California at Davis, Davis, CA 95616, USA
|
47
|
Keresztes G, Mutai H, Heller S. TMC and EVER genes belong to a larger novel family, the TMC gene family encoding transmembrane proteins. BMC Genomics 2003; 4:24. [PMID: 12812529 PMCID: PMC165604 DOI: 10.1186/1471-2164-4-24] [Citation(s) in RCA: 108] [Impact Index Per Article: 4.9] [Received: 03/27/2003] [Accepted: 06/17/2003] [Indexed: 11/25/2022] Open
Abstract
Background: Mutations in the transmembrane cochlear expressed gene 1 (TMC1) cause deafness in human and mouse. Mutations in two homologous genes, EVER1 and EVER2, increase the susceptibility to infection with certain human papillomaviruses, resulting in high risk of skin carcinoma. Here we report that TMC1, EVER1 and EVER2 (now TMC6 and TMC8) belong to a larger novel gene family, which is named TMC for transmembrane channel-like gene family. Results: Using a combination of iterative database searches and reverse transcriptase-polymerase chain reaction (RT-PCR) experiments we assembled contigs for cDNA encoding human, murine, puffer fish, and invertebrate TMC proteins. TMC proteins of individual species can be grouped into three subfamilies A, B, and C. Vertebrates have eight TMC genes. The majority of murine TMC transcripts are expressed in most organs; some transcripts, however, in particular the three subfamily A members, are rare and more restrictively expressed. Conclusion: The eight vertebrate TMC genes are evolutionarily conserved and encode proteins that form three subfamilies. Invertebrate TMC proteins can also be categorized into these three subfamilies. All TMC genes encode transmembrane proteins with intracellular amino- and carboxyl-termini and at least eight membrane-spanning domains. We speculate that the TMC proteins constitute a novel group of ion channels, transporters, or modifiers of such.
Affiliation(s)
- Gabor Keresztes
- Department of Otolaryngology and Program in Neuroscience, Harvard Medical School
- Eaton Peabody Laboratory, Massachusetts Eye & Ear Infirmary, Boston, MA
- Hideki Mutai
- Department of Otolaryngology and Program in Neuroscience, Harvard Medical School
- Eaton Peabody Laboratory, Massachusetts Eye & Ear Infirmary, Boston, MA
- Stefan Heller
- Department of Otolaryngology and Program in Neuroscience, Harvard Medical School
- Eaton Peabody Laboratory, Massachusetts Eye & Ear Infirmary, Boston, MA
|
48
|
Zhang L, Pavlovic V, Cantor CR, Kasif S. Human-mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Res 2003; 13:1190-202. [PMID: 12743024 PMCID: PMC403647 DOI: 10.1101/gr.703903] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Received: 08/09/2002] [Accepted: 02/03/2003] [Indexed: 11/24/2022]
Abstract
The identification of genes in the human genome remains a challenge, as the actual predictions appear to disagree tremendously and vary dramatically on the basis of the specific gene-finding methodology used. Because the pattern of conservation in coding regions is expected to be different from that in intronic or intergenic regions, a comparative computational analysis can lead, in principle, to an improved computational identification of genes in the human genome by using a reference, such as the mouse genome. However, this comparative methodology critically depends on three important factors: (1) the selection of the most appropriate reference genome (in particular, it is not clear whether the mouse is at the correct evolutionary distance from the human to provide sufficiently distinctive conservation levels in different genomic regions); (2) the selection of comparative features that provide the most benefit to gene recognition; and (3) the selection of an evidence integration architecture that effectively interprets the comparative features. We address the first question by a novel evolutionary analysis that allows us to explicitly correlate the performance of the gene recognition system with the evolutionary distance (time) between the two genomes. Our simulation results indicate that there is a wide range of reference genomes at different evolutionary time points that appear to deliver reasonable comparative prediction of human genes. In particular, the evolutionary time between human and mouse generally falls in the region of good performance; however, better accuracy might be achieved with a reference genome more distant than mouse. To address the second question, we propose several natural comparative measures of conservation for identifying exons and exon boundaries. Finally, we experiment with Bayesian networks for the integration of comparative and compositional evidence.
Affiliation(s)
- Lingang Zhang
- Center for Advanced Biotechnology, Boston University, Boston, Massachusetts 02215, USA
|
49
|
Chen H, Wang N, Huo Y, Sklar P, MacKinnon DF, Potash JB, McMahon FJ, Antonarakis SE, DePaulo JR, Ross CA, McInnis MG. Trapping and sequence analysis of 1138 putative exons from human chromosome 18. Mol Psychiatry 2003; 8:619-23. [PMID: 12851638 DOI: 10.1038/sj.mp.4001288] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/08/2022]
Abstract
In a search for novel genes on chromosome 18 (HC18), on which several regions have been linked to bipolar disorder, we applied exon trapping to HC18-specific cosmids. Of the 1138 exons trapped, 1052 were mapped to HC18, and the remaining 86 have not been localized. No exons were localized to genomic regions other than HC18. A BLAST database search revealed that 190 exons were identical to 98 Unigenes on HC18; 98 were identical to an additional 82 clusters of ESTs not present in the HC18 Unigene set; 39 were homologous to genes from human and other species (e < 10^-3); and the remaining 811 exons had no significant homology to transcripts in public databases. The mapped exons were compared to the 867 annotated genes on HC18 in the Celera databases; 216 exons were identical to 104 Celera 'genes' and the remaining 836 exons were not found in the Celera databases. On average, there were two exons for a matched transcript (known genes and ESTs). Therefore, the 850 novel exons may represent hundreds of novel genes on chromosome 18.
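The exon counts in this abstract can be cross-checked with a few lines of Python. This is only a sketch: the variable and function names are illustrative and not from the paper, and reading the 850 "novel" exons as the 39 homologous exons plus the 811 with no significant homology is an assumption that the abstract does not state explicitly.

```python
# Counts copied from the abstract; all names here are illustrative.
trapped = 1138
mapped, unmapped = 1052, 86

# BLAST classification of all trapped exons.
blast_classes = {
    "identical_to_unigenes": 190,
    "identical_to_other_est_clusters": 98,
    "homologous_to_genes": 39,
    "no_significant_homology": 811,
}

# Comparison of the mapped exons against the Celera databases.
celera_matched, celera_unmatched = 216, 836


def check_counts():
    """Return True if each set of categories sums to its reported total."""
    return (
        mapped + unmapped == trapped
        and sum(blast_classes.values()) == trapped
        and celera_matched + celera_unmatched == mapped
    )


# Assumption: "novel" means no identical match to a known transcript.
novel = (
    blast_classes["homologous_to_genes"]
    + blast_classes["no_significant_homology"]
)

print(check_counts(), novel)
```

Under that assumption the categories are internally consistent: the four BLAST classes sum to the 1138 trapped exons, and the Celera comparison accounts for all 1052 mapped exons.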
Affiliation(s)
- H Chen
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21278-7463, USA.
|
50
|
Abstract
With the sequence of the human genome now complete, studies must focus on how the genome is functionally organized within the confines of the cell nucleus and the dynamic interplay between the genome and its regulatory factors to effectively control gene expression and silencing. In this review I describe our current state of knowledge with regard to the organization of chromosomes within the nucleus and the positioning of active versus inactive genes. In addition, I discuss studies on the dynamics of chromosomes and specific genetic loci within living cells and its relationship to gene activity and the cell cycle. Furthermore, our current understanding of the distribution and dynamics of RNA polymerase II transcription factors is discussed in relation to chromosomal loci and other nuclear domains.
Affiliation(s)
- David L Spector
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA.
|