1
Wilbrandt J, Misof B, Panfilio KA, Niehuis O. Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models. BMC Genomics 2019; 20:753. [PMID: 31623555 PMCID: PMC6798390 DOI: 10.1186/s12864-019-6064-8] [Received: 04/18/2018] [Accepted: 08/27/2019]
Abstract
Background: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.
Results: Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, manual annotation alters the structural properties of automatically generated gene models only marginally, if at all. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred equally well from the automatically predicted and the manually annotated gene models. Conversely, some previously reported trends did not appear in either the automatic or the manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.
Conclusions: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses at the level of individual genes clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.
Affiliation(s)
- Jeanne Wilbrandt
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany. Present address: Hoffmann Research Group, Leibniz Institute on Aging - Fritz Lipmann Institute, Beutenbergstraße 11, 07745, Jena, Germany.
- Bernhard Misof
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany
- Kristen A Panfilio
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL, UK
- Oliver Niehuis
- Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University, Hauptstr. 1, 79104, Freiburg, Germany
2
Reinhardt JA, Wanjiru BM, Brant AT, Saelao P, Begun DJ, Jones CD. De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences. PLoS Genet 2013; 9:e1003860. [PMID: 24146629 PMCID: PMC3798262 DOI: 10.1371/journal.pgen.1003860] [Received: 04/12/2013] [Accepted: 08/19/2013]
Abstract
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 million years before the evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval, suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes, and the structural complexity of de novo genes increased steadily over evolutionary time. Although these genes are transcribed at a higher level in males than in females and are most strongly expressed in testes, RNAi experiments show that most of them are essential in both sexes during metamorphosis. This lethality suggests that protein-coding de novo genes in Drosophila quickly become functionally important.
De novo genes are protein-coding genes with no clear homology to previously existing protein-coding genes. Since their discovery in Drosophila and other species including humans, their existence has been controversial, with some doubt as to how they would arise, whether they produce proteins, and whether they could possibly perform any useful function. Here, we show that RNAi of several Drosophila de novo genes causes lethality; in fact, a higher proportion of de novo genes cause lethality than was found in a similar screen of other young and novel genes. Further, we find that de novo genes do produce proteins in the majority of cases and that in some cases, they were transcribed prior to the emergence of an open reading frame. Our data suggest that Drosophila de novo genes are an unexpected avenue for non-coding DNA sequences to contribute evolutionary and functional novelty.
Affiliation(s)
- Josephine A. Reinhardt
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Biology, University of Maryland at College Park, College Park, Maryland, United States of America
- Betty M. Wanjiru
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Alicia T. Brant
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Perot Saelao
- Center for Population Biology, University of California, Davis, Davis, California, United States of America
- David J. Begun
- Center for Population Biology, University of California, Davis, Davis, California, United States of America
- Corbin D. Jones
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
3
4
Prasad TSK, Harsha HC, Keerthikumar S, Sekhar NR, Selvan LDN, Kumar P, Pinto SM, Muthusamy B, Subbannayya Y, Renuse S, Chaerkady R, Mathur PP, Ravikumar R, Pandey A. Proteogenomic analysis of Candida glabrata using high resolution mass spectrometry. J Proteome Res 2011; 11:247-60. [DOI: 10.1021/pr200827k]
Affiliation(s)
- T. S. Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- H. C. Harsha
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Nirujogi Raja Sekhar
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Lakshmi Dhevi N. Selvan
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- Praveen Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- Sneha M. Pinto
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
- Babylakshmi Muthusamy
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Yashwanth Subbannayya
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Rajiv Gandhi University of Health Sciences, Jayanagar, Bangalore 560 041, India
- Santosh Renuse
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Amrita School of Biotechnology, Amrita University, Kollam 690 525, India
- Raghothama Chaerkady
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India
- Premendu P. Mathur
- Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry 605 014, India
- Raju Ravikumar
- Department of Neuromicrobiology, National Institute of Mental Health and Neuro Sciences, Bangalore 560029, India
5
Affiliation(s)
- Daniele Guerzoni
- Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
- Aoife McLysaght
- Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
6
Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 2009; 10:67. [PMID: 19236712 PMCID: PMC2653490 DOI: 10.1186/1471-2105-10-67] [Received: 10/20/2008] [Accepted: 02/23/2009]
Abstract
Background: The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review.
Results: In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases: H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans.
Conclusion: Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.
7
Translational research in the development of novel sepsis therapeutics: logical deductive reasoning or mission impossible? Crit Care Med 2009; 37:S10-5. [PMID: 19104207 DOI: 10.1097/ccm.0b013e3181921497]
Abstract
The successful translation of promising research findings from basic research laboratories into useful clinical products for the management of septic patients has proven to be a daunting challenge. The complexity and variability of the clinical entity referred to as sepsis make it intrinsically difficult to model in preclinical systems and to predict the efficacy of potentially useful experimental therapeutic agents. Technological innovations in microarrays, microfluidics, and nanotechnology make it feasible to study the evolution of sepsis in small animal models in considerable detail. The recognized limitations of standard preclinical platforms used to study sepsis have led to innovative approaches to study sepsis in silico, and in more complex and clinically more valid ex vivo tissue perfusion models and animal systems. It is abundantly clear that sepsis researchers need to do a better job informing clinicians about the possible benefits and potential risks of new treatment interventions as they traverse the gap between the bench and the bedside.
8
9
DasGupta R, Nybakken K, Booker M, Mathey-Prevot B, Gonsalves F, Changkakoty B, Perrimon N. A case study of the reproducibility of transcriptional reporter cell-based RNAi screens in Drosophila. Genome Biol 2007; 8:R203. [PMID: 17903264 PMCID: PMC2375041 DOI: 10.1186/gb-2007-8-9-r203] [Received: 06/04/2007] [Revised: 09/05/2007] [Accepted: 09/28/2007]
Abstract
A second-generation dsRNA library was used to re-assess factors that influence the outcome of transcriptional reporter-based whole-genome RNAi screens for the Wnt/Wingless (wg) and Hedgehog (hh) signaling pathways.
Off-target effects have been demonstrated to be a major source of false positives in RNA interference (RNAi) high-throughput screens. In this study, we re-assess the previously published transcriptional reporter-based whole-genome RNAi screens for the Wingless and Hedgehog signaling pathways using second-generation double-stranded RNA libraries. Furthermore, we investigate other factors that may influence the outcome of such screens, including cell-type specificity, robustness of reporters, and assay normalization, which determine the efficacy of RNAi knockdown of target genes.
Affiliation(s)
- Ramanuj DasGupta
- New York University School of Medicine/Cancer Institute, Department of Pharmacology, First Avenue, New York, NY 10016, USA
- Kent Nybakken
- Boston Biomedical Research Institute, 64 Grove Street, Watertown, MA, 02472, USA
- Matthew Booker
- Department of Genetics, Harvard Medical School, Avenue Louis Pasteur, Boston, MA 02115, USA
- Bernard Mathey-Prevot
- Department of Genetics, Harvard Medical School, Avenue Louis Pasteur, Boston, MA 02115, USA
- Foster Gonsalves
- New York University School of Medicine/Cancer Institute, Department of Pharmacology, First Avenue, New York, NY 10016, USA
- Binita Changkakoty
- New York University School of Medicine/Cancer Institute, Department of Pharmacology, First Avenue, New York, NY 10016, USA
- Norbert Perrimon
- Department of Genetics, Harvard Medical School, Avenue Louis Pasteur, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School, Avenue Louis Pasteur, Boston, MA 02115, USA
10
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 2007; 450:219-32. [PMID: 17994088 DOI: 10.1038/nature06340] [Received: 07/21/2007] [Accepted: 10/04/2007]
Abstract
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
11
Cantarel BL, Korf I, Robb SM, Parra G, Ross E, Moore B, Holt C, Sánchez Alvarado A, Yandell M. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res 2008; 18:188-96. [PMID: 18025269 PMCID: PMC2134774 DOI: 10.1101/gr.6743907] [Received: 05/25/2007] [Accepted: 09/18/2007]
Abstract
We have developed a portable and easily configurable genome annotation pipeline called MAKER. Its purpose is to allow investigators to independently annotate eukaryotic genomes and create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into gene annotations having evidence-based quality indices. MAKER is also easily trainable: outputs of preliminary runs are used to automatically retrain its gene-prediction algorithm, producing higher-quality gene models on subsequent runs. MAKER's inputs are minimal, and its outputs can be used to create a GMOD database. Its outputs can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view, and edit individual contigs and BACs without the overhead of a database. As proof of principle, we have used MAKER to annotate the genome of the planarian Schmidtea mediterranea and to create a new genome database, SmedGD. We have also compared MAKER's performance to other published annotation pipelines. Our results demonstrate that MAKER provides a simple and effective means to convert a genome sequence into a community-accessible genome database. MAKER should prove especially useful for emerging model organism genome projects for which extensive bioinformatics resources may not be readily available.
Affiliation(s)
- Brandi L. Cantarel
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Ian Korf
- Department of Molecular and Cellular Biology and Genome Center, UC Davis, Davis, California 95616, USA
- Sofia M.C. Robb
- Department of Neurobiology & Anatomy, University of Utah School of Medicine, Salt Lake City, Utah 84132, USA
- Genis Parra
- Department of Molecular and Cellular Biology and Genome Center, UC Davis, Davis, California 95616, USA
- Eric Ross
- Howard Hughes Medical Institute, University of Utah School of Medicine, Salt Lake City, Utah 84132, USA
- Barry Moore
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Carson Holt
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
- Alejandro Sánchez Alvarado
- Department of Neurobiology & Anatomy, University of Utah School of Medicine, Salt Lake City, Utah 84132, USA
- Howard Hughes Medical Institute, University of Utah School of Medicine, Salt Lake City, Utah 84132, USA
- Mark Yandell
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
12
Lin MF, Carlson JW, Crosby MA, Matthews BB, Yu C, Park S, Wan KH, Schroeder AJ, Gramates LS, St. Pierre SE, Roark M, Wiley KL, Kulathinal RJ, Zhang P, Myrick KV, Antone JV, Celniker SE, Gelbart WM, Kellis M. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res 2007; 17:1823-36. [PMID: 17989253 PMCID: PMC2099591 DOI: 10.1101/gr.6679507] [Received: 05/07/2007] [Accepted: 09/21/2007]
Abstract
The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster.
Affiliation(s)
- Michael F. Lin
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02139, USA
- Joseph W. Carlson
- Berkeley Drosophila Genome Project, Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Madeline A. Crosby
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Beverley B. Matthews
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Charles Yu
- Berkeley Drosophila Genome Project, Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Soo Park
- Berkeley Drosophila Genome Project, Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Kenneth H. Wan
- Berkeley Drosophila Genome Project, Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Andrew J. Schroeder
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- L. Sian Gramates
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Susan E. St. Pierre
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Margaret Roark
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Kenneth L. Wiley
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Rob J. Kulathinal
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Peili Zhang
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Kyl V. Myrick
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Jerry V. Antone
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Susan E. Celniker
- Berkeley Drosophila Genome Project, Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- William M. Gelbart
- FlyBase, The Biological Laboratories, Harvard University, Cambridge, Massachusetts 02138, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Manolis Kellis
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02139, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA
13
Identification of unannotated exons of low abundance transcripts in Drosophila melanogaster and cloning of a new serine protease gene upregulated upon injury. BMC Genomics 2007; 8:249. [PMID: 17650329 PMCID: PMC1949825 DOI: 10.1186/1471-2164-8-249] [Received: 11/13/2006] [Accepted: 07/24/2007]
Abstract
Background: The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~14,000), indicating that mechanisms generating transcript diversity must have played a major role in the evolution of complex metazoans. One of the most extensively used mechanisms accounting for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that one-third of these are estimated to be missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the full variety of expressed transcripts depends on experimental data for its final validation and is continuously being pursued using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, to investigate hitherto unannotated regions of the Drosophila transcriptome.
Results: Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (45%) new exons of low-abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. In agreement with the proposal that this locus is co-regulated in response to microbial infection, we show here that SP212 is also up-regulated upon injury.
Conclusion: Using the ORESTES methodology, we identified 17 novel exons from low-abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that computational identification and manual inspection alone are not sufficient to annotate a genome in the absence of experimentally derived data.
14
Mathey-Prevot B, Perrimon N. Drosophila genome-wide RNAi screens: are they delivering the promise? Cold Spring Harb Symp Quant Biol 2007; 71:141-8. [PMID: 17381290 DOI: 10.1101/sqb.2006.71.027]
Abstract
The emergence of RNA interference (RNAi) on the heels of the successful completion of the Drosophila genome project was seen by many as the ace in functional genomics: Its application would quickly assign a function to all genes in this organism and help delineate the complex web of interactions or networks linking them at the systemic level. A few years wiser and a number of genome-wide Drosophila RNAi screens later, we reflect on the state of high-throughput RNAi screens in Drosophila and ask whether the initial promise was fulfilled. We review the impact that this approach has had in the field of Drosophila research and chart out strategies to extract maximal benefit from the application of RNAi to gene discovery and pursuit of systems biology.
Affiliation(s)
- B Mathey-Prevot
- Department of Genetics, Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
15
Perrimon N, Mathey-Prevot B. Applications of high-throughput RNA interference screens to problems in cell and developmental biology. Genetics 2007; 175:7-16. [PMID: 17244760 PMCID: PMC1775003 DOI: 10.1534/genetics.106.069963]
Abstract
RNA interference (RNAi) in tissue culture cells has emerged as an excellent methodology for identifying gene functions systematically and in an unbiased manner. Here, we describe how RNAi high-throughput screening (HTS) in Drosophila cells is currently performed and emphasize the strengths and weaknesses of the approach. Further, to demonstrate the versatility of the technology, we provide examples of the various applications of the method to problems in signal transduction and cell and developmental biology. Finally, we discuss emerging technological advances that will extend RNAi-based screening methods.
Affiliation(s)
- Norbert Perrimon
- Department of Genetics, Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
16
Flores CA, Niemeyer MI, Sepúlveda FV, Cid LP. Two splice variants derived from a Drosophila melanogaster candidate ClC gene generate ClC-2-type Cl- channels. Mol Membr Biol 2006; 23:149-56. [PMID: 16754358 DOI: 10.1080/09687860500449978]
Abstract
Members of the ClC family of membrane proteins have been found in a variety of species and they can function as Cl- channels or Cl-/H+ antiporters. Three potential ClC genes are present in the Drosophila melanogaster genome. Only one of them shows homology with a branch of the mammalian ClC genes that encode plasma membrane Cl- channels. The remaining two are close to mammalian homologues coding for intracellular ClC proteins. Using RT-PCR we have identified two splice variants showing highest homology (41% residue identity) to the mammalian ClC-2 chloride channel. One splice variant (DmClC-2S) is expressed in the fly head and body and an additional, larger variant (DmClC-2L) is only present in the head. Both putative Drosophila channels conserve key features of the ClC channels cloned so far, including residues forming the selectivity filter and C-terminal CBS domains. The splice variants differ in a stretch of 127 aa at the intracellular C-terminal portion separating cystathionine beta-synthase (CBS) domains. Expression of either Drosophila ClC-2 variant in HEK-293 cells generated inwardly rectifying Cl- currents with similar activation and deactivation characteristics. Functionally, the DmClC-2 variants closely resembled their mammalian counterpart, save for slower opening kinetics and a faster closing rate. As CBS domains are believed to be sites of regulation of channel gating and trafficking, it is suggested that the extra amino acids present between CBS domains in DmClC-2L might endow the channel with a differential response to signals present in the fly cells where it is expressed.
17
Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci U S A 2006; 103:9935-9. [PMID: 16777968 PMCID: PMC1502557 DOI: 10.1073/pnas.0509809103] [Citation(s) in RCA: 236] [Impact Index Per Article: 13.1] [Indexed: 12/31/2022]
Abstract
Descriptions of recently evolved genes suggest several mechanisms of origin including exon shuffling, gene fission/fusion, retrotransposition, duplication-divergence, and lateral gene transfer, all of which involve recruitment of preexisting genes or genetic elements into new function. The importance of noncoding DNA in the origin of novel genes remains an open question. We used the well annotated genome of the genetic model system Drosophila melanogaster and genome sequences of related species to carry out a whole-genome search for new D. melanogaster genes that are derived from noncoding DNA. Here, we describe five such genes, four of which are X-linked. Our RT-PCR experiments show that all five putative novel genes are expressed predominantly in testes. These data support the idea that these novel genes are derived from ancestral noncoding sequence and that new, favored genes are likely to invade populations under selective pressures relating to male reproduction.
Affiliation(s)
- Mia T Levine
- Center for Population Biology, University of California-Davis, Davis, CA 95616, USA.
18
Inagaki S, Numata K, Kondo T, Tomita M, Yasuda K, Kanai A, Kageyama Y. Identification and expression analysis of putative mRNA-like non-coding RNA in Drosophila. Genes Cells 2005; 10:1163-73. [PMID: 16324153 DOI: 10.1111/j.1365-2443.2005.00910.x] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.9] [Indexed: 10/25/2022]
Abstract
One of the most surprising results to emerge from mammalian cDNA sequencing projects is that thousands of mRNA-like non-coding RNAs (ncRNAs) are expressed and constitute at least 10% of poly(A)(+) RNAs. In most cases, however, the functions of these RNA molecules remain unclear. To clarify the biological significance of mRNA-like ncRNAs, we computationally screened 11,691 Drosophila melanogaster full-length cDNAs. After eliminating presumed protein-coding transcripts, 136 were identified as strong candidates for mRNA-like ncRNAs. Although most of these putative ncRNAs are found throughout the Drosophila genus, predicted amino acid sequences are not conserved even in related species, suggesting that these transcripts are actually non-coding RNAs. In situ hybridization analyses revealed that 35 of the transcripts are expressed during embryogenesis, of which 27 were detected only in specific tissues including the tracheal system, midgut primordial cells, visceral mesoderm, germ cells and the central and peripheral nervous system. These highly regulated expression patterns suggest that many mRNA-like ncRNAs play important roles in multiple steps of organogenesis and cell differentiation in Drosophila. This is the first report showing that the majority of mRNA-like ncRNAs in a model organism are expressed in specific tissues and cell types.
MESH Headings
- Amino Acid Sequence
- Animals
- Base Sequence
- Cell Differentiation/genetics
- Conserved Sequence
- DNA, Complementary/analysis
- DNA, Complementary/genetics
- Drosophila/embryology
- Drosophila/genetics
- Embryonic Development/genetics
- Evolution, Molecular
- Gene Expression Regulation, Developmental
- Models, Genetic
- Open Reading Frames/genetics
- Organogenesis/genetics
- RNA, Messenger/chemistry
- RNA, Messenger/genetics
- RNA, Untranslated/chemistry
- RNA, Untranslated/genetics
- Species Specificity
- Transcription, Genetic
Affiliation(s)
- Sachi Inagaki
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Takayama, Ikoma, Japan
19
Li J, Riehle MM, Zhang Y, Xu J, Oduol F, Gomez SM, Eiglmeier K, Ueberheide BM, Shabanowitz J, Hunt DF, Ribeiro JMC, Vernick KD. Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms. Genome Biol 2006; 7:R24. [PMID: 16569258 PMCID: PMC1557760 DOI: 10.1186/gb-2006-7-3-r24] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 10/19/2005] [Revised: 01/19/2006] [Accepted: 02/23/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector. RESULTS We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation were assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high-quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download. CONCLUSION Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms.
Affiliation(s)
- Jun Li
- Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
- Michelle M Riehle
- Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
- Yan Zhang
- Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
- Jiannong Xu
- Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
- Frederick Oduol
- Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
- Shawn M Gomez
- Unité de Biochimie et Biologie Moléculaire des Insectes and CNRS FRE 2849, Institut Pasteur, 75724 Paris Cedex 15, France
- Karin Eiglmeier
- Unité de Biochimie et Biologie Moléculaire des Insectes and CNRS FRE 2849, Institut Pasteur, 75724 Paris Cedex 15, France
- Beatrix M Ueberheide
- Department of Chemistry, McCormick Rd, University of Virginia, Charlottesville, VA 22904, USA
- Jeffrey Shabanowitz
- Department of Chemistry, McCormick Rd, University of Virginia, Charlottesville, VA 22904, USA
- Donald F Hunt
- Department of Chemistry, McCormick Rd, University of Virginia, Charlottesville, VA 22904, USA
- José MC Ribeiro
- Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA
- Kenneth D Vernick
- Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
20
Ashburner M, Bergman CM. Drosophila melanogaster: a case study of a model genomic sequence and its consequences. Genome Res 2005; 15:1661-7. [PMID: 16339363 DOI: 10.1101/gr.3726705] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/24/2022]
Abstract
The sequencing and annotation of the Drosophila melanogaster genome, first published in 2000 through a collaboration between Celera Genomics and the Drosophila Genome Projects, has provided a number of important contributions to genome research. By demonstrating the utility of methods such as whole-genome shotgun sequencing and genome annotation by a community "jamboree," the Drosophila genome established the precedents for the current paradigm used by most genome projects. Subsequent releases of the initial genome sequence have been improved by the Berkeley Drosophila Genome Project and annotated by FlyBase, the Drosophila community database, providing one of the highest-quality genome sequences and annotations for any organism. We discuss the impact of the growing number of genome sequences now available in the genus on current Drosophila research, and some of the biological questions that these resources will help to answer in the future.
Affiliation(s)
- Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, United Kingdom.
21
Björklund M, Taipale M, Varjosalo M, Saharinen J, Lahdenperä J, Taipale J. Identification of pathways regulating cell size and cell-cycle progression by RNAi. Nature 2006; 439:1009-13. [PMID: 16496002 DOI: 10.1038/nature04469] [Citation(s) in RCA: 219] [Impact Index Per Article: 12.2] [Received: 06/13/2005] [Accepted: 11/21/2005] [Indexed: 01/26/2023]
Abstract
Many high-throughput loss-of-function analyses of the eukaryotic cell cycle have relied on the unicellular yeast species Saccharomyces cerevisiae and Schizosaccharomyces pombe. In multicellular organisms, however, additional control mechanisms regulate the cell cycle to specify the size of the organism and its constituent organs. To identify such genes, here we analysed the effect of the loss of function of 70% of Drosophila genes (including 90% of genes conserved in human) on cell-cycle progression of S2 cells using flow cytometry. To address redundancy, we also targeted genes involved in protein phosphorylation simultaneously with their homologues. We identify genes that control cell size, cytokinesis, cell death and/or apoptosis, and the G1 and G2/M phases of the cell cycle. Classification of the genes into pathways by unsupervised hierarchical clustering on the basis of these phenotypes shows that, in addition to classical regulatory mechanisms such as Myc/Max, Cyclin/Cdk and E2F, cell-cycle progression in S2 cells is controlled by vesicular and nuclear transport proteins, COP9 signalosome activity and four extracellular-signal-regulated pathways (Wnt, p38β MAPK, FRAP/TOR and JAK/STAT). In addition, by simultaneously analysing several phenotypes, we identify a translational regulator, eIF-3p66, that specifically affects the Cyclin/Cdk pathway activity.
Affiliation(s)
- Mikael Björklund
- Molecular and Cancer Biology Program, Biomedicum Helsinki, PO Box 63 (Haartmaninkatu 8), FI-00014 University of Helsinki, Finland
22
Bush EC, Lahn BT. Selective constraint on noncoding regions of hominid genomes. PLoS Comput Biol 2005; 1:e73. [PMID: 16362073 PMCID: PMC1314883 DOI: 10.1371/journal.pcbi.0010073] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 09/14/2005] [Accepted: 11/09/2005] [Indexed: 02/07/2023]
Abstract
An important challenge for human evolutionary biology is to understand the genetic basis of human–chimpanzee differences. One influential idea holds that such differences depend, to a large extent, on adaptive changes in gene expression. Because regulatory elements in noncoding DNA play a key role in controlling gene expression, an important step in assessing this hypothesis involves gaining a better understanding of selective constraint on noncoding regions of hominid genomes. In noncoding sequence, functional elements are frequently small and can be separated by large nonfunctional regions. For this reason, constraint in hominid genomes is likely to be patchy. Here we use conservation in more distantly related mammals and amniotes as a way of identifying small sequence windows that are likely to be functional. We find that putatively functional noncoding elements defined in this manner are subject to significant selective constraint in hominids. Contrary to some previous reports, these results argue that hominid noncoding regions are not evolving free of constraint.
Affiliation(s)
- Eliot C Bush
- Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Bruce T Lahn
- Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- * To whom correspondence should be addressed. E-mail:
23
Wagstaff BJ, Begun DJ. Molecular population genetics of accessory gland protein genes and testis-expressed genes in Drosophila mojavensis and D. arizonae. Genetics 2005; 171:1083-101. [PMID: 16085702 PMCID: PMC1456813 DOI: 10.1534/genetics.105.043372] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.5] [Indexed: 01/06/2023]
Abstract
Molecular population genetic investigation of Drosophila male reproductive genes has focused primarily on melanogaster subgroup accessory gland protein genes (Acps). Consistent with observations from male reproductive genes of numerous taxa, Acps evolve more rapidly than nonreproductive genes. However, within the Drosophila genus, large data sets from additional types of male reproductive genes and from different species groups are lacking. Here we report findings from a molecular population genetics analysis of male reproductive genes of the repleta group species, Drosophila arizonae and D. mojavensis. We find that Acps have dramatically higher average pairwise Ka/Ks (0.93) than testis-enriched genes (0.19) and previously reported melanogaster subgroup Acps (0.42). Overall, 10 of 19 Acps have Ka/Ks > 1 either in nonpolarized analyses or in at least one lineage of polarized analyses. Of the nine Acps for which outgroup data were available, average Ka/Ks was considerably higher in D. mojavensis (2.08) than in D. arizonae (0.87). Contrasts of polymorphism and divergence suggest that adaptive protein evolution at Acps is more common in D. mojavensis than in D. arizonae.
Affiliation(s)
- Bradley J Wagstaff
- Section of Integrative Biology, University of Texas, Austin, Texas 78712, USA
24
Brown RH, Gross SS, Brent MR. Begin at the beginning: predicting genes with 5' UTRs. Genome Res 2005; 15:742-7. [PMID: 15867435 PMCID: PMC1088303 DOI: 10.1101/gr.3696205] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 01/13/2005] [Accepted: 02/14/2005] [Indexed: 02/03/2023]
Abstract
The retrainable, comparative gene predictor N-SCAN integrates multigenome modeling and 5' untranslated region (5' UTR) modeling. In this article, we evaluate N-SCAN's transcription-start site (TSS) and first exon predictions both computationally and experimentally. The computational results indicate that N-SCAN is more accurate than any of the other tools we tested at predicting the TSS and the complete first exon. It is the only one of these tools that can predict complete gene structures together with 5' UTRs. Experimental evaluation shows that N-SCAN can be used to validate novel UTR introns in human gene predictions that do not overlap any RefSeq gene and even to correct RefSeq mRNAs by adding validated UTR exons that are missing from RefSeq.
Affiliation(s)
- Randall H Brown
- Laboratory for Computational Genomics, Washington University, St. Louis, MO 63130, USA
25
26
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]